A well-ranked site is first and foremost one that the robots of search engines such as Google or Yahoo can easily find and explore: that is why the robots.txt file plays such an essential role in organic search engine optimization. Website owners sometimes underestimate this file, yet optimizing it is a simple and effective way to improve your positions in search engine results pages.
Our SEO agency explains everything you need to know about modifying the robots.txt file on your WordPress site or blog.
Contents:
- Why is the robots.txt file important?
- How do I create a robots.txt file?
- Instructions in the robots.txt file
- Example of a robots.txt file for WordPress
- How do I test a file?
Robots.txt: Definition
The robots.txt file is a text file that tells search engine spiders how to crawl and index web pages. This file is located in the site’s root directory. The robots.txt file generated by default when new WordPress sites are installed is as follows:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Its operation is simple: each line of the file is an instruction that allows or disallows search engine spiders from crawling specific URLs or directories. The rule is that if no instruction disallows a spider from crawling a URL, that URL will be crawled. One important detail: specifying that a URL should not be crawled does not mean it will be de-indexed. If you wish to de-index a page, use the noindex instruction directly in the page concerned.
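As a minimal illustration, the two types of instruction can be combined like this (the /private/ directory here is purely hypothetical, not part of a standard WordPress install):

User-agent: *
# Block crawling of everything under /private/
Disallow: /private/
# Any URL not matched by a Disallow rule remains crawlable by default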
Why is the Robots.txt file important?
When Googlebot crawls a website, it allocates it a crawl budget. In short, the crawl budget is the maximum number of pages the search engine will crawl and index on that site. Without specific instructions, Googlebot will crawl your entire website, including all the pages, content and directories that were never intended to be indexed. By letting robots crawl irrelevant content, you needlessly "burn" your crawl budget, and important content may not be crawled by the engine for lack of budget. On WordPress, for example, content that is not relevant for crawling typically includes the pages dedicated to administration, plugins, themes and so on. By prohibiting their crawling in the robots.txt file, you allow the robots to focus on the pages that really matter for your SEO. Another important point: when you prohibit robots from crawling certain content, you save the server resources needed to serve that content, which can have an impact on the site's display performance.
How do I create a Robots.txt file in WordPress?
There are two main ways to create a Robots.txt file in WordPress: you’re free to use the one that suits you best.
Method 1: Edit Robots.txt manually
This method requires some basic technical knowledge, as it involves the use of an FTP client. Using the latter, connect to your WordPress site hosting and find the Robots.txt file in the root directory.
To modify it, use a simple text editor such as Notepad++. Once you’ve made your changes, re-upload the file to your website’s root folder using your FTP client.
Method 2: Modify Robots.txt using a plugin
If you want to modify your Robots.txt file without any technical manipulations, the easiest way is to use a plugin like All in One SEO, Yoast or RankMath.
All In One SEO is a very popular WordPress extension dedicated to SEO, including a Robots.txt file generator. This feature is included in the free version.
From the WordPress admin panel, click on All in One SEO > Tools to edit your file in just a few clicks. But before you do anything else, remember to check the box allowing you to customize Robots.txt. You can preview the file: as standard, it contains various default rules, added automatically by WordPress.
These rules tell search engines not to crawl the admin content of the WordPress blog, but they allow spiders to crawl all other pages and content. To add your own custom rules and improve your organic rankings, all you have to do is:
- click on “Add a rule”;
- specify a user agent, if necessary;
- check the Allow or Disallow box;
- indicate the name of the file or folder you wish to authorize or prohibit.
Finally, don’t forget to click on “Save changes” before leaving this screen. Note, however, that other extensions such as Yoast will enable you to achieve the same results.
Open the SEO menu > Tools > File editor, then edit the robots.txt file directly from the WordPress back office.
Basic instructions for robots.txt
- User-agent: indicates which search engine robots your crawl directives apply to. To address all search engines, simply add the * character after the User-agent instruction.
- Disallow: as the name suggests, blocks the crawling of a page or folder. The content concerned is specified after the instruction.
- Allow: the opposite instruction; it authorizes crawling of the specified page or content.
- Sitemap: specifies the location of one or more sitemaps for the indexing robots.
- #: indicates the presence of a comment.
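Putting these instructions together, a short annotated example could look like this (the sitemap URL is illustrative, not your real one):

# These directives apply to all crawlers
User-agent: *
# Block crawling of the WordPress admin area
Disallow: /wp-admin/
# But keep the AJAX endpoint reachable
Allow: /wp-admin/admin-ajax.php
# Declare the location of the sitemap
Sitemap: http://www.mon-site.com/sitemap.xml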
What are the important elements to add to a robots.txt file?
The very first optimization to make on a WordPress site is to prevent the login page from being crawled. The instructions for doing this are as follows:
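On a standard installation the login page is wp-login.php, so the rule is typically written like this:

# Block crawling of the WordPress login page
Disallow: /wp-login.php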
The following instructions may also be useful:
WordPress directories not to be indexed:
The instructions below will block crawling not only of the wp-includes folder, but also of the directories containing the plugins, themes and cache content used by WordPress.
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
List of URLs not to be indexed
To prevent your blog’s RSS feeds from being indexed, you can use the following instructions:
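The rules generally used match the default WordPress feed URLs (they reappear in the complete file further down):

# Block the main and secondary feed URLs
Disallow: */feed
Disallow: */rss2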
Other URLs worth blocking include ping, trackback and comment URLs.
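As an indication, the patterns commonly used for these look like this (the exact paths depend on your setup; xmlrpc.php is the WordPress endpoint that handles pingbacks):

# Trackback URLs
Disallow: */trackback
# Comment URLs
Disallow: */comments
# Pingbacks go through the XML-RPC endpoint
Disallow: /xmlrpc.php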
Block unwanted files and URLs
Certain types of sensitive content should be kept out of search engines. Even though Google, Bing and Yahoo generally handle this kind of information well and avoid displaying it in the SERPs, for safety's sake it is strongly recommended to block the crawling of the following files. Although rarely used today, the cgi-bin folder may still exist on some servers; historically, this folder was used to store executable code.
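If that folder still exists on your server, a single rule is enough to block it:

# Legacy folder for executable scripts
Disallow: /cgi-bin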
Another very useful optimization is to block access to URLs containing a question mark, for example: http://www.mon-domaine.fr/page.php?id=2. Here's how:
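Google's crawler supports the * wildcard, so the pattern usually used is:

# Block any URL containing a query string (a question mark)
Disallow: /*?*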
URLs worth blocking include:
- URLs ending in ".php".
- .inc files, which are not pages per se, but rather code files to be included.
- Files with the .gz extension are compressed files.
- .cgi or Common Gateway Interface files are script files that can be executed under certain conditions.
The “$” character is used to specify what the URL should end with.
This gives us the following code:
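Based on the extensions listed above and the $ end-of-URL marker, a typical formulation is:

Disallow: /*.php$
Disallow: /*.inc$
Disallow: /*.gz$
Disallow: /*.cgi$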
Finally, to avoid being penalized in mobile search, we recommend that you explicitly authorize the crawling of CSS and JS files.
Example of a complete robots.txt file for WordPress
To recap, here’s the complete robots.txt file you get after optimization:
User-agent: *
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: */feed
Disallow: */rss2
Allow: /css?
Allow: /js?
Sitemap: http://www.mon-site.com/post-sitemap.xml
Sitemap: http://www.mon-site.com/page-sitemap.xml
For our part, we prefer to declare the location of a website's sitemaps directly in Google Search Console.
How to test your Robots.txt file
Once you’ve optimized your robots.txt file, remember to check it.
To do this, use the robots.txt test tool from Google.
To use it, you must first associate your website with Google Search Console.
The tool will point out any syntax or logic errors and you can correct them directly in the text editor. Make the recommended changes until all errors are eliminated.
Please note that the tool does not modify the file itself; it only works on a copy of your robots.txt file.
If you’ve corrected any errors, copy and paste the corrected content into a new robots.txt file on your computer, then replace the current file by uploading the new version via FTP.
You can also make the changes directly online from your WordPress back office, using a plugin such as Yoast (see above).
Whether you do it manually or with a dedicated plugin, optimizing the robots.txt file is an effective way to boost your site's SEO by making it easier for search engine spiders to crawl.