Robots.txt is a text file used to give web crawlers (also called web spiders or robots) instructions on which pages of your site to crawl and which pages to skip.
The Robots.txt file implements what is known as the robots exclusion protocol, or robots exclusion standard.
For example, suppose a bot wants to visit the URL https://www.example.com/welcome. Before crawling this URL, it first checks https://www.example.com/robots.txt for restrictions and guidelines.
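To make the lookup step concrete, here is a minimal Python sketch of how a crawler might work out where the robots.txt file lives for any page URL. The function name robots_url is a hypothetical helper chosen for this illustration, not part of any crawler's real code.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://www.example.com/welcome"))
# https://www.example.com/robots.txt
```

Because only the scheme and host are kept, every page on the same host maps to the same robots.txt URL.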
The basic format of the Robots.txt file contains two lines, one naming the user agent and another giving the directive.
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
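As a rough illustration of this two-line format, the sketch below feeds a small rule set to Python's standard urllib.robotparser module and asks whether two URLs may be fetched. The /private/ path and the ExampleBot user agent are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt in the basic two-line format.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())  # parse in-memory text, no network access

print(parser.can_fetch("ExampleBot", "https://www.example.com/welcome"))
# True
print(parser.can_fetch("ExampleBot", "https://www.example.com/private/a"))
# False
```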
How to Optimize Robots.txt File for SEO
Here are some points to help you optimize the Robots.txt file for SEO. Before that, let’s look at how the Robots.txt file works.
How the Robots.txt File Works
Search engines divide their work into two parts: crawling content and indexing it, so that they can serve the information to users.
Search engine crawlers move through a website by following hyperlinks from one page to the next. This exploration is called “spidering”.
Before crawling a website, the search engine checks the Robots.txt file for instructions on how to crawl it.
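The spidering process described above can be sketched as a small breadth-first crawl that consults the robots.txt rules before visiting each page. This is only a toy model: the LINKS dictionary stands in for a real website, and the page paths, the ExampleBot name, and the crawl function are all assumptions made up for this illustration.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical site: each path maps to the paths it links to.
LINKS = {
    "/": ["/about", "/tmp/cache", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/tmp/draft"],
    "/blog/post-1": [],
    "/tmp/cache": ["/secret"],
    "/tmp/draft": [],
    "/secret": [],
}

ROBOTS_TXT = """\
User-agent: *
Disallow: /tmp/
"""


def crawl(start, user_agent="ExampleBot"):
    """Follow links breadth-first, skipping paths that robots.txt disallows."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    seen, visited, queue = {start}, [], deque([start])
    while queue:
        path = queue.popleft()
        if not parser.can_fetch(user_agent, path):
            continue  # blocked: do not visit, so its links are never followed
        visited.append(path)
        for link in LINKS.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl("/"))
# ['/', '/about', '/blog', '/blog/post-1']
```

In this toy site, /secret is never reached because the only page linking to it sits under the disallowed /tmp/ folder.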
The old version of Google Search Console has a feature for testing a robots.txt file by entering a URL or pasting the file's contents. After you click the submit button, it lists the errors and warnings in the robots.txt file so you can resolve them.
In an earlier post, I shared the complete details of how to set up an account in Google Search Console.
Robots.txt File Quick Pointers
- The Robots.txt file must be placed in the root folder of the website.
- The file name is case sensitive and must be robots.txt. Uppercase letters are not allowed in the file name.
- This file is publicly available at /robots.txt. Once it is in place, anyone can see the website's directives.
- Each domain and subdomain needs its own robots.txt file with its own crawling directives.
- Add your sitemap URLs to the robots.txt file to help crawlers discover newly added pages faster.
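To illustrate the sitemap pointer, here is a small Python sketch that reads Sitemap lines back out of a robots.txt file using the standard urllib.robotparser module (the site_maps method needs Python 3.8 or later). The sitemap URL and the /tmp/ rule are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that also advertises a sitemap.
rules = """\
User-agent: *
Disallow: /tmp/
Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
print(parser.site_maps())
# ['https://www.example.com/sitemap.xml']
```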
Guidelines for Writing a Robots.txt File
- To exclude all robots from the entire server
User-agent: *
Disallow: /
- To allow all robots complete access
User-agent: *
Disallow:
- To exclude all the robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
- To exclude a single robot
User-agent: BadBot
Disallow: /
- To allow a single robot
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
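- To exclude all files except one (move the files you want blocked into a separate folder, such as stuff, and leave the one allowed file in the level above it)
User-agent: *
Disallow: /~joe/stuff/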
- To exclude some specific pages
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
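If you want to sanity-check rule sets like the ones above before publishing them, a quick script can help. The sketch below uses Python's standard urllib.robotparser module. The allowed helper, the GoodBot and OtherBot names, and the test paths are hypothetical and chosen only for illustration, and real search engines may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
BLOCK_PART = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /tmp/\nDisallow: /junk/\n"
BLOCK_ONE_BOT = "User-agent: BadBot\nDisallow: /\n"
# GoodBot is a made-up name standing in for the single robot you allow.
ALLOW_ONE_BOT = "User-agent: GoodBot\nDisallow:\n\nUser-agent: *\nDisallow: /\n"
ALL_EXCEPT_ONE = "User-agent: *\nDisallow: /~joe/stuff/\n"

print(allowed(BLOCK_ALL, "OtherBot", "/index.html"))            # False
print(allowed(ALLOW_ALL, "OtherBot", "/index.html"))            # True
print(allowed(BLOCK_PART, "OtherBot", "/tmp/file.txt"))         # False
print(allowed(BLOCK_PART, "OtherBot", "/index.html"))           # True
print(allowed(BLOCK_ONE_BOT, "BadBot", "/index.html"))          # False
print(allowed(BLOCK_ONE_BOT, "OtherBot", "/index.html"))        # True
print(allowed(ALLOW_ONE_BOT, "GoodBot", "/index.html"))         # True
print(allowed(ALLOW_ONE_BOT, "OtherBot", "/index.html"))        # False
print(allowed(ALL_EXCEPT_ONE, "OtherBot", "/~joe/stuff/a.html"))  # False
print(allowed(ALL_EXCEPT_ONE, "OtherBot", "/~joe/index.html"))    # True
```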
Do you have any other recommendations to include? Share your thoughts in the comments below.