We have all heard of crawlers and bots that crawl our sites to scrape content for various reasons: indexing in search engines, identifying content, harvesting email addresses, and so on. There are all kinds of crawlers/bots that crawl websites. Some are good and should be allowed access to our site, but we might want to restrict others. In this post we will see how we can do this.
- What is robots.txt?
- When to use it?
- Do all crawlers follow robots.txt?
- How to use it?
- Allow everyone to access the site
- Allow only one crawler to access the site
- Disallow everyone from the site
- Disallow access to specific directories
- Disallow access to specific bots
- Disallow different bots from different directories
- Delay crawl rate
- Specify the location of the sitemap
- Robots Tag
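As a quick preview of the topics above, robots.txt is just a plain-text file served from the root of a site (e.g. `https://example.com/robots.txt`). A minimal sketch combining several of the directives covered below; the bot names, paths, and sitemap URL here are purely illustrative:

```
# Allow Googlebot everywhere except the /private/ directory
User-agent: Googlebot
Disallow: /private/

# Block a specific (hypothetical) bot from the entire site
User-agent: BadBot
Disallow: /

# Everyone else: full access, but ask for a 10-second gap between requests
# (Crawl-delay is a non-standard directive; not all crawlers honor it)
User-agent: *
Disallow:
Crawl-delay: 10

# Tell crawlers where the sitemap lives
Sitemap: https://example.com/sitemap.xml
```

Each `User-agent` group applies to the named bot, and a crawler uses the most specific group that matches it. Keep in mind these rules are advisory: well-behaved crawlers follow them, but nothing technically prevents a bot from ignoring the file.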