{"id":321,"date":"2011-11-05T13:43:04","date_gmt":"2011-11-05T20:43:04","guid":{"rendered":"http:\/\/virendrachandak.wordpress.com\/?p=321"},"modified":"2016-01-26T08:52:56","modified_gmt":"2016-01-26T16:52:56","slug":"how-and-when-to-use-robots-txt-file","status":"publish","type":"post","link":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/","title":{"rendered":"How and when to use robots.txt file"},"content":{"rendered":"<p>Crawlers and bots visit our sites to scrape content for various reasons, such as indexing it for search engines, identifying content, or harvesting email addresses. There are all kinds of crawlers\/bots that crawl websites. Some are well behaved and should be allowed to access our site, but we might want to restrict others. In this post we will see how to do this with a robots.txt file.<\/p>\n<p>Topics Covered:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#what\">What is robots.txt?<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#when\">When to use it?<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#who_follows\">Do all crawlers follow robots.txt?<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#how\">How to use it?<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#allow_all\">Allow everyone to access the site<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#allow_one\">Allow only one crawler to access the site<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#disallow_all\">Disallow everyone from site<\/a><\/li>\n<li><a 
href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#disallow_dir\">Disallow access to specific directories<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#disallow_bot\">Disallow access to specific bots<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#disallow_bots_dir\">Disallow different bots from different directories<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#delay_crawlrate\">Delay crawl rate<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#sitemap\">Specify the location of sitemap<\/a><\/li>\n<li><a href=\"http:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#robots_metatag\">Robots &lt;meta&gt; Tag<\/a><\/li>\n<\/ul>\n<p><!--more--><\/p>\n<div id=\"what\">\n<h3>What is robots.txt?<\/h3>\n<p>The robots.txt file is a simple text file that bots and crawlers read to find out how they should crawl the site. The bots that crawl websites are automated, and well-behaved ones check for the robots.txt file before accessing the rest of the site. In it we can specify which crawlers are allowed to crawl the site, which directories should not be crawled, the crawl rate, etc.<\/p>\n<\/div>\n<div id=\"when\">\n<h3>When to use it?<\/h3>\n<p>The robots.txt file is required only when you want some content on your site excluded from search engines. If you don&#8217;t want to exclude anything (i.e. you want everything included), then you don&#8217;t need a robots.txt file.<\/p>\n<p>If you don&#8217;t have a robots.txt file, the server might return a 404 or Permission Denied error when a crawler tries to access the file, and this might cause issues, but it is not a big problem. 
Hence, it is always better to have a robots.txt file, whether it is blank or contains code that allows access to everyone.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: *\r\n\tDisallow:\r\n<\/pre>\n<p>I would choose to have a robots.txt file with the above code, allowing all bots to access everything, rather than having an empty or no robots.txt file.<\/p>\n<\/div>\n<div id=\"who_follows\">\n<h3>Do all crawlers follow robots.txt?<\/h3>\n<p>Most reputable crawlers, such as those from Google and Bing, follow the robots.txt file. However, many crawlers\/bots simply choose to ignore it. Crawlers are not required to follow the robots.txt file, so it is always better to use passwords to protect content that you don&#8217;t want everyone to access.<\/p>\n<\/div>\n<div id=\"how\">\n<h3>How to use it?<\/h3>\n<p>The robots.txt file is a very simple text file that needs to be in the root folder of your domain. If you do not have access to the root folder of the domain, then you cannot use a robots.txt file to block access. In this case you can use the <a href=\"http:\/\/www.virendrachandak.com\/techtalk\/\/how-and-when-to-use-robots-txt-file#robots_metatag\">robots meta tag<\/a>. Also, pages disallowed in the robots.txt file may still be indexed if they are linked from other places. Using the Robots &lt;meta&gt; tag on the page prevents it from getting indexed, provided the page is not also blocked in robots.txt, because a crawler has to fetch the page to see the tag.<\/p>\n<p>You can have different rules for different crawlers; a common convention is to put the rules for all crawlers first and then the rules for specific crawlers. 
If you have your robots.txt file set up like this, a compliant crawler follows the group that most specifically matches its user agent and falls back to the rules for all crawlers only when no specific group matches it; the two sets of rules are not combined.<\/p>\n<\/div>\n<div id=\"allow_all\">\n<h3>Allow everyone to access the site<\/h3>\n<p>To allow all crawlers to access all the pages and directories, we can have a blank robots.txt or use the following code in the file.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: *\r\n\tDisallow:\r\n<\/pre>\n<\/div>\n<div id=\"allow_one\">\n<h3>Allow only one crawler to access the site<\/h3>\n<p>To allow only one crawler to access the site and disallow all other crawlers, use the following code:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: GoogleBot\r\n\tDisallow:\r\n\r\n\tUser-Agent: *\r\n\tDisallow: \/\r\n<\/pre>\n<p>This will allow only &#8220;Googlebot&#8221; and disallow all other bots.<\/p>\n<\/div>\n<div id=\"disallow_all\">\n<h3>Disallow everyone from site<\/h3>\n<p>To disallow all crawlers from the site, use the following code:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: *\r\n\tDisallow: \/\r\n<\/pre>\n<p><span style=\"text-decoration:underline;\"><strong>Note<\/strong><\/span>: If you do this, no compliant crawler will crawl your site, and this may result in the site not getting indexed in the search engines. 
Use this only if you really don&#8217;t want your content to be indexed anywhere.<\/p>\n<\/div>\n<div id=\"disallow_dir\">\n<h3>Disallow access to specific directories<\/h3>\n<p>To disallow all bots from accessing specific directories, use the following code:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: *\r\n\tDisallow: \/disallow_access\/\r\n\tDisallow: \/restricted\/\r\n<\/pre>\n<p>The above code will instruct all the crawlers not to crawl the &#8220;disallow_access&#8221; and &#8220;restricted&#8221; directories on your domain.<\/p>\n<\/div>\n<div id=\"disallow_bot\">\n<h3>Disallow access to specific bots<\/h3>\n<p>You might want to restrict specific bots from accessing parts of your site.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: Googlebot\r\n\tDisallow: \/restricted\/\r\n<\/pre>\n<p>The above code will instruct &#8220;Googlebot&#8221; not to crawl the &#8220;restricted&#8221; directory on your domain. If you have only this code in your robots.txt file, then only &#8220;Googlebot&#8221; is instructed not to crawl the &#8220;restricted&#8221; directory. All other crawlers are allowed access to that directory.<\/p>\n<\/div>\n<div id=\"disallow_bots_dir\">\n<h3>Disallow different bots from different directories<\/h3>\n<p>To have different rules for different crawlers, use the following:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: *\r\n\tDisallow:\r\n\r\n\tUser-Agent: Googlebot\r\n\tDisallow: \/restricted\/\r\n\r\n\tUser-Agent: BadBot\r\n\tDisallow: \/disallow_access\/\r\n<\/pre>\n<p>This would disallow &#8220;Googlebot&#8221; from the &#8220;restricted&#8221; directory and &#8220;BadBot&#8221; from the &#8220;disallow_access&#8221; directory.<\/p>\n<\/div>\n<div id=\"delay_crawlrate\">\n<h3>Delay crawl rate<\/h3>\n<p>You can slow down the rate at which a crawler crawls the site by setting a Crawl-delay value. This value is relative to the default crawl rate of that particular crawler. 
It is best not to use this value for the common, well-behaved bots, as they automatically determine the best crawl rate for your site.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tUser-Agent: *\r\n\tCrawl-delay: 1\r\n<\/pre>\n<p>The value for Crawl-delay should be a positive integer. If no value is specified, the crawler uses its default crawl rate. For the Bing crawler, a value of 1 means crawl slowly, 5 very slowly and 10 extremely slowly. Crawl-delay is not part of the original robots.txt standard, so other crawlers may interpret the value differently (for example, as a number of seconds to wait between requests) or ignore it altogether. This value does not affect how frequently a site is crawled, only how fast the crawler works through the site while it is crawling.<\/p>\n<\/div>\n<div id=\"sitemap\">\n<h3>Specify the location of sitemap<\/h3>\n<p>You can specify the location of your sitemap in the robots.txt file.<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n\tSitemap: http:\/\/www.virendrachandak.com\/techtalk\/sitemap.xml\r\n<\/pre>\n<\/div>\n<div id=\"robots_metatag\">\n<h3>Robots &lt;meta&gt; Tag<\/h3>\n<p>The &lt;meta&gt; tag can be used to tell robots not to index the content of a page. It can also be used to allow or disallow the crawler to follow the links on the page.<\/p>\n<p>The syntax is:<\/p>\n<pre class=\"brush: xml; highlight: [4]; title: ; notranslate\" title=\"\">\r\n\t&lt;html&gt;\r\n\t&lt;head&gt;\r\n\t&lt;title&gt;...&lt;\/title&gt;\r\n\t&lt;META NAME=&quot;ROBOTS&quot; CONTENT=&quot;NOINDEX, NOFOLLOW&quot;&gt;\r\n\t&lt;\/head&gt;\r\n<\/pre>\n<p>To prevent a page from being indexed by search engines while still allowing crawlers to follow the links on the page, use the following &lt;meta&gt; tag on your page.<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n\t&lt;meta name=&quot;robots&quot; content=&quot;noindex, follow&quot;&gt;\t\r\n<\/pre>\n<p><span style=\"text-decoration:underline;\"><strong>Note<\/strong><\/span>: Not all robots follow the &lt;meta&gt; tag; they can choose to ignore it.<\/p>\n<\/div>\n<div>In this post I have tried to cover what robots.txt is, and when and how to use it. 
Be careful while using it as you may accidentally disallow all the crawlers from your site.<\/div>\n<div><strong>Related Articles<\/strong>:<\/p>\n<ul>\n<li><a title=\"The Web Robots Pages\" href=\"http:\/\/www.robotstxt.org\/\" target=\"_blank\" rel=\"external nofollow\">The Web Robots Pages<\/a><\/li>\n<li><a title=\"Block or remove pages using a robots.txt file\" href=\"http:\/\/www.google.com\/support\/webmasters\/bin\/answer.py?hl=en&amp;answer=156449&amp;from=35237&amp;rd=1\" target=\"_blank\" rel=\"external nofollow\">Block or remove pages using a robots.txt file<\/a><\/li>\n<li><a title=\"Serious Robots.txt Misuse &amp; High Impact Solutions\" href=\"http:\/\/www.seomoz.org\/blog\/serious-robotstxt-misuse-high-impact-solutions\" target=\"_blank\" rel=\"external nofollow\">Serious Robots.txt Misuse &amp; High Impact Solutions<\/a><\/li>\n<li><a title=\"Robots.txt And 404 pages: sometimes funny, always important\" href=\"http:\/\/econsultancy.com\/us\/blog\/8093-robots-txt-and-404-pages-sometimes-funny-always-important\" target=\"_blank\" rel=\"external nofollow\">Robots.txt And 404 pages: sometimes funny, always important<\/a><\/li>\n<li><a title=\"Google Webmaster Video\" href=\"http:\/\/www.youtube.com\/user\/GoogleWebmasterHelp#p\/u\/1\/P7GY1fE5JQQ\" target=\"_blank\" rel=\"external nofollow\">Google Webmaster Video<\/a><\/li>\n<li><a title=\"Crawl delay and the Bing crawler, MSNBot\" href=\"http:\/\/www.bing.com\/community\/site_blogs\/b\/webmaster\/archive\/2009\/08\/10\/crawl-delay-and-the-bing-crawler-msnbot.aspx\" target=\"_blank\" rel=\"external nofollow\">Crawl delay and the Bing crawler, MSNBot<\/a><\/li>\n<li><a title=\"Robots tag\" href=\"http:\/\/www.robotstxt.org\/meta.html\" target=\"_blank\" rel=\"external nofollow\">Robots <meta \/> tag<\/a><\/li>\n<li><a title=\"Robots.txt Checker\" href=\"http:\/\/tool.motoricerca.info\/robots-checker.phtml\" target=\"_blank\" rel=\"external nofollow\">Robots.txt 
Checker<\/a><\/li>\n<\/ul>\n<\/div>\n<div><span style=\"text-decoration:underline;\"><strong>Note<\/strong><\/span>: I do not take responsibility for proper functioning of the above mentioned steps under all circumstances. If you download any files, programs from my blog then make sure you protect yourself. I am not responsible for any damages to your computer, website, blog, application or any thing else. I am not affiliated with or do not endorse any of the above mentioned sites.<\/div>\n","protected":false},"excerpt":{"rendered":"<p>We have heard of crawlers and bots are crawling our sites to scrap content for various reasons like indexing in search engines, identifying content, scanning email addresses, etc. There are all kinds of crawlers\/bots which crawl websites. While some are good which should be allowed access to our site, but we might want to restrict [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_monsterinsights_skip_tracking":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[138,6,8],"tags":[45,57,79,80],"class_list":["post-321","post","type-post","status-publish","format-standard","hentry","category-seo","category-server-configuration","category-web-development","tag-meta-element","tag-search-engine-optimization","tag-web-crawler","tag-web-design-and-development"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.1 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>How and when to use robots.txt file - Virendra&#039;s TechTalk<\/title>\n<meta name=\"description\" content=\"We have heard of crawlers and bots are crawling our sites to scrap content for various reasons like indexing in search engines, identifying content,\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How and when to use robots.txt file - Virendra&#039;s TechTalk\" \/>\n<meta property=\"og:description\" content=\"We have heard of crawlers and bots are crawling our 
sites to scrap content for various reasons like indexing in search engines, identifying content,\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/\" \/>\n<meta property=\"og:site_name\" content=\"Virendra&#039;s TechTalk\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/virendrachandak\" \/>\n<meta property=\"article:author\" content=\"https:\/\/www.facebook.com\/virendrachandak\" \/>\n<meta property=\"article:published_time\" content=\"2011-11-05T20:43:04+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-01-26T16:52:56+00:00\" \/>\n<meta name=\"author\" content=\"Virendra Chandak\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@virendrachandak\" \/>\n<meta name=\"twitter:site\" content=\"@virendrachandak\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Virendra Chandak\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/\"},\"author\":{\"name\":\"Virendra Chandak\",\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#\\\/schema\\\/person\\\/63f7ffa1ea125e32af9618d188349e17\"},\"headline\":\"How and when to use robots.txt file\",\"datePublished\":\"2011-11-05T20:43:04+00:00\",\"dateModified\":\"2016-01-26T16:52:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/\"},\"wordCount\":1276,\"commentCount\":2,\"publisher\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#\\\/schema\\\/person\\\/63f7ffa1ea125e32af9618d188349e17\"},\"keywords\":[\"Meta element\",\"Search engine optimization\",\"Web crawler\",\"Web Design and Development\"],\"articleSection\":[\"SEO\",\"Server Configuration\",\"Web Development\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/\",\"url\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/\",\"name\":\"How and when to use robots.txt file - Virendra&#039;s TechTalk\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#website\"},\"datePublished\":\"2011-11-05T20:43:04+00:00\",\"dateModified\":\"2016-01-26T16:52:56+00:00\",\"description\":\"We have 
heard of crawlers and bots are crawling our sites to scrap content for various reasons like indexing in search engines, identifying content,\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/how-and-when-to-use-robots-txt-file\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"TechTalk\",\"item\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"SEO\",\"item\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/category\\\/seo\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"How and when to use robots.txt file\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#website\",\"url\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/\",\"name\":\"Virendra's TechTalk\",\"description\":\"\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#\\\/schema\\\/person\\\/63f7ffa1ea125e32af9618d188349e17\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":[\"Person\",\"Organization\"],\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#\\\/schema\\\/person\\\/63f7ffa1ea125e32af9618d188349e17\",\"name\":\"Virendra 
Chandak\",\"logo\":{\"@id\":\"https:\\\/\\\/www.virendrachandak.com\\\/techtalk\\\/#\\\/schema\\\/person\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.virendrachandak.com\",\"https:\\\/\\\/www.facebook.com\\\/virendrachandak\",\"https:\\\/\\\/www.linkedin.com\\\/in\\\/virendrachandak\\\/\",\"https:\\\/\\\/x.com\\\/virendrachandak\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How and when to use robots.txt file - Virendra&#039;s TechTalk","description":"We have heard of crawlers and bots are crawling our sites to scrap content for various reasons like indexing in search engines, identifying content,","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/","og_locale":"en_US","og_type":"article","og_title":"How and when to use robots.txt file - Virendra&#039;s TechTalk","og_description":"We have heard of crawlers and bots are crawling our sites to scrap content for various reasons like indexing in search engines, identifying content,","og_url":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/","og_site_name":"Virendra&#039;s TechTalk","article_publisher":"https:\/\/www.facebook.com\/virendrachandak","article_author":"https:\/\/www.facebook.com\/virendrachandak","article_published_time":"2011-11-05T20:43:04+00:00","article_modified_time":"2016-01-26T16:52:56+00:00","author":"Virendra Chandak","twitter_card":"summary_large_image","twitter_creator":"@virendrachandak","twitter_site":"@virendrachandak","twitter_misc":{"Written by":"Virendra Chandak","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#article","isPartOf":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/"},"author":{"name":"Virendra Chandak","@id":"https:\/\/www.virendrachandak.com\/techtalk\/#\/schema\/person\/63f7ffa1ea125e32af9618d188349e17"},"headline":"How and when to use robots.txt file","datePublished":"2011-11-05T20:43:04+00:00","dateModified":"2016-01-26T16:52:56+00:00","mainEntityOfPage":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/"},"wordCount":1276,"commentCount":2,"publisher":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/#\/schema\/person\/63f7ffa1ea125e32af9618d188349e17"},"keywords":["Meta element","Search engine optimization","Web crawler","Web Design and Development"],"articleSection":["SEO","Server Configuration","Web Development"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/","url":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/","name":"How and when to use robots.txt file - Virendra&#039;s TechTalk","isPartOf":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/#website"},"datePublished":"2011-11-05T20:43:04+00:00","dateModified":"2016-01-26T16:52:56+00:00","description":"We have heard of crawlers and bots are crawling our sites to scrap content for various reasons like indexing in search engines, identifying 
content,","breadcrumb":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.virendrachandak.com\/techtalk\/how-and-when-to-use-robots-txt-file\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"TechTalk","item":"https:\/\/www.virendrachandak.com\/techtalk\/"},{"@type":"ListItem","position":2,"name":"SEO","item":"https:\/\/www.virendrachandak.com\/techtalk\/category\/seo\/"},{"@type":"ListItem","position":3,"name":"How and when to use robots.txt file"}]},{"@type":"WebSite","@id":"https:\/\/www.virendrachandak.com\/techtalk\/#website","url":"https:\/\/www.virendrachandak.com\/techtalk\/","name":"Virendra's TechTalk","description":"","publisher":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/#\/schema\/person\/63f7ffa1ea125e32af9618d188349e17"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.virendrachandak.com\/techtalk\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":["Person","Organization"],"@id":"https:\/\/www.virendrachandak.com\/techtalk\/#\/schema\/person\/63f7ffa1ea125e32af9618d188349e17","name":"Virendra 
Chandak","logo":{"@id":"https:\/\/www.virendrachandak.com\/techtalk\/#\/schema\/person\/image\/"},"sameAs":["https:\/\/www.virendrachandak.com","https:\/\/www.facebook.com\/virendrachandak","https:\/\/www.linkedin.com\/in\/virendrachandak\/","https:\/\/x.com\/virendrachandak"]}]}},"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p2vTtQ-5b","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":205,"url":"https:\/\/www.virendrachandak.com\/techtalk\/tools-for-web-development\/","url_meta":{"origin":321,"position":0},"title":"Tools for Web Development","author":"Virendra Chandak","date":"August 15, 2011","format":false,"excerpt":"With so many tools available for Web Development its hard to find the right tools for the job. In this post I will mention some of the tools that I personally use for Web Development. Also, I will give links to some websites which can be really helpful for certain\u2026","rel":"","context":"In &quot;Tools&quot;","block_context":{"text":"Tools","link":"https:\/\/www.virendrachandak.com\/techtalk\/category\/tools\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":699,"url":"https:\/\/www.virendrachandak.com\/techtalk\/get-search-query-string-from-search-engines-using-php\/","url_meta":{"origin":321,"position":1},"title":"Get search query string from search engines using PHP","author":"Virendra Chandak","date":"February 4, 2012","format":false,"excerpt":"I have often come across situations when I would like to implement some functionality based on what the user searched for in a search engine like Google, Bing etc. The search query string is normally passed as GET variables 'q' or 'query'. 
The function below will return the search query\u2026","rel":"","context":"In &quot;PHP&quot;","block_context":{"text":"PHP","link":"https:\/\/www.virendrachandak.com\/techtalk\/category\/php\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":255,"url":"https:\/\/www.virendrachandak.com\/techtalk\/more-htaccess-tips\/","url_meta":{"origin":321,"position":2},"title":"more .htaccess tips","author":"Virendra Chandak","date":"October 16, 2011","format":false,"excerpt":"In my previous post .htaccess tips I had started with what is .htaccess file and some things that can be done using it. In this post I'll cover more about .htaccess files. Topics Covered: Directory index file Redirection Preferred domain (www or non-www) Redirect old site to new site Redirect\u2026","rel":"","context":"In &quot;Server Configuration&quot;","block_context":{"text":"Server Configuration","link":"https:\/\/www.virendrachandak.com\/techtalk\/category\/server-configuration\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":545,"url":"https:\/\/www.virendrachandak.com\/techtalk\/404-error-page-best-practices\/","url_meta":{"origin":321,"position":3},"title":"404 Error Page Best Practices","author":"Virendra Chandak","date":"November 20, 2011","format":false,"excerpt":"A 404 error page is the page that the user may reach due to various reasons. Some of the reasons are: The user has entered \/ spelt the URL incorrectly. The page has been moved or deleted. 
The URL they clicked on was incomplete or cut in an email or\u2026","rel":"","context":"In &quot;Functionality&quot;","block_context":{"text":"Functionality","link":"https:\/\/www.virendrachandak.com\/techtalk\/category\/functionality\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":803,"url":"https:\/\/www.virendrachandak.com\/techtalk\/parts-of-url\/","url_meta":{"origin":321,"position":4},"title":"Parts of URL","author":"Virendra Chandak","date":"May 26, 2012","format":false,"excerpt":"We see so many URLs everyday, but do we know what are the parts of an URL, what does each part means? In this post I will discuss the various parts of the URL. An URL may consists of as many as 6 parts. The different parts of the URL\u2026","rel":"","context":"In &quot;Glossary\/Definitions&quot;","block_context":{"text":"Glossary\/Definitions","link":"https:\/\/www.virendrachandak.com\/techtalk\/category\/glossarydefinitions\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":796,"url":"https:\/\/www.virendrachandak.com\/techtalk\/get-various-parts-url-using-javascript\/","url_meta":{"origin":321,"position":5},"title":"Get various parts of URL using JavaScript","author":"Virendra Chandak","date":"June 24, 2012","format":false,"excerpt":"Recently I wanted to get a domain name of the page on which I was currently on. I was trying to parse the entire URL of the page and then parse it using regular expressions etc. to get the domain name. 
I was able to get the required information using\u2026","rel":"","context":"In &quot;JavaScript&quot;","block_context":{"text":"JavaScript","link":"https:\/\/www.virendrachandak.com\/techtalk\/category\/javascript\/"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/posts\/321","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/comments?post=321"}],"version-history":[{"count":0,"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/posts\/321\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/media?parent=321"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/categories?post=321"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.virendrachandak.com\/techtalk\/wp-json\/wp\/v2\/tags?post=321"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}