The robots.txt file is a text document that tells search engine spiders which parts of your website you would like to be crawled and which parts you want them to skip. For example, if you have large source files, crawling them may overload the server and waste the crawlers’ time indexing files you never wanted in search results. You can create the robots.txt file with a plain-text editor, such as TextEdit on the Mac or Notepad on Windows.
Where to Put the robots.txt File?
Answer: In the top-level directory of your web server.
When a robot looks for the “/robots.txt” file for a URL, it strips the path component from the URL (everything from the first single slash) and puts “/robots.txt” in its place.
For example, for “http://www.example.com/code/index.html”, it will remove “/code/index.html”, replace it with “/robots.txt”, and end up with “http://www.example.com/robots.txt”.
So, as a website owner, you need to put the file in the right place on your web server for that resulting URL to work. Usually that is the same directory where you put your site’s main “index.html” welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
Remember to use all lower case for the filename: “robots.txt”, not “Robots.TXT”.
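The URL derivation described above can be sketched in Python. This is a hypothetical helper for illustration, not part of any crawler library:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Derive the robots.txt URL a crawler would fetch for a given page URL."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; replace the entire path with /robots.txt
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/code/index.html"))
# http://www.example.com/robots.txt
```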
What To Put in robots.txt?
Answer: The “/robots.txt” file is a text file with one or more records. It usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
1. You need a separate “Disallow” line for every URL prefix you want to exclude.
2. You cannot put “Disallow: /cgi-bin/ /tmp/” on a single line.
3. You may not have blank lines in a record, as they are used to delimit multiple records.
4. The ‘*’ in the User-agent field is a special value meaning “any robot”.
5. Globbing and regular expressions are not supported: you cannot have lines like “User-agent: *bot*”, “Disallow: /tmp/*”, or “Disallow: *.gif”. What you want to exclude depends on your server.
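One way to check how such a record behaves is Python’s standard urllib.robotparser module; the host and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

record = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

rp = RobotFileParser()
rp.parse(record.splitlines())

# The three listed prefixes are blocked for every robot; everything else is allowed.
print(rp.can_fetch("AnyBot", "http://www.example.com/cgi-bin/search"))  # False
print(rp.can_fetch("AnyBot", "http://www.example.com/index.html"))      # True
```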
Here are some examples:
1: To exclude all robots from the entire server
User-agent: *
Disallow: /
2: To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty “/robots.txt” file, or don’t use one at all)
3: To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
4: To exclude a single robot
User-agent: BadBot
Disallow: /
5: To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /
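This allow-one-robot record can also be checked with urllib.robotparser (the blank line between the two records matters; the URL is illustrative):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The named robot gets full access; every other robot falls through to the
# catch-all record and is blocked.
print(rp.can_fetch("Google", "http://www.example.com/page.html"))    # True
print(rp.can_fetch("OtherBot", "http://www.example.com/page.html"))  # False
```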
6: To exclude all files except one
The original standard has no “Allow” field, so this is a bit awkward. The easy way is to put all the files you want disallowed into a separate directory, say “stuff”, leave the one file in the level above that directory, and disallow the directory:
User-agent: *
Disallow: /~joe/stuff/
7: To allow specific URLs only
Crawlers that support the “Allow” directive and the “$” end-of-URL wildcard (extensions beyond the original standard, but honored by major search engines) can be given an allow-list; this permits the home page and everything under /developer, and blocks the rest:
User-agent: *
Allow: /$
Allow: /developer
Disallow: /
Alternatively, you can explicitly disallow each page you want excluded:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
I hope you enjoyed reading. Please subscribe to get more posts like this. 🙂