The robots.txt file is a plain text file that tells search engine spiders which parts of your website you would like them to crawl and which parts you want them to stay out of. For example, if you have large source files, letting crawlers fetch them can overload the server and waste crawl time that would be better spent indexing your actual pages. You can create the robots.txt file with any plain text editor, such as TextEdit on the Mac or Notepad on Windows.

Where To Put the robots.txt File?

Answer: In the top-level directory of your web server.

When a robot looks for the “/robots.txt” file for a URL, it strips the path component from the URL (everything from the first single slash after the host name) and puts “/robots.txt” in its place.

For example, for “http://www.example.com/code/index.html”, it removes “/code/index.html”, replaces it with “/robots.txt”, and ends up with “http://www.example.com/robots.txt”.
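You can see the same derivation in code. Here is a minimal sketch in Python using the standard urllib.parse module (the function name robots_txt_url is just for illustration):

from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url):
    # Keep only the scheme and host name, and replace the whole path with /robots.txt
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("http://www.example.com/code/index.html"))
# http://www.example.com/robots.txt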

So, as a website owner, you need to put the file in the right place on your web server for that resulting URL to work. Usually that is the same place where you put your site’s main “index.html” welcome page. Where exactly that is, and how to put the file there, depends on your web server software.

Remember to use all lower case for the filename: “robots.txt”, not “Robots.TXT”.

What To Put in robots.txt?

Answer: The “/robots.txt” file is a text file with one or more records. It usually contains a single record that looks like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/

In this example, three directories are excluded.
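If you want to check what a record like this does in practice, you can test it with Python’s built-in urllib.robotparser module. The crawler name “MyCrawler” and the URLs below are made-up examples:

from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Anything under the three excluded directories is blocked for every robot
print(rp.can_fetch("MyCrawler", "http://www.example.com/tmp/file.html"))   # False
print(rp.can_fetch("MyCrawler", "http://www.example.com/~joe/page.html"))  # False
# Everything else is still crawlable
print(rp.can_fetch("MyCrawler", "http://www.example.com/index.html"))      # True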

Notes: 

1. You need a separate “Disallow” line for every URL prefix you want to exclude.

2. You cannot type “Disallow: /cgi-bin/ /tmp/” on a single line.

3. You may not have blank lines in a record, as they are used to delimit multiple records.

4. The ‘*’ in the User-agent field is a special value meaning “any robot”.

5. Globbing and regular expressions are not supported, so you cannot have lines like “User-agent: *bot*”, “Disallow: /tmp/*” or “Disallow: *.gif” (some major crawlers do support wildcard extensions; see example 7 below). What you want to exclude depends on your server.

Here follow some examples:

1: To exclude all robots from the entire server

User-agent: *
Disallow: /

2: To allow all robots complete access

User-agent: *
Disallow:

(or just create an empty “/robots.txt” file, or don’t use one at all)

3: To exclude all robots from part of the server

User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/

4: To exclude a single robot

User-agent: BadBot
Disallow: /

5: To allow a single robot

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
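Again, urllib.robotparser can show how different robots are treated by this record; “SomeOtherBot” below is a made-up crawler name:

from urllib.robotparser import RobotFileParser

rules = """
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))     # True
print(rp.can_fetch("SomeOtherBot", "http://www.example.com/page.html"))  # False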

6: To exclude all files except one

The original standard has no “Allow” field (though many major crawlers now support it, as the next example shows), so the easy way is to put all the files you want excluded into a separate directory, say “stuff”, leave the one file in the level above that directory, and disallow the directory:

User-agent: *
Disallow: /~joe/stuff/

7: To allow only specific URLs

Note that “Allow” and the “$” end-of-URL anchor are extensions to the original standard; they are understood by major crawlers such as Googlebot, but not necessarily by every robot.

User-agent: *
Allow: /$
Allow: /developer
Disallow: /

Alternatively, for robots that do not understand “Allow”, you can explicitly disallow every page you do not want crawled:

User-agent: *
Disallow: /~joe/junk.html 
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
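The “$” form above cannot be checked with urllib.robotparser, which does not understand wildcards, but here is a minimal sketch of how crawlers that do support “Allow” and “$” (such as Googlebot) evaluate example 7: every rule is matched against the URL path, the longest matching rule wins, and “Allow” wins a tie. The helper names are just for illustration:

import re

# The rules from example 7: allow only the homepage and /developer...
rules = [("allow", "/$"), ("allow", "/developer"), ("disallow", "/")]

def pattern_to_regex(pattern):
    # '*' matches any sequence of characters; a trailing '$' anchors the end of the path
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return "^" + regex

def is_allowed(path):
    best = None
    for verdict, pattern in rules:
        if re.match(pattern_to_regex(pattern), path):
            # Keep the most specific (longest) matching rule; "allow" wins ties
            if (best is None or len(pattern) > len(best[1])
                    or (len(pattern) == len(best[1]) and verdict == "allow")):
                best = (verdict, pattern)
    return best is None or best[0] == "allow"

print(is_allowed("/"))                # True:  "/$" matches and is longer than "/"
print(is_allowed("/developer/docs"))  # True:  "/developer" is the longest match
print(is_allowed("/private.html"))    # False: only "/" matches, and it is a Disallow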

I hope you enjoyed reading. Please subscribe to get more posts like this. 🙂