The robots.txt file is a text document that tells search engine spiders which parts of your website you would like to be crawled and which parts you want them to skip. For example, if you have large source files, crawling them may overload the server and waste the crawlers’ time indexing files you never wanted in search results. You can create the robots.txt file with a plain-text editor, such as TextEdit on the Mac or Notepad on Windows.
Where to Put the robots.txt File?
Answer: In the top-level directory of your web server.
When a robot looks for the “/robots.txt” file for a URL, it strips the path component from the URL (everything from the first single slash) and puts “/robots.txt” in its place.
For example, for “http://www.example.com/code/index.html”, it will remove “/code/index.html”, replace it with “/robots.txt”, and end up with “http://www.example.com/robots.txt”.
So, as a website owner, you need to put the file in the right place on your web server for that resulting URL to work. Usually that is the same directory where you put your site’s main “index.html” welcome page. Where exactly that is, and how to put the file there, depends on your web server software.
Remember to use all lower case for the filename: “robots.txt”, not “Robots.TXT”.
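The URL derivation described above can be sketched in Python. This is a hypothetical helper for illustration, not part of any crawler library:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Derive the robots.txt URL a crawler would fetch for a given page URL."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; replace the entire path with /robots.txt
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.example.com/code/index.html"))
# http://www.example.com/robots.txt
```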
What To Put in robots.txt?
Answer: The “/robots.txt” file is a text file with one or more records. It usually contains a single record looking like this:
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
In this example, three directories are excluded.
1. You need a separate “Disallow” line for every URL prefix you want to exclude.
2. You cannot put “Disallow: /cgi-bin/ /tmp/” on a single line.
3. You may not have blank lines in a record, as they are used to delimit multiple records.
4. The ‘*’ in the User-agent field is a special value meaning “any robot”.
5. Globbing and regular expressions are not supported: you cannot have lines like “User-agent: *bot*”, “Disallow: /tmp/*”, or “Disallow: *.gif”. What you want to exclude depends on your server.
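One way to check how such a record behaves is Python’s standard urllib.robotparser module; the host and URLs below are illustrative:

```python
from urllib.robotparser import RobotFileParser

record = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /~joe/
"""

rp = RobotFileParser()
rp.parse(record.splitlines())

# The three listed prefixes are blocked for every robot; everything else is allowed.
print(rp.can_fetch("AnyBot", "http://www.example.com/cgi-bin/search"))  # False
print(rp.can_fetch("AnyBot", "http://www.example.com/index.html"))      # True
```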
Here are some examples:
1: To exclude all robots from the entire server
User-agent: *
Disallow: /
2: To allow all robots complete access
User-agent: *
Disallow:
(or just create an empty “/robots.txt” file, or don’t use one at all)
3: To exclude all robots from part of the server
User-agent: *
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /junk/
4: To exclude a single robot
User-agent: BadBot
Disallow: /
5: To allow a single robot
User-agent: Google
Disallow:

User-agent: *
Disallow: /
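This allow-one-robot record can also be checked with urllib.robotparser (the blank line between the two records matters; the URL is illustrative):

```python
from urllib.robotparser import RobotFileParser

rules = """\
User-agent: Google
Disallow:

User-agent: *
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# The named robot gets full access; every other robot falls through to the
# catch-all record and is blocked.
print(rp.can_fetch("Google", "http://www.example.com/page.html"))    # True
print(rp.can_fetch("OtherBot", "http://www.example.com/page.html"))  # False
```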
6: To exclude all files except one
The original standard has no “Allow” field, so this is a bit awkward. The easy way is to put all the files you want disallowed into a separate directory, say “stuff”, leave the one file in the level above that directory, and disallow the directory:
User-agent: *
Disallow: /~joe/stuff/
7: To allow specific URLs only
Crawlers that support the “Allow” directive and the “$” end-of-URL wildcard (extensions beyond the original standard, but honored by major search engines) can be given an allow-list; this permits the home page and everything under /developer, and blocks the rest:
User-agent: *
Allow: /$
Allow: /developer
Disallow: /
Alternatively, you can explicitly disallow each page you want excluded:
User-agent: *
Disallow: /~joe/junk.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
I hope you enjoyed reading. Please subscribe to get more posts like this. 🙂