What is Robots.txt? In Plain English

Robots.txt is one of the first files search engines request when they visit a site, and it shapes what they crawl next. Here's what it does, in plain English.

In Plain English

Robots.txt is a small text file that tells search engines which parts of your website they can or cannot crawl.

Think of it like a sign on a shop door that says “Staff Only” or “Open to Public.” It doesn’t stop people from entering, but it gives clear instructions.

How Robots.txt Works

  • Search engines (like Google) look for robots.txt at the root of your domain:
    • Example: https://www.example.com/robots.txt
  • The file contains simple rules called directives, such as:
    • Disallow – tells them not to crawl a page or folder.
    • Allow – tells search engines they can crawl a path, often used to make an exception inside a disallowed folder.

Example:

User-agent: *
Disallow: /private/
Allow: /public/

This means: “All search engines can crawl the /public/ folder, but not the /private/ folder.”
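You can check rules like these programmatically. As a minimal sketch, Python's standard `urllib.robotparser` module parses the same directives shown above and reports whether a given URL may be crawled (the page URLs here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The same rules as the example above, as a list of lines.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /public/",
]

rp = RobotFileParser()
rp.parse(rules)

# Ask whether any crawler ("*") may fetch each URL.
print(rp.can_fetch("*", "https://www.example.com/public/page.html"))   # prints True
print(rp.can_fetch("*", "https://www.example.com/private/page.html"))  # prints False
```

In practice a crawler would load the live file with `rp.set_url("https://www.example.com/robots.txt")` followed by `rp.read()` instead of hard-coding the rules.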

Why Robots.txt Matters for SEO

  • Controls crawling: Stops search engines from wasting time on unimportant pages.
  • Protects sensitive areas: Prevents admin or test pages from appearing in search.
  • Improves efficiency: Helps search engines focus on your important content.

⚠️ Note: Robots.txt only controls crawling, not indexing. If a page is linked elsewhere, Google may still index it.
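If the goal is to keep a page out of search results entirely, the usual approach is a noindex rule delivered on the page itself rather than in robots.txt. For example:

```html
<!-- Placed in the page's <head>: asks search engines not to index this page -->
<meta name="robots" content="noindex">
```

Note that the page must stay crawlable for search engines to see the noindex rule, so don't also disallow it in robots.txt.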

FAQs

Q: What is robots.txt?
It’s a text file that tells search engines which pages or folders they can and cannot crawl.

Q: Where do I find my robots.txt file?
At the root of your domain, e.g. https://www.example.com/robots.txt.

Q: Can robots.txt stop a page from appearing in Google?
Not always. It can block crawling, but a page may still appear in results if other sites link to it.