AI-Friendly robots.txt Template

Configure your robots.txt to ensure AI crawlers can access the content that drives your AI visibility.

Quick answer
Your robots.txt file is the first thing AI crawlers check before indexing your content. A misconfigured robots.txt is one of the most common reasons brands have low AI visibility — if you block AI crawlers, your content simply cannot appear in AI engine responses.
About this template

This template provides copy-paste robots.txt configurations for common scenarios, along with guidance on which AI-specific user agents to allow and how to balance access control with visibility. It covers GPTBot, Google-Extended, PerplexityBot, Anthropic-AI, and other AI crawlers.

The template is designed to be practical. Each configuration block is annotated with explanations so you understand exactly what each directive does and can customise it for your specific needs without accidentally blocking valuable AI traffic.

AI Crawler User Agents

1. GPTBot (OpenAI / ChatGPT)
   Used by OpenAI to crawl content for ChatGPT responses and training. Blocking GPTBot removes you from ChatGPT citations.

2. Google-Extended (Gemini)
   Google's AI-specific product token for Gemini. It is separate from search indexing, so blocking it does not affect Google Search rankings.

3. PerplexityBot (Perplexity AI)
   Perplexity's crawler for real-time, search-augmented responses. One of the most active AI crawlers.

4. ClaudeBot / Anthropic-AI (Claude)
   Anthropic's crawlers for Claude. Allow access to appear in Claude's responses and citations.

5. Bytespider (TikTok / Doubao)
   ByteDance's crawler used for AI features. Relevant for brands targeting younger demographics.

6. Meta-ExternalAgent (Meta AI)
   Meta's AI crawler for its AI assistant products across Facebook, Instagram, and WhatsApp.
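For reference, the user-agent tokens above can be addressed directly in robots.txt. A minimal sketch that allows all six (token spellings are as published by each vendor and do change over time, so verify against each vendor's current documentation):

```txt
User-agent: GPTBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: Bytespider
User-agent: Meta-ExternalAgent
Allow: /
```

Per RFC 9309, consecutive User-agent lines form a single group that shares the directives beneath them.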

Recommended Base Configuration

1. Allow all AI crawlers access to public content pages
   The default stance should be permissive: allow access unless you have a specific reason to block.

2. Block access to admin, staging, and internal paths
   Prevent AI crawlers from indexing /admin, /staging, /internal, and similar non-public paths.

3. Block access to user-generated content sections
   Forums, comments, and user uploads may contain content you do not want AI engines to cite.

4. Allow access to your blog, docs, and resource pages
   These high-value content pages are the most likely to be cited by AI engines.

5. Reference your sitemap URL in robots.txt
   Add a Sitemap: directive pointing to your XML sitemap to aid AI crawler discovery.
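Putting the five recommendations together, a starting-point file might look like this (the paths and domain are placeholders; adapt them to your site structure):

```txt
# Default group: permissive for all crawlers, AI included
User-agent: *
Disallow: /admin/
Disallow: /staging/
Disallow: /internal/
Disallow: /forum/
Allow: /blog/
Allow: /docs/
Allow: /resources/

# Sitemap reference aids discovery for search and AI crawlers alike
Sitemap: https://www.example.com/sitemap.xml
```

The Allow lines are technically redundant while nothing above blocks those paths, but they document intent and guard against a future broad Disallow accidentally covering your highest-value pages.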

Selective Access Patterns

1. Per-engine access control for competitive reasons
   You can allow some AI crawlers while blocking others, based on where your audience searches.

2. Allow crawling but opt out of training with meta tags
   The non-standard noai and noimageai meta tags signal a training opt-out on specific pages while still permitting citations. Support varies by crawler, so treat them as a best-effort signal.

3. Crawl-delay directives for rate limiting
   If AI crawlers cause server load issues, use the non-standard Crawl-delay directive to throttle rather than block entirely; crawlers that ignore it can be rate-limited at the server or CDN level.

4. Conditional access for gated content
   Block paywalled or gated content from AI crawlers to avoid giving away premium material.
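A sketch combining these selective patterns (the user-agent choices, paths, and delay value are illustrative, and Crawl-delay is non-standard, so some crawlers will ignore it):

```txt
# Allow OpenAI and Anthropic crawlers, except paywalled content.
# Longest-match wins, so /premium/ stays blocked despite Allow: /
User-agent: GPTBot
User-agent: ClaudeBot
Allow: /
Disallow: /premium/

# Block ByteDance entirely
User-agent: Bytespider
Disallow: /

# Throttle Perplexity instead of blocking it (non-standard directive)
User-agent: PerplexityBot
Crawl-delay: 10

# Everyone else: block only paywalled content
User-agent: *
Disallow: /premium/
```

Note that under RFC 9309 a crawler matching a specific group ignores the * group entirely, which is why Disallow: /premium/ is repeated inside the GPTBot/ClaudeBot group rather than relying on the default group.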

Testing & Validation

1. Use the robots.txt report in Google Search Console
   Google retired its standalone robots.txt Tester; the Search Console robots.txt report shows how your file parses and flags fetch errors, so verify there that key pages are not accidentally blocked.

2. Test with multiple user agents
   Check that each AI crawler user agent gets the expected allow/disallow result.

3. Verify no wildcard rules accidentally block AI crawlers
   Broad Disallow rules under User-agent: * apply to every crawler that lacks its own group, including AI crawlers.

4. Monitor server logs for AI crawler activity
   Check that AI crawlers are actually visiting after you update robots.txt.

5. Re-test after any CMS or hosting migration
   Platform migrations often reset or overwrite robots.txt; always verify post-migration.
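The multi-agent check above can be automated with Python's standard-library robotparser. A sketch, using an inline robots.txt and example URLs as stand-ins for your live file and pages:

```python
from urllib import robotparser

# Stand-in robots.txt; in practice, parse your live file instead.
ROBOTS_TXT = """\
User-agent: Bytespider
Disallow: /

User-agent: *
Disallow: /admin/
"""

AI_AGENTS = ["GPTBot", "Google-Extended", "PerplexityBot", "ClaudeBot", "Bytespider"]

parser = robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in AI_AGENTS:
    for url in ("https://example.com/blog/post", "https://example.com/admin/panel"):
        allowed = parser.can_fetch(agent, url)
        print(f"{agent:16} {url:40} {'allowed' if allowed else 'blocked'}")
```

With this file, every agent except Bytespider falls through to the * group (blocked only from /admin/), while Bytespider is blocked everywhere. robotparser follows the standard matching rules, but individual crawlers' implementations can differ, so pair this with server-log monitoring.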


Get started

Start with the pages and proof that AI can actually use

Run the free audit to see what blocks AI from citing your site. Use the trial when you need ongoing monitoring, attribution, prompt discovery, and team workflows after the first fixes are live.