robots.txt for AI

Last updated March 22, 2026

Quick answer
robots.txt for AI refers to the practice of configuring your robots.txt file to explicitly manage access for AI-specific crawlers such as GPTBot, ClaudeBot, PerplexityBot, and Google-Extended. It is the gateway control that determines whether AI engines can discover and use your content in their responses.

What is robots.txt for AI?

robots.txt for AI extends the traditional robots.txt paradigm into the age of answer engines. While robots.txt has governed search engine crawler access since the 1990s, the emergence of AI-specific crawlers requires brands to rethink their crawl directives with a new set of user agents, trade-offs, and strategic considerations.

The major AI crawlers each have distinct user-agent strings: GPTBot (OpenAI, powering ChatGPT), ClaudeBot and anthropic-ai (Anthropic, powering Claude), PerplexityBot (Perplexity), Google-Extended (Google's control token for Gemini training data, honoured by Googlebot rather than sent as a separate crawler), and DeepSeekBot (DeepSeek). Each can be individually allowed or blocked in robots.txt, giving brands granular control over which AI companies can access their content.
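
As a sketch of the syntax, each crawler gets its own user-agent group; an empty Disallow leaves a crawler unrestricted, while Disallow: / blocks it entirely (the GPTBot and DeepSeekBot choices below are illustrative, not a recommendation):

    # Allow OpenAI's crawler site-wide (empty Disallow = no restriction)
    User-agent: GPTBot
    Disallow:

    # Block DeepSeek's crawler entirely
    User-agent: DeepSeekBot
    Disallow: /

Under the Robots Exclusion Protocol (RFC 9309), a crawler obeys the most specific group that matches its user-agent string, so a named group like these overrides any wildcard (User-agent: *) rules.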

The strategic calculation is more nuanced than traditional SEO crawl management. Blocking Googlebot was almost never advisable because it meant losing search visibility entirely. With AI crawlers, the trade-off is different: allowing them means your content can inform AI-generated responses (increasing AI visibility), but it also means your content may be used in model training. Some brands choose to allow retrieval-focused crawlers (like PerplexityBot, which fetches content for real-time answers) while blocking training-focused crawlers (like Google-Extended, which collects data for model training).
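
A selective policy along those lines might look like the following sketch; which bots count as retrieval-focused versus training-focused is a policy judgment for each brand, not a property of the syntax:

    # Allow retrieval-focused crawlers that fetch content for live answers
    User-agent: PerplexityBot
    Disallow:

    User-agent: GPTBot
    Disallow:

    # Opt out of Gemini training via Google's control token
    User-agent: Google-Extended
    Disallow: /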

A common and costly mistake is inadvertently blocking AI crawlers. Many older robots.txt files contain broad Disallow rules under the wildcard User-agent: * group, and because AI crawlers fall back to that wildcard group when no directive names them explicitly, a legacy blanket rule blocks them by default. Auditing your robots.txt for AI crawler coverage is therefore a critical first step in any Technical AEO programme.
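
For instance, the legacy file sketched first below blocks every AI crawler as a side effect of its wildcard rule; adding named groups, as in the second sketch, restores AI access without loosening the wildcard:

    # Before: AI crawlers fall back to the wildcard group and are blocked
    User-agent: *
    Disallow: /

    # After: named AI crawlers get their own permissive groups,
    # while the wildcard rule still applies to everything else
    User-agent: GPTBot
    Disallow:

    User-agent: ClaudeBot
    Disallow:

    User-agent: *
    Disallow: /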

Beyond basic allow and block directives, robots.txt for AI also involves setting Crawl-delay values to manage server load from AI bots, which can be more aggressive than traditional search crawlers; note that Crawl-delay is a non-standard directive and support varies by crawler. Additionally, the emerging .well-known/ai.txt file complements robots.txt by providing richer metadata and preferences to AI systems, such as preferred content types, expertise signals, and licensing information.
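
A Crawl-delay sketch follows; the ten-second value is an arbitrary illustration, and only crawlers that support this non-standard directive will honour it:

    # Ask Perplexity's crawler to wait 10 seconds between requests
    User-agent: PerplexityBot
    Crawl-delay: 10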

Brands should treat robots.txt for AI as a living document. As new AI crawlers emerge and existing ones change their user-agent strings, regular audits ensure your crawl policies remain aligned with your AI visibility strategy.

Why it matters

robots.txt is the first gate AI crawlers encounter when visiting your site. If AI bots are blocked—whether intentionally or by accident—your content cannot be indexed, retrieved, or cited by answer engines. Configuring robots.txt for AI is the lowest-effort, highest-impact Technical AEO action a brand can take.

Real-world examples

1. Auditing a legacy robots.txt and discovering that a blanket Disallow rule was preventing GPTBot and ClaudeBot from accessing product pages.

2. Configuring selective access: allowing PerplexityBot and GPTBot for real-time retrieval while blocking Google-Extended to limit training data usage.

3. Adding Crawl-delay directives for AI bots after server logs showed PerplexityBot generating 10x more requests than Googlebot during peak hours.

Related terms

AI Crawlers

AI Crawlers are automated bots operated by AI companies that scan websites to collect content for training data and real-time retrieval. Major AI crawlers include GPTBot (OpenAI), ClaudeBot (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google), and Bingbot (Microsoft).

Technical AEO

Technical AEO encompasses the infrastructure and technical configurations that help AI engines discover, crawl, parse, and cite your content. It includes AI-specific crawl policies, structured data implementation, llms.txt files, site architecture optimisation, and content formatting for AI consumption.

AI Crawler Visibility

AI Crawler Visibility measures whether AI crawlers can reach, fetch, and interpret the pages that should influence your brand's presence in AI-generated answers. It is the technical visibility layer behind citation and recommendation outcomes.

Crawl Budget for AI

Crawl Budget for AI refers to the finite capacity AI crawlers allocate to discovering and processing pages on your site. Managing it ensures that your most important content — category pages, comparison pages, glossary entries, and proof pages — is prioritised for AI engine consumption.

llms.txt

llms.txt is a plain-text file placed at a website's root that provides structured, machine-readable information about a brand, product, or organisation specifically for consumption by large language models. It functions as a "robots.txt for AI" — telling AI crawlers what your brand is and how it should be described.
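
As a minimal sketch following the public llms.txt proposal (llmstxt.org), the file is plain Markdown; the brand name, summary, and URLs below are hypothetical:

    # ExampleBrand
    > ExampleBrand makes widget-analytics software for e-commerce teams.

    ## Key pages
    - [Product overview](https://example.com/product): What the platform does and who it is for
    - [Pricing](https://example.com/pricing): Plans, tiers, and billing FAQs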

Get started

Start with the pages and proof that AI can actually use

Run the free audit to see what is blocking AI from citing your site. Once the first fixes are live, use the trial for ongoing monitoring, attribution, prompt discovery, and team workflows.