AI-Friendly robots.txt Template
Configure your robots.txt to ensure AI crawlers can access the content that drives your AI visibility.
About this template
This template provides copy-paste robots.txt configurations for common scenarios, along with guidance on which AI-specific user agents to allow and how to balance access control with visibility. It covers GPTBot, Google-Extended, PerplexityBot, Anthropic-AI, and other AI crawlers.
The template is designed to be practical. Each configuration block is annotated with explanations so you understand exactly what each directive does and can customise it for your specific needs without accidentally blocking valuable AI traffic.
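For orientation, a robots.txt file is just a plain-text set of rule groups, each keyed to a User-agent token and read by whichever crawler matches it. A minimal annotated sketch follows; the paths and the example.com domain are placeholders, not recommendations:

```
# Rules for one specific crawler
User-agent: GPTBot
Allow: /blog/          # explicitly permit a section
Disallow: /internal/   # keep a section out of the crawl

# Fallback group for every crawler not named above
User-agent: *
Disallow:              # an empty Disallow allows everything

# Sitemap is a standalone directive, not tied to any group
Sitemap: https://www.example.com/sitemap.xml
```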
AI Crawler User Agents
1. GPTBot (OpenAI / ChatGPT): OpenAI's crawler for gathering content used in training and ChatGPT responses. OpenAI's search and browsing features also use the separate OAI-SearchBot and ChatGPT-User agents, so blocking these agents removes you from ChatGPT citations.
2. Google-Extended (Gemini): Google's AI product token for Gemini. It is not a separate crawler; Googlebot does the fetching and honours the token, so blocking it opts your content out of Gemini without affecting Google Search rankings.
3. PerplexityBot (Perplexity AI): Perplexity's crawler for real-time, search-augmented responses, and one of the most active AI crawlers.
4. ClaudeBot / anthropic-ai (Claude): Anthropic's crawlers for Claude. Allow access to appear in Claude's responses and citations.
5. Bytespider (TikTok / Doubao): ByteDance's crawler used for its AI features. Relevant for brands targeting younger demographics.
6. Meta-ExternalAgent (Meta AI): Meta's AI crawler for its assistant products across Facebook, Instagram, and WhatsApp.
All of these tokens appear together in the sketch after this list.
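Token spellings matter, because robots.txt groups are matched against these exact strings. One way to collect the tokens above into a single permissive group is shown below; treat it as a sketch and check each vendor's documentation for current token names, which do change over time:

```
# One group can list several user agents that share the same rules
User-agent: GPTBot
User-agent: Google-Extended
User-agent: PerplexityBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Bytespider
User-agent: meta-externalagent
Allow: /
```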
Recommended Base Configuration
1. Allow all AI crawlers access to public content pages. The default stance should be permissive: allow access unless you have a specific reason to block.
2. Block access to admin, staging, and internal paths. Prevent AI crawlers from indexing /admin, /staging, /internal, and similar non-public paths.
3. Block access to user-generated content sections. Forums, comments, and user uploads may contain content you do not want AI engines to cite.
4. Allow access to your blog, docs, and resource pages. These high-value content pages are the most likely to be cited by AI engines.
5. Reference your sitemap URL in robots.txt. Add a Sitemap: directive pointing to your XML sitemap to aid AI crawler discovery.
A template combining all five points follows this list.
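Putting the five points together, a base template might look like the block below. The Allow and Disallow paths and the sitemap URL are placeholders to replace with your site's actual structure:

```
# 1. Default stance: permissive for all crawlers, AI included
User-agent: *
Allow: /

# 2. Non-public paths stay out of the crawl
Disallow: /admin/
Disallow: /staging/
Disallow: /internal/

# 3. User-generated sections (placeholder paths)
Disallow: /forum/
Disallow: /comments/
Disallow: /uploads/

# 4. High-value content stays crawlable; Allow is already the
#    default, but explicit rules guard against broad Disallows
Allow: /blog/
Allow: /docs/
Allow: /resources/

# 5. Point crawlers at your sitemap
Sitemap: https://www.example.com/sitemap.xml
```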
Selective Access Patterns
1. Per-engine access control for competitive reasons. You can allow some AI crawlers while blocking others, based on where your audience searches.
2. Allow crawling but block training with meta tags. noai and noimageai meta tags let you allow citations on specific pages while opting out of training, but they are non-standard signals that only some crawlers honour.
3. Crawl-delay directives for rate limiting. If AI crawlers cause server load problems, use Crawl-delay to throttle rather than block entirely; note that Crawl-delay is not part of the robots.txt standard and support varies by crawler.
4. Conditional access for gated content. Block paywalled or gated content from AI crawlers to avoid giving away premium material.
A sketch of these patterns follows this list.
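The block below sketches patterns 1, 3, and 4 in one file; the engines chosen, the delay value, and the /premium/ path are illustrative only:

```
# Pattern 1: allow one engine, block another
User-agent: GPTBot
Allow: /

User-agent: Bytespider
Disallow: /

# Pattern 3: throttle rather than block
# (non-standard directive; support varies by crawler)
User-agent: PerplexityBot
Crawl-delay: 10

# Pattern 4: keep gated material out for everyone
User-agent: *
Disallow: /premium/
```

For pattern 2, the page-level opt-out is usually written as a meta tag, for example `<meta name="robots" content="noai, noimageai">`, with the caveat above that only some crawlers honour it.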
Testing & Validation
- 1
Use Google Search Console robots.txt tester
Verify your file parses correctly and that key pages are not accidentally blocked.
- 2
Test with multiple user agents
Check that each AI crawler user agent gets the expected allow/disallow results.
- 3
Verify no wildcard rules accidentally block AI crawlers
Broad Disallow rules can unintentionally block AI user agents if not scoped correctly.
- 4
Monitor server logs for AI crawler activity
Check that AI crawlers are actually visiting after you update robots.txt.
- 5
Re-test after any CMS or hosting migration
Platform migrations often reset or overwrite robots.txt — always verify post-migration.
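For steps 2 and 3, the live file can also be checked programmatically. Below is a minimal sketch using Python's standard-library urllib.robotparser, with a placeholder domain and paths; robotparser's rule matching is simpler than Google's longest-match semantics, so treat the output as a sanity check rather than proof:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_URL = "https://www.example.com/robots.txt"  # placeholder domain

# AI crawler tokens from the list above
AI_AGENTS = [
    "GPTBot", "Google-Extended", "PerplexityBot",
    "ClaudeBot", "Bytespider", "meta-externalagent",
]

# Placeholder URLs: pages that should be crawlable vs. blocked
EXPECT_ALLOWED = ["https://www.example.com/blog/post"]
EXPECT_BLOCKED = ["https://www.example.com/admin/panel"]

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)
parser.read()  # fetches and parses the live robots.txt

for agent in AI_AGENTS:
    for url in EXPECT_ALLOWED:
        if not parser.can_fetch(agent, url):
            print(f"WARNING: {agent} cannot fetch {url}")
    for url in EXPECT_BLOCKED:
        if parser.can_fetch(agent, url):
            print(f"WARNING: {agent} can fetch {url}")
```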
Explore more resources
Complete AEO Audit Checklist
A step-by-step audit to evaluate and improve your site's readiness for AI answer engines.
llms.txt Template and Guide
Create and optimise your llms.txt file to help AI engines understand your site's purpose and structure.
Schema Markup Checklist for AI
Implement the essential structured data that helps AI engines understand and cite your content accurately.
Start with the pages and proof that AI can actually use
Run the free audit to see what blocks AI from citing your site. Use the trial when you need ongoing monitoring, attribution, prompt discovery, and team workflows after the first fixes are live.