Lesli Rose · SEO & AI Discoverability

AI Crawler Directives:
Who Gets to Read Your Site

By Lesli Rose · April 3, 2026 · 9 min read

Your robots.txt file controls which bots can crawl your website. That includes AI crawlers -- GPTBot from OpenAI, ClaudeBot from Anthropic, PerplexityBot, Google-Extended, and Applebot-Extended. If you are blocking them, AI systems cannot read your site, and they cannot recommend your business. Most sites I audit are accidentally blocking at least one.

This is not a theoretical problem. When someone asks ChatGPT "who is the best dentist in my city" or Perplexity "best accounting firm for small business," those systems reference sites they can actually read. If your robots.txt says "no" to GPTBot, ChatGPT literally cannot access your pages. Your competitors who allow it get cited. You do not.

The AI Crawlers You Need to Know

There are five primary AI crawlers operating right now. Each one represents a major AI platform that millions of people use daily to find businesses, compare services, and make purchasing decisions.

GPTBot (OpenAI). Powers ChatGPT, which has over 200 million weekly active users. When someone asks ChatGPT for a recommendation, GPTBot is how ChatGPT learned about the businesses it suggests. User-agent: GPTBot.

ClaudeBot (Anthropic). Powers Claude, used by millions for research and business queries. User-agent: ClaudeBot.

PerplexityBot. Powers Perplexity, the AI search engine that cites sources directly. Perplexity is growing fast as a Google alternative. User-agent: PerplexityBot.

Google-Extended. Controls whether Google can use your content for AI Overviews and Gemini. This is separate from Googlebot -- you can allow traditional search crawling but block AI training. User-agent: Google-Extended.

Applebot-Extended. Powers Apple Intelligence, Siri, and Apple's AI features across every iPhone, iPad, and Mac. User-agent: Applebot-Extended.

How robots.txt Controls AI Access

The robots.txt file sits at the root of your domain (yourdomain.com/robots.txt). It is a plain text file that tells crawlers what they can and cannot access. The syntax is simple but the consequences are significant.

Here is what a robots.txt looks like when it blocks all AI crawlers:

# Blocking AI crawlers -- NOT recommended

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

And here is what it looks like when you allow AI crawlers while still protecting sensitive areas:

# Allowing AI crawlers -- recommended

User-agent: GPTBot
Allow: /
Disallow: /admin/
Disallow: /private/

User-agent: ClaudeBot
Allow: /
Disallow: /admin/

User-agent: PerplexityBot
Allow: /
Disallow: /admin/

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

The difference between these two configurations is whether AI systems can recommend your business. Those are the stakes.

How Security Plugins Accidentally Block AI

This is the most common issue I find during audits, especially on WordPress sites. Security plugins like Wordfence, Sucuri, iThemes Security, and All In One Security have bot-blocking features designed to prevent malicious crawlers. The problem is that AI crawlers are relatively new, and these plugins often categorize them as unknown or suspicious bots.

I audited a blogging educator's website last month that had blocked every single AI crawler through their security plugin. The site owner had no idea. They had published hundreds of blog posts about content creation -- exactly the kind of expertise AI systems want to reference -- and none of it was accessible to ChatGPT, Claude, or Perplexity.

The fix took five minutes: whitelisting the AI crawler user-agents in the plugin settings. But the business had been invisible to AI for months because of a default security setting the owner never reviewed.

CDN and Hosting Firewalls Do It Too

It is not just plugins. Cloudflare, Sucuri CDN, and other web application firewalls (WAFs) can block AI crawlers at the network level -- before they even reach your server. These blocks do not show up in your robots.txt file, making them harder to diagnose.

If your site uses a CDN or WAF, check the firewall rules and bot management settings. Look for any rules that block "AI bots," "scraper bots," or specific user-agents like GPTBot. Some hosting providers have also started offering AI crawler blocking as a "feature" -- which is helpful for sites with proprietary content but devastating for businesses that need AI visibility.
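Because these blocks happen before robots.txt is ever consulted, the way to spot them is to compare how your site responds to a normal browser versus an AI crawler's user-agent string. A minimal sketch (the helper names `fetch_status` and `waf_blocks_crawler` are illustrative, not from any library, and real crawler user-agent strings are longer than the bare token a WAF typically matches on):

```python
import urllib.error
import urllib.request

def fetch_status(url, user_agent):
    """Return the HTTP status code for url fetched with the given user-agent."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def waf_blocks_crawler(url, crawler_ua, fetch=fetch_status):
    """True if the crawler user-agent is rejected while a browser one is not."""
    browser_status = fetch(url, "Mozilla/5.0 (compatible; visibility-audit)")
    crawler_status = fetch(url, crawler_ua)
    # 401/403/429 for the bot alongside 200 for the browser points at a
    # firewall or bot-management rule rather than robots.txt.
    return browser_status == 200 and crawler_status in (401, 403, 429)
```

Running `waf_blocks_crawler("https://yourdomain.com/", "GPTBot")` against your own site is a quick first check; a clean robots.txt plus a 403 here means the block lives in your CDN or firewall settings.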

The Allow vs Block Decision

There are legitimate reasons to block AI crawlers. If your business model depends on proprietary content behind a paywall -- research reports, training courses, premium articles -- you may not want AI systems summarizing your content for free. Major publishers like The New York Times and The Wall Street Journal block AI crawlers because their content is the product.

But for service businesses, local businesses, professional practices, and anyone whose website exists to generate leads -- blocking AI crawlers is self-sabotage. Your website is a marketing tool. You want AI systems reading it. You want ChatGPT to know about your services, your expertise, your location, and your reviews. Every AI crawler you block is a recommendation channel you are shutting off.

Block AI crawlers if: Your content is the product (paywalled articles, premium courses, proprietary research).

Allow AI crawlers if: Your website generates leads for your business (service pages, about us, blog posts, reviews, case studies).

How to Check Your Site Right Now

This takes sixty seconds. Open a browser and go to yourdomain.com/robots.txt. Look for any mention of GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or Applebot-Extended. If you see "Disallow: /" after any of those user-agents, that AI system is blocked from your entire site.

If you do not see any AI crawlers mentioned at all, that is actually fine -- robots.txt operates on an "allow by default" basis. If a crawler is not mentioned, it is allowed. The problem is only when they are explicitly blocked.
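If you would rather not eyeball the file, Python's standard library ships a robots.txt parser that applies the same allow-by-default logic. A short sketch (the names `AI_CRAWLERS` and `blocked_crawlers` are illustrative; to check a live site you could instead point the parser at your domain with `set_url` and `read`):

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot",
               "Google-Extended", "Applebot-Extended"]

def blocked_crawlers(robots_txt):
    """Return the AI user-agents that cannot fetch the site root."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [ua for ua in AI_CRAWLERS
            if not rp.can_fetch(ua, "https://example.com/")]

# A file that names only GPTBot: everyone else stays allowed by default.
sample = "User-agent: GPTBot\nDisallow: /\n"
print(blocked_crawlers(sample))  # ['GPTBot']
```

Note how the sample blocks only the crawler it explicitly names -- the other four are unaffected, which is the "allow by default" behavior described above.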

Next, check your security plugins and CDN settings. This is where the hidden blocks live. Look for bot protection, firewall rules, or any setting that mentions AI bots, scrapers, or specific user-agent strings.

The Business Cost of Blocking

I want to be direct about the stakes here. AI-assisted search is not a future trend. It is happening now. ChatGPT has over 200 million weekly users. Google AI Overviews appear on an increasing percentage of search results. Perplexity is growing as a primary search tool. Apple Intelligence is built into every new iPhone.

Every day your site blocks these crawlers, your competitors who allow them are building AI visibility that compounds over time. AI systems learn and remember. The businesses they can access today become the businesses they recommend tomorrow. Catching up later is possible, but you are starting from behind.

The fix is free. It takes minutes. And the upside is being recommended by the AI systems that millions of people use daily to find exactly what your business offers. There is no good reason for a lead-generation website to block AI crawlers. Check your robots.txt today. Check your plugins. Check your CDN. Make sure the door is open.

Frequently Asked Questions

What is GPTBot and should I allow it?

GPTBot is OpenAI's web crawler. It gathers content so ChatGPT can reference your business in answers. If you allow it, your site can be recommended when someone asks ChatGPT about your industry or location. Blocking it means ChatGPT cannot read your pages directly.

How do I check if my site blocks AI crawlers?

Go to yourdomain.com/robots.txt and look for user-agents like GPTBot, ClaudeBot, PerplexityBot, Google-Extended, or Applebot-Extended. If any are followed by "Disallow: /" they are blocked. Also check your security plugins and CDN firewall rules for hidden bot blocks.

Do WordPress security plugins block AI crawlers?

Yes, many do. Wordfence, Sucuri, and similar plugins often block unknown bots by default. AI crawlers are new enough that they get caught in blanket bot-blocking rules. Check your plugin's bot protection or firewall settings to whitelist them.

What happens if I block all AI crawlers?

AI systems cannot read your website directly. They may still reference your business through third-party sources like review sites and directories, but they cannot access your pages, your content, or your structured data. This significantly reduces your chances of being recommended.

Are You Blocking AI Without Knowing It?

I'll check your robots.txt, security plugins, and CDN settings for AI crawler blocks -- and fix them. Free audit, no commitment.

Get Your AI Visibility Audit