AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended)

The web agents AI companies use to read, train on and cite the web.

AI crawlers are the web agents AI companies use to read web content, and each one can be controlled independently in robots.txt through its own user-agent token. The crucial distinction is between training crawlers and search/citation crawlers: they serve separate purposes and answer to separate tokens. OpenAI runs GPTBot (training data for its foundation models), OAI-SearchBot (powers ChatGPT search results and citations) and ChatGPT-User (user-initiated page fetches). Anthropic runs ClaudeBot (training), Claude-SearchBot (indexing for Claude's search) and Claude-User (user-triggered fetches).
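To make the token separation concrete, here is a minimal Python sketch built on the standard library's `urllib.robotparser`. The `AI_AGENTS` table, its purpose labels and the helpers `robots_txt_blocking` and `allowed_agents` are hypothetical names written for illustration, and `example.com` stands in for a real site. Note that `urllib.robotparser` matches user-agent tokens by case-insensitive substring and applies the first matching group, which is a simplification of how each vendor's crawler reads robots.txt, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# OpenAI and Anthropic tokens from the paragraph above, grouped by operator and purpose.
# The purpose labels paraphrase the vendors' descriptions (hypothetical, not an official API).
AI_AGENTS = {
    "GPTBot": ("OpenAI", "training"),
    "OAI-SearchBot": ("OpenAI", "search"),
    "ChatGPT-User": ("OpenAI", "user-fetch"),
    "ClaudeBot": ("Anthropic", "training"),
    "Claude-SearchBot": ("Anthropic", "search"),
    "Claude-User": ("Anthropic", "user-fetch"),
}


def robots_txt_blocking(purpose: str) -> str:
    """Build a robots.txt that disallows every token with the given purpose
    and leaves all other agents unrestricted (hypothetical helper)."""
    blocked = [name for name, (_, p) in AI_AGENTS.items() if p == purpose]
    lines = [f"User-agent: {name}" for name in blocked]
    lines.append("Disallow: /")
    lines.append("")  # a blank line closes the group
    lines.append("User-agent: *")
    lines.append("Allow: /")
    return "\n".join(lines) + "\n"


def allowed_agents(robots_txt: str, url: str) -> dict[str, bool]:
    """For each known token, report whether Python's parser lets it fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {name: parser.can_fetch(name, url) for name in AI_AGENTS}


if __name__ == "__main__":
    policy = robots_txt_blocking("training")
    print(policy)
    for name, ok in allowed_agents(policy, "https://example.com/article").items():
        print(f"{name:18} {'allowed' if ok else 'blocked'}")
```

Only GPTBot and ClaudeBot come out blocked: no group names OAI-SearchBot, ChatGPT-User, Claude-SearchBot or Claude-User, so they fall through to the permissive `*` group.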

Perplexity runs PerplexityBot, which it states surfaces and links sites in search results and is "not used to crawl content for AI foundation models," plus Perplexity-User for user-initiated fetches. Google-Extended (introduced 28 September 2023) is not a separate crawler but a robots.txt token that controls whether Google may use a site's content to train Gemini models and to ground their answers; Google states it "does not impact a site's inclusion in Google Search nor is it used as a ranking signal."

The practical takeaway: to be cited, a site must allow the relevant search/citation crawlers (OAI-SearchBot, PerplexityBot, Claude-SearchBot/Claude-User). Blocking only the training crawlers does not by itself make a site citable, and blocking the search crawlers makes it invisible in those products' answers.

Sources

Overview of OpenAI Crawlers (GPTBot, OAI-SearchBot, ChatGPT-User) — OpenAI
Does Anthropic crawl the web, and how to block the crawler — Anthropic (Claude Help Center)
Perplexity Crawlers (PerplexityBot, Perplexity-User) — Perplexity
Google-Extended — Google's common crawlers — Google Search Central