Jun 3, 2026 · 11:47 PM
Subscribe
Home Ai

How Anthropic accidentally became the silent knowledge base powering millions of ChatGPT queries

A 2023 incident in which ChatGPT's browsing plugin repeatedly retrieved and summarized content from Anthropic's public documentation to answer millions of user queries is trending again in 2026. The episode is being used to illustrate the "Vampire Data" dynamic, where AI products quietly draw on competitors' publicly available content to power their own outputs. With Claude 4 and GPT-5 now competing directly for market leadership, the data governance questions it raised remain unresolved.

Janet Harrison
· 4 min read · 82 views
How Anthropic accidentally became the silent knowledge base powering millions of ChatGPT queries

A 2023 episode involving ChatGPT's browsing plugin and Anthropic's public documentation is trending again, this time as a lens on the data power struggles defining AI competition in 2026.

The phrase "that time Anthropic played 2.5 million ChatGPT users" has resurfaced across Reddit and X this week, and while the incident it references is nearly three years old, the reasons people are talking about it now say everything about where the AI industry currently stands. In the spring of 2023, OpenAI rolled out a web browsing beta for ChatGPT Plus subscribers. The plugin let ChatGPT retrieve and summarize live web content in response to user queries. It was a meaningful upgrade. What OpenAI did not fully anticipate, or at least did not publicly account for, was that one of the most-indexed destinations for AI-related queries would be the website of its most direct competitor.

Anthropic, founded by former OpenAI executives Dario and Daniela Amodei, had built out substantial public documentation for its Claude assistant and its underlying research. When ChatGPT's browsing agent crawled the open web to answer questions about AI models, safety techniques, and large language model behavior, Anthropic's materials came up repeatedly. Because the plugin summarized retrieved content rather than linking users through to the original source, those 2.5 million Plus subscribers, plus an unknown volume of free-tier users, were effectively receiving Anthropic-sourced answers without ever visiting Claude.ai.

The revival of the story is not driven by new legal action or any official statement from either company. It is being used as shorthand for a concept that researchers and commentators are now calling "Vampire Data," the dynamic where AI products quietly drain the publicly available intellectual output of competitors to power their own retrieval and generation. The term is pointed, and it is catching on because it captures something real about how LLM-based products are built and sustained. Public websites, documentation, and research papers remain foundational to both training pipelines and retrieval-augmented systems, regardless of who published them.

The 2023 episode is now being treated as an early, vivid illustration of that dynamic rather than a one-off curiosity. At the time it generated some commentary in AI circles but did not escalate into a formal dispute. Anthropic did not publicly demand that OpenAI exclude its content. OpenAI eventually pulled the browsing plugin temporarily over separate concerns about the tool accessing paywalled content, and the moment passed without resolution.

What It Reflects About the 2026 Competitive Landscape

The reason the anecdote lands differently now is that the stakes have risen considerably. Claude 4 and GPT-5 are the two products most directly competing for enterprise and consumer AI spend this year, and the data governance questions that felt theoretical in 2023 have become operational and legal realities. Multiple ongoing court cases involve AI companies and the sources they used to train their models. Robots.txt compliance, content licensing, and retrieval attribution are being negotiated, litigated, and lobbied over simultaneously.

The Anthropic-ChatGPT browsing episode predates most of that legal infrastructure, which is partly why it remains unresolved in any formal sense. But it established a pattern that has only become more entrenched: the open web subsidizes all the major players, and the companies producing the most useful public content often have the least control over how that content powers rival products.

For anyone watching the Claude-GPT competitive dynamic through 2026, the practical question is whether either company moves aggressively to restrict how its public content is indexed and retrieved by competing systems. Anthropic in particular has strong incentives to revisit that calculus. The more capable retrieval systems become, the more valuable a well-documented, clearly written knowledge base is as a backend resource, whether or not its authors intended it to function that way. That is the unresolved tension the 2023 incident put on the table, and the current conversation suggests the industry has not come close to settling it.

Also read: ChatGPT users are calling the chatbot rude and OpenAI has yet to explain whyMost knowledge workers use AI every day and verify almost everything it tells themA 23-year-old used ChatGPT to diagnose a rare genetic disorder her doctors missed for years

TOPICS
Janet Harrison has over 16 years experience in the financial services industry giving her a vast understanding of how news affects the financial markets, and an early adopter of blockchain technology and digital currencies. Janet is an active holder and trader spending the majority of her time analyzing blockchain projects, reports and watching new and upcoming projects and other initiatives in the industry. She has a Masters Degree in Economics with previous roles counting Investment Banking.
Related Articles
More posts →
Loading next article…
You're all caught up