Anthropic's Claude Opus 4.6 appears to be underperforming significantly compared to recent weeks, with users citing benchmark scores as low as 40%, well below ChatGPT 4.5's 63% on comparable tests, fueling speculation that the company is quietly throttling resources ahead of a new release.
Something feels off with Claude Opus 4.6, and users are increasingly convinced it isn't just them. Over the past several days, developers, researchers, and everyday power users have taken to forums and social platforms to report a noticeable and consistent decline in the model's output quality on coding tasks, reasoning benchmarks, and general conversational performance. The complaints aren't anecdotal edge cases. Self-designed benchmark tests are showing scores around 40%, a figure that sits uncomfortably below the 63% recorded by OpenAI's ChatGPT 4.5 on similar evaluations.
The timing has set off a familiar debate in AI circles: is this model drift, infrastructure strain, or something more deliberate? The theory gaining the most traction is that Anthropic is reallocating compute resources internally to accelerate training for the rumored Opus 4.7 model, effectively deprioritizing inference quality for the current flagship. It's a practice that, while never officially acknowledged by AI companies, has become an open secret in the developer community whenever a major model upgrade is on the horizon.
A 23-percentage-point gap between Opus 4.6 and ChatGPT 4.5 on user-run benchmarks is significant enough to matter in production environments. Developers building on top of Claude through Anthropic's API are particularly sensitive to this kind of performance variability: when outputs degrade without a changelog or official notice, it breaks assumptions baked into prompts, evals, and automated pipelines. Several users report that tasks reliably handled by Opus 4.6 just weeks ago now require multiple retries or produce outputs that need substantial manual correction.
The benchmarks being cited are admittedly self-designed rather than standardized, which limits direct comparison. But the consistency of the reports across different use cases, from software engineering to long-form reasoning to creative tasks, lends them collective weight. When users running independent tests across varied domains all land near the same conclusion simultaneously, it stops looking like noise.
Anthropic's silence and what it signals
As of publication, Anthropic has not issued any statement addressing the reported degradation. That silence is itself a data point. The company has historically been transparent about model updates through its model cards and release documentation, but interim compute allocation decisions, the kind that might temporarily affect a live model's performance without constituting a formal update, would fall outside that communication cadence. There's no standard industry protocol for disclosing that a model is being run leaner while internal resources shift toward its successor.
This isn't unique to Anthropic. The pattern has surfaced before with other frontier labs, often in the weeks preceding a major model release. Users notice, speculation runs hot, and then the new model arrives and the conversation moves on. The problem is that in the interim, enterprise customers and developers absorbing real productivity costs rarely get a straight answer.
For anyone evaluating AI model providers right now, the episode reinforces a practical lesson: benchmark a model at the point of adoption and again periodically in production. Performance is not a static property of a deployed model, and contracts or integrations built on a model's capabilities at launch may not reflect what that model delivers three or six months later. Anthropic would strengthen trust considerably by publishing even basic information about how it allocates compute, especially for paid-tier API users who have a reasonable expectation of consistency.
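For teams that want to act on that advice, the periodic check can be as simple as re-running a fixed set of prompts and comparing the pass rate against the one recorded at adoption. What follows is a minimal sketch, not a description of how any of the users cited here ran their tests: the `EvalCase` structure, the substring-match scoring, the 5-point tolerance, and the `fake_model` stand-in are all assumptions made for illustration, and a real setup would swap in an actual API client and a scoring rule suited to the task.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    expected: str  # substring a passing answer must contain (assumed scoring rule)


def pass_rate(model: Callable[[str], str], cases: Sequence[EvalCase]) -> float:
    """Return the fraction of cases whose output contains the expected substring."""
    if not cases:
        raise ValueError("need at least one eval case")
    passed = sum(1 for case in cases if case.expected in model(case.prompt))
    return passed / len(cases)


def has_regressed(baseline: float, current: float, tolerance: float = 0.05) -> bool:
    """True when the current pass rate sits more than `tolerance` below baseline."""
    return (baseline - current) > tolerance


# Hypothetical stand-in for a real model call; replace with your provider's client.
def fake_model(prompt: str) -> str:
    return "4" if prompt == "2 + 2 = ?" else "unsure"


cases = [
    EvalCase("2 + 2 = ?", "4"),
    EvalCase("Capital of France?", "Paris"),
]

current = pass_rate(fake_model, cases)
# Baseline of 1.0 is an illustrative value recorded "at adoption".
print(current, has_regressed(baseline=1.0, current=current))
```

Keeping the prompt set and scoring rule fixed between runs is the point of the design: it means a drop in the number reflects a change in the model's behavior rather than a change in the test.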
Watch for an Opus 4.7 announcement in the coming weeks. If the compute-redirection theory holds, that release should be the clearest indicator of whether today's performance dip was temporary and intentional, or something more concerning about where Opus 4.6 is headed.