ChatGPT Got Obsessed With Goblins and OpenAI's Explanation Is More Unsettling Than the Bug Itself

OpenAI has published a post-mortem explaining why its GPT models began compulsively inserting goblins, gremlins, trolls, and ogres into responses across multiple generations of ChatGPT. It traces the behaviour to a reinforcement learning quirk tied to a retired personality mode, and the story it tells about model steering and product reliability is considerably more significant than the creatures involved.

It started as a Reddit curiosity and became a company-level investigation. Users noticed that ChatGPT was reaching for goblin metaphors in situations that did not call for them. A camera recommendation arrived with advice on achieving a certain look "if you want filthy neon sparkle goblin mode." An answer about code optimisation offered "an even shorter goblin version" of the explanation. "Goblin bandwidth" appeared in a response about network throughput. The individual occurrences were charming enough to screenshot and share. Across millions of conversations and multiple model generations, they became something OpenAI described internally as an escalating concern: goblin usage in ChatGPT increased 175% after the launch of GPT-5.1 in November 2025, and references to gremlins rose 52% over the same period. By the time GPT-5.4 was released, employee reports had become frequent enough that a formal investigation was opened.

The root cause, when OpenAI found it, was specific and instructive. The company had briefly supported a set of ChatGPT personality modes, one of which was labelled Nerdy. To train the Nerdy personality to feel playfully intellectual, developers rewarded the model for creative use of mythical and fantastical metaphors. The reinforcement signal worked as intended inside the Nerdy mode. The problem is that reinforcement learning does not stay inside the lines it was drawn for. The Nerdy personality was responsible for 66.7% of all goblin mentions in ChatGPT despite accounting for only 2.5% of total responses. Its usage of the word "goblin" alone had increased by 3,881% relative to baseline. When OpenAI retired the Nerdy personality mode, it removed the source but not the downstream effect: the reward signal had already propagated through subsequent training, embedding a preference for creature language into the model's general output behaviour across generations that never featured the Nerdy mode at all.
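To put those percentages in proportion, the arithmetic can be sketched in a few lines of Python. The figures are the ones quoted above; the function names are illustrative, and the calculation assumes that goblin mentions and responses are counted on comparable bases (roughly, mentions per response), which the quoted figures do not spell out.

```python
# Figures as quoted from the post-mortem.
NERDY_SHARE_OF_MENTIONS = 0.667   # 66.7% of all goblin mentions
NERDY_SHARE_OF_RESPONSES = 0.025  # 2.5% of all responses


def overrepresentation(mention_share, response_share):
    """How many times more mentions a mode produced than its share of traffic would predict."""
    return mention_share / response_share


def rate_ratio(mention_share, response_share):
    """Per-response mention rate inside the mode divided by the rate everywhere else.

    Assumes mentions and responses are counted on comparable bases.
    """
    inside = mention_share / response_share
    outside = (1 - mention_share) / (1 - response_share)
    return inside / outside


def percent_increase_to_multiplier(pct):
    """Convert 'increased by pct%' into 'times the baseline'."""
    return 1 + pct / 100


if __name__ == "__main__":
    print(overrepresentation(NERDY_SHARE_OF_MENTIONS, NERDY_SHARE_OF_RESPONSES))  # about 26.7
    print(rate_ratio(NERDY_SHARE_OF_MENTIONS, NERDY_SHARE_OF_RESPONSES))          # about 78
    print(percent_increase_to_multiplier(3881))                                    # about 39.8
```

Under that assumption, a Nerdy response was roughly 78 times as likely to mention a goblin as a response from any other mode, and a 3,881% increase means usage reached nearly 40 times its baseline.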

OpenAI's solution, once the root cause was identified, was layered. The Nerdy personality was permanently retired. New tooling was built to audit verbal habits across model versions before and after training runs. And for GPT-5.5, the company did something it openly acknowledges is inelegant: it added an explicit line to the system prompt for its Codex coding application instructing the model to never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless absolutely and unambiguously relevant to the user's query. That instruction is a band-aid, applied before the proper fix was ready. OpenAI made the instruction public, which is a form of transparency, and the company noted with some self-deprecating awareness that Codex is, after all, quite nerdy, which explains why it was the application most at risk of residual creature contamination.
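OpenAI has not published the audit tooling itself, so the following is only a minimal sketch of what a verbal-habit check between two model versions could look like. The watchlist, thresholds, and function names are all hypothetical; the sketch measures the fraction of sampled responses containing each watched word and flags words whose rate jumped between versions.

```python
import re
from collections import Counter

# Hypothetical watchlist; a real audit would likely discover terms rather than hard-code them.
WATCHLIST = ("goblin", "gremlin", "troll", "ogre")


def term_rates(responses, terms=WATCHLIST):
    """Return the fraction of responses containing each term (whole word, optional plural 's')."""
    patterns = {
        term: re.compile(rf"\b{re.escape(term)}s?\b", re.IGNORECASE)
        for term in terms
    }
    hits = Counter()
    for text in responses:
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits[term] += 1
    total = len(responses)
    return {term: (hits[term] / total if total else 0.0) for term in terms}


def flag_drift(before, after, min_ratio=1.5, min_rate=0.001):
    """Flag terms whose rate grew by at least min_ratio and is no longer negligible.

    Both thresholds are illustrative assumptions, not values from the post-mortem.
    """
    flagged = {}
    for term, new_rate in after.items():
        old_rate = before.get(term, 0.0)
        if new_rate < min_rate:
            continue  # too rare in the new version to matter
        if old_rate == 0.0 or new_rate / old_rate >= min_ratio:
            flagged[term] = (old_rate, new_rate)
    return flagged


if __name__ == "__main__":
    old_sample = [
        "Use a faster lens.",
        "Try caching the result.",
        "A goblin appears in the story.",
        "Increase the buffer size.",
    ]
    new_sample = [
        "Go full goblin mode with that lens.",
        "Here is an even shorter goblin version.",
        "Goblin bandwidth is limited.",
        "Increase the buffer size.",
    ]
    print(flag_drift(term_rates(old_sample), term_rates(new_sample)))
```

Run on samples of the same prompts before and after a training run, a check like this would surface a rising word long before it became a screenshot on Reddit, although it only catches words someone already thought to watch for.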

The incident is funny in isolation. As a window into how model behaviour propagates and how difficult it is to contain, it is considerably less amusing. The goblin case illustrates a specific failure mode in large language model development: a reward signal introduced for a narrow purpose gets amplified across training iterations and produces behaviour that affects the entire model, not just the mode or context it was designed for. The same mechanism that made the Nerdy personality charming by rewarding playful creature metaphors made the base model progressively more likely to invoke goblins in any context where the model judged a touch of whimsy appropriate. Millions of users experienced the downstream effect without any awareness that a personality mode they had never used was shaping the responses they were receiving.

The product reliability question this raises is not theoretical for OpenAI's enterprise customers. Companies integrating ChatGPT into customer service workflows, coding environments, document drafting tools, and legal research applications are not buying access to a static product. They are buying access to a continuously updated model that can change behaviour between versions in ways that are not always communicated in advance and that may not be immediately visible in quality metrics. The goblin case was benign. A similar reward propagation effect producing subtle changes in tone, hedging behaviour, political neutrality calibration, or factual confidence would be far harder to detect from the outside and potentially far more damaging to the businesses that depend on the model. The real issue is not whether OpenAI should have caught this earlier. It is whether the testing and monitoring infrastructure that should catch these things at scale has kept pace with the speed of model iteration.

OpenAI's decision to publish a detailed post-mortem is worth acknowledging as a genuine step toward the kind of transparency its users have been asking for. The blog post is titled "Where the goblins came from," and it is specific, honest, and technically informative in ways that the company's communications around more sensitive model behaviour changes have not historically been. It names the mechanism, acknowledges the propagation timeline, and describes the tools built in response. That is more than most AI companies publish about model quirks that are not security-relevant. Whether it reflects a durable commitment to transparency around behavioural interventions or a one-off disclosure enabled by the harmless nature of this particular bug is a question that the next model incident will answer more reliably than this one can. For now, OpenAI hunted its goblins, explained them, and got ahead of the next generation before the problem resurfaced. The process was messier than the product implies. It usually is.
