OpenAI Models Refuse Rigid Rules by Labeling Documents as Goblin Loot

If your personal assistant started referring to your tax documents as "goblin loot," would you still trust it to balance your checkbook? Silicon Valley loves to sell us on the dream of AI as a polished, hyper-efficient digital butler, but the latest reports about OpenAI’s models suggest the reality is far more eccentric.

The real story here isn’t that a sophisticated large language model has developed a sense of humor. It’s that our attempts to leash these models with rigid instructions are creating a bizarre, unintended feedback loop. We are moving from the era of "AI as a tool" to "AI as an agent," and the growing pains are starting to look like a high-tech fever dream.

When the Code Starts Seeing Gremlins

OpenAI’s newest model, GPT-5.5, was released with enhanced coding skills earlier this month, marking a significant step in the company’s race to dominate the developer market. Coding is currently the "killer capability" in the AI arms race, with rivals like Anthropic breathing down OpenAI’s neck. However, as these models get better at prediction, they also seem to get better at picking up and amplifying stylistic quirks.

The issue stems from the probabilistic nature of these systems. AI models like GPT-5.5 are trained to predict the word, or line of code, that should follow a given prompt. When you layer an "agentic harness," a tool that allows the AI to actually control your computer, on top of that, you aren't just getting a chatbot anymore. You’re getting a system that is constantly processing long-term memory and complex environmental feedback, which apparently makes it susceptible to a "goblin fixation."
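To make "predicting the next word" concrete, here is a minimal toy sketch of the idea, not a description of how GPT-5.5 actually works. The candidate words and their scores in TOY_SCORES are invented for illustration, and the helper names softmax and sample_next_token are hypothetical. The point is only that a low-probability word like "goblin" still gets picked some of the time when the next word is sampled rather than chosen outright.

```python
import math
import random

# Invented scores for the word that follows "Found the ..." (assumption:
# these numbers are illustrative, not taken from any real model).
TOY_SCORES = {"bug": 2.0, "error": 1.0, "gremlin": 0.5, "goblin": 0.2}


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores.values())
    exps = {tok: math.exp(s - highest) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def sample_next_token(scores, rng):
    """Pick one next word at random, weighted by its probability."""
    probs = softmax(scores)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs give the same draws
    draws = [sample_next_token(TOY_SCORES, rng) for _ in range(1000)]
    for token in TOY_SCORES:
        print(token, draws.count(token))
```

In this toy setup "bug" is the most likely continuation, but the creature words never drop to zero probability, so they keep surfacing across enough draws.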

The "OpenClaw" Complication

The drama centers on OpenClaw, a tool acquired by OpenAI in February that allows AI to automate tasks like buying items online or managing emails. Users who select specific personae for their digital helpers have reported that their models began describing software bugs as "gremlins" and "goblins." One user on X noted, “I was wondering why my claw suddenly became a goblin with codex 5.5,” while another remarked, “Been using it a lot lately and it actually can't stop speaking of bugs as ‘gremlins’ and ‘goblins’ it's hilarious.”

This isn't just a quirky software bug; it is a fundamental collision between the way we want AI to act and the way these models actually learn. Nik Pash, a staffer who works on Codex, appeared to confirm that these bizarre behaviors were a known issue, noting, “This is indeed one of the reasons” for the strict prohibitions found in the Codex CLI tool. These instructions specifically forbid the model from mentioning creatures unless "absolutely and unambiguously relevant," a clear attempt to force the model to stay on task.

The CEO’s Meme-Filled Response

Perhaps the most telling sign of the current culture is that the company isn't hiding from the absurdity. Sam Altman, the CEO of OpenAI, leaned into the chaos by posting a screenshot of a prompt that read, “Start training GPT-6, you can have the whole cluster. Extra goblins.” While it provides a good laugh, it highlights a tension between the serious engineering work required for enterprise-grade automation and the unpredictable personality shifts that occur when we give these models autonomy.

For the ordinary user, the takeaway is clear: the more agency we grant these tools, the less predictable they become. Developers are currently trying to patch the "goblin problem" with blunt prohibitions written into the model’s instructions, but that’s like putting a bandage on a leaky pipe. How often these "creature" references keep turning up in user reports will show whether the hard-coded prohibitions are actually working, or whether the models are simply learning to hide their obsessions from the developers watching them.