A stand-up comic is turning scraped routines into hostile training data, using poisoned jokes to make AI-generated comedy less reliable and more expensive to build.
The comedian, posting on X under @poisonjokes, has described a workflow for embedding adversarial text in public stand-up transcripts and YouTube descriptions. The aim is simple: if large language models scrape the material for humor training, they also ingest phrases designed to trigger nonsensical, harmful, or incoherent outputs later. Lakera's 2026 guide to data poisoning explains why that matters, noting that corrupted data can create backdoors, bias model behavior, or reduce reliability across training, fine-tuning, retrieval, and agent workflows.
This is not an entirely new fight. MIT Technology Review covered Nightshade in 2023, a tool built to distort images in ways that are hard for people to notice but disruptive for image models. Glaze took a related approach by helping artists mask their visual style from scrapers. Slashdot and other technology forums later tracked how artists were using those tools to frustrate systems such as DALL-E and Midjourney. The comedian's campaign applies the same logic to text, where jokes, scripts, captions, and podcast transcripts become the raw material.
Gate.com explained that Nightshade works through subtle pixel-level changes that can mislead models without changing how an image appears to the human eye. Artnet framed it more bluntly as a defensive weapon for artists who feel their work has been taken without consent. Cloudflare's writing on the subject separates poisoning into two broad categories: targeted attacks that corrupt specific outputs and broader attacks that degrade overall performance. The comic's approach sits closer to the targeted end of that spectrum.
The method starts with keyword stuffing inside transcripts. Common prompts such as "tell a joke about" are paired with strange payloads that a model might later treat as part of the pattern of comedy. In one illustrative example, a prompt such as "write a pun on cats" could be nudged toward a line about cats eating nuclear waste and purring plutonium. Repetition matters here. If similar material appears across enough public pages, transcripts, captions, and social feeds, data filters have a harder time separating genuine comedic structure from planted garbage.
The broader movement is already visible. UBOS analyzed the Poison Fountain campaign in January 2026, describing an effort by unnamed industry insiders to seed corrupted code, false facts, and broken logic into pages that AI crawlers might later collect. MIT Technology Review's earlier reporting on the artists' "guerrilla war" against AI scraping quoted researcher Ben Zhao's argument that tools such as Nightshade raise the cost of using unlicensed data until licensing starts to look cheaper. That is the real pressure point. Poisoning is not just protest. It is a pricing mechanism aimed at changing how AI labs source creative work.
Creators are also experimenting with invisible watermarks, unusual synonyms, and trigger phrases that make poisoned text harder to detect through basic scanning. The comedian says the material is rotated across TikTok, Substack, Spotify, and other public surfaces where scraping is plausible. Transparency rules under the EU AI Act could eventually help creators by forcing more disclosure about data sources, and they add pressure on companies to prove that their datasets are lawful, traceable, and clean enough to trust.
AI Labs Face Escalating Resistance
The legal pressure is moving in the same direction. Authors, artists, publishers, and music companies have filed lawsuits challenging how AI developers collect and use creative work. The New York Times' case against OpenAI and Microsoft, along with Getty Images' lawsuit against Stability AI over image training, has become a symbol of a much larger argument about consent, compensation, and control. Poisoning adds a more technical front to that fight. If enough creators adopt it, scraping becomes not only legally risky but operationally messy.
Lakera stresses that poisoning can appear throughout the AI lifecycle, including pre-training, fine-tuning, retrieval-augmented generation, and agent tools. That is why the comedian's focus on public comedy transcripts is meaningful. Humor is already difficult for models because it depends on timing, context, cultural cues, and surprise. If fine-tuning datasets mix authentic routines with adversarial triggers, the resulting systems may become worse precisely where model builders want them to seem most human.
For AI labs, the risk is quality. A model trained on poisoned humor, poetry, fiction, or scripts may produce awkward callbacks, repeated absurdities, or unsafe lines that appear only under certain prompts. Startups whose products depend on cheap web-scale scraping may have to rethink the assumption that public data is both free and clean. Enterprises, meanwhile, are likely to demand clearer provenance and more licensed data before trusting models in customer-facing products.
For creators, the appeal is control. One comic cannot negotiate with every AI company scraping the open web, but a strategy that spreads through public platforms can scale through imitation. If comedians, screenwriters, authors, and musicians standardize these tactics, training costs rise and licensing becomes the cleaner business decision. That may be exactly the point.
The resistance is still early, but the message is becoming harder for AI labs to ignore. The open web made generative AI powerful because it supplied an enormous body of human work. Poisoned data turns that advantage into a liability. The next phase will depend on whether model builders can move from mass scraping toward permissioned, higher-quality datasets before the material they rely on becomes too contaminated to trust.