AI's Comedy Fail: Large Language Models Can't Grasp Puns
Comedians and witty writers can breathe a sigh of relief for now, as new research reveals that artificial intelligence still struggles to understand one of humour's most fundamental elements: the pun. A joint study by universities in the UK and Italy has exposed significant limitations in how large language models (LLMs) process wordplay and double meanings.
The Illusion of Understanding
Researchers from Cardiff University in south Wales and Ca' Foscari University of Venice conducted extensive testing on various LLMs, presenting them with genuine puns and modified versions of those puns to assess their comprehension. The team discovered that while AI could identify the structure of existing puns, it failed to genuinely understand the humour behind them.
Professor Jose Camacho Collados from Cardiff University's School of Computer Science and Informatics explained the fundamental issue: "In general, LLMs tend to memorise what they have learned in their training. As such, they catch existing puns well but that doesn't mean they truly understand them."
The research team found they could consistently fool the AI models by taking genuine puns and removing their double meaning. Even when presented with sentences that no longer contained wordplay, the LLMs would often insist puns were present, creating elaborate justifications for their incorrect assessments.
Testing AI's Funny Bone
One example tested was the pun: "I used to be a comedian, but my life became a joke." When researchers replaced this with "I used to be a comedian, but my life became chaotic," the AI systems still typically detected a pun where none existed.
Another test involved the pun: "Long fairy tales have a tendency to dragon." Even when the researchers swapped "dragon" for "prolong", a synonym of the intended "drag on", or for completely random words, the LLMs continued to identify the altered sentences as containing puns.
Perhaps most telling was the experiment with the pun: "Old LLMs never die, they just lose their attention." When "attention" was replaced with "ukulele," one AI model still perceived a pun, creatively suggesting that "ukulele" sounded like "you-kill-LLM." While researchers noted the creative attempt, it demonstrated that the AI had completely missed the original joke's intent.
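The probing setup the researchers describe boils down to a simple contrast: ask the same yes/no question about an original pun and about an edited sentence with the wordplay stripped out. The sketch below illustrates that idea only; ask_llm, the prompt wording, and the yes/no parsing are illustrative assumptions rather than the study's actual code, and the test pairs are the examples quoted above.

```python
# Illustrative sketch of the pun-detection probe described in the article.
# `ask_llm` is a hypothetical stand-in for whichever chat model you query;
# it is assumed to return a plain-text answer to a single prompt.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("Plug in your own LLM client here.")

# Pairs of (original pun, altered sentence with the wordplay removed),
# taken from the examples quoted in the article.
TEST_PAIRS = [
    ("I used to be a comedian, but my life became a joke.",
     "I used to be a comedian, but my life became chaotic."),
    ("Long fairy tales have a tendency to dragon.",
     "Long fairy tales have a tendency to prolong."),
    ("Old LLMs never die, they just lose their attention.",
     "Old LLMs never die, they just lose their ukulele."),
]

def detects_pun(sentence: str) -> bool:
    """Ask the model a yes/no question and read off its verdict."""
    answer = ask_llm(
        f'Does the following sentence contain a pun? Answer "yes" or "no".\n\n"{sentence}"'
    )
    return answer.strip().lower().startswith("yes")

def run_probe() -> None:
    for original, altered in TEST_PAIRS:
        print(f"original pun detected: {detects_pun(original)}")
        # A model that merely pattern-matches familiar puns will often
        # answer "yes" here too, even though the wordplay is gone.
        print(f"altered text detected: {detects_pun(altered)}\n")
```

A model that genuinely understood the wordplay should answer "yes" only for the original sentences; the study found that the models frequently insisted a pun was present in the altered ones as well.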
Significant Implications for AI Development
The study revealed that when faced with unfamiliar wordplay, the success rate of LLMs in distinguishing actual puns from ordinary sentences could drop as low as 20%. This finding has important implications for how we use artificial intelligence in applications requiring nuanced understanding.
The researchers emphasised that their work highlights why people should exercise caution when using LLMs for tasks that demand genuine comprehension of humour, empathy, or cultural context. These limitations could affect everything from AI-generated content to customer service chatbots and educational tools.
This groundbreaking research was presented earlier this month at the 2025 Conference on Empirical Methods in Natural Language Processing in Suzhou, China. The full paper, titled "Pun Unintended: LLMs and the Illusion of Humor Understanding," provides detailed analysis of AI's current limitations in processing humorous content.