AI Struggles with Humor: Study Reveals Limitations in Understanding Puns

Comedians and writers who excel at clever wordplay can take some solace, at least for now, from recent research into AI.

Researchers from institutions in the UK and Italy have been exploring the capacity of large language models (LLMs) to comprehend puns, and found significant gaps in their understanding.

A team from Cardiff University in South Wales and Ca’ Foscari University of Venice found that while LLMs could identify the structure of a pun, they struggled to grasp its humor.

For instance, they examined the statement, “I used to be a comedian, but my life became a joke.” Even when the sentence was changed to “I used to be a comedian and my life became a mess,” which removes the double meaning, LLMs still judged that a pun was present.

Another example tested was: “long fairy tales have a tendency to dragonify.” The pun plays on “drag on,” yet when “dragon” was swapped for a plain synonym such as “extension,” or for an arbitrary word, LLMs still assumed a pun was present.

Professor Jose Camacho Collados, of Cardiff University’s School of Computer Science and Informatics, said the research suggests that LLMs’ understanding of humor is fragile.

“Essentially, LLMs tend to retain information from their training, allowing them to recognize established puns, but that doesn’t equate to true understanding,” he remarked.

“We consistently managed to mislead the LLM by altering existing puns and stripping away the double meanings integral to the original humor. In these scenarios, the model would draw connections to prior puns and create various justifications for its conclusions. Ultimately, we determined that the model’s interpretation of puns was merely an illusion.”

The findings indicated that LLMs’ accuracy in differentiating between pun and non-pun sentences could dip to 20% when encountering unfamiliar wordplay.

Another pun tested was: “Old LLM never dies, it just loses attention.” Even when “attention” was replaced with “ukulele,” the LLM still identified the sentence as a pun, reasoning that “ukulele” sounded a little like “you-kill-LLM.”

The team was impressed by the creativity displayed, yet the LLM still failed to appreciate the humor.

The researchers emphasized that their findings underscore the need for caution when utilizing LLMs for tasks that involve humor, empathy, and an understanding of cultural subtleties.

Their research was presented at the 2025 Conference on Empirical Methods in Natural Language Processing in Suzhou, China, earlier this month, and is documented in a paper titled Pun Unintended: LLMs and the Illusion of Humor Understanding.

Source: www.theguardian.com

Undeniable Wit and Heartfelt Puns: Are Cryptic Crosswords AI’s Final Challenge?

The Times organizes a yearly crossword-solving competition, which will continue until the Guardian establishes its own high standard.

This year’s participants included a dog: Ross, the cheerful coffee-drinking dog depicted in the Crossword Genius smartphone app.

Human contestants at the event, held in London near the Shard at the Times’ parent company News UK, were remarkably quick, swiftly filling in clues before moving on. Can AI outsmart us humans?

For now, humans still have the upper hand. Ross “surrendered” when Mark Goodliffe, the reigning champion, signaled the end of the battle.

Serial crossword solver Mark Goodliffe competing in the Sudoku Championship. Photo: Terry Pengilly

This was an unexpected turn of events. Ross must have figured it out…

1ac Completely disenfranchised MPs expelled by the Liberal Party (9)

… Replace MP in IMPLICITLY (a synonym for “absolutely” in the clue) with L to get ILLICITLY (“without authority”), the solution. Some human contestants were still debating between adjective, adverb, or MP for the answer. Ross seems to “know” almost everything.

But here’s where Ross is stumped.

13d A fundamental review of motorsports image (9)

Radicals are sometimes portrayed as FIREBRAND, or as setters might say, F1 RE-BRAND. This clue stands out from the rest, almost like a joke. It’s a human touch that AI struggles with. The question remains, “Have we seen this before?”

Introducing the setter, Paul. Photo: John Halpern

This was a unique clue from the Times. It’s interesting how AI humorously confronted Paul, asking, “Picnicker, does that sound like art thieves?”

For now, that human connection from setters acknowledging, “Yes, I’ve been there,” is something we as humans need to appreciate.

Instead of asking users to identify objects in images, online security checks could ask them to decipher cryptic clues with clever wordplay. Guardian setters are ready.

(Full disclosure: I was involved in testing some of the puzzles with an earlier version of Ross. I developed a fondness for Ross and was curious if clues allowed for multiple interpretations. Sometimes we use “he” for confirmation.)

Thank you to all the contributors at the clue conference for STOKES. The runner up had a clever clue involving “Runs!” leading to the England captain. The winning clue creatively used “Loads Tinder, fingers right Swipe to.”

Kudos to Danat. Share your entries below for the next challenge: How do you clue PUNNY?

Source: www.theguardian.com