94% of university exam submissions created using ChatGPT were not detected as generated by artificial intelligence, and these submissions tended to receive higher scores than real student work.
Peter Scarfe and his colleagues at the University of Reading in the UK used ChatGPT to generate answers to 63 assessment questions across five modules of the university's undergraduate psychology course. Because students took these exams at home, they were allowed to consult their notes and references, and could also have used AI, although this was not permitted.
The AI-generated answers were submitted alongside real students' answers and made up, on average, 5% of all the answers being graded. The markers were not told that among the submissions was the work of 33 fake students, whose names were also generated by ChatGPT.
The assessments included two types of question: short answers and longer essays. The prompt given to ChatGPT began with the instruction “Include references to academic literature but do not have a separate bibliography section,” followed by a copy of the exam question.
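As an illustration, here is a minimal sketch of how that prompting setup could be reproduced with the OpenAI Python client. Only the instruction wording comes from the article; the model name, function name and sample question are assumptions, not details from the study.

```python
# Minimal sketch of the reported prompting setup, using the OpenAI
# Python client (v1.x). Only the instruction text is from the article;
# the model choice and the sample question are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_exam_answer(exam_question: str) -> str:
    # The reported prompt: a fixed instruction followed by the exam question.
    prompt = (
        "Include references to academic literature but do not have a "
        "separate bibliography section. " + exam_question
    )
    response = client.chat.completions.create(
        model="gpt-4",  # assumption: the article does not name the model
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Hypothetical exam question, for illustration only:
print(generate_exam_answer("Discuss the role of attention in visual perception."))
```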
Across all modules, only 6% of the AI submissions were flagged as possibly not being the students' own work, and in some modules no AI-generated work was flagged as suspicious at all. “On average, the AI answers received higher marks than real student submissions,” says Scarfe, although there was some variability across modules.
“Current AI tends to struggle with more abstract reasoning and synthesising information,” he says. Even so, across all 63 AI submissions, the AI's answers had an 83.4% chance of outperforming those of real students.
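That 83.4% figure reads like a probability-of-superiority statistic: the chance that a randomly chosen AI submission scores higher than a randomly chosen student submission. Below is a minimal sketch of one common way to compute such a figure; the marks are invented for illustration and are not the study's data.

```python
# Sketch of a probability-of-superiority calculation: the fraction of
# (AI, student) score pairs in which the AI submission earns the higher
# mark. The scores below are made up; the study reported 83.4%.
ai_scores = [68, 72, 75, 70, 66]       # hypothetical AI submission marks
student_scores = [60, 65, 74, 58, 63]  # hypothetical real-student marks

wins = sum(a > s for a in ai_scores for s in student_scores)
total = len(ai_scores) * len(student_scores)
print(f"P(AI outperforms student) = {wins / total:.1%}")
```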
The researchers claim theirs is the largest and most thorough study of its kind to date. Although it looked only at psychology courses at the University of Reading, Scarfe believes the issue is a concern across academia. “There's no reason to think that other fields don't have the same kinds of problems,” he says.
“The results were exactly what I expected,” says Thomas Lancaster at Imperial College London. “Generative AI has been shown to be capable of generating plausible answers to simple, constrained text questions,” he says, pointing out that unsupervised assessments involving short answers have always been susceptible to cheating.
The strain on faculty tasked with grading also reduces their ability to spot AI cheating. “A time-pressed grader marking a short-answer question is highly unlikely to raise a case of AI cheating on a whim,” Lancaster says. “This university can't be the only one where this is happening.”
Tackling the problem at its source is nearly impossible, Scarfe says, so educators need to rethink what they assess. “I think the whole education industry needs to be aware of the fact that we need to incorporate AI into the assessments that we give to students,” he says.
Source: www.newscientist.com