A new study found that ChatGPT-generated written exam answers can not only evade detection but also achieve higher scores than those written by real students. Remarkably, 94% of the AI-generated submissions went undetected, blending in seamlessly with genuine student work.
ChatGPT Outperforms Human Students in UK University Exam
According to Interesting Engineering, researchers at the University of Reading discovered that AI-generated exam answers consistently earned higher grades than those of real students.
The researchers submitted exam answers generated by ChatGPT-4 on behalf of 33 fake students to the exam system of the School of Psychology and Clinical Language Sciences.
The study found that 94% of the AI-generated submissions went undetected. Moreover, in 83.4% of instances, the AI answers received higher grades than randomly chosen real student submissions, raising concerns about the reliability of current AI detection methods and the possibility of widespread AI-facilitated academic dishonesty.
Implications of AI on Academic Integrity and Future Education
Although the AI itself faced no time constraints, the study was designed to simulate realistic exam conditions in which students might use AI within prescribed time limits. It focused on exams featuring short-answer questions and essays: short-answer exams allowed 2.5 hours for completion, while essays allowed eight hours.
Associate Professor Peter Scarfe, who led the study, told Interesting Engineering that these were unsupervised take-home exams, providing ample opportunity for students to utilize AI within the given time constraints. The researchers speculated that some students might have successfully submitted AI-generated work during the study.
Scarfe noted that the content of the questions plays a crucial role in how AI-generated responses perform relative to student-written ones. Essays, for instance, may allow more complex reasoning to be demonstrated than multiple-choice questions.
The study employed two distinct methods to compare ChatGPT-generated answers with real students' answers. One approach directly compared all AI-generated responses to all student submissions across various modules.
The other method used resampling techniques, comparing randomly selected student submissions against AI-generated responses. In a press release, the researchers expressed deep concern about the implications for academic integrity.
Scarfe emphasized that educators worldwide should treat these findings as a wake-up call. While a full return to handwritten exams is unlikely, Professor Elizabeth McCrum, the University of Reading's Pro-Vice-Chancellor for Education and Student Experience, highlighted the need for global education to adapt to the rise of AI.
She emphasized ongoing efforts to enhance teaching with technology, aiming to improve student experiences and prepare graduates with essential skills.
The researchers urged a more comprehensive discussion about AI's role in society and the critical need to uphold academic and research integrity in this changing landscape.
Related Article: ChatGPT Was Able to Give Better Medical Advice on Depression Than Real Doctors, New Study Shows