ChatGPT has been making waves since 2022, and whether students like to admit it or not, the AI tool has found its way into academic settings. Now, researchers have compared the quality of essays written by secondary school students with content generated by the language model ChatGPT.
They found, unsurprisingly, that the AI chatbot outperformed students across all criteria, particularly excelling in language mastery, TechXplore reported.
Student-Written Essays vs. AI
ChatGPT faced a setback when version 3.5 failed the Bavarian Abitur, a secondary school test in Germany. However, its successor, version 4, made significant progress and achieved a solid grade nearly six months later, according to researchers from the University of Passau.
The study, titled "A large-scale comparison of human-written versus ChatGPT-generated essays," explored the potential impact of AI-generated content on the education system.
The researchers assessed machine-generated texts and student essays using the guidelines of the Ministry of Education of Lower Saxony. Professor Steffen Herbold, chair of AI Engineering at the University of Passau and initiator of the study, expressed surprise at how clear-cut the outcome was.
Both versions of the OpenAI chatbot scored higher than the students, with GPT-3 ranking in the middle and GPT-4 achieving the best score. The result suggests that schools should not ignore the potential of these new AI tools.
The interdisciplinary study involved collaboration between computer scientists and experts in computational linguistics and computer science education. Ute Heuer, a computer science didactician, emphasized the importance of preparing teachers for the challenges and opportunities presented by the increasing availability of artificial intelligence models.
In March, Heuer ran a training course titled "ChatGPT - Opportunity and Challenge," attended by 139 teachers, mostly from German Gymnasien (grammar schools). The teachers were introduced to the technological concepts behind general text generators and ChatGPT before evaluating English-language texts without knowing their origin.
Teachers used the Ministry of Education of Lower Saxony grading scales to evaluate the essays on criteria such as topic, completeness, logic, vocabulary, complexity, and language mastery. In language mastery, GPT-4 scored 5.25 and GPT-3 scored 5.03, while students averaged 3.9, according to the assessments of the 111 teachers who took part in the evaluation.
AI's Language Mastery
Annette Hautli-Janisz, Junior Professor of Computational Rhetoric and Natural Language Processing at the University of Passau, noted that the machine's high scores do not imply that students have poor English language skills; rather, they showcase the exceptional performance of the AI models.
From a linguistic perspective, Hautli-Janisz and doctoral student Zlata Kikteva analyzed the texts and gained insights into the language development of machine models.
The study not only demonstrates the improvement of the models over time but also raises intriguing questions about the potential impact of AI-generated language on human communication.
As AI-generated texts become more prevalent, the study suggests a need to consider how exposure to machine-generated language may influence and shape human language in the future. The research team's findings were published in Scientific Reports.