A recent study conducted by Purdue University has shed light on the accuracy and reliability of answers provided by ChatGPT and Stack Overflow in response to software engineering questions.
According to Science X Network, the study is titled "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions."
ChatGPT Preferred Over Stack Overflow
The study's findings indicate that a substantial proportion of ChatGPT's answers to programming questions were incorrect. Interestingly, when participants were asked to compare responses from ChatGPT and Stack Overflow, they preferred ChatGPT's answers 39.34% of the time.
This preference stemmed from ChatGPT's "comprehensive" responses and persuasively "articulate language style," which users found appealing.
The study examined 512 responses provided by ChatGPT and found that 52% of them were incorrect. Moreover, among the responses that participants favored, a striking 77% turned out to be incorrect.
Even in cases where ChatGPT's responses were inaccurate, 2 of the 12 participants still favored them over those from Stack Overflow. According to Samia Kabir, one of the study's authors, participants often overlooked inaccuracies in ChatGPT's responses when they found the answer insightful.
The confident and articulate manner in which ChatGPT presented information, even when that information was incorrect, appeared to instill trust in users and led them to prefer the wrong answer. Kabir noted that polite language, articulate explanations, comprehensive coverage of a topic, and an authoritative tone can make answers appear accurate even when they are entirely wrong.
The researchers acknowledged the potential of large language models like ChatGPT to transform how developers access programming information. Platforms like Stack Overflow offer valuable insights from a community of experts, but getting a solution can involve long waits.
In contrast, ChatGPT offers quick responses to complex coding queries, engaging in conversation-like interactions to delve deep into questions.
Prevalence of Erroneous Answers
Nonetheless, the researchers expressed apprehensions regarding the possibility of chatbots spreading inaccurate information, potentially tainting information repositories with incorrect data.
This concern prompted Stack Overflow to ban ChatGPT-generated responses earlier this year. The researchers found the prevalence of erroneous answers concerning.
They recommended that ChatGPT go beyond its brief disclaimer about potential inaccuracies and clearly indicate the degree of incorrectness and uncertainty in each response.
"Our examination revealed that 52% of ChatGPT's answers contain inaccuracies and 77% are verbose. Nevertheless, users still prefer ChatGPT's responses 39.34% of the time due to their comprehensiveness and articulate language style," the study's abstract reads.
"These findings underscore the need for meticulous error correction in ChatGPT while also raising awareness among users about the potential risks associated with seemingly accurate answers," it added.
The study's findings were published on the arXiv preprint server.