In the rapidly evolving artificial intelligence landscape, OpenAI's ChatGPT has emerged as a popular tool for a wide range of tasks.
However, a recently published Purdue University study sheds light on a critical element of ChatGPT's performance that deserves attention: its accuracy in answering software engineering questions.
The study, titled "Who Answers It Better? An In-Depth Analysis of ChatGPT and Stack Overflow Answers to Software Engineering Questions," delves deep into the quality and usability of ChatGPT's responses, uncovering some intriguing and, at times, problematic findings.
Putting ChatGPT to the Test With Programmer Questions
The Purdue team meticulously examined ChatGPT's answers to 517 questions sourced from Stack Overflow, a well-known Q&A platform for programmers.
The assessment spanned various criteria, including correctness, consistency, comprehensiveness, and conciseness. The results were both enlightening and concerning.
ChatGPT answered approximately 52% of software engineering questions incorrectly, raising significant questions about its accuracy and reliability as a programming resource.
The study unveiled another interesting aspect of ChatGPT's behavior: verbosity. A staggering 77% of ChatGPT's responses were deemed excessively wordy, potentially impacting the clarity and efficiency of its solutions.
However, despite these inaccuracies and the verbosity, study participants still preferred ChatGPT's responses 39.34% of the time. The study attributes this preference to ChatGPT's comprehensive and well-articulated language style.
Moreover, the research highlighted a distinctive trait of ChatGPT's approach: a propensity for conceptual errors. The model often fails to grasp the underlying context of a question, leading to a higher frequency of errors stemming from a lack of conceptual understanding.
Even when an answer contained glaring inaccuracies, participants in the study often marked the response as preferred, indicating the influence of ChatGPT's polite, authoritative style.
However, the authors acknowledge ChatGPT's limitations, particularly regarding reasoning. The model often provides solutions or code snippets without clearly understanding their implications, hinting at the challenge of incorporating reasoning into language models like ChatGPT.
A Closer Look
As News18 reports, the Purdue study also delved into the linguistic and sentiment aspects of ChatGPT's responses.
Surprisingly, the model's answers exhibited more formal language, analytic thinking, and positive sentiments compared to responses from Stack Overflow.
This inclination towards positivity might contribute to user trust in ChatGPT's answers, even when they contain inaccuracies.
What This Study Means
The implications of this study extend beyond ChatGPT's performance itself. The observed decline in usage of traditional platforms like Stack Overflow suggests that ChatGPT's popularity is reshaping how programmers seek help online.
In response to these findings, the researchers offer valuable recommendations. Platforms like Stack Overflow could benefit from enhancing the detection of negative sentiments and toxicity in answers and providing more precise guidelines for structuring answers effectively.
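To illustrate what such detection could look like in practice, here is a minimal sketch that flags negatively worded answers using NLTK's off-the-shelf VADER sentiment analyzer. The sample answers and the -0.05 threshold are illustrative assumptions, not details from the study, which does not prescribe a specific tool.

```python
# A minimal sketch of the kind of sentiment screening the researchers
# suggest, using NLTK's VADER analyzer. The sample answers and the
# -0.05 cutoff are illustrative assumptions, not part of the study.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

answers = [
    "Great question! Use a context manager so the file closes automatically.",
    "This is a terrible approach and you clearly didn't read the docs.",
]

for answer in answers:
    # The compound score ranges from -1 (most negative) to +1 (most positive).
    score = sia.polarity_scores(answer)["compound"]
    if score < -0.05:  # a commonly used VADER threshold for negative text
        print(f"Flag for moderator review ({score:+.2f}): {answer}")
```

A production system would likely pair a lexicon-based screen like this with a dedicated toxicity classifier, but even a simple pass of this kind could surface hostile answers for human review.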
The study emphasizes that while ChatGPT can be useful, users should be aware of the potential risks associated with seemingly accurate answers.
Stay posted here at Tech Times.