A recent study by computer scientists at Purdue University has raised concerns about the accuracy of the answers OpenAI's chatbot ChatGPT gives to computer programming questions.
The findings document how often ChatGPT's responses to programming-related queries turn out to be inaccurate.
ChatGPT's Responses to Computer Programming Questions
The study, presented at the Conference on Human Factors in Computing Systems (CHI 2024), evaluated how reliably ChatGPT answers programming questions.
With language models like ChatGPT increasingly popular among programming students seeking help with writing code and understanding concepts, the researchers wanted to gauge how trustworthy the model's answers actually are.
The researchers drew their questions from Stack Overflow, a question-and-answer site widely used by programmers to share knowledge and solve problems, and posed 517 of them to ChatGPT.
The questions spanned the topics and difficulty levels commonly encountered in programming practice, and the team analyzed each of ChatGPT's responses for correctness, consistency, comprehensiveness, and conciseness.
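The paper evaluates answers along those four axes, but this article does not reproduce the researchers' scoring procedure. As a rough, hypothetical sketch of how such a rubric could be approximated automatically, the Python snippet below compares a chatbot answer against a Stack Overflow question and its accepted answer using simple word-overlap proxies; every name and heuristic here is an illustrative assumption, not the study's actual methodology.

# Hypothetical illustration only: crude proxies for the four dimensions the
# Purdue researchers evaluated (correctness, consistency, comprehensiveness,
# conciseness). This is not the study's code or method.
from dataclasses import dataclass

@dataclass
class AnswerScores:
    correctness: float       # overlap with the accepted Stack Overflow answer
    consistency: float       # agreement between two runs on the same question
    comprehensiveness: float # share of question keywords the answer addresses
    conciseness: float       # penalty for answers much longer than the reference

def _tokens(text: str) -> set[str]:
    # Lowercase words with surrounding punctuation stripped.
    return {w.strip(".,:;()`?").lower() for w in text.split() if w.strip(".,:;()`?")}

def score_answer(question: str, answer: str, rerun_answer: str, accepted: str) -> AnswerScores:
    q, a, a2, ref = _tokens(question), _tokens(answer), _tokens(rerun_answer), _tokens(accepted)
    correctness = len(a & ref) / max(len(ref), 1)   # crude correctness proxy
    consistency = len(a & a2) / max(len(a | a2), 1) # stability across reruns
    coverage = len(a & q) / max(len(q), 1)          # comprehensiveness proxy
    brevity = min(1.0, len(ref) / max(len(a), 1))   # conciseness proxy
    return AnswerScores(correctness, consistency, coverage, brevity)

if __name__ == "__main__":
    scores = score_answer(
        question="How do I reverse a list in Python in place?",
        answer="Call list.reverse() to reverse the list in place.",
        rerun_answer="Use the list.reverse() method; it reverses in place.",
        accepted="Use list.reverse(), which reverses the list in place.",
    )
    print(scores)

In practice, word overlap is a weak stand-in for correctness; the study relied on human evaluation, which is why the sketch above should be read only as an illustration of the rubric's four dimensions.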
Surprisingly, the study found that ChatGPT's accuracy on programming questions was poor: 52% of its answers contained incorrect information. Even so, the accompanying user study showed that participants preferred ChatGPT's answers in 35% of cases.
That preference was attributed to the perceived comprehensiveness and articulate language of ChatGPT's responses. The team also found that users overlooked the incorrect information in the chatbot's answers 39% of the time.
"This implies the need to counter misinformation in ChatGPT answers to programming questions and raise awareness of the risks associated with seemingly correct answers," the study noted.
ChatGPT's Persuasiveness
The study's linguistic analysis indicates that ChatGPT's responses adopt a formal tone and rarely express negative sentiment.
Although participants in the user study generally preferred human-written answers and rated them higher in quality, they sometimes favored inaccurate ChatGPT responses because of the chatbot's polished language and confidently positive assertions.
Given ChatGPT's propensity for incorrect answers, the study underscores the importance of exercising caution when relying on its responses for programming tasks. The researchers also aim to stimulate further investigation into identifying and addressing the chatbot's conceptual and factual errors.
Ultimately, the researchers anticipate this study will prompt additional research into enhancing transparency and communication regarding inaccuracies in machine-generated responses, particularly within the programming domain.
ChatGPT has been wildly popular since its launch in 2022, but widespread use has also exposed problems with its responses to user queries, particularly its tendency to "hallucinate," or present false information as fact.
OpenAI has acknowledged this tendency, adding a disclaimer that the AI chatbot can "make mistakes" and advising users to check important information.