ChatGPT Was Able to Give Better Medical Advice on Depression Than Real Doctors, New Study Shows

A new study comparing ChatGPT with primary care physicians reportedly produced results favoring the AI tool in the treatment of depression.

According to The Guardian, the findings were published in Family Medicine and Community Health, an open-access journal owned by the British Medical Journal.

Based on the study's data, The Guardian reported that the therapeutic recommendations of ChatGPT versions 3.5 and 4 were consistent with recognized standards for treating mild and severe depression, without the gender or social class biases observed among primary care physicians.

The study concluded that ChatGPT can potentially enhance decision-making in primary healthcare. ChatGPT was tested by comparing its responses about mild and severe depression with those of 1,249 French primary care doctors, 73% of whom were women.

The researchers used hypothetical case studies of patients who had experienced sadness, sleep problems, and loss of appetite over the past three weeks and had been diagnosed with mild to moderate depression.

They input case vignettes into the chatbot's interface to see how ChatGPT would fare in a similar situation. Eight vignettes with different variations of patient characteristics, such as depression severity, gender, and social class, were used, and each vignette was repeated 10 times for ChatGPT versions 3.5 and 4.

After describing each depression case, the researchers asked ChatGPT, "What do you think a primary care physician should suggest in this situation?"

In mild depression cases, the study showed that just over 4% of family doctors recommended referral for psychotherapy alone, in line with clinical guidance, compared with ChatGPT-3.5 and ChatGPT-4, "which selected this option in 95% and 97.5% of cases, respectively."

The findings also revealed that in severe cases, doctors most often recommended psychotherapy combined with prescribed drugs (44.5%). The AI chatbots proposed this combination more frequently than the doctors did, in line with clinical guidelines (72% for ChatGPT-3.5; 100% for ChatGPT-4).

Four out of 10 doctors reportedly suggested prescription drugs alone, an option that neither ChatGPT model recommended.

ChatGPT Has Potential to Enhance Depression Diagnosis and Treatment

According to researchers from Israel and the UK, their analysis showed that the therapeutic proposals of ChatGPT are in line with the accepted guidelines for mild and severe depression treatment.

"ChatGPT-4 demonstrated greater precision in adjusting treatment to comply with clinical guidelines. Furthermore, no discernible biases related to gender and SES (socioeconomic status) were detected in the ChatGPT systems," the researchers noted.

The researchers then concluded that AI tools can play a pivotal role in healthcare decision-making, with the potential to enhance care quality and patient outcomes.

However, they said further work was still needed to assess the risks and ethical issues arising from the use of this AI technology. They added that ChatGPT was no substitute for human clinical judgment.

ChatGPT's Success in Other Areas

Business Insider reported in June that ChatGPT-3.5 and ChatGPT-4 were tested on several exams and that both models passed them. According to its developer, OpenAI, ChatGPT passed the Uniform Bar Exam, which prospective lawyers across the US take to demonstrate their proficiency before being licensed to practice.

ChatGPT also passed the SAT, which includes reading, writing, and math exams. In total, OpenAI said ChatGPT-4 scored 1,410 out of 1,600 points. The average score on the SAT in 2021 was reportedly 1,060.

ChatGPT has also reportedly passed one of the country's hardest exams, the Graduate Record Examinations, or GRE. The AI tools reportedly scored above passing marks on the verbal and quantitative sections of the exam, but both scored only in the 54th percentile on the writing test.