Salud Mental

Reflections on the use of artificial intelligence in psychiatric diagnosis


Marcos F. Rosetti


I recently had the opportunity to read the article “Evaluating the Performance of ChatGPT in Differential Diagnosis of Neurodevelopmental Disorders: A Pediatricians-Machine Comparison,” published in the journal Psychiatry Research. The authors, Wei, Cui, Wei, Cheng, & Xu (2023), used Cohen’s kappa values to compare the diagnostic accuracy of a sample of pediatric residents, experienced pediatricians, and ChatGPT-4 (OpenAI, San Francisco, California), an artificial intelligence chatbot. In this exercise, the researchers gave the study groups the results of the Early Language Milestone Scale, the Gesell Developmental Scale, the Modified Checklist for Autism in Toddlers, and the Autism Behavior Checklist, as well as the gender and age of the patients. In a second round, this information was complemented by vignettes containing aspects of clinical interest such as the chief complaint, developmental milestones achieved, and family history. For each scenario, both the pediatricians and the chatbot were asked to select the most likely diagnosis from among Autism Spectrum Disorder, Global Developmental Delay, and Developmental Language Disorder. The study reported a diagnostic accuracy of 66.7% for experienced pediatricians and for ChatGPT, surpassing pediatric residents, who achieved just 55.3%. The level of agreement between experienced pediatricians and ChatGPT, measured by the kappa value, was .43. When vignettes were included, the chatbot’s accuracy decreased to 53.3%, and interobserver agreement between the chatbot and the pediatricians dropped to .35. Interestingly, this exercise was performed with the publicly available version of ChatGPT-4 (which, at the time of writing, operated under a subscription model), a version trained only on non-specialized information available on the Internet.
The next few years will see many more articles like this one, particularly since the language models underlying these tools can be specialized by training them on data sets with greater resolution or specificity, so that they can answer questions in each specialized field.
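For readers less familiar with the agreement statistic mentioned above, Cohen’s kappa compares the observed proportion of agreement between two raters with the agreement expected by chance given how often each rater uses each label: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation in Python. The function name and the ten example diagnoses are hypothetical and purely illustrative; they are not data from Wei et al. (2023), nor a reproduction of the authors' analysis.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same cases (illustrative sketch)."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of cases.")
    n = len(rater_a)

    # Observed agreement: proportion of cases given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal proportions, summed over all labels.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label for every case; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


# Hypothetical diagnoses for ten cases (NOT data from the study).
pediatrician = ["ASD", "ASD", "GDD", "GDD", "DLD", "DLD", "ASD", "GDD", "DLD", "ASD"]
chatbot = ["ASD", "GDD", "GDD", "GDD", "DLD", "ASD", "ASD", "DLD", "DLD", "ASD"]

print(round(cohen_kappa(pediatrician, chatbot), 3))  # 0.545
```

In this made-up example the two raters agree on 7 of 10 cases (p_o = .70), while chance alone would produce agreement of .34, giving a kappa of about .55. This illustrates why a kappa value can be considerably lower than the raw percentage of agreement.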




Mantello, P., & Ho, M. T. (2023). Losing the information war to adversarial AI. AI & Society, 1-3. doi: 10.1007/s00146-023-01674-5

Taplin, J. (2017). Move fast and break things: How Facebook, Google, and Amazon have cornered culture and undermined democracy. Pan Macmillan.

Wei, Q., Cui, Y., Wei, B., Cheng, Q., & Xu, X. (2023). Evaluating the performance of ChatGPT in differential diagnosis of neurodevelopmental disorders: A pediatricians-machine comparison. Psychiatry Research, 327, 115351. doi: 10.1016/j.psychres.2023.115351