Summary: A new study has found that analyzing a person’s word choice can predict worsening symptoms of major depressive disorder. Researchers used human evaluators and ChatGPT to assess written responses, finding that both could accurately predict depression severity weeks later.
While traditional language-analysis tools like LIWC fell short, ChatGPT captured emotional tone by accounting for word order and phrase-level meaning. The finding could pave the way for AI-assisted mental health evaluations, giving clinicians new tools for diagnosing and predicting mental health outcomes.
Key Facts:
- ChatGPT and human raters accurately predicted future depression symptoms.
- Traditional word-count tools like LIWC were less effective at prediction.
- AI language analysis may enhance clinicians’ ability to assess mental health.
Source: Yale
A person’s choice of words can be predictive of worsening symptoms of major depressive disorder, a new Yale study finds.
Using both human evaluators and the large language model ChatGPT, researchers demonstrated that written responses to open-ended questions could be used to predict who would experience worse symptoms of depression weeks later.
The findings, reported Sept. 16 in the Proceedings of the National Academy of Sciences, suggest automated procedures that can assess language use might complement and enhance psychological evaluations.
A growing body of research has uncovered a link between depression and the language a person uses. People with depression use more negative emotional words on social media and in text messages, for instance. And word choice is associated with how well individuals respond to treatment.
For this study, Yale researchers wanted to explore whether language might also yield insight into someone’s future symptoms. To better understand this, they asked 467 participants to complete nine open-ended, neutral short-answer questions and the Patient Health Questionnaire (PHQ-9), which assesses depression severity. Three weeks later, all participants completed the PHQ-9 questionnaire again.
Using a tool called Linguistic Inquiry and Word Count (LIWC) — which can calculate how many words fall into a particular category — the researchers identified how many words in the participants’ written responses to the short-answer questions had a positive or negative emotional tone.
While LIWC scores were associated with depression severity at the time participants answered the questions, they did not predict depression severity three weeks later, researchers found.
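LIWC's approach can be illustrated with a minimal dictionary word-count sketch. This is not the actual LIWC tool or its licensed dictionaries; the word lists and function below are hypothetical stand-ins that show the general idea of category-based counting and, implicitly, its limitation: each word is scored in isolation.

```python
# Minimal sketch of dictionary-based word counting in the style of LIWC.
# The word lists are illustrative assumptions, not LIWC's real categories.
POSITIVE = {"happy", "hopeful", "love", "calm", "good"}
NEGATIVE = {"sad", "hopeless", "tired", "alone", "bad"}

def word_count_tone(text: str) -> dict:
    """Return the fraction of words falling into each emotion category.

    Every word is treated independently -- word order and phrase-level
    meaning are ignored, which is the limitation the study highlights.
    """
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    total = len(words) or 1  # avoid division by zero on empty input
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return {"positive": pos / total, "negative": neg / total}
```

Note that a sentence like "I am not sad" would still register one negative word, since negation depends on word order that pure counting cannot see.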
Sentiment scores given by human raters, on the other hand, did predict future depression symptoms.
“This told us that human raters were picking up on something that just counting emotion words could not,” said Robb Rutledge, an assistant professor of psychology in Yale’s Faculty of Arts and Sciences and senior author of the study.
LIWC treats each word individually, which may explain why it falls short in this application, the researchers said.
“We wanted to look at word order and the multidimensional aspect of language central to shaping emotional tone,” said lead author Jihyun Hur, a Ph.D. student in Rutledge’s lab and the lab of coauthor Jutta Joormann, the Richard Ely Foundation Professor of Psychology.
“That’s when we got interested in ChatGPT.”
ChatGPT is an artificial intelligence tool that aims to mimic human conversational speech. As a result, it takes word order and the meaning within and between phrases into account in a way that standard language-analysis tools, like LIWC, do not.
When the researchers instructed ChatGPT versions 3.5 and 4.0 to rate the positive and negative tone of the participants’ responses, the scores predicted future changes in depression severity much like the human raters’ scores.
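The article does not quote the exact instructions given to ChatGPT, but the general approach can be sketched as prompt construction plus parsing of a numeric rating. The prompt wording and the 0 to 10 scale below are illustrative assumptions, not the study's actual protocol; in practice the prompt would be sent to the ChatGPT API and the returned number extracted.

```python
# Hypothetical sketch of prompting an LLM to rate emotional tone.
# The instruction text and rating scale are assumptions for illustration.

def build_prompt(response_text: str) -> str:
    """Wrap a participant's written response in a rating instruction."""
    return (
        "Rate the emotional tone of the following written response on a "
        "scale from 0 (very negative) to 10 (very positive). "
        "Reply with a single number.\n\n" + response_text
    )

def parse_score(reply: str) -> float:
    """Extract the first number from the model's free-text reply."""
    for token in reply.replace(",", " ").split():
        try:
            return float(token)
        except ValueError:
            continue
    raise ValueError(f"No numeric score in reply: {reply!r}")
```

Parsing defensively matters here: a chat model may answer "7" or "I'd rate it 7 out of 10", and a robust pipeline should handle both.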
Researchers say the finding is a starting point that lays a foundation for additional research. Rutledge and his team, for example, are interested in how this approach might be applied to other psychiatric disorders and across longer time periods.
This line of work is part of the lab’s ongoing research into the relationship between emotion and decision-making, which anyone can participate in by playing the games in the lab’s free smartphone app Happiness Quest.
Rutledge said he can see this type of language assessment being a useful addition to the clinician’s toolbox in the future.
“Analysis of the language people use offers extra information that clinicians currently don’t have, and our approach could be one way clinicians evaluate their patients,” said Rutledge.
“You want a combination of tools that work across lots of people, which together can give you a snapshot of an individual. If some of those tools are automated like this, that frees up the clinician to spend more time trying to help the patient.”
And ultimately, a better understanding of symptoms and how to predict them would be beneficial.
“Artificial intelligence tools like ChatGPT open up a new way to use the great deal of language data already available in the clinical setting to better understand mental health,” said Hur.
About this AI and depression research news
Author: Bess Connolly
Source: Yale
Contact: Bess Connolly – Yale
Image: The image is credited to Neuroscience News
Original Research: Open access. “Language sentiment predicts changes in depressive symptoms” by Jihyun Hur et al. PNAS
Abstract
Language sentiment predicts changes in depressive symptoms
The prevalence of depression is a major societal health concern, and there is an ongoing need to develop tools that predict who will become depressed. Past research suggests that depression changes the language we use, but it is unclear whether language is predictive of worsening symptoms.
Here, we test whether the sentiment of brief written linguistic responses predicts changes in depression.
Across two studies (N = 467), participants provided responses to neutral open-ended questions, narrating aspects of their lives relevant to depression (e.g., mood, motivation, sleep).
Participants also completed the Patient Health Questionnaire (PHQ-9) to assess depressive symptoms and a risky decision-making task with periodic measurements of momentary happiness to quantify mood dynamics.
The sentiment of written responses was evaluated by human raters (N = 470), Large Language Models (LLMs; ChatGPT 3.5 and 4.0), and the Linguistic Inquiry and Word Count (LIWC) tool.
We found that language sentiment evaluated by human raters and LLMs, but not LIWC, predicted changes in depressive symptoms at a three-week follow-up.
Using computational modeling, we found that language sentiment was associated with current mood, but language sentiment predicted symptom changes even after controlling for current mood.
In summary, we demonstrate a scalable approach that combines brief written responses with AI-based sentiment analysis, matching human performance in the prediction of future psychiatric symptoms.