HMM for discovering decision-making dynamics using reinforcement learning experiments

Xingche Guo; Donglin Zeng; Yuanjia Wang

doi:10.1093/biostatistics/kxae033

Mailman School of Public Health

Research output: Contribution to journal › Article › peer-review

Abstract

Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimuli that are associated with different outcomes, such as gains or losses in the laboratory. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g., reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL than healthy controls, and that engagement is associated with brain activity in the negative affect circuitry during an emotional conflict task.
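
The two-strategy idea in the abstract (an RL-driven "engaged" state versus a random-choice state, linked by an HMM) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a two-choice task, a delta-rule Q-learning update with a softmax policy, Q-values that keep updating in both hidden states, and a constant transition matrix (the paper allows time-varying transitions). All names (`softmax_prob`, `forward_loglik`, `alpha`, `beta`, `trans`, `init`) are hypothetical and chosen for illustration only.

```python
import math


def softmax_prob(q, action, beta):
    """Probability of `action` under a softmax policy with inverse temperature `beta`."""
    m = max(beta * v for v in q)  # subtract the max for numerical stability
    exps = [math.exp(beta * v - m) for v in q]
    return exps[action] / sum(exps)


def forward_loglik(actions, rewards, alpha, beta, trans, init):
    """Scaled forward algorithm for an illustrative two-state HMM.

    State 0 = engaged (softmax over Q-values); state 1 = random (uniform choice).
    Returns the log-likelihood of the choices and the filtered probability of
    being in the engaged state at each trial.
    """
    n_actions = 2
    q = [0.0] * n_actions      # Q-values, assumed to start at zero
    pred = list(init)          # state probabilities before observing the trial
    loglik = 0.0
    engaged = []
    for a, r in zip(actions, rewards):
        # Emission probability of the observed choice in each hidden state.
        emis = [softmax_prob(q, a, beta), 1.0 / n_actions]
        joint = [pred[s] * emis[s] for s in range(2)]
        norm = sum(joint)
        loglik += math.log(norm)
        post = [j / norm for j in joint]
        engaged.append(post[0])
        # One-step prediction with a constant transition matrix (assumption).
        pred = [sum(post[s] * trans[s][k] for s in range(2)) for k in range(2)]
        # Delta-rule update of the chosen action's value (assumed in both states).
        q[a] += alpha * (r - q[a])
    return loglik, engaged


if __name__ == "__main__":
    actions = [0, 0, 1, 0, 0, 0]   # made-up choices
    rewards = [1, 1, 0, 1, 0, 1]   # made-up outcomes
    trans = [[0.9, 0.1], [0.2, 0.8]]
    ll, engaged = forward_loglik(actions, rewards, alpha=0.3, beta=3.0,
                                 trans=trans, init=[0.5, 0.5])
    print("log-likelihood:", ll)
    print("filtered P(engaged):", engaged)
```

In a full EM procedure, a forward pass like this would be paired with a backward pass to obtain smoothed state probabilities for the E-step; that part, and the bootstrap used for inference, are omitted here.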

Original language	English
Article number	kxae033
Journal	Biostatistics
Volume	26
Issue number	1
DOIs	https://doi.org/10.1093/biostatistics/kxae033
Publication status	Published - 2025

Bibliographical note

Publisher Copyright:
© 2024 The Author(s). Published by Oxford University Press. All rights reserved.

ASJC Scopus Subject Areas

General Medicine

Cite this

Guo, X., Zeng, D., & Wang, Y. (2025). HMM for discovering decision-making dynamics using reinforcement learning experiments. Biostatistics, 26(1), Article kxae033. https://doi.org/10.1093/biostatistics/kxae033

@article{f5e340d7b605466082370c78e69b1de1,
title = "HMM for discovering decision-making dynamics using reinforcement learning experiments",
abstract = "Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimulants that are associated with different outcomes, such as gains or losses in the laboratory. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g. reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient Expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. 
We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.",
author = "Xingche Guo and Donglin Zeng and Yuanjia Wang",
year = "2025",
doi = "10.1093/biostatistics/kxae033",
language = "English",
volume = "26",
journal = "Biostatistics",
issn = "1465-4644",
publisher = "Oxford University Press",
number = "1",
}

TY  - JOUR
T1  - HMM for discovering decision-making dynamics using reinforcement learning experiments
AU  - Guo, Xingche
AU  - Zeng, Donglin
AU  - Wang, Yuanjia
PY  - 2025
Y1  - 2025
AB  - Major depressive disorder (MDD), a leading cause of years of life lived with disability, presents challenges in diagnosis and treatment due to its complex and heterogeneous nature. Emerging evidence indicates that reward processing abnormalities may serve as a behavioral marker for MDD. To measure reward processing, patients perform computer-based behavioral tasks that involve making choices or responding to stimulants that are associated with different outcomes, such as gains or losses in the laboratory. Reinforcement learning (RL) models are fitted to extract parameters that measure various aspects of reward processing (e.g. reward sensitivity) to characterize how patients make decisions in behavioral tasks. Recent findings suggest the inadequacy of characterizing reward learning solely based on a single RL model; instead, there may be a switching of decision-making processes between multiple strategies. An important scientific question is how the dynamics of strategies in decision-making affect the reward learning ability of individuals with MDD. Motivated by the probabilistic reward task within the Establishing Moderators and Biosignatures of Antidepressant Response in Clinical Care (EMBARC) study, we propose a novel RL-HMM (hidden Markov model) framework for analyzing reward-based decision-making. Our model accommodates decision-making strategy switching between two distinct approaches under an HMM: subjects making decisions based on the RL model or opting for random choices. We account for continuous RL state space and allow time-varying transition probabilities in the HMM. We introduce a computationally efficient Expectation-maximization (EM) algorithm for parameter estimation and use a nonparametric bootstrap for inference. Extensive simulation studies validate the finite-sample performance of our method. We apply our approach to the EMBARC study to show that MDD patients are less engaged in RL compared to the healthy controls, and engagement is associated with brain activities in the negative affect circuitry during an emotional conflict task.
UR  - http://www.scopus.com/inward/record.url?scp=85217892925&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85217892925&partnerID=8YFLogxK
U2  - 10.1093/biostatistics/kxae033
DO  - 10.1093/biostatistics/kxae033
M3  - Article
C2  - 39226534
AN  - SCOPUS:85217892925
SN  - 1465-4644
VL  - 26
JO  - Biostatistics
JF  - Biostatistics
IS  - 1
M1  - kxae033
ER  -
