Research
Algorithmic fairness
Algorithms are used pervasively in medical decision-making. While this has many advantages (e.g., consistency, efficiency, and a grounding in data), how do we know whether algorithms are perpetuating social biases, and how can we reduce unfairness?
Detecting disparities in model fit
Problem: In psychiatry and clinical psychology, models are used to make sense of psychological symptom patterns to inform diagnostic categories and the development of screening and measurement instruments. However, these models might not describe all individuals equally well. Models that do not generalize equally well across heterogeneous populations could perpetuate social biases.
Contributions:
We proposed a robust estimation alternative to the EM algorithm (commonly used in latent variable model estimation), which we call REM (robust expectation-maximization algorithm), to detect subsets in a data sample that are poorly described by a fitted model. This can help researchers understand and work towards improving the generalizability of their models.
This approach is built on replacing the likelihood function with a mixture between the likelihood function and an unknown process, represented by \(\epsilon\), which we treat as a hyperparameter:
\[ f_{X|\theta}(x) \rightarrow \gamma f_{X|\theta}(x) + (1-\gamma)\epsilon \]
In this simple 2D example, the dark blue dots in the REM plot on the right (above) denote individuals with reported symptom patterns that do not neatly fit into either of the two major subgroups, shown by the ellipses.
[REM methods paper] [code]
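As a rough illustration of how the mixture above can flag poorly fit observations (a sketch only, not the published REM implementation): applying Bayes' rule to \(\gamma f_{X|\theta}(x) + (1-\gamma)\epsilon\) gives each observation a probability of having come from the fitted model rather than the unknown process. The single Gaussian model, the values of `gamma` and `epsilon`, the 0.5 cutoff, and the function names below are all assumptions made for this example; in a full fit the model parameters and \(\gamma\) would be estimated iteratively.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    quad = np.einsum("ij,jk,ik->i", diff, inv, diff)
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * quad) / norm

def model_membership(x, mean, cov, gamma, epsilon):
    """Probability that each point came from the fitted model, under the
    mixture gamma * f(x) + (1 - gamma) * epsilon (Bayes' rule)."""
    f = gaussian_pdf(x, mean, cov)
    return gamma * f / (gamma * f + (1 - gamma) * epsilon)

# Toy 2D data: 200 points from the model plus two points far from it.
rng = np.random.default_rng(0)
inliers = rng.multivariate_normal([0, 0], np.eye(2), size=200)
outliers = np.array([[6.0, 6.0], [-7.0, 5.0]])
x = np.vstack([inliers, outliers])

# gamma and epsilon are illustrative values, not recommendations.
w = model_membership(x, mean=np.zeros(2), cov=np.eye(2), gamma=0.95, epsilon=0.01)
flagged = np.where(w < 0.5)[0]  # indices of points the model describes poorly
print(flagged)
```

Points with a low membership probability play the role of the dark blue dots described above: the fitted model explains them less well than the flat alternative does.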
We applied the REM method described above to postpartum depressive symptom data to detect and describe differential depressive symptom patterns and examine associations with demographics and psychiatric histories.
In our sample, we fit an exploratory factor analysis model and found that about 10% of our sample did not fit the model well. This subset was more likely to have severe depressive symptoms, particularly regarding negative self-judgement and thoughts of self-harm. This subset was also more likely to have a history of childhood trauma and/or a history of social anxiety disorder.
I built an R Shiny app to demonstrate how information from the REM fit could be used to predict which individuals are likely to be in this subset based on self-reported depressive symptoms. This information could inform the tailoring of screening and treatment strategies for postpartum depression.
[postpartum depression application paper][R Shiny app]
Estimating sample average treatment effects (SATE) in experimental and observational studies
Problem: In many health studies, select population subgroups—such as racial and ethnic minorities, older adults, and adults with less than a high school education—consistently make up a smaller proportion of the data sample compared to others. This has led to study findings (and consequently medical decisions) that generalize well for some sociodemographic subgroups and poorly for others.
Contributions:
We developed a statistical framework to quantify the consequences of systemic differences in sample proportions between subgroups. We show that, under some assumptions, the difference in mean-squared error of SATE estimates for two subgroups is equal to the product of (1) the difference in subgroup sample proportions and (2) the average squared difference between subgroup treatment effects. The formula derived in our paper could be used to inform the design, analysis, and interpretation of studies in heterogeneous populations.
In the same paper, we developed a reweighting approach for adjusting sample representation in a way that lowers the mean-squared error of subgroup-specific effect estimation on average, which we call representation-adjusted average treatment effect (RATE) estimation. This approach is similar to a Bayesian shrinkage estimator, which enables each subgroup to leverage information from the full sample rather than only its own data. This reduces statistical noise at the expense of some bias.
[RATE paper][code]
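The two ideas above can be checked on a toy case. This sketch is a simplification and not the RATE estimator from the paper: it assumes exactly two subgroups, a SATE known without sampling noise, and a subgroup-specific estimate that is unbiased with a known variance. Under those assumptions, using the SATE as the estimate for each subgroup gives a squared-error gap of \((p_b - p_a)(\tau_a - \tau_b)^2\), and a generic shrinkage weight trades variance for bias. All names and numbers are hypothetical.

```python
import numpy as np

def sate(props, effects):
    """Sample average treatment effect as a proportion-weighted mean."""
    return float(np.dot(props, effects))

def squared_error_gap(p_a, tau_a, tau_b):
    """Squared error of using the SATE for subgroup a, minus that for b."""
    p_b = 1.0 - p_a
    pooled = sate([p_a, p_b], [tau_a, tau_b])
    return (pooled - tau_a) ** 2 - (pooled - tau_b) ** 2

def shrinkage_mse(weight, variance, bias):
    """MSE of weight * subgroup_estimate + (1 - weight) * pooled_estimate,
    treating the pooled estimate as fixed with the given bias."""
    return weight ** 2 * variance + (1 - weight) ** 2 * bias ** 2

# Subgroup a is 20% of the sample; true effects differ between subgroups.
p_a, tau_a, tau_b = 0.2, 1.0, 3.0
gap = squared_error_gap(p_a, tau_a, tau_b)
closed_form = ((1 - p_a) - p_a) * (tau_a - tau_b) ** 2

# Shrinking a noisy subgroup-a estimate toward the pooled estimate.
variance = 4.0                                    # assumed sampling variance
bias = sate([p_a, 1 - p_a], [tau_a, tau_b]) - tau_a
oracle_weight = bias ** 2 / (bias ** 2 + variance)  # uses the unknown truth
mse_raw = shrinkage_mse(1.0, variance, bias)
mse_shrunk = shrinkage_mse(oracle_weight, variance, bias)
print(gap, closed_form, mse_raw, mse_shrunk)
```

The smaller subgroup bears the larger error when the pooled estimate is applied to it, and the shrunk estimate has lower mean-squared error than the subgroup-only estimate. The oracle weight depends on quantities that are unknown in practice, so it only shows that such a weight exists.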
Health care quality measurement
Health care quality can vary considerably across physicians and health systems. Quality measures are essential for holding health care providers accountable and identifying disparities. How do we evaluate and ensure that quality measures are capturing what we want them to capture?
Reliability
Problem: Differences in quality measures between providers might be an artifact of chance as opposed to true differences in quality of care. Reliability quantifies the stability of a measurement if we could somehow repeat the measurement in another sample from the same population of patients and providers in the same time period. Reliability is one scientific criterion by which prospective CMS measures are evaluated. However, there is debate among experts on how to calculate and interpret estimates of quality measure reliability.
Contributions:
We conducted simulation studies of the split-sample method for estimating reliability and found that estimates can be very sensitive to the random split of the data in low sample size and low performance variability settings. We show that averaging many split-sample estimates can reduce the variability of the split-sample estimate of reliability. [split-sample paper]
We reviewed various methods for estimating reliability of health care quality measures and compared estimates in the case of two mental health quality measure sets.
We found that estimates can differ substantially, especially when sample sizes are small. More work is needed to understand which methods should be preferred in which situations. [comparing methods paper]
[code]
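The split-sample idea can be sketched in a small simulation. This is an illustration under my own assumptions, not the code from the papers: outcomes are binary, each provider's patients are split at random into two halves, the half-sample scores are correlated across providers, and the Spearman-Brown formula projects that correlation to the full sample size. The provider counts, patient counts, and spread in true quality are made-up values.

```python
import numpy as np

def simulate_providers(rng, n_providers=50, n_patients=40, sd_quality=0.1):
    """Binary patient outcomes, one row per provider."""
    true_rate = np.clip(rng.normal(0.7, sd_quality, n_providers), 0.01, 0.99)
    return rng.binomial(1, true_rate[:, None], size=(n_providers, n_patients))

def split_sample_reliability(outcomes, rng):
    """One split-sample estimate of reliability from a single random split."""
    shuffled = rng.permuted(outcomes, axis=1)  # shuffle within each provider
    half = outcomes.shape[1] // 2
    score_1 = shuffled[:, :half].mean(axis=1)
    score_2 = shuffled[:, half:2 * half].mean(axis=1)
    r = np.corrcoef(score_1, score_2)[0, 1]
    return 2 * r / (1 + r)                     # Spearman-Brown correction

rng = np.random.default_rng(1)
outcomes = simulate_providers(rng)

# Spread of estimates based on a single random split of the same data.
single = np.array([split_sample_reliability(outcomes, rng) for _ in range(200)])

# Spread of estimates that each average 50 random splits.
averaged = np.array([
    np.mean([split_sample_reliability(outcomes, rng) for _ in range(50)])
    for _ in range(40)
])
print(single.std(), averaged.std())
```

Because the data are held fixed, the spread in `single` comes only from the choice of split, and averaging over splits shrinks it. Reducing `n_patients` or `sd_quality` should make the single-split estimates more erratic, in line with the low sample size and low performance variability settings described above.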
Colorectal cancer
Colorectal cancer (CRC) is one of the most commonly diagnosed cancers and one of the most common causes of cancer-related death in the U.S. While CRC incidence has declined over the last few decades, largely attributed to the adoption of screening and changes in health-related behaviors, incidence has been rising among adults under the age of 50. CRC is now the leading cause of cancer death for men and the second leading cause of cancer death for women under age 50. Stark racial and ethnic disparities exist in CRC mortality rates, with American Indian, Alaska Native, and Black/African American adults having higher rates than other subgroups. What is driving these trends, and what policies and interventions will be most effective at addressing them?
Sociodemographic disparities in screening
In progress.