Estimating causal effect in case-control studies with nondifferential misclassified outcomes

Last updated:: 26 April 2026

Author(s):: Min Zeng, Zijian Sui, Zeyang Jia, Jinfeng Xu, Hong Zhang
Publish date:: 9 March 2026
Journal:: Journal of the Royal Statistical Society Series C (Applied Statistics)
DOI:: 10.1093/jrsssc/qlag012

Abstract

Establishing the causal relationship between treatment and outcome is a primary objective in many biomedical studies. In case-control studies, individuals are selected based on their outcome status, complicating causal inference due to the retrospective design. Further complications arise when the observed outcomes are subject to misclassification. As a motivating example, the Global Enteric Multicenter Study utilizes a case-control design to investigate the causal effect of Cryptosporidium infection on diarrhoea in African children. The classification of diarrhoea status (outcome of interest) is based on caregiver-reported symptoms, which can differ from the true status. In fact, caregiver-reported classification has been reported to have a low sensitivity of approximately 16.8%. The presence of both case-control sampling and outcome misclassification poses a great challenge in accurately estimating causal effects, and we aim to resolve this issue in this paper. We establish nonparametric identifiability of the average treatment effect and conditional average treatment effect under both nondifferential and differential outcome misclassification scenarios by leveraging external information on disease prevalence and misclassification rates, and propose two novel estimation methods for the nondifferential scenarios. Extensive simulation studies and two real-data examples are provided to evaluate the finite-sample performance of the proposed estimators.
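
To illustrate why external misclassification rates matter, the following is a minimal sketch of the standard textbook correction for a nondifferentially misclassified binary outcome (often called the Rogan-Gladen correction). It is not the estimators proposed in the paper: it ignores case-control sampling and confounding entirely, and the function names, the specificity of 0.99, and the observed risks are hypothetical values chosen only for illustration. Only the sensitivity of 0.168 comes from the abstract.

```python
def correct_risk(observed_risk, sensitivity, specificity):
    """Recover the true outcome risk from the observed (misclassified) risk.

    Uses P(Y* = 1) = Se * P(Y = 1) + (1 - Sp) * (1 - P(Y = 1)),
    solved for P(Y = 1). Assumes Se and Sp are known from external data.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("requires sensitivity + specificity > 1")
    corrected = (observed_risk - (1.0 - specificity)) / denom
    # Clip to [0, 1] because sampling noise can push the estimate outside.
    return min(1.0, max(0.0, corrected))


def corrected_risk_difference(obs_treated, obs_control, sensitivity, specificity):
    """Risk difference after correcting each arm with the same Se and Sp
    (the nondifferential assumption: error rates do not depend on treatment)."""
    return (correct_risk(obs_treated, sensitivity, specificity)
            - correct_risk(obs_control, sensitivity, specificity))


if __name__ == "__main__":
    se = 0.168   # sensitivity reported in the abstract
    sp = 0.99    # hypothetical specificity, assumed for illustration
    naive = 0.05 - 0.03  # hypothetical observed risks in the two arms
    corrected = corrected_risk_difference(0.05, 0.03, se, sp)
    print(f"naive difference: {naive:.4f}, corrected difference: {corrected:.4f}")
```

When no clipping occurs, the observed risk difference equals the true difference multiplied by (sensitivity + specificity - 1), so a sensitivity as low as the one quoted above attenuates the naive estimate substantially.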