Observational data sources such as the UK Biobank offer large sample sizes, but establishing causality from them is difficult: treatment assignment is unbalanced and confounding factors influence the outcomes. Advances in causal inference allow confounding to be controlled for using machine learning methods for tabular data (doi:10.1111/ectj.12097). With the current work of our research group, DoubleMLDeep (doi:10.48550/arXiv.2402.01785), multimodal data can be used directly as a confounder in causal analyses, with no manual feature engineering or expert knowledge of the multimodal data required. This is promising because imaging provides detail on a person’s physical state that is not available in the medical records traditionally used. Genetic data likewise carries confounding information in many observational studies (e.g. the risk of death depends on genes).
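To make the cited double/debiased machine learning idea concrete, the following is a minimal sketch on synthetic data: two nuisance models are cross-fitted to predict outcome and treatment from the confounders, and the treatment effect is recovered by regressing the outcome residuals on the treatment residuals. The data-generating process, the random-forest learners, and all variable names here are illustrative choices, not the method of any specific study.

```python
# Hedged sketch of double/debiased ML (partialling-out) on synthetic tabular data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 2000, 10
X = rng.normal(size=(n, p))                        # observed confounders
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)   # treatment depends on X
theta = 1.5                                        # true treatment effect (known here)
y = theta * d + X[:, 0] + rng.normal(size=n)       # outcome depends on d and X

# Cross-fitted nuisance predictions for E[y|X] and E[d|X]
res_y, res_d = np.zeros(n), np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    m_y = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], y[train])
    m_d = RandomForestRegressor(n_estimators=100, random_state=0).fit(X[train], d[train])
    res_y[test] = y[test] - m_y.predict(X[test])
    res_d[test] = d[test] - m_d.predict(X[test])

# Residual-on-residual regression recovers an estimate of theta
theta_hat = np.sum(res_d * res_y) / np.sum(res_d ** 2)
print(f"estimated treatment effect: {theta_hat:.2f}")
```

DoubleMLDeep replaces the tabular nuisance learners in such a pipeline with networks that consume multimodal inputs directly.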
Given methods that allow controlling for confounding with multimodal data, can they be applied to medical data to improve the causal validity of findings? This requires verifying that the networks train well with the modifications required for causal inference, and that the data used has sufficient predictive power.
Two potential applications are targeted for this evaluation:
Use of genetic sequencing data directly as a confounder to evaluate the impact of high blood pressure on cardiac outcomes (an extension of doi:10.1001/jamacardio.2018.1717). Can feature engineering to identify important genes be replaced by passing the genetic information directly into a neural network?
Use of chest imaging data as a confounder in an observational assessment of coronavirus vaccine effectiveness in preventing severe illness or death. The images contain a wealth of information about the lungs and could be a strong predictor of the outcome. Can confounding in a treatment assessment be controlled better with images than with the traditional use of preexisting conditions?