Approved research

Evaluation of New Semisupervised Estimators for Personalized Psychiatry

RWTH Aachen University

Lay summary

In this clinical neuroscience project, we will systematically evaluate "semisupervised" classification algorithms for translation into brain imaging. These algorithms emerged only recently in machine learning. They have the potential to improve model performance and interpretability when data from a psychiatric or neurological population are scarce but large general-purpose datasets are available. First, we will quantitatively characterize the gain in prediction performance by varying the size of either the (already available) schizophrenia sample or the (to be obtained) UKBiobank sample of structural and functional brain scans. Subsequently, we will investigate the latent variables inherent in the statistical models and their association with questionnaires and clinical parameters.

Improving the model performance and neurobiological interpretability of clinical investigations in schizophrenia by exploiting general-purpose neuroimaging databases could be an important cornerstone for personalized medicine in psychiatry and neurology. If successful, the proposed semisupervised modelling framework can be shared with the scientific community in the form of code on www.github.com. The approach extends naturally to other mental disorders and can be applied on a single-subject basis. The automatically revealed and formalized relationships between candidate disease endophenotypes and clinical exophenotypes could translate to medical practice.

The computational experiments will be performed in common scientific computing environments (Python programming language) on a university-hosted computer cluster with 50 CPUs and 500 GB of working memory. Three different statistical models that have recently been introduced in basic statistics will be applied to varying amounts of the brain and behavioral data in the database. A number of performance metrics will be used to evaluate model success across the computational experiments.
This will allow for a systematic assessment of how prediction performance scales with the choice of model and with the proportions of the two data sources. From UKBiobank, we need data from all participants who have both structural and functional brain scans, together with their behavioral, questionnaire, and clinical data. This sample should be as large as possible for an optimal assessment of the feasibility of the above approach, and will hopefully grow in size over the coming years.
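
The scaling experiment described above could be sketched as follows. This is only a minimal illustration under stated assumptions, not the project's actual pipeline: the data are synthetic stand-ins for the small clinical (labeled) and large general-purpose (unlabeled) samples, a simple self-training nearest-centroid classifier is swapped in for the three semisupervised models (which the summary does not name), accuracy is the only metric, and every function name and sample size is hypothetical.

```python
import random


def make_data(n, rng, shift=1.5, dim=5):
    """Synthetic, class-balanced two-class data (a stand-in for brain features)."""
    X, y = [], []
    for i in range(n):
        label = i % 2  # alternate labels so both classes are always present
        X.append([rng.gauss(shift * label, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y


def centroids(X, y):
    """Per-class feature means."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign x to the class with the nearest centroid (squared distance)."""
    dist = {k: sum((a - b) ** 2 for a, b in zip(x, c)) for k, c in cents.items()}
    return min(dist, key=dist.get)


def self_training_fit(X_lab, y_lab, X_unlab, n_iter=5):
    """Fit on labeled data, then refine the centroids with pseudo-labeled data."""
    cents = centroids(X_lab, y_lab)
    for _ in range(n_iter):
        if not X_unlab:
            break  # purely supervised baseline
        pseudo = [predict(cents, x) for x in X_unlab]
        cents = centroids(X_lab + X_unlab, y_lab + pseudo)
    return cents


def accuracy(cents, X, y):
    return sum(predict(cents, x) == t for x, t in zip(X, y)) / len(y)


def scaling_experiment(labeled_sizes, unlabeled_sizes, seed=0):
    """Held-out accuracy for every (labeled size, unlabeled size) combination."""
    rng = random.Random(seed)
    X_test, y_test = make_data(400, rng)
    results = {}
    for n_lab in labeled_sizes:
        for n_unlab in unlabeled_sizes:
            X_lab, y_lab = make_data(n_lab, rng)
            X_unlab, _ = make_data(n_unlab, rng)  # labels discarded on purpose
            cents = self_training_fit(X_lab, y_lab, X_unlab)
            results[(n_lab, n_unlab)] = accuracy(cents, X_test, y_test)
    return results


# Hypothetical sample sizes, chosen only for illustration.
results = scaling_experiment([4, 16, 64], [0, 200, 1000])
for (n_lab, n_unlab), acc in sorted(results.items()):
    print(f"labeled={n_lab:3d}  unlabeled={n_unlab:4d}  accuracy={acc:.3f}")
```

Reading the resulting grid along the labeled axis shows how performance depends on the size of the clinical sample, and along the unlabeled axis how much the general-purpose data contribute. In practice each cell would be repeated over several random resamplings to obtain stable estimates.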