Approved research

Defining and redefining human disease at scale: an atlas of the human phenome.

Principal Investigator: Dr Spiros Denaxas
Approved Research ID: 58356
Approval date: June 12th 2020

Lay summary

Our understanding of human disease and of the factors that influence our health changes all the time, but the way we define diseases is still based on what clinicians can directly observe. As a result, many diseases have a one-size-fits-all medication treatment that does not benefit all patients: people with the same disease can have significant differences in their genetic material, which influence whether a treatment will work and how well. The aim of this project is to use analytical approaches to identify and describe how the same disease can vary across patients. To do so, we will use data on many different aspects of human health available in the UK Biobank, from genetic data and blood data to phenotypic data that are collected when we interact with the healthcare system. The results of this study (which will last 36 months) will improve human health and healthcare by enabling clinicians to accurately identify who will benefit from which drug and by providing insights into the creation of better drugs for patients who do not currently benefit from existing treatments.

Current scope

Overarching aim: to create an atlas of human disease by creating, evaluating, and disseminating human disease, health behaviour and risk factor phenotype algorithms derived in the UK Biobank.

Aim 1: To create and validate rule-based, deterministic, phenotyping algorithms of diseases, health behaviours and risk factors using combinations of genotypic and phenotypic data.

Aim 2: To systematically evaluate supervised and unsupervised machine learning methods for creating phenotyping algorithms and compare algorithmic accuracy with rule-based approaches.

Aim 3: To evaluate unsupervised machine learning approaches for identifying and characterising latent subtypes of diseases that can accurately capture variation in aetiology or prognosis amongst patients.

Aim 4: To create an open-access platform that will curate the defined phenotypes and provide algorithm metadata, definitions, implementation details and links to further relevant resources with the aim of enabling reproducibility.

New scope

New aim 5: Building on the work we carried out for aims 1 and 4 (above), we will identify areas of focus and carry out 'deeper' phenotype development and validation, incorporating additional data elements (such as biomarker data from primary care and prescriptions) in order to define disease progression and severity phenotypes. We will use the infrastructure we developed (reproducible pipeline, aim 4) to host metadata and trait definitions for the new phenotypes, which will be made available as open source for use by the research community. Our approach will be based on the methods we developed and published here: Denaxas et al, JAMIA Open, 2020.

New aim 6: We will translate the phenotyping algorithms we created to the OMOP Common Data Model and evaluate them, based on our previous work published here: