Machine learning predicts hepatocellular carcinoma risk from routine clinical data: a large population-based multicentric study.

Last updated:: 31 May 2026

Author(s):: Jan Clusmann, Paul-Henry Koop, David Y Zhang, Felix van Haag, Omar S M El Nahhas, Tobias Seibel, Laura Zigutyte, Apichat Kaewdech, Julien Calderaro, Frank Tacke, Tom Luedde, Daniel Truhn, Tony Bruns, Kai Markus Schneider, Jakob Nikolas Kather, Carolin V Schneider
Publish date:: 26 March 2026
Journal:: Cancer Discovery
PubMed ID:: 41881847
DOI:: 10.1158/2159-8290.cd-25-1323

Abstract

Hepatocellular carcinoma (HCC) is a highly fatal tumor for which risk stratification is crucial yet remains challenging. Here, we develop an interpretable machine-learning framework for HCC risk stratification based on routinely collected clinical data. We utilize prospectively collected multimodal data from over 900,000 individuals and 983 cases of HCC across two population-scale cohorts: the “UK Biobank study” (development) and the “All of Us Research Program” (external testing). We assess individual and cumulative contributions of data modalities including demographics, lifestyle, health records, blood, genomics, and metabolomics. Our final random-forest-based models significantly outperform all publicly available state-of-the-art risk scores on both internal and external test sets. We demonstrate robustness across ethnic subgroups, provide comprehensive interpretability, and release all code, model weights, and a web calculator for external validation and agentic integration. Our study presents PRE-Screen-HCC, a robust and interpretable machine-learning framework for HCC risk stratification and early detection.