Author(s):
Yue Shen, Jie Wang, Zhe Wang, Zhihao Shi, Hanzhu Chen, Zheng Wang, Yukang Jiang, Xiaopu Wang, Chuandong Cheng, Xueqin Wang, Hongtu Zhu, Jieping Ye
Publish date:
2 May 2025
Journal:
Artificial Intelligence in Medicine
PubMed ID:
40344999

Abstract

Diagnosis codes are a standard coding format for diseases or medical conditions. This study aims to assign diagnosis codes to patients in large-scale biobanks, particularly addressing the problem of missing codes for some patients, which is crucial for downstream disease-related tasks. While recent methods primarily rely on structured biobank data for code assignment, they often overlook the valuable medical context provided by textual information in the biobanks and by the hierarchical structure of the disease coding system. To address this gap, we have developed CATI, a medical context-enhanced framework for diagnosis Code Assignment by integrating Textual details derived from key features and disease hIerarchy. The study is based on UK Biobank data and considers Phecodes and ICD-10 codes as standard disease formats. We start by representing ten informative codified features using their formal names and then integrate them into CATI as text embeddings, obtained through prompt tuning on the pre-trained language model BioBERT. Recognizing the hierarchical structure of diagnosis codes, we have developed a novel convolution layer that propagates logits between adjacent diagnosis codes. Evaluation results demonstrate that CATI outperforms existing state-of-the-art methods on both Phecodes and ICD-10 codes, with at least a 5.16% improvement in average AUROC for unseen disease codes and an 8.68% rise in average AUPRC for disease codes whose number of training instances falls in the range (1000, 10000]. This framework contributes to the formation of well-defined cohorts for downstream studies and offers a unique perspective for addressing complex healthcare tasks by incorporating vital medical context.
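
The idea of propagating logits between adjacent diagnosis codes can be illustrated with a minimal sketch. This is not the CATI convolution layer, whose details the abstract does not give. It assumes a simple rule in which each code's logit is shifted by a weighted mean of its parent's and children's logits. The function names, the `alpha` weight, and the toy three-code hierarchy are all hypothetical.

```python
from collections import defaultdict


def build_adjacency(parent_of):
    """Return an undirected neighbor map from a child -> parent mapping."""
    neighbors = defaultdict(set)
    for child, parent in parent_of.items():
        neighbors[child].add(parent)
        neighbors[parent].add(child)
    return neighbors


def propagate_logits(logits, parent_of, alpha=0.5):
    """One propagation step over the code hierarchy (assumed rule, not CATI's).

    Each code keeps its own logit and adds alpha times the mean logit of its
    adjacent codes (its parent and children). Codes without neighbors are
    left unchanged.
    """
    neighbors = build_adjacency(parent_of)
    smoothed = {}
    for code, value in logits.items():
        adjacent = [logits[n] for n in neighbors.get(code, ()) if n in logits]
        neighbor_mean = sum(adjacent) / len(adjacent) if adjacent else 0.0
        smoothed[code] = value + alpha * neighbor_mean
    return smoothed


# Hypothetical toy hierarchy: one parent code with two child codes.
parent_of = {"250.1": "250", "250.2": "250"}
# Hypothetical per-code logits for a single patient.
logits = {"250": 2.0, "250.1": 1.0, "250.2": -1.0}

smoothed = propagate_logits(logits, parent_of, alpha=0.5)
print(smoothed)
```

In this toy case, a confident parent logit raises both child logits, which shows how evidence for a broad code could inform rarely observed or unseen sub-codes.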

Related projects

Our research project focuses on finding the causes of brain diseases. We expect to figure out what causes brain diseases, and which particular group of…

Institution:
University of Science and Technology of China, China
