Last updated:
Author(s):
Yaqi A Deng, Torgny Karlsson, Åsa Johansson
Publish date:
7 October 2025
Journal:
NAR Genomics and Bioinformatics
PubMed ID:
41480591

Abstract

Advances in high-throughput technologies enable large-scale studies on genomics and molecular phenotypes. However, the trade-off between quality and quantity reduces assay sensitivity, and several measurements in large-scale proteomics and metabolomics analytes fall below the limit of detection (LOD). If not properly addressed, this may introduce bias in effect estimates. To address this, we conducted a simulation study to evaluate the performance of linear, Tobit, Cox, and logistic modeling in the presence of below-LOD measurements in genome-wide association studies. We identified the optimal strategy as a two-step Linear-Tobit scheme, including rapid screening with linear regression followed by refinement with Tobit regression to retrieve accurate effect estimates. This higher accuracy helps mitigate a 1.3-fold and 2.7-fold inflation in causal estimates in a Mendelian randomization (MR) study, which would otherwise be present with 50% and 90% values below LOD. Validation through case studies on estradiol and testosterone levels in the UK Biobank confirmed the simulation results across subgroups with varying proportions of below-LOD measurements. The Linear-Tobit scheme offers optimal detection power and efficiency, with a focus on its applicability to biobank-scale datasets and accuracy in effect estimates to mitigate bias in downstream applications such as MR and polygenic risk scores.

Related projects

In this project, we will evaluate the health-related effects of three common exposures on population health. The aims are to: 1) to determine the…

Institution:
Uppsala University, Sweden

All projects