Enhanced Variable Selection for Boosting Sparser and Less Complex Models in Distributional Copula Regression

Last updated:: 19 November 2025

Author(s):: Annika Strömer, Nadja Klein, Christian Staerk, Florian Faschingbauer, Hannah Klinkhammer, Andreas Mayr
Publish date:: 17 June 2025
Journal:: Statistics in Biosciences
DOI:: 10.1007/s12561-025-09491-8

Abstract

Structured additive distributional copula regression makes it possible to model the joint distribution of multivariate outcomes by relating all distribution parameters to covariates. Estimation via statistical boosting accommodates high-dimensional data and incorporates data-driven variable selection, both of which are useful given the complexity of the model class. However, as is known from univariate (distributional) regression, the standard boosting algorithm tends to select too many variables of minor importance, particularly in settings with large sample sizes, which leads to complex models that are difficult to interpret. To counteract this behavior and to avoid selecting base-learners with only a negligible impact, we combine the ideas of probing, stability selection, and a new deselection approach with statistical boosting for distributional copula regression. In simulations and in an application to the joint modeling of weight and length of newborns, we find that all proposed methods enhance variable selection by reducing the number of false positives. However, only stability selection and the deselection approach yield predictive performance similar to that of classical boosting. Finally, the deselection approach scales better to larger datasets while remaining competitive in predictive performance, which we further illustrate in a genomic cohort study from the UK Biobank by modeling the joint genetic predisposition for two phenotypes.
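The abstract does not spell out the deselection step, so the following is only a minimal sketch of the general idea under strong simplifying assumptions: a single Gaussian outcome with squared-error loss instead of a copula model, simple linear base-learners, and a deselection rule that drops every base-learner whose attributed share of the total risk reduction falls below a threshold `tau`, followed by a refit on the remaining ones. The function names `boost` and `deselect`, the threshold `tau=0.01`, and the simulated data are all hypothetical and are not taken from the paper.

```python
import random


def boost(X, y, active, steps=200, nu=0.1):
    """Component-wise L2 boosting restricted to the columns in `active`.

    X is a list of columns. Returns (coef, risk_reduction), where
    risk_reduction[j] is the total drop in mean squared error that
    occurred in the iterations where column j was selected.
    """
    n, p = len(y), len(X)
    offset = sum(y) / n
    resid = [v - offset for v in y]
    coef = [0.0] * p
    risk_reduction = [0.0] * p
    for _ in range(steps):
        best = None  # (residual sum of squares, column index, slope)
        for j in active:
            xj = X[j]
            slope = sum(a * r for a, r in zip(xj, resid)) / sum(a * a for a in xj)
            rss = sum((r - slope * a) ** 2 for a, r in zip(xj, resid))
            if best is None or rss < best[0]:
                best = (rss, j, slope)
        _, j, slope = best
        risk_before = sum(r * r for r in resid) / n
        # Weak update: only a fraction nu of the best fit is added.
        resid = [r - nu * slope * a for a, r in zip(X[j], resid)]
        risk_after = sum(r * r for r in resid) / n
        coef[j] += nu * slope
        risk_reduction[j] += risk_before - risk_after
    return coef, risk_reduction


def deselect(risk_reduction, tau=0.01):
    """Keep columns whose share of the total risk reduction is at least tau."""
    total = sum(risk_reduction)
    return [j for j, r in enumerate(risk_reduction) if r > 0 and r / total >= tau]


# Simulated data (assumption): only columns 0 and 1 are informative.
rng = random.Random(1)
n, p = 200, 10
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [2.0 * X[0][i] - 1.5 * X[1][i] + rng.gauss(0, 0.5) for i in range(n)]

coef_full, rr = boost(X, y, active=range(p))   # classical boosting
keep = deselect(rr, tau=0.01)                  # drop negligible base-learners
coef_refit, _ = boost(X, y, active=keep)       # refit on the reduced set

print("selected before deselection:", [j for j, c in enumerate(coef_full) if c != 0.0])
print("kept after deselection:", keep)
```

In this toy setting the refit uses only the retained columns, so the final model is sparser by construction. How the threshold is chosen and how risk reduction is attributed across the several distribution parameters of a copula model are details this sketch does not attempt to reproduce.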