By Hélène Ruffieux
Ecole Polytechnique Fédérale de Lausanne and Nestlé Institute of Health Sciences
Abstract: Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single clinical outcome on many genetic variants one by one, but there is an increasing demand for joint analysis of many molecular outcomes and genetic variants in order to unravel functional interactions. Unfortunately, most existing approaches to joint modelling are either too simplistic to be powerful or computationally impracticable. Inspired by Richardson et al. (2010, Bayesian Statistics 9), we consider a sparse multivariate regression model that allows simultaneous selection of predictors and associated responses, and we propose three extensions to leverage the spatial and functional structure among the predictors. As MCMC inference on such models can be prohibitively slow for real problem sizes, we present a variational inference approach which produces posterior information very close to that of MCMC inference, at a much reduced computational cost. Extensive numerical experiments show that our approach outperforms popular variable selection methods and tailored Bayesian procedures, and that it handles within hours problems involving hundreds of thousands of genetic variants and hundreds to thousands of clinical or molecular outcomes. Software is available at https://github.com/hruffieux/locus. This is joint work with Anthony Davison (Ecole Polytechnique Fédérale de Lausanne), Jörg Hager and Irina Irincheeva (Nestlé Institute of Health Sciences, Lausanne).