Speaker: Manuela Zucknick, University of Oslo
Title: “Statistical learning for drug screening in personalized cancer therapy”
Abstract: Large-scale cancer pharmacogenomic screening experiments profile hundreds of cancer cell lines against hundreds of clinically approved or experimental compounds to study drug sensitivity and/or synergistic effects of drug combinations. The aim of these in vitro studies is to use the genomic profiles of the cell lines, together with information about the drugs, to predict the response of individual cell lines to a particular drug or combination of drugs, and ultimately to learn about in vivo treatment response for patients.
This is a multi-task, multi-view prediction problem in which each individual data set carries only little predictive value. It is therefore important to optimize prediction performance by combining the different data sources efficiently, by borrowing information across experiments, and by using external knowledge wherever available. A naïve approach is to vectorise all available data and apply standard methods for high-dimensional linear regression, but vectorisation easily “blows up” the length of the regression coefficient vector and makes very inefficient use of the data.
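As a rough illustration of this blow-up, the following minimal sketch flattens a feature array and counts the resulting regression coefficients. It assumes, purely for illustration, that each sample (a cell line and drug pair) is described by a two-way array of genomic features by drug features; the sizes and all names are hypothetical and not taken from the talk.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_samples, n_genes, n_drug_features = 50, 200, 30

rng = np.random.default_rng(0)
# One (genes x drug features) array per sample: a three-way array overall.
X = rng.normal(size=(n_samples, n_genes, n_drug_features))

# Naive approach: flatten each sample's array into one long feature vector.
X_flat = X.reshape(n_samples, -1)

# A standard linear model now needs one coefficient per flattened feature.
n_coef = X_flat.shape[1]

print(X_flat.shape)          # design matrix after vectorisation
print(n_coef, n_samples)     # coefficients to estimate vs. available samples
```

Even with these small hypothetical sizes, the number of coefficients is the product of the array dimensions and far exceeds the number of samples.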
Our task is made easier by the strong structure in the data, which arises from the experimental setup and from biological and biochemical constraints; using this structure efficiently can massively reduce the number of effective parameters that need to be estimated. This is crucial, since the sample size is typically much smaller than the number of input features. The data structure is often assumed to be non-linear and tends to be expressed in multi-way arrays (or tensors). In this talk I will discuss some approaches that help us build regression models which capture this structure efficiently. I will present examples in which we impose structure either on the regression coefficient array, by designing structured penalty terms or priors, or directly on the input data array, e.g. in multiple kernel learning.
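One generic way in which array structure can reduce the number of parameters is a low-rank constraint on the coefficient array. The sketch below is an assumption made for illustration and not necessarily one of the methods presented in the talk: it writes a (genes x drug features) coefficient matrix B as U V^T with a small rank R, and checks that the structured prediction equals the vectorised one while using far fewer free parameters. All names and sizes are hypothetical.

```python
import numpy as np

def low_rank_predict(X, U, V):
    """Return <X_i, U V^T> for every sample i without forming the full matrix.

    X: (n, p, q) feature arrays, U: (p, R), V: (q, R).
    """
    return np.einsum("ipq,pr,qr->i", X, U, V)

rng = np.random.default_rng(0)
n, p, q, R = 5, 200, 30, 2          # hypothetical sizes and rank
X = rng.normal(size=(n, p, q))
U = rng.normal(size=(p, R))
V = rng.normal(size=(q, R))

# Equivalent vectorised model: flatten the inputs and the full coefficient matrix.
B = U @ V.T
pred_vectorised = X.reshape(n, -1) @ B.reshape(-1)
pred_low_rank = low_rank_predict(X, U, V)

n_full = p * q                      # free parameters without structure
n_low_rank = R * (p + q)            # free parameters under the rank-R constraint

print(np.allclose(pred_vectorised, pred_low_rank))
print(n_full, n_low_rank)
```

The rank R acts as a tuning parameter here: a smaller R means fewer parameters to estimate but a more restrictive model.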