
Advances in bio-processing, high-throughput technology, data storage and grid computing are producing vast amounts of accessible ‘omics data, for example nucleotide-resolution genomic and transcriptomic data, as well as measures of downstream biological processes, such as metabolomic profiles. Extensive questionnaires, multi-omic assays and biobanks are key types of design in modern epidemiology. By combining state-of-the-art statistical approaches with biological insight and efficient computation, our research addresses the new challenges posed by the rapidly expanding set of “omic” technologies, which includes assays of the genome, epigenome, transcriptome, proteome and metabolome. The broad aims of the “SOMX” (statistical ‘omics) theme are motivated by the need:
- To develop new methods for data generated by novel ‘omic technology
- To integrate different layers of data and co-information for understanding disease aetiology
- To extract understanding of biological mechanisms from modelling the connections and structure in high dimensional datasets
- To develop new statistical models for genomic data that permit mechanistic insights into underlying biological processes such as the 3D structure of the genome and its regulatory code
- To enable sophisticated feature selection (e.g. in genetic fine mapping) when only summary data are available
- To enable efficient inference within principled statistical frameworks that can handle complex and/or tall (i.e. large n) ‘omics data
Many problems that have traditionally been addressed by statistical methods are increasingly being tackled with success by methods from other fields, such as machine learning, under the broad heading of data science. We investigate how to translate and tailor work from the machine learning community into the biostatistical domain, while always taking great care to marry information synthesis with statistical principles, calibrating and accounting for uncertainty in the biomedical use of models and predictions. Our links with the Alan Turing Institute are of great benefit to this endeavour.
Our projects range from extensions of established methods to entirely novel and ambitious projects. Owing to the unique scientific environment of the Cambridge Biomedical Campus and nearby genome institutes (Babraham, Sanger and EBI), new lines of research are often suggested by cutting-edge experimental datasets generated by scientists with whom we continue to foster close collaborative relationships. In this way, we enable a direct route to impact for our new methods.
(Previous name of theme: Statistical Genomics – SGX)
Other Research Themes:
- DART: Design and Analysis of Randomised Trials
- SURPH: Statistical methods Using data Resources to improve Population Health
- PREM: Precision Medicine and Inference for Complex Outcomes