Current PhD Opportunities at the MRC Biostatistics Unit

The BSU is an internationally recognised research unit specialising in statistical modelling with application to medical, biological or public health sciences. Details of the work carried out in the Unit appear on our website http://www.mrc-bsu.cam.ac.uk/research-and-development/

The following studentships are currently available at the MRC Biostatistics Unit.

To apply for any of the following PhD projects, please visit the Application procedures page.

The deadline for all applications is 8 January 2017, with interviews expected to take place on 24 and 25 January 2017.

 

Developing stratified approaches from randomised trials, with application to recommended intervals between blood donations

Supervisors - Brian Tom (BSU) and Simon Thompson (University of Cambridge)

Larger randomised trials offer the potential not only to estimate the overall effectiveness of alternative treatments or policies, but also to explore which types of subject may benefit most. However, the statistical methods for addressing the latter issue are not well developed, especially when there is a wealth of information on patient characteristics that could be used.

The project will be based on the very large INTERVAL trial (www.intervalstudy.org.uk), in which 50,000 male and female blood donors have been randomised to giving blood at, or more frequently than, the standard intervals (8 and 10 weeks vs the standard 12 weeks for men, and 12 and 14 weeks vs the standard 16 weeks for women). The outcomes in the trial are the amount of blood collected over the two years of the trial, the number of deferrals (temporary rejection of a donor due to low haemoglobin), and quality of life (in particular the physical subscale of the SF-36 questionnaire). In addition to the overall comparison of randomised groups, interest centres on whether different inter-donation intervals should be recommended for people with different characteristics (e.g. by age, weight, blood biomarkers, or genetic characteristics). The 50,000 trial participants are well characterised at baseline (demographics, previous donation history, haematology, iron measures, genetics, quality of life), and at two years, with interim 6-month questionnaires on quality of life and health symptoms.

In this project, it is proposed to

  • develop methods to delineate which donors benefit most, first in terms of single characteristics and then in terms of multiple characteristics;
  • consider how to balance increasing beneficial outcomes against increasing harm or side-effects (i.e. multivariate outcomes), especially when they are measured on different scales;
  • extend techniques to adaptive schemes that reflect time-varying donor covariates.

It is envisaged that this project will involve (i) developing methodology for either joint modelling of outcomes or combining outcomes measured on different scales and for decision-making to determine stratification schemes (potentially adaptive) that “optimise” these multiple or composite outcomes; (ii) model comparisons or averaging if there are potentially different subsets of variables that can be used for stratification; and (iii) developing methodology based on dynamic stratified approaches. The methodologies developed will be applicable beyond the INTERVAL trial.

This studentship is for full time study only.

Available for commencement in Michaelmas Term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria.

To apply for this project, please see our Application procedures.

 

Bayesian dose adaptive trials using non-myopic response-adaptive methods

Supervisors - Sofia Villar (BSU) and Adrian Mander (BSU)

In dose-finding studies the aim is to find the maximum tolerated dose of an agent or to find a dose which is closest to a target dose. In dose-ranging studies different doses of an agent are tested against each other to establish which dose works best and/or is least harmful by estimating a dose-response relationship. However, achieving either of these goals with high precision can imply exposing a large number of patients to highly toxic doses, imposing a learning-earning trade-off. Although extensive recent work has used decision theory to address such a trade-off in the design of clinical trials [1], little work has been done to extend such a framework to dose-finding/dose-ranging studies. A decision-theoretic approach makes it possible to take into account the interests of patients both within and outside the trial, and to derive a patient allocation rule that acknowledges the conflict between the interests of each individual patient and those of the patients who follow. This idea was proposed earlier in the literature (e.g. a framework for dose-finding trials using the theory of bandit problems was proposed by Leung and Wang [2]), yet because finding the optimal strategy for this type of bandit with dependent arms is, in most relevant cases, not computationally feasible, the approach has not been developed further.

This PhD project will look at developing decision-theoretic, non-myopic, response-adaptive methodology for dose-ranging and dose-finding studies. The project will make use of recent advances in bandit theory to try to reduce the computational complexity of finding the optimal (or nearly optimal) solution derived from a set of relevant optimisation problems. The PhD will cover some of the following areas:

  • Use and extend existing response-adaptive randomisation rules to be incorporated into the design of dose-escalation studies;
  • Investigate novel optimal response-adaptive designs that can handle multivariate conflicting outcomes (efficacy-toxicity);
  • Assess how these methods perform in terms of estimation and patient gain (administering doses nearest to the target toxicity level);
  • Use the dynamic optimisation (bandit) literature to develop suitable and practical non-myopic adaptive randomisation methods specifically designed for dose adaptive trials;
  • Produce easy-to-use software in R and/or Stata to implement the methods;
  • Compare the resulting decision-based designs to the real trial.

__________________________________________________________________

References:

  1. Villar, S., Wason, J. and Bowden, J. (2015) Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. Statistical Science Vol. 30, No. 2, 199-215.
  2. Leung, D. and Wang, Y.G. (2002) An extension of the continual reassessment method using decision theory. Statistics in Medicine Vol. 21, No. 1, 51-63.
  3. Fan, S. and Wang, Y.G. (2006) Decision-theoretic designs for dose-finding clinical trials with multiple outcomes. Statistics in Medicine Vol. 25, No. 10, 1699-1744.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

 

Developing Bayesian non-myopic response-adaptive randomisation for the case of delayed endpoint observation

Supervisors - Sofia Villar (BSU) and Adrian Mander (BSU)

Before a novel treatment is made available to the wider public, clinical trials are undertaken to provide unbiased and reliable evidence that the treatment is safe and efficacious. The standard approach for such confirmatory clinical trials is to compare only two treatment options, and it requires a large number of patients to be recruited to a trial. This approach does not fit well with the development of treatments for many conditions in which there is a large number of potential treatments to explore and relatively few patients affected by the disease who could be enrolled in a trial. This is the case for drug development for rare types of cancer.

A promising alternative to the standard approach in this context is the use of response-adaptive randomization (i.e. changing the allocation probabilities as outcome data are collected to favour promising treatments). By designing a trial that incorporates a response-adaptive randomization allocation rule, promising treatments can be identified quickly while more patients are allocated to them. The response-adaptive randomization rules that perform best in terms of patient benefit are the so-called non-myopic rules, which unfortunately suffer from a prohibitive computational burden. Developing computationally feasible and practical methods to apply these ideas in trial design, as a way of improving the success rate of Phase III clinical trials, is therefore of great current interest. At the Biostatistics Unit we have made a start on this by developing a non-myopic group response-adaptive randomisation method called the ‘forward looking Gittins index’ rule [1, 2] for the case of dichotomous endpoints.
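As a rough illustration of what "changing the allocation probabilities as outcome data is collected" can mean, the sketch below uses a simple myopic, Thompson-style rule for dichotomous endpoints. It is not the forward looking Gittins index rule and is not taken from the cited papers; the uniform Beta(1, 1) priors, the function name and the interim counts are all assumptions made for illustration.

```python
import random


def allocation_probabilities(successes, failures, draws=5000, seed=0):
    """Monte Carlo estimate of the posterior probability that each arm is best.

    Assumes independent Beta(1 + successes, 1 + failures) posteriors, i.e.
    uniform priors on each arm's success probability (an illustrative choice).
    The returned values can be used as allocation probabilities for the next
    patient, so that arms that look better receive more patients.
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        samples = [rng.betavariate(1 + s, 1 + f)
                   for s, f in zip(successes, failures)]
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical interim data for three arms (10 patients observed on each).
successes = [2, 6, 4]
failures = [8, 4, 6]
probs = allocation_probabilities(successes, failures)
print(probs)  # the arm with 6/10 successes gets the largest share
```

A rule of this kind only looks at the current posterior; non-myopic rules additionally account for the value of information to future patients, which is where the computational burden arises.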

This PhD project will look at extending existing non-myopic response-adaptive randomisation methodology to cover the case of delayed outcomes. This is particularly relevant for trials in which the endpoint is survival. The project will investigate novel optimal adaptive designs that can use both observed responses and partial information (derived from the delayed response). These methods will therefore be closer to the real-world situations handled by trials in which the endpoint is not necessarily best modelled as binary and immediately observable. The PhD will cover some of the following areas:

  • To model the patient allocation problem with delayed patient responses as an optimal sequential decision-making problem in the stochastic dynamic programming framework.
  • To design index policies and compare them to existing approaches in terms of statistical and optimality performance.
  • To develop and study efficient algorithms for optimal solutions, create a software package, and collaborate with statisticians and clinicians to apply the designed solutions in real clinical trials.

The student will also have the opportunity to collaborate with researchers from Lancaster University who are experts in stochastic dynamic programming approaches and in adaptive designs.

References:

  1. Gittins, J. and Jones, D. (1979) A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika Vol. 66, No. 3, 561-565.
  2. Villar, S., Wason, J. and Bowden, J. (2015) Response-adaptive Randomization for Multi-arm Clinical Trials using the Forward Looking Gittins Index rule. Biometrics Vol. 71, No. 4, 969-978.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

 

Integrative methods for identifying non-coding rare variants responsible for rare diseases

Supervisors - Ernest Turro (Department of Haematology) and Sylvia Richardson (BSU)

Only half of the approximately 7,000 known rare heritable disorders of humans have an established molecular basis. These genetic determinants have been identified through linkage studies and, more recently, by uncovering associations between genetic variants identified through genomic DNA sequencing and disease phenotypes encoded as simple variables (e.g. case/control label). Recently, we have developed a regression method for identifying associations between rare variants in genes and Human Phenotype Ontology (HPO)-coded patient phenotypes (Greene et al, Am. J. Hum. Genet., 2016). This method allows modeling of phenotype abnormalities that encompass all organ systems and which are encoded with a variable degree of clinical detail, a common feature of the phenotypes of patients with rare diseases. Currently, we are developing Bayesian methodology for modeling candidate rare variants (e.g. within a region) as mixtures of pathogenic and non-pathogenic rare variants in the context of the typical modes of Mendelian inheritance.

The vast majority of variants identified so far alter the protein products of genes, which comprise around 2% of the genome. This is partly because the effects of variants in protein-coding genes are more easily predicted than those outside of coding regions and partly because it has not been possible, until now, to sequence entire genomes cheaply and with high accuracy. As a high proportion of cases remain unexplained, it is commonly postulated that variants affecting gene regulation but residing outside genes themselves may underlie such disorders. Identifying these variants will require careful integration of relevant cell-specific and population genetic data to inform probabilities of pathogenicity of non-coding variants.

The aim of the proposed project is to develop innovative statistical methods for uncovering associations between rare variants and rare Mendelian diseases that make use of relevant epigenetic, chromosomal conformation, protein-protein interaction, eQTL and GWAS data, and apply them to a rich database of blood-related disorders. It is only through appropriate modeling of various layers of genomic and genetic information that elusive causes of inherited disorders are likely to be found. Methods for integration of multi-omics data are at an early stage of development and this project will build on the experience of both teams in the domain of rare disease analysis and statistical genomics, notably using Bayesian modelling strategies. The successful candidate will have access to extensive computing facilities at the University's high performance computing cluster and be engaged in the largest rare disease research programme in Europe (https://bioresource.nihr.ac.uk/rare-diseases/welcome/). Initial focus will be on diseases of the blood stem cell and its progeny. Several thousand cases with a blood-related disorder have been sequenced and phenotyped and we have access to deep epigenetic and chromosomal conformation data from all the major mature and progenitor cells in blood, as well as the results of blood-trait GWAS and blood cell eQTL studies. These data will assist in the development and assessment of emerging methodological ideas. In collaboration with colleagues in other institutions and within the Department of Haematology, potential findings will be amenable to rapid follow-up in the laboratory.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

 

Bayesian methods for “weighted” biomedical data

Supervisor - Robert Goudie (BSU)

The recent availability in biomedical studies of vast quantities of data, such as omics (genomic, transcriptomic, proteomic etc) data, is starting to enable data-driven “precision medicine”. This approach to medicine aims to use these new data to allow tailoring of treatments to patients, rather than the traditional “one size fits all” approach.

However, it is often not feasible to collect data on all relevant individuals due to, for instance, time and cost constraints. Instead, in many studies, data are collected on only a subgroup of the relevant population. In precision medicine studies, the subgroup is often deliberately chosen to over-represent particularly interesting cases (e.g. extreme cases) to increase the chances that differences between patients that require different treatment strategies can be identified. Such a subgroup is not representative of the overall population, and the results of a statistical analysis will be distorted unless this is accounted for in the analysis. To do this, we must account for the “weight” associated with each observed individual, i.e. how many people each observed individual represents in the full population.
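The effect of ignoring weights can be seen in a small worked example. The sketch below uses a classical weighted mean (not a Bayesian method) and an entirely hypothetical population of 1,000 people, of whom 100 are "extreme" cases that have been deliberately over-sampled; the numbers and the function name are assumptions chosen only to make the arithmetic easy to follow.

```python
def weighted_mean(values, weights):
    """Weighted mean, where each weight is the number of people in the full
    population that the corresponding observed individual represents."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical population: 100 extreme cases and 900 typical cases.
# Hypothetical sample: 50 of each, so extreme cases are over-represented.
values = [10.0] * 50 + [2.0] * 50          # measured outcome for each sampled person
weights = [100 / 50] * 50 + [900 / 50] * 50  # 2 people each vs 18 people each

naive = sum(values) / len(values)          # treats the sample as representative
weighted = weighted_mean(values, weights)  # re-balances towards the population

print(naive)     # 6.0
print(weighted)  # 2.8
```

The unweighted mean is pulled towards the over-sampled extreme cases, whereas the weighted mean recovers the population value implied by the design (100 × 10 + 900 × 2, divided by 1,000).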

We at the Biostatistics Unit are involved in a number of precision medicine collaborations that involve weighted data, including studies of Alzheimer’s disease and other dementias. Many promising approaches in precision medicine take a Bayesian approach to make it straightforward to account for all sources of uncertainty within large, complex models. However, Bayesian approaches for weighted data are in their infancy. This PhD project will develop these methods, with the aim of enabling Bayesian approaches in precision medicine with weighted data. The methods developed are also likely to be applicable more widely to the many other sources of weighted data in biostatistics.

The relatively early stage of research in this area means a wide range of open problems could be investigated, according to the interests of the student. Potential practical and theoretical aspects include:

  • Studying the strengths and weaknesses of existing Bayesian weighted data approaches, and comparing these to standard classical methods.
  • Methodological development of more flexible and easier-to-use Bayesian approaches to weighted data, including fully Bayesian and approximate methods.
  • Application of existing and/or novel methods to one or more of the precision medicine studies in which the Unit is involved, and/or in the context of HIV epidemiology, where data are also often weighted.
  • Development of efficient computational approaches to allow use of weighted Bayesian methods in the big data contexts common in precision medicine. This will likely require the use of the large cluster computer facilities available in Cambridge.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

 

Methods for integrating and splitting complex/big models

Supervisors - Robert Goudie (BSU) and Lorenz Wernisch (BSU)

Synthesis of evidence from multiple sources (data and expert opinion) and from different study designs is increasingly common in all areas of science, including in infectious disease epidemiology, health technology assessment and omics (genomics, proteomics etc). Combining information sources often results in more precise and useful inferences, especially when some data are incomplete or biased.

However, using joint "big models" of several sources of evidence, including data and expert opinion, is inferentially and computationally challenging. It is often sensible to take a modular approach in which separate sub-models are considered for smaller, more manageable parts of the available data/evidence. Each sub-model is simpler (lower-dimensional) than the "big model" and so will be easier to construct and use.

In a Bayesian framework, the sub-models should be integrated into a joint model, so that all data and uncertainty are fully accounted for. This can be challenging to do, but at the Biostatistics Unit we have recently proposed a novel approach to this problem called Markov melding [1], building on ideas from the graphical models literature. This promises to enable fully Bayesian inference in settings where this was not previously possible, and to allow splitting the computation required for large models into smaller pieces (which may be computationally advantageous). However, it remains an open problem how best to join together these pieces into inference for the joint model.

This PhD project would particularly suit a student interested in computational and methodological statistics, since there is considerable scope for new methodology and algorithms in this area. The PhD will involve working towards developing, implementing and assessing promising approaches. There is the potential to draw upon and extend ideas in the connected literatures that are developing in this area, including divide-and-conquer/parallel computation methods for "big data" (such as large n "tall data"); newly-developed approximate methods for estimating the ratio of two densities; pseudo-marginal MCMC; and connections to sequential Monte Carlo. There is also scope to study the application of these methods in substantive application areas, including network meta-analysis.

[1] https://arxiv.org/abs/1607.06779

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

Matching approaches for efficient and robust causal inferences in Mendelian randomization

Supervisor - Stephen Burgess (joining the Unit in January 2017)

Mendelian randomization is the use of genetic variants as proxy measurements for a modifiable exposure to judge whether interventions on the exposure are likely to reduce disease risk. Mendelian randomization has a history of correctly predicting the results of randomized trials of pharmacological interventions, and has wide applicability in a broad range of scientific fields for making the crucial distinction between causation and correlation.
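To make the idea of a genetic variant acting as a proxy concrete, the following sketch simulates confounded data and compares a naive regression of outcome on exposure with the ratio (Wald) estimator, a standard Mendelian randomization estimate that divides the variant-outcome association by the variant-exposure association. The ratio estimator is not described in the text above and is used here only as a familiar example; the simulated effect sizes, allele frequency and variable names are all assumptions.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x, i.e. cov(x, y) / var(x)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


rng = random.Random(1)
n = 20000
true_effect = 0.3  # hypothetical causal effect of exposure on outcome

# Genetic variant: 0, 1 or 2 copies of an allele with frequency 0.3.
g = [(rng.random() < 0.3) + (rng.random() < 0.3) for _ in range(n)]
# Unmeasured confounder affecting both exposure and outcome.
u = [rng.gauss(0, 1) for _ in range(n)]
# Exposure depends on the variant and the confounder.
x = [1.0 * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
# Outcome depends on the exposure and the confounder, not directly on the variant.
y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]

naive_estimate = slope(x, y)                # biased upwards by the confounder
ratio_estimate = slope(g, y) / slope(g, x)  # Wald ratio using the variant

print(naive_estimate, ratio_estimate)
```

In this simulation the variant is independent of the confounder and affects the outcome only through the exposure, which is why the ratio estimate lands near the true effect while the naive slope does not.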

The aim of this project is to investigate the use of traditional approaches for causal inference in Mendelian randomization, in particular: i) matching on covariates, and ii) matching by design. The use of covariates for matching should lead to more efficient and potentially more robust estimators, as covariates are similar within the matched pairs. Such approaches have not previously been adequately considered in the context of Mendelian randomization, but should be feasible in the UK Biobank dataset. Another reason for using matching in this context is to divide the UK Biobank dataset into a discovery cohort (comprising non-diseased individuals) and a validation dataset (comprising diseased individuals and matched controls). This enables the dataset used to choose which genetic variants to include in the analysis to be separate from the dataset in which the causal hypothesis is tested.

Alternatively, paired designs, such as the comparison of sibling pairs, may be worthwhile in Mendelian randomization, particularly for rare genetic variants that naturally cluster within families, and hence are unlikely to satisfy the Mendelian randomization assumptions. While population-based studies have overtaken family-based studies in the current GWAS era, recall-by-genotype experiments (where additional carriers of rare variants are found amongst the relatives of those who carry the variants) will lead to matched analyses becoming increasingly important. Methods will be developed using data from the Swedish Twin Registry.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

Hybrid probabilistic model integration for -omics data

Supervisor - Lorenz Wernisch (BSU)

A goal of biomedical research is to understand cellular processes underlying regular cell development and factors that can disturb these processes leading to disease. Increasingly comprehensive experimental data sets are available that aim at providing a multi-dimensional view on such processes from different angles such as genetics, genomics, epigenomics, transcriptomics, or metabolomics (for example the Blueprint project, http://www.blueprint-epigenome.eu).

Traditionally, analyses of such multi-dimensional data are based on a series of individual analyses for each data level: genetic association studies to identify genetic variants, which are then fed into an analysis of the genomic and epigenomic structure, which in turn is fed into further downstream analysis of gene regulation and protein activities. However, information is potentially lost at each stage of such multi-step analysis since there is often little opportunity for feedback from later stages of the analysis to earlier ones. A probabilistic model comprising all different stages at once, which would allow information to flow freely between components, would therefore be desirable.

Traditional Bayesian approaches to a comprehensive model, which are based on the (hierarchical) combination of standard distributions, struggle with the size, complexity and heterogeneity of the data. A potential solution is to combine a traditional modelling approach with modelling ideas from Bayesian nonparametrics or machine learning. For example, some components of the model might be best modelled by nonparametric density estimation obtained via kernel methods or deep neural networks, while other components might be understood well enough to be modelled by traditional probabilistic methods using standard distributions and modelling techniques. Inference for such hybrid models poses an extra challenge since traditional inference methods, such as Monte Carlo simulation, need to be combined with training methods from machine learning.

This is a multi-disciplinary project which requires a deep interest in Bayesian as well as machine learning methods and the willingness to understand the biological questions and structure of the experimental data driving the modelling. Some familiarity with Bayesian modelling and statistical computing is required.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

 

Experimental design for inference of gene networks from single cell data

Supervisors - John Reid (BSU) and Steven Hill (BSU)

Gene regulatory networks control almost all cellular functions. The ability to accurately reconstruct these networks would greatly further our understanding of many diseases, genetic conditions and developmental biology. However only limited progress has been made reverse engineering these networks using the data available from modern high-throughput biological experiments.

The space of all possible undirected network structures grows super-exponentially in the number of genes, and thus network inference is underdetermined for networks of any reasonable size. However, the network inference problem is well-posed in abstraction, and this makes it an attractive problem to study. This low barrier to entry, together with its biological importance, means that network inference has been extensively studied over the last two decades. Many inference methods have been developed that work with various types of experimental designs [1].
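The size of this search space is easy to compute. Assuming networks are simple undirected graphs on labelled genes (no self-loops, each edge either present or absent), n genes give n(n-1)/2 possible edges and therefore 2^(n(n-1)/2) possible structures. The sketch below, with a hypothetical function name, prints this count for a few small n.

```python
def count_undirected_networks(n_genes):
    """Number of simple undirected graphs on n_genes labelled nodes.

    Each of the n(n-1)/2 possible edges is either present or absent.
    """
    n_possible_edges = n_genes * (n_genes - 1) // 2
    return 2 ** n_possible_edges


for n in (3, 5, 10, 20):
    print(n, count_undirected_networks(n))
```

Already at 20 genes there are 2^190 candidate structures, far more than any realistic number of experimental samples could distinguish.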

Perturbation experiments measure a system’s characteristics in conditions other than its natural state. For example in gene knockdown experiments, one or more genes are artificially silenced. Data from perturbation experiments are some of the most informative for network inference as the effect of a small change to the network can be accurately assessed. However they are expensive and time-consuming to perform and typically biologists can only perform a handful of perturbations. Usually the perturbed genes are chosen by the experimenter in an ad hoc fashion. This project will develop methods for experimental design (that is how to choose which gene(s) to perturb) in order to maximise the value of information from each experiment. Some work exists on experimental design in this context [2–7] but in general this field has not been studied nearly as extensively as the network inference problem.

Recently techniques have been developed to assay gene expression levels in individual cells. Previously genome-wide expression levels could only be measured as averages across populations of thousands of cells. The newly available single cell data allow us to inspect the correlations and relationships between genes in fine detail. In particular the between-cell variation in a population of cells can be characterised. This project will focus on experimental design for single cell experiments.

Most network inference techniques provide point estimates of the network structure. This is a reasonable strategy given the difficulty of exploring the entire space of networks. However to reliably gauge the likely amount of information gained from any particular experimental perturbation, methods to estimate correlations and uncertainty in the posterior will need to be developed.

Given these correlations and uncertainties, methods to choose which genes to perturb will be explored. It is anticipated that the methods developed will be Bayesian methods, as they naturally quantify uncertainty.

This project will start by focusing on the theoretical aspects of experimental design for single cell data. However it is expected that the methods developed will be of interest to many potential collaborators and the methods will be used to choose experimental perturbations of the biological systems that the collaborators study.

References

[1] Riet De Smet and Kathleen Marchal. “Advantages and Limitations of Current Network Inference Methods”. In: Nature Reviews Microbiology 8.10 (Oct. 2010), pp. 717–729.
[2] C. David Page Jr and Irene M. Ong. “Experimental Design of Time Series Data for Learning from Dynamic Bayesian Networks.” In: Pacific Symposium on Biocomputing. Vol. 11. 2006, pp. 267–278.
[3] Jesper Tegnér and Johan Björkegren. “Perturbations to Uncover Gene Networks”. In: Trends in Genetics 23.1 (Jan. 1, 2007), pp. 34–41.
[4] Florian Steinke, Matthias Seeger, and Koji Tsuda. “Experimental Design for Efficient Identification of Gene Regulatory Networks Using Sparse Bayesian Models”. In: BMC Systems Biology 1.1 (2007), p. 51.
[5] Johannes D. Stigter and Jaap Molenaar. “Network Inference via Adaptive Optimal Design”. In: BMC Research Notes 5 (2012), p. 518.
[6] Alberto Giovanni Busetto et al. “Near-Optimal Experimental Design for Model Selection in Systems Biology”. In: Bioinformatics 29.20 (Oct. 15, 2013), pp. 2625–2632.
[7] S. M. Minhaz Ud-Dean and Rudiyanto Gunawan. “Optimal Design of Gene Knockout Experiments for Gene Regulatory Network Inference”. In: Bioinformatics 32.6 (Mar. 15, 2016), pp. 875–883.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.

 

Model-free network inference from single cell data

Supervisor - John Reid (BSU)

Gene regulatory networks control almost all cellular functions. The ability to accurately reconstruct these networks would greatly further our understanding of many diseases, genetic conditions and developmental biology. However, only limited progress has been made reverse engineering these networks using data available from modern high-throughput biological experiments [1].

Recently techniques have been developed to assay gene expression levels in individual cells. Previously genome-wide expression levels could only be measured as averages across populations of thousands of cells. The newly available single cell data allow us to inspect the correlations and relationships between genes in fine detail. In particular the between-cell variation in a population of cells can be characterised. It is anticipated that single cell data will greatly aid the reconstruction of gene regulatory networks. To date only a few inference methods have been developed specifically for single cell data [2, 3].

Classical network inference is posed as a network edge prediction task given a gene-by-sample data matrix of gene expression levels. In this formulation, when the true network is known, the predictions can be validated using precision and recall or other similar statistics [4]. This project will take an alternative approach and focus on model-free methods for modelling such data. By model-free, we mean methods that do not explicitly represent the structure and parameters of the network.
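As a rough illustration of how edge predictions are scored against a known network, the sketch below computes precision and recall for a set of predicted edges. The function name, the gene labels and the toy edge sets are hypothetical and are not taken from any benchmark.

```python
def edge_precision_recall(predicted, true):
    """Precision and recall of predicted directed edges against a known network."""
    predicted, true = set(predicted), set(true)
    hits = len(predicted & true)  # edges that are both predicted and real
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(true) if true else 0.0
    return precision, recall


# Hypothetical toy network: each edge is a (regulator, target) pair.
true_edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")}
predicted_edges = {("A", "B"), ("B", "C"), ("D", "A")}

precision, recall = edge_precision_recall(predicted_edges, true_edges)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of predicted edges that are real, and recall is the fraction of real edges that were predicted. Both require the true network, which is rarely available outside simulated or well-studied benchmark systems.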

Model-free approaches are the state of the art for modelling certain physical systems [5]. They are able to accurately learn the dynamics of complicated systems with no prior knowledge of the physical relationships between the variables [6]. This project will investigate how to translate their success in learning the dynamics of physical systems to the problem of learning the dynamics of gene expression. One model-free approach for regulatory network inference could be to learn the dynamics of the system using a deep neural network [7] or a Bayesian nonparametric model such as a Gaussian process dynamical model [8]. In this approach, single cell data from a time series experiment would be placed along a pseudotime dimension [9]. The dynamics of gene expression relative to this pseudotime would then be learnt by the model.
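To give a feel for what learning expression as a function of pseudotime can look like without a mechanistic network model, here is a minimal sketch. It uses Nadaraya-Watson kernel regression, a much simpler nonparametric estimator than the deep networks or Gaussian process dynamical models mentioned above, and it is only a stand-in for them. The function name, bandwidth and toy data are assumptions made for illustration.

```python
import math


def kernel_smooth(pseudotimes, expression, query, bandwidth=0.1):
    """Nadaraya-Watson estimate of expression at a query pseudotime.

    Each cell contributes with a Gaussian weight that decays with its
    pseudotime distance from the query point.
    """
    weights = [math.exp(-0.5 * ((t - query) / bandwidth) ** 2) for t in pseudotimes]
    return sum(w * y for w, y in zip(weights, expression)) / sum(weights)


# Hypothetical toy data: 21 cells ordered along pseudotime in [0, 1],
# with one gene whose expression rises and then falls.
pseudotimes = [i / 20 for i in range(21)]
expression = [math.sin(math.pi * t) for t in pseudotimes]

estimate = kernel_smooth(pseudotimes, expression, query=0.5)
print(f"smoothed expression at pseudotime 0.5: {estimate:.3f}")
```

The estimator never represents which gene regulates which. It only describes how expression changes along pseudotime, which is the sense in which such approaches are model-free.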

Perturbation experiments measure a system’s characteristics in conditions other than its natural state. For example, in gene knockout experiments one or more genes are artificially silenced. Data from perturbation experiments are some of the most informative for network inference, as the effect of a small change to the network can be accurately assessed. We will be interested in developing model-free methods that can recapitulate the behaviour of a system under perturbations. Only in this case will we be able to interrogate the model and confidently infer which regulatory relationships are present.
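The idea of interrogating a model under perturbation can be sketched as an in-silico knockout: run a black-box one-step dynamics model with and without a gene clamped to zero and compare the resulting steady states. Everything in the sketch is hypothetical. In particular, `toy_step` is a hand-written three-gene system standing in for a learnt model, and it is not a method proposed in this project.

```python
def simulate(step, state, knocked_out=(), n_steps=200):
    """Iterate a black-box one-step model, clamping knocked-out genes to zero."""
    state = dict(state)
    for _ in range(n_steps):
        for gene in knocked_out:
            state[gene] = 0.0  # silence the gene before each update
        state = step(state)
    for gene in knocked_out:
        state[gene] = 0.0  # report the silenced gene as zero
    return state


def toy_step(state):
    """Hypothetical stand-in for a learnt model: A activates B, B represses C."""
    return {
        "A": 0.5 * state["A"] + 0.5,
        "B": 0.5 * state["B"] + 0.5 * state["A"],
        "C": 0.5 * state["C"] + 0.5 / (1.0 + state["B"]),
    }


start = {"A": 0.0, "B": 0.0, "C": 0.0}
wild_type = simulate(toy_step, start)
knockout = simulate(toy_step, start, knocked_out=("A",))

# Change in steady-state expression caused by silencing gene A.
changes = {gene: knockout[gene] - wild_type[gene] for gene in start}
print(changes)
```

In this toy system, silencing A lowers B and raises C, even though A acts on C only through B. A single knockout therefore reveals downstream effects but not whether they are direct, which is one reason several different perturbations are usually needed.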

It is anticipated that this project will be largely theoretical, but the goal will be to develop methods that further our understanding of the processes that generate single cell gene expression data. We have many collaborators with such data and we wish to develop practical methods that can help them understand the underlying biology.

References

[1] Riet De Smet and Kathleen Marchal. “Advantages and Limitations of Current Network Inference Methods”. In: Nature Reviews Microbiology 8.10 (Oct. 2010), pp. 717–729.
[2] Andrea Ocone et al. “Reconstructing Gene Regulatory Dynamics from High-Dimensional Single-Cell Snapshot Data”. In: Bioinformatics 31.12 (June 15, 2015), pp. i89–i96.
[3] Pablo Cordero and Joshua M. Stuart. “Tracing Co-Regulatory Network Dynamics in Noisy, Single-Cell Transcriptome Trajectories”. In: bioRxiv (Oct. 4, 2016), p. 070151.
[4] D. Marbach et al. “Revealing Strengths and Weaknesses of Methods for Gene Network Inference”. In: Proceedings of the National Academy of Sciences 107.14 (Mar. 22, 2010), pp. 6286–6291.
[5] Marc Peter Deisenroth, Dieter Fox, and Carl Edward Rasmussen. “Gaussian Processes for Data-Efficient Learning in Robotics and Control”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (Feb. 2015), pp. 408–423.
[6] PilcoLearner. PilcoLearner. url: https://www.youtube.com/user/PilcoLearner.
[7] Y. Bengio. “Learning Deep Architectures for AI”. In: Foundations and Trends® in Machine Learning 2.1 (2009), pp. 1–127.
[8] Raquel Urtasun, David J. Fleet, and Pascal Fua. “3D People Tracking with Gaussian Process Dynamical Models”. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). Vol. 1. IEEE, 2006, pp. 238–245.
[9] John E. Reid and Lorenz Wernisch. “Pseudotime Estimation: Deconfounding Single Cell Time Series”. In: Bioinformatics (June 17, 2016), btw372.

This studentship is for full-time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only - see application procedures for eligibility criteria and to apply.