The BSU is an internationally recognised research unit specialising in statistical modelling with application to medical, biological or public health sciences. Details of the work carried out in the Unit appear on our Research page. The Unit members below are interested in taking on students in Michaelmas (October) 2024 for their projects below.
You can find a list of projects that may offer extra funding opportunities at the links below:
Applicants are also encouraged to contact potential supervisors to discuss mutual areas of interest for PhD Research. Contact details and details of potential areas for PhD research are given on each individual page. Please see our how to apply page to see what you should include in your initial enquiry.
In order to be considered for a BSU Studentship, formal applications need to be submitted to the University of Cambridge by 23:59 on January 4th 2024.
I am open to developing a project proposal around any of my research interests in Bayesian evidence synthesis for infectious disease burden estimation (see here), and will be involved with the project proposed by Christopher Jackson (see below), but am also proposing two specific projects:
Optimising design of infectious disease surveillance
In the field of infectious diseases, methods for optimal design of experiments have a long history, for example, in cost-effectiveness studies assessing interventions such as vaccines and treatments. However, such methods have not yet been widely applied to disease surveillance systems, rather than interventions. Where they have been applied, usually a single surveillance objective at a time is considered, such as optimising the precision of incidence estimates; or determining the value of extending existing locations for surveillance for detecting an outbreak. However, there are usually multiple, potentially competing, objectives for surveillance (e.g. outbreak detection, incidence and severity estimation, vaccine effectiveness estimation), that might have different priorities at different stages of an epidemic, and that must be traded off against multiple cost and logistical constraints. Drawing on ideas from a recently proposed framework for optimising infectious disease surveillance under these multiple objectives and constraints (Cheng et al, 2020) and from adaptive design of clinical trials (Villar et al, 2015; Robertson et al, 2023), this PhD will explore and develop methods for dynamically optimising multiple surveillance objectives at different stages of an outbreak. This work is motivated by collaboration with UKHSA and WHO on integrated respiratory disease surveillance and monitoring progress to the target of elimination of HIV as a public health problem in the UK. As well as collaborating with these organisations, the student will interact with key colleagues in the Efficient Study Design (Sofia Villar) and Population Health (Daniela De Angelis, Christopher Jackson) themes at MRC BSU.
Efficient conflict assessment in Bayesian evidence synthesis models to estimate infectious disease burden
Estimating latent characteristics of infectious disease burden, such as incidence, prevalence and severity, requires the integration of multiple, disparate, data sources, in a single, often Bayesian, joint model. Evidence synthesis can result in greater precision than inferences from single datasets, if all included data sources provide consistent evidence on the parameters of interest. Often, however, unaccounted biases in included observational data can lead to conflicting evidence. Cross-validatory posterior-predictive checks to detect and quantify such conflict have been proposed (Presanis et al, 2013; 2017), but can be computationally challenging for complex models. This project will explore use of approximate methods such as INLA (Ferkingstad et al, 2017) and Gaussian mixture approximations (Chakraborty et al, 2022), as well as alternative conflict measures (Nott et al, 2020) and Reverse Bayes ideas (Held et al, 2022), to develop a framework for computationally efficient and systematic conflict quantification. The student will implement the methods in easy to use software for dissemination, and will apply the framework to examples including estimating the prevalence of undiagnosed HIV (Presanis et al, 2021); and estimating the severity of respiratory infection (Presanis et al, 2014).
Chris Jackson & Anne Presanis
Building flexible and practicable survival and multistate models
Survival and multistate models describe times to events of different kinds, for example, times to death, hospital admission or recovery after an infection. Traditionally they rely on restrictive assumptions, such as the shape of the time-to-event distribution, proportional hazards, and additivity or linearity of predictors. Modern statistical and machine learning methods allow models to be built which adapt flexibly to data. However, such methods are less well developed in survival and multistate analysis, due to the challenges posed by data with limited follow-up or intermittent observation.
This project will develop flexible survival and multistate models, with a particular focus on
- Bayesian methods, which are more able to express uncertainty and combine different kinds of information
- development of easily-usable software, e.g. through building on the widely-used R packages flexsurv and msm.
The student will join a research group which made substantial contributions to the COVID-19 pandemic response, in collaboration with the UK Health Security Agency (UKHSA), through work on epidemic modelling and severity estimation, much of which involved survival and multistate modelling. For this project, they will have access to a range of data sources on respiratory infection (e.g. influenza, SARS-CoV-2) through the UKHSA collaboration, as well as to the ONS Coronavirus Infection Survey. These datasets will be used to estimate incidence, prevalence and consequences of infection, which will motivate the development of the methods.
David Robertson, Sofia Villar & Ayon Mukherjee
Master protocols for the efficient development of oncology drugs (with industry placement)
David Robertson, Sofía Villar (MRC Biostatistics Unit, University of Cambridge) and Ayon Mukherjee (IQVIA)
Master protocols expedite drug development by testing multiple drugs and/or multiple cancer subpopulations in parallel under a single protocol, without the need to develop new protocols for each of the parallel sub-studies. Recently, there has been an upsurge in the development and use of such trial designs in oncology, with interest from industry, academia, and regulatory bodies. By using a single trial infrastructure and overarching trial protocol, master protocols improve efficiency, establish uniformity and lead to accelerated drug development. However, due to the complexity of master protocols, it is vital that these trials are well designed and appropriately analysed. There is much scope from a methodological perspective to continue to improve and develop master protocols for use in a wide range of oncology settings.
This PhD project is part of a joint academic-industry collaboration between the MRC Biostatistics Unit (BSU) and IQVIA, and offers an exciting opportunity to work on a variety of methodological challenges for master protocols using real-world data from ongoing oncology trials run by IQVIA. As part of the PhD, there will be a 6-month internship planned with IQVIA where students would be involved in handling clinical trial projects. This would give the student hands-on experience of clinical trial processes and the implementation of novel methodology as a trial biostatistician, including the areas of application for the methodology developed in the PhD. A key part of the project will also be the development of packages in R as a part of a trial designer software in collaboration with IQVIA, to help enable the proposed methodology to be more widely used in practice.
Depending on the interests of the student, specific areas of focus for the PhD include the following:
- Adaptive Dose-Escalation and Expansion Designs for Basket trials
A basket trial is a type of master protocol that evaluates a single targeted treatment for patients sharing a single biomarker or genomic feature, but across multiple diseases or disease subtypes (e.g. tumours in different organs of the body). A key advantage of basket trials is that they allow the evaluation of targeted therapies for genetic mutations that would be too rare to study within a tumour-specific context. As highlighted in recent FDA guidance, individual drug sub-studies under a master protocol might incorporate an initial dose-finding phase. This is particularly the case when evaluating an investigational drug combination, with the dose-finding stage identifying safe doses of the combination before proceeding with an activity-estimating stage. One potential area of focus of the PhD would be to develop a design and analysis framework for basket trials with an initial dose-finding stage, which takes into account the latest methodological developments in early-phase dose-finding studies. Dose-escalation and expansion studies are frequently used in clinical trials, and regulatory agencies have published guidance on both of these stages of exploratory clinical trials. The focus of the research would be to develop adaptive methods for dose-finding basket studies that estimate the true target dose, which can then be taken into expansion to find a signal for a particular indication.
- Treatment selection in platform trials
The term platform trial is used to describe a master protocol that allows the flexibility of adding new treatment arms to the trial over time. Treatment arms enter the trial as they become available, are evaluated, and then are ‘dropped’ for futility or ‘graduated’ from the trial once demonstrated as being efficacious. One potential area of focus of the PhD would be to develop novel platform designs that use covariate-adjusted response-adaptive randomisation, where the probability of patients being randomised to the different treatment arms varies over time based on the accumulated response data as well as their individual covariate information. This would then be combined with Bayesian decision making to compare experimental treatments to the control arm and select the best subset of treatment arms for a given indication. The focus of this research would be aligned to regulatory guidance on adaptive design for clinical trials and master protocols so that the methodology developed can be implemented in real clinical trials.
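To make the idea of response-adaptive randomisation concrete, the following is a minimal sketch only, not the methodology proposed for the PhD. It assumes binary responses, independent uniform Beta(1, 1) priors on each arm's response rate, and allocation probabilities set equal to the Monte Carlo estimate of the posterior probability that each arm is best. It ignores patient covariates entirely, so it is not covariate-adjusted. The function name `allocation_probabilities` and the interim counts are hypothetical.

```python
import random


def allocation_probabilities(successes, failures, n_draws=5000, seed=0):
    """Estimate P(arm has the highest response rate) for each arm.

    Assumes independent Beta(1 + successes, 1 + failures) posteriors,
    i.e. a uniform prior on each arm's response rate (an assumption).
    """
    rng = random.Random(seed)
    n_arms = len(successes)
    wins = [0] * n_arms
    for _ in range(n_draws):
        # Draw one response rate per arm from its posterior.
        draws = [rng.betavariate(1 + s, 1 + f)
                 for s, f in zip(successes, failures)]
        # Credit the arm with the largest sampled response rate.
        wins[draws.index(max(draws))] += 1
    return [w / n_draws for w in wins]


# Hypothetical interim data: control plus two experimental arms.
probs = allocation_probabilities(successes=[4, 9, 6], failures=[16, 11, 14])
print(probs)
```

In this sketch the next patient would be randomised to each arm with the printed probabilities, so arms with better accumulated responses receive more patients. Real designs typically temper these probabilities and protect allocation to the control arm.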
Details of supervisors
David Robertson is a Senior Research Associate at the MRC BSU, where he has been based since 2013. His research focuses on the development of novel methodology for the design and analysis of adaptive clinical trials. David held a Biometrika Trust Research Fellowship from 2018 – 2021. His main areas of research focus include 1) estimation after adaptive designs, 2) response-adaptive randomisation in clinical trials and 3) multiple hypothesis testing.
A testimonial from a previous internship student supervised by David Robertson can be found here.
Sofía Villar is a Programme Leader at the MRC BSU, and her research aims to improve clinical trial design through the development of innovative methods that lie in the intersection between optimisation, machine learning and statistics. Sofia’s main areas of research focus include 1) developing computationally feasible innovative trial designs, 2) improving analysis methods of optimal, patient-centric adaptive trials, 3) designing innovative trial designs in response to emerging challenges and 4) promoting uptake and application of these novel designs in practice.
Ayon Mukherjee [see attached document with full profile] is Director and Head of the Novel Trial Design Methodology group at IQVIA, which focuses on the implementation and development of complex innovative clinical trial designs in real-world clinical studies. As the Head of the Novel Trial Design team, he provides strategic input towards the development of clinical trial designs, specifically adaptive designs and master protocols. He has over 12 years of experience working with various pharmaceutical companies, including GlaxoSmithKline and Novartis.
Dominique Couturier & Thomas Jaki
Adaptive Designs for the evaluation of updates to algorithms and devices in digital healthcare
(Dr Dominique-Laurent Couturier & Professor Thomas Jaki)
Background to the project: Digital healthcare is a very dynamic field aiming to enhance patient health, from diagnosis to treatment, through the collection and analysis of increasingly large digital health records. Examples include the use of deep learning (DL) algorithms to improve disease detection and diagnosis based on medical images such as MRIs and CT scans [1,2], the use of machine learning (ML) algorithms to inform patient treatment based on clinical and genomic profiles, and the use of wearable devices to monitor patient health [4,5]. These algorithms and devices are regularly updated to follow the latest developments in DL/ML theory and in medical knowledge (with, for example, the availability of new or different health data), as well as the latest technological improvements, possibly leading to more affordable and improved health devices.
What the studentship will encompass: While guidance on how to use adaptive designs to evaluate medical devices exists, no such recommendations are available for updates to such algorithms and devices. This project aims to fill this gap by developing novel, efficient adaptive designs for the evaluation of updates to algorithms and devices in digital healthcare when a re-evaluation is deemed necessary. The project will focus on two scenarios:
- Update of healthcare devices:
Wearable health devices now typically provide a multitude of longitudinal outcomes, such as heart and respiratory rates, physical activity, sleep patterns, body temperature and glucose levels. A device producer may be interested in verifying that such measures are the same following an update related to the use of an improved technology and/or a change in production. Existing methods in (adaptive) equivalence testing are typically design-dependent and consider a small number of outcomes [9,10]. A first aim of this project is therefore to develop more efficient and general adaptive methods for equivalence testing in the context of a large number of dependent outcomes.
- Update of healthcare algorithms:
Improved algorithms are typically expected to be developed regularly and to do as well as or better than their previous versions. A second aim of this project is therefore to develop adaptive methods for non-inferiority testing that can rule out that such changes have led to a decrease in quality over time, for example a decrease in the specificity and sensitivity of diagnostic classifier algorithms. Particular emphasis will therefore be placed on methods that preserve power when hypotheses are tested sequentially, as data become available, without knowledge of future algorithm improvements.
Detail of supervision: The project will be undertaken under the supervision of Dr Dominique-Laurent Couturier and Professor Thomas Jaki at the MRC Biostatistics Unit (BSU), University of Cambridge.
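As a point of reference for the algorithm-update scenario, the sketch below shows a conventional fixed-sample (non-adaptive) non-inferiority test for a single accuracy measure such as sensitivity. It is an illustration under stated assumptions, not the method to be developed: it assumes two independent samples of diseased cases, a Wald normal approximation for the difference in proportions, and a pre-specified margin. The function name, the margin of 0.05 and the counts are all hypothetical. It does not handle sequential testing, which is the focus of the project.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_z_test(x_new, n_new, x_old, n_old, margin, alpha=0.025):
    """One-sided Wald test of H0: p_new - p_old <= -margin.

    Returns (z statistic, one-sided p-value, reject H0?).
    Assumes independent samples and estimated proportions strictly
    between 0 and 1, so that the standard error is non-zero.
    """
    p_new = x_new / n_new
    p_old = x_old / n_old
    se = sqrt(p_new * (1 - p_new) / n_new + p_old * (1 - p_old) / n_old)
    z = (p_new - p_old + margin) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical data: correctly detected cases out of 200 for each version.
updated_ok = noninferiority_z_test(184, 200, 178, 200, margin=0.05)
updated_worse = noninferiority_z_test(170, 200, 178, 200, margin=0.05)
print(updated_ok, updated_worse)
```

Rejecting the null hypothesis supports the claim that the updated algorithm's sensitivity is no more than the margin below that of the previous version. Applying such a test repeatedly after every update, without adjustment, would inflate the error rate, which is one motivation for adaptive methods.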
Collaboration: This project will be conducted in close collaboration with an industry partner, Pfizer US, with regular interactions to enhance the project’s inputs and outcomes.
Dominique Couturier & Sofia Villar
Adaptive methods for a complex endpoint: number of days alive at home
(Dr Dominique-Laurent Couturier & Dr Sofia Villar)
Background to the project: ‘Days alive and at home’ (DAH), defined as the total days spent at home during a follow-up period of interest, is a recent outcome measure focused on the patient post-surgery experience [1,2,3,4]. DAH exhibits a complex distribution: it is zero-inflated, with two sources of zeroes (patients who die and censored patients who are never discharged), left-skewed, and bi-modal due to potential patient re-admission(s). Its distribution may further be influenced by hospital-specific post-surgical protocols. Different modelling strategies (such as quantile, log-normal, Poisson, negative-binomial and beta regressions, sometimes with zero-inflated variants) and inference methods (such as Mann-Whitney-Wilcoxon, Wald, Student and Kruskal-Wallis tests) have therefore been used to analyse this outcome.
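The definition above can be illustrated with a small sketch that computes DAH for one patient. It is an illustration under assumptions rather than a definition taken from the cited papers: day 0 is the day of surgery, hospital stays are non-overlapping half-open intervals of whole days, a stay with no discharge day means the patient was never discharged, and death within the follow-up period gives a DAH of zero, in line with the two sources of zeroes described above. The function name and the example stays are hypothetical.

```python
def days_alive_at_home(follow_up_days, hospital_stays, death_day=None):
    """Days spent at home during the follow-up period for one patient.

    hospital_stays: list of (admit_day, discharge_day) pairs, each covering
    days admit_day .. discharge_day - 1; discharge_day=None means the
    patient was never discharged. Stays are assumed not to overlap.
    """
    # Assumption: death within follow-up sets DAH to zero.
    if death_day is not None and death_day <= follow_up_days:
        return 0
    in_hospital = 0
    for admit, discharge in hospital_stays:
        # Truncate each stay at the end of the follow-up window.
        start = min(admit, follow_up_days)
        end = follow_up_days if discharge is None else min(discharge, follow_up_days)
        in_hospital += max(0, end - start)
    return max(0, follow_up_days - in_hospital)


# Hypothetical 90-day follow-up examples.
print(days_alive_at_home(90, [(0, 7)]))              # initial stay only
print(days_alive_at_home(90, [(0, 7), (30, 35)]))    # one re-admission
print(days_alive_at_home(90, [(0, None)]))           # never discharged
print(days_alive_at_home(90, [(0, 7)], death_day=40))
```

The truncation at the end of follow-up is what makes the outcome bounded, and re-admissions shift patients away from the main mode, which is consistent with the bi-modal shape described in the text.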
What the studentship will encompass: In order to address the design and analysis challenges presented by this outcome in the context of clinical trials, the aim of this project is three-fold:
- A/ Choice of Estimand: Given the complex distribution of DAH, several quantities could be used to measure a treatment effect of interest, such as an overall difference in means or medians between groups, or a treatment-related regression coefficient. The first aim of this project will therefore be to rigorously define and select the most suitable estimand for this outcome when used in clinical trials.
- B/ Improved modelling: The second aim of this project will be to construct a parametric model for DAH that is suitable given its characteristics as a composite endpoint and to develop an unbiased estimator for its parameter vector. Such a model will combine different parts: a first one related to the probability of death, a second one related to the length of the post-surgery stay in hospital and a third one related to potential re-admission times. The latter two require adjustment to model clumping at specific values, related to the probabilities of following protocol times during the initial stay and of being re-admitted once discharged. A challenge will come from the bounded nature of DAH, defined over a time interval of interest (a three-month period, for example), leading to right-censoring of the initial and re-admission times. Therefore, an estimator accounting for zero-adjusted and censored data will be needed.
- C/ Adaptive method: Efficient study designs allow adaptations of the trial at pre-defined interim time points. Such adaptations include sample size re-estimation and the possibility of early stopping due to either futility or efficacy. A final aim of the project is to define how to perform a sample size re-estimation for the estimand of interest [aim A] when the participant group allocation is unknown (blinded analysis), which leads to biased nuisance parameter estimates when fitting the model developed in [aim B], and how to design futility or efficacy stopping rules in unblinded analyses for DAH.
Detail of supervision: The project will be undertaken under the supervision of Dr Dominique-Laurent Couturier and Programme Leader Sofia Villar at the MRC Biostatistics Unit (BSU), University of Cambridge, where the student will be based.
Collaborations: This project will be conducted in close collaboration with the Papworth Trial Unit at Cambridge (project partner), which will make data from the on-going NOTACS (Nasal High-Flow Oxygen Therapy After Cardiac Surgery) clinical trial, which has DAH as its primary outcome, available to inform the different project aims. You will also be linked to a wider external group including Dr Vadiveloo (University of Aberdeen) and other trial statisticians working on surgical trials. This will provide a collaborative and stimulating environment in which to start to develop into an independent researcher.
 Wasywich, C.A et al (2010), https://doi.org/10.1093/eurjhf/hfq027
 Ariti, C.A. et al (2011), https://doi.org/10.1016/j.ahj.2011.08.003
 Myles, P.S. et al (2017), https://doi.org/10.1136/bmjopen-2017-015828
 Fanaroff, A.C. et al (2018), https://doi.org/10.1161/circoutcomes.118.004755
 Chung, M. et al (2023), https://doi.org/10.1016/j.ahj.2022.10.080
 Myles, P.S. et al (2018), https://doi.org/10.1016/j.ahj.2018.06.008
 Suikkanen, S.A. et al (2021), https://doi.org/10.1016/j.jamda.2020.06.005
 Van Houtven, C.H. et al (2019), https://doi.org/10.1007/s11606-019-05209-x
 Wu, A. et al (2022), https://doi.org/10.1111/anae.15742
 ICH E9 (2019), https://database.ich.org/sites/default/files/E9-R1_Step4_Guideline_2019_1203.pdf
Jessica Barrett is a programme leader at the Biostatistics Unit. Her research focuses on prediction of health outcomes when a disease is dynamically evolving over time, particularly using routinely collected health data. Because we do not live in a static world, it is important to robustly account for the complex and dynamic nature of disease processes in order to accurately and precisely predict disease risk. Routinely collected data such as primary care or hospital data can be a valuable resource for prediction modelling, but also present methodological challenges such as dealing with large-scale data and handling informative presence (e.g. when unhealthier individuals are more likely to be recorded). More generally, her research also encompasses multi-outcome modelling for multiple correlated outcomes using, for example, shared parameter models. Previous PhD projects have included dynamic prediction of cardiovascular risk using primary care records from New Zealand, and exploring what patterns of serial observations in routinely collected data can tell us about disease risk (co-supervised with Brian Tom).
The below project proposal is co-supervised with Dr Eleanor Winpenny from the MRC Epidemiology Unit.
Modelling lifecourse trajectories of health-related behaviours
It is well known that lifestyle factors can impact health, and that healthy and unhealthy behaviours tend to cluster together within individuals, but more research is needed to understand the patterns and interplay of health-related behaviours such as diet quality, sleep, screen use and activity through the lifecourse. This project will use data from the Western Australian Pregnancy Cohort (Raine) Study, which has followed around 2,000 Australian participants through childhood, adolescence and adulthood, collecting information about lifestyle behaviours and health-related outcomes.
Questions which could be addressed in this PhD project include:
(1) What is the relationship between diet quality and sleep through the lifecourse? This would extend Dr Winpenny’s research on the short-term impact of sleep on diet.
(2) Are there clusters of people with similar trajectories of diet quality, sleep, screen use and activity?
(3) Are there time-points during the lifecourse when changes in health-related behaviours tend to occur? Do these changes differ between different groups of individuals? What can these tell us about the influence of the environment on health behaviours and health outcomes?
These questions will be addressed by applying and extending statistical methods for modelling longitudinal trajectories, such as multivariate mixed effects models, latent class mixed models and change-point models.
I am interested in supervising projects developing and applying statistical methods to high dimensional data, to understand the causal mechanisms underlying human disease, and to inform the development of novel therapies. I’m particularly interested in the statistical problems arising in the integration of multiple data types, for instance population-scale genetic/genomic data, phenotypic data from electronic health records and other types of ‘omics, together with lab-scale data, e.g. from gene editing perturbations of cellular models.
Examples of the sorts of questions we might try to address are:
How best should we use human genetics in drug discovery? We think drug targets that are “genetically supported” by a particular definition are more likely to succeed, but there are many ways we could define “genetically supported”. Which is best? How can we integrate other data, e.g. combining genetics with measurements of protein levels or gene expression, and how can we “borrow” information across genes with similar functions?
Can we build statistical models integrating the data types above to predict what will happen if we modulate a given human protein? In particular, will we get a therapeutic effect that would suggest a drug discovery programme is justified? Even poorly predictive models could be an important contribution to drug discovery, given the failure rate of such programmes is >95%.
Projects would typically be in collaboration with biologists, epidemiologists and clinicians at Cambridge or elsewhere, in particular at CRUK Cambridge, the Wellcome Sanger Institute and Harvard Medical School.
In addition, we have a PhD project to work on large, high dimensional data generated during the full blood count measurement from in-depth characterised individuals with rare inherited disorders and from genetically modified mice (Professor Nadia Rosenthal, The Jackson Laboratory, USA; https://www.jax.org/). The main purpose is to identify specific blood cell signatures for rare inherited diseases using machine learning and AI approaches, to explore the value of these signatures for diagnosis of rare human disease, and to generate understanding of the causal mechanisms underlying rare and common human disease. This project would be joint work with colleagues in the Department of Haematology and DAMTP.
I have over 16 years of research experience in analysing complex observational data and collaborating on projects related to HIV, rheumatology, cardiovascular disease, human nutrition, and most recently, COVID-19. You can find more information about my work on my Google Scholar profile.
My current research focuses on developing methods for causal inference using data from clinical trials, observational studies, and electronic health record databases. I’m particularly interested in improving the inverse probability weighting (IPW) methods and developing sensitivity analysis strategies for unverifiable assumptions in IPW applications. I’m currently supervising Juliette Limozin for her PhD project on statistical methods to improve target trial emulation for causal inference with survival data.
If you’re interested in pursuing a PhD in the areas of causal inference, selection bias, and analysis of complex data from trials and observational studies, feel free to contact me.
Identification of latent structures in breast cancer tumours with functional data analysis
This project aims to identify subgroups of breast cancer patients with a high risk of relapse. Functional data analysis allows spatial/temporal dependence to be represented using smooth functions. Using these methods, we will derive representations of different types of genomic and dose-response data that capture important biological features. There are several challenges in modelling these datasets, such as enforcing monotonicity constraints and registering the curves to account for differences in the scale of the x-axis. The main goal will be to cluster and classify curves and build regression models.
I am also happy to discuss other possible projects related to breast cancer patients’ prognosis and monitoring.
Potential PhD Projects
- Seamless phase I/II modular dose-finding designs
It is now common to study combinations of treatments to achieve better efficacy or better tolerability. An emerging setting is to conduct a trial of an experimental drug alone, then in combination, and then to proceed into expansions. Such trials are referred to as modular. A naive (but common) approach is to design each study independently. This can be highly inefficient. The objective is to develop adaptive designs for early-phase modular trials that allow borrowing of information across modules. Basket and platform design ideas will be explored to borrow information and to tackle unplanned changes.
- Design and analysis of trials with treatment schedules
In infectious diseases such as Tuberculosis (TB) and Hepatitis B (HBV), the treatment duration with current standard regimes is lengthy, which results in a large burden on patients. Novel treatments or combinations of treatments in these areas offer the opportunity for both higher efficacy and shorter treatment periods. While the standard methods for considering various treatment durations can be applied, they will be suboptimal because they do not take into account the monotonicity assumption that a longer duration will have a higher response rate. The objective of the project is to develop novel adaptive designs for trials involving treatment schedules that exploit the nature of the various schedules to gain efficiency in decision-making.
- Response-adaptive design based on the weighted information measures
A class of Bayesian designs based on a novel concept of weighted information measures has been proposed recently. Such designs take into account the desirability of outcomes together with the uncertainty around them (while standard information measures account for the latter only). This results in a more ethically viable approach, assigning more patients to better-performing arms while not compromising the integrity of a trial. This class of designs was originally developed for multinomial endpoints. The objective of the project is to work on the generalisation of the information-theoretic concept to continuous outcomes using various types of information (Shannon, Fisher, Tsallis), on its estimation, and on a randomised setting with the weighted information measure accounting for comparisons to the common control.
Potential PhD Topics
I am open to developing PhD projects within the broad areas of either (1) generic Bayesian methodology for biostatistical applications; or (2) analysis of minute-by-minute acute hospital electronic health records (EHR).
– Development of general Bayesian methodology: I am particularly interested in developing Bayesian methods for making data integration from multiple data streams (such as EHR) more precise/accurate, computationally efficient and practically easier for researchers. I am also interested in developing generic (computational) methodology for making specification of sensible prior distributions much easier and more practical for researchers to use routinely in their research.
– Analysis of minute-by-minute acute hospital EHR: I have several on-going and planned collaborations with clinicians (in Acute Medicine; Intensive Care Unit; Department of Medicine for the Elderly; Infectious Diseases; Paediatrics) and other scientists at Cambridge University Hospitals, the main teaching hospital in Cambridge. Our research uses the routinely-collected data recorded in the electronic health record system used at the hospital (Epic e-Hospital), which provides highly-detailed (de-identified) data on patients throughout their care in hospitals (including medication prescription and administration, treatments, surgical procedures, diagnoses, clinical observations, vital signs). There are many statistical challenges in appropriately analysing this important, emerging class of data. Potential topics for a PhD thesis would particularly relate to prediction, clustering and/or causal inference.
Robert Goudie is a group leader at the MRC Biostatistics Unit. I have previously co-/sole-supervised three PhD students to completion at the Unit. Their completed theses are available online, and have led to publications in Bayesian Analysis (1, 2), Statistics and Computing (1, 2) and Biometrika (1), with further publications in the pipeline (1, 2, 3, 4). I supervise via one-to-one supervisions, usually weekly, but with flexibility to make these more or less frequent as needed for the stage each student is at. I aim to help students develop their research skills through their PhD studies so that by completion they are confident in identifying, refining and critically appraising their own research ideas and work.
Since 2016, I have also been one (of two) academic members of the Graduate Education Committee at the Unit, which runs the PhD Programme and supports our PhD students, giving me considerable further experience of advising and supporting PhD students throughout their PhD studies.
I am very happy to have an informal chat with anyone who is potentially interested – just email me to arrange!