
MRC Biostatistics Unit

We actively support equality, diversity and inclusion and encourage applications from all sections of society.  We are committed to widening participation in postgraduate education.


The BSU is an internationally recognised research unit specialising in statistical modelling with application to medical, biological or public health sciences. Details of the work carried out in the Unit appear on our Research page. The Unit members below are interested in taking on students in Michaelmas (October) 2025 for their projects below.

You can find a list of projects that may offer extra funding opportunities here: Network Hubs :: TMRP Doctoral Training Partnership (mrc.ac.uk)

Applicants are also encouraged to contact potential supervisors to discuss mutual areas of interest for PhD Research. Contact details and details of potential areas for PhD research are given on each individual page. Please see our how to apply page to see what you should include in your initial enquiry.

In order to be considered for a BSU Studentship as a Home student, formal applications need to be submitted to the University of Cambridge by 23:59 on March 27th 2025.

If you would like further information please contact phdstudy@mrc-bsu.cam.ac.uk

 

Li Su

Contact

Li Su (Efficient Study Design)

li.su@mrc-bsu.cam.ac.uk

I have over 17 years of research experience in analysing complex observational data and collaborating on projects related to HIV, rheumatology, cardiovascular disease, human nutrition, and most recently, COVID-19. You can find more information about my work on my Google Scholar profile.

My current research focuses on developing methods for causal inference using data from clinical trials, observational studies, and electronic health record (EHR) databases. I am particularly interested in 1) improving the inverse probability weighting (IPW) methods and developing sensitivity analysis strategies for unverifiable assumptions in IPW applications; 2) handling missing data in EHR databases for causal inference problems; and 3) the design and analysis of clinical trials augmented with observational data. I'm currently supervising Juliette Limozin for her PhD project on statistical methods to improve target trial emulation for causal inference with survival data.

If you're interested in pursuing a PhD in causal inference, missing data or related problems, feel free to contact me.

Integrating the Design and Analysis of Randomised Controlled Trials and Observational Cohort Studies for Alzheimer’s Dementia Research

By 2050, it is expected that around 139 million people will have Alzheimer’s dementia (AD) worldwide. However, new treatments often fail in large-scale clinical trials. A compelling explanation of this failure is that the new drugs are genuinely effective but have been evaluated too late in the disease course to have clinically meaningful impact. Therefore, it is urgent to understand the disease mechanisms and evaluate new treatments at the early stages of AD.

The European Prevention of Alzheimer’s Dementia (EPAD) consortium recruited participants into a longitudinal cohort study (from existing population cohorts and clinical settings) to explore new ways of understanding the early stages of AD [1]. EPAD aimed to build a readiness cohort for a proof-of-concept clinical trial (i.e. platform trial with multiple appendices representing various interventions to evaluate) and to generate a rich longitudinal dataset for disease modelling that could inform selection into the trial and adaptations. Moreover, the EPAD cohort was open to continual recruitment, which was informed by potential treatments and types of participants to be included in the forthcoming trial of an added intervention. In particular, participants recruited into the trial under a specific appendix and randomised to placebo would, after a suitable washout period following the trial, ‘re-enter’ the cohort to be eligible for future trials.

In settings where randomised controlled trials (RCTs) are nested under an observational cohort like EPAD, it is appealing to integrate information from both the RCTs and cohort for treatment evaluation. For example, incorporating untreated cohort participants as external controls could allow more efficient and precise estimation of treatment effects [2] and enable more RCT participants to potentially benefit from treatments, given resource constraints for conducting RCTs. On the other hand, estimated treatment effects from the RCTs can be generalised to a population potentially eligible for the treatments but underrepresented in the RCTs or cohort [2]. Such generalisation can inform future RCT design, e.g., by targeting patient groups most likely to benefit from the treatments during the RCT and cohort recruitment in nested design settings. By integrating the design and analysis of RCTs and observational cohort studies in a continuum, we can provide accrued and updated evidence for clinical practice.

Motivated by EPAD, we aim to address the challenges in integrating the design and analysis of RCTs and observational cohort studies for treatment evaluation in nested design settings.

  • Following the target trial emulation framework [3], we will investigate how to incorporate external controls from the observational cohort to improve treatment effect estimation in a nested RCT.
  • We will exploit rich longitudinal data collected before the RCT baseline to explore treatment heterogeneity, which is a prerequisite before generalising the RCT results to target populations.
  • Based on estimated heterogeneous treatment effects, we will generalise RCT results to target populations most likely to benefit from treatments and propose enrichment strategies for RCT and cohort recruitment. 

The project will be undertaken under the supervision of Dr Li Su (li.su@mrc-bsu.cam.ac.uk, Efficient Study Design Theme) and Programme Leader Dr Brian Tom (brian.tom@mrc-bsu.cam.ac.uk, Precision Medicine Theme) at the MRC Biostatistics Unit (BSU).

This project will give the student hands-on experience developing novel methodologies motivated by real clinical settings. The student will be offered training in methods for causal inference, longitudinal data analysis and the design of randomised controlled trials. Further training needs will be identified during the PhD and will include: (i) statistical programming and (ii) written and oral communication skills.

[1] Saunders et al. (2022) https://doi.org/10.3389/fneur.2022.1051543
[2] Colnet et al. (2024) https://doi.org/10.1214/23-STS889
[3] Hernán and Robins (2016) https://doi.org/10.1093/aje/kwv254

 

Oscar Rueda

Contact

Oscar Rueda (Causal Mechanisms)

oscar.rueda@mrc-bsu.cam.ac.uk

Identification of latent structures in breast cancer tumours with functional data analysis

This project aims to identify subgroups of breast cancer patients with a high risk of relapse. Functional data analysis allows spatial/temporal dependence to be represented using smooth functions. Using these methods, we will derive representations of different types of genomic and dose-response data that capture important biological features. There are several challenges in modelling these datasets, such as enforcing monotonicity constraints and registering the curves to account for differences in the scale of the x-axis. The main goal will be to cluster and classify curves and build regression models.

I am also happy to discuss other possible projects related to breast cancer patients’ prognosis and monitoring.

Pavel Mozgunov

Contact

Pavel Mozgunov (Efficient Study Design Theme)

pavel.mozgunov@mrc-bsu.cam.ac.uk

Potential PhD Projects

  • Seamless phase I/II modular dose-finding designs 

It is now common to study combinations of treatments to achieve better efficacy or better tolerability. An emerging setting is to conduct a trial of an experimental drug alone, then in combination, and then to proceed into expansions. Such trials are referred to as modular. A naive (but common) approach is to design each study independently. This can be highly inefficient. The objective is to develop adaptive designs for early-phase modular trials that allow borrowing of information across modules. Basket and platform design ideas will be explored to borrow information and to tackle unplanned changes.

  • Design and analysis of trials with treatment schedules 

In infectious diseases such as Tuberculosis (TB) and Hepatitis B (HBV), treatment with current standard regimens is lengthy, which places a large burden on patients. Novel treatments or combinations of treatments in these areas offer the opportunity for both higher efficacy and shorter treatment periods. While standard methods for considering various treatment durations can be applied, they will be suboptimal because they do not take into account the monotonicity assumption, namely that a longer duration will have a higher response rate. The objective of the project is to develop novel adaptive designs for trials involving treatment schedules that exploit the nature of the various schedules to gain efficiency in decision-making.

  • Response-adaptive design based on the weighted information measures 

A class of Bayesian designs based on a novel concept of weighted information measures has been proposed recently. Such designs allow the desirability of outcomes to be taken into account together with the uncertainty around them (while standard information measures account for the latter only). This results in a more ethically viable approach that assigns more patients to better-performing arms while not compromising the integrity of the trial. This class of designs was originally developed for multinomial endpoints. The objective of the project is to generalise the information-theoretic concept to continuous outcomes using various types of information (Shannon, Fisher, Tsallis), to study its estimation, and to extend it to a randomised setting in which the weighted information measure accounts for comparisons to a common control.

 

Brian Tom and Paul D W Kirk

Contact

Brian Tom (Precision Medicine)

brian.tom@mrc-bsu.cam.ac.uk

Paul Kirk (Biostatistical Machine Learning)

paul.kirk@mrc-bsu.cam.ac.uk

Prioritisation of variables in mixture models for regression and clustering

Background: In many settings there is a need to either prioritise certain types of data, regulate the influence of different pieces of information or remove irrelevant variables when performing regression or clustering. For example, when certain types of variables are of better quality than others, it may be better to up-weight these variables relative to others to get more reliable and stable models. If certain variables/modalities are easier to measure, cheaper to collect or more routinely used in clinical settings, then it may be preferable to adopt a more decision-theoretic solution where costs are traded off against the benefits of better information. Moreover, situations arise when prioritisation allows targeted solutions to the specific clinical question at hand. Applied motivation for investigating prioritisation of variables comes from collaborations in osteoarthritis and dementia where, to a lesser or greater extent, clinical outcome has had a minor or major role in determining cluster structure. In the mixture regression model context (e.g. two-part, latent class mixed model and joint latent class models), where intrinsically linked sub-models are formulated as regressions with a common subset of covariates, it is unclear how best to do variable selection. Adaptive regularisation may be an option.

Aim of Project: To investigate and provide recommendations on how best to prioritise, regulate or perform variable selection for mixture models used to uncover latent structures.

Methods: We envisage that there will be various approaches amenable to the general problem of prioritisation. The exact approach may be dependent on the reasons for up- or down-weighting variables and the clinical questions addressed. However, possible strategies could involve (i) weighted likelihoods; (ii) adaptive regularisation; (iii) informative priors; and (iv) decision-based or multi-task learning approaches with loss functions.

Supervisors: This project will be supervised by Dr Brian Tom, Programme Leader in the Precision Medicine Theme, Dr Paul Kirk, Programme Leader in Biostatistical Machine Learning, and Professor Sylvia Richardson (Emeritus Professor and former Director of the MRC Biostatistics Unit).

John Whittaker

Contact

John Whittaker (Causal Mechanisms)

john.whittaker@mrc-bsu.cam.ac.uk

I am interested in supervising projects developing and applying statistical methods to high-dimensional data, to understand the causal mechanisms underlying human disease, and to inform the development of novel therapies. I’m particularly interested in the statistical problems arising in the integration of multiple data types, for instance population-scale genetic/genomic data, phenotypic data from electronic health records and other types of ‘omics, together with lab-scale data, e.g. from gene editing perturbations of cellular models.

Examples of the sorts of questions we might try to address are:

How best should we use human genetics in drug discovery? We think drug targets that are “genetically supported” by a particular definition are more likely to succeed, but there are many ways we could define “genetically supported”: which is best? How can we integrate other data, e.g. combining genetics with measurements of protein levels or gene expression, and how can we “borrow” information across genes with similar functions?

Can we build statistical models integrating the data types above to predict what will happen if we modulate a given human protein? In particular, will we get a therapeutic effect that would suggest a drug discovery programme is justified? Even poorly predictive models could be an important contribution to drug discovery, given the failure rate of such programmes is >95%.

Projects would typically be in collaboration with biologists, epidemiologists and clinicians at Cambridge or elsewhere, in particular at CRUK Cambridge, the Wellcome Sanger Institute and Harvard Medical School.

In addition, we have a PhD project working on large, high-dimensional data generated during full blood count measurement from in-depth characterised individuals with rare inherited disorders and from genetically modified mice (Professor Nadia Rosenthal, The Jackson Laboratory, USA; https://www.jax.org). The main purpose is to identify specific blood cell signatures for rare inherited diseases using machine learning and AI approaches, to explore the value of these signatures for diagnosis of rare human disease, and to generate understanding of the causal mechanisms underlying rare and common human disease. This project would be joint work with colleagues in the Department of Haematology and DAMTP.

 

Anne Presanis and Daniela De Angelis

Contact

Dr Anne Presanis (Population Health)

Prof Daniela De Angelis (Population Health)

Developing adaptive designs for infectious disease surveillance
Supervisors: Dr Anne Presanis, Prof Daniela De Angelis

 

The COVID-19 pandemic demonstrated the need for infectious disease surveillance systems that are flexible and can be rapidly and continuously adapted to support responses throughout different phases of an outbreak.  Different stages of an epidemic (emergence, establishment, endemic, elimination) require surveillance to achieve different objectives, including outbreak detection, estimation (e.g. of prevalence, incidence, severity, effectiveness of interventions), prediction (e.g. of incidence and healthcare burden), and monitoring progress to elimination. These objectives might vary in priority and compete in terms of the resources needed to achieve them. To make most efficient use of limited resources, surveillance objectives must also be traded off against logistical and cost constraints (Cheng et al, 2020). Motivated by pandemic preparedness work and monitoring progress to the target of elimination of HIV as a public health problem in the UK, this project will develop methods to optimally design adaptive surveillance. We will consider the translation of methods from other fields, such as adaptive design of clinical trials (Villar et al, 2015; Robertson et al, 2023) and value of information, experimental design and decision theory (Jackson et al, 2019), to disease surveillance. This project will suit an applicant with a mathematics/statistics background who is motivated by making effective contributions to important public health questions. The project will be carried out in collaboration with UKHSA (Andre Charlett), WHO and key colleagues in the Efficient Study Design (Sofia Villar) and Population Health (Christopher Jackson) themes at MRC BSU.

Dominique-L. Couturier and Thomas Jaki

Contact

Dr. Dominique-L. Couturier

Prof. Thomas Jaki  

Partner Pfizer US

Background to the project: Digital healthcare is a very dynamic field aiming to enhance patient health, from diagnosis to treatment, through the collection and analysis of increasingly large digital health records. Examples include the use of deep learning (DL) algorithms to improve disease detection and diagnosis based on medical images such as MRIs and CT scans [1,2], the use of machine learning (ML) algorithms to inform patient treatment based on clinical and genomic profiles [3], and the use of wearable devices to monitor patient health [4,5]. These algorithms and devices are regularly updated to follow the latest developments in DL/ML theory [5] and in medical knowledge (for example, the availability of new or different health data [6]), as well as the latest technological improvements, possibly leading to more affordable and improved health devices [7].

What the studentship will encompass: While guidance on how to use adaptive designs to evaluate medical devices exists [8], no such recommendations are available for updates to such algorithms and devices. This project aims to fill this gap by developing novel, efficient adaptive designs for the evaluation of updates to algorithms and devices in digital healthcare when a re-evaluation is deemed necessary. The project will focus on two scenarios:

  • Update of healthcare devices:

Nowadays, wearable health devices typically provide a multitude of longitudinal outcomes, such as heart and respiratory rates, physical activity, sleep patterns, body temperature and glucose levels. A device producer may be interested in verifying that such measures are the same following an update related to the use of an improved technology and/or a change in production. Existing methods in (adaptive) equivalence testing are typically design dependent and consider a small number of outcomes [9,10]. A first aim of this project is therefore to develop more efficient and general adaptive methods in equivalence testing in the context of a large number of dependent outcomes.

  • Update of healthcare algorithms:

Improved algorithms are typically expected to be developed regularly and to perform as well as or better than their previous versions. A second aim of this project is therefore to develop adaptive non-inferiority testing methods able to rule out that such changes have led, over time, to a decrease in quality, such as a decrease in the specificity or sensitivity of a diagnostic classifier. Particular emphasis will be placed on methods that preserve power when hypotheses are tested sequentially, as data become available, without knowledge of future algorithm improvements [11].
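
As a point of reference for the device-update scenario, the following is a minimal sketch of a standard equivalence test (two one-sided tests, TOST) for a single outcome. It is not the method the project will develop: it assumes two independent samples, a large-sample normal approximation and a fixed (non-adaptive) design, and the function name `tost_equivalence`, the margin and the toy heart-rate data are all hypothetical choices made for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_equivalence(x, y, margin, alpha=0.05):
    """Two one-sided z-tests of H0: |mean(x) - mean(y)| >= margin.

    Large-sample approximation for two independent samples (an assumption).
    Returns (estimated difference, TOST p-value, equivalence declared?).
    """
    diff = mean(x) - mean(y)
    se = sqrt(stdev(x) ** 2 / len(x) + stdev(y) ** 2 / len(y))
    z = NormalDist()
    # Test 1: H0 diff <= -margin, rejected when (diff + margin) / se is large.
    p_lower = 1 - z.cdf((diff + margin) / se)
    # Test 2: H0 diff >= +margin, rejected when (diff - margin) / se is small.
    p_upper = z.cdf((diff - margin) / se)
    # Equivalence is declared only if both one-sided tests reject.
    p_value = max(p_lower, p_upper)
    return diff, p_value, p_value < alpha


# Toy data (illustrative only): resting heart rate before and after an update.
old = [70 + (i % 5) for i in range(50)]
new = [70.2 + (i % 5) for i in range(50)]
diff, p_value, equivalent = tost_equivalence(old, new, margin=1.0)
```

With many dependent outcomes, applying such a test outcome by outcome ignores the dependence between measures, which is part of the gap described above.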

Detail of supervision: The project will be undertaken under the supervision of Dr Dominique-Laurent Couturier and Professor Thomas Jaki at the MRC Biostatistics Unit (BSU), University of Cambridge, where the student will be based.

Collaborations: This project will be conducted in close collaboration with an industry partner, Pfizer US, with regular interactions to enhance the project's inputs and outcomes.

PPI:  Training on Patient and Public Involvement will be provided (e.g. through the NIHR Cambridge BRC). Input on the practicability of the proposed methods will be discussed on the basis of concrete applications with patient representatives when appropriate.

[1] Litjens, G. et al. (2017), A survey on deep learning in medical image analysis. Med. Image Anal. https://doi.org/10.1016/j.media.2017.07.005 

[2] Liu, X. et al. (2019), A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. The Lancet Digital Health. https://doi.org/10.1016/S2589-7500(19)30123-2

[3] Quazi S. (2022), Artificial intelligence and machine learning in precision and genomic medicine. Med Oncol. https://doi.org/10.1007/s12032-022-01711-1

[4] Smith, A.A. et al (2023), Reshaping healthcare with wearable biosensors. Sci Rep 13. https://doi.org/10.1038/s41598-022-26951-z

[5] Shickel, B. et al (2018), Deep EHR: A survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J. Biomed. Health Inform. https://doi.org/10.1109/JBHI.2017.2767063

[6] Savage, N. (2023), Synthetic data could be better than real data. Nature. https://doi.org/10.1038/d41586-023-01445-8

[7] Iqbal, S.M.A. et al. (2021), Advances in healthcare wearable devices. NPJ Flexible Electronics. https://doi.org/10.1038/s41528-021-00107-x 

[8] FDA (2016), Adaptive Designs for Medical Device Clinical Studies - Guidance for Industry and Food and Drug Administration Staff, https://www.regulations.gov/docket/FDA-2015-D-1439

[9] Pallmann, P. and Jaki, T. (2017), Simultaneous confidence regions for multivariate bioequivalence. Statistics in Medicine. https://doi.org/10.1002/sim.7446

[10] Grayling, M. et al (2019), Two-Stage Adaptive Designs for Three-Treatment Bioequivalence Studies. Stat Biopharm Res. https://doi.org/10.1080/19466315.2019.1654911

[11] Tian, J. and Ramdas, A. (2022). Online control of the familywise error rate. Statistical Methods in Medical Research. https://doi.org/10.1177/0962280220983381

Sofia Villar

Contact

Sofia Villar

sofia.villar@mrc-bsu.cam.ac.uk

Operationally feasible Multi-Arm-Bandit Response Adaptive Trials

About the Project

Background to the project

Response-adaptive trials, such as those based on multi-armed bandit models, have the potential to improve the efficiency of clinical trials by reducing the required sample size or increasing the chances of a successful experiment. These designs also offer benefits to trial participants. There has been renewed interest in developing existing theoretical results into suitable trial methodology that can deliver the potential benefits of these optimal designs [1]. A number of criticisms of response-adaptive randomisation (RAR) have been raised, along with methods to address them [2], and there remain areas of methodological need, for example how to accommodate missing data [3,4]. Nonetheless, the uptake of bandit-based response-adaptive designs in clinical trials remains very limited. In practice, the operational and feasibility challenges of such trials constitute the largest hurdle to overcome.
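
To make the idea of a bandit-based response-adaptive design concrete, here is a minimal simulation sketch. It uses Thompson sampling with Beta(1, 1) priors on binary responses, which is one common bandit heuristic and not necessarily the class of algorithms studied in this project. The function name `thompson_trial`, the response rates and the trial size are hypothetical, and the sketch ignores the operational issues (drug supply, randomisation systems, recruitment patterns, delayed or missing outcomes) that motivate the project.

```python
import random


def thompson_trial(true_rates, n_patients, seed=0):
    """Simulate one trial with Thompson sampling on binary outcomes.

    Each arm starts with a Beta(1, 1) prior (an assumption). Outcomes are
    assumed to be observed immediately, before the next patient arrives.
    Returns (patients allocated per arm, successes per arm).
    """
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_patients):
        # Draw one value per arm from its current Beta posterior.
        draws = [rng.betavariate(1 + successes[a], 1 + failures[a])
                 for a in range(k)]
        # Allocate the patient to the arm with the largest draw.
        arm = max(range(k), key=lambda a: draws[a])
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    allocations = [successes[a] + failures[a] for a in range(k)]
    return allocations, successes


# Illustrative two-arm trial: control response rate 0.3, treatment 0.6.
allocations, successes = thompson_trial([0.3, 0.6], n_patients=200, seed=42)
```

Because allocation probabilities track the accumulating data, more participants tend to receive the better-performing arm, which is the patient-benefit argument made above; the same dependence on real-time data is what makes such designs operationally demanding.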

What the studentship will encompass and planned secondments

In this project, the student will start by analysing, through simulation studies based on real trial examples, three key operational issues: drug supply, the randomisation system and recruitment patterns. An important goal is to quantify their impact on existing bandit algorithms, and to develop new algorithms to mitigate these practical aspects, which have typically been ignored in the theoretical literature. A second goal is to incorporate the operational considerations into the design to allow the key trial characteristics to be optimised in different settings of interest, particularly accounting for the inevitable problem of limited or incomplete data, both at baseline and during the trial. Finally, the student will aim to develop easy-to-use software, such as a web app (e.g., an R Shiny app), to illustrate the impact of different practical considerations on the overall assessment of different optimal designs.

Detail of supervision

The project will be undertaken under the main supervision of Dr Villar and the co-supervision of Mr Yaron Racah (PhaseV).

Dr Sofia S. Villar, Programme Leader at the MRC Biostatistics Unit, has extensive experience in the methodological development of optimal response-adaptive designs and their implementation in clinical trials, and is pioneering the development and application of multi-armed bandits to clinical trials.

Mr Yaron Racah is a Talpiot graduate, with many years of experience tackling tough optimization problems. Mr Racah was the first employee at VIA Transportation (valued at over 3.5 B USD), a company operating dynamic public transportation in hundreds of cities around the globe, and was key to developing their planning algorithm. Mr Racah is the inventor on 7 different patents.

Dr Robin Mitra, Associate Professor of Statistics at UCL and Structured Missingness Theme Lead at the Alan Turing Institute, has extensive experience working in the fields of missing data and medical statistics. He collaborates with Roche on developing strategies to handle missing values arising in complex linked clinico-genomic data, as well as utilising missing data methods to facilitate early decision-making in adaptive trials.

Collaborations:

This project also benefits from collaboration with other members of PhaseV. PhaseV is a young and dynamic start-up building a product to disrupt the way clinical trials are run. This collaboration brings links with some of the largest pharmaceutical companies, working on real-world applications. The PhaseV team has many years of experience tackling cutting-edge optimisation problems.

Details of industry placement

As part of the PhD, there will be a 6-month internship planned with PhaseV. This would include hands-on experience of the role of a biostatistician in a clinical study and collaborating with non-statistical stakeholders. During the internship, the student would be handling real-world projects under the supervision of a Principal Biostatistician within the team. This would enable a student to have a clear understanding of how the proposed methods developed as a part of the PhD fit into the outlook of the regulatory authorities and how they can be used in real clinical trials.

PPI

The student will benefit from the PPI work done within Dr Villar’s group and will seek the contribution on the direction of the project from the patient representatives that Dr Villar works with. In addition, NIHR Cambridge BRC have several different Patient and Public Involvement online training courses and resources available free of charge for all researchers located at the campus where the student will be based.

HOW TO APPLY

You are applying for a PhD studentship from the MRC TMRP DTP. A list of potential projects and the application form is available online at:

https://mrctmrpdtp.com/current-opportunities/

Please complete the form fully. Incomplete forms will not be considered. CVs will not be accepted for this scheme.

Please apply giving details for your first choice project. You can provide details of up to two other TMRP DTP projects you may be interested in at section B of the application form.

Before making an application, applicants should contact the project supervisor to find out more about the project and to discuss their interests in the research before 06 January 2025.

The deadline for applications is 12 noon (GMT) 13 January 2025. Late applications will not be considered.

Completed application forms must be returned to: enquiries@methodologyhubs.mrc.ac.uk

Informal enquiries may be made to Dr Villar - sofia.villar@mrc-bsu.cam.ac.uk

Funding Notes

Studentships are funded by the Medical Research Council (MRC) for 3 years. Funding will cover tuition fees at the UK rate only, a Research Training and Support Grant (RTSG) and stipend. We aim to support the most outstanding applicants from outside the UK and are able to offer a limited number of bursaries that will enable full studentships to be awarded to international applicants. These full studentships will only be awarded to exceptional quality candidates, due to the competitive nature of this scheme.

References

[1] Villar, Sofía S., Jack Bowden, and James Wason. "Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges." Statistical science: a review journal of the Institute of Mathematical Statistics 30.2 (2015): 199.
[2] Robertson, David S., et al. "Response-adaptive randomization in clinical trials: from myths to practical considerations." Statistical Science 38.2 (2023): 185.
[3] Chen, Xijin, et al. "Some performance considerations when using multi-armed bandit algorithms in the presence of missing data." Plos one 17.9 (2022): e0274272.
[4] Mitra, R., McGough, S. F., Chakraborti, T., Holmes, C., Copping, R., Hagenbuch, N., … & MacArthur, B. D. (2023). Learning from data with structured missingness. Nature Machine Intelligence, 5(1), 13-23.
 

 

 

Helene Ruffieux & Brian Tom

Contact

Brian Tom (Precision Medicine)

brian.tom@mrc-bsu.cam.ac.uk

Helene Ruffieux (Causal Mechanisms)

helene.ruffieux@mrc-bsu.cam.ac.uk

Latent factor modelling for longitudinal biomedical data

We are seeking enthusiastic applicants for a PhD project focused on innovative Bayesian latent factor approaches to capture dynamic patterns from high-dimensional longitudinal biomedical data. This project aims to develop interpretable hierarchical models and efficient inference algorithms to reveal latent structures driving both shared and unique sources of variability in patients' molecular profiles, clinical phenotypes or treatment responses. By bridging methodological advances with open biomedical questions, this research will enhance our understanding of the latent disease mechanisms underlying patient heterogeneity and improve our ability to predict disease progression.

Simon White & Dominique-Laurent Couturier

Contact

Dr Simon R. White (Precision Medicine)

simon.white@mrc-bsu.cam.ac.uk

Dr Dominique-Laurent Couturier (Efficient Study Design)

dominique.couturier@mrc-bsu.cam.ac.uk

Study design for Bayesian change-point models with unknown change-points and unknown labels: using applications in cancer treatment delay trials and adolescent neuro-development.

Outline

Many longitudinal processes, like tumour growth in cancer studies, experience a shift, or change-point, in how the process behaves; this may be due to an intrinsic aspect of the process, for example adolescent neuro-development, or as a result of external interventions, for example the start and end of treatment phases. The key feature of these changes is that they typically occur at unknown subject-dependent time-points.

Change-point (generalised) linear mixed models, which explicitly incorporate such changes in the process, are used to fit the data. In the frequentist paradigm, for example, it is common to first estimate the location of unknown fixed change-points via a grid search minimising some function of the likelihood, and then to fit a conventional linear mixed model assuming that the change-points occurred at the estimated time-points. Recent developments [1,2] allow fitting mixed models with random change-points, but these approaches are restricted to a single change-point.
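
The two-step grid-search idea can be sketched as follows. This is a deliberately simplified illustration: it fits a single fixed change-point in an ordinary least-squares "broken-stick" regression with no random effects, so it is not a mixed model, and minimising the residual sum of squares stands in for the likelihood criterion (the two agree under Gaussian errors). All function names and simulated values are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_broken_stick(times, ys, tau):
    """Least-squares fit of y = b0 + b1*t + b2*max(t - tau, 0) for a known tau.

    Returns (coefficients, residual sum of squares).
    """
    rows = [[1.0, t, max(t - tau, 0.0)] for t in times]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    coefs = solve_linear(xtx, xty)
    rss = sum((y - sum(c * v for c, v in zip(coefs, r))) ** 2
              for r, y in zip(rows, ys))
    return coefs, rss


def grid_search_change_point(times, ys, grid):
    """Step 1: pick the candidate tau with the smallest RSS.
    Step 2: report the conventional fit that treats this tau as known."""
    best = None
    for tau in grid:
        coefs, rss = fit_broken_stick(times, ys, tau)
        if best is None or rss < best[2]:
            best = (tau, coefs, rss)
    return best


# Simulated example (illustrative values): slope changes at t = 4.
rng = random.Random(1)
times = [0.25 * i for i in range(41)]        # 0, 0.25, ..., 10
true_tau = 4.0
ys = [1.0 + 0.5 * t - 1.0 * max(t - true_tau, 0.0) + rng.gauss(0, 0.05)
      for t in times]
grid = [1.0 + 0.5 * k for k in range(17)]    # candidate taus 1.0, 1.5, ..., 9.0
tau_hat, coefs_hat, rss_hat = grid_search_change_point(times, ys, grid)
```

Note that the second step treats the selected change-point as known, so uncertainty about its location is not propagated into the final fit. That limitation is one reason to model change-points as unknown quantities instead.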

However, the issue of how to design studies when change-points are expected in the process is under-researched. This project will explore the development of study designs for Bayesian change-point linear mixed models, aiming to account for unknown change-points and to allow individual variation in the change-point (i.e. a random effect around a group mean), with the additional challenge of unknown subgroups [3]. The diagram below shows the outcome mean (y-axis; tumour size, for example) as a function of time (x-axis) for two groups (red and blue lines; different response patterns to the same treatment, for example) that initially share the same growth (black lines) until the first random change-point (the time at which the treatment starts to take effect), and then take different paths until a second random change-point (the end of efficacy of the treatment, for example) occurs. In this diagram, the question of interest translates to how many individuals to recruit and how often to measure them.

The motivation for these developments comes from two very distinct applications: (a) modelling treatment delays in cancer, where the key question is to determine the form of the delay; and (b) modelling adolescent neuro-development, where individuals develop at different rates but there are subgroups with different development trajectories.

[1] Muggeo V.M.R. et al. (2014), Segmented mixed models with random changepoints: a maximum likelihood approach with application to treatment for depression study, https://doi.org/10.1177/1471082X13504721
[2] Capuano A. & Wagner M. (2023), SOFTWARE nlive: an R Package to facilitate the application of the sigmoidal and random changepoint mixed models, https://doi.org/10.21203/rs.3.rs-2235106/v1
[3] White S.R., Muniz-Terrera G., and Matthews F.E. (2018), Sample size and classification error for Bayesian change-point models with unlabelled sub-groups and incomplete follow-up. https://doi.org/10.1177%2F0962280216662298