## Current PhD Opportunities at the MRC Biostatistics Unit

The BSU is an internationally recognised research unit specialising in statistical modelling with application to medical, biological or public health sciences. Details of the work carried out in the Unit appear on our Research page.

We currently have the following studentships available at the MRC Biostatistics Unit.

To apply for any of the following PhD projects, please visit the Applications procedures page. The deadline for all applications is **30th June 2017**.

- Improving the likelihood of a successful trial when there are multiple available endpoints of interest – **Adrian Mander and Michael Grayling**
- Bayesian dose adaptive trials using non-myopic response-adaptive methods – **Sofia Villar and Adrian Mander**
- Developing Bayesian non-myopic response-adaptive randomisation for the case of delayed endpoint observation – **Sofia Villar and Adrian Mander**
- Methods for integrating and splitting complex/big models – **Robert Goudie and Lorenz Wernisch**
- Leveraging reference omics datasets to improve power in stratified approaches to patient treatment decisions – **Chris Wallace**
- Matching approaches for efficient and robust causal inferences in Mendelian randomization – **Stephen Burgess**
- Experimental design for inference of gene networks from single cell data – **John Reid and Steven Hill**
- Model-free network inference from single cell data – **John Reid**
- Dimension reduction techniques in the field of brain imaging – **Simon White**
- Investigating the role of single-arm trials in drug development plans – **Adrian Mander, Michael Grayling and James Wason**
- Bayesian dose adaptive trials with multiple outcomes – **Adrian Mander and Graham Wheeler (UCL)**
- Methods for using high-dimensional biomarker information prospectively in clinical trials – **James Wason and Paul Newcombe**
- Adaptive designs for longitudinal trials to efficiently estimate biomarker change-point outcomes and time-to-change-point – **Simon White**

### Improving the likelihood of a successful trial when there are multiple available endpoints of interest

**Supervisors – Adrian Mander (BSU) and Michael Grayling (BSU)**

Clinical trials are often conducted by considering a single endpoint of interest upon which to assess the performance of an experimental treatment. Much research has consequently been conducted into the efficient design of such trials, and this remains an advantageous approach in many settings.

However, in some instances it is a less appropriate approach. Namely, evaluating a treatment based on a single outcome measure would seem unwise in scenarios where either (a) there is no consensus on the key endpoint to assess, or (b) there are multiple available endpoints, and response on some subset of these would warrant further exploration of the experimental treatment. These situations occur in dementia research, where cognitive ability is split into several domains, and in palliative care research, where time is critical and an improvement in any symptom can be hugely beneficial at the end of life. Unsurprisingly, therefore, there has been great interest in recent years in trial design methodology that allows the simultaneous assessment of multiple endpoints. This interest has now spread to the adaptive trial design community, with, for example, group sequential designs for co-primary endpoints [1] and interim endpoint selection designs [2] now available.

The focus of the majority of this research has been achieving efficacy on each of the specified co-primary endpoints. More recently, however, a design has been proposed that essentially tests whether the treatment is efficacious on a specified number of the endpoints [3]. This method offers promise to address both point (a) and point (b) above. However, its performance relative to the classical ‘choose an endpoint’ approach, or to methods based on forming a single composite endpoint from those considered, has yet to be fully explored. Nor have adaptive design extensions been considered for this ‘m-out-of-n’ endpoint trial design.

Accordingly, this PhD project will begin by examining the possible approaches one can take to the design of trials when there are multiple available endpoints of interest and the goal is to have a ‘successful trial’, i.e. the positive determination of several endpoints upon which the regimen is effective. The methods will be compared in terms of their ability to identify treatments that do warrant further exploration, and the sample size they require to achieve this. Following this, the project will turn towards the development of adaptive design methodology to improve the efficiency of such trials.
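As a concrete illustration of the kind of design comparison involved, the sketch below uses Monte Carlo simulation to estimate the probability of a ‘successful trial’ under an m-out-of-n success criterion for normally distributed endpoints. All quantities (effect sizes, correlation, sample size, and the unadjusted per-endpoint test) are hypothetical choices for illustration, not the design of [3].

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

n_per_arm = 100      # assumed sample size per arm
n_endpoints = 4      # K normally distributed endpoints
effect = 0.35        # assumed standardised effect on every endpoint
rho = 0.3            # assumed common correlation between endpoints
alpha = 0.05         # per-endpoint one-sided level (no multiplicity adjustment here)
m_required = 2       # declare success if at least m endpoints are significant

cov = rho * np.ones((n_endpoints, n_endpoints)) + (1 - rho) * np.eye(n_endpoints)
crit = stats.norm.ppf(1 - alpha)

n_sims = 5000
successes = 0
for _ in range(n_sims):
    ctrl = rng.multivariate_normal(np.zeros(n_endpoints), cov, size=n_per_arm)
    trt = rng.multivariate_normal(np.full(n_endpoints, effect), cov, size=n_per_arm)
    # per-endpoint z-statistics for the difference in means (unit variances assumed known)
    z = (trt.mean(0) - ctrl.mean(0)) / np.sqrt(2 / n_per_arm)
    if np.sum(z > crit) >= m_required:
        successes += 1

power = successes / n_sims
print(f"Estimated probability of a 'successful trial' (>= {m_required} of {n_endpoints}): {power:.3f}")
```

Varying `m_required` from 1 to `n_endpoints` traces out the trade-off between the composite-style and co-primary-style extremes discussed above.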

In summary, the PhD will cover some of the following areas:

- Comparing and contrasting the performance of readily available design possibilities for normally distributed endpoints.

- Using and extending methodology for the design of trials with multiple binary, time-to-event, or combinations of endpoint types.

- Exploring the development of adaptive trial design methodology for multiple endpoint trials, including group sequential, sample size re-estimation, and endpoint selection approaches.

- Assessing how the various methods perform in terms of estimating the effect of the interventions on each endpoint.

- Creating easy-to-use software for the various methods in either R or Stata.

**References**

1. Hamasaki T, Asakura K, Evans SR et al. Group-sequential strategies in clinical trials with multiple co-primary outcomes. Stat Biopharm Res 2015; 7(1):36-54.
2. Rauch G, Schüler S, Wirths M et al. Adaptive designs for two candidate primary time-to-event endpoints. Stat Biopharm Res 2016; 8(2):207-216.
3. Delorme P, Lafaye de Micheaux P et al. Type-II generalized family-wise error rate formulas with application to sample size determination. Stat Med 2016; 35(16):2687-2714.

This studentship is for full time study only.

Available for commencement in Michaelmas Term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only – see application procedures for eligibility criteria.

To apply for this project, please see our Application procedures page.

___________________________________________________________________

### Bayesian dose adaptive trials using non-myopic response-adaptive methods

**Supervisors – Sofia Villar (BSU) and Adrian Mander (BSU)**

In dose-finding studies the aim is to find the maximum tolerated dose of an agent, or a dose closest to a target dose. In dose-ranging studies different doses of an agent are tested against each other to establish which dose works best and/or is least harmful, by estimating a dose-response relationship. However, achieving either of these goals with high precision can mean exposing a large number of patients to highly toxic doses, imposing a learning-earning trade-off. Although extensive recent work has used decision theory to address such trade-offs in the design of clinical trials [1], little work has been done to extend this framework to dose-finding/dose-ranging studies. A decision-theoretic approach makes it possible to take into account the interests of patients both within and outside the trial, and to derive a patient allocation rule that acknowledges the conflict between the interests of each individual patient and those of future patients. This idea was proposed earlier in the literature (e.g. a framework for dose-finding trials using the theory of bandit problems was proposed by Leung and Wang [2]), yet because finding the optimal strategy for this type of bandit with dependent arms is computationally infeasible in most relevant cases, the approach has not been developed further.
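To make the learning-earning trade-off concrete, here is a minimal sketch of a *myopic* response-adaptive rule (Thompson sampling with Beta posteriors) for a two-arm binary-outcome trial; the non-myopic decision-theoretic rules this project targets go beyond this baseline. The response rates and trial sizes are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
true_rates = np.array([0.3, 0.5])   # assumed response probabilities (control, experimental)
n_patients, n_trials = 200, 50

alloc_better = []                   # patients allocated to the better arm in each simulated trial
for _ in range(n_trials):
    s = np.zeros(2)                 # successes per arm
    f = np.zeros(2)                 # failures per arm
    n = np.zeros(2, dtype=int)      # allocations per arm
    for _ in range(n_patients):
        draws = rng.beta(1 + s, 1 + f)   # one draw from each arm's Beta(1+s, 1+f) posterior
        arm = int(np.argmax(draws))      # Thompson sampling: allocate to the sampled best arm
        resp = rng.random() < true_rates[arm]
        n[arm] += 1
        s[arm] += resp
        f[arm] += 1 - resp
    alloc_better.append(n[1])

mean_better = np.mean(alloc_better)
print(f"mean patients allocated to the better arm: {mean_better:.1f} / {n_patients}")
```

The allocation skews towards the better arm as evidence accumulates; a non-myopic rule would additionally account for the value of future learning when choosing each allocation.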

This PhD project will look at developing decision-theoretic non-myopic response-adaptive dose-ranging methodology for dose-ranging and dose-finding studies. The project will make use of recent advances in bandit theory to try and reduce the computational complexity of finding the optimal (or nearly optimal) solution derived from a set of relevant optimisation problems. The PhD will cover some of the following areas:

- Use and extend existing response-adaptive randomisation rules to be incorporated into the design of dose-escalation studies.
- Investigate novel optimal response-adaptive designs that can handle multivariate, conflicting outcomes (efficacy-toxicity).
- Assess how these methods perform for estimation purposes and for patient-benefit decisions (administering doses nearest to the target toxicity level).
- Use the dynamic optimisation (bandit) literature to develop suitable and practical non-myopic adaptive randomisation methods specifically designed for dose adaptive trials.
- Produce easy-to-use software in R and/or Stata to implement the methods.
- Compare the resulting decision-based designs to real trials.

**References:**

1. Villar S, Wason J, Bowden J. Multi-armed bandit models for the optimal design of clinical trials: benefits and challenges. Statistical Science 2015; 30(2):199-215.
2. Leung D, Wang YG. An extension of the continual reassessment method using decision theory. Statistics in Medicine 2002; 21(1):51-63.
3. Fan S, Wang YG. Decision-theoretic designs for dose-finding clinical trials with multiple outcomes. Statistics in Medicine 2006; 25(10):1699-1744.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only – see application procedures for eligibility criteria and to apply.

___________________________________________________________________

### Developing Bayesian non-myopic response-adaptive randomisation for the case of delayed endpoint observation

**Supervisors – Sofia Villar (BSU) and Adrian Mander (BSU)**

Before a novel treatment is made available to the wider public, clinical trials are undertaken to provide unbiased and reliable evidence that the treatment is safe and efficacious. The standard approach for such confirmatory clinical trials is to compare only two treatment options, and it requires a large number of patients to be recruited. This approach does not fit well with the development of treatments for conditions in which there is a large number of potential treatments to explore and relatively few patients affected by the disease who could be enrolled in a trial. This is the case for drug development for rare types of cancer.

A promising alternative to the standard approach in the context described above is response-adaptive randomisation (i.e. changing the allocation probabilities as outcome data are collected, to favour promising treatments). By designing a trial that incorporates a response-adaptive patient allocation rule, promising treatments can be identified quickly while more patients are allocated to them. The response-adaptive randomisation rules that exhibit the best performance in terms of patient benefit are the so-called non-myopic rules, which unfortunately suffer from a prohibitive computational burden. Developing computationally feasible and practical methods to apply these ideas to trial design, as a way of improving the success rate of Phase III clinical trials, is therefore of great current interest. At the Biostatistics Unit we have made a start on this by developing a non-myopic group response-adaptive randomisation method called the ‘forward looking Gittins index’ rule [1,2] for the case of dichotomous endpoints.

This PhD project will look at extending existing non-myopic response-adaptive randomisation methodology to cover the case of delayed outcomes. This is particularly relevant for trials in which the endpoint is survival. The project will investigate novel optimal adaptive designs that can use both observed response and partial information (derived from the delayed response). Therefore, these methods will be closer to the real world situations being handled by trials in which the endpoint is not necessarily best modelled as binary and immediately observable. The PhD will cover some of the following areas:

- To model the patient allocation problem with delayed patient responses as an optimal sequential decision-making problem in the stochastic dynamic programming framework.
- To design index policies and compare them to existing approaches in terms of statistical and optimality performance.
- To develop and study efficient algorithms for optimal solutions, create a software package, and collaborate with statisticians and clinicians to apply the designed solutions in real clinical trials.
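The stochastic dynamic programming formulation in the first point, and the computational burden that motivates the search for efficient algorithms, can be illustrated with a toy backward-induction computation for a two-armed Bernoulli bandit with uniform Beta priors and *immediately* observed outcomes — exactly the simplifying assumption that delayed endpoints break. The horizon and priors here are illustrative assumptions.

```python
from functools import lru_cache

T = 12  # trial horizon (number of patients)

@lru_cache(maxsize=None)
def value(s1, f1, s2, f2, t):
    """Maximum expected number of successes from patient t onwards,
    given Beta(1+s, 1+f) posteriors on each arm's response rate."""
    if t == T:
        return 0.0
    best = 0.0
    for arm, (s, f) in enumerate([(s1, f1), (s2, f2)]):
        p = (s + 1) / (s + f + 2)  # posterior mean response probability
        if arm == 0:
            v = (p * (1 + value(s1 + 1, f1, s2, f2, t + 1))
                 + (1 - p) * value(s1, f1 + 1, s2, f2, t + 1))
        else:
            v = (p * (1 + value(s1, f1, s2 + 1, f2, t + 1))
                 + (1 - p) * value(s1, f1, s2, f2 + 1, t + 1))
        best = max(best, v)
    return best

optimal = value(0, 0, 0, 0, 0)
print(f"Bayes-optimal expected successes over {T} patients: {optimal:.3f}")
# the number of posterior states grows like T^4 for two arms,
# and exponentially in the number of arms -- hence the computational burden
print(f"states evaluated: {value.cache_info().currsize}")
```

Always pulling one arm under a uniform prior yields T/2 expected successes, so the optimal value strictly exceeds that; with delayed outcomes the state must also track outstanding, unobserved patients, which enlarges the state space further.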

The student will also have the opportunity to collaborate with researchers from Lancaster University who are experts in stochastic dynamic programming approaches and in adaptive designs.

**References:**

1. Gittins J, Jones D. A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 1979; 66(3):561-565.
2. Villar S, Wason J, Bowden J. Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule. Biometrics 2015; 71(4):969-978.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only – see application procedures for eligibility criteria and to apply.

_____________________________________________________

### Methods for integrating and splitting complex/big models

**Supervisors – Robert Goudie (BSU) and Lorenz Wernisch (BSU)**

Synthesis of evidence from multiple sources (data and expert opinion) and from different study designs is increasingly common in all areas of science, including in infectious disease epidemiology, health technology assessment and omics (genomics, proteomics etc). Combining information sources often results in more precise and useful inferences, especially when some data are incomplete or biased.

However, using joint “big models” of several sources of evidence, including data and expert opinion, is inferentially and computationally challenging. It is often sensible to take a modular approach in which separate sub-models are considered for smaller, more manageable parts of the available data/evidence. Each sub-model is simpler (lower-dimensional) than the “big model” and so will be easier to construct and use.

In a Bayesian framework, the sub-models should be integrated into a joint model, so that all data and uncertainty are fully accounted for. This can be challenging to do, but at the Biostatistics Unit we have recently proposed a novel approach to this problem called Markov melding [1], building on ideas from the graphical models literature. This promises to enable fully Bayesian inference in settings where this was not previously possible, and to allow splitting the computation required for large models into smaller pieces (which may be computationally advantageous). However, it remains an open problem how best to join together these pieces into inference for the joint model.

This PhD project would particularly suit a student interested in computational and methodological statistics, since there is considerable scope for new methodology and algorithms in this area. The PhD will involve working towards developing, implementing and assessing promising approaches. There is the potential to draw upon and extend ideas in the connected literatures that are developing in this area, including divide-and-conquer/parallel computation methods for “big data” (such as large n “tall data”); newly-developed approximate methods for estimating the ratio of two densities; pseudo-marginal MCMC; and connections to sequential Monte Carlo. There is also scope to study the application of these methods in substantive application areas, including in network meta-analysis.
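A toy version of the integrate-and-split idea, under strong simplifying assumptions (a single shared parameter, conjugate normal sub-models with known variance): combining the two sub-posteriors while dividing out the duplicated prior exactly recovers the full-data posterior. This is only a caricature of Markov melding [1], where the sub-models are far more complex and no closed form exists.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared parameter theta with prior N(m0, v0); two data sources, each N(theta, sigma^2)
m0, v0, sigma2 = 0.0, 4.0, 1.0
theta_true = 1.5
y1 = rng.normal(theta_true, np.sqrt(sigma2), size=30)
y2 = rng.normal(theta_true, np.sqrt(sigma2), size=50)

def posterior(ys, m0, v0, sigma2):
    """Conjugate normal posterior (mean, variance) for theta given data ys."""
    prec = 1 / v0 + len(ys) / sigma2
    mean = (m0 / v0 + ys.sum() / sigma2) / prec
    return mean, 1 / prec

# Sub-model posteriors, each fitted separately with its own copy of the prior
m1, var1 = posterior(y1, m0, v0, sigma2)
m2, var2 = posterior(y2, m0, v0, sigma2)

# Combine: multiply the two sub-posterior densities and divide by the prior once,
# so the shared prior is not double-counted (precisions add; one prior precision is removed)
prec_meld = 1 / var1 + 1 / var2 - 1 / v0
mean_meld = (m1 / var1 + m2 / var2 - m0 / v0) / prec_meld

# Reference: the full-data posterior computed in one piece
m_full, var_full = posterior(np.concatenate([y1, y2]), m0, v0, sigma2)

print(f"melded:    mean {mean_meld:.4f}, var {1 / prec_meld:.4f}")
print(f"full-data: mean {m_full:.4f}, var {var_full:.4f}")
```

In the Gaussian case the split-and-recombine step is exact; the open research problem described above is how to perform this joining well when the sub-posteriors are known only through samples.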

[1] https://arxiv.org/abs/1607.06779

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

Full award to cover Cambridge University fees and a competitive stipend for a period of 3 years only – see application procedures for eligibility criteria and to apply.

____________________________________________________________

### Leveraging reference omics datasets to improve power in stratified approaches to patient treatment decisions

**Supervisor – Chris Wallace**

High dimensional biological assays are now regularly used to profile multiple aspects of clinical and biological samples. Each of several thousand measured predictors is then often tested, in parallel univariate analyses, for association with a phenotype or clinical outcome of interest. With such high dimensional data and an underlying assumption that several predictors may be truly associated, control of the family-wise type 1 error rate (the chance of making at least one false rejection of a null hypothesis) has come to be viewed as overly conservative in many circumstances. The false discovery rate (FDR) (Benjamini and Hochberg, 1995), defined as the expected proportion of rejected null hypotheses that are true, has been widely adopted as a less conservative approach to deal with the multiplicity of hypotheses considered. Empirical Bayes can also be used to define a local fdr, the probability that a given rejected hypothesis is truly null, and the FDR can be expressed as the integral of the local fdr over the region corresponding to rejection (Efron, 2007).
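The FDR machinery described above can be made concrete with the standard Benjamini-Hochberg step-up procedure; the fifteen p-values below are a textbook set often used to illustrate it.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Return a boolean mask of rejected hypotheses at FDR level q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m   # the step-up thresholds q*k/m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # reject all hypotheses up to that rank
    return reject

p = [0.0001, 0.0004, 0.0019, 0.0095, 0.0201, 0.0278, 0.0298,
     0.0344, 0.0459, 0.3240, 0.4262, 0.5719, 0.6528, 0.7590, 1.0000]
rejected = benjamini_hochberg(p, q=0.05)
print(f"{rejected.sum()} of {len(p)} hypotheses rejected at FDR 0.05")
```

The cFDR methods discussed next replace the single ranking of p-values with a two-dimensional rejection region, which is what complicates the analogous control argument.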

This project will explore a related concept, the conditional FDR (cFDR), which has been proposed to incorporate information from tests of related hypotheses on the same biomarkers. This has been applied in genetic studies (Andreassen and others, 2013), where the same sets of hundreds of thousands of variants may be tested for association with different but aetiologically related diseases. However, the second dimension introduced causes additional complexity when determining overall FDR control (Liley and Wallace, 2015). This will be harder still to deal with for genomic data, where the sparser data (typically 1-2 orders of magnitude smaller than genetic data) make it difficult to accurately estimate the distribution functions and regions over which we need to integrate.

This PhD project will initially explore how the cFDR concept may be adapted to genomic data, with opportunities for extending this method, for introducing information from other per-hypothesis covariates, or indeed for developing completely novel methods to borrow information from genomic data generated on related samples. The aim is ultimately to apply these methods to genomic biomarker data generated by collaborators, to identify predictors of treatment response in autoimmune diseases and thus improve the chances that a patient is given the right treatment early in their disease.

**References:**

- Andreassen et al. (2013). Improved detection of common variants associated with schizophrenia and bipolar disorder using pleiotropy-informed conditional false discovery rate. PLoS Genet. 9(4), e1003455.
- Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol. 57(1), 289-300.
- Efron (2007). Size, power and false discovery rates. Ann. Stat. 35(4), 1351-1377.
- Liley and Wallace (2015). A pleiotropy-informed Bayesian false discovery rate adapted to a shared control design finds new disease associations from GWAS summary statistics. PLoS Genet. 11(2), e1004926.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

____________________________________________________________

### Matching approaches for efficient and robust causal inferences in Mendelian randomization

**Supervisor – Stephen Burgess**

Mendelian randomization is the use of genetic variants as proxy measurements for a modifiable exposure to judge whether interventions on the exposure are likely to reduce disease risk. Mendelian randomization has a history of correctly predicting the results of randomized trials of pharmacological interventions, and has wide applicability in a broad range of scientific fields for making the crucial distinction between causation and correlation.

The aim of this project is to investigate the use of traditional approaches to causal inference in Mendelian randomization, in particular: i) matching on covariates, and ii) matching by design. The use of covariates for matching should lead to more efficient and potentially more robust estimators, as covariates are similar within matched pairs. Such approaches have not previously been adequately considered in the context of Mendelian randomization, but should be feasible in the UK Biobank dataset. Another reason for using matching in this context is to divide the UK Biobank dataset into a discovery cohort (comprising non-diseased individuals) and a validation dataset (comprising diseased individuals and matched controls). This enables the dataset used to choose which genetic variants to include in the analysis to be kept separate from the dataset in which the causal hypothesis is tested.

Alternatively, paired designs, such as the comparison of sibling pairs, may be worthwhile in Mendelian randomization, particularly for rare genetic variants that naturally cluster within families, and hence are unlikely to satisfy the Mendelian randomization assumptions. While population-based studies have overtaken family-based studies in the current GWAS era, recall-by-genotype experiments (where additional carriers of rare variants are found amongst the relatives of those who carry the variants) will lead to matched analyses becoming increasingly important. Methods will be developed using data from the Swedish Twin Registry.
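The basic Mendelian randomization idea can be sketched with simulated data and the simplest instrumental-variable estimator (the Wald ratio). The variant, confounder, and effect sizes below are all invented for illustration, and the matching approaches this project proposes are not shown; the point is only the distinction between causation and correlation.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Simulated data: genetic variant G (the instrument), unmeasured confounder U,
# exposure X and outcome Y; the true causal effect of X on Y is 0.5
g = rng.binomial(2, 0.3, size=n)          # variant with assumed allele frequency 0.3
u = rng.normal(size=n)                    # confounder of both X and Y
x = 0.4 * g + u + rng.normal(size=n)
y = 0.5 * x + u + rng.normal(size=n)

# Naive observational estimate: regression of Y on X, biased upwards by U
naive = np.cov(x, y)[0, 1] / np.var(x)

# Wald ratio: effect of G on Y divided by effect of G on X
ratio = (np.cov(g, y)[0, 1] / np.var(g)) / (np.cov(g, x)[0, 1] / np.var(g))

print(f"naive regression estimate:                    {naive:.3f}")
print(f"Mendelian randomization (Wald ratio) estimate: {ratio:.3f}")
```

Because G is randomly allocated at conception and (by assumption) affects Y only through X, the ratio estimate recovers the causal effect while the naive regression does not.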

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

_____________________________________________________________

### Experimental design for inference of gene networks from single cell data

**Supervisors – John Reid (BSU) and Steven Hill (BSU)**

Gene regulatory networks control almost all cellular functions. The ability to accurately reconstruct these networks would greatly further our understanding of many diseases, genetic conditions and developmental biology. However only limited progress has been made reverse engineering these networks using the data available from modern high-throughput biological experiments.

The space of all possible undirected network structures grows exponentially in the number of genes and thus network inference is underdetermined for networks of any reasonable size. However, the network inference problem is well-posed in abstraction and this makes it an attractive problem to study. This low barrier to entry, together with its biological importance, means that network inference has been extensively studied over the last two decades. Many inference methods have been developed that work with various types of experimental designs [1].
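As a minimal sketch of what network inference from expression data involves, the following recovers the conditional-independence (edge) structure of a small simulated system from the estimated precision matrix via partial correlations. The chain structure, sample size, and threshold are illustrative assumptions; real single-cell data are far noisier and higher-dimensional.

```python
import numpy as np

rng = np.random.default_rng(3)
n_cells, n_genes = 5000, 4

# True conditional-independence structure: a chain 0 - 1 - 2 - 3,
# encoded in a sparse precision (inverse covariance) matrix
precision = np.eye(n_genes)
for i in range(n_genes - 1):
    precision[i, i + 1] = precision[i + 1, i] = -0.4

cov = np.linalg.inv(precision)
expr = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_cells)  # stand-in for expression data

# Estimate the precision matrix and convert to partial correlations:
# rho_ij = -Theta_ij / sqrt(Theta_ii * Theta_jj)
theta = np.linalg.inv(np.cov(expr, rowvar=False))
d = np.sqrt(np.diag(theta))
partial = -theta / np.outer(d, d)
np.fill_diagonal(partial, 0.0)

# Keep edges whose partial correlation clearly exceeds noise level
edges = {(i, j) for i in range(n_genes) for j in range(i + 1, n_genes)
         if abs(partial[i, j]) > 0.2}
print("recovered edges:", sorted(edges))
```

With few cells and many genes this inversion is ill-conditioned, which is one concrete form of the underdetermination described above and part of what well-chosen perturbation experiments are meant to resolve.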

Perturbation experiments measure a system’s characteristics in conditions other than its natural state. For example, in gene knockdown experiments, one or more genes are artificially silenced. Data from perturbation experiments are some of the most informative for network inference, as the effect of a small change to the network can be accurately assessed. However, they are expensive and time-consuming to perform, and typically biologists can only perform a handful of perturbations. Usually the perturbed genes are chosen by the experimenter in an ad hoc fashion. This project will develop methods for experimental design (that is, how to choose which gene(s) to perturb) in order to maximise the value of information from each experiment. Some work exists on experimental design in this context [2–7], but in general this field has not been studied nearly as extensively as the network inference problem.

Recently techniques have been developed to assay gene expression levels in individual cells. Previously genome-wide expression levels could only be measured as averages across populations of thousands of cells. The newly available single cell data allow us to inspect the correlations and relationships between genes in fine detail. In particular the between-cell variation in a population of cells can be characterised. This project will focus on experimental design for single cell experiments.

Most network inference techniques provide point estimates of the network structure. This is a reasonable strategy given the difficulty of exploring the entire space of networks. However to reliably gauge the likely amount of information gained from any particular experimental perturbation, methods to estimate correlations and uncertainty in the posterior will need to be developed.

Given these correlations and uncertainties, methods to choose which genes to perturb will be explored. It is anticipated that the methods developed will be Bayesian, as they naturally quantify uncertainty.

This project will start by focusing on the theoretical aspects of experimental design for single cell data. However it is expected that the methods developed will be of interest to many potential collaborators and the methods will be used to choose experimental perturbations of the biological systems that the collaborators study.

**References**

1. Riet De Smet and Kathleen Marchal. “Advantages and Limitations of Current Network Inference Methods”. In: Nature Reviews Microbiology 8.10 (Oct. 2010), pp. 717–729.
2. C. David Page Jr and Irene M. Ong. “Experimental Design of Time Series Data for Learning from Dynamic Bayesian Networks”. In: Pacific Symposium on Biocomputing. Vol. 11. 2006, pp. 267–278.
3. Jesper Tegnér and Johan Björkegren. “Perturbations to Uncover Gene Networks”. In: Trends in Genetics 23.1 (Jan. 1, 2007), pp. 34–41.
4. Florian Steinke, Matthias Seeger, and Koji Tsuda. “Experimental Design for Efficient Identification of Gene Regulatory Networks Using Sparse Bayesian Models”. In: BMC Systems Biology 1.1 (2007), p. 51.
5. Johannes D. Stigter and Jaap Molenaar. “Network Inference via Adaptive Optimal Design”. In: BMC Research Notes 5 (2012), p. 518.
6. Alberto Giovanni Busetto et al. “Near-Optimal Experimental Design for Model Selection in Systems Biology”. In: Bioinformatics 29.20 (Oct. 15, 2013), pp. 2625–2632.
7. S. M. Minhaz Ud-Dean and Rudiyanto Gunawan. “Optimal Design of Gene Knockout Experiments for Gene Regulatory Network Inference”. In: Bioinformatics 32.6 (Mar. 15, 2016), pp. 875–883.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

_____________________________________________________________________

### Model-free network inference from single cell data

**Supervisor – John Reid (BSU)**

Gene regulatory networks control almost all cellular functions. The ability to accurately reconstruct these networks would greatly further our understanding of many diseases, genetic conditions and developmental biology. However, only limited progress has been made reverse engineering these networks using data available from modern high-throughput biological experiments [1].

Recently techniques have been developed to assay gene expression levels in individual cells. Previously genome-wide expression levels could only be measured as averages across populations of thousands of cells. The newly available single cell data allow us to inspect the correlations and relationships between genes in fine detail. In particular the between-cell variation in a population of cells can be characterised. It is anticipated that single cell data will greatly aid the reconstruction of gene regulatory networks. To date only a few inference methods have been developed specifically for single cell data [2, 3].

Classical network inference is posed as a network edge prediction task given a gene-by-sample data matrix of gene expression levels. In this formulation, when the true network is known the predictions can be validated using precision and recall or other similar statistics [4]. This project will take an alternative approach and focus on model-free approaches to modelling such data. By model-free we mean methods that do not explicitly represent the structure and parameters of the network.

Model-free approaches are the state-of-the-art for modelling certain physical systems [5]. They are able to accurately learn the dynamics of complicated systems with no prior knowledge of the physical relationships between the variables [6]. This project will investigate how to translate their success learning the dynamics of physical systems to the problem of learning the dynamics of gene expression. One model-free approach for regulatory network inference could be to learn the dynamics of the system using a deep neural network [7] or a Bayesian nonparametric model such as a Gaussian process dynamical model [8]. In this approach single cell data from a time series experiment would be placed along a pseudotime dimension [9]. The dynamics of gene expression relative to this pseudotime would be learnt by the model.
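As a toy stand-in for the model-free approaches discussed (the project envisages deep networks [7] or Gaussian process dynamical models [8]), the sketch below learns a gene's expression trajectory along pseudotime with a Nadaraya-Watson kernel smoother, assuming nothing about the parametric form of the dynamics. The trajectory, noise level, and bandwidth are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Simulated single-cell snapshot: pseudotime t in [0, 1] and one gene whose
# expression follows a smooth trajectory unknown to the method
t = rng.uniform(0, 1, size=1000)
expr = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)

def nw_smooth(t_query, t_obs, y_obs, bandwidth=0.05):
    """Nadaraya-Watson estimate of E[expression | pseudotime]: a kernel-weighted
    average with no explicit model of the regulatory dynamics."""
    w = np.exp(-0.5 * ((t_query[:, None] - t_obs[None, :]) / bandwidth) ** 2)
    return (w * y_obs).sum(axis=1) / w.sum(axis=1)

grid = np.linspace(0.05, 0.95, 19)
fitted = nw_smooth(grid, t, expr)
rmse = np.sqrt(np.mean((fitted - np.sin(2 * np.pi * grid)) ** 2))
print(f"RMSE of model-free fit to the true trajectory: {rmse:.3f}")
```

The methods the project will study differ in sophistication but share this character: the dynamics are learnt directly from data rather than read off an explicit network structure.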

Perturbation experiments measure a system’s characteristics in conditions other than its natural state. For example in gene knockout experiments, one or more genes are artificially silenced. Data from perturbation experiments are some of the most informative for network inference as the effect of a small change to the network can be accurately assessed. We will be interested in developing model-free methods that can recapitulate the behaviour of a system under perturbations. Only in this case will we be able to interrogate the model and confidently infer which regulatory relationships are present.

It is anticipated that this project will be largely theoretical but the goal will be to develop methods that further our understanding of the processes that generate single cell gene expression data. We have many collaborators with such data and we wish to develop practical methods that can help them understand the underlying biology.

**References**

1. Riet De Smet and Kathleen Marchal. “Advantages and Limitations of Current Network Inference Methods”. In: Nature Reviews Microbiology 8.10 (Oct. 2010), pp. 717–729.
2. Andrea Ocone et al. “Reconstructing Gene Regulatory Dynamics from High-Dimensional Single-Cell Snapshot Data”. In: Bioinformatics 31.12 (June 15, 2015), pp. i89–i96.
3. Pablo Cordero and Joshua M. Stuart. “Tracing Co-Regulatory Network Dynamics in Noisy, Single-Cell Transcriptome Trajectories”. In: bioRxiv (Oct. 4, 2016), p. 070151.
4. D. Marbach et al. “Revealing Strengths and Weaknesses of Methods for Gene Network Inference”. In: Proceedings of the National Academy of Sciences 107.14 (Mar. 22, 2010), pp. 6286–6291.
5. Marc Peter Deisenroth, Dieter Fox, and Carl Edward Rasmussen. “Gaussian Processes for Data-Efficient Learning in Robotics and Control”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 37.2 (Feb. 2015), pp. 408–423.
6. PilcoLearner. URL: https://www.youtube.com/user/PilcoLearner.
7. Y. Bengio. “Learning Deep Architectures for AI”. In: Foundations and Trends® in Machine Learning 2.1 (2009), pp. 1–127.
8. Raquel Urtasun, David J. Fleet, and Pascal Fua. “3D People Tracking with Gaussian Process Dynamical Models”. In: 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06). Vol. 1. IEEE, 2006, pp. 238–245.
9. John E. Reid and Lorenz Wernisch. “Pseudotime Estimation: Deconfounding Single Cell Time Series”. In: Bioinformatics (June 17, 2016), btw372.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

___________________________________________________________________

### Dimension reduction techniques in the field of brain imaging

**Supervisor- Simon White (BSU)**

Modern brain scanning technologies, such as functional magnetic resonance imaging (fMRI), generate extremely detailed images of each individual. A growing area of research concerns linking neuro-imaging data with cognitive and behavioural measures to better understand how changes in the brain relate to cognitive changes.

The high dimensionality of the imaging data presents many issues for analysis, not least a large computational burden. Although there is substantial variability between subjects, it is believed that the variability of interest can be captured in a sub-space of much lower dimension. Dimension reduction techniques transform the original data onto this lower-dimensional space; the question of interest is then what information and features are lost or preserved under these transformations.

Using imaging data from a large collaborative study (Cam-CAN, http://cam-can.org/), this project will build on existing dimension reduction techniques currently used in the field of neuro-imaging, such as principal component analysis (PCA) and independent component analysis (ICA). The project will then proceed to consider extensions to include cognitive outcomes, addressing issues of computation and complexity of interpretation, and develop generalisable implementations for application in the neuro-imaging field.
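As an illustration of the dimension-reduction step, PCA can be computed directly from the singular value decomposition of the centred data matrix. The sketch below uses simulated data standing in for images; the subject and voxel counts, and the three-dimensional latent structure, are hypothetical choices for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 50 subjects, 10,000 voxels, with most between-subject
# variability confined to a 3-dimensional sub-space plus measurement noise.
n_subjects, n_voxels, k_true = 50, 10_000, 3
scores = rng.normal(size=(n_subjects, k_true))    # latent subject factors
loadings = rng.normal(size=(k_true, n_voxels))    # spatial patterns
X = scores @ loadings + 0.1 * rng.normal(size=(n_subjects, n_voxels))

# PCA via SVD of the mean-centred data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)

# The first k_true components should capture almost all of the variance,
# illustrating why a low-dimensional sub-space can stand in for the images.
print(f"variance explained by first 3 components: {explained[:3].sum():.3f}")
low_dim = Xc @ Vt[:k_true].T   # subjects projected to 3-D scores
print(low_dim.shape)
```

ICA differs in estimating an unmixing under non-Gaussianity assumptions rather than via the SVD, but the projection of each subject to a low-dimensional score vector is analogous.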

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

___________________________________________________________________

### Investigating the role of single-arm trials in drug development plans

**Supervisors- Adrian Mander (BSU), Michael Grayling (BSU) and James Wason (BSU)**

Phase II oncology trials have classically been single-arm in design. However, the lack of randomisation in such trials means it is easy for selection bias to be introduced, and their results can also be heavily influenced by variability in the historical control rate, or by temporal drift in patient response. It is thus not surprising that there has recently been much interest in whether it would be preferable to utilise a randomised design in phase II, with studies attempting to resolve this debate proposing that, when possible, randomised designs should be preferred [1].

Accordingly, it is logical to question the role single-arm trials should play in modern drug development. Exploring this, a recent publication provided the first discussion of the efficiency of a phase II drug development plan consisting of a single-arm trial followed by a randomised two-arm trial [2]. It demonstrated that in some circumstances such a development plan is the optimal one for minimising the expected sample size.

Whilst results indicated that a single-arm to randomised two-arm development plan can be highly efficient, it remains unclear how precisely such a plan should be designed and analysed. In what way should the results of the single-arm trial inform the design of the following randomised trial? Can the data gathered from both the single-arm and randomised trials be combined to better estimate the experimental regimen’s treatment effect? How would such considerations be affected if the single-arm trial were to assess a binary endpoint, as is typical in oncology, but the longer-duration randomised trial were to utilise a normal or survival endpoint? In this project, we propose to begin by investigating these questions. Specifically, we will look to develop a complete framework for the design and analysis of seamless single-arm to randomised two-arm trials.

Furthermore, whilst conventional single-arm and randomised trials have now been extensively compared, the suitability of multiple single-arm trials, versus a single adaptive enrichment trial design, for identifying efficacious treatment regimens in subgroups of interest has not yet been examined. Such considerations are particularly important given that the former design has been used in practice, for example by the National Lung Matrix Trial. It may be that the efficiency gains from sharing a control group mean that a randomised design is almost universally preferable to multiple single-arm trials when a large number of drugs and/or subgroups are present in a trial. We will look to answer this by exploring the efficiency of multiple single-arm trials at enriching for subgroup-treatment interactions of interest.

**References**

- Tang H, Foster NR, Grothey A, et al. Comparison of error rates in single-arm versus randomized phase II cancer clinical trials. J Clin Oncol 2010; 28:1936-41.
- Grayling MJ, Mander AP. Do single-arm trials have a role in drug development plans incorporating randomised trials? Pharm Stat 2016; 15:143-51.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

_______________________________________________________________________

### Bayesian dose adaptive trials with multiple outcomes

**Supervisors- Adrian Mander (BSU) and Graham Wheeler (UCL)**

A recently completed dose-ranging trial (Todd *et al.*, PLOS Medicine, 2016) was designed to find two targeted doses of the biological agent Proleukin that resulted in a 10% or 20% immune response (as measured by the change in the amount of regulatory T-cells) in newly diagnosed diabetes patients. Proleukin is administered by injection and any dose can be administered within the safe therapeutic range. Dose decisions were made by using optimal design theory of minimising the variance of the doses that gave the targeted responses. A follow-up study is planned to identify the best dose and frequency of repeat administration of dose. The primary outcomes are laboratory measurements of three blood-based markers and dose-changing decisions are made using a multivariate regression model.

This PhD project will look at extending existing dose-ranging methodology by designing and investigating novel optimal adaptive designs that can handle multivariate outcomes. These new approaches will be closer to the real world situations faced in the decision-making process early on in dose-ranging clinical trials. This may include aims such as:

- Quantifying the information loss from using dimension reduction techniques, such as principal components analysis, and from using univariate outcomes;
- Investigating model-robust methods such as Bayesian model averaging techniques and likelihood-based information methods (previous BSU PhD work);
- Using historical data to inform dose-changing decisions, e.g. using data from the single Proleukin dose study in the second study via techniques such as commensurate and power priors (Hobbs *et al.*, Biometrics 2011);
- Exploring fully Bayesian approaches to handle uncertainty in parameter estimates;
- Using penalised D-optimality methods (Pronzato, J. Stat. Planning 2010) when safety endpoints are in the multivariate outcome;
- Producing easy-to-use software in R and/or Stata to implement the methods.
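One of the aims above, using historical data via a power prior, can be sketched in its simplest conjugate form. This is a minimal illustration rather than the trial’s actual model: a normal outcome with known standard deviation, a flat initial prior, and entirely hypothetical data, with the historical likelihood raised to a discounting power a0 in [0, 1]:

```python
import numpy as np

# Power-prior sketch: historical (single-dose study) data are down-weighted
# by a0 before being combined with the current study's data. With a normal
# outcome, known sigma and a flat initial prior, the posterior mean is a
# precision-weighted average in which the historical precision is scaled by a0.
def power_prior_posterior(y_curr, y_hist, sigma, a0):
    n_c, n_h = len(y_curr), len(y_hist)
    prec_c = n_c / sigma**2
    prec_h = a0 * n_h / sigma**2          # historical likelihood raised to a0
    post_prec = prec_c + prec_h
    post_mean = (prec_c * np.mean(y_curr) + prec_h * np.mean(y_hist)) / post_prec
    return post_mean, 1.0 / post_prec      # posterior mean and variance

rng = np.random.default_rng(1)
y_hist = rng.normal(0.10, 0.05, size=40)   # hypothetical historical responses
y_curr = rng.normal(0.12, 0.05, size=20)   # hypothetical current-trial data

for a0 in (0.0, 0.5, 1.0):
    m, v = power_prior_posterior(y_curr, y_hist, sigma=0.05, a0=a0)
    print(f"a0={a0:.1f}: posterior mean={m:.4f}, sd={np.sqrt(v):.4f}")
```

At a0 = 0 the historical data are ignored entirely; at a0 = 1 they are pooled at full weight; intermediate values trade bias against variance, which is the design question such borrowing methods must address.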

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

________________________________________________________________________

### Methods for using high-dimensional biomarker information prospectively in clinical trials

**Supervisors- James Wason (BSU) and Paul Newcombe (BSU)**

With advances in high-throughput biological techniques, huge numbers of potentially predictive biomarkers are becoming routinely collected in modern clinical trials. However, current designs do not make best use of these data, and there is potential for better approaches that will provide more information on which subgroups of patients benefit and which don’t.

The adaptive signature design (ASD) of Freidlin and Simon [1] is a trial design that was developed to make better use of biomarker data. It aims to: 1) develop a predictive biomarker signature that classifies patients as ‘sensitive’ or ‘non-sensitive’ to the treatment; 2) test the treatment effect in sensitive patients; and 3) test the treatment effect in all patients. An alternative approach is the adaptive enrichment design (AED), in which the eligibility criteria of patients are adapted within the trial according to observed efficacy in biomarker subgroups. Proposed methodology for AEDs is limited to the use of one pre-specified biomarker.

During this project, the student will learn about state-of-the-art statistical techniques from the fields of adaptive clinical trials and high-dimensional statistical analysis. They will then work on combining these fields in order to propose designs that can improve on the ASD and AED.

The ASD method, as currently proposed, develops the predictive biomarker signature by testing for an interaction between treatment assignment and each biomarker separately. This technique is known to have sub-optimal properties when there are many correlated biomarkers to choose from: biomarkers associated with the same underlying causal effect are likely to be incorrectly included, leading to lower predictive ability of the signature and over-confident predictions. We will seek to modernise the ASD with state-of-the-art variable selection methods such as Bayesian sparse regression [2], so that the ASD has good performance for correlated high-dimensional biomarker data.
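The problem with separate per-biomarker interaction tests under correlation can be seen in a small simulation (all quantities hypothetical). One biomarker truly modifies the treatment effect; two others have no effect of their own but are correlated with it, and their marginal interaction estimates are large regardless:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 5

# Hypothetical data: biomarker 0 truly modifies the treatment effect;
# biomarkers 1 and 2 are correlated with it but causally inert.
z = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, 0] = z
X[:, 1] = 0.9 * z + 0.45 * rng.normal(size=n)
X[:, 2] = 0.9 * z + 0.45 * rng.normal(size=n)
treat = rng.integers(0, 2, size=n)                 # 1:1 randomisation
y = 0.5 * treat + 1.0 * treat * X[:, 0] + rng.normal(size=n)

# ASD-style screening: fit one interaction model per biomarker.
def interaction_coef(j):
    D = np.column_stack([np.ones(n), treat, X[:, j], treat * X[:, j]])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    return beta[3]                                 # treatment-by-biomarker term

coefs = np.array([interaction_coef(j) for j in range(p)])
print(np.round(coefs, 2))
# The correlated null biomarkers 1 and 2 also pick up sizeable interaction
# estimates -- the motivation for joint (e.g. sparse penalised) selection.
```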

We will then work on applying similar methodology to extend the AED so that it too can be used with high-dimensional biomarker information. Such a trial design would develop a biomarker classifier at an interim analysis that can be used to determine whether future patients would benefit or be harmed by treatment. This opens up possibilities such as not recruiting patients who would likely be harmed, or allocating patients to treatments that are more likely to benefit them (when there are multiple experimental treatments available). The benefits are that patients are treated more ethically and the trial will be more efficient (as the recruited patients will likely have a higher treatment effect). However, we will also investigate potential drawbacks of this approach.

**References**

- Freidlin B, Simon R. Clin Cancer Res 2005;11(21):7872-8.
- Newcombe PJ, et al. Statistical methods in medical research.

This studentship is for full time study only

Available for commencement in Michaelmas term 2017

_________________________________________________________

### Adaptive designs for longitudinal trials to efficiently estimate biomarker change-point outcomes and time-to-change-point

**Supervisor- Simon White (BSU)**

Within a clinical trial it is common to make longitudinal observations of biomarkers (and other covariates). If the primary outcome concerns a change in the longitudinal biomarker, we could consider assessing the time until a change-point occurs in the longitudinal process, that is, when the long-term behaviour of the biomarker changes abruptly. A change-point is an identifiable shift in the long-run biomarker level at the individual level; only some individuals may have a change-point, and the time at which it occurs will vary by person. This is a non-linear longitudinal model of the biomarker and is appropriate when the trajectory of the biomarker over time is of interest as a clinical outcome, rather than simply the overall change.

For example, when monitoring cognition in older individuals there is a distinct rapid decline in cognitive ability linked with dementia and other cognitive impairments, beyond the normal age-related decline. Rather than assessing a treatment’s effect on preventing a change in cognition, another outcome would be to delay the change to so-called rapid decline. A treatment might fail to prevent rapid decline but could delay the time until its onset; to test this involves analysing the time to the change-point (between normal and rapid decline of cognition). The estimation of the change-point in cognitive decline is well established in the cognition literature, using the Mini-Mental State Exam (MMSE), but there is little research on designs with the time to a change-point as the outcome.

The ultimate aim of the project is to create a novel adaptive design that adapts the number and interval of observations for each individual, and to demonstrate whether this leads to increased power to detect a treatment effect on a biomarker with change-point(s), which could lead to changes in clinical practice.

The MRC Biostatistics Unit is a partner in the European Prevention of Alzheimer’s Dementia (EPAD) Consortium; this study includes a longitudinal cohort from which participants will be recruited into trials. The EPAD study, as well as historical cognition studies (using, for example, the MMSE), will be used to assess novel designs developed during the project. The project will investigate alternative methods for analysing time-to-change-point trials, such as assessing the proportion that have changed after a fixed interval or inferring the time of the change-point from regularly timed observations, and compare the power and efficiency of these designs to the adaptive design. Within the context of EPAD, there is the additional issue of designing trials within longitudinal cohort studies, which will need to be incorporated into the trial design.

The focus of the project is to investigate adaptive sampling of individuals. Adapting the observation interval and number will minimise the number of observations on each individual, leading to a more efficient design for each participant and reducing the overall number of observations required. The project will investigate the power, efficiency and bias of the adaptive design compared to the alternative methods.
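A minimal sketch of the change-point model described above (all values hypothetical): an individual’s biomarker follows a broken-stick trajectory, and the change-point time is estimated by profiling least-squares fits over a grid of candidate times:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical individual: slow decline switching to rapid decline at tau.
t = np.linspace(0, 10, 21)                  # observation times (e.g. years)
tau_true, slope1, slope2 = 6.0, -0.2, -1.5
mean = slope1 * t + (slope2 - slope1) * np.clip(t - tau_true, 0, None)
y = mean + 0.1 * rng.normal(size=t.size)    # noisy biomarker measurements

def rss_at(tau):
    # Broken-stick model: intercept, pre-change slope, and an extra slope
    # that switches on after the candidate change-point tau.
    D = np.column_stack([np.ones_like(t), t, np.clip(t - tau, 0, None)])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    r = y - D @ beta
    return float(r @ r)

# Profile the residual sum of squares over a grid of candidate times.
grid = np.linspace(1.0, 9.0, 81)
tau_hat = grid[np.argmin([rss_at(tau) for tau in grid])]
print(f"estimated change-point: {tau_hat:.1f} (true {tau_true})")
```

In an adaptive design, the precision of this estimate depends on how many observations fall near the change-point, which is what motivates adapting each individual’s observation interval and number.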

This studentship is for full time study only

Available for commencement in Michaelmas term 2017