Statistical and epidemiological models are currently in the news a lot. We often hear that predictions made using models of the current pandemic have informed the UK government’s strategy for responding to it. But how do we decide if a model is “good” or “bad”?
“All models are wrong, but some are useful” is a common mantra in statistics. What does this mean? Well, models provide simplified descriptions of how the world works. Because the real world is very complicated, we cannot model all of it. We therefore try to identify just the details that are relevant to what we want to predict or learn about.
For example, suppose I wanted to write down a model to predict whether or not a breast cancer patient is likely to respond to a particular treatment. There might be all sorts of things that I think could be helpful to include in the model. We call these things predictors, and in this case they might include (among many others): how old the patient is, whether or not they have a BRCA mutation, and how early the cancer was detected. Some of the things that I think might be useful may turn out not to be very useful at all, and it will very likely be the case that there are useful predictors that I have either not yet identified or have chosen to exclude (e.g. because measuring them would require invasive and painful procedures to be performed on the patient).
Even if I have a good set of predictors, I typically don’t know what type of relationship exists between the predictors and the response (i.e. the thing I want to predict).
In the breast cancer example, if a patient’s age is indeed a useful predictor, what is its relationship with the probability of the patient responding to treatment? Does being 10 years older mean that a patient is more or less likely to respond than a younger patient? Does the probability vary linearly or nonlinearly with age? Does it depend on some other factor, such as whether or not the patient is a smoker?
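To make this concrete, here is a minimal sketch (in Python) of one very common modelling choice: a logistic regression, which assumes that the log-odds of responding to treatment are a linear combination of the predictors. Every coefficient and patient value below is made up purely for illustration; a real model would be fitted to data.

```python
import math

# Hypothetical coefficients -- purely illustrative, not fitted to real data.
beta_0 = -0.5            # intercept (baseline log-odds of responding)
beta_age = -0.03         # assumed effect of each extra year of age on the log-odds
beta_brca = 0.8          # assumed effect of carrying a BRCA mutation
beta_age_smoker = -0.02  # assumed extra age effect for smokers (an interaction)

def response_probability(age, has_brca, is_smoker):
    """Probability of responding to treatment under an assumed
    logistic-regression model: log-odds are linear in the predictors."""
    log_odds = (beta_0
                + beta_age * age
                + beta_brca * has_brca
                + beta_age_smoker * age * is_smoker)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two hypothetical patients, 10 years apart in age:
print(response_probability(age=50, has_brca=1, is_smoker=0))
print(response_probability(age=60, has_brca=1, is_smoker=0))
```

Note that even this simple model already encodes answers to the questions above: the age effect is linear on the log-odds scale, and the interaction term lets the age effect differ between smokers and non-smokers.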
The true relationship is likely to be very complicated, and may depend on predictors that we have not included in our model. But, crucially, even though our model may be incomplete or may not fully describe the relationship between the predictors and response, it may nevertheless enable us to make useful predictions.
How can we decide if a model is useful? One way is to use it to predict the future, or to make predictions for people we have not seen before, and then to see how well these predictions match up with reality. Returning to the breast cancer example, we might use a model to predict whether or not each of 1,000 new patients entering an oncology clinic will respond to a particular treatment. Once the patients have received the treatment, we could then follow up in, say, five years’ time, and count the number of patients who were cancer-free.
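In code, this idea is often approximated by holding out data that the model has never seen. The sketch below, which uses entirely synthetic patient data and scikit-learn (an assumed choice of library; any modelling tool would do), fits a model on one group of “patients” and then checks its predictions against the known outcomes of a held-out group:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Simulate a toy dataset of patients (entirely synthetic, for illustration):
n = 2000
age = rng.uniform(30, 80, n)
brca = rng.integers(0, 2, n)
X = np.column_stack([age, brca])

# Simulated "truth": each patient responds with a probability
# that depends on the predictors.
p = 1 / (1 + np.exp(-(1.5 - 0.04 * age + 0.8 * brca)))
responded = rng.binomial(1, p)

# Hold out patients the model has never "seen", fit on the rest...
X_train, X_test, y_train, y_test = train_test_split(
    X, responded, test_size=0.5, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# ...and check how well predictions for held-out patients match reality.
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```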
In practice, we might have to wait a long time before we can tell whether our model’s predictions were accurate, or we might want to make decisions and undertake interventions before the future predicted by our model comes to pass. For example, for COVID-19, models from Imperial College predicted that hundreds of thousands of people in the UK would die before August if the government took no action. Since the government subsequently enacted a number of measures, we cannot assess the quality of these predictions simply by comparing them to reality: the actions taken by the government were intended precisely to ensure that this prediction would not come to pass.
In such cases, statisticians may instead perform sensitivity analyses for their models. Roughly speaking, these look to see how our model’s predictions would change if some of the assumptions we made when we built our model were a tiny bit wrong, fairly wrong, or very wrong. If the predictions do not change a great deal under each of these scenarios, then this typically increases our confidence in them. On the other hand, if the predictions change a great deal, then this might motivate further work to assess whether or not the assumptions underpinning our model are valid.
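As a toy illustration of the idea (this is not any of the models discussed above), the sketch below perturbs the assumed transmission rate in a basic SIR epidemic model and checks how much a headline prediction, the peak fraction of the population infected at once, changes. All parameter values are invented for illustration.

```python
def sir_peak_infections(beta, gamma=0.1, n_days=365):
    """Run a basic SIR epidemic model (simple Euler stepping) and
    return the peak fraction of the population infected at once."""
    s, i, r = 1.0 - 1e-4, 1e-4, 0.0
    dt = 0.1
    peak = i
    for _ in range(n_days * 10):        # 10 Euler steps per day
        new_inf = beta * s * i * dt     # newly infected this step
        new_rec = gamma * i * dt        # newly recovered this step
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        peak = max(peak, i)
    return peak

# Suppose our "best guess" transmission rate is beta = 0.25 (assumed, for
# illustration). How much do predictions change if that guess is a tiny bit,
# fairly, or very wrong?
for beta in [0.25, 0.26, 0.30, 0.40]:
    print(f"beta = {beta:.2f}: predicted peak = {sir_peak_infections(beta):.1%}")
```

If the predicted peak barely moves as the transmission rate varies, we would trust the headline number more; if it swings wildly, the assumption deserves closer scrutiny.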
As described above, statistical and mathematical models necessarily have limitations, but they have played a crucial role in enabling scientists to understand the world around us, and to make predictions about the future. Not all models are useful, and — even for those that are — a crucial part of the scientific process is to update and improve our models in light of new data and new discoveries. But, as we have seen during recent events, models can be very important and can have a significant impact on our lives.
Blog post by Dr Paul Kirk
Paul Kirk leads a group at the MRC Biostatistics Unit, and has written a number of articles on modelling in collaboration with his co-authors:
- Kirk, P. D. W., Babtie, A. C., & Stumpf, M. P. H. (2015). Systems biology (un)certainties. Science, 350(6259), 386–388.
- Babtie, A. C., Kirk, P., & Stumpf, M. P. H. (2014). Topological sensitivity analysis for systems biology. Proceedings of the National Academy of Sciences of the United States of America, 111(52), 18507–18512.
- Liepe, J., Kirk, P., Filippi, S., Toni, T., Barnes, C. P., & Stumpf, M. P. H. (2014). A framework for parameter estimation and model selection from experimental data in systems biology using approximate Bayesian computation. Nature Protocols, 9(2), 439–456.
- Kirk, P., Thorne, T., & Stumpf, M. P. (2013). Model selection in systems and synthetic biology. Current Opinion in Biotechnology.
Further reading: An interesting, and closely related, post by Bruno Gonçalves about the challenges of modelling for COVID-19: “Epidemic Modeling 102: All CoVID-19 models are wrong, but some are useful”.