
Submitted by A.S. Quenault on Mon, 22/12/2025 - 10:58
Research published in the Journal of the Royal Statistical Society Series B: Statistical Methodology (JRSS-B) presents a new approach to combining and analysing data from many different sources.
In medical research, we often try to combine as much data as possible to get the most precise results. However, we frequently face a mix of high-quality data (like rigorous clinical trials) and lower-quality data (like large, messy internet surveys). The challenge is that standard statistical methods often let the sheer volume of the low-quality data drown out the smaller, high-quality datasets. In effect, the "bad apples" are allowed to spoil the bunch.
Bayesian statistical approaches can struggle in particular, because they allow information to flow freely back and forth within the model, so that everything influences everything else. This is unhelpful when some of the data are "bad apples", because those data will contaminate all of the results.
In the paper, Robert Goudie and Yang Liu consider a solution to this problem using "cut distributions". This approach splits up a Bayesian model into modules. For example, one module could contain high-quality data and the other could contain lower-quality data. Cut distributions then allow us to control the flow of information between the modules, preventing the lower-quality data from contaminating the conclusions drawn from the high-quality data. The authors formalise this idea and provide an algorithm that enables users to translate knowledge of how reliable each data source is into a specific cut distribution. Notably, their approach allows for models involving several different modules, as would be common in medical research.
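To make the idea of "cutting" the flow of information concrete, here is a minimal sketch of a two-module toy example. It is not taken from the paper and does not implement the paper's algorithm for choosing a cut. The Gaussian model, the bias parameter `eta`, the prior variances, the sample sizes and all function names are assumptions made purely for illustration. A small, reliable dataset `y` measures a quantity `phi` directly, while a large, biased dataset `z` measures `phi + eta`. The model wrongly assumes the bias `eta` is close to zero, so in the ordinary (full) Bayesian posterior the large dataset drags the estimate of `phi` away from the truth. In the cut version, `phi` is learned from `y` alone and is then passed one way into the second module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data (illustrative values, not from the paper).
true_phi, true_bias = 0.0, 1.0
y = rng.normal(true_phi, 1.0, size=20)                   # small, reliable module
z = rng.normal(true_phi + true_bias, 1.0, size=10_000)   # large, biased module

prior_var_phi = 100.0     # vague prior on phi
prior_var_eta = 0.1 ** 2  # overconfident prior that the bias eta is near zero


def full_posterior_mean_phi(y, z):
    """Posterior mean of phi when information flows freely between modules."""
    n1, n2 = len(y), len(z)
    # The joint posterior of (phi, eta) is Gaussian; build its precision
    # matrix and linear term, then solve for the posterior mean.
    precision = np.array([
        [1 / prior_var_phi + n1 + n2, n2],
        [n2, 1 / prior_var_eta + n2],
    ])
    linear = np.array([y.sum() + z.sum(), z.sum()])
    return np.linalg.solve(precision, linear)[0]


def sample_cut(y, z, n_draws, rng):
    """Draw (phi, eta) from the cut distribution p(phi | y) p(eta | phi, z)."""
    n1, n2 = len(y), len(z)
    # Step 1: phi is informed by the reliable module only.
    var_phi = 1 / (1 / prior_var_phi + n1)
    phi = rng.normal(var_phi * y.sum(), np.sqrt(var_phi), size=n_draws)
    # Step 2: eta is updated given each phi draw; z never feeds back into phi.
    var_eta = 1 / (1 / prior_var_eta + n2)
    eta = rng.normal(var_eta * (z.sum() - n2 * phi), np.sqrt(var_eta))
    return phi, eta


full_mean = full_posterior_mean_phi(y, z)
phi_cut, eta_cut = sample_cut(y, z, n_draws=5_000, rng=rng)

print("true phi:", true_phi)
print("full posterior mean of phi:", full_mean)
print("cut distribution mean of phi:", phi_cut.mean())
```

In this sketch the full posterior mean of `phi` is pulled strongly towards the biased dataset, while the cut estimate stays close to the average of the reliable data. The price of cutting is that `phi` gains nothing from `z`, which is why deciding where to cut matters.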
Robert Goudie, Group Leader and senior author, said:
"The challenge of needing to use data of widely differing quality is becoming ever more common in medical research. Our work lays the technical foundation upon which we can build more useful and reliable statistical analysis approaches in this situation."
Read full paper: academic.oup.com/jrsssb/article/87/4/1171/8096422