MRC Biostatistics Unit

Title: "A Regression Tree Approach to Missing Data"

Speaker: Professor Wei-Yin Loh, Department of Statistics, University of Wisconsin, Madison, USA

Abstract: Analysis of data with missing values is arguably the hardest problem in statistics. Statistical methods are often designed for completely observed data and are inapplicable if some values are missing. Although there are many techniques for imputation of missing values, the statistical properties of the resulting fitted models are unknown, except in special situations that require unverifiable and likely unjustifiable assumptions, such as "missing at random" (MAR) and "no unobserved confounding". We use a large dataset of electronic health records of Covid-19 patients and a national consumer expenditure survey to show that (1) routine imputation of missing data is inadvisable and even illogical, as missingness itself can contain useful information that imputation destroys, and (2) popular imputation algorithms such as MICE are impractical when the amount of missing data is large. We also show how the GUIDE classification and regression tree method easily overcomes these difficulties. GUIDE is unique among tree algorithms in many respects, including its ability to completely avoid imputation of missing data in predictor variables and to explicitly display the effects of missing values in its decision tree diagrams. Literature on GUIDE and its accompanying software may be obtained at https://pages.stat.wisc.edu/~loh/guide.html.


This is a free hybrid seminar. To attend virtually, please register at: https://cam-ac-uk.zoom.us/meeting/register/5qcUxSTvSKmUXCSScvBvWg

Date: Tuesday, 13 May 2025, 14:00 to 15:00

Event location: MRC Biostatistics Unit, East Forvie Building, Forvie Site, Robinson Way, Cambridge CB2 0SR