Speaker: Richard Guo, Statistical Laboratory, University of Cambridge
Abstract: Many modern statistical procedures are randomized in the sense that the output is a random function of data. For example, many procedures employ data splitting, which randomly divides the dataset into disjoint parts for separate purposes. Despite their flexibility and popularity, data splitting and other constructions of randomized procedures have obvious drawbacks. First, two analyses of the same dataset may lead to different results due to the extra randomness introduced. Second, randomized procedures typically lose statistical power because the entire sample is not fully utilized.
To address these drawbacks, in this talk, I will study how to properly combine the results from multiple realizations (such as through multiple data splits) of a randomized procedure. I will introduce rank-transformed subsampling as a general method for delivering large sample inference of the combined result under minimal assumptions. I will illustrate the method with three applications: (1) a “hunt-and-test” procedure for detecting cancer subtypes using high-dimensional gene expression data, (2) testing the hypothesis of no direct effect in a sequentially randomized trial and (3) calibrating cross-fit “double machine learning” confidence intervals. For these problems, our method is able to derandomize and improve power. Moreover, in contrast to existing approaches for combining p-values, our method enjoys type-I error control that asymptotically approaches the nominal level. This new development opens up the possibility of designing procedures that explicitly randomize and derandomize: extra randomness is introduced to make the problem easier before being marginalized out.
This talk is based on joint work with Rajen Shah.
Speaker bio: Richard Guo is a research associate in the Statistical Laboratory at the University of Cambridge, mentored by Prof. Rajen Shah. Previously, he was the Richard M. Karp Research Fellow in the 2022 causality program at the Simons Institute for the Theory of Computing. He received his PhD in Statistics from the University of Washington in 2021, advised by Thomas Richardson. His research interests include graphical models, causal inference, semiparametric methods and replicability of data analysis. Dr. Guo will start as an assistant professor in Biostatistics at the University of Washington in January 2024.
This will be a free hybrid seminar. To register to attend virtually, please click here: https://us02web.zoom.us/meeting/register/tZAofuihqzgvHNyG24AgAAytgHlr1nmpetpx