Seminar | Institute of Mathematical Sciences
Time: Wednesday, December 13th, 2023, 16:00-17:00
Speaker: Yanlin Tang, East China Normal University
Abstract: Existing methods for missing clustered data often rely on strong model assumptions and are therefore prone to model misspecification. We construct prediction bands for the whole trajectories of new subjects based on conformal inference, yielding covariate-dependent prediction bands with finite-sample coverage guarantees, without making any assumptions about the model specification or the within-cluster dependence structure. We first reduce the clustered data to independent cross-sectional data by subsampling, then propose three weighted conformal methods to produce prediction regions. To exploit the within-cluster correlation, we repeat the subsampling and conformal inference steps and combine the resulting dependent p-values into an integrated prediction region. Among the three proposed methods, the weighted CD-split method yields the smallest prediction region by converging to the highest density set, and provides asymptotic conditional coverage guarantees for each given subject. Simulations show that our methods have excellent finite-sample behavior under a range of complex error distributions compared with existing alternatives. We demonstrate the practical use of the methods on the motivating serum cholesterol and CD4+ cell data sets.
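The subsample-then-combine idea in the abstract can be illustrated with a rough sketch. This is not the speaker's method. It uses plain (unweighted) split conformal inference with absolute-residual scores, ignores missing data, and combines the dependent p-values by twice their average, a rule known to be valid under arbitrary dependence; the abstract does not say which combination rule or weights are used. All function names are hypothetical, and `predict` is assumed to be a model fitted beforehand on separate data.

```python
import random

def conformal_pvalue(cal_scores, test_score):
    """Split-conformal p-value: share of scores at least as large as the test score."""
    n_ge = sum(1 for s in cal_scores if s >= test_score)
    return (n_ge + 1) / (len(cal_scores) + 1)

def subsample_one_per_cluster(clusters, rng):
    """Draw one (x, y) pair per cluster so the subsample is cross-sectional."""
    return [rng.choice(observations) for observations in clusters]

def combined_pvalue(clusters, predict, x_new, y_candidate, n_rep=50, seed=0):
    """Repeat subsampling + conformal inference, then combine the p-values.

    Assumption: the combination rule is twice the mean p-value (capped at 1),
    chosen here only because it needs no independence between repetitions.
    """
    rng = random.Random(seed)
    test_score = abs(y_candidate - predict(x_new))
    pvals = []
    for _ in range(n_rep):
        sample = subsample_one_per_cluster(clusters, rng)
        cal_scores = [abs(y - predict(x)) for x, y in sample]
        pvals.append(conformal_pvalue(cal_scores, test_score))
    return min(1.0, 2 * sum(pvals) / len(pvals))

def prediction_region(clusters, predict, x_new, grid, alpha=0.1, **kwargs):
    """Keep every candidate response whose combined p-value exceeds alpha."""
    return [y for y in grid
            if combined_pvalue(clusters, predict, x_new, y, **kwargs) > alpha]
```

Here `clusters` is a list of subjects, each a list of `(x, y)` observations, and `grid` is a set of candidate responses at the new covariate value `x_new`. The weighted methods described in the abstract would replace the unweighted rank in `conformal_pvalue`.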