Colloquium | Institute of Mathematical Sciences
Time: Friday, June 18th, 2021, 14:00-15:00
Location: Zoom
Speaker: Yongdao Zhou, Nankai University
Abstract: Subsampling, or subdata selection, is a useful approach in large-scale statistical learning. Most existing studies focus on model-based subsampling methods, which depend heavily on the model assumption. In this paper, we consider a model-free subsampling strategy for generating subdata from the original full data. To measure how well a subdata set represents the original data, we propose a criterion, the generalized empirical F-discrepancy (GEFD), and study its theoretical properties in connection with the classical generalized L2-discrepancy in the theory of uniform designs. These properties allow us to develop a low-GEFD, data-driven subsampling method based on existing uniform designs. Through simulation examples and a real case study, we show that the proposed subsampling method enjoys the model-free property and is superior to random sampling. In practice, such a model-free property is more appealing than model-based subsampling, which may perform poorly when the model is misspecified, as demonstrated in our simulation studies.
Zoom Link: https://zoom.us/j/99599178096?pwd=QmJ5VExTaWJITkdmaVA3RU9mNWhHZz09
Conference Zoom number: 995 9917 8096
Password: 057517