Symposium | Institute of Mathematical Sciences
Date: Tuesday, July 27th
Nonparametric regression is a popular statistical tool for exploring a general relationship between a dependent variable and one or several covariates. With the rapid development of data collection techniques, the growing volume of data poses significant challenges to data scientists. This mini-symposium aims to bring together outstanding young researchers to share their recent research on large-scale nonparametric regression, with a particular focus on its computational challenges. We welcome your attendance.
Title: Basis selection methods in nonparametric regressions
Speaker: Cheng Meng, Institute of Statistics and Big Data, Renmin University of China
Abstract: Nonparametric regression methods, e.g., kernel ridge regression and smoothing splines, are popular statistical tools. However, the computational burden of such methods is significant when the sample size n is large. Consider smoothing splines: when the number of predictors d > 1, the computational cost of the standard approach is of the order O(n^3). Many methods have been developed to approximate smoothing spline estimators by using q basis functions instead of n, resulting in a computational cost of the order O(nq^2). These methods are called basis selection methods. In this talk, we will present recent work on how to determine the size q and how to select a set of informative basis functions.
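To make the cost reduction concrete, here is a minimal sketch of the basis-selection idea for kernel ridge regression with a Gaussian kernel. It selects q of the n kernel basis functions and solves the reduced q-dimensional ridge system, so the dominant cost drops from O(n^3) to O(nq^2). Uniform random selection, the function names, and all parameter values are illustrative assumptions, not the speaker's actual method.

```python
import numpy as np

def krr_basis_selection(X, y, q, lam=1e-3, gamma=1.0, rng=None):
    """Approximate kernel ridge regression with q randomly chosen basis
    functions (uniform sampling is an illustrative assumption only)."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    idx = rng.choice(n, size=q, replace=False)      # selected basis points
    # Gaussian kernel between all n points and the q selected centers
    d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(-1)
    K_nq = np.exp(-gamma * d2)                       # n x q design matrix
    K_qq = K_nq[idx]                                 # q x q block
    # Reduced ridge system: (K_nq' K_nq + lam * K_qq) c = K_nq' y
    # Solving this q x q system costs O(nq^2) rather than O(n^3).
    c = np.linalg.solve(K_nq.T @ K_nq + lam * K_qq, K_nq.T @ y)
    return idx, c

def krr_predict(X_train, idx, c, X_new, gamma=1.0):
    """Evaluate the fitted expansion at new points."""
    d2 = ((X_new[:, None, :] - X_train[idx][None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2) @ c
```

The talk's focus, choosing q and selecting *informative* (rather than uniformly random) basis functions, replaces the `rng.choice` step above.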
Title: Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression
Speaker: Jingyi Zhang, Center for Statistical Science, Tsinghua University
Abstract: The divide-and-conquer method has been widely used for computing large-scale kernel ridge regression estimates. Unfortunately, when the response variable is highly skewed, the divide-and-conquer kernel ridge regression (dacKRR) may overlook the underrepresented region and yield unacceptable results. We develop a novel response-adaptive partition strategy to overcome this limitation. In particular, we propose to allocate replicates of some carefully identified informative observations to multiple nodes (local processors). The idea is analogous to the popular oversampling technique. Although that technique has been widely used to address discrete label skewness, extending it to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to effectively oversample the observations under the dacKRR setting. Furthermore, we show that the proposed estimate has a smaller asymptotic mean squared error (AMSE) than the classical dacKRR estimate under mild conditions. Our theoretical findings are supported by both simulated and real-data analyses.
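The partition idea can be sketched as follows: split most observations uniformly across the m nodes, but replicate a small set of informative observations to every node. In this toy version the "informative set" is simply the upper tail of a skewed response, which is a stand-in assumption; the talk concerns how such observations are actually identified and weighted.

```python
import numpy as np

def oversample_partition(y, m, tail_frac=0.1, rng=None):
    """Illustrative response-adaptive partition for divide-and-conquer:
    observations with extreme responses (here, the upper tail -- an
    assumption, not the paper's rule) are copied to all m nodes, while
    the remaining observations are split uniformly at random."""
    rng = np.random.default_rng(rng)
    n = len(y)
    k = max(1, int(tail_frac * n))
    informative = np.argsort(y)[-k:]                 # skewed upper tail
    rest = np.setdiff1d(np.arange(n), informative)
    rng.shuffle(rest)
    shards = np.array_split(rest, m)
    # each node gets its own shard plus a copy of the informative set
    return [np.concatenate([s, informative]) for s in shards]
```

Each node then fits a local kernel ridge regression on its (oversampled) subset, and the local estimates are averaged as in classical dacKRR.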
Zoom Link: https://zoom.com.cn/j/86856525272?pwd=VXMveVQ3c1RjMjRNVGl5UUZablJZUT09
Meeting ID: 868 5652 5272