Institute of Mathematical Sciences

# Symposium on Large-Scale Nonparametric Regressions


Organizer: Shufei Ge

Date: Tuesday, July 27th

Location: Zoom

Nonparametric regressions are popular statistical tools for exploring a general relationship between a dependent variable and one or several covariates. With the rapid development of data collection techniques, growing data volumes pose major challenges to data scientists. This mini-symposium brings together outstanding young researchers to share their recent research on large-scale nonparametric regressions, with a focus on the associated computational challenges. All are welcome to attend.

## Schedule

13:30-14:15

Title: Basis selection methods in nonparametric regressions

Speaker: Cheng Meng, Institute of Statistics and Big Data, Renmin University of China

Abstract: Nonparametric regressions, e.g., kernel ridge regression and smoothing splines, are popular statistical tools. However, the computational burden of such methods is significant when the sample size n is large. Consider smoothing splines: when the number of predictors d > 1, the standard approach has a computational cost of order O(n^3). Many methods have been developed to approximate smoothing spline estimators using q basis functions instead of n, reducing the computational cost to order O(nq^2); these are called basis selection methods. In this talk, we will present recent work on how to determine the size of q and how to select a set of informative basis functions.
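To make the O(nq^2) idea concrete, here is a minimal numpy sketch of a kernel-based fit that uses q uniformly sampled basis functions instead of all n. This is an illustrative Nystrom-style approximation, not the speaker's specific method; the Gaussian kernel, its bandwidth `gamma`, and the uniform sampling of basis points are all assumptions made for the example.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=10.0):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def basis_selection_fit(X, y, q=30, lam=1e-4, seed=1):
    """Approximate a kernel regression fit with q basis functions.

    Solves min_c ||y - K_nq c||^2 + lam * c^T K_qq c, a q x q linear
    system, so the cost is O(n q^2) rather than the O(n^3) of the
    full n-basis problem.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=q, replace=False)   # uniform basis selection
    Xq = X[idx]
    K_nq = gaussian_kernel(X, Xq)                # n x q design matrix
    K_qq = gaussian_kernel(Xq, Xq)               # q x q penalty matrix
    A = K_nq.T @ K_nq + lam * K_qq
    c = np.linalg.solve(A + 1e-10 * np.eye(q), K_nq.T @ y)  # jitter for stability
    return Xq, c

def basis_predict(Xq, c, Xnew):
    return gaussian_kernel(Xnew, Xq) @ c

# Toy data: noisy sine curve with n = 500 observations.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(500)
Xq, c = basis_selection_fit(X, y, q=30)
pred = basis_predict(Xq, c, X)
```

In this sketch the basis points are chosen uniformly at random; the talk's subject is precisely the harder questions of how large q must be and how to pick more informative basis points than uniform sampling does.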

14:15-15:00

Title: Oversampling Divide-and-conquer for Response-skewed Kernel Ridge Regression

Speaker: Jingyi Zhang, Center for Statistical Science, Tsinghua University

Abstract: The divide-and-conquer method has been widely used for computing large-scale kernel ridge regression estimates. Unfortunately, when the response variable is highly skewed, divide-and-conquer kernel ridge regression (dacKRR) may overlook the underrepresented region and produce unacceptable results. We develop a novel response-adaptive partition strategy to overcome this limitation. In particular, we propose allocating replicates of some carefully identified informative observations to multiple nodes (local processors). The idea is analogous to the popular oversampling technique. Although such techniques have been widely used to address discrete label skewness, extending them to the dacKRR setting is nontrivial. We provide both theoretical and practical guidance on how to effectively oversample the observations under the dacKRR setting. Furthermore, we show that the proposed estimate has a smaller asymptotic mean squared error (AMSE) than the classical dacKRR estimate under mild conditions. Our theoretical findings are supported by both simulated and real-data analyses.
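The divide-and-conquer-with-oversampling idea can be sketched in a few lines of numpy: split the data across m nodes, fit a local kernel ridge regression on each, average the local predictions, and replicate the rare tail observations to every node so no node misses the underrepresented region. This is only a schematic of the general strategy, not the speakers' estimator; the Gaussian kernel, the quantile rule for flagging "informative" tail points, and replicating the whole tail to every node are simplifying assumptions.

```python
import numpy as np

def gaussian_kernel(X, Y, gamma=10.0):
    # Pairwise Gaussian kernel matrix between rows of X and rows of Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(X, y, lam=1e-3):
    # Exact kernel ridge regression on one node's (small) subsample.
    K = gaussian_kernel(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return X, alpha

def krr_predict(model, Xnew):
    Xtr, alpha = model
    return gaussian_kernel(Xnew, Xtr) @ alpha

def dac_krr_oversampled(X, y, m=4, tail_q=0.9, lam=1e-3, seed=0):
    """Divide-and-conquer KRR with a crude tail-oversampling step:
    observations whose response exceeds the tail_q quantile are
    replicated to every node before the local fits."""
    rng = np.random.default_rng(seed)
    tail = y > np.quantile(y, tail_q)       # rare, informative observations
    body_idx = np.where(~tail)[0]
    tail_idx = np.where(tail)[0]
    rng.shuffle(body_idx)
    models = []
    for part in np.array_split(body_idx, m):
        idx = np.concatenate([part, tail_idx])  # oversample tail to this node
        models.append(krr_fit(X[idx], y[idx], lam))
    return models

def dac_predict(models, Xnew):
    # Classical dacKRR aggregation: average the local predictions.
    return np.mean([krr_predict(mo, Xnew) for mo in models], axis=0)

# Toy right-skewed response: small for most x, large in a thin tail.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 1))
y = 5.0 * X[:, 0] ** 4 + 0.1 * rng.standard_normal(300)
models = dac_krr_oversampled(X, y)
pred = dac_predict(models, X)
```

Without the replication step, a node that happens to receive few tail observations would fit the high-response region poorly, and averaging would propagate that error; the oversampling step guarantees every local model sees the tail.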

Meeting ID:  868 5652 5272

Passcode: 604733

200031 (Yueyang Road Campus)