Symposium on Statistics, Data Science and Artificial Intelligence
Organizer: Pengcheng Zeng
Date: Sunday, 24th November 2024
Location: S408, IMS
Schedule
8:30-9:10
Title: Penalized Weighted GEEs for High-dimensional Longitudinal Data with Informative Cluster Size
Speaker: Xuejun Jiang, Department of Statistics and Data Science, Southern University of Science and Technology
Abstract:
High-dimensional longitudinal data have become increasingly common in recent studies. Penalized generalized estimating equations (GEEs) are frequently employed to model such data. However, the desirable properties of the GEE method break down when the outcome of interest is related to the cluster size, a phenomenon known as informative cluster size. In this article, we address this issue by formulating the effect of informative cluster size and proposing a novel weighted GEE approach to mitigate its adverse impact. We demonstrate that the penalized weighted GEE approach achieves consistency in both model selection and estimation. Theoretically, we establish that the proposed penalized weighted GEE estimator is asymptotically equivalent to the oracle estimator obtained when the true model is known in advance. This finding indicates that the penalized weighted GEE approach retains the excellent properties of the GEE method while remaining robust to informative cluster sizes, thereby extending the applicability of the GEE method to more complex situations. Simulations and a real data application demonstrate that the penalized weighted GEE outperforms existing alternative methods.
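As a rough illustration of the weighting idea (not the authors' implementation), the Python sketch below solves a weighted GEE with an independence working correlation, using inverse cluster-size weights and a lasso-style soft-thresholding step as a stand-in for the penalty; the toy data-generating mechanism, the weight choice, the penalty, and the step size are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def weighted_gee_lasso(clusters, lam=0.05, n_iter=200, lr=0.1):
    """clusters: list of (X_i, y_i); returns a penalized estimate of beta."""
    p = clusters[0][0].shape[1]
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = np.zeros(p)
        for X, y in clusters:
            w = 1.0 / len(y)                     # inverse cluster-size weight
            grad += w * (X.T @ (y - X @ beta))   # weighted estimating function
        beta = beta + lr * grad / len(clusters)  # quasi-score ascent step
        beta = np.sign(beta) * np.maximum(np.abs(beta) - lr * lam, 0.0)  # soft-threshold (lasso stand-in)
    return beta

# Toy data: cluster size depends on a latent variable that also shifts the
# first covariate, mimicking an informative cluster size.
clusters = []
for _ in range(100):
    u = rng.normal()
    n_i = 2 + 3 * int(u > 0)
    X = rng.normal(size=(n_i, 5))
    X[:, 0] += u
    y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n_i)
    clusters.append((X, y))

print(np.round(weighted_gee_lasso(clusters), 3))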
9:10-9:50
Title: AlphaGo for Global Statistical Optimization without Initial Value and Stepsize Constraints
Speaker: Xiaodong Yan, School of Mathematics and Statistics, Xi'an Jiao Tong University
Abstract:
This work introduces a unified slot machine framework for global optimization, transforming the search for global optimizers into the formulation of an optimal bandit strategy over infinite policy sets. Inspired by AlphaGo's success with Monte Carlo Tree Search, we develop the Strategic Monte Carlo Optimization (SMCO) algorithm, which extends the exploration space by employing tree search methods. SMCO generates points coordinate-wise from paired distributions, facilitating parallel implementation for high-dimensional continuous functions. Unlike gradient descent ascent (GDA), which follows a single-directional path and depends on initial points and step sizes, SMCO takes a two-sided sampling approach, ensuring robustness to these parameters. We establish convergence to global optimizers almost surely and prove a strategic law of large numbers for nonlinear expectations. Numerical results demonstrate that SMCO outperforms GDA, particle swarm optimization, and simulated annealing in both speed and accuracy.
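As a rough, self-contained illustration of the two-sided, coordinate-wise sampling idea (not the SMCO algorithm itself, which builds on bandit strategies and tree search), the Python sketch below proposes candidate points on both sides of the incumbent in each coordinate and keeps only improving moves; the Gaussian proposal, the shrinking exploration scale, and the toy objective are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(1)

def two_sided_search(f, dim, n_rounds=500, scale=2.0):
    """Greedy coordinate-wise search with paired left/right random proposals."""
    x_best = np.zeros(dim)
    f_best = f(x_best)
    for t in range(n_rounds):
        s = scale / np.sqrt(1.0 + t)             # slowly shrinking exploration scale
        for j in range(dim):
            for side in (-1.0, 1.0):             # paired "left" and "right" proposals
                cand = x_best.copy()
                cand[j] += side * abs(rng.normal(0.0, s))
                f_cand = f(cand)
                if f_cand > f_best:              # keep only improving moves
                    x_best, f_best = cand, f_cand
    return x_best, f_best

# Toy multimodal objective; its global maximum is at x = (0.5, ..., 0.5).
def f(x):
    return -np.sum((x - 0.5) ** 2) + 0.3 * np.sum(np.cos(4 * np.pi * (x - 0.5)))

x_star, f_star = two_sided_search(f, dim=5)
print(np.round(x_star, 3), round(f_star, 3))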
10:10-10:50
Title: An Adaptive Transfer Learning Framework for Functional Classification
Speaker: Yang Bai, School of Statistics and Management, Shanghai University of Finance and Economics
Abstract:
In this paper, we study the transfer learning problem in functional classification, aiming to improve the classification accuracy on the target data by leveraging information from related source datasets. To facilitate transfer learning, we propose a novel transferability function tailored to classification problems, enabling a more accurate evaluation of the similarity between the source and target distributions. Interestingly, we find that under certain conditions a source dataset can offer more substantial benefits than another dataset whose distribution is identical to that of the target dataset. This observation renders the debiasing step commonly used in parameter-based transfer learning algorithms unnecessary for the classification problem in some circumstances. In particular, we propose two adaptive transfer learning algorithms based on the functional Distance Weighted Discrimination (DWD) classifier for scenarios with and without prior knowledge of which sources are informative. Furthermore, we establish upper bounds on the excess risk of the proposed classifiers, making the statistical gain from transfer learning mathematically provable. Simulation studies are conducted to thoroughly examine the finite-sample performance of the proposed algorithms. Finally, we apply the proposed method to Beijing air-quality data and significantly improve the prediction of the PM2.5 level at a target station by effectively incorporating information from source datasets.
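As a rough illustration of adaptive source selection (not the proposed algorithms), the Python sketch below scores each source by how well a source-trained classifier agrees with a target-trained classifier on the target data, pools the sources that pass a threshold, and refits; an ordinary logistic classifier stands in for the functional DWD classifier, and the simulated features, the agreement score, and the threshold are illustrative assumptions rather than the paper's transferability function.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def make_data(n, shift):
    # Features play the role of basis coefficients of functional observations.
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + shift * X[:, 1] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

target = make_data(60, shift=0.2)                              # small labelled target set
sources = [make_data(500, shift=s) for s in (0.2, 0.3, 2.0)]   # last source is dissimilar

clf_t = LogisticRegression().fit(*target)
keep = []
for Xs, ys in sources:
    clf_s = LogisticRegression().fit(Xs, ys)
    agree = np.mean(clf_s.predict(target[0]) == clf_t.predict(target[0]))
    if agree > 0.8:                                            # illustrative threshold
        keep.append((Xs, ys))

X_pool = np.vstack([target[0]] + [Xs for Xs, _ in keep])
y_pool = np.concatenate([target[1]] + [ys for _, ys in keep])
clf_transfer = LogisticRegression().fit(X_pool, y_pool)
print("sources kept:", len(keep))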
10:50-11:30
Title: Weighted Q-learning for Optimal Dynamic Treatment Regimes with Nonignorable Missing Covariates
Speaker: Bo Fu, School of Big Data, Fudan University
Abstract:
Dynamic treatment regimes (DTRs) formalize medical decision-making as a sequence of rules for different stages, mapping patient-level information to recommended treatments. In practice, estimating an optimal DTR using observational data from electronic medical record (EMR) databases can be complicated by nonignorable missing covariates resulting from informative monitoring of patients. Since complete case analysis can provide consistent estimation of outcome model parameters under the assumption of outcome-independent missingness, Q-learning is a natural approach to accommodating nonignorable missing covariates. However, the backward induction algorithm used in Q-learning can introduce challenges, as nonignorable missing covariates at later stages can result in nonignorable missing pseudo-outcomes at earlier stages, leading to suboptimal DTRs, even if the longitudinal outcome variables are fully observed. To address this unique missing data problem in DTR settings, we propose two weighted Q-learning approaches where inverse probability weights for missingness of the pseudo-outcomes are obtained through estimating equations with valid nonresponse instrumental variables or sensitivity analysis. The asymptotic properties of the weighted Q-learning estimators are derived, and the finite-sample performance of the proposed methods is evaluated and compared with alternative methods through extensive simulation studies. Using EMR data from the Medical Information Mart for Intensive Care database, we apply the proposed methods to investigate the optimal fluid strategy for sepsis patients in intensive care units.
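As a rough illustration of the weighting idea in a two-stage setting (not the proposed estimators), the Python sketch below runs backward-induction Q-learning with inverse-probability weights for cases whose stage-2 covariate, and hence whose stage-1 pseudo-outcome, is missing; the linear Q-functions, the missingness model, and the assumption that the observation probabilities are known are illustrative simplifications, whereas the paper estimates the weights via estimating equations with nonresponse instruments or sensitivity analysis.

import numpy as np

rng = np.random.default_rng(3)
n = 2000

x1 = rng.normal(size=n)                          # stage-1 covariate (always observed)
a1 = rng.integers(0, 2, size=n)                  # stage-1 treatment
x2 = 0.5 * x1 + rng.normal(size=n)               # stage-2 covariate (possibly missing)
a2 = rng.integers(0, 2, size=n)                  # stage-2 treatment
y = x1 + a1 * x1 + x2 + a2 * (1.0 - x2) + rng.normal(size=n)   # final outcome (fully observed)

p_obs = 1.0 / (1.0 + np.exp(-(0.5 + x2)))        # missingness of x2 depends on x2 itself
obs = rng.random(n) < p_obs
w = 1.0 / p_obs                                  # inverse-probability weights (assumed known here)

def wls(Z, y, w):
    """Weighted least squares: solves (Z'WZ) b = Z'Wy."""
    Zw = Z * w[:, None]
    return np.linalg.solve(Zw.T @ Z, Zw.T @ y)

# Stage 2: weighted Q-function fit among cases with x2 observed.
def design2(a):
    return np.column_stack([np.ones(n), x1, a1 * x1, x2, a, a * x2])
b2 = wls(design2(a2)[obs], y[obs], w[obs])

# Pseudo-outcome: predicted value of the best stage-2 action (missing when x2 is).
pseudo = np.maximum(design2(np.zeros(n)) @ b2, design2(np.ones(n)) @ b2)

# Stage 1: weighted Q-function fit for the pseudo-outcome.
Z1 = np.column_stack([np.ones(n), x1, a1, a1 * x1])
b1 = wls(Z1[obs], pseudo[obs], w[obs])
print(np.round(b2, 2), np.round(b1, 2))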
11:30-12:10
Title: Efficient Ternary Quantization Mean Estimation for Distributed Learning
Speaker: Xiaojun Mao, School of Mathematical Sciences, Shanghai Jiao Tong University
Abstract:
The increasing size of data has created a pressing need for communication efficiency and data privacy protection, which has spurred significant interest in quantization. This paper proposes a novel variance-reduced correlated quantization scheme designed for distributed mean estimation with data of bounded support. Our method is shown to achieve a theoretical reduction in mean square error, for both fixed and randomized designs, compared with the correlated quantization method across different quantization levels and dimensions. We conduct several synthetic-data experiments to illustrate the effectiveness of our approach and to show that our theory provides a good approximation of the reduction in mean square error. We further apply the proposed method to real-world data on different learning tasks, with promising results.
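As a baseline point of reference (not the proposed variance-reduced correlated scheme), the Python sketch below implements plain unbiased stochastic ternary quantization for distributed mean estimation on data with bounded support; the support bound, the number of clients, and the uniform toy data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(4)

def ternary_quantize(x, B):
    """Unbiased map of x in [-B, B]^d onto {-B, 0, +B}^d (E[q] = x)."""
    p = np.abs(x) / B                      # probability of sending the nonzero level
    send = rng.random(x.shape) < p
    return B * np.sign(x) * send

d, n_clients, B = 50, 200, 1.0
X = rng.uniform(-B, B, size=(n_clients, d))          # one bounded vector per client
Q = np.vstack([ternary_quantize(x, B) for x in X])   # what each client transmits

true_mean = X.mean(axis=0)
est_mean = Q.mean(axis=0)                            # server-side average of ternary codes
print("MSE:", np.mean((est_mean - true_mean) ** 2))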