IMS Anniversary Symposium on Statistics
Friday, Oct 20, 9:30–10:30 AM, IMS RS408, Jinhong You (Shanghai University of Finance and Economics)
Title: Two-Sample Testing for High-dimensional Functional Data: A Multi-resolution Projection Method
Abstract: Testing the equality of the mean functions of two samples of functional data is a problem of great interest. Past research has concentrated predominantly on low-dimensional functional data, and methods developed for that setting may not perform well when the dimension is high. In this article, we propose a novel two-sample test for the mean functions of high-dimensional functional data based on a multi-resolution projection (MRP) method. We establish the asymptotic normality of the proposed MRP test statistic and investigate its power when the dimension of the functional variables is high. In practice, functional data are observed only at discrete points, so we also study theoretically how function reconstruction affects the test statistic. Finally, we assess the finite-sample performance of the test through extensive simulation studies and demonstrate its practicality with two real data applications. In particular, our analysis of global climate data reveals significant differences in the mean functions of climate variables over the years 2020-2069 between intermediate greenhouse gas emission pathways (e.g., RCP4.5) and high greenhouse gas emission pathways (e.g., RCP8.5).
Friday, Oct 20, 11:00 AM–12:00 PM, IMS RS408, Tao Wang (Shanghai Jiao Tong University)
Title: Analysis of Sparse Compositions of Microbiomes
Abstract: A central objective in microbiome research is to identify microbes that play crucial roles in health and disease and that could serve as biomarkers for preventing, diagnosing, and treating diseases. However, microbiome feature tables provide the relative rather than the absolute abundance of each feature in each sample, because both the microbial load of each sample and the ratio of sequencing depth to microbial load are unknown and vary considerably. Moreover, microbiome abundance data are count-valued, often over-dispersed, and contain a substantial proportion of zeros. Together, compositionality, sparsity, and over-dispersion pose formidable challenges for absolute abundance analysis, and classical data analysis methods applied to such data can give misleading results. To address these challenges, we introduce mbDecoda, a model-based approach for debiased analysis of sparse compositions of microbiomes. mbDecoda uses a zero-inflated negative binomial model that links mean abundance to the variable of interest through a log link function and allows adjustment for confounding factors. An expectation-maximization (EM) algorithm is developed to compute the maximum likelihood estimates of the model parameters efficiently. A minimum coverage interval approach is then proposed to correct compositional bias, enabling accurate and reliable absolute abundance analysis. Simulated examples and real-world data applications demonstrate the robustness and effectiveness of mbDecoda for absolute abundance analysis.
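The zero-inflated negative binomial (ZINB) model with a log link can be sketched as the log-likelihood that an EM algorithm would maximize. This is a minimal sketch under assumptions the abstract does not state: the usual mean/dispersion parameterization of the negative binomial, one zero-inflation probability pi shared by all samples, no sample-specific offsets for sequencing depth, and hypothetical function and parameter names. It is not the mbDecoda implementation, and it omits both the EM fitting and the minimum coverage interval bias correction.

```python
import math

def nb_logpmf(y, mu, theta):
    """Log-pmf of a negative binomial with mean mu and dispersion (size) theta."""
    return (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
            + theta * math.log(theta / (theta + mu))
            + y * math.log(mu / (theta + mu)))

def zinb_logpmf(y, mu, theta, pi):
    """Zero-inflated NB: with probability pi the count is a structural zero,
    otherwise it is drawn from NB(mu, theta)."""
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_logpmf(y, mu, theta)

def mean_abundance(x, z, beta0, beta1, gamma):
    """Log link (hypothetical layout): log(mu) = beta0 + beta1 * x + sum_k gamma_k * z_k,
    where x is the variable of interest and z holds confounders."""
    eta = beta0 + beta1 * x + sum(g * zk for g, zk in zip(gamma, z))
    return math.exp(eta)

def zinb_loglik(counts, xs, zs, beta0, beta1, gamma, theta, pi):
    """Log-likelihood of one taxon's counts across samples."""
    total = 0.0
    for y, x, z in zip(counts, xs, zs):
        mu = mean_abundance(x, z, beta0, beta1, gamma)
        total += zinb_logpmf(y, mu, theta, pi)
    return total
```

The zero-inflation term raises P(Y = 0) above the plain negative binomial value, while the mean of the mixture is (1 - pi) * mu, which is one reason excess zeros bias naive mean-based analyses.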
Friday, Oct 20, 2:30–3:30 PM, IMS RS408, Zhou Yu (East China Normal University)
Title: Random Forests and Deep Neural Networks for Euclidean and Non-Euclidean Regression
Abstract: Neural networks and random forests are popular and promising machine learning tools. We explore how to integrate the two approaches for nonparametric regression so as to improve on the performance of either approach alone. The combination naturally synthesizes the adaptivity of random forests to local relationships and the strong global approximation ability of neural networks. Using advanced U-process theory and an appropriate network structure, we obtain the minimax convergence rate for the resulting estimator. Moreover, we propose a novel random forest weighted local Fréchet regression paradigm for regression with non-Euclidean responses, and we establish the consistency, rate of convergence, and asymptotic normality of this random-forest-based estimator.
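The two ingredients of random forest weighted Fréchet regression, forest weights and a weighted Fréchet mean, can be sketched as follows. This is a minimal sketch under simplifying assumptions: the weights w_i(x) come from a toy random-partition forest (random coordinate, median split, no bootstrap) rather than CART-style random forests, the response space is the unit circle with geodesic distance, and the Fréchet mean is minimized over a grid of candidate angles. It computes a weighted (global) Fréchet mean, not the local Fréchet regression in the abstract, and all names are hypothetical.

```python
import math
import random

def build_tree(X, idx, depth, min_leaf, rng):
    """Grow one toy random-partition tree: split a random coordinate at the
    median of the node's points (a simplified stand-in for CART splits)."""
    if depth == 0 or len(idx) < 2 * min_leaf:
        return ("leaf", idx)
    j = rng.randrange(len(X[0]))
    vals = sorted(X[i][j] for i in idx)
    cut = vals[len(vals) // 2]
    left = [i for i in idx if X[i][j] < cut]
    right = [i for i in idx if X[i][j] >= cut]
    if len(left) < min_leaf or len(right) < min_leaf:
        return ("leaf", idx)
    return ("split", j, cut,
            build_tree(X, left, depth - 1, min_leaf, rng),
            build_tree(X, right, depth - 1, min_leaf, rng))

def leaf_of(tree, x):
    """Return the training indices in the leaf that contains point x."""
    while tree[0] == "split":
        _, j, cut, left, right = tree
        tree = left if x[j] < cut else right
    return tree[1]

def forest_weights(X, x, n_trees=50, depth=4, min_leaf=3, seed=0):
    """w_i(x): average over trees of 1{X_i in leaf(x)} / |leaf(x)|; sums to 1."""
    rng = random.Random(seed)
    w = [0.0] * len(X)
    for _ in range(n_trees):
        tree = build_tree(X, list(range(len(X))), depth, min_leaf, rng)
        leaf = leaf_of(tree, x)
        for i in leaf:
            w[i] += 1.0 / (len(leaf) * n_trees)
    return w

def arc_dist(a, b):
    """Geodesic (arc-length) distance between two angles on the unit circle."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def weighted_frechet_mean(ys, w, dist, candidates):
    """Weighted Frechet mean: argmin over candidates c of sum_i w_i * dist(y_i, c)^2."""
    return min(candidates,
               key=lambda c: sum(wi * dist(yi, c) ** 2 for yi, wi in zip(ys, w)))
```

With response angles clustered around 0 (some just above 0, some just below 2π), the weighted Fréchet mean stays near 0, whereas the arithmetic mean of the raw angles can land near π. This is why a non-Euclidean response calls for the argmin form rather than a plain weighted average.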
Friday, Oct 20, 4:00–5:00 PM, IMS RS408, Zhao Chen (Fudan University)
Title: Hypothesis Testing on High Dimensional Quantile Regression
Abstract: Quantile regression has been an important analytical tool in econometrics since it was proposed in the 1970s, and its many advantages keep it popular in the era of big data. This paper focuses on hypothesis testing for high-dimensional quantile regression, complementing the robust methods in the high-dimensional hypothesis testing literature. We first construct a new test statistic based on the quantile regression score function. The new statistic avoids inverting the covariance matrix, which makes it applicable to high-dimensional and even ultrahigh-dimensional settings, and it remains robust to non-Gaussian and heavy-tailed distributions. We then derive the limiting distributions of the proposed test statistic under both the null and the alternative hypotheses, and we further investigate the case in which the design matrix follows an elliptical distribution. We examine the finite-sample performance of the proposed method through Monte Carlo simulations. The numerical comparisons show that our tests outperform some existing methods in both Type I error control and power when the data deviate from the Gaussian assumption or are heavy-tailed. Finally, we illustrate the proposed high-dimensional quantile regression test in financial econometrics through an empirical analysis of Chinese stock market data.
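A score-based statistic that avoids any covariance inverse can be illustrated generically. This is a minimal sketch and not the speaker's statistic: it tests H0: beta = 0 in y_i = alpha + x_i' beta + e_i by fitting the intercept-only null model, evaluating the quantile score psi_tau(u) = tau - 1{u < 0} at the null residuals, and summing the squared score components over all p covariates. The permutation calibration is an assumption of this sketch (the abstract instead derives limiting distributions), and the function names are hypothetical.

```python
import math
import random

def tau_quantile(values, tau):
    """Empirical tau-quantile: the ceil(n*tau)-th order statistic, which
    minimizes the check loss sum_i rho_tau(y_i - a) over a."""
    s = sorted(values)
    k = min(len(s), max(1, math.ceil(tau * len(s))))
    return s[k - 1]

def score_statistic(y, X, tau):
    """Sum of squared quantile-score components for H0: beta = 0.
    Only squared components are summed, so no p x p covariance matrix is
    inverted and p may exceed n."""
    n, p = len(y), len(X[0])
    alpha_hat = tau_quantile(y, tau)  # intercept-only fit under H0
    # Quantile score psi_tau(u) = tau - 1{u < 0} at the null residuals.
    psi = [tau - (1.0 if yi - alpha_hat < 0 else 0.0) for yi in y]
    total = 0.0
    for j in range(p):
        s_j = sum(X[i][j] * psi[i] for i in range(n)) / math.sqrt(n)
        total += s_j ** 2
    return total

def permutation_pvalue(y, X, tau, n_perm=199, seed=0):
    """Calibrate by permuting y against the rows of X (an assumption of this
    sketch; exact when y is independent of X under H0)."""
    rng = random.Random(seed)
    observed = score_statistic(y, X, tau)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(y_perm, X, tau) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Because psi_tau takes only the values tau and tau - 1, a single extreme y_i cannot dominate the statistic, which mirrors the robustness to heavy tails described in the abstract.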