2025 – 2026 Department of Statistics Colloquium Speakers
When: Thursday, September 4, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Anru Zhang, Department of Biostatistics & Bioinformatics and Department of Computer Science, Duke University
Abstract: The increasing availability of electronic health records (EHRs) and other biomedical data calls for methodologies that can generate high-quality synthetic data while preserving privacy, correcting bias, and addressing complex data structures. In this talk, I will present a series of recent advances in generative modeling for synthetic health data. First, using denoising diffusion probabilistic models, we develop a framework for generating realistic, privacy-preserving EHR time series that achieve superior fidelity and lower privacy risk than existing methods. Second, to address irregularly observed functional data, we introduce Smooth Flow Matching (SFM), a semiparametric copula flow framework capable of generating smooth, infinite-dimensional trajectories under irregular sampling and non-Gaussian structures. Finally, we propose a bias-corrected data synthesis strategy for imbalanced learning, which mitigates distortions introduced by synthetic samples and enhances predictive performance in rare-event classification. Collectively, these methods provide a principled foundation for generative modeling of synthetic health data, enabling privacy-preserving, bias-reduced analysis and broader utilization of sensitive biomedical datasets.
When: Thursday, September 4, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Cong Ma, Department of Statistics, University of Chicago
Abstract: Integrative data analysis often requires separating shared from individual variations across multiple datasets, typically using the Joint and Individual Variation Explained (JIVE) model. Despite its popularity, theoretical insights into JIVE methods remain limited, particularly in the context of multiple matrices and varying degrees of subspace misalignment. In this talk, I will present new theoretical results on the Angle-based JIVE (AJIVE) method—a two-stage spectral algorithm. Specifically, we establish that AJIVE achieves decreasing estimation error with an increasing number of matrices in high signal-to-noise ratio (SNR) regimes. In contrast, AJIVE faces inherent limitations in low-SNR conditions, where estimation error remains persistently high. Complementary minimax lower bounds confirm AJIVE’s optimal performance at high SNR, while analysis of an oracle estimator highlights fundamental limitations of spectral methods at low SNR.
When: Thursday, September 18, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Christopher Wikle, Department of Statistics, University of Missouri
Abstract: The world is full of extreme events. For example, a central question in public health planning might be to assess the likelihood of extreme exposures (meteorological conditions, air pollution, social stress, etc.). Such extreme events typically occur in spatial and/or temporal clusters. Yet, the principal methodologies that statisticians use to deal with spatially dependent processes (Gaussian processes and Markov random fields) are not suitable for complex tail dependence structures. This is particularly true of simulation model emulation. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often prohibitively expensive to fit and simulate from in high dimensions. Here I present recent work where we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used to analyze high-dimensional data or as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. We demonstrate our method applied to a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea and to a high-resolution simulation model of a turbulent plume, such as one would find in a wildfire. We note, however, that these methods can be applied to any data set or simulation model that exhibits extremes.
When: Thursday, September 25, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Seungchul Baek, Department of Mathematics and Statistics, University of Maryland, Baltimore County
Abstract: I introduce two projects related to high-dimensional classification. The first project focuses on developing a classifier using random partitioning. Specifically, we split the original high-dimensional data ($p>n$) into multiple low-dimensional subsets, making sure the number of selected covariates is less than the sample size. Using these partitioned datasets, we apply linear discriminant analysis (LDA) to each subset and propose a method to aggregate the results. We provide theoretical justification for our approach by comparing its misclassification rates to those of LDA in high dimensions. The second project concerns variable selection in high-dimensional classification. By utilizing the recently proposed mirror statistic, we first identify significant variables and then develop a new classifier based on a modified version of the $\epsilon$-greedy algorithm.
When: Tuesday, October 14, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Philip Ernst, Department of Mathematics, Imperial College London
Abstract: TBD
When: Thursday, October 16, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Jason Klusowski, Department of Operations Research and Financial Engineering, Princeton University
Abstract: TBD
When: Thursday, October 30, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Tingting Zhang, Department of Statistics, University of Pittsburgh
Abstract:
When: Thursday, November 6, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Nathaniel Josephs, Department of Statistics, North Carolina State University
Abstract: TBD
When: Thursday, November 13, 2025—2:50 p.m. to 3:50 p.m.
Where: LeConte 224
Speaker: Dr. Yichao Wu, Department of Mathematics, Statistics, and Computer Science, University of Illinois Chicago
Abstract: The first part of the talk will focus on the general partially linear model without any structure assumption on the nonparametric component. For such a model with both linear and nonlinear predictors being multivariate, we propose a new variable selection method. Our new method is a unified approach in the sense that it can select both linear and nonlinear predictors simultaneously by solving a single optimization problem. We prove that the proposed method achieves consistency. The second part of the talk will be based on an ongoing research project. In this project, we are extending the above variable selection method to partially global Fréchet regression (Tucker and Wu, 2025, Statistica Sinica).