Research

Colloquia — Spring 2021

Friday, February 5, 2021

Title: Bayesian variable selection for high dimensional data with complex structures
Speaker: Liangliang Zhang
University of Texas MD Anderson Cancer Center
Time: 3:00pm‐4:00pm
Place: MS Teams

Sponsor: K. Ramachandran

Abstract

With the emergence of next-generation sequencing techniques, which enable comprehensive profiling of high-throughput sequencing data, there is growing interest in developing novel variable selection methods for high-dimensional data with complex structures. Standard variable selection methods are difficult to apply because of two challenging characteristics of sequencing data: compositionality and dependency. To enable sparse regression modeling with sequencing predictors, I proposed a novel zero-constrained prior to handle compositionality. The zero-constrained prior consistently constrains the random coefficients to sum to zero across the model space. To account for dependency between variables, the model uses an Ising prior that encourages the joint selection of variables that are similar to each other. I applied this model to a real data set to link patients' body mass index to their gut microbiome sequences. In addition, I extended the Bayesian linear regression framework to Bayesian logistic regression and survival models. Finally, I will take you through my ongoing work on designing a new shrinkage prior with a desirable concentration rate at zero and tail decay rate. Unlike mixture priors, the proposed prior has a closed functional form, which facilitates analytical and theoretical investigation.
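The zero-sum constraint is what makes a log-linear model of compositional predictors scale-free. The minimal sketch below illustrates that property only; it is not the speaker's prior or code. It assumes NumPy, enforces the constraint deterministically by projection (rather than through a prior), and uses hypothetical function names and simulated counts. It checks that the linear predictor, the sum over j of beta_j * log(x_j), is the same whether computed from raw sequencing counts or from relative abundances.

```python
import numpy as np

def project_zero_sum(beta):
    """Project a coefficient vector onto the subspace where the coefficients sum to zero."""
    beta = np.asarray(beta, dtype=float)
    return beta - beta.mean()

def log_contrast_predictor(z, beta):
    """Linear predictor sum_j beta_j * log(z_j) for each row of z (entries must be positive)."""
    return np.log(z) @ beta

rng = np.random.default_rng(0)
# Hypothetical sequencing counts: 4 samples x 6 taxa; counts >= 1 so every log is finite.
counts = rng.integers(1, 500, size=(4, 6)).astype(float)
# Relative abundances: each row divided by its sequencing depth (row total).
props = counts / counts.sum(axis=1, keepdims=True)

beta = project_zero_sum(rng.normal(size=6))

# log(counts_ij) = log(props_ij) + log(depth_i); the depth term is multiplied by
# sum(beta) = 0, so the two predictors agree.
same = np.allclose(log_contrast_predictor(counts, beta), log_contrast_predictor(props, beta))
print(same)  # True
```

Shifting every coefficient by a constant (so the sum is no longer zero) breaks this agreement, which is why a constraint of this kind is needed when only relative abundances are meaningful.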

Thursday, February 4, 2021

Title: Statistical Inference Based on Neural Networks
Speaker: Xiaoxi Shen
University of Florida
Time: 3:00pm‐4:00pm
Place: MS Teams

Sponsor: K. Ramachandran

Abstract

Neural networks have become increasingly popular in machine learning and have been used successfully in many applied fields (e.g., image recognition). As research on neural networks has grown, so has our understanding of their statistical properties. While many studies focus on bounding the prediction error of neural network estimators, limited research has been done on statistical inference for neural networks. From a statistical point of view, such inference is of great interest because it could facilitate hypothesis testing in many fields (e.g., genetics, epidemiology, and medical science). In this talk, some statistical properties of neural networks will be reviewed, and a goodness-of-fit test statistic based on neural network sieve estimators will be introduced. The test statistic is asymptotically normally distributed, which makes it easy to use in practice. The applicability of the test is investigated through simulations.
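Asymptotic normality is what makes such a test easy to use: once the statistic is standardized, a p-value comes straight from the standard normal CDF. The sketch below shows only that last step and assumes the null mean and standard deviation of the statistic are already available; their actual form for the neural network sieve statistic is not given here, and the function name and the one-sided default (rejecting for large values) are assumptions.

```python
from statistics import NormalDist

def asymptotic_normal_pvalue(stat, mean, sd, alternative="greater"):
    """p-value for a statistic assumed approximately N(mean, sd^2) under the null.

    `mean` and `sd` are assumed inputs; computing them for a specific test
    statistic is outside the scope of this sketch.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (stat - mean) / sd
    std_normal = NormalDist()  # standard normal distribution
    if alternative == "greater":
        # Reject for large values of the statistic.
        return 1.0 - std_normal.cdf(z)
    if alternative == "two-sided":
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("alternative must be 'greater' or 'two-sided'")

print(asymptotic_normal_pvalue(2.0, 0.0, 1.0))
print(asymptotic_normal_pvalue(1.96, 0.0, 1.0, alternative="two-sided"))
```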

Friday, January 29, 2021

Title: Engrafting Statistical Methodologies on Artificial Intelligence to Solve Image Segmentation Problems
Speaker: Jiwoong Kim
Michigan State University
Time: 3:00pm‐4:00pm
Place: MS Teams

Sponsor: K. Ramachandran

Abstract

We are witnessing fierce competition in the Fourth Industrial Revolution, which is characterized by a relentless stream of technological breakthroughs. Among these breakthroughs, artificial intelligence, such as machine learning, has attracted attention from scientists as a possible approach to image segmentation problems. In particular, medical image segmentation, which delineates an object of interest such as a tumor in a medical image, has gained enormous attention because it can be applied to diagnose disease. For example, medical institutions and pharmaceutical companies are rushing to adopt artificial intelligence to diagnose a variety of intractable brain diseases such as dementia and Alzheimer’s disease. In this presentation, we will demonstrate that the performance of artificial intelligence can be substantially enhanced when it is integrated with statistical methodologies. Case studies illustrate how artificial intelligence enhanced by these methodologies can be applied to the medical diagnosis of intractable brain diseases.

This is joint work with Hira Koul and Jinhee Jang.

Thursday, January 28, 2021

Title: Gaussian Process Modeling with Applications in Remote Sensing and Coastal Flood Hazard Studies
Speaker: Pulong Ma
Duke University and the Statistical and Applied Mathematical Sciences Institute
Time: 3:00pm‐4:00pm
Place: MS Teams

Sponsor: K. Ramachandran

Abstract

In the first part of my talk, I will introduce NASA's Orbiting Carbon Observatory-2 (OCO-2) mission and the science that motivates this research. In the second part, I will introduce a new family of covariance functions, the Confluent Hypergeometric (CH) class, for kriging, or Gaussian process modeling, an approach widely used to understand and predict real-world processes. For several decades, the Matérn covariance function has been a popular choice for modeling dependence structures in spatial statistics. A key benefit of the Matérn class is that it gives precise control over the degree of differentiability of the process realizations. However, the Matérn class has exponentially decaying tails and thus may not be suitable for modeling polynomially decaying dependence. This problem can be remedied with polynomial covariances; however, one then loses control over the degree of differentiability, because realizations under polynomial covariances are either infinitely differentiable or not differentiable at all. To overcome this dilemma, a new family of covariance functions is constructed using a scale mixture representation of the Matérn class, which retains the benefits of both Matérn and polynomial covariances. The resulting covariance contains two parameters, one controlling the degree of differentiability near the origin and the other controlling the tail heaviness, independently of each other. The CH class also enjoys attractive theoretical properties under infill asymptotics, including equivalence of measures, asymptotic behavior of the maximum likelihood estimators, and asymptotically efficient prediction under misspecified models. The theoretical advantages of the CH class in predictive performance are verified through extensive simulations. An application to OCO-2 data confirms the advantage of the CH class over the Matérn class, especially in extrapolative settings.
Finally, I will give a brief overview of my research in uncertainty quantification (UQ) for remote sensing and coastal flood hazard studies, as well as future research directions.
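The tail behavior contrasted above can be seen numerically. The sketch below implements the standard Matérn covariance in one common parameterization, C(h) = sigma2 * 2^(1-nu) / Gamma(nu) * u^nu * K_nu(u) with u = sqrt(2 nu) h / phi, next to a Cauchy-type covariance sigma2 * (1 + (h/phi)^2)^(-alpha) as an example of a polynomial-tailed family. It assumes NumPy and SciPy; the parameterization and the choice of the Cauchy form are assumptions for illustration, and the CH class itself is not reproduced here.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(h, sigma2=1.0, phi=1.0, nu=0.5):
    """Matérn covariance with u = sqrt(2 nu) h / phi; nu controls smoothness."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    u = np.sqrt(2.0 * nu) * h / phi
    out = np.full(h.shape, float(sigma2))  # C(0) = sigma2 (K_nu is singular at 0)
    pos = u > 0
    out[pos] = sigma2 * 2.0 ** (1.0 - nu) / gamma(nu) * u[pos] ** nu * kv(nu, u[pos])
    return out

def cauchy(h, sigma2=1.0, phi=1.0, alpha=1.0):
    """Cauchy-type covariance: decays like h^(-2 alpha) for large h (polynomial tail)."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    return sigma2 * (1.0 + (h / phi) ** 2) ** (-alpha)

distances = np.array([1.0, 5.0, 10.0, 20.0])
for h, m, c in zip(distances, matern(distances, nu=1.5), cauchy(distances)):
    # The Matérn value decays exponentially; the Cauchy value only polynomially.
    print(f"h={h:5.1f}  Matern(nu=1.5)={m:.3e}  Cauchy(alpha=1)={c:.3e}")
```

With nu = 0.5 this Matérn form reduces to the exponential covariance exp(-h/phi), and with nu = 1.5 to (1 + sqrt(3) h/phi) exp(-sqrt(3) h/phi), which is a quick way to check the parameterization.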

Friday, January 22, 2021

Title: Signals Aggregation and Detection Methods for Biomedical Data Analysis
Speaker: Hong Zhang
Merck Research Laboratories
Time: 3:00pm–4:00pm
Place: MS Teams

Sponsor: K. Ramachandran

Abstract

The statistical theory of signal aggregation has played a key role in advancing scientific research in many areas. Combining \(p\)-values (or, equivalently, test statistics) is one of the most popular and successful approaches to information-aggregated decision making in applications such as signal detection, data integration, and meta-analysis. In this talk, we present recent progress on both summation-based and maximum-based \(p\)-value combination methods. The distributional properties of these statistics are studied under a global hypothesis testing framework. Efficient \(p\)-value calculation algorithms are proposed that accurately control the type I error rate under correlated inputs. Extensive simulations compare the statistical power of these methods. Finally, we demonstrate the practical utility of these combination methods through various applications in genetic association studies and clinical data analysis.
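For orientation, the two families named above have classical representatives: Fisher's method (summation-based, T = -2 * sum of log p_i) and Tippett's minimum-\(p\) method (maximum-based, equivalent to taking the largest -log p_i). The sketch below computes both in pure Python under the simplifying assumption that the input \(p\)-values are independent and uniform under the null. That assumption is exactly what the talk relaxes, so this does not implement the speaker's algorithms for correlated inputs; the function names are illustrative.

```python
import math

def _check(pvalues):
    if not pvalues or not all(0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value, each in (0, 1]")

def fisher_combined_pvalue(pvalues):
    """Fisher: T = -2 * sum(log p_i) ~ chi-square(2n) under independence."""
    _check(pvalues)
    n = len(pvalues)
    half_t = -sum(math.log(p) for p in pvalues)  # T / 2
    # Chi-square survival with even df 2n: exp(-T/2) * sum_{k=0}^{n-1} (T/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half_t / k
        total += term
    return math.exp(-half_t) * total

def tippett_combined_pvalue(pvalues):
    """Tippett: P(min p_i <= m) = 1 - (1 - m)^n under independence."""
    _check(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** len(pvalues)

pvals = [0.01, 0.20, 0.35, 0.80]
print(fisher_combined_pvalue(pvals), tippett_combined_pvalue(pvals))
```

Fisher's statistic pools evidence across all inputs and suits many weak signals, while the minimum-\(p\) statistic is driven by the single strongest signal. Under independence, SciPy's `scipy.stats.combine_pvalues` offers both via `method="fisher"` and `method="tippett"`.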