Non-Gaussian models for spatial and spatio-temporal processes

Authors

Paritosh Kumar Roy

Published

November 7, 2024

Publication details

Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada

Abstract

Data from spatial and spatio-temporal processes are often non-Gaussian, and analysts commonly respond by transforming the data and then assuming Gaussianity. However, modeling data on their original scale while accommodating non-Gaussian behavior is preferred for better quantifying prediction uncertainty. Challenges also arise when the variability of the processes exhibits patterns that transformation methods cannot effectively accommodate. Recently proposed models that avoid arbitrary transformations use two independent Gaussian processes (GPs). The location mixture of a GP using a log-GP is well suited to handling skewness, while the scale mixture of a GP using a log-GP can accommodate heavy-tailed symmetric distributions. This thesis centers on advancing Bayesian methods for models incorporating two independent GPs. These methods are designed to accommodate skewness and heavy-tailed distributions for univariate and multivariate spatial processes and for multivariate spatio-temporal processes. While promising, the inference procedures for both models incur high computational costs because two independent GPs are involved. The problem becomes intractable even with a few hundred locations, so this thesis also addresses the computational complexities of analyzing large datasets.
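As a rough illustration of the two mixture constructions mentioned above, the following sketch simulates two independent GPs on a one-dimensional grid and combines them in two ways. Everything here is an assumption made for illustration: the exponential covariance, the parameter `gamma`, the function names, and the specific parameterizations (`z1 + gamma * exp(z2)` for the location mixture and `z1 * exp(z2 / 2)` for the scale mixture) are generic textbook-style forms, not necessarily those used in the thesis.

```python
import numpy as np


def exp_cov(coords, variance=1.0, length_scale=0.2):
    """Exponential covariance matrix for 1-D coordinates (assumed kernel)."""
    dist = np.abs(coords[:, None] - coords[None, :])
    return variance * np.exp(-dist / length_scale)


def simulate_mixtures(n=200, gamma=1.0, seed=0):
    """Draw two independent GPs and form illustrative location/scale mixtures."""
    rng = np.random.default_rng(seed)
    coords = np.linspace(0.0, 1.0, n)
    # A small jitter on the diagonal keeps the Cholesky factorization stable.
    chol = np.linalg.cholesky(exp_cov(coords) + 1e-8 * np.eye(n))
    z1 = chol @ rng.standard_normal(n)  # first GP
    z2 = chol @ rng.standard_normal(n)  # second, independent GP
    # Location mixture (assumed form): a positive log-GP term shifts the GP,
    # which pushes the marginal distribution toward right skewness.
    location_mix = z1 + gamma * np.exp(z2)
    # Scale mixture (assumed form): the log-GP rescales the GP locally,
    # giving a symmetric marginal with heavier tails than a Gaussian.
    scale_mix = z1 * np.exp(z2 / 2.0)
    return coords, z1, z2, location_mix, scale_mix


coords, z1, z2, location_mix, scale_mix = simulate_mixtures()
```

The Cholesky step costs on the order of n³ operations, which is one way to see why exact inference with two such processes becomes expensive as the number of locations grows.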

The first manuscript is focused on modeling a skewed spatial process following a Gaussian log-Gaussian convolution (GLGC) model, which offers flexibility in handling skewness. To address the challenge of fitting this model to large datasets, it introduces three approximation methods based on nearest neighbor GP (NNGP) and Hilbert space GP (HSGP) to manage computational complexity. Simulation studies and temperature data from 3,000 locations show that all methods perform comparably to exact inference, with the HSGP method being the fastest for smooth processes. A hybrid approach incorporating NNGP and HSGP enhances Markov chain Monte Carlo efficiency and speeds up inference for wiggly processes.
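For readers unfamiliar with the Hilbert space approximation, the sketch below shows the general idea in one dimension: a squared-exponential covariance is approximated by a small set of sine basis functions weighted by the kernel's spectral density. This is a generic illustration, not the thesis's implementation. The kernel, the boundary `half_width`, the number of basis functions, and all function names are assumptions chosen for the example.

```python
import numpy as np


def se_spectral_density(omega, variance, length_scale):
    """Spectral density of the 1-D squared-exponential kernel."""
    return (variance * np.sqrt(2.0 * np.pi) * length_scale
            * np.exp(-0.5 * (length_scale * omega) ** 2))


def hsgp_basis(x, num_basis, half_width):
    """Laplacian eigenfunctions on [-half_width, half_width] (Dirichlet boundary)."""
    j = np.arange(1, num_basis + 1)
    sqrt_eig = j * np.pi / (2.0 * half_width)  # square roots of the eigenvalues
    phi = np.sin(sqrt_eig[None, :] * (x[:, None] + half_width)) / np.sqrt(half_width)
    return phi, sqrt_eig


def hsgp_covariance(x, num_basis=40, half_width=3.0, variance=1.0, length_scale=0.5):
    """Low-rank covariance: sum over j of S(sqrt(eig_j)) * phi_j(x) * phi_j(x')."""
    phi, sqrt_eig = hsgp_basis(x, num_basis, half_width)
    weights = se_spectral_density(sqrt_eig, variance, length_scale)
    return (phi * weights) @ phi.T


# Compare against the exact squared-exponential covariance on [-1, 1].
x = np.linspace(-1.0, 1.0, 25)
exact = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.5) ** 2)
approx = hsgp_covariance(x)
max_abs_error = np.max(np.abs(approx - exact))
```

The approximation works with an n × m basis matrix rather than an n × n covariance matrix, which is where the speed-up for smooth processes comes from. Less smooth processes (short length scales) need more basis functions for the same accuracy.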

The second manuscript extends the model built upon a scale mixture of a GP using a log-GP to handle heterogeneous and heavy-tailed processes in the multivariate case. Within the linear model of coregionalization (LMC) framework, this multivariate model applies a mixture model to each process independently, capturing dependence and spatial variability while accommodating outliers. Simulation studies confirm that the parameters are identifiable, and specific parameters are crucial in evaluating potential models and selecting the most suitable ones. The model’s effectiveness is demonstrated by analyzing UK air pollutant data. Finally, Vecchia-based approximation methods address the computational complexities of analyzing large multivariate datasets.

The third manuscript extends the multivariate non-Gaussian spatial model from the second manuscript to a spatio-temporal context in which time is discrete and temporal dependence is captured by a dynamic linear model. The spatial and temporal mixing processes are handled separately under a separability assumption, with the spatial component mirroring the second manuscript’s approach. The model is therefore also a multivariate extension of the recently introduced dynamical non-Gaussian modeling of univariate spatial processes built upon a scale mixture of a GP using a log-GP. The model’s properties are derived and discussed, with simulation studies confirming parameter identifiability. Analyses of artificially contaminated data and UK air pollutant data illustrate the improvement in quantifying prediction uncertainty achieved by the proposed model.

Collectively, these projects advance methods for analyzing high-dimensional spatial and spatio-temporal non-Gaussian data on their original scale, providing a better understanding of the processes of interest and enhancing uncertainty quantification of predictions.