Doctor of Philosophy (PhD)
Dr. Christopher S. McMahan, Committee Co-Chair
Dr. Colin M. Gallagher, Committee Co-Chair
Dr. Robert B. Lund
Dr. Xiaoqian Sun
This dissertation consists of three projects that make use of latent variable modeling techniques. One focus of this research is spatial and spatio-temporal modeling. The specific topics and motivating problems in this area were motivated and fully supported by the Companion Animal Parasite Council (CAPC). In particular, the CAPC has developed an extensive database that houses data sets on several common dog diseases collected throughout the conterminous United States. These data are recorded at the county level, were collected monthly over 5 consecutive years, and exhibit strong spatial and temporal correlation structures. Further, because some counties do not report, a significant portion of the data is missing in both the spatial and temporal domains. The goal of our work in this area was to identify risk factors significantly related to the prevalence of the various diseases and to develop models that could be used to accurately forecast future disease trends nationwide. No similar work has been completed for these diseases at the spatio-temporal scale that we consider. To accomplish this task, we developed and implemented a Bayesian spatio-temporal regression model to analyze the data. Due to the relatively large spatial scale and complex structure of the data, a key challenge was developing computationally efficient algorithms for implementing Markov chain Monte Carlo (MCMC) techniques. Once these algorithms were in place, we applied our models to assess the relevance of the considered covariates and to forecast future trends.

In addition to the spatial and spatio-temporal modeling problems, this dissertation research also focuses on developing new modeling techniques for data collected on pooled specimens. Using pooling as a more cost-effective data collection technique is becoming pervasive in the biological sciences and elsewhere.
In particular, pooled data are collected by first combining several specimens (e.g., blood or urine) collected from individuals into a pooled sample, which is then measured for a characteristic of interest. For example, in infectious disease studies the pooled outcome is typically binary, indicating disease status, whereas in biological marker (biomarker) evaluation studies the outcome is continuous. In either case, information on several individuals is obtained at the expense of only one measurement, which reduces the cost of data collection. However, the statistical analysis of measurements (either binary or continuous) taken on pools poses many challenges. In my dissertation research, I developed regression methods for both continuous and binary outcomes measured on pools. For continuous outcomes, I proposed a general regression framework that can be used to analyze pooled outcomes under practically all parametric models. This was accomplished by using an advanced Monte Carlo sampling algorithm to approximate the observed-data likelihood. Proceeding in this fashion also allows us to account for measurement error, which had not previously been accounted for, and led to the development of computationally efficient software for implementing the proposed approach. For binary outcomes (usually referred to as group testing data), I developed a novel Bayesian generalized additive model. Specifically, the proposed approach assumes that the linear predictor depends on several unknown smooth functions of some covariates as well as on linear combinations of other covariates. In addition, our model can account for imperfect testing and can be used to analyze data collected according to any group testing process.
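To make the two pooled-data settings more concrete, here is a minimal Python sketch under stated assumptions. It is not the dissertation's implementation. For binary outcomes, it computes the standard probability that a pool tests positive when the assay has sensitivity `se` and specificity `sp`, given each member's infection probability p_i. In a regression model these p_i would come from the linear predictor through a link such as the logit. This is an assumption, as is the assumption that test accuracy does not depend on pool size. For continuous outcomes, it assumes, hypothetically, that a pool's measurement equals the average of its members' values plus Gaussian measurement error. It then approximates the density of one pooled measurement by plain Monte Carlo averaging over simulated pools. The "advanced" sampling algorithm used in the dissertation is not described here, so this is only a simple baseline. The function names `pool_positive_prob` and `mc_pooled_density` are hypothetical.

```python
import math
import random
from typing import Callable, Optional, Sequence


def pool_positive_prob(probs: Sequence[float], se: float, sp: float) -> float:
    """Probability that a pool tests positive under imperfect testing.

    probs: individual infection probabilities p_i for the pool members
    se, sp: assay sensitivity and specificity (assumed known and not
            dependent on pool size in this sketch)
    """
    # Probability that every member of the pool is truly negative.
    all_neg = math.prod(1.0 - p for p in probs)
    # A truly positive pool is detected with probability se; a truly
    # negative pool is falsely flagged with probability 1 - sp.
    return se * (1.0 - all_neg) + (1.0 - sp) * all_neg


def mc_pooled_density(
    y: float,
    sampler: Callable[[random.Random], float],
    pool_size: int,
    sigma: float,
    n_draws: int = 10_000,
    rng: Optional[random.Random] = None,
) -> float:
    """Plain Monte Carlo estimate of the density of one pooled measurement.

    Hypothetical model: Y = mean(X_1, ..., X_k) + e, with X_i drawn
    independently by `sampler` and e ~ N(0, sigma^2). Then
    f(y) = E[phi((y - mean(X)) / sigma) / sigma], which is estimated
    by averaging over simulated pools.
    """
    rng = rng if rng is not None else random.Random()
    norm_const = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for _ in range(n_draws):
        # Simulate the true (error-free) pooled value.
        xbar = sum(sampler(rng) for _ in range(pool_size)) / pool_size
        z = (y - xbar) / sigma
        # Accumulate the Gaussian measurement-error density of y given xbar.
        total += math.exp(-0.5 * z * z) / norm_const
    return total / n_draws
```

As a usage example, `mc_pooled_density(1.2, lambda r: r.lognormvariate(0.0, 0.5), pool_size=5, sigma=0.1)` estimates the density of a pooled value of 1.2 when individual values are (hypothetically) lognormal. In that case the distribution of the average has no closed form, which is why simulation is useful. Summing the logs of such estimates across pools gives an approximate observed-data log-likelihood, and its Monte Carlo error shrinks as `n_draws` grows.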
Liu, Yan, "Latent data modeling with biostatistical applications" (2017). All Dissertations. 2005.