Date of Award
Doctor of Philosophy (PhD)
Dr. William Bridges, Committee Chair
Dr. Neil Calkin
Dr. Matthew Saltzman
Dr. Elizabeth Cooper
In this dissertation, we investigate the limitations of several variable selection methods proposed in recent decades, and in particular we explore how these limitations arise in genome-wide association studies (GWAS). Three often-discussed categories of methods are ordinary least squares, penalized regression, and Bayesian approaches. Although there have been past efforts to apply these techniques to GWAS, it has remained unclear whether one approach is superior to the others or whether certain scenarios favor a given method. Our results from two real data sets reveal that the three categories of approaches do not yield consistent sets of selected variables, so we use simulations to determine which factors might be driving this inconsistency and then to assess how these factors affect accuracy. The specific issues considered in terms of their effects on method performance are: 1) the relationship between the number of variables (p) and the sample size (n), 2) the level of correlation among variables, 3) family structure among samples, 4) measurement errors, and 5) the complexity of the true underlying model. After evaluating the impact of these five factors on variable selection using a full factorial experimental design, we found that the n-p relationship and the model complexity had the biggest influence on all of the methods. Because both of these factors are common features of real-world genetic data sets, our findings indicate that biologists should be wary of accepting results from any single test, and that new methods will be needed to address the problems of modern genomics.
Wei, Tianhui, "Variable Selection for Complex Data with Sparsity: An Application in GWAS" (2018). All Dissertations. 2178.