Variable Selection for Complex Data with Sparsity: An Application in GWAS

Author

Tianhui Wei

Date of Award

8-2018

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Mathematical Sciences

Committee Member

Dr. William Bridges, Committee Chair

Committee Member

Dr. Neil Calkin

Committee Member

Dr. Matthew Saltzman

Committee Member

Dr. Elizabeth Cooper

Abstract

In this dissertation, we investigate the limitations of several variable selection methods proposed in recent decades and, in particular, how these limitations arise in genome-wide association studies (GWAS). Three frequently discussed categories of methods are ordinary least squares, penalized regression, and Bayesian approaches. Although each has previously been applied to GWAS, it remained unclear whether any one approach was superior to the others, or whether certain scenarios favor a particular method. Our results from two real data sets show that the three categories of approaches do not yield consistent sets of selected variables. We therefore use simulations to determine which factors drive this inconsistency and to assess how these factors affect selection accuracy. The factors whose effects on method performance we consider are: 1) the relationship between the number of variables (p) and the sample size (n), 2) the level of correlation among variables, 3) family structure among samples, 4) measurement error, and 5) the complexity of the true underlying model. After evaluating the impact of these five factors on variable selection using a full factorial experimental design, we found that the n-p relationship and model complexity had the greatest influence on all of the methods. Because both of these conditions are common in real-world genetic data sets, our findings indicate that biologists should be wary of accepting the results of any single method, and that new methods will be needed to address the problems of modern genomics.
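A full factorial design crosses every level of every factor, so five factors with 2, 3, 2, 2 and 2 levels give 2 × 3 × 2 × 2 × 2 = 48 simulation settings. The sketch below is illustrative only and is not the dissertation's code: the factor names, their levels in `FACTORS`, and the helpers `full_factorial` and `simulate_genotypes` are all hypothetical, and the genotype generator (AR(1)-correlated latent normals thresholded into 0/1/2 codes at per-column empirical quantiles matching a hypothetical minor-allele frequency) is a simplifying assumption. It also illustrates why the n-p relationship matters for ordinary least squares: when p > n, the matrix X^T X has rank at most n, so the normal equations have no unique solution.

```python
import itertools

import numpy as np

# HYPOTHETICAL factor levels chosen for illustration only; they are not the
# levels used in the dissertation.
FACTORS = {
    "n_vs_p": ["n > p", "n < p"],        # sample size relative to number of variables
    "correlation": [0.0, 0.5, 0.9],      # latent correlation between adjacent variables
    "family_structure": [False, True],   # unrelated vs. related samples
    "measurement_error_sd": [0.0, 0.5],  # SD of noise added to the predictors
    "n_causal": [5, 50],                 # number of true effects (model complexity)
}


def full_factorial(factors):
    """Return one dict per design cell: every combination of every factor's levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def simulate_genotypes(n, p, rho, maf=0.3, rng=None):
    """Simulate an n x p matrix of SNP-like predictors coded 0/1/2 (hypothetical scheme).

    Latent standard normals follow an AR(1) structure across columns, so adjacent
    columns have latent correlation rho. Each column is then cut at its empirical
    quantiles so genotype frequencies roughly follow Hardy-Weinberg proportions
    for the given minor-allele frequency.
    """
    rng = np.random.default_rng(rng)
    z = np.empty((n, p))
    z[:, 0] = rng.standard_normal(n)
    for j in range(1, p):
        z[:, j] = rho * z[:, j - 1] + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    # Cumulative genotype probabilities: P(0) = (1-maf)^2, P(0 or 1) = 1 - maf^2.
    cuts = np.quantile(z, [(1.0 - maf) ** 2, 1.0 - maf**2], axis=0)  # shape (2, p)
    return (z > cuts[0]).astype(int) + (z > cuts[1]).astype(int)


if __name__ == "__main__":
    design = full_factorial(FACTORS)
    print(len(design), "design cells")  # 2 * 3 * 2 * 2 * 2 = 48

    # With p > n, rank(X^T X) <= n < p, so X^T X is singular and plain OLS
    # cannot estimate all p coefficients uniquely.
    X = simulate_genotypes(n=50, p=200, rho=0.5, rng=1)
    print(np.linalg.matrix_rank(X.T @ X), "<=", X.shape[0], "<", X.shape[1])
```

Each design cell would then be used to generate data, run every selection method, and compare the selected variables against the known true effects; that downstream step is not sketched here.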

Recommended Citation

Wei, Tianhui, "Variable Selection for Complex Data with Sparsity: An Application in GWAS" (2018). All Dissertations. 2178.
https://tigerprints.clemson.edu/all_dissertations/2178
