Date of Award

12-2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Genetics

Committee Member

Dr. Liangjiang Wang, Committee Chair

Committee Member

Dr. Julia Frugoli

Committee Member

Dr. Hong Luo

Committee Member

Dr. Anand Srivastava

Abstract

Within this study, we sought to leverage knowledge from well-characterized protein coding genes to characterize the lesser known long non-coding RNA (lncRNA) genes using computational methods to find functional annotations and disease associations. Functional genome annotation is an essential step to a systems-level view of the human genome. With this knowledge, we can gain a deeper understanding of how humans develop and function, and a better understanding of human disease. LncRNAs are transcripts greater than 200 nucleotides, which do not code for proteins. LncRNAs have been found to regulate development, tissue and cell differentiation, and organ formation. Their dysregulation has been linked to several diseases including autism spectrum disorder (ASD) and cancer. While a great deal of research has been dedicated to protein-coding genes, the relatively recently discovered lncRNA genes have yet to be characterized. LncRNA function is tied closely to when and where they are expressed. Co-expression network analysis offer a means of functional annotation of uncharacterized genes through a "guilt by association" approach. We have constructed two co-expression networks using known disease-associated protein-coding genes and lncRNA genes. Through clustering of the networks, gene set enrichment analysis, and centrality measures, we found enrichment for disease association and functions as well as identified high-confidence lncRNA disease gene targets. We present a novel approach to the identification of disease state associations by demonstrating genes that are associated with the same disease states share patterns that can be discerned from transcriptomes of healthy tissues. Using a machine learning algorithm, we built a model to classify ASD versus non-ASD genes using their expression profiles from healthy developing human brain tissues. Feature selection during the model-building process also identified critical temporospatial points for the determination of ASD genes. We constructed a webserver tool for the prioritization of genes for ASD association. The webserver tool has a database containing prioritization and co-expression information for nearly every gene in the human genome.

Additional file A-1.csv (43 kB)
File A-1

Additional file A-2.csv (785 kB)
File A-2

Additional file A-3.csv (261 kB)
File A-3

Additional file A-4.csv (11747 kB)
File A-4

Additional file A-5.csv (625 kB)
File A-5

Share

COinS