Date of Award

May 2020

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

School of Computing

Committee Member

Ilya Safro

Committee Member

Amy Apon

Committee Member

Sez Atamturktur

Committee Member

Brian Dean

Committee Member

Alexander Herzog

Abstract

As the size and scope of online data continue to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings, semantically rich vectors of latent features, to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs, and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state of the art in hypergraph partitioning quality. With this newfound intuition about graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns data-driven ranking criteria derived from the embeddings of our proposed large biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation.

Recommended Citation

Sybrandt, Justin George, "Exploiting Latent Features of Text and Graphs" (2020). All Dissertations. 2592.
https://tigerprints.clemson.edu/all_dissertations/2592
