Date of Award

5-2018

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

School of Computing

Committee Member

Dr. James Z. Wang, Committee Chair

Committee Member

Dr. Pradip K Srimani

Committee Member

Dr. Jim Martin

Committee Member

Dr. Feng Luo

Abstract

Millions of text documents are penetrating our daily lives, and this unstructured text serves as a huge source of information. Efficient organization and analysis of such an overwhelming volume of text can filter out irrelevant and redundant information and uncover invaluable knowledge. This significantly reduces human effort, facilitates knowledge discovery, and enhances cognitive abilities. Semantic similarity analysis among text objects is a fundamental problem underlying many text mining tasks, including document classification/clustering, recommendation, query expansion, information retrieval, relevance feedback, and word sense disambiguation. A combination of common sense and domain knowledge lets a person quickly determine whether two objects are similar, but computers understand very little of human thinking. Knowledge resources such as ontologies can capture much of the semantics of text objects, which enables the numeric representation of both domain knowledge and context information. In this dissertation, we develop a series of techniques to measure the semantic similarity of objects in multiple domains. By utilizing structured knowledge that has already been established, we explore the domain knowledge in existing lexical resources and incorporate it into specific applications within different domains. Specifically, in the biology domain, we investigate the semantic similarities between gene products using the Gene Ontology. In the text domain, we propose a hybrid representation of text objects (words and documents) based on WordNet. This representation exploits both context and ontology information to extract meaningful information from unstructured text and measure the semantic similarity of text documents.
