Date of Award

12-2016

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Legacy Department

Computer Science

Committee Member

Dr. Brian Dean, Committee Chair

Committee Member

Dr. Wayne Goddard

Committee Member

Dr. Ilya Safro

Committee Member

Dr. Feng Luo

Abstract

Large “correlation”-type networks have applications in machine learning, data mining and analysis of large biological networks. These networks contain short length-k (on the order of thousands) feature vectors for every node in the network that describe the characteristic or behavior of that node. Edges between nodes are expressed as correlations, partial correlations or coherence between the corresponding feature vectors. If the number of nodes n is too large (on the order of millions), then the dense correlation network with size n×n is computationally infeasible even to explicitly store in memory, let alone to use for computation of standard network analytics like clustering or centrality. In this dissertation, we provide an algorithmic framework to analyze large correlation networks efficiently in an implicit fashion, without writing down every entry of the network. Our framework allows us to compute various network analytics like clustering and centrality using spectral techniques (based on linear algebra). It can accommodate several more sophisticated edge weight measurements beyond just correlation, including partial correlation and coherence. Since correlation networks can contain negative edge weights, one of the main innovations of our work is a collection of transformations that re-map node feature vectors so as to produce a non-negative network in which edge weights are approximately the squares of the original weights. We also develop a number of other useful algorithmic tools (e.g., implicit calculation of shrinkage estimators) that play an important role along our implicit computational pipeline.

Share

COinS