Date of Award
Doctor of Philosophy (PhD)
Dr. Brian Dean, Committee Chair
Dr. Wayne Goddard
Dr. Ilya Safro
Dr. Feng Luo
Large "correlation"-type networks have applications in machine learning, data mining, and the analysis of large biological networks. These networks contain short length-k (on the order of thousands) feature vectors for every node in the network that describe the characteristics or behavior of that node. Edges between nodes are expressed as correlations, partial correlations, or coherence between the corresponding feature vectors. If the number of nodes n is too large (on the order of millions), then the dense correlation network of size n×n is computationally infeasible even to explicitly store in memory, let alone to use for computation of standard network analytics like clustering or centrality. In this dissertation, we provide an algorithmic framework to analyze large correlation networks efficiently in an implicit fashion, without writing down every entry of the network. Our framework allows us to compute various network analytics like clustering and centrality using spectral techniques (based on linear algebra). It can accommodate several more sophisticated edge weight measurements beyond just correlation, including partial correlation and coherence. Since correlation networks can contain negative edge weights, one of the main innovations of our work is a collection of transformations that re-map node feature vectors so as to produce a non-negative network in which edge weights are approximately the squares of the original weights. We also develop a number of other useful algorithmic tools (e.g., implicit calculation of shrinkage estimators) that play an important role along our implicit computational pipeline.
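The two core ideas in the abstract can be sketched concretely: (1) a matrix-vector product with the n×n correlation matrix can be computed implicitly in O(nk) time by factoring through the n×k feature matrix, which is all that spectral methods need; and (2) mapping each feature vector x to the flattened outer product vec(x xᵀ) yields a non-negative network whose edge weights are exactly the squares of the original dot products. This is a minimal illustrative sketch, not the dissertation's actual implementation; the names (F, corr_matvec, phi_i) are assumptions introduced here.

```python
import numpy as np

# Hypothetical sketch of implicit correlation-network computation.
# Names (F, corr_matvec, phi_*) are illustrative, not the dissertation's API.

rng = np.random.default_rng(0)
n, k = 1000, 50          # n nodes, length-k feature vectors (n >> k in practice)
F = rng.standard_normal((n, k))

# Standardize each row so that F @ F.T is the Pearson correlation matrix C.
F = (F - F.mean(axis=1, keepdims=True)) / F.std(axis=1, keepdims=True, ddof=1)
F /= np.sqrt(k - 1)      # now C = F @ F.T, with C[i, i] = 1

def corr_matvec(v):
    """Compute C @ v in O(nk) time and memory, never forming the n-by-n C."""
    return F @ (F.T @ v)

# Squaring trick: re-mapping x -> vec(x x^T) gives feature vectors whose
# dot products are (x_i . x_j)^2 >= 0, i.e. the squares of the old weights.
x_i, x_j = F[0], F[1]
phi_i = np.outer(x_i, x_i).ravel()
phi_j = np.outer(x_j, x_j).ravel()
# phi_i @ phi_j equals (x_i @ x_j) ** 2 up to floating-point error
```

Note the trade-off the squaring map implies: the re-mapped vectors have length k², so in practice one would want approximations (hence "approximately the squares" in the abstract) rather than materializing the full outer products.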
Mohan, Raghuveer, "Efficient Analytics for Large Implicit Correlation Networks" (2016). All Dissertations. 1826.