Date of Award
Master of Science (MS)
School of Computing
High-density oligonucleotide arrays (microarrays) from the Affymetrix GeneChip® system have been widely used to measure gene expression. Public data repositories, such as the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI), have accumulated a very large amount of microarray data. For example, the GEO database contains 84389 human and 9654 Arabidopsis microarray experiments. Efficient integrative analysis of this large body of microarray data will provide more knowledge about biological systems. Traditional microarray analysis tools implement sequential algorithms and can run only on a single processor, so they cannot handle very large microarray data sets with thousands of experiments. New microarray analysis tools built on a parallel framework are therefore needed. In this thesis, I implemented microarray quality assessment, background correction, normalization, and summarization algorithms using the Map/Reduce framework. The Map/Reduce framework, first introduced by Google in 2004, offers a promising paradigm for developing scalable parallel applications for large-scale data. Evaluation of our new implementations on large rice and Arabidopsis microarray data sets showed good speedups. For example, running rice microarray data through our implementation of the MAS5.0 algorithms on 20 compute nodes with a total of 320 processors gave a 28-fold speedup over the previous C++ implementation on a single processor. Our new microarray tools will make it possible to utilize the valuable experiments in the public repositories.
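As a rough illustration of how a summarization step can be phrased in Map/Reduce terms, the following minimal sketch groups probe intensities by array and probe set, then reduces each group to one expression value. It is not the thesis implementation: the record layout, the function names, and the sample values are hypothetical, and a plain median stands in for the robust summarization that MAS5.0 actually uses. It runs in a single process; in a real Map/Reduce framework the map and reduce calls would be distributed across nodes.

```python
from collections import defaultdict
from statistics import median


def map_probe(record):
    """Map step: turn one probe record into a (key, value) pair.

    The (array_id, probe_set_id, intensity) layout is an assumed format.
    """
    array_id, probe_set_id, intensity = record
    return (array_id, probe_set_id), intensity


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_probe_set(key, intensities):
    """Reduce step: summarize one probe set on one array.

    A median is used here only as a simple placeholder summary.
    """
    return key, median(intensities)


def summarize(records):
    """Run map, shuffle, and reduce sequentially over all probe records."""
    pairs = map(map_probe, records)
    grouped = shuffle(pairs)
    return dict(reduce_probe_set(k, v) for k, v in grouped.items())


# Hypothetical sample data: two arrays, two probe sets.
records = [
    ("chip1", "geneA", 100.0),
    ("chip1", "geneA", 120.0),
    ("chip1", "geneA", 500.0),
    ("chip1", "geneB", 40.0),
    ("chip2", "geneA", 90.0),
    ("chip2", "geneA", 110.0),
]
result = summarize(records)
```

Because each key is reduced independently of the others, the work for different arrays and probe sets can proceed in parallel, which is the property that makes this kind of analysis a natural fit for Map/Reduce.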
Yang, Guangyu, "DEVELOPMENT OF MAP/REDUCE BASED MICROARRAY ANALYSIS TOOLS" (2013). All Theses. 1758.