Date of Award

5-2010

Document Type

Thesis

Degree Name

Master of Science (MS)

Legacy Department

Computer Science

Advisor

Goasguen, Sebastien

Committee Member

Westall , Mike

Committee Member

Ligon , Walt

Abstract

Virtualization has great potential in the realm of scientific computing because of its inherent advantages with regard to environment customization and isolation. Virtualization technology is not without it's downsides, most notably, increased computational overhead. This thesis introduces the operating mechanisms of grid technologies in general, and the Open Science Grid in particular, including a discussion of general organization and specific software implementation. A model for utilization of virtualization resources with separate administrative domains for the virtual machines (VMs) and the physical resources is then presented. Two well-known virtual machine monitors, Xen and the Kernel-based Virtual Machine (KVM), are introduced and a performance analysis conducted. The High-Performance Computing Challenge (HPCC) benchmark suite is used in conjunction with independent High-Performance Linpack (HPL) trials in order to analyze specific performance issues. Xen was found to introduce much lower performance overhead than KVM, however, KVM retains advantages with regard to ease of deployment, both of the VMM itself and of the VM images. KVM's snapshot mode is of special interest, as it allows multiple VMs to be instantiated from a single image located on a network store.
With virtualization overhead shown to be acceptable for high-throughput computing tasks, the Virtual Organization Cluster (VOC) Model was implemented as a prototype. Dynamic scaling and multi-site scheduling extensions were also successfully implemented using this prototype. It is also shown that traditional overlay networks have scaling issues and that a new approach to wide-area scheduling is needed.
The use of XMPP messaging and the Google App Engine service to implement a virtual machine monitoring system is presented. Detailed discussions of the relevant sections of the XMPP protocol and libraries are presented. XMPP is found to be a good choice for sending status information due to its inherent advantages in a bandwidth-limited NAT environment.
Thus, it is concluded that the VOC Model is a practical way to implement virtualization of high-throughput computing tasks. Smaller VOCs may take advantage of traditional overlay networks whereas larger VOCs need an alternative approach to scheduling.

Share

COinS