Automated Cluster Provisioning And Workflow Management for Parallel Scientific Applications in the Cloud
Many commercial cloud providers and tools are available that researchers could use to advance computational science research. However, adoption by the research community has been slow. In this paper we describe the automated Provisioning And Workflow (PAW) management tool for parallel scientific applications in the cloud. PAW is a comprehensive resource provisioning and workflow tool that automates, in a single operation, the steps of dynamically provisioning a large-scale cluster environment in the cloud, executing a set of jobs or a custom workflow, and de-provisioning the cluster environment after the jobs have completed. A key characteristic of PAW is that it separates the provisioning of cluster resources in the cloud from the management of scientific workflows on those resources, which enables fine-grained decisions about performance and cost trade-offs in a commercial cloud environment. This paper describes our initial AWS implementation of PAW for executing a large parameter sweep workflow. We demonstrate this using an MPI-based topic modeling application. PAW provides a standardized, simplified, and pluggable interface that can easily be expanded to support a variety of underlying cloud or cluster hardware environments, user-facing scheduling systems, workflows, and scientific applications.
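To make the provision, execute, de-provision lifecycle and the separation between provisioning and workflow management concrete, the following is a minimal Python sketch under stated assumptions. It is not PAW's actual code or API: the `Provisioner` interface, the `LocalFakeProvisioner` stand-in backend, the `parameter_sweep` and `run_workflow` helpers, and the sweep parameters `num_topics` and `alpha` are all hypothetical names chosen for illustration. A real backend would call a cloud provider's API instead of recording node names in memory.

```python
from abc import ABC, abstractmethod
from itertools import product
from typing import Any, Callable, Dict, Iterable, List


class Provisioner(ABC):
    """Hypothetical plug-in interface: acquires and releases cluster nodes only."""

    @abstractmethod
    def provision(self, num_nodes: int) -> List[str]:
        """Create the cluster and return node identifiers."""

    @abstractmethod
    def deprovision(self, nodes: List[str]) -> None:
        """Tear down the given nodes."""


class LocalFakeProvisioner(Provisioner):
    """Hypothetical stand-in backend that just tracks node names in memory."""

    def __init__(self) -> None:
        self.active: List[str] = []

    def provision(self, num_nodes: int) -> List[str]:
        self.active = [f"node-{i}" for i in range(num_nodes)]
        return list(self.active)

    def deprovision(self, nodes: List[str]) -> None:
        self.active = [n for n in self.active if n not in nodes]


def parameter_sweep(grid: Dict[str, Iterable[Any]]) -> List[Dict[str, Any]]:
    """Expand a parameter grid into one job configuration per combination."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def run_workflow(
    provisioner: Provisioner,
    num_nodes: int,
    jobs: List[Dict[str, Any]],
    execute: Callable[[str, Dict[str, Any]], Any],
) -> List[Any]:
    """Provision, run every job (round-robin over nodes), then always de-provision."""
    nodes = provisioner.provision(num_nodes)
    try:
        return [execute(nodes[i % len(nodes)], job) for i, job in enumerate(jobs)]
    finally:
        # Teardown runs even if a job raises, so no nodes are left billing.
        provisioner.deprovision(nodes)


# Example: a 2 x 2 sweep over hypothetical topic-model parameters on 2 nodes.
prov = LocalFakeProvisioner()
jobs = parameter_sweep({"num_topics": [10, 20], "alpha": [0.1, 0.5]})
results = run_workflow(prov, 2, jobs, lambda node, job: (node, job["num_topics"], job["alpha"]))
```

Because `run_workflow` only depends on the abstract `Provisioner`, swapping the stand-in for a different cloud or cluster backend would not change the workflow logic, which is one way to read the abstract's point about a pluggable interface. The round-robin job placement is a simplifying assumption; a real deployment would more likely hand jobs to a scheduler.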
Posey, Brandon; Gropp, Christopher; Herzog, Alexander; and Apon, Amy, "Automated Cluster Provisioning And Workflow Management for Parallel Scientific Applications in the Cloud" (2017). Publications. 38.
This paper has been accepted for publication at the 10th Workshop on Many-Task Computing on Clouds, Grids, and Supercomputers (MTAGS'17) held in conjunction with The International Conference on High Performance Computing, Networking, Storage and Analysis (SC'17).