Date of Award

December 2021

Document Type

Thesis

Degree Name

Master of Engineering (MEngr)

Department

Computer Engineering

Committee Member

Jon C Calhoun

Committee Member

Melissa C Smith

Committee Member

Walt Ligon

Committee Member

Rong Ge

Abstract

Improvements in High-Performance Computing (HPC) have enabled researchers to develop more sophisticated simulations and applications that solve previously intractable problems. While these applications are critical to scientific innovation, they generate ever larger quantities of data, which worsens the existing I/O bottleneck. To mitigate this issue, researchers use various forms of data reduction.

Currently, researchers have access to many different types of data reduction, including data compression, time-step selection, and data sampling. While each of these is an effective method, data compression algorithms and data sampling methods do not leverage the temporal aspect of the data, and time-step selection is prone to missing critical abrupt changes. These gaps motivate our spatiotemporal data sampling method.

In this thesis, we develop a spatiotemporal data sampling method that leverages both the spatial and temporal properties of simulation data. Specifically, our method compares each region of the current time-step with the corresponding region of the previous time-step to determine whether data from the previous time-step is similar enough to reuse. Additionally, the method biases sampling toward rarer data values to ensure regions of interest are kept with higher fidelity. By operating in this manner, our method improves sample budget utilization and, as a result, post-reconstruction data quality. As the effectiveness of our method relies heavily on user input parameters, we also provide a set of pre-processing steps that ease the burden of choosing appropriate values. Specifically, these pre-processing steps assist users in determining an optimal value for the number of bins, the error threshold, and the number of regions. Finally, we demonstrate the modularity of our sampling process by showing how it works with a different internal core sampling algorithm.
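
The region-reuse idea described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the 1D data layout, the mean-absolute-difference similarity test, and the inverse-bin-count weighting are all assumptions made for illustration. Only the parameters (number of bins, error threshold, number of regions) and the swappable core sampler come from the abstract.

```python
import random
from collections import Counter


def rarity_weighted_sample(values, budget, num_bins, rng):
    """Hypothetical core sampler: pick `budget` indices, favoring rare values.

    Values are histogrammed into `num_bins` bins and each point is weighted
    by 1 / (population of its bin), so sparse bins are sampled more densely.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # avoid zero width for constant data
    bins = [min(int((v - lo) / width), num_bins - 1) for v in values]
    counts = Counter(bins)
    # Weighted sampling without replacement (Efraimidis-Spirakis):
    # draw key = u ** (1 / w) per item and keep the largest keys.
    keys = [(rng.random() ** counts[b], i) for i, b in enumerate(bins)]
    keys.sort(reverse=True)
    return sorted(i for _, i in keys[:budget])


def sample_time_step(current, previous, previous_samples, num_regions,
                     error_threshold, budget, core_sampler):
    """Sample one time-step region by region, reusing old samples if similar.

    Returns (reused_flags, samples), where samples[r] is a list of
    (global_index, value) pairs for region r.
    """
    size = len(current) // num_regions
    reused, samples = [], []
    for r in range(num_regions):
        start = r * size
        stop = len(current) if r == num_regions - 1 else start + size
        cur = current[start:stop]
        if previous is not None:
            # Assumed similarity test: mean absolute difference per region.
            prev = previous[start:stop]
            diff = sum(abs(a - b) for a, b in zip(cur, prev)) / len(cur)
            if diff <= error_threshold:
                reused.append(True)
                samples.append(previous_samples[r])
                continue
        picked = core_sampler(cur, budget)  # any core sampler can be swapped in
        reused.append(False)
        samples.append([(start + i, cur[i]) for i in picked])
    return reused, samples


if __name__ == "__main__":
    rng = random.Random(0)
    core = lambda vals, k: rarity_weighted_sample(vals, k, num_bins=4, rng=rng)
    step0 = [0.0] * 8 + [float(v) for v in range(1, 9)]
    step1 = [0.0] * 8 + [float(v) for v in range(11, 19)]
    _, s0 = sample_time_step(step0, None, None, 2, 0.5, 3, core)
    flags, s1 = sample_time_step(step1, step0, s0, 2, 0.5, 3, core)
    print(flags)
```

In this sketch the unchanged first region is reused and only the second region is resampled. How the thesis redistributes the sample budget freed by reused regions is not stated in the abstract, so the sketch simply keeps a fixed per-region budget.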

Upon evaluating our spatiotemporal sampling algorithm, we find it is capable of achieving higher post-reconstruction quality than Biswas et al.’s non-reuse importance-based sampling method. Specifically, we find our method achieves a 31.3% higher post-reconstruction quality while only introducing a 37% degradation in throughput, on average. When assessing our pre-processing steps, we find they are effective at assisting users in determining an optimal value for the number of bins, the error threshold, and the number of regions. Finally, we illustrate the modularity of our sampling method by showing how one would swap the core sampling algorithm. From our evaluation, we find our spatiotemporal sampling method is an effective choice for sampling simulation data.
