Date of Award

5-2011

Document Type

Thesis

Degree Name

Master of Science (MS)

Legacy Department

Computer Engineering

Advisor

Ligon, Walt B

Committee Member

Birchfield, Stan

Committee Member

Hoover, Adam

Abstract

In recent years computers have been increasing in compute density and speed at a dramatic pace. This increase allows massively parallel programs to run faster than ever before. Unfortunately, many such programs are held back by the relatively slow I/O subsystems they are forced to work with. Storage technology simply has not followed the same curve of progression as the rest of the computing world. Because storage systems are so slow in comparison, processors are forced to idle while waiting for data, a potentially performance-crippling condition.
This performance disparity is lessened by the advent of parallel file systems. Such file systems allow data to be spread across multiple servers and disks. High-speed networking provides large amounts of bandwidth to and from the file system with relatively low latency. This arrangement allows for very large increases in sustained read and write speeds on large files, although file system performance can be hampered if an application spends most of its time working on small data sets and files.
In recent years there has also been an unprecedented forward shift in high-performance I/O systems through the widespread development and deployment of NAND Flash-based solid state disks (SSDs). SSDs offer many advantages over traditional platter-based hard disk drives (HDDs) but also suffer from very specific disadvantages due to their use of Flash memory as a storage medium and their reliance on a hardware flash translation layer (FTL).
The advantages of SSDs are numerous: faster random and sequential access times, higher I/O operations per second (IOPS), and much lower power consumption in both idle and load scenarios. SSDs also tend to have a much longer mean time between failures (MTBF), an advantage that can be attributed to their complete lack of moving parts.
Two key factors prevent SSDs from widespread mass storage deployment: storage capacity and cost per gigabyte. Enterprise-level SSDs that utilize single-level cell (SLC) Flash are orders of magnitude more expensive per gigabyte than their enterprise-class HDD counterparts (which also offer higher capacity per drive).
Because of this disparity, we propose utilizing relatively small SSDs in conjunction with high-capacity HDD arrays in parallel file systems like OrangeFS (previously known as the Parallel Virtual File System, or PVFS). The access latencies and bandwidth of SSDs make them an ideal medium for storing file metadata in a parallel file system. These same characteristics also make them ideal for integration as a persistent server-side cache.
We also introduce a method of transparently compressing file data in striped parallel file systems for high-performance streaming reads and writes with increased storage capacity to combat rising checkpoint sizes and bandwidth requirements.
