Date of Award

5-2009

Document Type

Thesis

Degree Name

Master of Science (MS)

Legacy Department

Computer Engineering

Advisor

Smith, Melissa C

Committee Member

Brooks , Richard

Committee Member

Ligon , Walter

Abstract

The role of heterogeneous multi-core architectures in the industrial and scientific computing community is expanding. For researchers to increase the performance of complex applications, a multifaceted approach is needed to utilize emerging reconfigurable computing (RC) architectures. First, the method for accelerating applications must provide flexible solutions for fully utilizing key architecture traits across platforms. Secondly, the approach needs to be readily accessible to application scientists. A recent trend toward emerging disruptive architectures is an important signal that fundamental limitations in traditional high performance computing (HPC) are limiting break through research. To respond to these challenges, scientists are under pressure to identify new programming methodologies and elements in platform architectures that will translate into enhanced program efficacy.
Reconfigurable computing (RC) allows the implementation of almost any computer architecture trait, but identifying which traits work best for numerous scientific problem domains is difficult. However, by leveraging the existing underlying framework available in field programmable gate arrays (FPGAs), it is possible to build a method for utilizing RC traits for accelerating scientific applications. By contrasting both hardware and software changes, RC platforms afford developers the ability to examine various architecture characteristics to find those best suited for production-level scientific applications. The flexibility afforded by FPGAs allow these characteristics to then be extrapolated to heterogeneous, multi-core and general-purpose computing on graphics processing units (GP-GPU) HPC platforms. Additionally by coupling high-level languages (HLL) with reconfigurable hardware, relevance to a wider industrial and scientific population is achieved.
To provide these advancements to the scientific community we examine the acceleration of a scientific application on a RC platform. By leveraging the flexibility provided by FPGAs we develop a methodology that removes computational loads from host systems and internalizes portions of communication with the aim of reducing fiscal costs through the reduction of physical compute nodes required to achieve the same runtime performance. Using this methodology an improvement in application performance is shown to be possible without requiring hand implementation of HLL code in a hardware description language (HDL)
A review of recent literature demonstrates the challenge of developing a platform-independent flexible solution that allows access to cutting edge RC hardware for application scientists. To address this challenge we propose a structured methodology that begins with examination of the application's profile, computations, and communications and utilizes tools to assist the developer in making partitioning and optimization decisions. Through experimental results, we will analyze the computational requirements, describe the simulated and actual accelerated application implementation, and finally describe problems encountered during development. Using this proposed method, a 3x speedup is possible over the entire accelerated target application. Lastly we discuss possible future work including further potential optimizations of the application to improve this process and project the anticipated benefits.

Share

COinS