Date of Award

8-2022

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

Committee Chair/Advisor

Rong Ge

Committee Member

Jacob Sorber

Committee Member

Amy Apon

Committee Member

Jon Calhoun

Abstract

The programming difficulty of creating GPU-accelerated high-performance computing (HPC) codes has been greatly reduced by the advent of Unified Memory technologies that abstract the management of physical memory away from the developer. However, these systems incur substantial overhead that paradoxically grows for the codes where these technologies are most useful. While these technologies are increasingly adopted in modern HPC frameworks and applications, their performance cost reduces the efficiency of these systems and deters some developers from adopting them at all. These systems are inherently difficult to optimize because of the large number of interconnected hardware and software components that must be untangled to perform thorough analysis.

In this thesis, we take the first deep dive into a functional implementation of a Unified Memory system, NVIDIA Unified Virtual Memory (UVM), to evaluate the performance and characteristics of these systems. We show specific hardware and software interactions that cause serialization between the host and devices. We further provide a quantitative evaluation of fault handling for various applications under different scenarios, including prefetching and oversubscription. Through lower-level analysis, we find that the driver workload depends on the interactions among application access patterns, GPU hardware constraints, and host OS components. These findings indicate that the cost of host OS components is significant and is present across Unified Memory implementations. We also provide a proof-of-concept asynchronous approach to memory management in UVM that reduces system overhead and improves application performance. This study provides constructive insight into future implementations and systems, such as Heterogeneous Memory Management.

Recommended Citation

Allen, Tyler, "Holistic Performance Analysis and Optimization of Unified Virtual Memory" (2022). All Dissertations. 3092.
https://tigerprints.clemson.edu/all_dissertations/3092

Author ORCID Identifier

0000-0002-0253-372X

Included in

OS and Networks Commons, Systems Architecture Commons
