Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper

Bibliographic Details
Title: Harnessing Integrated CPU-GPU System Memory for HPC: a first look into Grace Hopper
Authors: Schieffer, Gabin; Wahlgren, Jacob; Ren, Jie; Faj, Jennifer; Peng, Ivy
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Distributed, Parallel, and Cluster Computing
Description: Memory management across discrete CPU and GPU physical memory is traditionally achieved through explicit GPU allocations and data copies or through unified virtual memory. The Grace Hopper Superchip, for the first time, supports an integrated CPU-GPU system page table, hardware-level addressing of system-allocated memory, and a cache-coherent NVLink-C2C interconnect, bringing an alternative solution for enabling a Unified Memory system. In this work, we provide the first in-depth study of system memory management on the Grace Hopper Superchip, in both in-memory and memory oversubscription scenarios. We provide a suite of six representative applications, including the Qiskit quantum computing simulator, using system memory and managed memory. Using our memory utilization profiler and hardware counters, we quantify and characterize the impact of the integrated CPU-GPU system page table on GPU applications. Our study focuses on the first-touch policy, page table entry initialization, page sizes, and page migration. We identify practical optimization strategies for different access patterns. Our results show that, as a new solution for unified memory, system-allocated memory can benefit most use cases with minimal porting effort.
Comment: Accepted to ICPP '24 (The 53rd International Conference on Parallel Processing)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.07850
Accession Number: edsarx.2407.07850
Database: arXiv