Report
Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster
Title: | Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster |
---|---|
Authors: | Bertuletti, Marco, Riedel, Samuel, Zhang, Yichao, Vanelli-Coralli, Alessandro, Benini, Luca |
Publication Year: | 2023 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Distributed, Parallel, and Cluster Computing |
Description: | Synchronization is likely the most critical performance killer in shared-memory parallel programs. With the rise of multi-core and many-core processors, the relative impact on performance and energy overhead of synchronization is bound to grow. This paper focuses on barrier synchronization for TeraPool, a cluster of 1024 RISC-V processors with non-uniform memory access to a tightly coupled 4MB shared L1 data memory. We compare the synchronization strategies available in other multi-core and many-core clusters to identify the optimal native barrier kernel for TeraPool. We benchmark a set of optimized barrier implementations and evaluate their performance in the framework of the widespread fork-join Open-MP style programming model. We test parallel kernels from the signal-processing and telecommunications domain, achieving less than 10% synchronization overhead over the total runtime for problems that fit TeraPool's L1 memory. By fine-tuning our tree barriers, we achieve 1.6x speed-up with respect to a naive central counter barrier and just 6.2% overhead on a typical 5G application, including a challenging multistage synchronization kernel. To our knowledge, this is the first work where shared-memory barriers are used for the synchronization of a thousand processing elements tightly coupled to shared data memory. Comment: 15 pages, 7 figures |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2307.10248 |
Accession Number: | edsarx.2307.10248 |
Database: | arXiv |
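
The description contrasts a naive central counter barrier with tuned tree barriers. The sketch below is a generic textbook model of that trade-off, not the paper's implementation. It assumes that groups of `radix` arrivals share one counter and that the last arriver at each counter climbs to the next level. The function name `tree_barrier_levels` and the chosen radices are hypothetical, and the model ignores memory latency and non-uniform access.

```python
def tree_barrier_levels(num_cores, radix):
    """Counters per tree level (leaves first) when `radix` arrivals share a counter.

    Illustrative model only: it counts counters, it does not synchronize threads.
    """
    if num_cores < 1 or radix < 2:
        raise ValueError("need num_cores >= 1 and radix >= 2")
    levels = []
    arrivals = num_cores
    while arrivals > 1:
        counters = -(-arrivals // radix)  # ceiling division
        levels.append(counters)
        arrivals = counters  # one last arriver per counter moves up a level
    return levels


# A radix equal to the core count degenerates to a single central counter.
for radix in (2, 32, 1024):
    levels = tree_barrier_levels(1024, radix)
    print(f"radix={radix}: {len(levels)} level(s), counters per level={levels}")
```

In this model a small radix spreads contention over many counters but adds levels to the critical path, while a large radix does the opposite, which is the kind of balance that tuning a tree barrier would have to strike.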