Performance Enhancement of the Ozaki Scheme on Integer Matrix Multiplication Unit

Bibliographic Details
Title: Performance Enhancement of the Ozaki Scheme on Integer Matrix Multiplication Unit
Authors: Uchino, Yuki, Ozaki, Katsuhisa, Imamura, Toshiyuki
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Distributed, Parallel, and Cluster Computing
Description: This study aims to achieve both sufficient accuracy and high performance for general matrix multiplication. Recent architectures, such as NVIDIA GPUs, feature high-performance units designed for the low-precision matrix multiplications used in machine learning models, and next-generation architectures are expected to follow the same design principle. The key to achieving superior performance is to fully leverage such architectures. The Ozaki scheme, a highly accurate matrix multiplication algorithm based on error-free transformations, performs a higher-precision matrix multiplication through multiple lower-precision matrix multiplications and higher-precision matrix additions. Ootomo et al. implemented the Ozaki scheme on high-performance matrix multiplication units with the aim of achieving both sufficient accuracy and high performance. This paper proposes alternative approaches that improve performance by reducing the number of lower-precision matrix multiplications and higher-precision matrix additions. Numerical experiments confirm the accuracy of the results, and performance benchmarks of the proposed approaches are presented. These approaches are expected to be even more efficient on next-generation architectures.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2409.13313
Accession Number: edsarx.2409.13313
Database: arXiv
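The description summarizes the Ozaki scheme only in words. The following is an illustrative sketch under stated assumptions, not the authors' code and not the implementation by Ootomo et al. It shows the general idea of splitting each float64 row of A and each column of B into s integer slices of beta bits (beta = 7, so every slice entry fits in INT8). It then forms the slice products exactly in integer arithmetic and adds the scaled products in float64. The function names (`split_rows`, `int_matmul_nt`, `ozaki_int_gemm`), the defaults beta = 7 and s = 8, and the choice to keep only slice pairs with t + u <= s + 1 are all hypothetical. Python integers stand in for an INT8 x INT8 -> INT32 matrix multiplication unit, and none of the reductions proposed in this paper are reproduced.

```python
import math
import random
from fractions import Fraction


def split_rows(M, beta, s):
    """Split each row of M into s integer slices (hypothetical helper).

    Returns (exps, slices) with
    M[i][j] ~= 2**exps[i] * sum_t slices[t][i][j] * 2**(-beta * (t + 1)).
    Every slice entry satisfies |d| <= 2**beta - 1, so beta = 7 fits in int8.
    """
    exps = []
    slices = [[[0] * len(row) for row in M] for _ in range(s)]
    for i, row in enumerate(M):
        amax = max(abs(x) for x in row)
        # frexp: amax = m * 2**e with 0.5 <= m < 1, so |x| / 2**e < 1 on this row
        e = math.frexp(amax)[1] if amax != 0.0 else 0
        exps.append(e)
        for j, x in enumerate(row):
            r = math.ldexp(x, -e)  # exact: scaling by a power of two
            for t in range(s):
                r = math.ldexp(r, beta)  # move the next beta bits above the binary point
                d = math.trunc(r)  # |r| < 2**beta, hence |d| <= 2**beta - 1
                slices[t][i][j] = d
                r -= d  # exact: keeps the fractional part, |r| < 1
    return exps, slices


def int_matmul_nt(D, E):
    """Exact integer product D @ E^T; stands in for an INT8 GEMM with INT32 accumulation."""
    return [[sum(a * b for a, b in zip(drow, erow)) for erow in E] for drow in D]


def ozaki_int_gemm(A, B, beta=7, s=8):
    """Approximate A @ B (lists of float64 rows) using only integer slice products."""
    Bt = [list(col) for col in zip(*B)]  # column slices of B are row slices of B^T
    ea, DA = split_rows(A, beta, s)
    eb, DB = split_rows(Bt, beta, s)
    m, n = len(A), len(Bt)
    C = [[0.0] * n for _ in range(m)]
    # Assumed truncation: keep pairs with (t + 1) + (u + 1) <= s + 1, i.e. u < s - t.
    for t in range(s):
        for u in range(s - t):
            P = int_matmul_nt(DA[t], DB[u])  # one lower-precision (integer) product
            for i in range(m):
                for j in range(n):
                    # one higher-precision (float64) accumulation of the scaled product
                    C[i][j] += math.ldexp(P[i][j], ea[i] + eb[j] - beta * (t + u + 2))
    return C
```

Each call to `int_matmul_nt` corresponds to one lower-precision matrix multiplication, and each pass of the `+=` loop to one higher-precision matrix addition. With s = 8, the assumed triangular truncation needs 36 integer products instead of 64. Reducing counts like these is the kind of saving the paper targets, although its specific techniques are not shown here.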