A split execution model for SpTRSV

Bibliographic details
Title: A split execution model for SpTRSV
Authors: Najeeb Ahmad, Buse Yilmaz, Didem Unat
Contributors: İstinye Üniversitesi, Mühendislik ve Doğa Bilimleri Fakültesi, Yazılım Mühendisliği Bölümü, Yilmaz, Buse
Publication information: IEEE Computer Society, 2021.
Publication year: 2021
Subject terms: Computer science, Parallel algorithm, 02 engineering and technology, Parallel computing, Fats, Matrix (mathematics), Parallel Algorithms, 0202 electrical engineering, electronic engineering, information engineering, Execution model, Sparse matrix, 020203 distributed computing, Xeon, Computational Modeling, Sparse Triangular Solve, Directed acyclic graph, Heterogeneous Computing, Phased Arrays, SpTRSV, Sparse Matrices, Kernel, Computational Theory and Mathematics, Kernel (image processing), CPU-GPU Computing, Hardware and Architecture, Signal Processing, Parallelism (grammar), Graphics Processing Units, Sparse Linear Systems, SpTS
Description: Sparse Triangular Solve (SpTRSV) is an important and extensively used kernel in scientific computing. Parallelism within SpTRSV depends upon the matrix sparsity pattern and, in many cases, is non-uniform from one computational step to the next. In cases where the SpTRSV computational steps have contrasting parallelism characteristics (some steps are more parallel, others more sequential in nature), the performance of an SpTRSV algorithm may be limited by this contrast. In this work, we propose a split-execution model for SpTRSV that automatically divides the SpTRSV computation into two sub-SpTRSV systems and an SpMV, such that one of the sub-SpTRSVs has more parallelism than the other. Each sub-SpTRSV is then computed using a different SpTRSV algorithm and possibly executes on a different platform (CPU or GPU). By analyzing the SpTRSV Directed Acyclic Graph (DAG) and matrix sparsity features, we use a heuristics-based approach to (i) automatically determine the suitability of an SpTRSV for split-execution, (ii) find the appropriate split-point, and (iii) execute SpTRSV in a split fashion using two SpTRSV algorithms while managing any required inter-platform communication. Experimental evaluation of the execution model on two CPU-GPU machines with a dataset of 327 matrices from the SuiteSparse Matrix Collection shows that our approach correctly selects the fastest SpTRSV method (split or unsplit) for 88% of matrices on the Intel Xeon Gold (6148) + NVIDIA Tesla V100 platform and 83% on the Intel Core i7 + NVIDIA G1080 Ti platform, achieving speedups in the range of 1.01 to 10 and 1.03 to 6.36, respectively. IEEE WOS:000655244100005 Q2
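The split described in the abstract can be illustrated with a minimal sketch. It relies only on the standard block identity for a lower-triangular system: partitioning L at row k into L11, L21 and L22 gives L11 x1 = b1, then b2' = b2 - L21 x1 (the SpMV), then L22 x2 = b2'. The function names (`forward_solve`, `split_solve`, `level_sizes`), the list-of-rows storage and the fixed split point `k` are assumptions made for illustration; the paper's heuristics for choosing the split point, its two SpTRSV algorithms and its CPU-GPU communication are not reproduced here.

```python
def forward_solve(rows, diag, b):
    """Solve L x = b for a lower-triangular L by forward substitution.

    rows[i] lists the off-diagonal entries of row i as (column, value)
    pairs with column < i; diag[i] is the diagonal entry L[i][i].
    """
    x = [0.0] * len(b)
    for i in range(len(b)):
        s = sum(v * x[j] for j, v in rows[i])
        x[i] = (b[i] - s) / diag[i]
    return x


def split_solve(rows, diag, b, k):
    """Solve L x = b as two sub-solves plus one SpMV, split at row k.

    k is a hypothetical, caller-chosen split point (0 <= k <= len(b)).
    """
    n = len(b)
    # Sub-SpTRSV 1: the leading k-by-k block L11.
    x1 = forward_solve(rows[:k], diag[:k], b[:k])
    # SpMV: update the remaining right-hand side, b2' = b2 - L21 x1.
    b2 = [b[i] - sum(v * x1[j] for j, v in rows[i] if j < k)
          for i in range(k, n)]
    # Sub-SpTRSV 2: the trailing block L22, with columns re-indexed from 0.
    rows22 = [[(j - k, v) for j, v in rows[i] if j >= k]
              for i in range(k, n)]
    x2 = forward_solve(rows22, diag[k:], b2)
    return x1 + x2


def level_sizes(rows):
    """Count rows per level of the dependency DAG (level-set analysis).

    A row's level is one more than the deepest row it depends on; rows in
    the same level are independent and could be solved in parallel.
    """
    level = []
    for r in rows:
        level.append(1 + max((level[j] for j, _ in r), default=-1))
    sizes = [0] * (max(level) + 1 if level else 0)
    for lv in level:
        sizes[lv] += 1
    return sizes
```

In this sketch, a matrix whose `level_sizes` output starts with large counts and ends in a long tail of ones would be the kind of case the abstract describes, with a parallel part followed by a mostly sequential part. The two `forward_solve` calls stand in for the two different SpTRSV algorithms, which in the paper may run on different platforms.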
File description: application/pdf
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e125babbda67afc539193993b330f64c
https://hdl.handle.net/20.500.12713/1730
Rights: CLOSED
Accession number: edsair.doi.dedup.....e125babbda67afc539193993b330f64c
Database: OpenAIRE