GPU-Accelerated Hybrid Discontinuous Galerkin Time Domain Algorithm With Universal Matrices and Local Time Stepping Method

Bibliographic Details
Title: GPU-Accelerated Hybrid Discontinuous Galerkin Time Domain Algorithm With Universal Matrices and Local Time Stepping Method
Authors: Qi Yang, Shi Chen Zhu, Long Li, Yan Shi, Zhen Guo Ban, Peng Wang
Source: IEEE Transactions on Antennas and Propagation. 68:4738-4752
Publisher Information: Institute of Electrical and Electronics Engineers (IEEE), 2020.
Publication Year: 2020
Subject Terms: Speedup; Basis (linear algebra); Discretization; Computer science; 020206 networking & telecommunications; 02 engineering and technology; Wave equation; Acceleration; CUDA; Matrix (mathematics); Helmholtz free energy; 0202 electrical engineering, electronic engineering, information engineering; Tetrahedron; Electrical and Electronic Engineering; Algorithm; Interpolation; Block (data storage)
Description: In this article, a graphics processing unit (GPU)-accelerated implementation of a hybrid discontinuous Galerkin time-domain (HDGTD) method based on Maxwell’s equations and the Helmholtz vector wave equation is developed. The computational domain is discretized into tetrahedra, and the resulting mesh is divided into two regions, one solved with Maxwell’s equations and the other with the Helmholtz vector wave equation. Hierarchical vector basis functions are used to expand the unknowns in the HDGTD method, and a universal matrix technique is proposed that decomposes the geometry-dependent matrices of each tetrahedron into a sum of universal matrices defined in barycentric coordinates, which greatly reduces memory usage. A local time stepping (LTS) method based on a simple interpolation is introduced into the proposed HDGTD method to solve multiscale problems efficiently. Two compute unified device architecture (CUDA)-based mapping techniques, using 1-D and 2-D blocks, are implemented to trade off parallel speedup against memory usage. With the 1-D block mapping, a speedup of over 590 times is achieved; with the 2-D block mapping, a speedup of over 150 times and a 13-fold memory reduction are obtained. Several complex practical examples demonstrate the good performance of the proposed parallel method.
ISSN: 1558-2221
0018-926X
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::7282b21fae7489c0b6da5fae8b2c991b
https://doi.org/10.1109/tap.2020.2972404
Rights: CLOSED
Accession Number: edsair.doi...........7282b21fae7489c0b6da5fae8b2c991b
Database: OpenAIRE
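
Illustrative note: the description mentions decomposing each tetrahedron's geometry-dependent matrices into a sum of universal matrices defined in barycentric coordinates. The sketch below shows one way that idea can work, and it is not the authors' implementation. It assumes lowest-order (Whitney) edge basis functions in place of the hierarchical vector basis used in the article, and every name in it (EDGES, LAMBDA_MASS, build_universal_matrices, barycentric_gradients, edge_mass_matrix) is hypothetical. Under these assumptions the element mass matrix equals the volume times a sum of mesh-independent 6x6 matrices, each weighted by a dot product of barycentric gradients, so only the volume and those dot products need to be stored per element.

```python
# Assumed basis (not from the article): Whitney edge functions
# N_e = lam_a * grad(lam_b) - lam_b * grad(lam_a), one per edge (a, b).
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Exact identity on any tetrahedron of volume V:
# integral of lam_i * lam_j = V * (1 + delta_ij) / 20. Stored here per unit volume.
LAMBDA_MASS = [[(2.0 if i == j else 1.0) / 20.0 for j in range(4)] for i in range(4)]


def build_universal_matrices():
    """Return U[k][l]: 6x6 matrices that do not depend on the element geometry."""
    U = [[[[0.0] * 6 for _ in range(6)] for _ in range(4)] for _ in range(4)]
    for e, (a, b) in enumerate(EDGES):
        for f, (c, d) in enumerate(EDGES):
            # Expand N_e . N_f and collect the factor of grad(lam_k) . grad(lam_l).
            U[b][d][e][f] += LAMBDA_MASS[a][c]
            U[b][c][e][f] -= LAMBDA_MASS[a][d]
            U[a][d][e][f] -= LAMBDA_MASS[b][c]
            U[a][c][e][f] += LAMBDA_MASS[b][d]
    return U


UNIVERSAL = build_universal_matrices()  # built once, shared by all elements


def sub(p, q):
    return [p[i] - q[i] for i in range(3)]


def dot(p, q):
    return sum(p[i] * q[i] for i in range(3))


def cross(p, q):
    return [p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0]]


def barycentric_gradients(vertices):
    """Return (volume, [grad(lam_0), ..., grad(lam_3)]) for a non-degenerate tetrahedron."""
    grads, volume = [], 0.0
    for i in range(4):
        j, k, l = [m for m in range(4) if m != i]
        # lam_i vanishes on the face (j, k, l), so its gradient is parallel to the face normal.
        normal = cross(sub(vertices[k], vertices[j]), sub(vertices[l], vertices[j]))
        height = dot(normal, sub(vertices[i], vertices[j]))  # equals +/- 6 * volume
        grads.append([n / height for n in normal])
        volume = abs(height) / 6.0
    return volume, grads


def edge_mass_matrix(vertices):
    """Assemble M = V * sum_{k,l} (grad(lam_k) . grad(lam_l)) * U[k][l]."""
    volume, grads = barycentric_gradients(vertices)
    g = [[dot(grads[k], grads[l]) for l in range(4)] for k in range(4)]
    return [[volume * sum(g[k][l] * UNIVERSAL[k][l][e][f]
                          for k in range(4) for l in range(4))
             for f in range(6)] for e in range(6)]
```

In this sketch the per-element data reduce to the volume and the symmetric 4x4 table of gradient dot products (10 independent values), whatever the size of the element matrix. For the lowest-order basis used here the saving over storing the 6x6 matrix directly is small; the benefit would be expected to grow with higher-order bases, where the element matrices are much larger while the geometric factors stay the same size.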