BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization

Bibliographic Details
Title: BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization
Authors: Qi Han, Li Jiang, Yun Liang, Tengchuan Kou, Qingzheng Li, Zhezhi He, Haibao Yu, Fangxin Liu, Tao Yang
Source: ACM Transactions on Reconfigurable Technology and Systems, 14:1-28
Publication Information: Association for Computing Machinery (ACM), 2021.
Publication Year: 2021
Subject Terms: General Computer Science, Computer science, Inference, Pruning (decision trees), Mixed precision, Field-programmable gate array, Quantization (image processing), Convolutional neural network, Algorithm, Convolution
Description: The Field-Programmable Gate Array (FPGA) is a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this produces irregular sparse patterns, leading to low parallelism and reduced resource utilization. Moreover, few works discuss a suitable quantization scheme for Winograd-based CNNs. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-Row-Balanced Sparsity (SRBS) pattern, to overcome the challenge of irregular sparse patterns. We then develop a two-step hardware co-optimization approach to improve the model accuracy with the SRBS pattern. Based on the pruned model, we implement mixed precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and the mixed precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup and 12.74×/9.19× and 8.75×/8.81×/11.1× improvement in energy efficiency, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves a 4.11× speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.
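To make the SRBS idea in the abstract concrete, the following minimal NumPy sketch prunes a Winograd-domain weight tile so that every fixed-length sub-row keeps the same number of largest-magnitude weights. This is an illustration based only on the abstract's description of the pattern: the function name srbs_prune, the sub-row length, and the keep count are illustrative assumptions, not parameters taken from the paper.

```python
import numpy as np

def srbs_prune(w, sub_row_len, keep):
    """Zero all but the `keep` largest-magnitude entries in every
    length-`sub_row_len` sub-row of each row of `w`, so all sub-rows
    carry an identical nonzero count (a regular sparsity pattern)."""
    rows, cols = w.shape
    assert cols % sub_row_len == 0, "columns must split evenly into sub-rows"
    blocks = w.reshape(rows, cols // sub_row_len, sub_row_len)
    # Indices within each sub-row, sorted by descending magnitude.
    order = np.argsort(-np.abs(blocks), axis=-1)
    mask = np.zeros(blocks.shape, dtype=bool)
    np.put_along_axis(mask, order[..., :keep], True, axis=-1)
    return (blocks * mask).reshape(rows, cols)

# A 6x6 tile (the Winograd-domain size for F(4x4, 3x3) with 3x3 kernels),
# pruned to 1 nonzero per length-3 sub-row, i.e. about 67% sparsity.
rng = np.random.default_rng(0)
tile = rng.standard_normal((6, 6)).astype(np.float32)
pruned = srbs_prune(tile, sub_row_len=3, keep=1)
print((pruned.reshape(6, 2, 3) != 0).sum(axis=-1))  # each sub-row -> 1
```

Because every sub-row ends up with the same nonzero count, processing elements can be scheduled statically without load imbalance, which is the regularity advantage the abstract claims over irregular Winograd-domain pruning.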
ISSN: 1936-7414; 1936-7406
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::3ad8d9162f43f4ba7b8c67392f710598
https://doi.org/10.1145/3467476
Accession Number: edsair.doi...........3ad8d9162f43f4ba7b8c67392f710598
Database: OpenAIRE