Complexity-Aware Layer-Wise Mixed-Precision Schemes With SQNR-Based Fast Analysis

Bibliographic Details
Title:	Complexity-Aware Layer-Wise Mixed-Precision Schemes With SQNR-Based Fast Analysis
Authors:	Hana Kim, Hyun Eun, Jung Hwan Choi, Ji-Hoon Kim
Source:	IEEE Access, Vol 11, Pp 117800-117809 (2023)
Publication Information:	IEEE, 2023.
Publication Year:	2023
Collection:	LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms:	Deep neural network (DNN), mixed-precision, signal to quantization noise ratio (SQNR), complexity-awareness, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description:	Deep neural network (DNN) acceleration has become critical for hardware systems ranging from mobile/edge devices to high-performance data centers. For on-device AI in particular, many studies have examined reducing hardware numerical precision to fit the limited hardware resources of mobile/edge devices. Although layer-wise mixed precision reduces computational complexity, finding a well-balanced layer-wise precision scheme is not straightforward: determining the optimal precision for each layer takes a long time because of the repeated experiments involved, and model accuracy, the fundamental measure of deep learning quality, must be considered as well. In this paper, we propose a layer-wise mixed-precision scheme that uses Signal-to-Quantization-Noise Ratio (SQNR)-based analysis to significantly reduce the time required to determine the optimal hardware numerical precision. In addition, the proposed scheme can take hardware complexity into account in terms of the number of operations (OPs) or the weight memory requirement of each layer. The proposed method can be applied directly to inference, so users can use well-trained neural network models without additional training or hardware units. With the proposed SQNR-based analysis, the analysis time required for layer-wise precision determination for the SSDlite and YOLOv2 networks is reduced by more than 95% compared to conventional mean Average Precision (mAP)-based analysis. With the proposed complexity-aware schemes, the number of OPs and the weight memory requirement can be reduced by up to 86.14% and 78.03%, respectively, for SSDlite, and by up to 51.93% and 50.62%, respectively, for YOLOv2, with negligible degradation in model accuracy.
Document Type:	article
File Description:	electronic resource
Language:	English
ISSN:	2169-3536
Relation:	https://ieeexplore.ieee.org/document/10287357/; https://doaj.org/toc/2169-3536
DOI:	10.1109/ACCESS.2023.3325402
Access URL:	https://doaj.org/article/5db157cf06524a1d9850588ea7dd0c1c
Accession Number:	edsdoj.5db157cf06524a1d9850588ea7dd0c1c
Database:	Directory of Open Access Journals
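
Illustrative note: the description above refers to choosing a per-layer precision from an SQNR measurement instead of repeated accuracy (mAP) runs. The record does not give the authors' procedure, so the following is only a minimal sketch of the general idea under stated assumptions: a symmetric uniform quantizer, a fixed SQNR target in dB, and a small set of candidate bit-widths. The names quantize_uniform, sqnr_db, and pick_bits, the 30 dB target, and the random stand-in weights are all hypothetical. The complexity-aware part (weighting layers by OPs or weight memory) is not modeled here.

```python
import math
import random


def quantize_uniform(values, bits):
    """Symmetric uniform quantization to a signed `bits`-bit grid (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    # Round to the nearest grid point, clamp to the representable range, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def sqnr_db(values, quantized):
    """SQNR in dB: 10 * log10(signal power / quantization noise power)."""
    signal = sum(v * v for v in values)
    noise = sum((v - q) ** 2 for v, q in zip(values, quantized))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def pick_bits(values, target_db, candidates=(4, 6, 8, 12, 16)):
    """Return the smallest candidate bit-width whose SQNR meets target_db.

    Falls back to the largest candidate if none meets the target.
    """
    for bits in sorted(candidates):
        if sqnr_db(values, quantize_uniform(values, bits)) >= target_db:
            return bits
    return max(candidates)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical stand-ins for the weights of two layers.
    layers = {
        "layer_a": [random.gauss(0.0, 0.10) for _ in range(1000)],
        "layer_b": [random.gauss(0.0, 0.02) for _ in range(5000)],
    }
    scheme = {name: pick_bits(w, target_db=30.0) for name, w in layers.items()}
    print(scheme)
```

Each layer here needs only one quantize-and-compare pass per candidate bit-width, with no dataset evaluation, which is the kind of saving the description attributes to SQNR-based analysis; whether a given SQNR target preserves mAP would still have to be checked on the real model.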
