Academic Journal

SANA: Sensitivity-Aware Neural Architecture Adaptation for Uniform Quantization

Bibliographic Details
Title: SANA: Sensitivity-Aware Neural Architecture Adaptation for Uniform Quantization
Authors: Mingfei Guo, Zhen Dong, Kurt Keutzer
Source: Applied Sciences, Vol 13, Iss 18, p 10329 (2023)
Publication Information: MDPI AG, 2023.
Publication Year: 2023
Collection: LCC:Technology
LCC:Engineering (General). Civil engineering (General)
LCC:Biology (General)
LCC:Physics
LCC:Chemistry
Subject Terms: neural architecture adaptation, automated machine learning, uniform quantization, model compression, efficient deep learning, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Description: Uniform quantization is widely used as an efficient compression method in practical applications. Despite its merit of low computational overhead, uniform quantization fails to preserve sensitive components in neural networks when applied at ultra-low bit precision, which can lead to non-trivial accuracy degradation. Previous works have applied mixed-precision quantization to address this problem. However, finding the correct bit settings for different layers demands significant time and resources. Moreover, mixed-precision quantization is not well supported on current general-purpose hardware such as GPUs and CPUs and thus incurs intolerable overheads in deployment. To leverage the efficiency of uniform quantization while maintaining accuracy, this paper proposes sensitivity-aware network adaptation (SANA), which automatically modifies the model architecture based on sensitivity analysis to make it more compatible with uniform quantization. Furthermore, the authors formulate four different channel initialization strategies to accelerate the quantization-aware fine-tuning process of SANA. Experimental results show that SANA can outperform standard uniform quantization and other state-of-the-art quantization methods in terms of accuracy, with comparable or even smaller memory consumption. Notably, ResNet-50-SANA (24.4 MB) with W4A8 quantization achieved 77.8% top-1 accuracy on ImageNet, surpassing the 77.6% of the full-precision ResNet-50 (97.8 MB) baseline.
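For readers unfamiliar with the baseline technique the abstract builds on, the following is a minimal sketch of symmetric per-tensor uniform quantization, not the paper's actual code: every weight is mapped to a signed b-bit integer through a single scale factor, which is what makes the scheme cheap on general-purpose hardware. The function names and the example weights are illustrative assumptions.

```python
# Illustrative sketch of symmetric uniform quantization (not SANA itself).
# All weights share one scale, so compute stays in low-bit integer arithmetic;
# SANA adapts the architecture so this simple scheme loses less accuracy
# at ultra-low bit widths such as the W4A8 setting mentioned in the abstract.

def uniform_quantize(weights, bits):
    """Quantize a list of floats to signed `bits`-bit integers with one scale."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.31, -0.12, 0.05, -0.44, 0.27]   # hypothetical layer weights
q, scale = uniform_quantize(weights, bits=4)  # W4, as in the paper's W4A8 result
approx = dequantize(q, scale)
```

The round-trip error of each weight is bounded by half the scale, which grows as the bit width shrinks; that widening error on sensitive layers is the accuracy problem the paper targets.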
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2076-3417
Relation: https://www.mdpi.com/2076-3417/13/18/10329; https://doaj.org/toc/2076-3417
DOI: 10.3390/app131810329
Access URL: https://doaj.org/article/97dbaddb9d674e768fa1304715147cc3
Accession Number: edsdoj.97dbaddb9d674e768fa1304715147cc3
Database: Directory of Open Access Journals