Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks

Bibliographic Details
Title:	Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks
Authors:	Motetti, Beatrice Alessandra; Risso, Matteo; Burrello, Alessio; Macii, Enrico; Poncino, Massimo; Pagliari, Daniele Jahier
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Machine Learning
Description:	The resource requirements of deep neural networks (DNNs) pose significant challenges to their deployment on edge devices. Common approaches to address this issue are pruning and mixed-precision quantization, which lead to latency and memory occupation improvements. These optimization techniques are usually applied independently. We propose a novel methodology to apply them jointly via a lightweight gradient-based search, and in a hardware-aware manner, greatly reducing the time required to generate Pareto-optimal DNNs in terms of accuracy versus cost (i.e., latency or memory). We test our approach on three edge-relevant benchmarks, namely CIFAR-10, Google Speech Commands, and Tiny ImageNet. When targeting the optimization of the memory footprint, we are able to achieve a size reduction of 47.50% and 69.54% at iso-accuracy with the baseline networks with all weights quantized at 8-bit and 2-bit, respectively. Our method surpasses a previous state-of-the-art approach with up to 56.17% size reduction at iso-accuracy. With respect to the sequential application of state-of-the-art pruning and mixed-precision optimizations, we obtain comparable or superior results, but with a significantly lowered training time. In addition, we show how well-tailored cost models can improve the cost versus accuracy trade-offs when targeting specific hardware for deployment.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2407.01054
Accession Number:	edsarx.2407.01054
Database:	arXiv
