PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference

التفاصيل البيبلوغرافية
العنوان: PRANCE: Joint Token-Optimization and Structural Channel-Pruning for Adaptive ViT Inference
المؤلفون: Li, Ye, Tang, Chen, Meng, Yuan, Fan, Jiajun, Chai, Zenghao, Ma, Xinzhu, Wang, Zhi, Zhu, Wenwu
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Computer Vision and Pattern Recognition
الوصف: We introduce PRANCE, a Vision Transformer compression framework that jointly optimizes the activated channels and reduces tokens, based on the characteristics of inputs. Specifically, PRANCE~ leverages adaptive token optimization strategies for a certain computational budget, aiming to accelerate ViTs' inference from a unified data and architectural perspective. However, the joint framework poses challenges to both architectural and decision-making aspects. Firstly, while ViTs inherently support variable-token inference, they do not facilitate dynamic computations for variable channels. To overcome this limitation, we propose a meta-network using weight-sharing techniques to support arbitrary channels of the Multi-head Self-Attention and Multi-layer Perceptron layers, serving as a foundational model for architectural decision-making. Second, simultaneously optimizing the structure of the meta-network and input data constitutes a combinatorial optimization problem with an extremely large decision space, reaching up to around $10^{14}$, making supervised learning infeasible. To this end, we design a lightweight selector employing Proximal Policy Optimization for efficient decision-making. Furthermore, we introduce a novel "Result-to-Go" training mechanism that models ViTs' inference process as a Markov decision process, significantly reducing action space and mitigating delayed-reward issues during training. Extensive experiments demonstrate the effectiveness of PRANCE~ in reducing FLOPs by approximately 50\%, retaining only about 10\% of tokens while achieving lossless Top-1 accuracy. Additionally, our framework is shown to be compatible with various token optimization techniques such as pruning, merging, and sequential pruning-merging strategies. The code is available at \href{https://github.com/ChildTang/PRANCE}{https://github.com/ChildTang/PRANCE}.
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2407.05010
رقم الأكسشن: edsarx.2407.05010
قاعدة البيانات: arXiv