Adaptive Multi-Modality Prompt Learning

Bibliographic Details
Title: Adaptive Multi-Modality Prompt Learning
Authors: Wu, Zongqian, Liu, Yujing, Zhan, Mengmeng, Shen, Jialie, Hu, Ping, Zhu, Xiaofeng
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Computer Vision and Pattern Recognition
Description: Although current prompt learning methods have been successfully designed to reuse large pre-trained models without fine-tuning their many parameters, they still have limitations: they ignore the adverse impact of meaningless patches in each image, and they do not consider in-sample and out-of-sample generalization simultaneously. In this paper, we propose adaptive multi-modality prompt learning to address these issues. To do this, we employ existing text prompt learning and propose a new image prompt learning. The image prompt learning achieves in-sample and out-of-sample generalization by first masking meaningless patches and then padding them with learnable parameters and information from the texts. Moreover, the text and image prompts provide auxiliary information to each other, further strengthening both kinds of generalization. Experimental results on real datasets demonstrate that our method outperforms state-of-the-art (SOTA) methods on different downstream tasks.
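The masking-and-padding step in the description can be illustrated with a toy sketch. The abstract does not say how a patch is judged meaningless or how text information is combined with the learnable parameters, so everything below is an assumption rather than the authors' method: the function `mask_and_pad`, scoring patches by cosine similarity to a text embedding, the `mask_ratio` and `alpha` parameters, and the fixed `learnable` vector (which stands in for parameters that would be trained in practice) are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mask_and_pad(
    patches: List[Vector],
    text_emb: Vector,
    learnable: Vector,
    mask_ratio: float = 0.25,
    alpha: float = 0.5,
) -> Tuple[List[Vector], List[int]]:
    """Hypothetical sketch: mask the least text-relevant patches, then pad them.

    Assumptions (not from the paper): "meaningless" means lowest cosine
    similarity to the text embedding, and padding is a fixed mix of a
    learnable prompt vector and the text embedding weighted by `alpha`.
    """
    n_mask = int(len(patches) * mask_ratio)
    # Rank patch indices from least to most similar to the text embedding.
    order = sorted(range(len(patches)), key=lambda i: cosine(patches[i], text_emb))
    masked = set(order[:n_mask])
    # Padding vector: weighted sum of learnable prompt and text information.
    pad = [alpha * l + (1 - alpha) * t for l, t in zip(learnable, text_emb)]
    out = [list(pad) if i in masked else list(p) for i, p in enumerate(patches)]
    return out, sorted(masked)


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
    out, idx = mask_and_pad(patches, text_emb=[1.0, 0.0], learnable=[0.0, 2.0], mask_ratio=0.5)
    print(idx)  # [1, 3]
    print(out)  # [[1.0, 0.0], [0.5, 1.0], [0.7, 0.7], [0.5, 1.0]]
```

In this toy run, the two patches least aligned with the text embedding (indices 1 and 3) are replaced by the same padded vector, while the remaining patches pass through unchanged. A real implementation would learn `learnable` by gradient descent inside a vision-language model rather than fixing it by hand.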
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2312.00823
Accession Number: edsarx.2312.00823
Database: arXiv