Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition

Bibliographic Details
Title: Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Authors: Ren, Shuhuai; Zhang, Aston; Zhu, Yi; Zhang, Shuai; Zheng, Shuai; Li, Mu; Smola, Alex; Sun, Xu
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence; Computer Science - Computation and Language
Description: This work proposes POMP, a prompt pre-training method for vision-language models. Being memory- and computation-efficient, POMP enables the learned prompt to condense semantic information for a rich set of visual concepts spanning over twenty thousand classes. Once pre-trained, the strongly transferable prompt can be directly plugged into a variety of visual recognition tasks, including image classification, semantic segmentation, and object detection, to boost recognition performance in a zero-shot manner. Empirical evaluation shows that POMP achieves state-of-the-art performance on 21 datasets, e.g., 67.0% average accuracy on 10 classification datasets (+3.1% compared to CoOp) and 84.4 hIoU on open-vocabulary Pascal VOC segmentation (+6.9 compared to ZSSeg). Our code is available at https://github.com/amazon-science/prompt-pretraining. (A minimal illustrative sketch of this idea appears after this record.)
Comment: Code is available at https://github.com/amazon-science/prompt-pretraining
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2304.04704
Accession Number: edsarx.2304.04704
Database: arXiv
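
For intuition about the idea the abstract describes, here is a minimal PyTorch sketch: a single learnable soft prompt shared across all classes is combined with frozen class-name embeddings, and the resulting text features are matched against image features CLIP-style. This is an illustrative sketch under stated assumptions, not the authors' implementation (see the repository above); the class name `SoftPromptClassifier`, the toy frozen encoder, the mean-pooling, and the dimensions are placeholders, and the paper's memory-saving training machinery for scaling to ~20k classes is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPromptClassifier(nn.Module):
    """Toy stand-in for prompt-based zero-shot recognition (not POMP itself)."""

    def __init__(self, num_classes: int, prompt_len: int = 16, embed_dim: int = 512):
        super().__init__()
        # One soft prompt shared by every class -- the object that
        # prompt pre-training would learn over a large class vocabulary.
        self.prompt = nn.Parameter(0.02 * torch.randn(prompt_len, embed_dim))
        # Frozen stand-in for class-name token embeddings (one row per class);
        # a real system would take these from CLIP's token embedding table.
        self.class_embeds = nn.Parameter(torch.randn(num_classes, embed_dim),
                                         requires_grad=False)
        # Frozen stand-in for a text encoder (a real system would use CLIP's).
        self.text_encoder = nn.Linear(embed_dim, embed_dim)
        for p in self.text_encoder.parameters():
            p.requires_grad = False
        self.logit_scale = nn.Parameter(torch.tensor(4.6))  # roughly CLIP's init

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        n = self.class_embeds.size(0)
        # Prepend the shared prompt to each class embedding:
        # shape [n, prompt_len + 1, embed_dim]
        seqs = torch.cat([self.prompt.unsqueeze(0).expand(n, -1, -1),
                          self.class_embeds.unsqueeze(1)], dim=1)
        # Crude mean pooling in place of a transformer text encoder.
        text_features = F.normalize(self.text_encoder(seqs.mean(dim=1)), dim=-1)
        image_features = F.normalize(image_features, dim=-1)
        # Cosine-similarity logits over all classes; swapping in a new
        # class_embeds table gives zero-shot transfer to unseen label sets.
        return self.logit_scale.exp() * image_features @ text_features.t()

# Toy usage: 20,000 candidate classes, a batch of 4 image features.
model = SoftPromptClassifier(num_classes=20_000)
logits = model(torch.randn(4, 512))
print(logits.shape)  # torch.Size([4, 20000])
```

During training, a cross-entropy loss over these logits would update only `self.prompt`; the abstract's memory and computation claims refer to techniques described in the paper (not reproduced here) for keeping that contrast over twenty thousand classes tractable.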