A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation

Bibliographic details
Title: A Study of Test-time Contrastive Concepts for Open-world, Open-vocabulary Semantic Segmentation
Authors: Wysoczańska, Monika; Vobecky, Antonin; Cardiel, Amaia; Trzciński, Tomasz; Marlet, Renaud; Bursuc, Andrei; Siméoni, Oriane
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Computer Vision and Pattern Recognition
Description: Recent vision-language models (VLMs), pre-trained on large amounts of image-text pairs to align both modalities, have opened the way to open-vocabulary semantic segmentation. Given an arbitrary set of textual queries, image regions are assigned the closest query in feature space. However, the usual setup expects the user to list all possible visual concepts that may occur in the image, typically all classes of benchmark datasets, that act as negatives to each other. We consider here the more challenging scenario of segmenting a single concept, given a textual prompt and nothing else. To achieve good results, besides contrasting with the generic 'background' text, we study different ways to generate query-specific test-time contrastive textual concepts, which leverage either the distribution of text in the VLM's training set or crafted LLM prompts. We show the relevance of our approach using a new, specific metric.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2407.05061
Accession number: edsarx.2407.05061
Database: arXiv
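
Illustrative note: the assignment step mentioned in the description (each image region takes the closest textual query in feature space, and a single query is contrasted against 'background' and other concepts) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `segment_single_concept`, the use of cosine similarity, and the toy 2-D embeddings are all hypothetical, and how the contrastive concepts are generated is outside its scope.

```python
import numpy as np


def segment_single_concept(region_feats, query_emb, contrast_embs):
    """Return a boolean mask: True where the query is the closest text embedding.

    region_feats:  (N, D) array, one feature vector per image region.
    query_emb:     (D,) embedding of the single user prompt.
    contrast_embs: (K, D) embeddings of contrastive concepts, e.g. 'background'
                   plus query-specific concepts (assumed to be given).
    """
    # Stack the query first so that index 0 always means "the query wins".
    texts = np.vstack([query_emb[None, :], contrast_embs])  # (K + 1, D)

    # L2-normalise both sides so the dot product is cosine similarity
    # (an assumption of this sketch).
    regions = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    texts = texts / np.linalg.norm(texts, axis=1, keepdims=True)

    sims = regions @ texts.T  # (N, K + 1) similarity of each region to each text
    return sims.argmax(axis=1) == 0


# Toy 2-D embeddings standing in for real VLM features (made up for illustration).
query = np.array([1.0, 0.0])                    # e.g. the prompt "boat"
contrast = np.array([[0.0, 1.0], [0.7, 0.7]])   # e.g. "background", "water"
regions = np.array([[0.9, 0.1], [0.1, 0.9], [0.6, 0.6], [2.0, 0.5]])

mask = segment_single_concept(regions, query, contrast)
print(mask.tolist())
```

In this toy setup the third region is claimed by the second contrastive concept rather than by the query, which is the role the description attributes to query-specific contrastive concepts.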