Academic Journal

Clustering swap prediction for image-text pre-training

Bibliographic Details
Title: Clustering swap prediction for image-text pre-training
Authors: Sun Fayou, Hea Choon Ngo, Yong Wee Sek, Zuqiang Meng
Source: Scientific Reports, Vol 14, Iss 1, Pp 1-16 (2024)
Publication Information: Nature Portfolio, 2024.
Publication Year: 2024
Collection: LCC:Medicine
LCC:Science
Subject Terms: Model pre-training, Clustering learning, Swap prediction, Cluster number, Medicine, Science
Description: Investigating pre-training strategies for multimodal models is essential, because the strategy has a clear impact on downstream tasks. Clustering learning has recently delivered noteworthy benefits in a range of methods. However, because of the availability of open image-text pairs, applying clustering learning to multimodal models is challenging. In this paper, we propose an approach that uses a clustering swap prediction strategy to learn an image-text clustering embedding space through interaction prediction between image and text features. Unlike existing models with clustering learning, our method (Clus) allows an open number of clusters for web-scale alt-text data. Furthermore, to train the image and text encoders efficiently, we introduce a distillation learning approach and evaluate the performance of the image encoder on downstream visual tasks. In addition, Clus is pre-trained end-to-end on large-scale image-text pairs. Specifically, both text and image serve as ground truth for swap prediction, enabling effective representation learning. Extensive experiments demonstrate that Clus achieves state-of-the-art performance on multiple downstream fine-tuning and zero-shot tasks (i.e., Image-Text Retrieval, VQA, NLVR2, Image Captioning, Object Detection, and Semantic Segmentation).
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2045-2322
Relation: https://doaj.org/toc/2045-2322
DOI: 10.1038/s41598-024-60832-x
Access URL: https://doaj.org/article/a8ac5e46876c44c587def0f7e7f467fa
Accession Number: edsdoj.8ac5e46876c44c587def0f7e7f467fa
Database: Directory of Open Access Journals