2AFC Prompting of Large Multimodal Models for Image Quality Assessment

Bibliographic details
Title: 2AFC Prompting of Large Multimodal Models for Image Quality Assessment
Authors: Zhu, Hanwei; Sui, Xiangjie; Chen, Baoliang; Liu, Xuelin; Chen, Peilin; Fang, Yuming; Wang, Shiqi
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
Description: While abundant research has been conducted on improving the high-level visual understanding and reasoning capabilities of large multimodal models (LMMs), their image quality assessment (IQA) ability has been relatively under-explored. Here we take initial steps towards closing this gap by employing two-alternative forced choice (2AFC) prompting, as 2AFC is widely regarded as the most reliable way of collecting human opinions of visual quality. Subsequently, the global quality score of each image estimated by a particular LMM can be efficiently aggregated using maximum a posteriori (MAP) estimation. Meanwhile, we introduce three evaluation criteria (consistency, accuracy, and correlation) to provide comprehensive quantifications and deeper insights into the IQA capability of five LMMs. Extensive experiments show that existing LMMs exhibit remarkable IQA ability on coarse-grained quality comparison, but there is room for improvement on fine-grained quality discrimination. The proposed dataset sheds light on the future development of IQA models based on LMMs. The code will be made publicly available at https://github.com/h4nwei/2AFC-LMMs.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2402.01162
Accession number: edsarx.2402.01162
Database: arXiv
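
Illustrative note: the description above mentions aggregating 2AFC outcomes into a global quality score per image via MAP estimation. This record does not state which preference model or prior the authors use, so the following is only a minimal sketch under stated assumptions: a Bradley-Terry preference model, a zero-mean Gaussian prior on the scores, and plain gradient ascent. The function name `map_quality_scores` and its parameters are hypothetical and are not taken from the paper or its repository.

```python
import math


def map_quality_scores(wins, prior_precision=1.0, lr=0.05, steps=2000):
    """Estimate one quality score per image from pairwise 2AFC counts.

    wins[i][j] is the number of times image i was preferred over image j.
    Assumed model (not confirmed by the source): Bradley-Terry likelihood,
    P(i preferred over j) = sigmoid(s_i - s_j), with a Gaussian prior
    N(0, 1 / prior_precision) on every score s_i.
    """
    n = len(wins)
    scores = [0.0] * n
    for _ in range(steps):
        # Gradient of the log-prior: -prior_precision * s_i.
        grad = [-prior_precision * s for s in scores]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                # Model probability that image i beats image j.
                p_ij = 1.0 / (1.0 + math.exp(scores[j] - scores[i]))
                # Gradient of the log-likelihood with respect to s_i.
                grad[i] += wins[i][j] * (1.0 - p_ij) - wins[j][i] * p_ij
        # Gradient ascent step on the (concave) log-posterior.
        scores = [s + lr * g for s, g in zip(scores, grad)]
    return scores


# Toy example: three images, ten comparisons per pair.
toy_wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
toy_scores = map_quality_scores(toy_wins)
ranking = sorted(range(len(toy_scores)), key=lambda k: -toy_scores[k])
print(ranking)
```

In this toy input, image 0 wins most of its comparisons and image 2 loses most, so the estimated scores rank the images in that order. The Gaussian prior keeps the scores finite even when one image wins every comparison, which is one common reason to prefer MAP over plain maximum likelihood for pairwise data.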