Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Bibliographic Details
Title:	Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Authors:	Chandu, Khyathi Raghavi; Li, Linjie; Awadalla, Anas; Lu, Ximing; Park, Jae Sung; Hessel, Jack; Wang, Lijuan; Choi, Yejin
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition
Description:	The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to make previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, which is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics. Comment: 26 pages
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2407.01942
Accession Number:	edsarx.2407.01942
Database:	arXiv
