Report
InFoBench: Evaluating Instruction Following Ability in Large Language Models
| Field | Value |
|---|---|
| Title | InFoBench: Evaluating Instruction Following Ability in Large Language Models |
| Authors | Qin, Yiwei; Song, Kaiqiang; Hu, Yebowen; Yao, Wenlin; Cho, Sangwoo; Wang, Xiaoyang; Wu, Xuansheng; Liu, Fei; Liu, Pengfei; Yu, Dong |
| Publication Year | 2024 |
| Collection | Computer Science |
| Subject Terms | Computer Science - Computation and Language; Computer Science - Artificial Intelligence |
| Description | This paper introduces the Decomposed Requirements Following Ratio (DRFR), a new metric for evaluating Large Language Models' (LLMs) ability to follow instructions. Addressing a gap in current methodologies, DRFR breaks down complex instructions into simpler criteria, facilitating a detailed analysis of LLMs' compliance with various aspects of tasks. Alongside this metric, we present InFoBench, a benchmark comprising 500 diverse instructions and 2,250 decomposed questions across multiple constraint categories. Our experiments compare DRFR with traditional scoring methods and explore annotation sources, including human experts, crowd-sourced workers, and GPT-4. The findings demonstrate DRFR's higher reliability and the effectiveness of using GPT-4 as a cost-efficient annotator. The evaluation of several advanced LLMs using this framework reveals their strengths and areas needing improvement, particularly in complex instruction-following. This study contributes a novel metric and benchmark, offering insights for future LLM development and evaluation. |
| Document Type | Working Paper |
| Access URL | http://arxiv.org/abs/2401.03601 |
| Accession Number | edsarx.2401.03601 |
| Database | arXiv |
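The description states that DRFR decomposes each instruction into simpler yes/no criteria and reports the ratio of criteria satisfied. A minimal sketch of that aggregation, assuming boolean per-criterion judgments pooled across all instructions (the function name and pooling choice are illustrative, not taken from the paper):

```python
def drfr(judgments: list[list[bool]]) -> float:
    """Decomposed Requirements Following Ratio (sketch).

    `judgments[i]` holds one yes/no verdict per decomposed criterion
    of instruction i. DRFR is the fraction of all criteria judged
    satisfied, pooled over every instruction (assumed aggregation).
    """
    total = sum(len(criteria) for criteria in judgments)
    satisfied = sum(sum(criteria) for criteria in judgments)
    return satisfied / total if total else 0.0

# Two instructions: 3 criteria with 2 met, then 2 criteria both met
print(drfr([[True, True, False], [True, True]]))  # 4 of 5 criteria → 0.8
```

In the benchmark the verdicts would come from an annotator (human or GPT-4) answering each decomposed question; here they are supplied directly for illustration.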