Evaluating Large Language Models with fmeval

التفاصيل البيبلوغرافية
العنوان: Evaluating Large Language Models with fmeval
المؤلفون: Schwöbel, Pola, Franceschi, Luca, Zafar, Muhammad Bilal, Vasist, Keerthan, Malhotra, Aman, Shenhar, Tomer, Tailor, Pinal, Yilmaz, Pinar, Diamond, Michael, Donini, Michele
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Computation and Language, Computer Science - Machine Learning
الوصف: fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific and engineering choices taken when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2407.12872
رقم الأكسشن: edsarx.2407.12872
قاعدة البيانات: arXiv