Evaluating Large Language Models with fmeval

Bibliographic Details
Title:	Evaluating Large Language Models with fmeval
Authors:	Schwöbel, Pola; Franceschi, Luca; Zafar, Muhammad Bilal; Vasist, Keerthan; Malhotra, Aman; Shenhar, Tomer; Tailor, Pinal; Yilmaz, Pinar; Diamond, Michael; Donini, Michele
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computation and Language, Computer Science - Machine Learning
Description:	fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps practitioners evaluate their model for task performance and along multiple responsible AI dimensions. This paper presents the library and exposes its underlying design principles: simplicity, coverage, extensibility and performance. We then present how these were implemented in the scientific and engineering choices taken when developing fmeval. A case study demonstrates a typical use case for the library: picking a suitable model for a question answering task. We close by discussing limitations and further work in the development of the library. fmeval can be found at https://github.com/aws/fmeval.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2407.12872
Accession Number:	edsarx.2407.12872
Database:	arXiv
