Report
CLUE: A Clinical Language Understanding Evaluation for LLMs
Title: CLUE: A Clinical Language Understanding Evaluation for LLMs
Authors: Dada, Amin; Bauer, Marie; Contreras, Amanda Butler; Koraş, Osman Alperen; Seibold, Constantin Marc; Smith, Kaleb E; Kleesiek, Jens
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning
Description: Large Language Models (LLMs) are expected to significantly contribute to patient care, diagnostics, and administrative processes. Emerging biomedical LLMs aim to address healthcare-specific challenges, including privacy demands and computational constraints. Assessing the models' suitability for this sensitive application area is of the utmost importance. However, evaluation has primarily been limited to non-clinical tasks, which do not reflect the complexity of practical clinical applications. To fill this gap, we present the Clinical Language Understanding Evaluation (CLUE), a benchmark tailored to evaluate LLMs on clinical tasks. CLUE includes six tasks to test the practical applicability of LLMs in complex healthcare settings. Our evaluation includes a total of 25 LLMs. In contrast to previous evaluations, CLUE shows a decrease in performance for nine out of twelve biomedical models. Our benchmark represents a step towards a standardized approach to evaluating and developing LLMs in healthcare to align future model development with the real-world needs of clinical application. We open-source all evaluation scripts and datasets for future research at https://github.com/TIO-IKIM/CLUE.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2404.04067
Accession number: edsarx.2404.04067
Database: arXiv