Large Language Models in the Clinic: A Comprehensive Benchmark

Bibliographic Details
Title: Large Language Models in the Clinic: A Comprehensive Benchmark
Authors: Liu, Andrew; Zhou, Hongjian; Hua, Yining; Rohanian, Omid; Thakur, Anshul; Clifton, Lei; Clifton, David A.
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
Description: The adoption of large language models (LLMs) to assist clinicians has attracted remarkable attention. Existing works mainly evaluate LLMs on the closed-ended question-answering (QA) task, in which answer options are provided. However, many clinical decisions involve answering open-ended questions without preset options. To better understand LLMs in the clinic, we construct a benchmark, ClinicBench. We first collect eleven existing datasets covering diverse clinical language generation, understanding, and reasoning tasks. Furthermore, we construct six novel datasets and complex clinical tasks that are close to real-world practice, i.e., referral QA, treatment recommendation, hospitalization (long document) summarization, patient education, pharmacology QA, and drug interaction for emerging drugs. We conduct an extensive evaluation of twenty-two LLMs under both zero-shot and few-shot settings. Finally, we invite medical experts to evaluate the clinical usefulness of LLMs.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.00716
Accession Number: edsarx.2405.00716
Database: arXiv