Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text

Bibliographic Details
Title: Synth-SBDH: A Synthetic Dataset of Social and Behavioral Determinants of Health for Clinical Text
Authors: Mitra, Avijit; Druhl, Emily; Goodwin, Raelene; Yu, Hong
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: Social and behavioral determinants of health (SBDH) play a crucial role in health outcomes and are frequently documented in clinical text. Automatically extracting SBDH information from clinical text depends on good-quality, publicly available datasets. However, existing SBDH datasets have substantial limitations in availability and coverage. In this study, we introduce Synth-SBDH, a novel synthetic dataset with detailed SBDH annotations covering status, temporal information, and rationale across 15 SBDH categories. We demonstrate the utility of Synth-SBDH on three tasks using real-world clinical datasets from two distinct hospital settings, highlighting its versatility, generalizability, and distillation capabilities. Models trained on Synth-SBDH consistently outperform counterparts trained without it, achieving up to 62.5% macro-F improvements. Synth-SBDH also proves effective for rare SBDH categories and under resource constraints. Human evaluation shows a Human-LLM alignment of 71.06% and identifies areas for future refinement.
Comment: GitHub: https://github.com/avipartho/Synth-SBDH
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.06056
Accession Number: edsarx.2406.06056
Database: arXiv