Synth-Empathy: Towards High-Quality Synthetic Empathy Data

Bibliographic Details
Title: Synth-Empathy: Towards High-Quality Synthetic Empathy Data
Authors: Liang, Hao; Sun, Linzhuang; Wei, Jingxuan; Huang, Xijie; Sun, Linkun; Yu, Bihui; He, Conghui; Zhang, Wentao
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Machine Learning
Description: In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capabilities has become a crucial prerequisite. Consequently, managing and understanding empathetic datasets have gained increasing significance. However, empathetic data are typically human-labeled, leading to insufficient datasets and wasted human labor. In this work, we present Synth-Empathy, an LLM-based data generation and quality and diversity selection pipeline that automatically generates high-quality empathetic data while discarding low-quality data. With the data generated from a low empathetic model, we are able to further improve empathetic response performance and achieve state-of-the-art (SoTA) results across multiple benchmarks. Moreover, our model achieves SoTA performance on various human evaluation benchmarks, demonstrating its effectiveness and robustness in real-world applications. Furthermore, we show the trade-off between data quantity and quality, providing insights into empathetic data generation and selection.
Comment: arXiv admin note: text overlap with arXiv:2407.01937
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.21669
Accession Number: edsarx.2407.21669
Database: arXiv