Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores

Bibliographic Details
Title: Standardizing the Measurement of Text Diversity: A Tool and a Comparative Analysis of Scores
Authors: Shaib, Chantal; Barrow, Joe; Sun, Jiuding; Siu, Alexa F.; Wallace, Byron C.; Nenkova, Ani
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: The diversity across outputs generated by large language models shapes the perception of their quality and utility. Prompt leaks, templated answer structure, and canned responses across different interactions are readily noticed by people, but there is no standard score to measure this aspect of model behavior. In this work we empirically investigate diversity scores on English texts. We find that computationally efficient compression algorithms capture information similar to what is measured by slow-to-compute $n$-gram overlap homogeneity scores. Further, a combination of measures (compression ratios, self-repetition of long $n$-grams, Self-BLEU, and BERTScore) is sufficient to report, as these measures have low mutual correlation. The applicability of the scores extends beyond the analysis of generative models; for example, we highlight applications to instruction-tuning datasets and human-produced texts. We release a diversity score package to facilitate research and invite consistency across reports.
Comment: Preprint
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2403.00553
Accession Number: edsarx.2403.00553
Database: arXiv
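
Illustrative note: the description names compression ratios and self-repetition of long $n$-grams as two of the diversity measures. The sketch below shows one plausible way to compute simplified versions of both with the Python standard library. It is not the released package or its API: the function names, the use of gzip, whitespace tokenization, the default n = 4, and the fraction-based definition of self-repetition are all assumptions made for illustration.

```python
import gzip
from collections import Counter


def compression_ratio(texts):
    """Byte length of the joined texts divided by their gzip-compressed length.

    Higher values mean more redundancy, i.e. less diverse text.
    (gzip is an assumed choice of compressor.)
    """
    raw = "\n".join(texts).encode("utf-8")
    return len(raw) / len(gzip.compress(raw))


def self_repetition(texts, n=4):
    """Fraction of texts sharing at least one word n-gram with another text.

    A simplified, assumed definition; tokens are lowercased whitespace splits.
    """
    def ngrams(text):
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    per_text = [ngrams(t) for t in texts]
    # Count in how many texts each distinct n-gram occurs.
    doc_freq = Counter(g for grams in per_text for g in grams)
    repeated = sum(any(doc_freq[g] > 1 for g in grams) for grams in per_text)
    return repeated / len(texts) if texts else 0.0
```

Under these assumptions, a set of canned responses should yield a higher compression ratio and a higher self-repetition value than a set of unrelated sentences. Both functions run in a single pass over the corpus, which is the kind of efficiency the description contrasts with pairwise $n$-gram overlap scores.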