Report
Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings
Title: | Unsupervised Word Polysemy Quantification with Multiresolution Grids of Contextual Embeddings |
---|---|
Authors: | Xypolopoulos, Christos; Tixier, Antoine J.-P.; Vazirgiannis, Michalis |
Publication Year: | 2020 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computation and Language |
Description: | The number of senses of a given word, or polysemy, is a very subjective notion, which varies widely across annotators and resources. We propose a novel method to estimate polysemy, based on simple geometry in the contextual embedding space. Our approach is fully unsupervised and purely data-driven. We show through rigorous experiments that our rankings are well correlated (with strong statistical significance) with 6 different rankings derived from famous human-constructed resources such as WordNet, OntoNotes, Oxford, Wikipedia, etc., for 6 different standard metrics. We also visualize and analyze the correlation between the human rankings. A valuable by-product of our method is the ability to sample, at no extra cost, sentences containing different senses of a given word. Finally, the fully unsupervised nature of our method makes it applicable to any language. Code and data are publicly available at https://github.com/ksipos/polysemy-assessment. The paper was accepted as a long paper at EACL 2021. Comment: Equal contribution by Christos Xypolopoulos and Antoine J.-P. Tixier |
Document Type: | Working Paper |
DOI: | 10.18653/v1/2021.eacl-main.297 |
Access URL: | http://arxiv.org/abs/2003.10224 |
Accession Number: | edsarx.2003.10224 |
Database: | arXiv |