Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models

Bibliographic Details
Title: Bad Form: Comparing Context-Based and Form-Based Few-Shot Learning in Distributional Semantic Models
Authors: Van Hautte, Jeroen; Emerson, Guy; Rei, Marek
Publication Year: 2019
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Machine Learning
Description: Word embeddings are an essential component in a wide range of natural language processing applications. However, distributional semantic models are known to struggle when only a small number of context sentences are available. Several methods have been proposed to obtain higher-quality vectors for these words, leveraging both this context information and sometimes the word forms themselves through a hybrid approach. We show that the current tasks do not suffice to evaluate models that use word-form information, as such models can easily leverage word forms in the training data that are related to word forms in the test data. We introduce 3 new tasks, allowing for a more balanced comparison between models. Furthermore, we show that hyperparameters that have largely been ignored in previous work can consistently improve the performance of both baseline and advanced models, achieving a new state of the art on 4 out of 6 tasks.
Comment: Accepted to the Proceedings of the Second Workshop on Deep Learning for Low-Resource NLP (DeepLo 2019)
Document Type: Working Paper
Access URL: http://arxiv.org/abs/1910.00275
Accession Number: edsarx.1910.00275
Database: arXiv