Augmenting NLP data to counter Annotation Artifacts for NLI Tasks

Bibliographic Details
Title: Augmenting NLP data to counter Annotation Artifacts for NLI Tasks
Authors: Bhullar, Armaan Singh
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Machine Learning, I.2.7
Description: In this paper, we explore annotation artifacts: the phenomenon wherein large pre-trained NLP models achieve high performance on benchmark datasets without actually "solving" the underlying task, relying instead on dataset artifacts (shared across the train, validation, and test sets) to arrive at the right answer. We study this phenomenon on the well-known Natural Language Inference (NLI) task, first using contrast and adversarial examples to understand the limits of the model's performance and to expose one of the biases arising from annotation artifacts (that is, from the way the annotators constructed the training data). We then propose a data augmentation technique to fix this bias and measure its effectiveness.
Comment: Submitted as part of an NLP research project with Greg Durrett of the University of Texas at Austin
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2302.04700
Accession Number: edsarx.2302.04700
Database: arXiv