TRAMS: Training-free Memory Selection for Long-range Language Modeling

Bibliographic Details
Title: TRAMS: Training-free Memory Selection for Long-range Language Modeling
Authors: Yu, Haofei; Wang, Cunxiang; Zhang, Yue; Bi, Wei
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: The Transformer architecture is crucial for numerous AI models, but it still faces challenges in long-range language modeling. Though several specific transformer architectures have been designed to tackle issues of long-range dependencies, existing methods like Transformer-XL are plagued by a high percentage of ineffective memories. In this study, we present a plug-and-play strategy, known as TRAining-free Memory Selection (TRAMS), that selects the tokens participating in attention calculation based on one simple metric. This strategy allows us to keep tokens that are likely to have a high attention score with the current queries and to ignore the rest. We have tested our approach on the word-level benchmark (WikiText-103) and the character-level benchmark (enwik8), and the results indicate an improvement without additional training or additional parameters. (An illustrative sketch of such memory selection follows this record.)
Comment: Findings of EMNLP 2023
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2310.15494
Accession Number: edsarx.2310.15494
Database: arXiv
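
The abstract describes TRAMS only at a high level: cached memory tokens are ranked by a simple, training-free metric, and only the highest-scoring ones take part in attention with the current queries. The sketch below is a minimal illustration of that general idea, not the authors' method; the selection score used here (the L2 norm of each cached memory key), the function names, and all tensor shapes are assumptions made for the example, while the actual metric is defined in the paper.

```python
# Hypothetical sketch of training-free memory selection for attention.
# The proxy score (key norm) is an illustrative stand-in, not the TRAMS metric.
import torch
import torch.nn.functional as F


def select_memory(mem_k, mem_v, m):
    """Keep the m memory tokens with the largest proxy score.

    mem_k, mem_v: (mem_len, d_head) cached keys / values for one head.
    Assumption: a query-independent score (here, the L2 norm of each key)
    approximates which memories are likely to receive high attention.
    """
    scores = mem_k.norm(dim=-1)                       # (mem_len,)
    idx = scores.topk(min(m, mem_k.size(0))).indices  # indices of kept tokens
    return mem_k[idx], mem_v[idx]


def attend_with_selected_memory(q, k, v, mem_k, mem_v, m):
    """Scaled dot-product attention over [selected memory; current context]."""
    sel_k, sel_v = select_memory(mem_k, mem_v, m)
    keys = torch.cat([sel_k, k], dim=0)               # (m + ctx_len, d_head)
    values = torch.cat([sel_v, v], dim=0)
    attn = (q @ keys.T) / keys.size(-1) ** 0.5        # (q_len, m + ctx_len)
    return F.softmax(attn, dim=-1) @ values           # (q_len, d_head)


if __name__ == "__main__":
    d_head, ctx_len, mem_len = 64, 16, 256
    q, k, v = (torch.randn(ctx_len, d_head) for _ in range(3))
    mem_k, mem_v = torch.randn(mem_len, d_head), torch.randn(mem_len, d_head)
    out = attend_with_selected_memory(q, k, v, mem_k, mem_v, m=64)
    print(out.shape)  # torch.Size([16, 64])
```

Because the selection happens outside any learned module, a filter of this kind can be dropped into an existing Transformer-XL-style memory mechanism at inference time, which is what "plug-and-play" and "training-free" refer to in the abstract.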