SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection

Bibliographic Details
Title: SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Authors: Ye, Ke; Jiang, Heinrich; Rostamizadeh, Afshin; Chakrabarti, Ayan; DeSalvo, Giulia; Kagy, Jean-François; Karydas, Lazaros; Citovsky, Gui; Kumar, Sanjiv
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning, Computer Science - Computation and Language
Description: Pre-training large language models is known to be extremely resource intensive and often inefficient, under-utilizing the information encapsulated in the training text sequences. In this paper, we present SpacTor, a new training procedure consisting of (1) a hybrid objective combining span corruption (SC) and replaced token detection (RTD), and (2) a two-stage curriculum that optimizes the hybrid objective over the initial $\tau$ iterations, then transitions to the standard SC loss. We show empirically that the effectiveness of the hybrid objective is tied to the two-stage pre-training schedule, and provide extensive analysis of why this is the case. In our experiments with encoder-decoder architectures (T5) on a variety of NLP tasks, SpacTor-T5 yields the same downstream performance as standard SC pre-training while enabling a 50% reduction in pre-training iterations and a 40% reduction in total FLOPs. Alternatively, given the same computing budget, we find that SpacTor yields significantly improved downstream benchmark performance.
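To make the two-stage curriculum concrete, below is a minimal Python sketch of how the loss switch might be scheduled. It is not the authors' implementation: the functions span_corruption_loss and rtd_loss are hypothetical placeholders for the paper's SC and RTD objectives, and the mixing weight lambda_rtd and the switch point tau are assumed illustrative values.

    """Illustrative sketch of SpacTor's two-stage curriculum (not the paper's code)."""

    def span_corruption_loss(batch):
        # Placeholder: in a real setup this would be the T5 span-corruption loss.
        return sum(batch) * 0.01

    def rtd_loss(batch):
        # Placeholder: in a real setup this would be an ELECTRA-style
        # replaced-token-detection loss from an auxiliary discriminator.
        return sum(batch) * 0.005

    def training_loss(batch, step, tau, lambda_rtd=1.0):
        """Stage 1 (step < tau): hybrid SC + RTD objective.
        Stage 2 (step >= tau): standard span-corruption loss only."""
        if step < tau:
            return span_corruption_loss(batch) + lambda_rtd * rtd_loss(batch)
        return span_corruption_loss(batch)

    if __name__ == "__main__":
        batch = [1.0, 2.0, 3.0]
        tau = 250_000  # switch iteration; an assumed value for illustration
        print(training_loss(batch, step=0, tau=tau))    # hybrid stage
        print(training_loss(batch, step=tau, tau=tau))  # SC-only stage

The abstract's key claim is that this switch matters: training on the hybrid objective throughout does not match the two-stage schedule sketched above.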
Comment: 9+13 pages, 5 figures
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2401.13160
Accession Number: edsarx.2401.13160
Database: arXiv