Repeated Padding as Data Augmentation for Sequential Recommendation

Bibliographic Details
Title: Repeated Padding as Data Augmentation for Sequential Recommendation
Authors: Dang, Yizhou; Liu, Yuting; Yang, Enneng; Guo, Guibing; Jiang, Linying; Wang, Xingwei; Zhao, Jianzhe
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Information Retrieval
Description: Sequential recommendation aims to provide users with personalized suggestions based on their historical interactions. When training sequential models, padding is a widely adopted technique for two main reasons: 1) the vast majority of models can only handle fixed-length sequences; 2) batch-based training requires the sequences in each batch to have the same length. The special value 0 is usually used as the padding content; it carries no actual information and is ignored in the model's calculations. This common padding strategy leads us to a problem that has never been explored before: can we fully utilize this idle input space by padding other content to further improve model performance and training efficiency? In this paper, we propose a simple yet effective padding method called Repeated Padding (RepPad). Specifically, we use the original interaction sequences as the padding content and fill the padding positions with them during model training. This operation can be performed a finite number of times or repeated until the input sequence's length reaches the maximum limit. RepPad can be viewed as a sequence-level data augmentation strategy. Unlike most existing works, our method contains no trainable parameters or hyperparameters and is a plug-and-play data augmentation operation. Extensive experiments on various categories of sequential models and five real-world datasets demonstrate the effectiveness and efficiency of our approach. The average recommendation performance improvement is up to 60.3% on GRU4Rec and 24.3% on SASRec. We also provide in-depth analysis and explanation of what makes RepPad effective from multiple perspectives. The source code will be released to ensure the reproducibility of our experiments.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2403.06372
Accession Number: edsarx.2403.06372
Database: arXiv
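
Illustrative sketch: the padding idea described in the abstract can be shown in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' released code. The function names `zero_pad` and `rep_pad` and the parameter `max_repeats` are hypothetical. The sketch also assumes left-side padding, truncation to the most recent items, and that only whole copies of the sequence are inserted, none of which the abstract specifies. How a model should treat the boundary between repeated copies during training is not covered here.

```python
from typing import List, Optional


def zero_pad(seq: List[int], max_len: int, pad_value: int = 0) -> List[int]:
    """Conventional padding: fill the unused positions with pad_value.

    Left-side padding is an assumption of this sketch.
    """
    seq = seq[-max_len:]  # keep only the most recent max_len items
    return [pad_value] * (max_len - len(seq)) + seq


def rep_pad(
    seq: List[int],
    max_len: int,
    max_repeats: Optional[int] = None,  # hypothetical knob: cap on extra copies
    pad_value: int = 0,
) -> List[int]:
    """Repeated padding: reuse the sequence itself as padding content.

    Whole copies of the sequence are placed in the otherwise idle
    positions, either a fixed number of times (max_repeats) or as many
    times as fit within max_len. Leftover positions get pad_value.
    """
    seq = seq[-max_len:]  # keep only the most recent max_len items
    if not seq:
        return [pad_value] * max_len

    free = max_len - len(seq)      # idle positions available
    copies = free // len(seq)      # whole extra copies that fit
    if max_repeats is not None:
        copies = min(copies, max_repeats)

    repeated = seq * (copies + 1)  # extra copies plus the original
    return [pad_value] * (max_len - len(repeated)) + repeated


if __name__ == "__main__":
    history = [1, 2, 3]
    print(zero_pad(history, 8))
    print(rep_pad(history, 8))
    print(rep_pad(history, 10, max_repeats=1))
```

Because the operation only rewrites the input list, it can sit in a data-loading step in front of an existing model without changing the model itself, which is consistent with the plug-and-play framing in the abstract.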