RuleR: Improving LLM Controllability by Rule-based Data Recycling

Bibliographic Details
Title: RuleR: Improving LLM Controllability by Rule-based Data Recycling
Authors: Li, Ming; Chen, Han; Wang, Chenguang; Nguyen, Dang; Li, Dianqi; Zhou, Tianyi
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Machine Learning
Description: Large language models (LLMs) still lack fine-grained controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which incurs additional cost. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method that incorporates multiple constraints into the original data samples according to predefined rules, creating new training tasks that consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the corresponding rule instructions to their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities. The code will be released at https://github.com/MingLiiii/RuleR.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.15938
Accession Number: edsarx.2406.15938
Database: arXiv
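
Illustrative sketch: the recycling step described in the abstract (edit the response by rule, then append the matching rule instruction to the original instruction) might look roughly like the Python below. This is not the authors' released code. The `Rule` class, the two example rules, the `instruction`/`response` field names, and the `recycle` and `recycle_dataset` helpers are all assumptions made for illustration; the paper's actual rule set and data format may differ.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: an instruction suffix paired with a response edit."""
    name: str
    instruction: str            # text appended to the original instruction
    edit: Callable[[str], str]  # deterministic edit applied to the response


# Illustrative rules only; not taken from the paper.
RULES: List[Rule] = [
    Rule("uppercase", "Answer in all capital letters.", str.upper),
    Rule("end_marker", "End your answer with the word DONE.",
         lambda text: text.rstrip() + " DONE"),
]


def recycle(sample: Dict[str, str], rules: List[Rule]) -> Dict[str, str]:
    """Return a new sample whose response satisfies every rule and whose
    instruction states every rule. The input sample is left unchanged."""
    instruction = sample["instruction"]
    response = sample["response"]
    for rule in rules:
        instruction = instruction + " " + rule.instruction
        response = rule.edit(response)
    return {"instruction": instruction, "response": response}


def recycle_dataset(samples: List[Dict[str, str]],
                    rules: List[Rule] = RULES,
                    max_rules: int = 2,
                    seed: int = 0) -> List[Dict[str, str]]:
    """Apply a random, non-empty subset of rules to each sample."""
    rng = random.Random(seed)
    recycled = []
    for sample in samples:
        k = rng.randint(1, min(max_rules, len(rules)))
        recycled.append(recycle(sample, rng.sample(rules, k)))
    return recycled


if __name__ == "__main__":
    original = {
        "instruction": "Name the capital of France.",
        "response": "Paris is the capital of France.",
    }
    print(recycle(original, RULES))
```

Because each edit is a deterministic function of the existing response, a sketch like this needs no human annotation and no calls to a proprietary model, which is the cost argument the abstract makes.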