Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration

Bibliographic Details
Title: Don't Forget Your Reward Values: Language Model Alignment via Value-based Calibration
Authors: Mao, Xin; Li, Feng-Lin; Xu, Huimin; Zhang, Wei; Luu, Anh Tuan
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language; Computer Science - Artificial Intelligence
Description: While Reinforcement Learning from Human Feedback (RLHF) significantly enhances the generation quality of Large Language Models (LLMs), recent studies have raised concerns regarding the complexity and instability associated with the Proximal Policy Optimization (PPO) algorithm, proposing a series of order-based calibration methods as viable alternatives. This paper delves further into current order-based methods, examining their inefficiencies in utilizing reward values and addressing misalignment issues. Building upon these findings, we propose a novel Value-based CaliBration (VCB) method to better align LLMs with human preferences. Experimental results demonstrate that VCB surpasses existing alignment methods on AI assistant and summarization datasets, providing impressive generalizability, robustness, and stability in diverse settings.
Comment: 19 pages, Under review
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2402.16030
Accession Number: edsarx.2402.16030
Database: arXiv
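
Note: the description above contrasts order-based calibration, which uses only the ranking a reward model induces over responses, with value-based calibration, which also exploits the reward magnitudes. The short Python sketch below illustrates that general distinction only; it is a hedged illustration under stated assumptions, not the paper's actual VCB objective, and every name in it (order_based_loss, value_based_loss, beta, the toy tensors) is a hypothetical placeholder.

```python
# Illustrative sketch, NOT the VCB method from the paper: an order-based
# pairwise loss uses only which response was preferred, while a value-based
# loss also regresses the policy's log-probability gap toward the reward gap,
# so pairs with larger reward differences exert a larger training signal.
import torch
import torch.nn.functional as F

def order_based_loss(policy_logp_chosen, policy_logp_rejected, beta=0.1):
    # Only the ordering matters: push the chosen response's scaled
    # log-probability above the rejected one's.
    margin = beta * (policy_logp_chosen - policy_logp_rejected)
    return -F.logsigmoid(margin).mean()

def value_based_loss(policy_logp_chosen, policy_logp_rejected,
                     reward_chosen, reward_rejected, beta=0.1):
    # Also use the reward magnitudes: match the scaled log-probability gap
    # to the reward gap reported by the reward model.
    policy_gap = beta * (policy_logp_chosen - policy_logp_rejected)
    reward_gap = reward_chosen - reward_rejected
    return F.mse_loss(policy_gap, reward_gap)

if __name__ == "__main__":
    # Toy batch of two preference pairs with very different reward gaps.
    logp_c = torch.tensor([-12.0, -10.0])
    logp_r = torch.tensor([-13.0, -15.0])
    r_c = torch.tensor([2.0, 4.5])
    r_r = torch.tensor([1.8, 0.5])
    print(order_based_loss(logp_c, logp_r))
    print(value_based_loss(logp_c, logp_r, r_c, r_r))
```

The design point of the sketch is that the order-based loss treats both toy pairs identically once the ranking is fixed, whereas the value-based loss penalizes the pair with the larger reward gap more strongly; the paper's own formulation should be consulted for the precise objective.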