Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs

Bibliographic Details
Title: Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs
Authors: Hu, Zichao; Li, Junyi Jessy; Guha, Arjun; Biswas, Joydeep
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Science - Robotics
Description: Large language models (LLMs) have shown great promise at generating robot programs from natural language given domain-specific robot application programming interfaces (APIs). However, the performance gap between proprietary LLMs and smaller open-weight LLMs remains wide. This raises a question: Can we fine-tune smaller open-weight LLMs for generating domain-specific robot programs to close the performance gap with proprietary LLMs? While Self-Instruct is a promising solution because it generates a diverse set of training data, it cannot verify the correctness of these programs. In contrast, a robot simulator with a well-defined world can identify execution errors but limits the diversity of programs that it can verify. In this work, we introduce Robo-Instruct, which brings the best of both worlds -- it promotes the diversity of Self-Instruct while providing the correctness of simulator-based checking. Robo-Instruct introduces RoboSim to synthesize a consistent world state on the fly by inferring properties relevant to the program being checked, and simulating actions accordingly. Furthermore, the instructions and programs generated by Self-Instruct may be subtly inconsistent -- such as the program missing a step implied by the instruction. Robo-Instruct further addresses this with InstAlign, an instruction-program alignment procedure that revises the task instruction to reflect the actual results of the generated program. Given a few seed task descriptions and the robot APIs, Robo-Instruct is capable of generating a training dataset using only a small open-weight model. This dataset can then be used to fine-tune small open-weight language models, enabling them to match or even exceed the performance of several proprietary LLMs, such as GPT-3.5-Turbo and Gemini-Pro.
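
Illustrative Sketch: The description states that RoboSim synthesizes a consistent world state on the fly by inferring properties relevant to the program being checked and simulating actions accordingly. Below is a minimal Python sketch of that idea, assuming hypothetical names (LazyWorld, RobotSim, go_to, pick, place) that are not taken from the paper: world facts are fixed the first time an action depends on them, and any later action that contradicts the recorded facts is flagged as an execution error, so a generated program can be checked without a hand-authored world.

    class InconsistentWorldError(Exception):
        """Raised when a program's actions imply a contradictory world state."""


    class LazyWorld:
        """Synthesizes world facts the first time an action needs them,
        then holds the program to those facts for the rest of the run."""

        def __init__(self):
            self._facts = {}  # (property, subject) -> value fixed at first use

        def require(self, prop, subject, value):
            """Precondition check: infer the fact on first use, error on conflict."""
            key = (prop, subject)
            if key not in self._facts:
                self._facts[key] = value  # inferred on the fly
            elif self._facts[key] != value:
                raise InconsistentWorldError(
                    f"{prop}({subject}) is {self._facts[key]}, not {value}")

        def update(self, prop, subject, value):
            """Effect of an action: overwrite the fact."""
            self._facts[(prop, subject)] = value


    class RobotSim:
        """Executes a robot program's API calls against a LazyWorld."""

        def __init__(self):
            self.world = LazyWorld()
            self.location = "start"
            self.holding = None

        def go_to(self, place):
            self.location = place

        def pick(self, obj):
            if self.holding is not None:
                raise InconsistentWorldError("gripper is already occupied")
            # The object's location is inferred here if it was never seen before.
            self.world.require("location", obj, self.location)
            self.world.update("location", obj, "gripper")
            self.holding = obj

        def place(self, obj):
            if self.holding != obj:
                raise InconsistentWorldError(f"not holding {obj}")
            self.world.update("location", obj, self.location)
            self.holding = None


    # A generated program is replayed against the simulator; execution errors
    # (e.g., picking with a full gripper, or steps that contradict earlier
    # inferred facts) mark the instruction-program pair as invalid.
    sim = RobotSim()
    sim.go_to("kitchen")
    sim.pick("cup")    # fixes the fact location(cup) = kitchen
    sim.go_to("office")
    sim.place("cup")   # cup is now in the office

In this sketch the design choice is lazy inference: rather than enumerating a full world up front (which would restrict which programs can be checked), facts are committed only when a program queries or manipulates them, so diverse programs can be verified against worlds synthesized to fit them.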
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.20179
Accession Number: edsarx.2405.20179
Database: arXiv