Airavata: Introducing Hindi Instruction-tuned LLM

التفاصيل البيبلوغرافية
العنوان: Airavata: Introducing Hindi Instruction-tuned LLM
المؤلفون: Gala, Jay, Jayakumar, Thanmay, Husain, Jaavid Aktar, M, Aswanth Kumar, Khan, Mohammed Safi Ur Rahman, Kanojia, Diptesh, Puduppully, Ratish, Khapra, Mitesh M., Dabre, Raj, Murthy, Rudra, Kunchukuttan, Anoop
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
الوصف: We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we also share the IndicInstruct dataset, which is a collection of diverse instruction-tuning datasets to enable further research for Indic LLMs. Additionally, we present evaluation benchmarks and a framework for assessing LLM performance across tasks in Hindi. Currently, Airavata supports Hindi, but we plan to expand this to all 22 scheduled Indic languages. You can access all artifacts at https://ai4bharat.github.io/airavata.
Comment: Work in progress
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2401.15006
رقم الأكسشن: edsarx.2401.15006
قاعدة البيانات: arXiv