A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

Bibliographic Details
Title: A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
Authors: Liu, Xinyu; Shen, Shuyu; Li, Boyan; Ma, Peixian; Jiang, Runzhi; Luo, Yuyu; Zhang, Yuxin; Fan, Ju; Li, Guoliang; Tang, Nan
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Databases
Description: Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its entire lifecycle from the following four aspects: (1) Model: NL2SQL translation techniques that tackle not only NL ambiguity and under-specification, but also properly map NL with database schema and instances; (2) Data: From the collection of training data, data synthesis due to training data scarcity, to NL2SQL benchmarks; (3) Evaluation: Evaluating NL2SQL methods from multiple angles using different metrics and granularities; and (4) Error Analysis: analyzing NL2SQL errors to find the root cause and guiding NL2SQL models to evolve. Moreover, we provide a rule of thumb for developing NL2SQL solutions. Finally, we discuss the research challenges and open problems of NL2SQL in the LLMs era.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2408.05109
Accession Number: edsarx.2408.05109
Database: arXiv