A Neural Corpus Indexer for Document Retrieval

Bibliographic Details
Title: A Neural Corpus Indexer for Document Retrieval
Authors: Wang, Yujing; Hou, Yingyan; Wang, Haonan; Miao, Ziming; Wu, Shibin; Sun, Hao; Chen, Qi; Xia, Yuqing; Chi, Chengmin; Zhao, Guoshuai; Liu, Zheng; Xie, Xing; Sun, Hao Allen; Deng, Weiwei; Zhang, Qi; Yang, Mao
Publication Year: 2022
Collection: Computer Science
Subject Terms: Computer Science - Information Retrieval
Description: Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying the training and indexing stages can significantly improve the recall performance of traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that directly generates relevant document identifiers for a given query. To optimize the recall performance of NCI, we design a prefix-aware weight-adaptive decoder architecture and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrate the superiority of NCI on two commonly used academic benchmarks, achieving relative improvements of +21.4% for Recall@1 on the NQ320k dataset and +16.8% for R-Precision on the TriviaQA dataset, compared to the best baseline method.
Comment: 19 pages, 6 figures, accepted by NeurIPS 2022
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2206.02743
Accession Number: edsarx.2206.02743
Database: arXiv