Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine

Bibliographic details
Title: Two-dimensional indexing to provide one-integrated-memory view of distributed memory for a massively-parallel search engine
Authors: Il-Yeol Song, Tae-Seob Yun, Jun-Sung Kim, Kyu-Young Whang, Hyuk-Yoon Kwon
Source: World Wide Web.
Publication info: Springer Science and Business Media LLC, 2018.
Publication year: 2018
Subject terms: Web search query, Computer Networks and Communications, Computer science, Search engine indexing, 02 engineering and technology, Parallel computing, Partition (database), Search engine, Hardware and Architecture, 020204 information systems, Scalability, 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Distributed memory, Massively parallel, Software
Description: We propose two-dimensional indexing, a novel in-memory indexing architecture that operates over the distributed memory of a massively-parallel search engine. The goal of two-dimensional indexing is to provide a one-integrated-memory view, as in a single-node system that uses one large integrated memory. In two-dimensional indexing, we partition the entire index into n × m fragments and distribute them over the memories of multiple nodes in such a way that each fragment is stored entirely in the main memory of one node. The proposed architecture is not only scalable, since it uses a scaled-out shared-nothing architecture, but is also capable of achieving low query response time, since it processes queries in main memory. We also propose the concept of the one-memory point, which is the amount of memory space required to store the entire index completely in main memory, thereby providing a one-integrated-memory view. We first prove the effectiveness of two-dimensional indexing with single-keyword queries and then extend the notion to handle multiple-keyword queries. To handle multiple-keyword queries, we adopt pre-join, which materializes a multiple-keyword query a priori, as well as a new notion of semi-memory join, which obviates the extensive communication overhead of performing a join across multiple nodes. In experiments using a real-life search query set over a database of 100 million crawled Web documents, we show that two-dimensional indexing can effectively provide a one-integrated-memory view without requiring much additional memory compared with a single-node system using one large integrated memory. We also show that, with a six-node prototype, in an ideal case, it improves query processing performance by up to 535.54 times over a disk-based search engine with an equivalent amount of in-memory buffer but without two-dimensional indexing. This improvement is expected to grow as the system is scaled out to a larger number of machines.
ISSN: 1573-1413
1386-145X
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::b42d8642ba23585370b124860aad233f
https://doi.org/10.1007/s11280-018-0647-1
Rights: CLOSED
Accession number: edsair.doi...........b42d8642ba23585370b124860aad233f
Database: OpenAIRE