Finding Data Broadness Via Generalized Nearest Neighbors.

التفاصيل البيبلوغرافية
العنوان: Finding Data Broadness Via Generalized Nearest Neighbors.
المؤلفون: Ioannidis, Yannis, Scholl, Marc H., Schmidt, Joachim W., Matthes, Florian, Hatzopoulos, Mike, Boehm, Klemens, Kemper, Alfons, Grust, Torsten, Boehm, Christian, Venkateswaran, Jayendra, Kahveci, Tamer, Camoglu, Orhan
المصدر: Advances in Database Technology - EDBT 2006; 2006, p645-663, 19p
مستخلص: A data object is broad if it is one of the k-Nearest Neighbors (k-NN) of many data objects. We introduce a new database primitive called Generalized Nearest Neighbor (GNN) to express data broadness. We also develop three strategies to answer GNN queries efficiently for large datasets of multidimensional objects. The R*-Tree based search algorithm generates candidate pages and ranks them based on their distances. Our first algorithm, Fetch All (FA), fetches as many candidate pages as possible. Our second algorithm, Fetch One (FO), fetches one candidate page at a time. Our third algorithm, Fetch Dynamic (FD), dynamically decides on the number of pages that needs to be fetched. We also propose three optimizations, Column Filter, Row Filter and Adaptive Filter, to eliminate pages from each dataset. Column Filter prunes the pages that are guaranteed to be non-broad. Row Filter prunes the pages whose removal do not change the broadness of any data point. Adaptive Filter prunes the search space dynamically along each dimension to eliminate unpromising objects. Our experiments show that FA is the fastest when the buffer size is large and FO is the fastest when the buffer size is small. FD is always either fastest or very close to the faster of FA and FO. FD is significantly faster than the existing methods adapted to the GNN problem. [ABSTRACT FROM AUTHOR]
Copyright of Advances in Database Technology - EDBT 2006 is the property of Springer eBooks and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
قاعدة البيانات: Supplemental Index
الوصف
ردمك:9783540329602
DOI:10.1007/11687238_39