SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases
Title: | SkyQuery: An Implementation of a Parallel Probabilistic Join Engine for Cross-Identification of Multiple Astronomical Databases |
---|---|
Authors: | István Csabai, László Dobos, Nolan Li, Tamás Budavári, Alexander S. Szalay |
Source: | Lecture Notes in Computer Science ISBN: 9783642312342 SSDBM |
Publisher Information: | Springer Berlin Heidelberg, 2012. |
Publication Year: | 2012 |
Subject Terms: | SQL, Database, Relational database, Computer science, Association (object-oriented programming), Search engine indexing, Probabilistic logic, computer.software_genre, Identification (information), Workflow, Server, Data mining, computer, computer.programming_language |
Description: | Multi-wavelength astronomical studies require cross-identification of detections of the same celestial objects in multiple catalogs based on spherical coordinates and other properties. Because of the large data volumes and spherical geometry, the symmetric N-way association of astronomical detections is a computationally intensive problem, even when sophisticated indexing schemes are used to exclude obviously false candidates. Legacy astronomical catalogs already contain detections of more than a hundred million objects, while ongoing and future surveys will produce catalogs of billions of objects with multiple detections of each at different times. One-time, pairwise cross-identification of these large catalogs is not sufficient for many astronomical scenarios. Consequently, a novel system is necessary that can cross-identify multiple catalogs on demand, efficiently and reliably. In this paper, we present our solution based on a cluster of commodity servers and ordinary relational databases. The cross-identification problems are formulated in a language based on SQL, but extended with special clauses. These special queries are partitioned spatially by coordinate ranges and compiled into a complex workflow of ordinary SQL queries. Workflows are then executed in a parallel framework using a cluster of servers hosting identical mirrors of the same data sets. |
ISBN: | 978-3-642-31234-2 |
Access URL: | https://explore.openaire.eu/search/publication?articleId=doi_________::b76d9ac3f1e051d55fe5f12cd5ff5eb4 https://doi.org/10.1007/978-3-642-31235-9_10 |
Rights: | OPEN |
Accession Number: | edsair.doi...........b76d9ac3f1e051d55fe5f12cd5ff5eb4 |
Database: | OpenAIRE |
ISBN: | 9783642312342 |
---|
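
The description mentions that queries are partitioned spatially by coordinate ranges and compiled into ordinary SQL queries. The following is only a minimal sketch of that general idea, not the SkyQuery implementation: the function names, the table name `catalog_a`, and the column name `dec` are hypothetical, and the sketch ignores the handling of objects near partition boundaries that a real cross-match would need.

```python
def partition_ranges(lo, hi, n):
    """Split the interval [lo, hi] into n equal-width coordinate ranges."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]


def compile_partition_queries(table, column, lo, hi, n):
    """Emit one ordinary SQL query per coordinate range (hypothetical schema)."""
    queries = []
    for i, (a, b) in enumerate(partition_ranges(lo, hi, n)):
        # Use a closed upper bound on the last range so that hi is included.
        upper_op = "<=" if i == n - 1 else "<"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE {column} >= {a} AND {column} {upper_op} {b}"
        )
    return queries


# Example: four declination stripes over a hypothetical catalog table.
queries = compile_partition_queries("catalog_a", "dec", -90, 90, 4)
for q in queries:
    print(q)
```

Each generated query could then be sent to a different server holding a mirror of the same data, which is one simple way to read the parallel execution described above.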