RDDShare: Reusing Results of Spark RDD

Bibliographic details
Title: RDDShare: Reusing Results of Spark RDD
Authors: Yan Zhou, Tang Jian-Chao, Yang Shu-qiang, Huang Chao-Qiang
Source: DSC
Publication info: IEEE, 2016.
Publication year: 2016
Subject terms: SQL, Database, Computer science, business.industry, Semantics (computer science), Computation, Big data, 02 engineering and technology, Reuse, computer.software_genre, Hotspot (Wi-Fi), 020204 information systems, Spark (mathematics), 0202 electrical engineering, electronic engineering, information engineering, 020201 artificial intelligence & image processing, Cache, business, computer, computer.programming_language
Description: In recent years, Spark has become a hotspot for big data processing. For a single user, Spark provides the cache method to share results between the jobs in a single application. When Spark is accessed concurrently by multiple users, the submitted applications may contain identical computations; however, Spark does not provide a method to share computing results between applications. In traditional databases, one way to optimize query performance is to cache part or all of the results of a query and share them with other requests. Based on this, we propose the RDDShare system, built on Spark SQL, to manage the cache and reuse the results. Finally, the results of simulation experiments show that the RDDShare system can significantly improve the query performance of Spark SQL.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::0a20c309215eec3fb4fe93daef8d593d
https://doi.org/10.1109/dsc.2016.80
Accession number: edsair.doi...........0a20c309215eec3fb4fe93daef8d593d
Database: OpenAIRE
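
Illustrative note: the description refers to caching the results of a query so that other requests can reuse them. The record does not describe how RDDShare implements this, so the following is only a minimal Python sketch of the general idea, not the authors' design. The class `SharedResultCache`, the helper `normalize`, and the textual matching rule are all hypothetical, and no Spark API is used.

```python
import hashlib
from typing import Any, Callable, Dict


def normalize(query: str) -> str:
    """Collapse runs of whitespace (hypothetical, crude matching rule).

    This ignores string literals and semantic equivalence; a real system
    would more plausibly compare query plans than raw text.
    """
    return " ".join(query.split())


class SharedResultCache:
    """Toy in-memory result cache shared by several callers (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, query: str, compute: Callable[[], Any]) -> Any:
        # Key the cache on a hash of the normalized query text.
        key = hashlib.sha256(normalize(query).encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        # First request for this query: run it and keep the result.
        self.misses += 1
        result = compute()
        self._store[key] = result
        return result


# Two "applications" submit the same query with different spacing.
cache = SharedResultCache()
executions = []


def run_query() -> list:
    executions.append(1)  # record that the expensive work actually ran
    return [("a", 1), ("b", 2)]


first = cache.get_or_compute("SELECT k, COUNT(*) FROM t GROUP BY k", run_query)
second = cache.get_or_compute("SELECT k,  COUNT(*)   FROM t GROUP BY k", run_query)
```

In this sketch the second request is served from the cache, so `run_query` executes only once. Questions such as eviction, invalidation when the underlying data changes, and partial result reuse are left out.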