Bilevel Relations and Their Applications to Data Insights

Bibliographic Details
Title: Bilevel Relations and Their Applications to Data Insights
Authors: Wu, Xi; Yu, Xiangyao; Deep, Shaleen; Mahmood, Ahmed; Jang, Uyeong; Viglas, Stratis; Jha, Somesh; Cieslewicz, John; Naughton, Jeffrey F.
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Databases; Computer Science - Distributed, Parallel, and Cluster Computing; Computer Science - Programming Languages
Description: Many data-insight analytic tasks in anomaly detection, metric attribution, and experimentation analysis can be modeled as searching a large space of tables and finding the important ones, where the notion of importance is defined in some ad hoc manner. While various frameworks have been proposed (e.g., DIFF, VLDB 2019), a systematic and general treatment is lacking. This paper describes bilevel relations and operators. While a relation (i.e., table) models a set of tuples, a bilevel relation is a dictionary that explicitly models a set of tables, where each "value" table is identified by a "key" that is a (region, features) pair: the region specifies key attributes of the table, and the features specify columns of the table. Bilevel relational operators are BilevelRelation-to-BilevelRelation transformations and directly analyze a set of tables. Bilevel relations and operators provide higher-level abstractions for creating and manipulating a set of tables, and are compatible with the classic relational algebra. Together, they allow us to construct bilevel queries, which can succinctly express a range of insight-analytical questions with a "search+eval" character. We have implemented and deployed a query engine for bilevel queries as a service, which is a first of its kind. Evaluating bilevel queries efficiently poses a rich algorithm and system design space, including query optimization and data format. We describe our current designs and lessons, and report empirical evaluations. Bilevel queries have found many useful applications, and have attracted more than 30 internal teams to build data-insight applications with them.
Comment: Some overlap on examples and experiments with arXiv:2302.00120. The latter draft will be revised to focus on implementation
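Illustrative sketch (not from the paper): the description above characterizes a bilevel relation as a dictionary from (region, features) keys to value tables, with operators mapping bilevel relations to bilevel relations. The Python below is one possible reading of that idea under stated assumptions. The names `make_bilevel`, `evaluate`, and `top_k`, the encoding of a region as a set of (attribute, value) pairs, and the sample rows are all hypothetical and do not reflect the authors' engine or API.

```python
from itertools import combinations

def make_bilevel(rows, dims, features):
    """Hypothetical constructor: build one value table per region.

    Assumption: a region is a frozenset of (attribute, value) pairs over
    any subset of `dims`; the value table keeps only the `features` columns.
    """
    brel = {}
    for row in rows:
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                region = frozenset((d, row[d]) for d in subset)
                projected = {f: row[f] for f in features}
                brel.setdefault((region, features), []).append(projected)
    return brel

def evaluate(brel, score):
    """Hypothetical "eval" operator: bilevel relation in, bilevel relation out.

    Each value table is replaced by a one-row table holding its score.
    """
    return {
        (region, ("score",)): [{"score": score(table)}]
        for (region, _features), table in brel.items()
    }

def top_k(brel, k):
    """Hypothetical "search" operator: keep the k highest-scoring entries."""
    ranked = sorted(brel.items(), key=lambda kv: kv[1][0]["score"], reverse=True)
    return dict(ranked[:k])

# Made-up sample data, for illustration only.
rows = [
    {"country": "US", "device": "web", "revenue": 10},
    {"country": "US", "device": "app", "revenue": 5},
    {"country": "DE", "device": "web", "revenue": 3},
]

brel = make_bilevel(rows, dims=("country", "device"), features=("revenue",))
scored = evaluate(brel, lambda table: sum(r["revenue"] for r in table))
best = top_k(scored, 2)

for (region, _), table in best.items():
    print(dict(region), table[0]["score"])
```

In this sketch the "search" part enumerates candidate regions and the "eval" part scores each region's table, which mirrors the "search+eval" character mentioned in the description only loosely; the deployed system's operators and optimizations are not specified in this record.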
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2311.04824
Accession Number: edsarx.2311.04824
Database: arXiv