Report
Balancing the Tradeoff Between Clustering Value and Interpretability
Title: | Balancing the Tradeoff Between Clustering Value and Interpretability |
---|---|
Authors: | Saisubramanian, Sandhya; Galhotra, Sainyam; Zilberstein, Shlomo |
Publication year: | 2019 |
Collection: | Computer Science Statistics |
Subject terms: | Statistics - Machine Learning, Computer Science - Data Structures and Algorithms, Computer Science - Machine Learning |
Description: | Graph clustering groups entities (the vertices of a graph) based on their similarity, typically using a complex distance function over a large number of features. Successful integration of clustering approaches in automated decision-support systems hinges on the interpretability of the resulting clusters. This paper addresses the problem of generating interpretable clusters, given features of interest that signify interpretability to an end-user, by optimizing interpretability in addition to common clustering objectives. We propose a $\beta$-interpretable clustering algorithm that ensures that at least a $\beta$ fraction of the nodes in each cluster share the same feature value. The tunable parameter $\beta$ is user-specified. We also present a more efficient algorithm for scenarios with $\beta = 1$ and analyze the theoretical guarantees of the two algorithms. Finally, we empirically demonstrate the benefits of our approaches in generating interpretable clusters using four real-world datasets. The interpretability of the clusters is complemented by generating simple explanations denoting the feature values of the nodes in the clusters, using frequent pattern mining. Comment: Accepted at AIES 2020 |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/1912.07820 |
Accession number: | edsarx.1912.07820 |
Database: | arXiv |
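
The $\beta$-interpretability condition stated in the description (at least a $\beta$ fraction of the nodes in each cluster share the same value of a feature of interest) can be illustrated with a small check. This is a minimal sketch of the condition only, not the authors' clustering algorithm; the function names `cluster_interpretability` and `is_beta_interpretable`, the single-feature setup, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def cluster_interpretability(feature_values):
    """Return the fraction of nodes in one cluster that share the
    cluster's most common value of the feature of interest."""
    if not feature_values:
        raise ValueError("cluster must be non-empty")
    # Count of the single most frequent feature value in the cluster.
    most_common_count = Counter(feature_values).most_common(1)[0][1]
    return most_common_count / len(feature_values)


def is_beta_interpretable(clusters, beta):
    """Check that every cluster has at least a beta fraction of nodes
    sharing one feature value (hypothetical helper, for illustration)."""
    return all(cluster_interpretability(c) >= beta for c in clusters)


# Toy clustering: each inner list holds the feature value of each node.
clusters = [
    ["red", "red", "red", "blue"],  # 3 of 4 nodes share "red"
    ["green", "green"],             # all nodes share "green"
]

print(is_beta_interpretable(clusters, 0.75))
print(is_beta_interpretable(clusters, 1.0))
```

With this toy data the first cluster limits the clustering to $\beta \le 0.75$, so the stricter $\beta = 1$ case mentioned in the description would require every node in a cluster to carry the same feature value.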