Report
A parallelizable model-based approach for marginal and multivariate clustering
Title: | A parallelizable model-based approach for marginal and multivariate clustering |
---|---|
Authors: | de Carvalho, Miguel; Venturini, Gabriel Martos; Svetlošák, Andrej |
Publication Year: | 2022 |
Collection: | Computer Science; Statistics |
Subject Terms: | Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Methodology |
Description: | This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering tends to produce the same number of clusters in every margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin, which allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy-game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins, leaving the joint distribution unspecified, it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice. |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2212.04009 |
Accession Number: | edsarx.2212.04009 |
Database: | arXiv |
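The description mentions fitting a separate finite mixture to each margin, so that margins may have different numbers of clusters and can be processed in parallel. The sketch below illustrates only that marginal step, under assumptions that are not taken from the paper: one-dimensional Gaussian mixtures fitted by EM, the number of components chosen by BIC, and a thread pool standing in for real parallel workers. The function names (`fit_gmm_1d`, `select_k`, `clusters_per_margin`, `normal_grid`) and the toy data are hypothetical, and the Reign-and-Conquer step that combines the margins is not reproduced.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from statistics import NormalDist


def normal_pdf(v, mean, sd):
    z = (v - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(x, k, iters=100):
    """Fit a k-component 1D Gaussian mixture by EM (assumes non-constant data)."""
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    grand_sd = math.sqrt(sum((v - grand_mean) ** 2 for v in x) / n)
    min_sd = 0.05 * grand_sd  # floor that guards against degenerate spikes
    weights = [1.0 / k] * k
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # spaced quantiles
    sds = [grand_sd / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for v in x:
            dens = [w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
            norm = max(sum(dens), 1e-300)
            resp.append([d / norm for d in dens])
        # M-step: update weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            sds[j] = max(math.sqrt(var), min_sd)
    return weights, means, sds


def mixture_loglik(x, weights, means, sds):
    total = 0.0
    for v in x:
        dens = sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds))
        total += math.log(max(dens, 1e-300))
    return total


def select_k(x, max_k=3):
    """Pick the number of components for one margin by minimum BIC."""
    best_bic, best_k = None, None
    for k in range(1, max_k + 1):
        loglik = mixture_loglik(x, *fit_gmm_1d(x, k))
        bic = -2.0 * loglik + (3 * k - 1) * math.log(len(x))
        if best_bic is None or bic < best_bic:
            best_bic, best_k = bic, k
    return best_k


def clusters_per_margin(margins):
    """Each margin is fitted independently, so the work maps over a pool."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(select_k, margins))


def normal_grid(mean, sd, n):
    """Deterministic toy data: evenly spaced quantiles of a normal distribution."""
    dist = NormalDist(mean, sd)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Toy data: the first margin has two well-separated groups, the second has one.
margin_a = normal_grid(0.0, 1.0, 100) + normal_grid(10.0, 1.0, 100)
margin_b = normal_grid(5.0, 1.0, 200)
ks = clusters_per_margin([margin_a, margin_b])
print(ks)
```

Threads are used here only to keep the sketch self-contained; for CPU-bound fits in pure Python, a process pool or a numerical library would be the more realistic choice for actual speedups.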