A parallelizable model-based approach for marginal and multivariate clustering

Bibliographic details
Title: A parallelizable model-based approach for marginal and multivariate clustering
Authors: de Carvalho, Miguel; Venturini, Gabriel Martos; Svetlošák, Andrej
Publication year: 2022
Collection: Computer Science; Statistics
Subject terms: Statistics - Machine Learning, Computer Science - Machine Learning, Statistics - Methodology
Description: This paper develops a clustering method that takes advantage of the sturdiness of model-based clustering, while attempting to mitigate some of its pitfalls. First, we note that standard model-based clustering likely leads to the same number of clusters per margin, which seems a rather artificial assumption for a variety of datasets. We tackle this issue by specifying a finite mixture model per margin that allows each margin to have a different number of clusters, and then cluster the multivariate data using a strategy game-inspired algorithm that we call Reign-and-Conquer. Second, since the proposed clustering approach only specifies a model for the margins, leaving the joint unspecified, it has the advantage of being partially parallelizable; hence, the proposed approach is computationally appealing as well as more tractable for moderate to high dimensions than a 'full' (joint) model-based clustering approach. A battery of numerical experiments on artificial data indicates an overall good performance of the proposed methods in a variety of scenarios, and real datasets are used to showcase their application in practice.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2212.04009
Accession number: edsarx.2212.04009
Database: arXiv
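
Illustrative sketch (not taken from the paper): the description states that a separate finite mixture model is specified for each margin, that margins may have different numbers of clusters, and that this marginal step can be run in parallel. The Python sketch below illustrates only that idea, under several assumptions that are not in the record: univariate Gaussian components, fitting by a basic EM routine, and choosing the number of components per margin by BIC. The function names (`fit_mixture_1d`, `select_margin_model`, `marginal_labels`) and the synthetic data are hypothetical. The Reign-and-Conquer step that forms the multivariate clusters is not described in the record and is not attempted here.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, mus, sigmas):
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))


def fit_mixture_1d(xs, k, n_iter=60):
    """Basic EM for a univariate Gaussian mixture with k components (assumed model)."""
    n = len(xs)
    srt = sorted(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    floor = 0.05 * sd + 1e-9  # lower bound on sigma to avoid degenerate components
    # Deterministic start: means at evenly spaced sample quantiles.
    mus = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    sigmas = [max(sd / k, floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), floor)
    loglik = sum(math.log(mixture_pdf(x, weights, mus, sigmas) or 1e-300) for x in xs)
    return weights, mus, sigmas, loglik


def select_margin_model(xs, max_k=4):
    """Pick the number of components for one margin by BIC (an assumption)."""
    best = None
    for k in range(1, max_k + 1):
        weights, mus, sigmas, loglik = fit_mixture_1d(xs, k)
        bic = -2.0 * loglik + (3 * k - 1) * math.log(len(xs))
        if best is None or bic < best[0]:
            best = (bic, k, weights, mus, sigmas)
    return best[1:]


def marginal_labels(xs, weights, mus, sigmas):
    """Assign each value to its most probable component within the margin."""
    return [
        max(range(len(mus)), key=lambda j: weights[j] * normal_pdf(x, mus[j], sigmas[j]))
        for x in xs
    ]


# Synthetic data: margin 1 is drawn from two groups, margin 2 from three.
rng = random.Random(0)
n = 300
margin_1 = [rng.gauss((-5.0, 5.0)[i % 2], 1.0) for i in range(n)]
margin_2 = [rng.gauss((-8.0, 0.0, 8.0)[i % 3], 1.0) for i in range(n)]
margins = [margin_1, margin_2]

# Each margin is fitted independently, so the fits can be dispatched concurrently.
with ThreadPoolExecutor() as pool:
    fits = list(pool.map(select_margin_model, margins))

ks = [k for k, _, _, _ in fits]
labels = [marginal_labels(xs, *fit[1:]) for xs, fit in zip(margins, fits)]
print("clusters selected per margin:", ks)
```

A thread pool is used here only to show that the per-margin fits do not depend on one another; for CPU-bound pure-Python code a process pool or a compiled numerical library would be needed to obtain an actual speed-up.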