تقرير
Clustering a Mixture of Gaussians with Unknown Covariance
العنوان: | Clustering a Mixture of Gaussians with Unknown Covariance |
---|---|
المؤلفون: | Davis, Damek, Díaz, Mateo, Wang, Kaizheng |
سنة النشر: | 2021 |
المجموعة: | Computer Science Mathematics Statistics |
مصطلحات موضوعية: | Statistics - Machine Learning, Computer Science - Information Theory, Computer Science - Machine Learning, Mathematics - Optimization and Control, Mathematics - Statistics Theory, 62H30, 62H12, 62H05 |
الوصف: | We investigate a clustering problem with data from a mixture of Gaussians that share a common but unknown, and potentially ill-conditioned, covariance matrix. We start by considering Gaussian mixtures with two equally-sized components and derive a Max-Cut integer program based on maximum likelihood estimation. We prove its solutions achieve the optimal misclassification rate when the number of samples grows linearly in the dimension, up to a logarithmic factor. However, solving the Max-cut problem appears to be computationally intractable. To overcome this, we develop an efficient spectral algorithm that attains the optimal rate but requires a quadratic sample size. Although this sample complexity is worse than that of the Max-cut problem, we conjecture that no polynomial-time method can perform better. Furthermore, we gather numerical and theoretical evidence that supports the existence of a statistical-computational gap. Finally, we generalize the Max-Cut program to a $k$-means program that handles multi-component mixtures with possibly unequal weights. It enjoys similar optimality guarantees for mixtures of distributions that satisfy a transportation-cost inequality, encompassing Gaussian and strongly log-concave distributions. Comment: 89 pages |
نوع الوثيقة: | Working Paper |
URL الوصول: | http://arxiv.org/abs/2110.01602 |
رقم الأكسشن: | edsarx.2110.01602 |
قاعدة البيانات: | arXiv |
الوصف غير متاح. |