Measure Adaptations and Rank Aggregation for the Selection of Clustering Methods and Sizes For Single Cell Data

Bibliographic Details
Title: Measure Adaptations and Rank Aggregation for the Selection of Clustering Methods and Sizes For Single Cell Data
Authors: Visser, Owen; Datta, Somnath
Publication Year: 2024
Collection: Quantitative Biology
Subject Terms: Quantitative Biology - Quantitative Methods, G.3
Description: The growing efficiency of single-cell sequencing technology has provided biologists with ample cells to identify and differentiate, often through clustering. Heuristic approaches to choosing a clustering method have become more prevalent and could lead to inaccurate reports if statistical evaluation of the resulting clusters is omitted. During the advent of microarray data, a similar dilemma was addressed in the literature through the provision of supervised and unsupervised measures, which were evaluated through rank aggregation. In this paper, these measures are adapted to the single-cell framework through a leave-one-out approach. Additionally, a scheme was created to utilize the information of cluster sizes by using their ranking to assign importance to the aggregation of methods, resulting in one table of methods ranked by cluster sizes. To demonstrate the ensemble of measures and scheme, five benchmark single-cell datasets were clustered with various methods at appropriate cluster sizes. We show that through rank aggregation and our importance scheme, our adapted measures select clustering methods that perform better at cluster sizes associated with true biological groups compared to those selected through traditional measures. For four of the five datasets and with internal measures alone, the rank aggregation scheme could correctly identify methods that performed the best at cluster sizes that match the original biological groups. We plan to package this ensemble of measures in the hope of providing others with a tool to identify the best-performing clustering methods and associated sizes for a variety of single-cell datasets.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2407.03467
Accession Number: edsarx.2407.03467
Database: arXiv
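
Illustrative note: the abstract describes combining per-cluster-size rankings of clustering methods into a single ranking, with more important cluster sizes given more weight. The record does not specify the aggregation algorithm, so the sketch below substitutes a simple weighted Borda count as a stand-in and is not the authors' procedure. The function name `weighted_borda`, the method names, the cluster sizes, and the weights are all hypothetical values chosen for illustration.

```python
from collections import defaultdict


def weighted_borda(rankings, weights=None):
    """Combine ranked lists (best first) into one consensus ranking.

    rankings: dict mapping a key (here, a cluster size) to a list of
              method names ordered from best to worst.
    weights:  dict mapping the same keys to a non-negative importance;
              defaults to equal weight for every list.
    NOTE: weighted Borda count is an assumed stand-in, not the paper's method.
    """
    if weights is None:
        weights = {key: 1.0 for key in rankings}
    scores = defaultdict(float)
    for key, ranked in rankings.items():
        n = len(ranked)
        for position, method in enumerate(ranked):
            # Best method in a list earns n points, the worst earns 1 point,
            # scaled by the importance of that list.
            scores[method] += weights[key] * (n - position)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical per-size rankings of three methods (best first).
rankings_by_k = {
    4: ["kmeans", "hierarchical", "pam"],
    5: ["hierarchical", "kmeans", "pam"],
    6: ["hierarchical", "pam", "kmeans"],
}
# Hypothetical importance weights: cluster size 4 is judged most important.
size_weights = {4: 5.0, 5: 1.0, 6: 1.0}

unweighted = weighted_borda(rankings_by_k)
weighted = weighted_borda(rankings_by_k, size_weights)
print("unweighted:", unweighted)
print("weighted:  ", weighted)
```

With equal weights the consensus favors the method that wins at most sizes; once size 4 is weighted heavily, the method that ranks best at that size moves to the top. This shows, under the stated assumptions, how an importance scheme over cluster sizes can change which method is selected.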