Academic Journal

Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm

Bibliographic Details
Title: Single-Cell Transcriptome Data Clustering via Multinomial Modeling and Adaptive Fuzzy K-Means Algorithm
Authors: Liang Chen, Weinan Wang, Yuyao Zhai, Minghua Deng
Source: Frontiers in Genetics, Vol 11 (2020)
Publisher information: Frontiers Media S.A., 2020.
Publication year: 2020
Collection: LCC:Genetics
Subject terms: single-cell RNA sequencing, UMI count data, deep autoencoder, statistical modeling, adaptive fuzzy k-means clustering, Genetics, QH426-470
Description: Single-cell RNA sequencing technologies have enabled us to study tissue heterogeneity at cellular resolution. Fast-developing sequencing platforms such as droplet-based sequencing make it feasible to process thousands of single cells in parallel. Although unique molecular identifiers (UMIs) can reduce amplification bias to a certain extent, clustering such sparse, high-dimensional, large-scale discrete data remains intractable and challenging. Most existing deep learning-based clustering methods use the mean squared error or a negative binomial distribution, with or without zero inflation, to denoise single-cell UMI count data, which may underfit or overfit the gene expression profiles. In addition, neglecting the molecule sampling mechanism and extracting representations by simple linear dimension reduction followed by a hard clustering algorithm may distort the data structure and lead to spurious analytical results. In this paper, we combined the deep autoencoder technique with statistical modeling and developed a novel and effective clustering method, scDMFK, for single-cell transcriptome UMI count data. scDMFK uses a multinomial distribution to characterize the data structure and draws on a neural network to facilitate model parameter estimation. In the learned low-dimensional latent space, we proposed an adaptive fuzzy k-means algorithm with entropy regularization to perform soft clustering. Various simulation scenarios and the analysis of 10 real datasets have shown that scDMFK outperforms other state-of-the-art methods in both data modeling and clustering. In addition, scDMFK scales well to large single-cell datasets.
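The abstract names two technical ingredients: a multinomial likelihood for UMI counts and an entropy-regularized fuzzy k-means step in a latent space. The Python/NumPy sketch below illustrates both in their generic form; it is not the scDMFK implementation. The function names `multinomial_loglik` and `entropy_fuzzy_kmeans`, the fixed regularization weight `lam`, and the farthest-point initialization are illustrative assumptions, and the embeddings `z` are taken as given rather than learned jointly with an autoencoder (so the "adaptive" part of the authors' algorithm is not reproduced). Minimizing sum_ik u_ik ||z_i - c_k||^2 + lam * sum_ik u_ik log u_ik subject to sum_k u_ik = 1 gives memberships u_ik proportional to exp(-||z_i - c_k||^2 / lam), and for fixed memberships each center is the membership-weighted mean of the points.

```python
import numpy as np


def multinomial_loglik(counts, probs, eps=1e-12):
    """Per-cell multinomial log-likelihood of UMI counts.

    counts: (n_cells, n_genes) non-negative integer UMI counts.
    probs:  (n_cells, n_genes) gene proportions, each row summing to 1
            (assumed to come from some model, e.g. a decoder network).
    The multinomial coefficient is dropped: it does not depend on probs.
    """
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return np.sum(counts * np.log(probs + eps), axis=1)


def _memberships(z, centers, lam):
    # Squared Euclidean distances between points and centers, shape (n, k).
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Closed-form minimizer of the entropy-regularized objective:
    # u_ik proportional to exp(-d2_ik / lam); subtract the row max for stability.
    logits = -d2 / lam
    logits -= logits.max(axis=1, keepdims=True)
    u = np.exp(logits)
    return u / u.sum(axis=1, keepdims=True)


def _farthest_point_init(z, k, rng):
    # Hypothetical initialization: start from a random point, then repeatedly
    # add the point farthest from the centers chosen so far.
    idx = [int(rng.integers(len(z)))]
    for _ in range(1, k):
        d2 = ((z[:, None, :] - z[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return z[idx].copy()


def entropy_fuzzy_kmeans(z, k, lam=1.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy k-means with entropy regularization on embeddings z of shape (n, d).

    Alternately minimizes
        sum_ik u_ik * ||z_i - c_k||^2 + lam * sum_ik u_ik * log(u_ik)
    over memberships u (rows sum to 1) and centers c.
    Returns (memberships of shape (n, k), centers of shape (k, d)).
    """
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    centers = _farthest_point_init(z, k, rng)
    for _ in range(n_iter):
        u = _memberships(z, centers, lam)
        # Center update: membership-weighted mean (guarded against zero weight).
        weights = np.maximum(u.sum(axis=0), 1e-12)
        new_centers = (u.T @ z) / weights[:, None]
        shift = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if shift < tol:
            break
    return _memberships(z, centers, lam), centers
```

As `lam` approaches zero the memberships approach hard k-means assignments; a larger `lam` spreads each point's membership across clusters, which is the soft-clustering behaviour the abstract refers to. Taking `u.argmax(axis=1)` recovers hard labels when they are needed.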
Document type: article
File description: electronic resource
Language: English
ISSN: 1664-8021
Relation: https://www.frontiersin.org/article/10.3389/fgene.2020.00295/full; https://doaj.org/toc/1664-8021
DOI: 10.3389/fgene.2020.00295
Access URL: https://doaj.org/article/1c61d0a5fe974944931cf010514ffac3
Accession number: edsdoj.1c61d0a5fe974944931cf010514ffac3
Database: Directory of Open Access Journals