Academic Journal

DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data.

Bibliographic Details
Title: DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data.
Authors: Xiong G; Department of Computer Science, University of Virginia, Charlottesville, VA, 22904, United States., LeRoy NJ; Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22904, United States., Bekiranov S; Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, 22908, United States., Sheffield NC; Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22904, United States., Zhang A; Department of Computer Science, University of Virginia, Charlottesville, VA, 22904, United States.
Source: Bioinformatics (Oxford, England) [Bioinformatics] 2024 Jul 01; Vol. 40 (7).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Oxford University Press; Country of Publication: England; NLM ID: 9808944; Publication Model: Print; Cited Medium: Internet; ISSN: 1367-4811 (Electronic); Linking ISSN: 13674803; NLM ISO Abbreviation: Bioinformatics; Subsets: MEDLINE
Imprint Name(s): Original Publication: Oxford : Oxford University Press, c1998-
MeSH Terms: Single-Cell Analysis*/methods; Transcriptome*/genetics; Deep Learning*; Humans; Gene Expression Profiling/methods; Sequence Analysis, RNA/methods; Computational Biology/methods; Neural Networks, Computer; Software
Abstract: Motivation: Gene set enrichment (GSE) analysis allows for an interpretation of gene expression through pre-defined gene set databases and is a critical step in understanding different phenotypes. With the rapid development of single-cell RNA sequencing (scRNA-seq) technology, GSE analysis can be performed on fine-grained gene expression data to gain a nuanced understanding of phenotypes of interest. However, with the cellular heterogeneity in single-cell gene profiles, current statistical GSE analysis methods sometimes fail to identify enriched gene sets. Meanwhile, deep learning has gained traction in applications like clustering and trajectory inference in single-cell studies due to its prowess in capturing complex data patterns. However, its use in GSE analysis remains limited, due to interpretability challenges.
Results: In this paper, we present DeepGSEA, an explainable deep gene set enrichment analysis approach which leverages the expressiveness of interpretable, prototype-based neural networks to provide an in-depth analysis of GSE. DeepGSEA learns the ability to capture GSE information through our designed classification tasks, and significance tests can be performed on each gene set, enabling the identification of enriched sets. The underlying distribution of a gene set learned by DeepGSEA can be explicitly visualized using the encoded cell and cellular prototype embeddings. We demonstrate the performance of DeepGSEA over commonly used GSE analysis methods by examining their sensitivity and specificity with four simulation studies. In addition, we test our model on three real scRNA-seq datasets and illustrate the interpretability of DeepGSEA by showing how its results can be explained.
Availability and Implementation: https://github.com/Teddy-XiongGZ/DeepGSEA.
(© The Author(s) 2024. Published by Oxford University Press.)
Grant Information: R01 HG012558 United States HG NHGRI NIH HHS; 2313865 National Science Foundation; 1R01LM014012 United States NH NIH HHS
Entry Dates: Date Created: 20240701; Date Completed: 20240710; Latest Revision: 20240712
Update Code: 20240712
PubMed Central ID: PMC11236288
DOI: 10.1093/bioinformatics/btae434
PMID: 38950178
Database: MEDLINE