Academic Journal

SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets.

Bibliographic Details
Title: SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets.
Authors: Ginell GM; Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States.; Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States., Flynn AJ; Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States.; Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States., Holehouse AS; Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States.; Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States.
Source: Bioinformatics (Oxford, England) [Bioinformatics] 2023 Aug 01; Vol. 39 (8).
Publication Type: Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S.
Language: English
Journal Info: Publisher: Oxford University Press Country of Publication: England NLM ID: 9808944 Publication Model: Print Cited Medium: Internet ISSN: 1367-4811 (Electronic) Linking ISSN: 13674803 NLM ISO Abbreviation: Bioinformatics Subsets: MEDLINE
Imprint Name(s): Original Publication: Oxford : Oxford University Press, c1998-
MeSH Terms: Proteomics*; Proteome*; Software; Computational Biology; Molecular Sequence Annotation
Abstract: Motivation: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics.
Results: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology.
Availability and Implementation: We provided SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard), and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).
(© The Author(s) 2023. Published by Oxford University Press.)
Substance Nomenclature: 0 (Proteome)
Entry Date(s): Date Created: 20230804 Date Completed: 20230814 Latest Revision: 20230815
Update Code: 20231215
PubMed Central ID: PMC10423030
DOI: 10.1093/bioinformatics/btad488
PMID: 37540173
Database: MEDLINE
Description
ISSN: 1367-4811
DOI:10.1093/bioinformatics/btad488