Academic Journal

Perspectives on automated composition of workflows in the life sciences.

Bibliographic Details
Title: Perspectives on automated composition of workflows in the life sciences.
Authors: Lamprecht AL; Utrecht University, 3584 CS Utrecht, The Netherlands., Palmblad M; Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands., Ison J; French Institute of Bioinformatics, 91057 Évry, France., Schwämmle V; University of Southern Denmark, 5230 Odense M, Denmark., Al Manir MS; University of Virginia, Charlottesville, VA, 22903, USA., Altintas I; University of California San Diego, La Jolla, CA, 92093, USA., Baker CJO; University of New Brunswick, Saint John, E2L 4L5, Canada.; IPSNP Computing Inc., Saint John, E2L 4S6, Canada., Ben Hadj Amor A; Westerdijk Institute, 3584 CT, Utrecht, The Netherlands., Capella-Gutierrez S; Barcelona Supercomputing Center (BSC), 08034, Barcelona, Spain., Charonyktakis P; Gnosis Data Analysis PC, GR-700 13 Heraklion, Greece., Crusoe MR; VU Amsterdam, 1081 HV Amsterdam, The Netherlands., Gil Y; University of Southern California, Marina Del Rey, CA, 90292, USA., Goble C; Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK., Griffin TJ; Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA., Groth P; University of Amsterdam, 1090 GH Amsterdam, The Netherlands., Ienasescu H; Technical University of Denmark, 2800 Kongens Lyngby, Denmark., Jagtap P; Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota, Minneapolis, MN, 55455, USA., Kalaš M; University of Bergen, 5020 Bergen, Norway., Kasalica V; Utrecht University, 3584 CS Utrecht, The Netherlands., Khanteymoori A; Bioinformatics Group, University of Freiburg, 79110 Freiburg, Germany., Kuhn T; VU Amsterdam, 1081 HV Amsterdam, The Netherlands., Mei H; Sequencing Analysis Support Core, Leiden University Medical Center, 2333 ZC Leiden, The Netherlands., Ménager H; Institut Pasteur, 75015 Paris, France., Möller S; IBIMA, Rostock University Medical Center, 18057 Rostock, Germany., Richardson RA; Netherlands eScience Center, 1098 XG Amsterdam, The Netherlands., Robert V; Westerdijk Institute, 3584 CT, Utrecht, The Netherlands., Soiland-Reyes S; Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK.; Informatics Institute, University of Amsterdam, 1090 GH Amsterdam, The Netherlands., Stevens R; Department of Computer Science, The University of Manchester, Manchester, M13 9PL, UK., Szaniszlo S; Westerdijk Institute, 3584 CT, Utrecht, The Netherlands., Verberne S; Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands., Verhoeven A; Leiden University Medical Center, 2333 ZA, Leiden, The Netherlands., Wolstencroft K; Leiden Institute of Advanced Computer Science, Leiden University, 2333 BE Leiden, The Netherlands.
Source: F1000Research [F1000Res] 2021 Sep 07; Vol. 10, pp. 897. Date of Electronic Publication: 2021 Sep 07 (Print Publication: 2021).
Publication Type: Journal Article; Research Support, Non-U.S. Gov't
Language: English
Journal Info: Publisher: F1000 Research Ltd Country of Publication: England NLM ID: 101594320 Publication Model: eCollection Cited Medium: Internet ISSN: 2046-1402 (Electronic) Linking ISSN: 20461402 NLM ISO Abbreviation: F1000Res Subsets: MEDLINE
Imprint Name(s): Original Publication: London : F1000 Research Ltd
MeSH Terms: Biological Science Disciplines*; Computational Biology*; Benchmarking; Software; Workflow
Abstract: Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the "big picture" of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation.
Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
Competing Interests: No competing interests were disclosed.
(Copyright: © 2021 Lamprecht AL et al.)
Contributed Indexing: Keywords: automated workflow composition; bioinformatics; computational pipelines; life sciences; scientific workflows; semantic domain modelling; workflow benchmarking
Entry Dates: Date Created: 20211122 Date Completed: 20211206 Latest Revision: 20240404
Update Code: 20240404
PubMed Central ID: PMC8573700
DOI: 10.12688/f1000research.54159.1
PMID: 34804501
Database: MEDLINE