Academic Journal

Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.

Bibliographic Details
Title: Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES): a method for populating knowledge bases using zero-shot learning.
Authors: Caufield JH; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States., Hegde H; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States., Emonet V; Institute of Data Science, Faculty of Science and Engineering, Maastricht University, 6200 MD Maastricht, The Netherlands., Harris NL; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States., Joachimiak MP; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States., Matentzoglu N; Semanticly, Athens, Greece., Kim H; Robert Bosch LLC, Sunnyvale, CA 94085, United States., Moxon S; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States., Reese JT; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States., Haendel MA; Department of Biomedical Informatics, University of Colorado, Anschutz Medical Campus, Aurora, CO 80217, United States., Robinson PN; Berlin Institute of Health at Charité, 10178 Berlin, Germany., Mungall CJ; Biosystems Data Science, Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, United States.
Source: Bioinformatics (Oxford, England) [Bioinformatics] 2024 Mar 04; Vol. 40 (3).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Oxford University Press Country of Publication: England NLM ID: 9808944 Publication Model: Print Cited Medium: Internet ISSN: 1367-4811 (Electronic) Linking ISSN: 13674803 NLM ISO Abbreviation: Bioinformatics Subsets: MEDLINE
Imprint Name(s): Original Publication: Oxford : Oxford University Press, c1998-
MeSH Terms: Semantics* , Knowledge Bases*, Databases, Factual
Abstract: Motivation: Creating knowledge bases and ontologies is a time-consuming task that relies on manual curation. AI/NLP approaches can assist expert curators in populating these knowledge bases, but current approaches rely on extensive training data, and are not able to populate arbitrarily complex nested knowledge schemas.
Results: Here we present Structured Prompt Interrogation and Recursive Extraction of Semantics (SPIRES), a Knowledge Extraction approach that relies on the ability of Large Language Models (LLMs) to perform zero-shot learning and general-purpose query answering from flexible prompts and return information conforming to a specified schema. Given a detailed, user-defined knowledge schema and an input text, SPIRES recursively performs prompt interrogation against an LLM to obtain a set of responses matching the provided schema. SPIRES uses existing ontologies and vocabularies to provide identifiers for matched elements. We present examples of applying SPIRES in different domains, including extraction of food recipes, multi-species cellular signaling pathways, disease treatments, multi-step drug mechanisms, and chemical to disease relationships. Current SPIRES accuracy is comparable to the mid-range of existing Relation Extraction methods, but greatly surpasses an LLM's native capability of grounding entities with unique identifiers. SPIRES has the advantage of easy customization, flexibility, and, crucially, the ability to perform new tasks in the absence of any new training data. This method supports a general strategy of leveraging the language interpreting capabilities of LLMs to assemble knowledge bases, assisting manual knowledge curation and acquisition while supporting validation with publicly-available databases and ontologies external to the LLM.
Availability and Implementation: SPIRES is available as part of the open source OntoGPT package: https://github.com/monarch-initiative/ontogpt.
(© The Author(s) 2024. Published by Oxford University Press.)
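Illustrative sketch (not part of the original record): the abstract describes recursive, schema-driven prompting followed by grounding of extracted spans to identifiers. The Python sketch below shows one way that general idea could look. It is not the OntoGPT implementation. The schema layout (`SCHEMA`), the vocabulary (`VOCABS`) with its `EX:` identifiers, the prompt wording, and the canned `fake_llm` function are all hypothetical stand-ins chosen so the example runs without any external service.

```python
from typing import Callable, Dict

# Hypothetical toy schema: class name -> {field name: spec}.
# A spec may name a nested class ("range"), mark a list ("multivalued"),
# or name a vocabulary used for grounding ("vocab").
SCHEMA: Dict[str, Dict[str, dict]] = {
    "Recipe": {
        "label": {},
        "ingredients": {"range": "Ingredient", "multivalued": True},
    },
    "Ingredient": {
        "food": {"vocab": "FOOD"},
        "amount": {},
    },
}

# Hypothetical vocabulary standing in for an ontology lookup; IDs are made up.
VOCABS = {"FOOD": {"garlic": "EX:0001", "olive oil": "EX:0002"}}


def build_prompt(cls: str, text: str) -> str:
    """Ask for one 'field: value' line per field of the given schema class."""
    lines = []
    for name, spec in SCHEMA[cls].items():
        kind = "semicolon-separated list" if spec.get("multivalued") else "value"
        lines.append(f"{name}: <{kind}>")
    return ("Extract the following fields from the text.\n"
            + "\n".join(lines) + "\n\nText:\n" + text)


def parse_response(cls: str, response: str) -> Dict[str, str]:
    """Parse 'field: value' lines, keeping only fields named in the schema."""
    out = {}
    for line in response.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in SCHEMA[cls]:
            out[key.strip()] = value.strip()
    return out


def ground(value: str, vocab: str) -> str:
    """Map a text span to an identifier; keep the raw text if nothing matches."""
    return VOCABS[vocab].get(value.lower(), value)


def extract(cls: str, text: str, llm: Callable[[str], str]) -> dict:
    """Fill one schema class from text, recursing into nested classes."""
    raw = parse_response(cls, llm(build_prompt(cls, text)))
    result = {}
    for name, spec in SCHEMA[cls].items():
        if name not in raw:
            continue
        if spec.get("multivalued"):
            parts = [p.strip() for p in raw[name].split(";") if p.strip()]
        else:
            parts = [raw[name]]
        values = []
        for part in parts:
            if "range" in spec:      # nested class: prompt again on the sub-span
                values.append(extract(spec["range"], part, llm))
            elif "vocab" in spec:    # grounded field: look up an identifier
                values.append(ground(part, spec["vocab"]))
            else:                    # plain text field
                values.append(part)
        result[name] = values if spec.get("multivalued") else values[0]
    return result


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real LLM call, keyed on the text after 'Text:'."""
    text = prompt.split("Text:\n", 1)[1]
    canned = {
        "Fry 2 cloves garlic in 1 tbsp olive oil.":
            "label: Fried garlic\ningredients: 2 cloves garlic; 1 tbsp olive oil",
        "2 cloves garlic": "food: Garlic\namount: 2 cloves",
        "1 tbsp olive oil": "food: olive oil\namount: 1 tbsp",
    }
    return canned[text]


if __name__ == "__main__":
    print(extract("Recipe", "Fry 2 cloves garlic in 1 tbsp olive oil.", fake_llm))
```

In this sketch the recursion happens in `extract`: a field whose spec names a nested class triggers a fresh prompt on just that sub-span, while grounding is a plain dictionary lookup kept outside the language model, mirroring the abstract's point that identifiers come from existing vocabularies rather than from the LLM itself.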
Grant Information: R24 OD011883 United States OD NIH HHS; RM1 HG010860 United States HG NHGRI NIH HHS; U24 HG011449 United States HG NHGRI NIH HHS; R24 OD011883 United States CD ODCDC CDC HHS
Entry Date(s): Date Created: 20240221 Date Completed: 20240311 Latest Revision: 20240501
Update Code: 20240501
PubMed Central ID: PMC10924283
DOI: 10.1093/bioinformatics/btae104
PMID: 38383067
Database: MEDLINE
Description
ISSN:1367-4811
DOI:10.1093/bioinformatics/btae104