Academic Journal

Paying it forward: Crowdsourcing the harmonisation and linking of taxon names and biodiversity identifiers.

Bibliographic Details
Title: Paying it forward: Crowdsourcing the harmonisation and linking of taxon names and biodiversity identifiers.
Authors: Seah BKB; Thünen Institute for Biodiversity, Braunschweig, Germany.
Source: Biodiversity data journal [Biodivers Data J] 2023 Nov 24; Vol. 11, pp. e114076. Date of Electronic Publication: 2023 Nov 24 (Print Publication: 2023).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Pensoft Publishers; Country of Publication: Bulgaria; NLM ID: 101619899; Publication Model: eCollection; Cited Medium: Print; ISSN: 1314-2828 (Print); Linking ISSN: 13142828; NLM ISO Abbreviation: Biodivers Data J; Subsets: PubMed not MEDLINE
Imprint Name(s): Original Publication: Sofia : Pensoft Publishers, [2013]-
Abstract: Linking records for the same taxa between different databases is an essential step when working with biodiversity data. However, name-matching alone is error-prone, because of issues such as homonyms (unrelated taxa with the same name) and synonyms (same taxon under different names). Therefore, most projects will require some curation to ensure that taxon identifiers are correctly linked. Unfortunately, formal guidance on such curation is uncommon and these steps are often ad hoc and poorly documented, which hinders transparency and reproducibility, yet the task requires specialist knowledge and cannot be easily automated without careful validation. Here, we present a case study on linking identifiers between the GBIF and NCBI taxonomies for a species checklist. This represents a common scenario: finding published sequence data (from NCBI) for species chosen by occurrence or geographical distribution (from GBIF). Wikidata, a publicly editable knowledge base of structured data, can serve as an additional information source for identifier linking. We suggest a software toolkit for taxon name-matching and data-cleaning, describe common issues encountered during curation and propose concrete steps to address them. For example, about 2.8% of the taxa in our dataset had wrong identifiers linked on Wikidata because of errors in name-matching caused by homonyms. By correcting such errors during data-cleaning, either directly (through editing Wikidata) or indirectly (by reporting errors in GBIF or NCBI), we crowdsource the curation and contribute to community resources, thereby improving the quality of downstream analyses.
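The homonym problem described in the abstract can be illustrated with a minimal Python sketch. This is not the toolkit the article proposes; it is an assumption-laden illustration in which the function name `link_by_name`, the record layout and all identifiers (`gbif:A`, `ncbi:1`, and so on) are hypothetical placeholders rather than real GBIF or NCBI IDs. It matches checklist names against a second name table and uses the kingdom as a simple consistency check, so that a genus name such as Morus (used for both mulberries and gannets) is not linked blindly.

```python
from collections import defaultdict

# Hypothetical records: the identifiers below are placeholders for
# illustration only, not real GBIF or NCBI identifiers.
checklist = [
    {"gbif_id": "gbif:A", "name": "Morus", "kingdom": "Plantae"},
    {"gbif_id": "gbif:B", "name": "Quercus robur", "kingdom": "Plantae"},
    {"gbif_id": "gbif:C", "name": "Nonexistus fictus", "kingdom": "Animalia"},
]
ncbi_names = [
    {"taxid": "ncbi:1", "name": "Morus", "kingdom": "Plantae"},
    {"taxid": "ncbi:2", "name": "Morus", "kingdom": "Animalia"},
    {"taxid": "ncbi:3", "name": "Quercus robur", "kingdom": "Plantae"},
]


def link_by_name(checklist, targets):
    """Link taxa by exact (case-insensitive) name and flag doubtful links."""
    # Index target records by normalised name; homonyms share one key.
    by_name = defaultdict(list)
    for rec in targets:
        by_name[rec["name"].strip().lower()].append(rec)

    results = []
    for taxon in checklist:
        candidates = by_name.get(taxon["name"].strip().lower(), [])
        # Keep only candidates whose kingdom agrees with the checklist entry.
        compatible = [c for c in candidates if c["kingdom"] == taxon["kingdom"]]
        if not candidates:
            status, taxid = "unmatched", None
        elif len(compatible) == 1:
            # A unique compatible hit; note when homonyms had to be excluded.
            status = "matched" if len(candidates) == 1 else "homonym_resolved"
            taxid = compatible[0]["taxid"]
        else:
            # Zero or several compatible hits: leave the link to a curator.
            status, taxid = "needs_curation", None
        results.append({"gbif_id": taxon["gbif_id"], "name": taxon["name"],
                        "status": status, "taxid": taxid})
    return results


if __name__ == "__main__":
    for row in link_by_name(checklist, ncbi_names):
        print(row)
```

In this sketch, anything not linked unambiguously is left for manual review rather than guessed, in line with the abstract's point that the task cannot be easily automated without careful validation. Synonyms are not handled here and would need an additional lookup of accepted names.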
Competing Interests: No conflict of interest to declare. Disclaimer: This article is (co-)authored by any of the Editors-in-Chief, Managing Editors or their deputies in this journal.
(Brandon Kwee Boon Seah.)
Contributed Indexing: Keywords: biodiversity informatics; data curation; data integration
Entry Date(s): Date Created: 20240205 Latest Revision: 20240206
Update Code: 20240206
PubMed Central ID: PMC10838036
DOI: 10.3897/BDJ.11.e114076
PMID: 38312332
Database: MEDLINE
Description
ISSN: 1314-2828
DOI:10.3897/BDJ.11.e114076