Academic Journal

Empirical validation of an automated approach to data use oversight.

Bibliographic Details
Title: Empirical validation of an automated approach to data use oversight.
Authors: Cabili MN; Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA., Lawson J; Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA., Saltzman A; Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA., Rushton G; Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA., O'Rourke P; Massachusetts General Brigham, Boston, MA, USA., Wilbanks J; Sage Bionetworks, Seattle, WA, USA., Rodriguez LL; Patient-Centered Outcomes Research Institute (PCORI), Washington, DC, USA., Nyronen T; ELIXIR Finland, CSC - IT Center for Science, Espoo, Finland., Courtot M; European Molecular Biology Laboratory - European Bioinformatics Institute (EMBL-EBI), Hinxton, UK., Donnelly S; Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA., Philippakis AA; Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, MA, USA.
Source: Cell genomics [Cell Genom] 2021 Nov 10; Vol. 1 (2), pp. 100031. Date of Electronic Publication: 2021 Nov 10 (Print Publication: 2021).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Elsevier, Inc Country of Publication: United States NLM ID: 9918284260106676 Publication Model: eCollection Cited Medium: Internet ISSN: 2666-979X (Electronic) Linking ISSN: 2666979X NLM ISO Abbreviation: Cell Genom Subsets: PubMed not MEDLINE
Imprint Name(s): Original Publication: [New York] : Elsevier, Inc., [2021]-
Abstract: The current paradigm for data use oversight of biomedical datasets is onerous, extending the timescale and resources needed to obtain access for secondary analyses, thus hindering scientific discovery. For a researcher to utilize a controlled-access dataset, a data access committee must review her research plans to determine whether they are consistent with the data use limitations (DULs) specified by the informed consent form. The newly created GA4GH data use ontology (DUO) holds the potential to streamline this process by making data use oversight computable. Here, we describe an open-source software platform, the Data Use Oversight System (DUOS), that connects with DUO terminology to enable automated data use oversight. We analyze dbGaP data acquired since 2006, finding an exponential increase in data access requests, which will not be sustainable with current manual oversight review. We perform an empirical evaluation of DUOS and DUO on selected datasets from the Broad Institute's data repository. We were able to structure 118/123 of the evaluated DULs (96%) and 52/52 (100%) of research proposals using DUO terminology, and we find that DUOS' automated data access adjudication in all cases agreed with the DAC manual review. This first empirical evaluation of the feasibility of automated data use oversight demonstrates comparable accuracy to human-based data access oversight in real-world data governance.
Competing Interests: A.A.P. is a venture partner at GV, a venture capital group within Alphabet. In that capacity, he makes investments in companies in both the life sciences and data sciences. He has received funding from Alphabet, Microsoft, Intel, IBM, Rakuten, Bayer, Novartis, and Pfizer. M.N.C. is an employee and equity holder of Foundation Medicine. J.W. is an employee and equity holder of Biogen.
(© 2021.)
References: Nat Genet. 2014 Sep;46(9):934-8. (PMID: 25162809)
Nucleic Acids Res. 2019 Jan 8;47(D1):D955-D962. (PMID: 30407550)
Cell Genom. 2021 Nov 10;1(2):. (PMID: 34820659)
Nat Genet. 2015 Jul;47(7):692-5. (PMID: 26111507)
Cell Genom. 2021 Nov 10;1(2):. (PMID: 34820660)
PLoS Genet. 2016 Jan 21;12(1):e1005772. (PMID: 26796797)
NPJ Genom Med. 2018 Jul 23;3:17. (PMID: 30062047)
Nucleic Acids Res. 2020 Jan 8;48(D1):D704-D715. (PMID: 31701156)
Cell Genom. 2021 Nov 10;1(2):. (PMID: 35072136)
Nat Genet. 2007 Oct;39(10):1181-6. (PMID: 17898773)
Sci Data. 2018 Mar 14;5:180039. (PMID: 29537396)
Science. 2016 Jun 10;352(6291):1278-80. (PMID: 27284183)
F1000Res. 2018 Aug 6;7:. (PMID: 30254736)
Nature. 2021 Feb;590(7845):198-201. (PMID: 33568833)
Contributed Indexing: Keywords: Data Access Committee; Data Use; GA4GH; Passport
Entry Date(s): Date Created: 20230213 Latest Revision: 20230214
Update Code: 20240829
PubMed Central ID: PMC9903839
DOI: 10.1016/j.xgen.2021.100031
PMID: 36778584
Database: MEDLINE
Description
ISSN: 2666-979X
DOI: 10.1016/j.xgen.2021.100031