Academic Journal

Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports.

Bibliographic Details
Title: Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports.
Authors: Casey A; Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom., Davidson E; Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom., Grover C; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom., Tobin R; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom., Grivas A; School of Informatics, University of Edinburgh, Edinburgh, United Kingdom., Zhang H; Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom., Schrempf P; Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom.; School of Computer Science, University of St Andrews, St Andrews, United Kingdom., O'Neil AQ; Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom.; School of Engineering, University of Edinburgh, Edinburgh, United Kingdom., Lee L; Medical School, University of Edinburgh, Edinburgh, United Kingdom., Walsh M; Intensive Care Department, University Hospitals Bristol and Weston, Bristol, United Kingdom., Pellie F; National Horizons Centre, Teesside University, Darlington, United Kingdom.; School of Health and Life Sciences, Teesside University, Middlesbrough, United Kingdom., Ferguson K; Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom., Cvoro V; Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.; Department of Geriatric Medicine, NHS Fife, Fife, United Kingdom., Wu H; Institute of Health Informatics, University College London, London, United Kingdom.; Alan Turing Institute, London, United Kingdom., Whalley H; Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.; Generation Scotland, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom., Mair G; Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.; Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom., Whiteley W; Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.; Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom., Alex B; Edinburgh Futures Institute, University of Edinburgh, Edinburgh, United Kingdom.; School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, United Kingdom.
Source: Frontiers in digital health [Front Digit Health] 2023 Sep 28; Vol. 5, pp. 1184919. Date of Electronic Publication: 2023 Sep 28 (Print Publication: 2023).
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Frontiers Media S.A Country of Publication: Switzerland NLM ID: 101771889 Publication Model: eCollection Cited Medium: Internet ISSN: 2673-253X (Electronic) Linking ISSN: 2673253X NLM ISO Abbreviation: Front Digit Health Subsets: PubMed not MEDLINE
Imprint Name(s): Original Publication: [Lausanne, Switzerland] : Frontiers Media S.A., [2019]-
Abstract: Background: Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications.
Methods: We used the F1 score, precision, and recall to compare NLP tools on two cohorts: a cohort from a study on delirium using images and radiology reports from NHS Fife, and a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (namely, EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported on their performance for three cerebrovascular phenotypes, namely, ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques established during the development of EdIE-R, in conjunction with an expert researcher who read underlying images.
Results: EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For SVD, EdIE-R achieved F1 scores of ≥98% and ALARM+ ≥90%. ESPRESSO scored lowest at ≥77%, with Sem-EHR at ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy at 89% in NHS Fife and 73% in Generation Scotland. When comparing NLP tool output with brain image reads using F1 scores, ALARM+ scored 80%, outperforming EdIE-R at 66% in ischaemic stroke. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR at 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%.
Conclusions: The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, with the variation more apparent for ischaemic stroke. If NLP tools are to be used in clinical settings, they cannot be used "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt tools to the target task.
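Note on the metrics: the abstract reports precision, recall, and F1 but does not show how they are derived. The following is a minimal sketch of the standard definitions for a single binary phenotype label (for example, ischaemic stroke present or absent per report). The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the study or from any of the four tools.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one binary label.

    gold      -- reference annotations, one per report (hypothetical input)
    predicted -- NLP tool output for the same reports, in the same order
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = [(bool(g), bool(p)) for g, p in zip(gold, predicted)]
    tp = sum(1 for g, p in pairs if g and p)        # true positives
    fp = sum(1 for g, p in pairs if not g and p)    # false positives
    fn = sum(1 for g, p in pairs if g and not p)    # false negatives

    # Convention assumed here: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy example: five reports, invented labels for illustration only.
gold_labels = [True, True, True, False, False]
tool_labels = [True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(gold_labels, tool_labels)
```

Because F1 is the harmonic mean of precision and recall, a tool can have the highest precision yet a lower F1 if its recall is weaker, which is consistent with the pattern the abstract reports for ESPRESSO.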
Competing Interests: PS and AO were employed by Canon Medical Research Europe Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
(© 2023 Casey, Davidson, Grover, Tobin, Grivas, Zhang, Schrempf, O’Neil, Lee, Walsh, Pellie, Ferguson, Cvoro, Wu, Whalley, Mair, Whiteley and Alex.)
Grant Information: United Kingdom WT_ Wellcome Trust
Contributed Indexing: Keywords: brain radiology; electronic health records; natural language processing; stroke phenotype
Entry Date(s): Date Created: 20231016 Latest Revision: 20240210
Update Code: 20240210
PubMed Central ID: PMC10569314
DOI: 10.3389/fdgth.2023.1184919
PMID: 37840686
Database: MEDLINE
Description
ISSN: 2673-253X
DOI:10.3389/fdgth.2023.1184919