Academic Journal

Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data.

Bibliographic Details
Title: Using clinical text to refine unspecific condition codes in Dutch general practitioner EHR data.
Authors: Seinen TM; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands. Electronic address: t.seinen@erasmusmc.nl., Kors JA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands., van Mulligen EM; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands., Fridgeirsson EA; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands., Verhamme KM; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands., Rijnbeek PR; Department of Medical Informatics, Erasmus University Medical Center, Rotterdam, the Netherlands.
Source: International journal of medical informatics [Int J Med Inform] 2024 May 29; Vol. 189, pp. 105506. Date of Electronic Publication: 2024 May 29.
Publication Model: Ahead of Print
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Elsevier Science Ireland Ltd Country of Publication: Ireland NLM ID: 9711057 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1872-8243 (Electronic) Linking ISSN: 13865056 NLM ISO Abbreviation: Int J Med Inform Subsets: MEDLINE
Imprint Name(s): Original Publication: Shannon, Co. Clare, Ireland : Elsevier Science Ireland Ltd., c1997-
Abstract: Objective: Observational studies using electronic health record (EHR) databases often face challenges due to unspecific clinical codes that can obscure detailed medical information, hindering precise data analysis. In this study, we aimed to assess the feasibility of refining these unspecific condition codes into more specific codes in a Dutch general practitioner (GP) EHR database by leveraging the available clinical free text.
Methods: We utilized three approaches for text classification (search queries, semi-supervised learning, and supervised learning) to improve the specificity of ten unspecific International Classification of Primary Care (ICPC-1) codes. Two text representations and three machine learning algorithms were evaluated for the (semi-)supervised models. Additionally, we measured the improvement achieved by the refinement process on all code occurrences in the database.
Results: The classification models performed well for most codes. In general, no single classification approach consistently outperformed the others. However, there were variations in the relative performance of the classification approaches within each code and in the use of different text representations and machine learning algorithms. Class imbalance and limited training data affected the performance of the (semi-)supervised models, yet the simple search queries remained particularly effective. Ultimately, the developed models improved the specificity of over half of all the unspecific code occurrences in the database.
Conclusions: Our findings show the feasibility of using information from clinical text to improve the specificity of unspecific condition codes in observational healthcare databases, even with a limited range of machine-learning techniques and modest annotated training sets. Future work could investigate transfer learning, integration of structured data, alternative semi-supervised methods, and validation of models across healthcare settings. The improved level of detail enriches the interpretation of medical information and can benefit observational research and patient care.
Competing Interests: Declaration of competing interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.)
Contributed Indexing: Keywords: Electronic Health Record; Machine Learning; Natural Language Processing; Primary Health Care; Text Mining
Entry Date(s): Date Created: 20240531 Latest Revision: 20240531
Update Code: 20240601
DOI: 10.1016/j.ijmedinf.2024.105506
PMID: 38820647
Database: MEDLINE
Description
ISSN: 1872-8243
DOI: 10.1016/j.ijmedinf.2024.105506