Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation

Bibliographic Details
Title:	Natural Language Processing and Machine Learning for Identifying Incident Stroke From Electronic Health Records: Algorithm Development and Validation
Authors:	Zhao, Yiqing; Fu, Sunyang; Bielinski, Suzette J; Decker, Paul A; Chamberlain, Alanna M; Roger, Veronique L; Liu, Hongfang; Larson, Nicholas B
Source:	Journal of Medical Internet Research, Vol 23, Iss 3, p e22951 (2021)
Publication Information:	JMIR Publications, 2021.
Publication Year:	2021
Collection:	LCC: Computer applications to medicine. Medical informatics; LCC: Public aspects of medicine
Subject Terms:	Computer applications to medicine. Medical informatics, R858-859.7, Public aspects of medicine, RA1-1270
Description:	Background: Stroke is an important clinical outcome in cardiovascular research. However, the ascertainment of incident stroke is typically accomplished via time-consuming manual chart abstraction. Current phenotyping efforts using electronic health records for stroke focus on case ascertainment rather than incident disease, which requires knowledge of the temporal sequence of events. Objective: The aim of this study was to develop a machine learning–based phenotyping algorithm for incident stroke ascertainment based on diagnosis codes, procedure codes, and clinical concepts extracted from clinical notes using natural language processing. Methods: The algorithm was trained and validated using an existing epidemiology cohort consisting of 4914 patients with atrial fibrillation (AF) with manually curated incident stroke events. Various combinations of feature sets and machine learning classifiers were compared. Using a heuristic rule based on the composition of concepts and codes, we further detected the stroke subtype (ischemic stroke/transient ischemic attack or hemorrhagic stroke) of each identified stroke. The algorithm was further validated using a cohort (n=150) selected by stratified sampling from a population in Olmsted County, Minnesota (N=74,314). Results: Among the 4914 patients with AF, 740 had validated incident stroke events. The best-performing stroke phenotyping algorithm used clinical concepts, diagnosis codes, and procedure codes as features in a random forest classifier. Among patients with stroke codes in the general population sample, the best-performing model achieved a positive predictive value of 86% (43/50; 95% CI 0.74-0.93) and a negative predictive value of 96% (96/100). For subtype identification, we achieved an accuracy of 83% in the AF cohort and 80% in the general population sample. Conclusions: We developed and validated a machine learning–based algorithm that performed well for identifying incident stroke and for determining the type of stroke. The algorithm also performed well on a sample from a general population, further demonstrating its generalizability and potential for adoption by other institutions.
Document Type:	article
File Description:	electronic resource
Language:	English
ISSN:	1438-8871
Relation:	https://www.jmir.org/2021/3/e22951; https://doaj.org/toc/1438-8871
DOI:	10.2196/22951
Access URL:	https://doaj.org/article/3f884c9718334376a1e24d6fde949c1d
Accession Number:	edsdoj.3f884c9718334376a1e24d6fde949c1d
Database:	Directory of Open Access Journals
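
The predictive values quoted in the abstract can be recomputed from the counts it gives (43/50 and 96/100). The sketch below is illustrative only and is not the authors' code. The abstract does not say how the 95% CI was calculated, so the Wilson score interval used here is an assumption, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    Assumption: the record does not state which interval method the
    authors used; Wilson is chosen here only as a common default.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / total + z ** 2 / (4 * total ** 2)
    ) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract (general population sample).
ppv = 43 / 50    # positive predictive value
npv = 96 / 100   # negative predictive value

ppv_low, ppv_high = wilson_interval(43, 50)

print(f"PPV = {ppv:.2f}, 95% CI {ppv_low:.2f}-{ppv_high:.2f}")
print(f"NPV = {npv:.2f}")
```

Under this assumption the interval for the positive predictive value rounds to 0.74-0.93, which agrees with the figures in the abstract. Other interval methods would give slightly different bounds.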

