Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study

Bibliographic Details
Title:	Constructing High-Fidelity Phenotype Knowledge Graphs for Infectious Diseases With a Fine-Grained Semantic Information Model: Development and Usability Study
Authors:	Lizong Deng, Luming Chen, Tao Yang, Mi Liu, Shicheng Li, Taijiao Jiang
Source:	Journal of Medical Internet Research, Vol 23, Iss 6, p e26892 (2021)
Publisher Information:	JMIR Publications, 2021.
Publication Year:	2021
Collection:	LCC:Computer applications to medicine. Medical informatics; LCC:Public aspects of medicine
Subject Terms:	Computer applications to medicine. Medical informatics, R858-859.7, Public aspects of medicine, RA1-1270
Description:	Background: Phenotypes characterize the clinical manifestations of diseases and provide important information for diagnosis. The construction of phenotype knowledge graphs for diseases is therefore valuable for the development of artificial intelligence in medicine. However, the phenotype knowledge graphs in current knowledge bases such as WikiData and DBpedia are coarse-grained: they consider only the core concepts of phenotypes and neglect the details (attributes) associated with them.
Objective: To characterize the details of disease phenotypes in clinical guidelines, we proposed a fine-grained semantic information model named PhenoSSU (semantic structured unit of phenotypes).
Methods: PhenoSSU is by nature an “entity-attribute-value” model, and it aims to capture the full semantic information underlying phenotype descriptions with a series of attributes and values. A total of 193 clinical guidelines for infectious diseases from Wikipedia were selected as the study corpus, and 12 attributes from SNOMED CT were introduced into the PhenoSSU model based on the co-occurrences of phenotype concepts and attribute values. The expressive power of the PhenoSSU model was evaluated by analyzing whether PhenoSSU instances could capture the full semantics underlying the descriptions of the corresponding phenotypes. To construct fine-grained phenotype knowledge graphs automatically, we developed a hybrid strategy that first recognizes phenotype concepts with the MetaMap tool and then predicts the attribute values of phenotypes with machine learning classifiers.
Results: Fine-grained phenotype knowledge graphs of 193 infectious diseases were manually constructed with the BRAT annotation tool. A total of 4020 PhenoSSU instances were annotated in these knowledge graphs, and 3757 of them (89.5%) were found to capture the full semantics underlying the descriptions of the corresponding phenotypes listed in clinical guidelines. By comparison, two other information models, the Clinical Element Model and the HL7 Fast Healthcare Interoperability Resources (FHIR) model, could capture the full semantics underlying only 48.4% (2034/4020) and 21.8% (914/4020) of the phenotype descriptions, respectively. The hybrid strategy achieved an F1-score of 0.732 for the subtask of phenotype concept recognition and an average weighted accuracy of 0.776 for the subtask of attribute value prediction.
Conclusions: PhenoSSU is an effective information model for the precise representation of phenotype knowledge in clinical guidelines, and machine learning can improve the efficiency of constructing PhenoSSU-based knowledge graphs. Our work may help shift the focus of medical knowledge engineering from a coarse-grained to a more fine-grained level.
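The abstract describes PhenoSSU as an entity-attribute-value model in which a core phenotype concept is qualified by attribute values. The following Python sketch illustrates that idea only under stated assumptions: the class name `PhenoSSU` as a Python type, the `to_triples` helper, the `has_phenotype` predicate, the disease name, and the attribute names `severity` and `duration` are all hypothetical. They are not the 12 SNOMED CT attributes used in the study, which the abstract does not list.

```python
from dataclasses import dataclass, field


@dataclass
class PhenoSSU:
    """One phenotype instance in entity-attribute-value form (illustrative).

    `entity` is the core phenotype concept; `attributes` maps a
    hypothetical attribute name to its value.
    """
    entity: str
    attributes: dict[str, str] = field(default_factory=dict)

    def to_triples(self, disease: str) -> list[tuple[str, str, str]]:
        """Flatten the instance into (subject, predicate, object) triples
        for a knowledge graph rooted at `disease`."""
        node = f"{disease}/{self.entity}"
        # The disease-to-phenotype link is all a coarse-grained graph keeps.
        triples = [(disease, "has_phenotype", node)]
        # Attribute triples carry the fine-grained details.
        for attr, value in sorted(self.attributes.items()):
            triples.append((node, attr, value))
        return triples


# Hypothetical phenotype description: "high fever lasting several days"
fever = PhenoSSU(
    entity="Fever",
    attributes={"severity": "high", "duration": "several days"},
)

for triple in fever.to_triples("ExampleDisease"):
    print(triple)
```

A coarse-grained graph of the kind the abstract attributes to WikiData and DBpedia would retain only the first triple; the attribute triples are what distinguish "high fever lasting several days" from a bare "fever".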
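The abstract reports an F1-score of 0.732 for phenotype concept recognition. As a generic reminder of how that metric combines precision and recall, here is a minimal set-based sketch. The exact-match criterion, the (start, end, concept_id) mention format, and the toy concept IDs are assumptions for illustration, not the study's evaluation protocol.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Set-based precision, recall and F1 for concept recognition.

    Each element is assumed to be a hashable mention such as a
    (start, end, concept_id) tuple; a prediction counts as correct only
    if it matches a gold mention exactly (an assumption of this sketch).
    """
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy mentions with made-up concept IDs (not real UMLS CUIs).
gold = {(0, 5, "C_FEVER"), (10, 17, "C_COUGH"), (22, 30, "C_RASH")}
pred = {(0, 5, "C_FEVER"), (10, 17, "C_COUGH"), (40, 45, "C_HEADACHE")}
p, r, f = precision_recall_f1(pred, gold)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Here two of three predictions match gold mentions and two of three gold mentions are found, so precision, recall, and F1 are all 2/3. Partial-overlap matching would require a different comparison than set intersection.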
Document Type:	article
File Description:	electronic resource
Language:	English
ISSN:	1438-8871
Relation:	https://www.jmir.org/2021/6/e26892; https://doaj.org/toc/1438-8871
DOI:	10.2196/26892
Access URL:	https://doaj.org/article/a8b47dfba9ff4c5eaa8f0b00485c9abc
Accession Number:	edsdoj.8b47dfba9ff4c5eaa8f0b00485c9abc
Database:	Directory of Open Access Journals
