Academic Journal

A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning.

Bibliographic Details
Title: A hybrid Chinese word segmentation model for quality management-related texts based on transfer learning.
Authors: Peihan Wen, Linhan Feng, Tian Zhang
Source: PLoS ONE, Vol 17, Iss 10, p e0270154 (2022)
Publication Info: Public Library of Science (PLoS), 2022.
Publication Year: 2022
Collection: LCC:Medicine
LCC:Science
Subject Terms: Medicine, Science
Description: Text information mining is a key step to data-driven automatic/semi-automatic quality management (QM). For Chinese texts, a word segmentation algorithm is necessary for pre-processing since there are no explicit marks to define word boundaries. Because of intrinsic characteristics of QM-related texts, word segmentation algorithms for normal Chinese texts cannot be directly applied. Hence, based on the analysis of QM-related texts, we summarized six features, and proposed a hybrid Chinese word segmentation model by means of integrating transfer learning (TL), bidirectional long short-term memory (Bi-LSTM), multi-head attention (MA), and conditional random field (CRF) to construct the mTL-Bi-LSTM-MA-CRF model, considering insufficient samples of QM-related texts and excessive cutting of idioms. The mTL-Bi-LSTM-MA-CRF model is composed of two steps. Firstly, based on a word embedding space, the Bi-LSTM is introduced for context information learning, the MA mechanism is selected to allocate attention among subspaces, and the CRF is then used to learn label sequence constraints. Secondly, a modified TL method is put forward for text feature extraction, adaptive layer weights learning, and loss function correction for selective learning. Experimental results show that the proposed model can achieve good word segmentation results with only a relatively small set of samples.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1932-6203
Relation: https://doaj.org/toc/1932-6203
DOI: 10.1371/journal.pone.0270154
Access URL: https://doaj.org/article/a1369f99a5094ce6a075a82a6ba8558c
Accession Number: edsdoj.1369f99a5094ce6a075a82a6ba8558c
Database: Directory of Open Access Journals
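
Illustrative note on the Description field: the abstract says the CRF layer learns label sequence constraints. The sketch below is not the authors' implementation. It assumes the common B/M/E/S character tagging scheme (begin, middle, end, single), which the record does not name, and it replaces learned CRF transition scores with fixed hard constraints. The per-character scores are hand-written stand-ins for what a Bi-LSTM with multi-head attention might emit; the function names `viterbi` and `tags_to_words` are hypothetical.

```python
import math

TAGS = ["B", "M", "E", "S"]  # assumed tagging scheme, not stated in the record

# Which tags may follow each tag under B/M/E/S (hard constraints, an assumption;
# a trained CRF would learn soft transition scores instead).
ALLOWED = {"B": ("M", "E"), "M": ("M", "E"), "E": ("B", "S"), "S": ("B", "S")}
START_OK = ("B", "S")  # a sentence can only start a word
END_OK = ("E", "S")    # a sentence can only end on a completed word


def viterbi(emissions):
    """Return the best-scoring valid tag sequence.

    emissions: one dict per character, mapping each tag to a score.
    """
    if not emissions:
        return []
    score = [{t: (emissions[0][t] if t in START_OK else -math.inf) for t in TAGS}]
    back = []
    for em in emissions[1:]:
        row, ptr = {}, {}
        for t in TAGS:
            best_prev, best = None, -math.inf
            for p in TAGS:
                if t in ALLOWED[p] and score[-1][p] > best:
                    best_prev, best = p, score[-1][p]
            row[t] = best + em[t] if best_prev is not None else -math.inf
            ptr[t] = best_prev
        score.append(row)
        back.append(ptr)
    # Backtrace from the best permitted final tag.
    tags = [max(END_OK, key=lambda t: score[-1][t])]
    for ptr in reversed(back):
        tags.append(ptr[tags[-1]])
    tags.reverse()
    return tags


def tags_to_words(chars, tags):
    """Cut a character sequence into words at every E or S tag."""
    words, cur = [], ""
    for ch, t in zip(chars, tags):
        cur += ch
        if t in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words


# Hand-written scores: a per-character argmax would pick B, S, B, E,
# which is invalid because S cannot follow B.
example_scores = [
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 1.0},
    {"B": 0.0, "M": 0.0, "E": 1.0, "S": 1.5},
    {"B": 2.0, "M": 0.0, "E": 0.0, "S": 0.0},
    {"B": 0.0, "M": 0.0, "E": 2.0, "S": 0.0},
]
example_tags = viterbi(example_scores)
example_words = tags_to_words("质量管理", example_tags)
```

With these scores the constrained decoder prefers a sequence that closes the first word before opening the second, which is the kind of consistency a sequence-level layer adds on top of per-character predictions.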