A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion

Bibliographic Details
Title:	A Cross-Lingual Sentence Similarity Calculation Method With Multifeature Fusion
Authors:	Lingxin Wang, Shengquan Liu, Longye Qiao, Weiwei Sun, Qi Sun, Huaqing Cheng
Source:	IEEE Access, Vol 10, Pp 30666-30675 (2022)
Publisher Information:	IEEE, 2022.
Publication Year:	2022
Collection:	LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms:	Cross-language, pre-trained model, sentence similarity, feature fusion, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description:	Cross-lingual sentence similarity computation is an active research focus in natural language processing (NLP). Some researchers have introduced fine-grained word and character features to help models understand sentence meaning, but they do not consider coarse-grained prior knowledge at the sentence level. Moreover, even when the two sentences of a cross-lingual pair have the same meaning, the sentence representations extracted by a baseline approach may carry language-specific biases. To address these problems, in this paper we construct a Chinese–Uyghur cross-lingual sentence similarity dataset and propose a method that computes cross-lingual sentence similarity by fusing multiple features. The method is based on the cross-lingual pretrained model XLM-RoBERTa and assists the similarity calculation by introducing two coarse-grained prior knowledge features: sentence sentiment and sentence length. To eliminate possible language-specific biases in the vectors, we whiten the sentence vectors of the different languages so that they are all represented under a standard orthonormal basis. Because different combinations of vectors affect the final performance of the model differently, we run comparison experiments that add different vector features to the basic feature concatenation method. The results show that the absolute value of the difference between the two sentence vectors reflects the similarity of two sentences well. The final F1 value of our method reaches 98.97%, which is 19.81% higher than that of the baseline.
Document Type:	article
File Description:	electronic resource
Language:	English
ISSN:	2169-3536
Relation:	https://ieeexplore.ieee.org/document/9734036/; https://doaj.org/toc/2169-3536
DOI:	10.1109/ACCESS.2022.3159692
Access URL:	https://doaj.org/article/4bfc45ec12a744e3b32a81c5a73191dd
Accession Number:	edsdoj.4bfc45ec12a744e3b32a81c5a73191dd
Database:	Directory of Open Access Journals
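
The description above mentions two operations that are easy to illustrate: whitening sentence vectors, and concatenating the two sentence vectors with the absolute value of their difference. The following is a minimal sketch under stated assumptions, not the authors' implementation. The record does not say how the whitening transform is estimated, so the SVD-based estimate here is an assumption, as is fitting one transform per language. The function names (fit_whitening, whiten, fuse_features) and the `extra` argument standing in for sentiment and length features are hypothetical.

```python
import numpy as np


def fit_whitening(vectors, eps=1e-12):
    """Estimate a whitening transform from an (n, d) array of sentence vectors.

    Returns (mu, W) such that (x - mu) @ W has approximately zero mean and
    identity covariance. Assumption: covariance is factorized with an SVD.
    """
    mu = vectors.mean(axis=0, keepdims=True)
    cov = np.cov((vectors - mu).T)           # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)             # cov = u @ diag(s) @ u.T
    W = u @ np.diag(1.0 / np.sqrt(s + eps))  # rotate, then rescale each axis
    return mu, W


def whiten(vectors, mu, W):
    """Apply a fitted whitening transform to an (n, d) array."""
    return (vectors - mu) @ W


def fuse_features(u, v, extra=None):
    """Concatenate [u, v, |u - v|] and optional sentence-level features.

    `extra` is a hypothetical placeholder for coarse-grained features such as
    sentiment and length; how those are encoded is not given in the record.
    """
    parts = [u, v, np.abs(u - v)]
    if extra is not None:
        parts.append(extra)
    return np.concatenate(parts, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-ins for encoder outputs of two languages (random, for illustration only).
    zh = rng.normal(size=(500, 8)) * 3.0 + 1.0
    ug = rng.normal(size=(500, 8)) * 0.5 - 2.0
    # Assumption: one whitening transform is fitted per language.
    zh_w = whiten(zh, *fit_whitening(zh))
    ug_w = whiten(ug, *fit_whitening(ug))
    fused = fuse_features(zh_w, ug_w, extra=np.zeros((500, 2)))
    print(fused.shape)
```

In a full pipeline, the fused vector would typically be passed to a classifier that predicts whether the two sentences are similar. That step is omitted here because the record gives no details about it.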
