Academic Journal

Simple Conditional Spatial Query Mask Deformable Detection Transformer: A Detection Approach for Multi-Style Strokes of Chinese Characters

Bibliographic Details
Title: Simple Conditional Spatial Query Mask Deformable Detection Transformer: A Detection Approach for Multi-Style Strokes of Chinese Characters
Authors: Tian Zhou, Wu Xie, Huimin Zhang, Yong Fan
Source: Sensors, Vol 24, Iss 3, p 931 (2024)
Publisher Information: MDPI AG, 2024.
Publication Year: 2024
Collection: LCC:Chemical technology
Subject Terms: object detection, Chinese character stroke, transformer, deformable DETR, SCSQ-MDD, Chemical technology, TP1-1185
Description: In the Chinese character writing task performed by robotic arms, the stroke category and position information should be extracted through object detection. Detection algorithms based on predefined anchor frames have difficulty resolving the differences among the many different styles of Chinese character strokes. Deformable detection transformer (deformable DETR) algorithms without predefined anchor frames result in some invalid sampling points with no contribution to the feature update of the current reference point due to the random sampling of sampling points in the deformable attention module. These processes cause a reduction in the speed of the vector learning stroke features in the detection head. In view of this problem, a new detection method for multi-style strokes of Chinese characters, called the simple conditional spatial query mask deformable DETR (SCSQ-MDD), is proposed in this paper. Firstly, a mask prediction layer is jointly determined using the shallow feature map of the Chinese character image and the query vector of the transformer encoder, which is used to filter the points with actual contributions and resample the points without contributions to address the randomness of the correlation calculation among the reference points. Secondly, by separating the content query and spatial query of the transformer decoder, the dependence of the prediction task on the content embedding is relaxed. Finally, the detection model without predefined anchor frames based on the SCSQ-MDD is constructed. Experiments are conducted using a multi-style Chinese character stroke dataset to evaluate the performance of the SCSQ-MDD. The mean average precision (mAP) value is improved by 3.8% and the mean average recall (mAR) value is improved by 1.1% compared with the deformable DETR in the testing stage, illustrating the effectiveness of the proposed method.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1424-8220
Relation: https://www.mdpi.com/1424-8220/24/3/931; https://doaj.org/toc/1424-8220
DOI: 10.3390/s24030931
Access URL: https://doaj.org/article/25878f4827464ce9853ca3670f397fa6
Accession Number: edsdoj.25878f4827464ce9853ca3670f397fa6
Database: Directory of Open Access Journals