Academic Journal

Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition

Bibliographic Details
Title: Multi-scale and attention enhanced graph convolution network for skeleton-based violence action recognition
Authors: Huaigang Yang, Ziliang Ren, Huaqiang Yuan, Wenhong Wei, Qieshi Zhang, Zhaolong Zhang
Source: Frontiers in Neurorobotics, Vol 16 (2022)
Publication Information: Frontiers Media S.A., 2022.
Publication Year: 2022
Collection: LCC:Neurosciences. Biological psychiatry. Neuropsychiatry
Subject Terms: violence action recognition, skeleton sequence, multi-scale graph convolution network, attention mechanism, spatiotemporal information, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
Description: Graph convolution networks (GCNs) have been widely used in the field of skeleton-based human action recognition. However, it is still difficult to improve recognition performance while reducing parameter complexity. In this paper, a novel multi-scale attention spatiotemporal GCN (MSA-STGCN) is proposed for human violence action recognition by learning spatiotemporal features from four different skeleton modality variants. First, the original joint data are preprocessed to obtain joint position, bone vector, joint motion, and bone motion data as inputs to the recognition framework. Then, a spatial multi-scale graph convolution network based on the attention mechanism is constructed to obtain the spatial features of joint nodes, while a temporal graph convolution network in the form of hybrid dilated convolution is designed to enlarge the receptive field of the feature map and capture multi-scale context information. Finally, the specific relationships in the different skeleton data are explored by fusing the information of multiple streams related to human joints and bones. To evaluate the performance of the proposed MSA-STGCN, a skeleton violence action dataset, Filtered NTU RGB+D, was constructed based on NTU RGB+D 120. We conducted experiments on the constructed Filtered NTU RGB+D and Kinetics Skeleton 400 datasets to verify the performance of the proposed recognition framework. The proposed method achieves an accuracy of 95.3% on Filtered NTU RGB+D with only 1.21M parameters, and accuracies of 36.2% (Top-1) and 58.5% (Top-5) on Kinetics Skeleton 400. The experimental results on these two skeleton datasets show that the proposed recognition framework can effectively recognize violence actions without adding parameters.
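The four input modalities named in the abstract (joint position, bone vector, joint motion, bone motion) follow standard skeleton-stream preprocessing: bones are differences between a joint and its kinematic parent, and motion streams are frame-to-frame differences. The sketch below illustrates this common scheme with NumPy; the `parents` mapping and zero-padding convention are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np

def bone_stream(joints: np.ndarray, parents: list) -> np.ndarray:
    """Bone vectors from joint positions.

    joints: array of shape (C, T, V) -- channels, frames, joints.
    parents: hypothetical parent-joint index for each of the V joints
             (the root joint points to itself, giving a zero bone).
    """
    bones = np.zeros_like(joints)
    for v, p in enumerate(parents):
        bones[:, :, v] = joints[:, :, v] - joints[:, :, p]
    return bones

def motion_stream(x: np.ndarray) -> np.ndarray:
    """Temporal difference of a joint or bone stream, zero-padded at the last frame."""
    motion = np.zeros_like(x)
    motion[:, :-1] = x[:, 1:] - x[:, :-1]
    return motion

# The four modality variants fed to the multi-stream framework:
# joints, bone_stream(joints, parents),
# motion_stream(joints), motion_stream(bone_stream(joints, parents)).
```

Each stream keeps the same `(C, T, V)` shape, so the same GCN backbone can process all four before score-level fusion.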
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1662-5218
Relation: https://www.frontiersin.org/articles/10.3389/fnbot.2022.1091361/full; https://doaj.org/toc/1662-5218
DOI: 10.3389/fnbot.2022.1091361
Access URL: https://doaj.org/article/d740b19db6f14c82a743d96e6cb7c5e5
Accession Number: edsdoj.740b19db6f14c82a743d96e6cb7c5e5
Database: Directory of Open Access Journals