Academic Journal

SSDT: Scale-Separation Semantic Decoupled Transformer for Semantic Segmentation of Remote Sensing Images

Bibliographic Details
Title: SSDT: Scale-Separation Semantic Decoupled Transformer for Semantic Segmentation of Remote Sensing Images
Authors: Chengyu Zheng, Yanru Jiang, Xiaowei Lv, Jie Nie, Xinyue Liang, Zhiqiang Wei
Source: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol 17, Pp 9037-9052 (2024)
Publisher Information: IEEE, 2024.
Publication Year: 2024
Collection: LCC:Ocean engineering
LCC:Geophysics. Cosmic physics
Subject Terms: Geophysical image processing, geoscience and remote sensing, semantic segmentation, Ocean engineering, TC1501-1800, Geophysics. Cosmic physics, QC801-809
Description: Semantic segmentation of remote sensing (RS) images classifies an image pixel by pixel to realize its semantic decoupling. Most traditional semantic decoupling methods only decouple and do not perform scale separation, which causes problems at both ends of the scale range: if the feature extractor is too large, it ignores small-scale targets; if it is too small, it splits large-scale target objects apart and reduces segmentation accuracy. To address this, we propose a scale-separated semantic decoupled transformer (SSDT), which first performs scale separation during semantic decoupling and then uses the resulting semantic features, rich in scale information, to guide the Transformer's feature extraction. The network consists of five modules: scale-separated patch extraction (SPE), semantic decoupled transformer (SDT), scale-separated feature extraction (SFE), semantic decoupling (SD), and multiview feature fusion decoder (MFFD). In particular, SPE turns the original image into a linear embedding sequence at three scales; SD divides pixels into different semantic clusters by K-means and further obtains semantic features rich in scale information; SDT, whose core is decoupled attention, improves intraclass compactness and interclass looseness by calculating the similarity between semantic features and image features. Finally, MFFD fuses salient features from different perspectives to further enhance the feature representation. Experiments on two large-scale fine-resolution RS image datasets (Vaihingen and Potsdam) demonstrate the effectiveness of the proposed SSDT strategy for RS image semantic segmentation.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1939-1404
2151-1535
Relation: https://ieeexplore.ieee.org/document/10495748/; https://doaj.org/toc/1939-1404; https://doaj.org/toc/2151-1535
DOI: 10.1109/JSTARS.2024.3383066
Access URL: https://doaj.org/article/122538a3592249e8830f94f1b349a92f
Accession Number: edsdoj.122538a3592249e8830f94f1b349a92f
Database: Directory of Open Access Journals
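
Illustrative sketch (not from the article): the description says that SD groups pixels into semantic clusters by K-means and that SDT relies on the similarity between semantic features and image features. The toy example below shows only that general idea in plain Python, and it is not the authors' implementation. Everything in it is an assumption made for illustration: the RGB-like pixel features, the farthest-point initialization, the choice of cosine similarity, and the helper names `sq_dist`, `init_centers`, `kmeans`, and `cosine`.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centers(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, iters=20):
    """Plain K-means: alternate nearest-center assignment and mean update."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(v) / len(members) for v in zip(*members)]
    return centers, labels


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical pixel features: three reddish and three bluish pixels.
pixels = [
    (0.90, 0.10, 0.10), (0.85, 0.15, 0.10), (0.95, 0.05, 0.12),
    (0.10, 0.10, 0.90), (0.12, 0.08, 0.85), (0.05, 0.15, 0.95),
]

# Cluster centers stand in for "semantic features" in this sketch.
centers, labels = kmeans(pixels, k=2)

# Similarity of every pixel feature to every cluster center.
similarity = [[cosine(p, c) for c in centers] for p in pixels]

for p, lab, sims in zip(pixels, labels, similarity):
    print(p, "cluster", lab, "similarities", [round(s, 3) for s in sims])
```

In this toy setting each pixel is most similar to the center of its own cluster. The article's decoupled attention presumably builds on a learned version of such a similarity, but the record does not give those details.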