USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery

Bibliographic Details
Title: USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery
Authors: Irvin, Jeremy, Tao, Lucas, Zhou, Joanne, Ma, Yuntao, Nashold, Langston, Liu, Benjamin, Ng, Andrew Y.
Publication Year: 2023
Collection: Computer Science; Statistics
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, Statistics - Applications
Description: Large, self-supervised vision models have led to substantial advancements in automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data, whose rich multi-sensor, multi-spectral, and temporal structure provides massive amounts of self-labeled data for self-supervised pre-training. In this work, we develop a new encoder architecture called USat that can input multi-spectral data from multiple sensors for self-supervised pre-training. USat is a vision transformer with modified patch projection layers and positional encodings to model spectral bands with varying spatial scales from multiple sensors. We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art self-supervised MAE models trained on remote sensing data on multiple remote sensing benchmark datasets (by up to 8%) and leads to improvements in low-data regimes (by up to 7%). Code and pre-trained weights are available at https://github.com/stanfordmlgroup/USat.
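The description mentions patch projection layers adapted to spectral bands with varying spatial scales. The sketch below illustrates one plausible reading of that idea, not the paper's actual implementation: each band is patchified with a pixel patch size inversely proportional to its ground sampling distance (GSD), so every token covers the same ground footprint and tokens from different bands can share an aligned positional grid. All names, sizes, and the 80 m footprint are illustrative assumptions.

```python
import numpy as np

# Toy per-band patch projection: coarser-GSD bands get smaller pixel
# patches so that every token covers the same ground area. Illustrative
# sketch only; USat's real configuration may differ.
GROUND_PATCH_M = 80   # assumed ground footprint per token, in metres
EMBED_DIM = 32        # toy embedding dimension

def patch_embed(band: np.ndarray, gsd_m: float, rng: np.random.Generator):
    """Split one band into patches sized by its GSD, then linearly
    project each flattened patch to EMBED_DIM."""
    p = int(GROUND_PATCH_M / gsd_m)              # pixels per patch side
    h, w = band.shape
    n_h, n_w = h // p, w // p
    patches = (band[:n_h * p, :n_w * p]
               .reshape(n_h, p, n_w, p)
               .transpose(0, 2, 1, 3)
               .reshape(n_h * n_w, p * p))
    # Per-band projection weights (randomly initialised here).
    weight = rng.standard_normal((p * p, EMBED_DIM)) / np.sqrt(p * p)
    return patches @ weight                      # (num_tokens, EMBED_DIM)

rng = np.random.default_rng(0)
# Two bands covering the same 320 m x 320 m scene at different resolutions:
band_10m = rng.standard_normal((32, 32))   # 10 m GSD -> 8x8-pixel patches
band_20m = rng.standard_normal((16, 16))   # 20 m GSD -> 4x4-pixel patches

tok_10 = patch_embed(band_10m, 10.0, rng)
tok_20 = patch_embed(band_20m, 20.0, rng)
# Both bands yield the same 4x4 = 16-token grid, so positional encodings
# can be shared across sensors on the common ground grid.
print(tok_10.shape, tok_20.shape)  # → (16, 32) (16, 32)
```

Keeping the token grid aligned across bands is what would let a single positional encoding scheme serve sensors of different resolutions inside one MAE encoder.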
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2312.02199
Accession Number: edsarx.2312.02199
Database: arXiv