USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery

Bibliographic Details
Title: USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery
Authors: Irvin, Jeremy, Tao, Lucas, Zhou, Joanne, Ma, Yuntao, Nashold, Langston, Liu, Benjamin, Ng, Andrew Y.
Publication Year: 2023
Collection: Computer Science; Statistics
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Image and Video Processing, Statistics - Applications
Description: Large, self-supervised vision models have led to substantial advancements in automatically interpreting natural images. Recent works have begun tailoring these methods to remote sensing data, whose rich multi-sensor, multi-spectral, and temporal structure provides massive amounts of self-labeled data for self-supervised pre-training. In this work, we develop a new encoder architecture called USat that can input multi-spectral data from multiple sensors for self-supervised pre-training. USat is a vision transformer with modified patch projection layers and positional encodings to model spectral bands with varying spatial scales from multiple sensors. We integrate USat into a Masked Autoencoder (MAE) self-supervised pre-training procedure and find that a pre-trained USat outperforms state-of-the-art self-supervised MAE models trained on remote sensing data on multiple remote sensing benchmark datasets (by up to 8%) and leads to improvements in low-data regimes (by up to 7%). Code and pre-trained weights are available at https://github.com/stanfordmlgroup/USat.
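The description mentions patch projection layers adapted to spectral bands with varying spatial scales. The sketch below illustrates one plausible reading of that idea, not the paper's actual implementation: each band is patchified with a pixel patch size inversely proportional to its ground sampling distance (GSD), so every token covers the same ground footprint and tokens from different bands can share an aligned positional grid. All names, sizes, and the 80 m footprint are illustrative assumptions.

```python
import numpy as np

# Toy per-band patch projection: coarser-GSD bands get smaller pixel
# patches so that every token covers the same ground area. Illustrative
# sketch only; USat's real configuration may differ.
GROUND_PATCH_M = 80   # assumed ground footprint per token, in metres
EMBED_DIM = 32        # toy embedding dimension

def patch_embed(band: np.ndarray, gsd_m: float, rng: np.random.Generator):
    """Split one band into patches sized by its GSD, then linearly
    project each flattened patch to EMBED_DIM."""
    p = int(GROUND_PATCH_M / gsd_m)              # pixels per patch side
    h, w = band.shape
    n_h, n_w = h // p, w // p
    patches = (band[:n_h * p, :n_w * p]
               .reshape(n_h, p, n_w, p)
               .transpose(0, 2, 1, 3)
               .reshape(n_h * n_w, p * p))
    # Per-band projection weights (randomly initialised here).
    weight = rng.standard_normal((p * p, EMBED_DIM)) / np.sqrt(p * p)
    return patches @ weight                      # (num_tokens, EMBED_DIM)

rng = np.random.default_rng(0)
# Two bands covering the same 320 m x 320 m scene at different resolutions:
band_10m = rng.standard_normal((32, 32))   # 10 m GSD -> 8x8-pixel patches
band_20m = rng.standard_normal((16, 16))   # 20 m GSD -> 4x4-pixel patches

tok_10 = patch_embed(band_10m, 10.0, rng)
tok_20 = patch_embed(band_20m, 20.0, rng)
# Both bands yield the same 4x4 = 16-token grid, so positional encodings
# can be shared across sensors on the common ground grid.
print(tok_10.shape, tok_20.shape)  # → (16, 32) (16, 32)
```

Keeping the token grid aligned across bands is what would let a single positional encoding scheme serve sensors of different resolutions inside one MAE encoder.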
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2312.02199
Accession Number: edsarx.2312.02199
Database: arXiv