State Space Models as Foundation Models: A Control Theoretic Overview

Bibliographic Details
Title: State Space Models as Foundation Models: A Control Theoretic Overview
Authors: Alonso, Carmen Amo; Sieber, Jerome; Zeilinger, Melanie N.
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Systems and Control, Computer Science - Computation and Language, Computer Science - Machine Learning
Description: In recent years, there has been growing interest in integrating linear state-space models (SSMs) into the deep neural network architectures of foundation models. This is exemplified by the recent success of Mamba, which shows better performance than state-of-the-art Transformer architectures in language tasks. Foundation models such as GPT-4 aim to encode sequential data into a latent space in order to learn a compressed representation of the data. The same goal has been pursued by control theorists using SSMs to efficiently model dynamical systems. Therefore, SSMs can be naturally connected to deep sequence modeling, offering the opportunity to create synergies between the corresponding research areas. This paper is intended as a gentle introduction to SSM-based architectures for control theorists and summarizes the latest research developments. It provides a systematic review of the most successful SSM proposals and highlights their main features from a control theoretic perspective. Additionally, we present a comparative analysis of these models, evaluating their performance on a standardized benchmark designed for assessing a model's efficiency at learning long sequences.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2403.16899
Accession Number: edsarx.2403.16899
Database: arXiv