Report
HyperMixer: An MLP-based Low Cost Alternative to Transformers
| Title | HyperMixer: An MLP-based Low Cost Alternative to Transformers |
|---|---|
| Authors | Mai, Florian; Pannatier, Arnaud; Fehr, Fabio; Chen, Haolin; Marelli, Francois; Fleuret, Francois; Henderson, James |
| Source | Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023 |
| Publication year | 2022 |
| Collection | Computer Science |
| Subject terms | Computer Science - Computation and Language; Computer Science - Artificial Intelligence; Computer Science - Machine Learning |
| Description | Transformer-based architectures are the model of choice for natural language understanding, but they come at a significant cost, as they have quadratic complexity in the input length, require a lot of training data, and can be difficult to tune. In the pursuit of lower costs, we investigate simple MLP-based architectures. We find that existing architectures such as MLPMixer, which achieves token mixing through a static MLP applied to each feature independently, are too detached from the inductive biases required for natural language understanding. In this paper, we propose a simple variant, HyperMixer, which forms the token mixing MLP dynamically using hypernetworks. Empirically, we demonstrate that our model performs better than alternative MLP-based models, and on par with Transformers. In contrast to Transformers, HyperMixer achieves these results at substantially lower costs in terms of processing time, training data, and hyperparameter tuning. Comment: Published at ACL 2023 |
| Document type | Working Paper |
| Access URL | http://arxiv.org/abs/2203.03691 |
| Accession number | edsarx.2203.03691 |
| Database | arXiv |
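The abstract's central idea, forming the token-mixing MLP's weights dynamically with hypernetworks instead of learning them as static parameters as in MLPMixer, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the hypernetworks here are single linear maps (`Wh1`, `Wh2`) rather than the small MLPs the paper describes, position information is omitted, and all names and sizes are illustrative toy choices.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, applied elementwise
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)

N, d, d_hidden = 6, 8, 16  # tokens, model dim, token-mixing hidden dim (toy sizes)

# Illustrative hypernetwork parameters: here just single linear maps applied
# position-wise (the paper uses small MLPs; this is a simplification).
Wh1 = rng.normal(0.0, 0.1, (d, d_hidden))
Wh2 = rng.normal(0.0, 0.1, (d, d_hidden))

def hypermixer_token_mixing(X):
    """X: (N, d) token embeddings -> (N, d) token-mixed output."""
    # Hypernetworks generate the token-mixing MLP's weights from the input
    # itself, so the mixing adapts to content and to the sequence length N,
    # unlike MLPMixer's fixed-size static token-mixing MLP.
    W1 = X @ Wh1                  # (N, d_hidden), generated per input
    W2 = X @ Wh2                  # (N, d_hidden), generated per input
    # Token-mixing MLP applied to each feature column independently:
    return W2 @ gelu(W1.T @ X)    # (N, d); cost is linear in N, not quadratic

X = rng.normal(size=(N, d))
Y = hypermixer_token_mixing(X)
print(Y.shape)
```

Note that every matrix product above touches `N` only linearly, which is the source of the sub-quadratic cost the abstract contrasts with self-attention.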