A Mathematical Theory of Attention

Bibliographic Details
Title: A Mathematical Theory of Attention
Authors: Vuckovic, James; Baratin, Aristide; Tachet des Combes, Remi
Publication Year: 2020
Collection: Computer Science; Statistics
Subject Terms: Statistics - Machine Learning; Computer Science - Machine Learning
Description: Attention is a powerful component of modern neural networks across a wide variety of domains. However, despite its ubiquity in machine learning, there is a gap in our understanding of attention from a theoretical point of view. We propose a framework to fill this gap by building a mathematically equivalent model of attention using measure theory. With this model, we are able to interpret self-attention as a system of self-interacting particles, we shed light on self-attention from a maximum entropy perspective, and we show that attention is actually Lipschitz-continuous (with an appropriate metric) under suitable assumptions. We then apply these insights to the problem of mis-specified input data; to infinitely-deep, weight-sharing self-attention networks; and to more general Lipschitz estimates for a specific type of attention studied in concurrent work.
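As context for the abstract, the sketch below is a minimal NumPy implementation of standard scaled dot-product self-attention, the mechanism the paper models measure-theoretically. The variable names, shapes, and random weight matrices are illustrative assumptions, not the authors' notation or formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n, n) pairwise token interactions
    weights = softmax(scores, axis=-1)        # each row is a probability distribution over tokens
    return weights @ V                        # each output token is a weighted average of values

# Illustrative usage: n = 4 tokens, d = 8 features (hypothetical sizes).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

The row-wise softmax is what makes each output a weighted average under a probability distribution over the input tokens, which is the kind of structure the paper's measure-theoretic model and particle-system interpretation build on.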
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2007.02876
Accession Number: edsarx.2007.02876
Database: arXiv