Report
Transformers, parallel computation, and logarithmic depth
| Field | Value |
|---|---|
| Title | Transformers, parallel computation, and logarithmic depth |
| Authors | Sanford, Clayton; Hsu, Daniel; Telgarsky, Matus |
| Publication year | 2024 |
| Collection | Computer Science |
| Subject terms | Computer Science - Machine Learning |
| Description | We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers. Comment: 58 pages, 19 figures; code available at https://github.com/chsanford/hop-induction-heads |
| Document type | Working Paper |
| Access URL | http://arxiv.org/abs/2402.09268 |
| Accession number | edsarx.2402.09268 |
| Database | arXiv |
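
The linked repository name suggests the paper's central task family is a k-hop generalization of the "induction heads" task. As a rough illustration only, the sketch below gives one plausible plain-Python reference implementation of such a task; the function names, the exact task definition, and the example are assumptions made here, not taken from the paper itself.

```python
# Hypothetical sketch: a plain-Python reference for a k-hop "induction heads"
# task, assuming it iterates the standard induction-head lookup k times.
# The exact definition used in the paper may differ; see the linked repo.
from typing import Optional

def one_hop(s: str, i: int) -> Optional[int]:
    """Index of the token following the most recent earlier occurrence of s[i]."""
    for j in range(i - 1, 0, -1):
        if s[j - 1] == s[i]:
            return j
    return None

def k_hop(s: str, i: int, k: int) -> Optional[str]:
    """Apply the induction-head lookup k times starting from position i."""
    for _ in range(k):
        nxt = one_hop(s, i)
        if nxt is None:
            return None
        i = nxt
    return s[i]

# Example: in "ababab", the previous 'b' before the final one sits at index 3,
# so the 1-hop answer at the last position is the following token, 'a'.
print(k_hop("ababab", 5, 1))  # -> 'a'
```

Under this reading, a k-hop query chains k sequential lookups, which is consistent with the abstract's claim: a model whose layers can resolve hops in parallel (halving the remaining chain per layer, say) would need only logarithmic depth in k.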