Transformers, parallel computation, and logarithmic depth

Bibliographic Details
Title: Transformers, parallel computation, and logarithmic depth
Authors: Sanford, Clayton; Hsu, Daniel; Telgarsky, Matus
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning
Description: We show that a constant number of self-attention layers can efficiently simulate, and be simulated by, a constant number of communication rounds of Massively Parallel Computation. As a consequence, we show that logarithmic depth is sufficient for transformers to solve basic computational tasks that cannot be efficiently solved by several other neural sequence models and sub-quadratic transformer approximations. We thus establish parallelism as a key distinguishing property of transformers. (An illustrative sketch of the logarithmic-depth idea follows this record.)
Comment: 58 pages, 19 figures, code available at https://github.com/chsanford/hop-induction-heads
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2402.09268
Accession Number: edsarx.2402.09268
Database: arXiv
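
Note: the description above ties constant-depth self-attention to constant rounds of parallel computation, and the linked repository name (hop-induction-heads) suggests the paper's running example is an iterated pointer-chasing ("k-hop") task. The Python below is a minimal illustrative sketch, not the authors' code or construction: it demonstrates the standard pointer-doubling idea, under which k-fold composition takes only about log2(k) composition rounds rather than k sequential steps, mirroring why logarithmic depth can suffice for such tasks.

# Illustrative sketch (hypothetical, not from the paper's repository):
# pointer doubling computes k hops along a pointer array in O(log k)
# composition rounds, the classic source of logarithmic-depth bounds.

def compose(ptr, qtr):
    """Compose two pointer arrays: result[i] = qtr[ptr[i]]."""
    return [qtr[p] for p in ptr]

def k_hop(ptr, k):
    """Follow `ptr` exactly k times from every index, using binary
    exponentiation of composition: O(log k) rounds, not k steps."""
    result = list(range(len(ptr)))  # identity permutation: zero hops
    power = ptr                     # ptr composed with itself 2**j times
    while k > 0:
        if k & 1:
            result = compose(result, power)
        power = compose(power, power)  # one "round": double the reach
        k >>= 1
    return result

if __name__ == "__main__":
    ptr = [2, 0, 3, 1]              # a toy pointer chain
    print(k_hop(ptr, 5))            # matches hopping 5 times sequentially

In this sketch each call to compose plays the role of one parallel round (or one attention layer in the paper's simulation), since every index is updated simultaneously and independently; the loop body runs only ceil(log2(k)) + 1 times.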