Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Bibliographic Details
Title: Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit
Authors: Yang, Greg; Littwin, Etai
Publication Year: 2023
Collection: Computer Science; Mathematics; Condensed Matter
Subject Terms: Computer Science - Machine Learning, Condensed Matter - Disordered Systems and Neural Networks, Computer Science - Neural and Evolutionary Computing, Mathematics - Probability
Description: Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: the same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam, albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie these results: 1) a new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates; 2) the introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers.
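For context on the "nonlinear notion of kernel" mentioned in the description: unlike SGD, whose update is linear in the gradient, Adam processes gradients through entrywise nonlinearities. Below is the standard Adam update rule (from Kingma & Ba, 2015), included only as background on what "processing gradients into updates" means here; it is not reproduced from this record. For gradient $g_t$, learning rate $\eta$, decay rates $\beta_1, \beta_2$, and small constant $\epsilon$:

    $m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t$                    % first moment (momentum)
    $v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2$                   % second moment (entrywise square)
    $\theta_t = \theta_{t-1} - \eta\, \frac{m_t/(1-\beta_1^t)}{\sqrt{v_t/(1-\beta_2^t)} + \epsilon}$   % bias-corrected update

The entrywise square root and division make the update a nonlinear function of the gradients; this is the kind of gradient processing that, per the abstract, the NEXORT language is designed to express.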
Comment: This is the complete version of "Adaptive Optimization in the Infinite-Width Limit" in ICLR 2023, https://openreview.net/forum?id=zgVDqw9ZUES
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2308.01814
Accession Number: edsarx.2308.01814
Database: arXiv