Activation function design for deep networks: linearity and effective initialisation

Bibliographic details
Title: Activation function design for deep networks: linearity and effective initialisation
Authors: Murray, Michael; Abrol, Vinayak; Tanner, Jared
Publication year: 2021
Collection: Computer Science
Subject terms: Computer Science - Machine Learning, 68T07, I.2.6
Description: The activation function deployed in a deep neural network has great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance $\sigma_b^2$ of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy as well as training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact on training. Finally, our results also allow us to train networks in a new hyperparameter regime, with a much larger bias variance than has previously been possible.
Comment: 33 pages, 10 figures, paper code and scripts are hosted at https://github.com/Cross-Caps/AFLI
Document type: Working Paper
Access URL: http://arxiv.org/abs/2105.07741
Accession number: edsarx.2105.07741
Database: arXiv
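The abstract's central idea, an activation function with a large linear region around the origin, can be sketched as below. This is an illustrative example only: the function name `linearized_tanh`, the half-width parameter `s`, and the choice of a shifted `tanh` outside the linear region are assumptions for demonstration, not the paper's exact construction (see the linked repository for the authors' code).

```python
import numpy as np

def linearized_tanh(x, s=1.0):
    """Illustrative activation: identity on [-s, s], continuing as a
    shifted tanh outside. Since tanh'(0) = 1, the pieces join with a
    continuous first derivative at |x| = s."""
    x = np.asarray(x, dtype=float)
    return np.where(
        np.abs(x) <= s,
        x,  # linear region around the origin
        np.sign(x) * (s + np.tanh(np.abs(x) - s)),  # saturating tails
    )
```

Enlarging `s` widens the region where the map is exactly linear, which per the abstract is what counteracts correlation collapse and vanishing/exploding gradients at initialisation, relative to the bias variance $\sigma_b^2$.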