Academic Journal

Stochasticity, Nonlinear Value Functions, and Update Rules in Learning Aesthetic Biases

Bibliographic Details
Title: Stochasticity, Nonlinear Value Functions, and Update Rules in Learning Aesthetic Biases
Authors: Norberto M. Grzywacz
Source: Frontiers in Human Neuroscience, Vol 15 (2021)
Publisher Information: Frontiers Media S.A., 2021.
Publication Year: 2021
Collection: LCC: Neurosciences. Biological psychiatry. Neuropsychiatry
Subject Terms: reinforcement learning, aesthetic value, value function, delta rule, regret minimization, stochastic dynamics, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
Description: A theoretical framework for the reinforcement learning of aesthetic biases was recently proposed based on brain circuitries revealed by neuroimaging. A model grounded in that framework accounted for interesting features of human aesthetic biases. These features included individuality, cultural predispositions, stochastic dynamics of learning and aesthetic biases, and the peak-shift effect. However, despite the success in explaining these features, a potential weakness was the linearity of the value function used to predict reward. This linearity meant that the learning process employed a value function that assumed a linear relationship between reward and sensory stimuli. Linearity is common in reinforcement learning in neuroscience. However, linearity can be problematic because neural mechanisms and the dependence of reward on sensory stimuli are typically nonlinear. Here, we analyze the learning performance with models including optimal nonlinear value functions. We also compare updating the free parameters of the value functions with the delta rule, which neuroscience models use frequently, vs. updating with a new Phi rule that considers the structure of the nonlinearities. Our computer simulations showed that optimal nonlinear value functions resulted in improvements in learning errors when the reward models were nonlinear. Similarly, the new Phi rule led to improvements in these errors. These improvements were accompanied by the straightening of the trajectories of the vector of free parameters in its phase space. This straightening meant that the process became more efficient in learning the prediction of reward. Surprisingly, however, this improved efficiency had a complex relationship with the rate of learning. Finally, the stochasticity arising from the probabilistic sampling of sensory stimuli, rewards, and motivations helped the learning process narrow the range of free parameters to nearly optimal outcomes. Therefore, we suggest that value functions and update rules optimized for social and ecological constraints are ideal for learning aesthetic biases.
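Illustrative note: the abstract refers to the delta rule applied to a linear value function. The sketch below shows only the generic textbook form of that rule, w <- w + lr * (reward - w.x) * x, with stochastically sampled stimuli and noisy rewards. It is not the article's model and does not attempt the Phi rule, which the abstract does not define. The function names, learning rate, noise level, and "true" weights are all hypothetical choices made for illustration.

```python
import random


def delta_rule_step(weights, stimulus, reward, learning_rate):
    """One delta-rule update for a linear value function v(x) = w . x."""
    predicted = sum(w * x for w, x in zip(weights, stimulus))
    error = reward - predicted  # reward-prediction error
    return [w + learning_rate * error * x for w, x in zip(weights, stimulus)]


def train(true_weights, steps=5000, learning_rate=0.05, noise=0.1, seed=0):
    """Learn weights from randomly sampled stimuli and noisy linear rewards.

    All parameter values here are assumptions for illustration only.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(true_weights)
    for _ in range(steps):
        # Stochastic sampling of the sensory stimulus.
        stimulus = [rng.uniform(-1.0, 1.0) for _ in true_weights]
        # Reward is linear in the stimulus plus Gaussian noise (assumed).
        reward = sum(t * x for t, x in zip(true_weights, stimulus))
        reward += rng.gauss(0.0, noise)
        weights = delta_rule_step(weights, stimulus, reward, learning_rate)
    return weights


learned = train([0.8, -0.3])
print(learned)
```

Under these assumptions the learned weights should settle near the hypothetical true weights, fluctuating around them because of the sampling noise.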
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1662-5161
Relation: https://www.frontiersin.org/articles/10.3389/fnhum.2021.639081/full; https://doaj.org/toc/1662-5161
DOI: 10.3389/fnhum.2021.639081
Access URL: https://doaj.org/article/b2edfe9edf6b405ca240067ac1296682
Accession Number: edsdoj.b2edfe9edf6b405ca240067ac1296682
Database: Directory of Open Access Journals