Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network

Bibliographic Details
Title: Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network
Authors: Ying, Lance; Romana, Amrit; Provost, Emily Mower
Publication Year: 2021
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning, Computer Science - Artificial Intelligence, Computer Science - Human-Computer Interaction
Description: In recent years, deep-learning-based speech emotion recognition models have outperformed classical machine learning models. Previously, neural network designs, such as Multitask Learning, have accounted for variations in emotional expressions due to demographic and contextual factors. However, existing models face a few constraints: 1) they rely on a clear definition of domains (e.g., gender, noise condition) and the availability of domain labels; 2) they often attempt to learn domain-invariant features, while emotion expressions can be domain-specific. In the present study, we propose the Nonparametric Hierarchical Neural Network (NHNN), a lightweight hierarchical neural network model based on Bayesian nonparametric clustering. In comparison to Multitask Learning approaches, the proposed model does not require domain/task labels. In our experiments, the NHNN models generally outperform the models with similar levels of complexity and state-of-the-art models in within-corpus and cross-corpus tests. Through clustering analysis, we show that the NHNN models are able to learn group-specific features and bridge the performance gap between groups.
Comment: 9 pages, manuscript under peer review
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2109.04316
Accession Number: edsarx.2109.04316
Database: arXiv