Describing emotions with acoustic property prompts for speech emotion recognition

Bibliographic details
Title: Describing emotions with acoustic property prompts for speech emotion recognition
Authors: Dhamyal, Hira; Elizalde, Benjamin; Deshmukh, Soham; Wang, Huaming; Raj, Bhiksha; Singh, Rita
Publication year: 2022
Collection: Computer Science
Subject terms: Computer Science - Sound, Computer Science - Machine Learning, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: Emotions lie on a broad continuum, and treating them as a discrete set of classes limits a model's ability to capture the nuances of that continuum. The challenge is how to describe these nuances and how to enable a model to learn the descriptions. In this work, we devise a method to automatically create a description (or prompt) for a given audio clip by computing acoustic properties such as pitch, loudness, speech rate, and articulation rate. We pair each prompt with its corresponding audio using 5 different emotion datasets and train a neural network model on these audio-text pairs. We then evaluate the model on one additional dataset. We investigate how the model learns to associate the audio with the descriptions, which results in improved performance on Speech Emotion Recognition and Speech Audio Retrieval. We expect our findings to motivate research describing the broad continuum of emotion.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2211.07737
Accession number: edsarx.2211.07737
Database: arXiv
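
Illustrative note: the description in this record mentions creating a text prompt from acoustic properties (pitch, loudness, speech rate, articulation rate). The following is a minimal sketch of that general idea, not the authors' implementation. The threshold values, the three-level binning (low/medium/high), the prompt wording, and the names `THRESHOLDS`, `level`, and `make_prompt` are all assumptions made for illustration; the record does not specify any of them.

```python
# Hypothetical (low, high) cut-offs per acoustic property.
# These numbers are illustrative assumptions, not values from the paper.
THRESHOLDS = {
    "pitch": (120.0, 200.0),           # Hz
    "loudness": (55.0, 70.0),          # dB
    "speech rate": (3.0, 5.0),         # syllables per second, pauses included
    "articulation rate": (4.0, 6.0),   # syllables per second, pauses excluded
}


def level(value, low, high):
    """Map a numeric value to a coarse label using two cut-offs."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"


def make_prompt(properties):
    """Build a text description from a dict of precomputed acoustic properties.

    The sentence template is an assumption chosen for readability.
    """
    parts = [
        f"{level(properties[name], low, high)} {name}"
        for name, (low, high) in THRESHOLDS.items()
    ]
    return "this person is speaking with " + ", ".join(parts[:-1]) + " and " + parts[-1]


if __name__ == "__main__":
    example = {
        "pitch": 250.0,
        "loudness": 50.0,
        "speech rate": 4.0,
        "articulation rate": 7.0,
    }
    print(make_prompt(example))
```

In this sketch the acoustic values are assumed to be precomputed by some signal-processing step; the resulting string could then be paired with its audio clip to form an audio-text training pair of the kind the description refers to.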