Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction

التفاصيل البيبلوغرافية
العنوان: Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction
المؤلفون: Hallmen, Tobias, Deuser, Fabian, Oswald, Norbert, André, Elisabeth
المصدر: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 4657-4665
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
الوصف: In this research, we introduce a novel methodology for assessing Emotional Mimicry Intensity (EMI) as part of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our methodology utilises the Wav2Vec 2.0 architecture, which has been pre-trained on an extensive podcast dataset, to capture a wide array of audio features that include both linguistic and paralinguistic components. We refine our feature extraction process by employing a fusion technique that combines individual features with a global mean vector, thereby embedding a broader contextual understanding into our analysis. A key aspect of our approach is the multi-task fusion strategy that not only leverages these features but also incorporates a pre-trained Valence-Arousal-Dominance (VAD) model. This integration is designed to refine emotion intensity prediction by concurrently processing multiple emotional dimensions, thereby embedding a richer contextual understanding into our framework. For the temporal analysis of audio data, our feature fusion process utilises a Long Short-Term Memory (LSTM) network. This approach, which relies solely on the provided audio data, shows marked advancements over the existing baseline, offering a more comprehensive understanding of emotional mimicry in naturalistic settings, achieving the second place in the EMI challenge.
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2403.11879
رقم الأكسشن: edsarx.2403.11879
قاعدة البيانات: arXiv