Emotion-Aware Multimodal Fusion for Meme Emotion Detection

Bibliographic Details
Title: Emotion-Aware Multimodal Fusion for Meme Emotion Detection
Authors: Sharma, Shivam; S, Ramaneswaran; Akhtar, Md. Shad; Chakraborty, Tanmoy
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computers and Society
Description: The ever-evolving social media discourse has witnessed an overwhelming use of memes to express opinions or dissent. Besides being misused for spreading malcontent, they are mined by corporations and political parties to glean the public's opinion. Therefore, memes predominantly offer affect-enriched insights towards ascertaining the societal psyche. However, the current approaches are yet to model the affective dimensions expressed in memes effectively. They rely extensively on large multimodal datasets for pre-training and do not generalize well due to constrained visual-linguistic grounding. In this paper, we introduce MOOD (Meme emOtiOns Dataset), which embodies six basic emotions. We then present ALFRED (emotion-Aware muLtimodal Fusion foR Emotion Detection), a novel multimodal neural framework that (i) explicitly models emotion-enriched visual cues, and (ii) employs an efficient cross-modal fusion via a gating mechanism. Our investigation establishes ALFRED's superiority over existing baselines by 4.94% F1. Additionally, ALFRED competes strongly with previous best approaches on the challenging Memotion task. We then discuss ALFRED's domain-agnostic generalizability by demonstrating its dominance over other baselines on two recently released datasets, HarMeme and Dank Memes. Further, we analyze ALFRED's interpretability using attention maps. Finally, we highlight the inherent challenges posed by the complex interplay of disparate modality-specific cues toward meme analysis.
Comment: Accepted to IEEE Transactions on Affective Computing
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2403.10279
Accession Number: edsarx.2403.10279
Database: arXiv