Benchmarking a Large Twitter Dataset for Arabic Emotion Analysis

Bibliographic Details
Title: Benchmarking a Large Twitter Dataset for Arabic Emotion Analysis
Authors: Ahmed El-Sayed Mahmoud, Shaimaa Lazem, Mohamed Abougabal
Publication Information: Research Square Platform LLC, 2022.
Publication Year: 2022
Description: Emotion analysis of social media content, e.g., Twitter, is a useful tool for understanding people’s reactions during critical events such as the COVID-19 pandemic. Annotated Arabic emotion datasets are nonetheless scarce, which reduces the accuracy of emotion detection applications and limits their usefulness. During the first year of the pandemic, a large Arabic Egyptian COVID-19 Twitter Dataset (ArECTD) was collected and annotated using a mix of manual annotation and a semi-supervised self-learning technique. ArECTD comprises approximately 78K tweets and ten emotion labels, making it one of the largest available Arabic emotion datasets. The mixed annotation approach is particularly promising for handling the demanding annotation task and potentially growing the number of Arabic emotion datasets. The paper aims to examine the quality of the annotation of ArECTD by measuring the lexical correlates of each emotion. Moreover, content-based analysis was conducted as a practical case study to identify correlations between the detected emotions and the Egyptian government’s decisions during the pandemic. Furthermore, classification models are developed for ArECTD by fine-tuning the state-of-the-art Arabic deep learning transformer models, AraBERT and MARBERT, achieving accuracies of 70.01% and 72.5%, respectively. The generalization of the best ArECTD classification model to two other datasets from different domains was studied using transfer learning.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::ce4cc3da6a523c17127e6aeab07b9cf9
https://doi.org/10.21203/rs.3.rs-2005495/v1
Rights: OPEN
Accession Number: edsair.doi...........ce4cc3da6a523c17127e6aeab07b9cf9
Database: OpenAIRE