Report
Class-attention Video Transformer for Engagement Intensity Prediction
Title: | Class-attention Video Transformer for Engagement Intensity Prediction |
---|---|
Authors: | Ai, Xusheng, Sheng, Victor S., Li, Chunhua, Cui, Zhiming |
Publication year: | 2022 |
Collection: | Computer Science |
Subject terms: | Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning |
Description: | To deal with variable-length long videos, prior works extract multi-modal features and fuse them to predict students' engagement intensity. In this paper, we present a new end-to-end method, Class Attention in Video Transformer (CavT), which uses a single vector to process the class embedding and uniformly performs end-to-end learning on both variable-length long videos and fixed-length short videos. Furthermore, to address the lack of sufficient training samples, we propose a binary-order representatives sampling method (BorS) that augments the training set by adding multiple video sequences for each video. BorS+CavT not only achieves the state-of-the-art MSE (0.0495) on the EmotiW-EP dataset, but also obtains the state-of-the-art MSE (0.0377) on the DAiSEE dataset. The code and models have been made publicly available at https://github.com/mountainai/cavt. Comment: 5 figures |
Document type: | Working Paper |
Access URL: | http://arxiv.org/abs/2208.07216 |
Accession number: | edsarx.2208.07216 |
Database: | arXiv |