Report
Motion and Context-Aware Audio-Visual Conditioned Video Prediction
Title: | Motion and Context-Aware Audio-Visual Conditioned Video Prediction |
---|---|
Authors: | Xu, Yating; Hu, Conghui; Lee, Gim Hee |
Publication Year: | 2022 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computer Vision and Pattern Recognition |
Description: | The existing state-of-the-art method for audio-visual conditioned video prediction uses the latent codes of the audio-visual frames from a multimodal stochastic network and a frame encoder to predict the next visual frame. However, directly inferring per-pixel intensity for the next visual frame is extremely challenging because of the high-dimensional image space. To this end, we decouple audio-visual conditioned video prediction into motion and appearance modeling. The multimodal motion estimation predicts future optical flow based on the audio-motion correlation. The visual branch recalls from the motion memory built from the audio features to enable better long-term prediction. We further propose context-aware refinement to address the loss of global appearance context during long-term continuous warping. The global appearance context is extracted by the context encoder and manipulated by a motion-conditioned affine transformation before fusion with the features of the warped frames. Experimental results show that our method achieves competitive results on existing benchmarks. Comment: BMVC 2023 |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2212.04679 |
Accession Number: | edsarx.2212.04679 |
Database: | arXiv |
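The description mentions two operations that may be easier to picture with a toy example: warping a frame with optical flow, and applying an affine transformation to appearance context. The following is a minimal sketch only, not the authors' implementation. It assumes a single-channel frame stored as a list of rows, a backward-warping convention with bilinear sampling and clamped borders, and scalar `gamma` and `beta` values standing in for parameters that the paper derives from motion. The function names `warp_frame` and `affine_modulate` are hypothetical.

```python
import math


def warp_frame(frame, flow):
    """Backward-warp a grayscale frame with a per-pixel flow field.

    Assumption: flow[y][x] = (dx, dy) means output pixel (x, y) samples the
    input frame at (x + dx, y + dy), using bilinear interpolation. Sample
    positions outside the frame are clamped to the border.
    """
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the valid image area.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            ax, ay = sx - x0, sy - y0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = (1 - ax) * frame[y0][x0] + ax * frame[y0][x1]
            bottom = (1 - ax) * frame[y1][x0] + ax * frame[y1][x1]
            out[y][x] = (1 - ay) * top + ay * bottom
    return out


def affine_modulate(context, gamma, beta):
    """Apply an element-wise affine transformation gamma * c + beta.

    Assumption: gamma and beta are plain scalars here; in the paper they are
    conditioned on motion, which this sketch does not model.
    """
    return [[gamma * c + beta for c in row] for row in context]


frame = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# Uniform flow of one pixel to the right: each output pixel reads its right neighbour.
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
warped = warp_frame(frame, flow)
modulated = affine_modulate([[1.0, 2.0]], gamma=2.0, beta=0.5)
```

Repeated warping of this kind only moves existing pixel values around, which is one way to see why the description speaks of global appearance context being lost over long horizons and needing a separate refinement step.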