Visio-Linguistic Brain Encoding

Bibliographic Details
Title: Visio-Linguistic Brain Encoding
Authors: Reddy Oota, Subba; Arora, Jashn; Rowtula, Vijay; Gupta, Manish; Bapi, Raju S.
Contributors: Laboratoire Bordelais de Recherche en Informatique (LaBRI), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS), Mnemonic Synergy (Mnemosyne), Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Université de Bordeaux (UB)-École Nationale Supérieure d'Électronique, Informatique et Radiocommunications de Bordeaux (ENSEIRB)-Centre National de la Recherche Scientifique (CNRS)-Inria Bordeaux - Sud-Ouest, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Institut des Maladies Neurodégénératives [Bordeaux] (IMN), Université de Bordeaux (UB)-Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS), International Institute of Information Technology, Hyderabad [Hyderabad] (IIIT-H), Microsoft Research (MSR)
Source: COLING 2022-the 29th International Conference on Computational Linguistics, Oct 2022, Gyeongju, South Korea. pp.116-133
Publication Year: 2022
Subject Terms: FOS: Computer and information sciences, Computer Science - Machine Learning, Computer Science - Computation and Language, Computer Science - Artificial Intelligence, Computer Vision and Pattern Recognition (cs.CV), Computer Science - Computer Vision and Pattern Recognition, [INFO.INFO-CV]Computer Science [cs]/Computer Vision and Pattern Recognition [cs.CV], [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL], Machine Learning (cs.LG), Artificial Intelligence (cs.AI), FOS: Biological sciences, Quantitative Biology - Neurons and Cognition, Neurons and Cognition (q-bio.NC), Computation and Language (cs.CL)
Description: Enabling effective brain-computer interfaces requires understanding how the human brain encodes stimuli across modalities such as vision and language (text). Brain encoding aims to construct fMRI brain activity given a stimulus. There exists a plethora of neural encoding models that study brain encoding for single-mode stimuli: visual (pretrained CNNs) or text (pretrained language models). A few recent papers have also obtained separate visual and text representation models and performed late fusion using simple heuristics. However, previous work has not explored (a) the effectiveness of image Transformer models for encoding visual stimuli, or (b) co-attentive multi-modal modeling for visual and text reasoning. In this paper, we systematically explore the efficacy of image Transformers (ViT, DEiT, and BEiT) and multi-modal Transformers (VisualBERT, LXMERT, and CLIP) for brain encoding. Extensive experiments on two popular datasets, BOLD5000 and Pereira, provide the following insights. (1) To the best of our knowledge, we are the first to investigate the effectiveness of image and multi-modal Transformers for brain encoding. (2) We find that VisualBERT, a multi-modal Transformer, significantly outperforms previously proposed single-mode CNNs, image Transformers, and other previously proposed multi-modal models, thereby establishing a new state of the art. The superiority of visio-linguistic models raises the question of whether the responses elicited in visual regions are implicitly affected by linguistic processing even during passive viewing of images. Future fMRI tasks can verify this computational insight in an appropriate experimental setting.
18 pages, 13 figures
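Illustrative note: the description above defines brain encoding as predicting fMRI activity from a stimulus. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes a ridge-regression mapping from stimulus features to voxel responses and a voxel-wise Pearson correlation score; the record specifies neither, so both are assumptions. The arrays are synthetic stand-ins for model embeddings and fMRI responses, and the helper names `fit_ridge` and `voxelwise_pearson` are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ Y)


def voxelwise_pearson(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per voxel."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins (assumption): rows are stimuli, X columns are
# embedding dimensions, Y columns are voxels.
rng = np.random.default_rng(0)
n_train, n_test, n_features, n_voxels = 200, 50, 32, 10
W_true = rng.normal(size=(n_features, n_voxels))
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
Y_train = X_train @ W_true + 0.5 * rng.normal(size=(n_train, n_voxels))
Y_test = X_test @ W_true + 0.5 * rng.normal(size=(n_test, n_voxels))

# Fit the encoder on training stimuli, then score held-out predictions.
W = fit_ridge(X_train, Y_train, alpha=1.0)
scores = voxelwise_pearson(Y_test, X_test @ W)
print("mean voxel-wise Pearson r:", float(scores.mean()))
```

In a real setting, `X_train` would hold features extracted from a pretrained model for each stimulus, and `alpha` would typically be chosen by cross-validation.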
Language: English
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ac1fc344ffd31ab6560a7c444a0de007
http://arxiv.org/abs/2204.08261
Rights: OPEN
Accession Number: edsair.doi.dedup.....ac1fc344ffd31ab6560a7c444a0de007
Database: OpenAIRE