Granularity-aware and Semantic Aggregation Based Image-Text Retrieval Network

Bibliographic Details
Title:	Granularity-aware and Semantic Aggregation Based Image-Text Retrieval Network
Authors:	MIAO Lan-xin, LEI Yu, ZENG Peng-peng, LI Xiao-yu, SONG Jing-kuan
Source:	Jisuanji kexue, Vol 49, Iss 11, Pp 134-140 (2022)
Publisher Information:	Editorial office of Computer Science, 2022.
Publication Year:	2022
Collection:	LCC:Computer software LCC:Technology (General)
Subject Terms:	image-text matching, cross-modal retrieval, feature extraction, semantic aggregation, multi-granularity information extraction, Computer software, QA76.75-76.765, Technology (General), T1-995
Description:	Image-text retrieval is a fundamental task in the vision-language domain that aims to mine the relationships between different modalities. However, most existing approaches rely heavily on associating specific image regions with semantically similar words in a sentence and underestimate the significance of multi-granularity information in images, which leads to irrelevant matches between the two modalities and semantically ambiguous embeddings. In general, an image contains object-level, action-level, relationship-level, or even scene-level information that is not explicitly labeled, which makes it challenging to align complex visual information with ambiguous descriptions. To tackle this issue, this paper proposes a granularity-aware and semantic aggregation (GASA) network that obtains multi-granularity visual representations and narrows the cross-modal gap. Specifically, the granularity-aware feature selection module selects rich multi-granularity information from images and performs multi-scale fusion, guided by an adaptive gated fusion mechanism and a pyramid structure. The semantic aggregation module clusters the multi-granularity information from visual and textual clues in a shared space to obtain residual representations. Experiments on two benchmark datasets show that the model outperforms state-of-the-art methods by more than 2% in R@1 on MSCOCO 1K and by 4.1% in R@Sum on Flickr30k.
Document Type:	article
File Description:	electronic resource
Language:	Chinese
ISSN:	1002-137X
Relation:	https://www.jsjkx.com/fileup/1002-137X/PDF/1002-137X-2022-49-11-134.pdf; https://doaj.org/toc/1002-137X
DOI:	10.11896/jsjkx.220600010
Access URL:	https://doaj.org/article/46e2dda91c5c433cbb2557024e4060ab
Accession Number:	edsdoj.46e2dda91c5c433cbb2557024e4060ab
Database:	Directory of Open Access Journals
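The Description field names an adaptive gated fusion mechanism but does not say how it is parameterized. As an illustration only, the following Python sketch assumes a common form of gated fusion in which a sigmoid gate, computed from both inputs, decides per dimension how much of each feature vector to keep. The function names and the weight matrices `w_a`, `w_b`, and `bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_fusion(a, b, w_a, w_b, bias):
    """Blend two same-length feature vectors with a per-dimension gate.

    Assumed form: gate = sigmoid(W_a a + W_b b + bias),
    output = gate * a + (1 - gate) * b.
    w_a and w_b are square matrices given as lists of rows
    (hypothetical learned weights).
    """
    if len(a) != len(b):
        raise ValueError("features must have the same dimension")
    fused = []
    for i in range(len(a)):
        # Pre-activation for dimension i combines both inputs and a bias.
        z = (sum(w * x for w, x in zip(w_a[i], a))
             + sum(w * x for w, x in zip(w_b[i], b))
             + bias[i])
        g = sigmoid(z)
        # g near 1 keeps feature a; g near 0 keeps feature b.
        fused.append(g * a[i] + (1.0 - g) * b[i])
    return fused
```

With all weights and biases set to zero the gate is 0.5 in every dimension, so `gated_fusion([1.0, 2.0], [3.0, 0.0], zeros, zeros, [0.0, 0.0])` returns the element-wise average `[2.0, 1.0]`. In a pyramid-style multi-scale design, a gate of this kind could be applied repeatedly to merge features from adjacent scales, though the paper's exact arrangement is not given here.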
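The semantic aggregation module is described as clustering features in a shared space to obtain residual representations. The sketch below assumes a NetVLAD-style formulation, a well-known residual aggregation technique that may differ from GASA's actual design. Each local feature is softly assigned to a set of cluster centers, and the weighted residuals to each center are summed and L2-normalized. It is simplified (no per-cluster intra-normalization), and the names `residual_aggregate`, `centers`, and `alpha` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def residual_aggregate(features, centers, alpha=1.0):
    """NetVLAD-style residual aggregation (simplified, assumed formulation).

    features: list of local feature vectors (e.g. region or word features).
    centers:  list of K cluster centers with the same dimension D.
    Returns a flattened K*D vector of soft-assigned residuals, L2-normalized.
    """
    dim = len(centers[0])
    agg = [[0.0] * dim for _ in centers]
    for x in features:
        # Soft assignment: softmax of negative scaled squared distances.
        scores = [-alpha * sum((xd - cd) ** 2 for xd, cd in zip(x, c))
                  for c in centers]
        weights = softmax(scores)
        # Accumulate each cluster's weighted residual (feature minus center).
        for k, c in enumerate(centers):
            for d in range(dim):
                agg[k][d] += weights[k] * (x[d] - c[d])
    flat = [v for row in agg for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    # Leave an all-zero vector unchanged to avoid division by zero.
    return [v / norm for v in flat] if norm > 0 else flat
```

Passing image-region features and word features through the same `centers` is one way to place both modalities in a common space; a dot product between the two aggregated vectors could then serve as an image-text similarity score.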
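The results in the Description field are reported as R@1 and R@Sum. Under the usual image-text retrieval protocol, R@K is the percentage of queries whose ground-truth match appears among the top K results, and R@Sum adds R@1, R@5, and R@10 for both image-to-text and text-to-image retrieval. The sketch below computes these from an image-caption similarity matrix, assuming five captions per image (as in MSCOCO and Flickr30k) stored contiguously by image; `retrieval_recalls` is a hypothetical helper name.

```python
def retrieval_recalls(sim, captions_per_image=5, ks=(1, 5, 10)):
    """Compute R@K in both directions and R@Sum from a similarity matrix.

    sim[i][j] is the similarity between image i and caption j; captions
    i*captions_per_image .. (i+1)*captions_per_image - 1 belong to image i.
    """
    n_img, n_cap = len(sim), len(sim[0])
    if n_cap != n_img * captions_per_image:
        raise ValueError("expected captions_per_image captions for every image")

    # Image-to-text: 0-based rank of the best-ranked ground-truth caption.
    i2t_ranks = []
    for i in range(n_img):
        order = sorted(range(n_cap), key=lambda j: -sim[i][j])
        first_gt = i * captions_per_image
        gt = range(first_gt, first_gt + captions_per_image)
        i2t_ranks.append(min(order.index(j) for j in gt))

    # Text-to-image: 0-based rank of the single ground-truth image.
    t2i_ranks = []
    for j in range(n_cap):
        order = sorted(range(n_img), key=lambda i: -sim[i][j])
        t2i_ranks.append(order.index(j // captions_per_image))

    # R@K as a percentage of queries with a hit in the top K.
    i2t = {k: 100.0 * sum(r < k for r in i2t_ranks) / n_img for k in ks}
    t2i = {k: 100.0 * sum(r < k for r in t2i_ranks) / n_cap for k in ks}
    r_sum = sum(i2t.values()) + sum(t2i.values())
    return i2t, t2i, r_sum
```

With a perfect similarity matrix every recall is 100, so R@Sum reaches its maximum of 600; an improvement "on R@Sum" is therefore a gain in this six-term total rather than in any single recall value.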
