Academic Journal

Image Captioning Model Using Part-of-Speech Guidance Module for Description With Diverse Vocabulary

Bibliographic Details
Title: Image Captioning Model Using Part-of-Speech Guidance Module for Description With Diverse Vocabulary
Authors: Ju-Won Bae, Soo-Hwan Lee, Won-Yeol Kim, Ju-Hyeon Seong, Dong-Hoan Seo
Source: IEEE Access, Vol 10, Pp 45219-45229 (2022)
Publication Information: IEEE, 2022.
Publication Year: 2022
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms: Deep learning, image captioning, multimodal layer, part of speech, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description: Image captioning aims to generate human-like sentences that describe an image's content. Recent developments in deep learning (DL) have made it possible to caption images with accurate descriptions and detailed expressions. However, because DL learns the relationship between images and captions, it constructs sentences from the words that occur most frequently in the dataset. Although the generated sentences are highly accurate, their limited vocabulary gives them lower lexical diversity than human-written captions. Therefore, in this paper, we propose a Part-Of-Speech (POS) guidance module and a multimodal-based image captioning model that determines the intensity of image and word-sequence information and generates sentences through POS to enhance the lexical diversity of DL. The proposed POS guidance module enables rich expression by controlling, based on the predicted POS, the image and sequence information used to predict words. Then, the POS multimodal layer adds the POS and the output vector of the Bi-LSTM to predict the next word of the caption, taking grammatical structure into account. We trained and tested the proposed model on the Flickr 30K and MS COCO datasets and compared it with current state-of-the-art studies. We also analyzed the lexical diversity of the captioning model using the Type-Token Ratio (TTR) and confirmed that the proposed model generates sentences using a wider variety of words.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2169-3536
Relation: https://ieeexplore.ieee.org/document/9762317/; https://doaj.org/toc/2169-3536
DOI: 10.1109/ACCESS.2022.3169781
Access URL: https://doaj.org/article/2ee00626c77b486aa0830d8d206a9f6a
Accession Number: edsdoj.2ee00626c77b486aa0830d8d206a9f6a
Database: Directory of Open Access Journals
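
Note on the Type-Token Ratio (TTR) mentioned in the description: TTR is commonly defined as the number of distinct words (types) divided by the total number of words (tokens). The following is a minimal sketch of that general definition, not the authors' evaluation code. The function name `type_token_ratio`, the lowercasing, and the whitespace tokenization are assumptions made for illustration; this record does not state how the paper tokenized its captions.

```python
def type_token_ratio(captions):
    """Return distinct-word count divided by total word count over all captions.

    Assumption: words are lowercased and split on whitespace; the paper's
    actual tokenization is not given in this record.
    """
    tokens = [word for caption in captions for word in caption.lower().split()]
    if not tokens:
        return 0.0  # no words at all, so define the ratio as zero
    return len(set(tokens)) / len(tokens)


# Hypothetical captions, used only to illustrate the calculation.
example_captions = ["a dog runs on the grass", "a dog sits on the grass"]
example_ttr = type_token_ratio(example_captions)  # 7 types / 12 tokens
```

A higher value means less word repetition across the generated captions. Because TTR tends to fall as the amount of text grows, comparisons are usually meaningful only between caption sets of similar size.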