Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024

Bibliographic Details
Title: Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024
Authors: Thi, Thuy Nguyen, Viet, Anh Nguyen, Van, Thin Dang, Thuy, Ngan Nguyen Luu
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Software Engineering, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Description: This paper describes our systems for Sub-task I of the Software Mention Detection in Scholarly Publications shared task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework: (1) Entity Sentence Classification, which classifies sentences containing potential software mentions; (2) Entity Extraction, which detects mentions within the classified sentences; and (3) Entity Type Classification, which categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. Our XLM-R-based framework achieves a weighted F1-score of 67.80%, placing our team 3rd in Sub-task I for the Software Mention Recognition task.
Comment: Software mention recognition, Named entity recognition, Transformer, Three-stage framework
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2405.01575
Accession Number: edsarx.2405.01575
Database: arXiv
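
The three-stage framework summarized in the description can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the authors' implementation: the function three_stage_ner, the Mention record, the toy lookup table, and the type labels are all hypothetical stand-ins. In the paper each stage is a fine-tuned pre-trained language model (for example XLM-R); here each stage is a plain Python callable so the sketch runs without any dependencies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Mention:
    start: int  # index of the first token of the mention (inclusive)
    end: int    # index one past the last token of the mention (exclusive)
    text: str
    label: str


# Hypothetical stage signatures; real stages would wrap fine-tuned models.
SentenceClassifier = Callable[[List[str]], bool]
SpanExtractor = Callable[[List[str]], List[Tuple[int, int]]]
TypeClassifier = Callable[[List[str], Tuple[int, int]], str]


def three_stage_ner(
    tokens: List[str],
    has_mention: SentenceClassifier,
    extract_spans: SpanExtractor,
    classify_type: TypeClassifier,
) -> List[Mention]:
    """Run the three stages in order on one tokenized sentence."""
    # Stage 1: skip sentences judged to contain no software mention.
    if not has_mention(tokens):
        return []
    mentions = []
    # Stage 2: locate candidate mention spans in the retained sentence.
    for start, end in extract_spans(tokens):
        # Stage 3: assign a software type to each detected span.
        label = classify_type(tokens, (start, end))
        mentions.append(Mention(start, end, " ".join(tokens[start:end]), label))
    return mentions


# Toy stand-ins based on a lookup table. The labels are illustrative
# assumptions, not necessarily the shared task's label set.
KNOWN = {"SPSS": "Application", "Python": "ProgrammingEnvironment"}


def toy_has_mention(tokens: List[str]) -> bool:
    return any(tok in KNOWN for tok in tokens)


def toy_extract_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    return [(i, i + 1) for i, tok in enumerate(tokens) if tok in KNOWN]


def toy_classify_type(tokens: List[str], span: Tuple[int, int]) -> str:
    return KNOWN[tokens[span[0]]]


if __name__ == "__main__":
    sentence = "We analysed the data with SPSS".split()
    for m in three_stage_ner(
        sentence, toy_has_mention, toy_extract_spans, toy_classify_type
    ):
        print(m)
```

Splitting the work this way lets the first stage discard sentences without mentions before span extraction runs, at the cost that an error in an early stage propagates to the later ones.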