CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature

Bibliographic Details
Title: CoastTerm: a Corpus for Multidisciplinary Term Extraction in Coastal Scientific Literature
Authors: Delaunay, Julien; Tran, Hanh Thi Hong; González-Gallardo, Carlos-Emiliano; Bordea, Georgeta; Ducos, Mathilde; Sidere, Nicolas; Doucet, Antoine; Pollak, Senja; De Viron, Olivier
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: The growing impact of climate change on coastal areas, particularly active but fragile regions, necessitates collaboration among diverse stakeholders and disciplines to formulate effective environmental protection policies. We introduce a novel specialized corpus comprising 2,491 sentences from 410 scientific abstracts concerning coastal areas, for the Automatic Term Extraction (ATE) and Classification (ATC) tasks. Inspired by the ARDI framework, focused on the identification of Actors, Resources, Dynamics and Interactions, we automatically extract domain terms and their distinct roles in the functioning of coastal systems by leveraging monolingual and multilingual transformer models. The evaluation demonstrates consistent results, achieving an F1 score of approximately 80% for automated term extraction and F1 of 70% for extracting terms and their labels. These findings are promising and signify an initial step towards the development of a specialized Knowledge Base dedicated to coastal areas.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2406.09128
Accession Number: edsarx.2406.09128
Database: arXiv