Indexing Portuguese NLP Resources with PT-Pump-Up

Bibliographic Details
Title: Indexing Portuguese NLP Resources with PT-Pump-Up
Authors: Almeida, Rúben; Campos, Ricardo; Jorge, Alípio; Nunes, Sérgio
Source: PROPOR 2024
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Information Retrieval, 68P20, I.7.1
Description: The recent advances in natural language processing (NLP) are linked to training processes that require vast corpora. Access to this data is often non-trivial because resources are dispersed and the infrastructures hosting them must be kept online and up to date. New developments in NLP are often compromised by the scarcity of data or the lack of a shared repository that serves as an entry point for the community. This is especially true for low- and mid-resource languages, such as Portuguese, which lack data and proper resource management infrastructures. In this work, we propose PT-Pump-Up, a set of tools that aims to reduce resource dispersion and improve the accessibility of Portuguese NLP resources. Our proposal is divided into four software components: a) a web platform to list the available resources; b) a client-side Python package to simplify the loading of Portuguese NLP resources; c) an administrative Python package to manage the platform; and d) a public GitHub repository to foster future collaboration and contributions. All four components are accessible at: https://linktr.ee/pt_pump_up
Comment: Demo Track, 3 pages
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2401.15400
Accession Number: edsarx.2401.15400
Database: arXiv