Academic Journal

An Analytical Analysis of Text Stemming Methodologies in Information Retrieval and Natural Language Processing Systems

Bibliographic Details
Title: An Analytical Analysis of Text Stemming Methodologies in Information Retrieval and Natural Language Processing Systems
Authors: Abdul Jabbar, Sajid Iqbal, Manzoor Ilahi Tamimy, Amjad Rehman, Saeed Ali Bahaj, Tanzila Saba
Source: IEEE Access, Vol 11, Pp 133681-133702 (2023)
Publisher Information: IEEE, 2023.
Publication Year: 2023
Collection: LCC:Electrical engineering. Electronics. Nuclear engineering
Subject Terms: Text stemming, information retrieval (IR) systems, text classification, stemmer evaluation, technological development, natural language processing (NLP), Electrical engineering. Electronics. Nuclear engineering, TK1-9971
Description: The exponential increase in unstructured digital text creates significant demand for advanced and smart stemming systems. As a preprocessing stage, stemming is applied in various research fields such as information retrieval (IR), domain vocabulary analysis, and feature reduction in many natural language processing (NLP) applications. Text stemming (TS), an important step, can significantly improve performance in such systems. The text-stemming methods developed so far leave room for improvement and can produce errors of different types, degrading the performance of the applications that use them. This work presents a systematic study with an in-depth review of selected stemming works published from 1968 to 2023. It reviews the studied stemming algorithms along several dimensions: methodology, data source, performance, and evaluation methods. For this study, we have chosen different stemmers, which can be categorized as 1) linguistic knowledge-based, 2) statistical, 3) corpus-based, 4) context-sensitive, and 5) hybrid stemmers. The study shows that linguistic knowledge-based stemming techniques were widely used for highly inflected languages (such as Arabic, Hindi, and Urdu) and have reported higher accuracy than other techniques. We compare and analyze the performance of various state-of-the-art TS approaches, including their issues and challenges, which are summarized as research gaps. This work also analyzes different NLP applications utilizing stemming methods. Finally, we list future work directions for interested researchers.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 2169-3536
Relation: https://ieeexplore.ieee.org/document/10318122/; https://doaj.org/toc/2169-3536
DOI: 10.1109/ACCESS.2023.3332710
Access URL: https://doaj.org/article/4a87ad4ce39846af9f33e0946e205ac4
Accession Number: edsdoj.4a87ad4ce39846af9f33e0946e205ac4
Database: Directory of Open Access Journals