A rule and statistical modeling based stem extraction method for Kazakh words

Bibliographic details
Title: A rule and statistical modeling based stem extraction method for Kazakh words
Authors: Askar Hamdulla, Rehmutulla Memet, Gulnigar Mahmut, Mewlude Nijat
Source: IALP
Publication info: IEEE, 2017.
Publication year: 2017
Subject terms: Agglutinative language, Computer science, Affix, Statistical model, Kazakh, Part of speech, Prefix, Artificial intelligence, Suffix, Natural language processing, Word (computer architecture)
Description: Kazakh is an agglutinative language with complex morphological changes. Extracting Kazakh stems and affixes is important for Kazakh information processing. In this paper, based on the morphological structure of Kazakh words, we apply a stem extraction method that combines lexical rules with a statistical model. Stem extraction is carried out using a prefix dictionary, a suffix dictionary, a stem dictionary, a statistical model dictionary, and a rule base. Experimental results show that, within the statistical model, extracting stems using part-of-speech features is effective: the word-level accuracy and the stem-level accuracy of this method reached 0.93% and 76.74%, respectively.
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::3f1717a8f23ffc3ba05f48804142ec6e
https://doi.org/10.1109/ialp.2017.8300586
Accession number: edsair.doi...........3f1717a8f23ffc3ba05f48804142ec6e
Database: OpenAIRE
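
Illustrative note: the description above mentions combining dictionaries (suffix, stem) with a statistical model that uses part-of-speech features. The record does not give the paper's actual rules, dictionaries, or model, so the following is only a minimal sketch of the general idea. The toy dictionaries, the counts, and the names `candidate_splits` and `extract_stem` are all assumptions made for illustration; the prefix dictionary and rule base are omitted.

```python
# Toy data only: illustrative assumptions, not the paper's dictionaries.
STEMS = {"бала", "кітап", "мектеп"}
SUFFIXES = {"лар", "тар", "ға", "те", "да"}
# Hypothetical (stem, part-of-speech) counts standing in for a statistical model.
STEM_POS_COUNTS = {
    ("бала", "NOUN"): 120,
    ("кітап", "NOUN"): 95,
    ("мектеп", "NOUN"): 80,
}


def candidate_splits(word):
    """Yield (stem, suffixes) pairs by stripping known suffixes from the right."""
    yield word, []
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix):
            for stem, rest in candidate_splits(word[: -len(suffix)]):
                # Suffixes are collected in left-to-right surface order.
                yield stem, rest + [suffix]


def extract_stem(word, pos="NOUN"):
    """Prefer candidates whose stem is in the dictionary; rank them by count."""
    known = [c for c in candidate_splits(word) if c[0] in STEMS]
    if not known:
        # No dictionary stem found: return the word unchanged.
        return word, []
    return max(known, key=lambda c: STEM_POS_COUNTS.get((c[0], pos), 0))
```

With these toy entries, `extract_stem("балаларға")` splits the word into the stem `бала` followed by the suffixes `лар` and `ға`, while a word with no dictionary stem is returned unchanged.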