Description: |
Using a BERT-type language model pre-trained on Slovenian, we developed methods for proposing corrections of grammatical errors. As the base model, which provides semantic knowledge of Slovenian, we used the SloBERTa language model. We also used tools for evaluating and changing word forms in the input sentences. We focused on correcting two error types: case (accusative versus genitive) and number (plural versus dual). We evaluated the proposed corrections on a proofread and annotated corpus of Slovenian texts. When configured to correct both error types at once, the program achieves an F-score between 95% and 96%. It correctly fixes 92% to 95% of the inserted incorrect word forms, depending on how many incorrect words were inserted into a given sentence. |