Academic Journal

Building a Language-Independent Discourse Parser using Universal Networking Language.

Bibliographic Details
Title: Building a Language-Independent Discourse Parser using Universal Networking Language.
Authors: Navaneethakrishnan, Subalalitha Chinnaudayar; Parthasarathi, Ranjani
Source: Computational Intelligence; Nov 2015, Vol. 31 Issue 4, p593-618, 26p
Subject Terms: PARSING (Computer grammar), PROGRAMMING languages, INFORMATION processing, RHETORICAL analysis, UNIVERSAL language, CLASSIFIERS (Linguistics), BAYES' theorem
Abstract: Discourse parsing has become an inevitable task to process information in the natural language processing arena. Parsing complex discourse structures beyond the sentence level is a significant challenge. This article proposes a discourse parser that constructs rhetorical structure (RS) trees to identify such complex discourse structures. Unlike previous parsers that construct RS trees using lexical features, syntactic features and cue phrases, the proposed discourse parser constructs RS trees using high-level semantic features inherited from the Universal Networking Language (UNL). The UNL also adds a language-independent quality to the parser, because the UNL represents texts in a language-independent manner. The parser uses a naive Bayes probabilistic classifier to label discourse relations. It has been tested using 500 Tamil-language documents and the Rhetorical Structure Theory Discourse Treebank, which comprises 21 English-language documents. The performance of the naive Bayes classifier has been compared with that of the support vector machine (SVM) classifier, which has been used in the earlier approaches to build a discourse parser. It is seen that the naive Bayes probabilistic classifier is better suited for discourse relation labeling when compared with the SVM classifier, in terms of training time, testing time, and accuracy. [ABSTRACT FROM AUTHOR]
Copyright of Computational Intelligence is the property of Wiley-Blackwell and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
Description
ISSN: 0824-7935
DOI:10.1111/coin.12037