A Twitter corpus and benchmark resources for German sentiment analysis

Bibliographic details
Title: A Twitter corpus and benchmark resources for German sentiment analysis
Authors: Fatih Uzdilli, Jan Milan Deriu, Mark Cieliebak, Dominic Egger
Source: SocialNLP@EACL
Publication info: Association for Computational Linguistics, 2017.
Publication year: 2017
Subject terms: Source code, business.industry, media_common.quotation_subject, Sentiment analysis, InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL, Twitter, 006: Spezielle Computerverfahren, computer.software_genre, Corpus, language.human_language, German, Support vector machine, Feature (machine learning), language, Benchmark (computing), Sentiment Analysis, Artificial intelligence, business, computer, Word (computer architecture), Natural language processing, 410.285: Computerlinguistik, media_common
Description: In this paper, we present SB10k, a new corpus for sentiment analysis with approx. 10,000 German tweets. We use this new corpus and two existing corpora to provide state-of-the-art benchmarks for sentiment analysis in German: we implemented a CNN (based on the winning system of SemEval-2016) and a feature-based SVM, and we compare their performance on all three corpora. For the CNN, we also created German word embeddings trained on 300M tweets. These word embeddings were then optimized for sentiment analysis using distant-supervised learning. The new corpus, the German word embeddings (plain and optimized), and source code to re-run the benchmarks are publicly available.
File description: application/pdf
Language: English
DOI: 10.21256/zhaw-1530
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::bf532bf043b0e337a59584af3afc1d50
Rights: OPEN
Accession number: edsair.doi.dedup.....bf532bf043b0e337a59584af3afc1d50
Database: OpenAIRE