Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian

Bibliographic Details
Title: Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian
Authors: Shamardina, Tatiana; Mikhailov, Vladislav; Chernianskii, Daniil; Fenogenova, Alena; Saidov, Marat; Valeeva, Anastasiya; Shavrina, Tatiana; Smurov, Ivan; Tutubalina, Elena; Artemova, Ekaterina
Publication Year: 2022
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language
Description: We present the shared task on artificial text detection in Russian, which is organized as a part of the Dialogue Evaluation initiative, held in 2022. The shared task dataset includes texts from 14 text generators, i.e., one human writer and 13 text generative models fine-tuned for one or more of the following generation tasks: machine translation, paraphrase generation, text summarization, text simplification. We also consider back-translation and zero-shot generation approaches. The human-written texts are collected from publicly available resources across multiple domains. The shared task consists of two sub-tasks: (i) to determine if a given text is automatically generated or written by a human; (ii) to identify the author of a given text. The first task is framed as a binary classification problem. The second task is a multi-class classification problem. We provide count-based and BERT-based baselines, along with a human evaluation on the first sub-task. A total of 30 and 8 systems have been submitted to the binary and multi-class sub-tasks, respectively. Most teams outperform the baselines by a wide margin. We publicly release our codebase, human evaluation results, and other materials in our GitHub repository (https://github.com/dialogue-evaluation/RuATD).
Comment: Accepted to Dialogue-22
Document Type: Working Paper
DOI: 10.28995/2075-7182-2022-21-497-511
Access URL: http://arxiv.org/abs/2206.01583
Accession Number: edsarx.2206.01583
Database: arXiv