PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

التفاصيل البيبلوغرافية
العنوان: PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?
المؤلفون: Petukhova, Kseniia, Kazakov, Roman, Kochmar, Ekaterina
سنة النشر: 2024
المجموعة: Computer Science
مصطلحات موضوعية: Computer Science - Computation and Language, Computer Science - Artificial Intelligence, I.2.7
الوصف: In this paper, we present our submission to the SemEval-2024 Task 8 "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach relies on combining embeddings from the RoBERTa-base with diversity features and uses a resampled training set. We score 12th from 124 in the ranking for Subtask A (monolingual track), and our results show that our approach is generalizable across unseen models and domains, achieving an accuracy of 0.91.
Comment: 8 pages, 3 figures, 5 tables, to be published in the Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), for associated code, see https://github.com/sachertort/petkaz-semeval-m4
نوع الوثيقة: Working Paper
URL الوصول: http://arxiv.org/abs/2404.05483
رقم الأكسشن: edsarx.2404.05483
قاعدة البيانات: arXiv