PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?

التفاصيل البيبلوغرافية
العنوان:	PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?
المؤلفون:	Petukhova, Kseniia, Kazakov, Roman, Kochmar, Ekaterina
سنة النشر:	2024
المجموعة:	Computer Science
مصطلحات موضوعية:	Computer Science - Computation and Language, Computer Science - Artificial Intelligence, I.2.7
الوصف:	In this paper, we present our submission to the SemEval-2024 Task 8 "Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection", focusing on the detection of machine-generated texts (MGTs) in English. Specifically, our approach relies on combining embeddings from the RoBERTa-base with diversity features and uses a resampled training set. We score 12th from 124 in the ranking for Subtask A (monolingual track), and our results show that our approach is generalizable across unseen models and domains, achieving an accuracy of 0.91. Comment: 8 pages, 3 figures, 5 tables, to be published in the Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024), for associated code, see https://github.com/sachertort/petkaz-semeval-m4
نوع الوثيقة:	Working Paper
URL الوصول:	http://arxiv.org/abs/2404.05483
رقم الأكسشن:	edsarx.2404.05483
قاعدة البيانات:	arXiv

الوصف
الوصف غير متاح.