An RFP dataset for Real, Fake, and Partially fake audio detection

Bibliographic details
Title: An RFP dataset for Real, Fake, and Partially fake audio detection
Authors: AlAli, Abdulazeez; Theodorakopoulos, George
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Sound, Computer Science - Cryptography and Security, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: Recent advances in deep learning have enabled the creation of natural-sounding synthesised speech. However, attackers have also utilised these technologies to conduct attacks such as phishing. Numerous public datasets have been created to facilitate the development of effective detection models. However, available datasets contain only entirely fake audio; therefore, detection models may miss attacks that replace a short section of the real audio with fake audio. In recognition of this problem, the current paper presents the RFP dataset, which comprises five distinct audio types: partial fake (PF), audio with noise, voice conversion (VC), text-to-speech (TTS), and real. The data are then used to evaluate several detection models, revealing that the available detection models incur a markedly higher equal error rate (EER) when detecting PF audio than when detecting entirely fake audio. The lowest EER recorded was 25.42%. Therefore, we believe that creators of detection models must seriously consider using datasets like RFP that include PF and other types of fake audio.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2404.17721
Accession number: edsarx.2404.17721
Database: arXiv
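
Note on the metric: the description reports results as an equal error rate (EER), the operating point at which the false acceptance rate (fake audio accepted as real) equals the false rejection rate (real audio rejected as fake). The following is a minimal sketch of how an EER could be approximated from detector scores. It is not the authors' evaluation code; the function name, the score lists, and the convention that a higher score means "more likely real" are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by scanning every observed score as a threshold.

    Assumption (not from the paper): a higher score means the detector
    considers the audio more likely to be real.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: real audio scored below the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        # False acceptance rate: fake audio scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical scores, invented purely to exercise the function.
real = [0.9, 0.8, 0.7, 0.4]
fake = [0.6, 0.3, 0.2, 0.1]
print(f"EER: {equal_error_rate(real, fake):.2%}")
```

With a finite set of scores the two error rates may never be exactly equal, so this sketch reports the average of the two at the threshold where they are closest; evaluation toolkits often interpolate instead.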