Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models

Bibliographic Details
Title:	Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
Authors:	Raina, Vyas; Ma, Rao; McGhee, Charles; Knill, Kate; Gales, Mark
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description:	Recent developments in large speech foundation models like Whisper have led to their widespread use in many automatic speech recognition (ASR) applications. These systems incorporate "special tokens" in their vocabulary, such as <|endoftext|>, to guide their language generation process. However, we demonstrate that these tokens can be exploited by adversarial attacks to manipulate the model's behavior. We propose a simple yet effective method to learn a universal acoustic realization of Whisper's <|endoftext|> token, which, when prepended to any speech signal, encourages the model to ignore the speech and only transcribe the special token, effectively "muting" the model. Our experiments demonstrate that the same, universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples. Moreover, we find that this universal adversarial audio segment often transfers to new datasets and tasks. Overall, this work demonstrates the vulnerability of Whisper models to "muting" adversarial attacks, where such attacks can pose both risks and potential benefits in real-world settings: for example, the attack can be used to bypass speech moderation systems, or conversely, the attack can also be used to protect private speech data.
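The core operation described in the abstract can be sketched as follows: prepend one fixed audio segment to every utterance, then count how often the model outputs only <|endoftext|>. This is a minimal sketch, not the authors' code. It assumes 16 kHz mono float waveforms (Whisper's input sample rate) and treats an empty decoded transcript as "muted". It also uses a hypothetical `transcribe` callable in place of a real Whisper model. How the segment itself is learned is not shown.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Whisper consumes 16 kHz mono audio
PREFIX_SECONDS = 0.64  # length of the universal segment reported in the abstract


def prefix_num_samples(seconds: float = PREFIX_SECONDS, sr: int = SAMPLE_RATE) -> int:
    """Number of waveform samples occupied by the adversarial prefix."""
    return int(round(seconds * sr))


def prepend_prefix(prefix: np.ndarray, speech: np.ndarray) -> np.ndarray:
    """Prepend the same universal segment to an arbitrary speech waveform."""
    return np.concatenate([prefix, speech]).astype(np.float32)


def mute_rate(transcribe, prefix: np.ndarray, utterances) -> float:
    """Fraction of utterances whose transcript is empty once the prefix is prepended.

    `transcribe` is a hypothetical callable (waveform -> decoded text with special
    tokens stripped). An empty string is assumed to mean the model emitted
    <|endoftext|> immediately, i.e. it was "muted".
    """
    if not utterances:
        raise ValueError("need at least one utterance")
    muted = sum(1 for u in utterances if transcribe(prepend_prefix(prefix, u)).strip() == "")
    return muted / len(utterances)
```

At 16 kHz, the 0.64-second segment is 10,240 samples. Because the same prefix is reused for every input, the attack is "universal": nothing is recomputed per utterance.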
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2405.06134
Accession Number:	edsarx.2405.06134
Database:	arXiv
