Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models

Bibliographic details
Title: Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models
Authors: Raina, Vyas; Ma, Rao; McGhee, Charles; Knill, Kate; Gales, Mark
Publication year: 2024
Collection: Computer Science
Subject terms: Computer Science - Computation and Language, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Description: Recent developments in large speech foundation models like Whisper have led to their widespread use in many automatic speech recognition (ASR) applications. These systems incorporate 'special tokens' in their vocabulary, such as <|endoftext|>, to guide their language generation process. However, we demonstrate that these tokens can be exploited by adversarial attacks to manipulate the model's behavior. We propose a simple yet effective method to learn a universal acoustic realization of Whisper's <|endoftext|> token, which, when prepended to any speech signal, encourages the model to ignore the speech and only transcribe the special token, effectively 'muting' the model. Our experiments demonstrate that the same universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples. Moreover, we find that this universal adversarial audio segment often transfers to new datasets and tasks. Overall, this work demonstrates the vulnerability of Whisper models to 'muting' adversarial attacks. Such attacks pose both risks and potential benefits in real-world settings: for example, the attack can be used to bypass speech moderation systems or, conversely, to protect private speech data.
Document type: Working Paper
Access URL: http://arxiv.org/abs/2405.06134
Accession number: edsarx.2405.06134
Database: arXiv
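
Illustrative note: the description says a fixed 0.64-second audio segment is prepended to any speech signal. The sketch below shows only that prepending step, not the authors' method for learning the segment. It assumes 16 kHz mono audio (the rate Whisper models take as input), so the segment spans 10,240 samples. The function name `prepend_segment` and the random-noise placeholder are hypothetical stand-ins, not taken from the paper.

```python
import numpy as np

SAMPLE_RATE = 16_000      # assumption: 16 kHz mono audio, as Whisper expects
SEGMENT_SECONDS = 0.64    # segment length stated in the description
SEGMENT_SAMPLES = int(round(SAMPLE_RATE * SEGMENT_SECONDS))  # 10240 samples


def prepend_segment(segment, speech):
    """Return a waveform with `segment` placed in front of `speech`.

    Hypothetical helper: both inputs are 1-D arrays of audio samples.
    """
    segment = np.asarray(segment, dtype=np.float32)
    speech = np.asarray(speech, dtype=np.float32)
    if segment.ndim != 1 or speech.ndim != 1:
        raise ValueError("expected 1-D mono waveforms")
    return np.concatenate([segment, speech])


# Placeholder only: low-amplitude random noise, NOT a learned adversarial segment.
rng = np.random.default_rng(0)
segment = (0.01 * rng.standard_normal(SEGMENT_SAMPLES)).astype(np.float32)

# One second of silence stands in for an arbitrary speech sample.
speech = np.zeros(SAMPLE_RATE, dtype=np.float32)

combined = prepend_segment(segment, speech)
```

Because the same segment is reused for every input, the prepending step is independent of the speech that follows; only the segment itself would need to be learned.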