End-Point Detection with State Transition Model based on Chunk-Wise Classification

Bibliographic Details
Title: End-Point Detection with State Transition Model based on Chunk-Wise Classification
Authors: Kim, Juntae; Bae, Jaesung; Hahn, Minsoo
Publication Year: 2019
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Sound
Description: This paper proposes a state transition model (STM) based on chunk-wise classification for end-point detection (EPD). EPD is generally built on frame-wise voice activity detection (VAD) with an additional STM, in which state transitions are driven by the VAD's frame-level decisions (speech or non-speech). However, VAD errors occur frequently in noisy environments, even with state-of-the-art deep neural network based VAD, and these errors cause undesired state transitions in the STM. To build a robust STM, this work drives state transitions with chunk-wise classification instead, since EPD does not need to operate at the frame level. A chunk consists of multiple frames and is classified as speech or non-speech by aggregating the VAD decisions for its frames, so that some VAD errors within a chunk can be smoothed out by the other, correct decisions. The model is evaluated with both qualitative and quantitative measures, including phone error rate.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/1912.10442
Accession Number: edsarx.1912.10442
Database: arXiv
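
The chunk-wise idea summarized in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the record only says that VAD decisions are "aggregated" per chunk, so the speech-ratio threshold used here is an assumption, and the function names, the state names, the chunk size, and the run-length parameters are all hypothetical.

```python
from typing import List, Optional


def classify_chunks(frame_decisions: List[int],
                    chunk_size: int = 10,
                    threshold: float = 0.5) -> List[int]:
    """Aggregate frame-level VAD decisions (1 = speech, 0 = non-speech)
    into one label per non-overlapping chunk.

    Assumption: a chunk is labeled speech when the fraction of speech
    frames reaches `threshold`. The source does not specify the rule.
    """
    labels = []
    # Trailing frames that do not fill a whole chunk are ignored.
    for start in range(0, len(frame_decisions) - chunk_size + 1, chunk_size):
        chunk = frame_decisions[start:start + chunk_size]
        speech_ratio = sum(chunk) / chunk_size
        labels.append(1 if speech_ratio >= threshold else 0)
    return labels


def detect_end_point(chunk_labels: List[int],
                     min_speech_chunks: int = 2,
                     min_silence_chunks: int = 3) -> Optional[int]:
    """Run a two-state transition model over chunk labels.

    Hypothetical states: WAIT_SPEECH -> IN_SPEECH -> end point.
    Returns the index of the chunk at which the end point is declared,
    or None if no end point is found.
    """
    state = "WAIT_SPEECH"
    run = 0  # length of the current run of the label that drives a transition
    for i, label in enumerate(chunk_labels):
        if state == "WAIT_SPEECH":
            run = run + 1 if label == 1 else 0
            if run >= min_speech_chunks:
                state = "IN_SPEECH"
                run = 0
        else:  # IN_SPEECH
            run = run + 1 if label == 0 else 0
            if run >= min_silence_chunks:
                return i
    return None
```

Because transitions depend on chunk labels rather than single frames, an isolated frame-level VAD error inside an otherwise consistent chunk does not change that chunk's label and therefore cannot trigger a transition on its own.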