Handling Numeric Expressions in Automatic Speech Recognition

Bibliographic Details
Title: Handling Numeric Expressions in Automatic Speech Recognition
Authors: Huber, Christian; Waibel, Alexander
Publication Year: 2024
Collection: Computer Science
Subject Terms: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Computation and Language
Description: This paper addresses the problem of correctly formatting numeric expressions in automatic speech recognition (ASR) transcripts. This is challenging since the expected transcript format depends on the context, e.g., 1945 (year) vs. 19:45 (timestamp). We compare cascaded and end-to-end approaches to recognize and format numeric expressions, such as years, timestamps, currency amounts, and quantities. For the end-to-end approach, we employed a data generation strategy using a large language model (LLM) together with a text-to-speech (TTS) model to generate adaptation data. The results on our test dataset show that while approaches based on LLMs perform well on recognizing formatted numeric expressions, adapted end-to-end models offer competitive performance with the advantage of lower latency and inference cost.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2408.00004
Accession Number: edsarx.2408.00004
Database: arXiv
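The context dependence described in the abstract (1945 as a year vs. 19:45 as a timestamp) can be illustrated with a deliberately tiny rule-based formatter. This is a hypothetical sketch, not the authors' cascaded or end-to-end method. The word lists, the helper names (parse_groups, format_numeric, format_sentence), and the single-cue-word heuristic ("at" signals a time, anything else a year) are all assumptions made for illustration.

```python
# Hypothetical toy formatter: NOT the paper's method. It only shows that the
# same spoken words ("nineteen forty five") need different written forms
# depending on context.

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}
NUMBER_WORDS = set(UNITS) | set(TENS)


def parse_groups(tokens):
    """Parse number words into two-digit groups, e.g. 'nineteen forty five' -> [19, 45]."""
    groups, i = [], 0
    while i < len(tokens):
        t = tokens[i]
        if t in TENS:
            value = TENS[t]
            # A tens word may be followed by a single-digit unit ("forty five").
            if i + 1 < len(tokens) and UNITS.get(tokens[i + 1], 10) < 10:
                value += UNITS[tokens[i + 1]]
                i += 1
            groups.append(value)
        elif t in UNITS:
            groups.append(UNITS[t])
        else:
            raise ValueError(f"unknown number word: {t}")
        i += 1
    return groups


def format_numeric(tokens, context):
    """Render two spoken groups as a year ('1945') or a time ('19:45')."""
    groups = parse_groups(tokens)
    if len(groups) != 2:
        raise ValueError("toy formatter expects exactly two groups")
    hi, lo = groups
    if context == "time":
        return f"{hi}:{lo:02d}"
    return f"{hi}{lo:02d}"


def format_sentence(sentence):
    """Format the first run of number words, using the preceding word as the only context cue."""
    tokens = sentence.split()
    start = next((i for i, t in enumerate(tokens) if t in NUMBER_WORDS), None)
    if start is None:
        return sentence  # nothing to format
    end = start
    while end < len(tokens) and tokens[end] in NUMBER_WORDS:
        end += 1
    cue = tokens[start - 1] if start > 0 else ""
    context = "time" if cue == "at" else "year"  # assumed heuristic
    formatted = format_numeric(tokens[start:end], context)
    return " ".join(tokens[:start] + [formatted] + tokens[end:])


print(format_sentence("the war ended in nineteen forty five"))     # the war ended in 1945
print(format_sentence("the train leaves at nineteen forty five"))  # the train leaves at 19:45
```

A single preceding word is far too little context for real transcripts, which is the problem the abstract's LLM-based and adapted end-to-end approaches target.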