The Microsoft 2016 Conversational Speech Recognition System

Bibliographic Details
Title:	The Microsoft 2016 Conversational Speech Recognition System
Authors:	Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., Yu, D., Zweig, G.
Source:	Proc. IEEE ICASSP, March 2017, pp. 5255-5259
Publication Year:	2016
Collection:	Computer Science
Subject Terms:	Computer Science - Computation and Language, Electrical Engineering and Systems Science - Audio and Speech Processing
Description:	We describe Microsoft's conversational speech recognition system, in which we combine recent developments in neural-network-based acoustic and language modeling to advance the state of the art on the Switchboard recognition task. Inspired by machine learning ensemble techniques, the system uses a range of convolutional and recurrent neural networks. I-vector modeling and lattice-free MMI training provide significant gains for all acoustic model architectures. Language model rescoring with multiple forward and backward running RNNLMs, and word posterior-based system combination provide a 20% boost. The best single system uses a ResNet architecture acoustic model with RNNLM rescoring, and achieves a word error rate of 6.9% on the NIST 2000 Switchboard task. The combined system has an error rate of 6.2%, representing an improvement over previously reported results on this benchmark task.
Document Type:	Working Paper
DOI:	10.1109/ICASSP.2017.7953159
Access URL:	http://arxiv.org/abs/1609.03528
Accession Number:	edsarx.1609.03528
Database:	arXiv
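
Note on the metric: the description reports results as word error rate (WER). As a rough illustration only, and not the scoring tool used for the NIST 2000 evaluation, WER can be sketched as the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly;
    official scoring applies normalization rules that are not modeled here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error rate relative to a baseline."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy transcripts, invented for illustration.
    print(word_error_rate("a b c d", "a x c"))
    # Error rates taken from the description: best single system vs. combined system.
    print(relative_reduction(6.9, 6.2))
```

Applied to the two figures in the description, going from 6.9% to 6.2% is a relative error reduction of roughly 10%.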
