Voice command generation using Progressive Wavegans

Bibliographic Details
Title:	Voice command generation using Progressive Wavegans
Authors:	Wiest, Thomas; Cummins, Nicholas; Baird, Alice; Hantke, Simone; Dineley, Judith; Schuller, Björn
Publication Year:	2019
Collection:	Computer Science; Statistics
Subject Terms:	Computer Science - Computation and Language, Computer Science - Machine Learning, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing, Statistics - Machine Learning
Description:	Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions - preprocessing, Audio-to-Audio generation, skip connections and progressive structures - is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the original WaveGAN approach.
Comment:	7 pages, 2 figures
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/1903.07395
Accession Number:	edsarx.1903.07395
Database:	arXiv
