Academic Journal

The Effect of Noise on Deep Learning for Classification of Pathological Voice.

Bibliographic Details
Title: The Effect of Noise on Deep Learning for Classification of Pathological Voice.
Authors: Hasebe K; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan., Fujimura S; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan., Kojima T; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan., Tamura K; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan., Kawai Y; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan., Kishimoto Y; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan., Omori K; Department of Otolaryngology, Head and Neck Surgery, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Source: The Laryngoscope [Laryngoscope] 2024 Aug; Vol. 134 (8), pp. 3537-3541. Date of Electronic Publication: 2024 Jan 27.
Publication Type: Journal Article
Language: English
Journal Info: Publisher: Wiley-Blackwell Country of Publication: United States NLM ID: 8607378 Publication Model: Print-Electronic Cited Medium: Internet ISSN: 1531-4995 (Electronic) Linking ISSN: 0023852X NLM ISO Abbreviation: Laryngoscope Subsets: MEDLINE
Imprint Name(s): Publication: <2009- >: Philadelphia, PA : Wiley-Blackwell
Original Publication: St. Louis, Mo. : [s.n., 1896-
MeSH Terms: Deep Learning* , Voice Disorders*/diagnosis , Voice Disorders*/physiopathology , Voice Disorders*/etiology , Noise*, Humans ; Retrospective Studies ; Voice Quality/physiology ; Male ; Female ; Neural Networks, Computer
Abstract: Objective: This study aimed to evaluate the significance of background noise in machine learning models assessing the GRBAS scale for voice disorders.
Methods: A dataset of 1406 voice samples was collected from retrospective data, and a 5-layer 1D convolutional neural network (CNN) model was constructed using TensorFlow. The dataset was divided into training, validation, and test data. Gaussian noise was added to test samples at various intensities to assess the model's noise resilience. The model's performance was evaluated using accuracy, F1 score, and quadratic weighted Cohen's kappa score.
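The noise-injection step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not say how noise intensity was parameterized, so the signal-to-noise ratio in decibels (`snr_db`), the function name `add_gaussian_noise`, and the synthetic 440 Hz test tone are all assumptions made for the example.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Return a copy of `signal` with white Gaussian noise at the given SNR (dB).

    Hypothetical helper: parameterizing intensity by SNR is an assumption.
    """
    rng = random.Random(seed)
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # Add an independent zero-mean Gaussian sample to every point.
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Hypothetical 1-second, 440 Hz tone sampled at 8 kHz stands in for a voice sample.
clean = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

# Lower SNR means stronger noise; only the test data is perturbed, as in the Methods.
noisy_by_snr = {snr: add_gaussian_noise(clean, snr, seed=0) for snr in (30, 20, 10, 0)}
```

Each noisy copy would then be passed to the already trained model, so that any change in the metrics can be attributed to the noise rather than to retraining.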
Results: The model's performance on the GRBAS scale generally declined with increasing noise intensities. For the G scale, accuracy dropped from 70.9% (original) to 8.5% (at the highest noise), F1 score from 69.2% to 1.3%, and Cohen's kappa from 0.679 to 0.0. Similar declines were observed for the remaining RBAS components.
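For readers unfamiliar with the agreement metric reported above, the following is a minimal sketch of quadratic weighted Cohen's kappa for ordinal grades. It assumes each GRBAS component is rated on the usual four-point scale (0 to 3); the function name and the example ratings are hypothetical and are not taken from the study's data.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes=4):
    """Quadratic weighted Cohen's kappa for integer grades 0..n_classes-1."""
    n = len(y_true)
    # Confusion matrix: rows are reference grades, columns are predicted grades.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Disagreement penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            # Count expected by chance if the two raters were independent.
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        # Both raters used one identical grade throughout; kappa is undefined.
        return float("nan")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical reference grades for six samples.
reference = [0, 1, 2, 3, 1, 2]
kappa_perfect = quadratic_weighted_kappa(reference, reference)
kappa_constant = quadratic_weighted_kappa(reference, [1] * len(reference))
```

With this definition, perfect agreement gives a kappa of 1, and a classifier that always outputs the same grade gives exactly 0, because its weighted disagreement equals what chance alone would produce.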
Conclusion: The model's performance was affected by background noise, with substantial decreases in evaluation metrics as noise levels intensified. Future research should explore noise-tolerant techniques, such as data augmentation, to improve the model's noise resilience in real-world settings.
Level of Evidence: This study evaluates a machine learning model using a single dataset without comparative controls. Given its non-comparative design and specific focus, it aligns with Level 4 evidence (Case-series) under the 2011 OCEBM guidelines. Laryngoscope, 134:3537-3541, 2024.
(© 2024 The Authors. The Laryngoscope published by Wiley Periodicals LLC on behalf of The American Laryngological, Rhinological and Otological Society, Inc.)
References: Hirano M. Psycho‐acoustic evaluation of VOICE. In: Arnold GE, Winckel F, Wyke BD, eds. Clinical Examination of Voice. Springer‐Verlag; 1981:81‐84.
Kreiman J, Gerratt BR, Precoda K. Listener experience and perception of voice quality. J Speech Lang Hear Res. 1990;33(1):103‐115. https://doi.org/10.1044/jshr.3301.103.
Saenz‐Lechon N, Godino‐Llorente JI, Osma‐Ruiz V, Blanco‐Velasco M, Cruz‐Roldan F. Automatic assessment of voice quality according to the GRBAS Scale. In: 2006 International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE; 2006:2478–2481. https://doi.org/10.1109/IEMBS.2006.260603.
Villa‐Canas T, Orozco‐Arroyave JR, Arias‐Londono JD, Vargas‐Bonilla JF, Godino‐Llorente JI. Automatic assessment of voice signals according to the GRBAS scale using modulation spectra, Mel frequency Cepstral Coefficients and Noise parameters. In: Symposium of Signals, Images and Artificial Vision – 2013: STSIVA – 2013. IEEE; 2013:1–5. https://doi.org/10.1109/STSIVA.2013.6644930.
Fujimura S, Kojima T, Okanoue Y, et al. Classification of voice disorders using a one‐dimensional convolutional neural network. J Voice. 2022;36(1):15‐20. https://doi.org/10.1016/j.jvoice.2020.02.009.
Li J, Dai W, Metze F, Qu S, Das S. A comparison of Deep Learning methods for environmental sound detection. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2017:126‐130. https://doi.org/10.1109/ICASSP.2017.7952131.
Kojima T, Fujimura S, Hori R, Okanoue Y, Shoji K, Inoue M. An innovative voice analyzer “VA” smart phone program for quantitative analysis of voice quality. J Voice. 2019;33(5):642‐648. https://doi.org/10.1016/j.jvoice.2018.01.026.
Mizuta M, Shoji K, Kojima T, et al. New VA software program quantitatively analyzes voice quality. Pract Otorhinolaryngol (Basel). 2011;104(4):297‐302. https://doi.org/10.5631/jibirin.104.297.
Li J, Deng L, Gong Y, Haeb‐Umbach R. An overview of noise‐robust automatic speech recognition. IEEE/ACM Trans Audio Speech Lang Process. 2014;22(4):745‐777. https://doi.org/10.1109/TASLP.2014.2304637.
Ye J, Kobayashi T, Murakawa M, Higuchi T. Robust acoustic feature extraction for sound classification based on noise reduction. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE; 2014:5944‐5948. https://doi.org/10.1109/ICASSP.2014.6854744.
Prisyach T, Mendelev V, Ubskiy D. Data augmentation for training of noise robust acoustic models. In: Analysis of Images, Social Networks and Texts: 5th International Conference, AIST 2016, Yekaterinburg, Russia, April 7‐9, 2016, Revised Selected Papers 5; 2017:17‐25. https://doi.org/10.1007/978-3-319-52920-2_2.
Abayomi‐Alli OO, Damaševičius R, Qazi A, Adedoyin‐Olowe M, Misra S. Data augmentation and deep learning methods in sound classification: a systematic review. Electronics (Switzerland). 2022;11(22):3795. https://doi.org/10.3390/electronics11223795.
Grant Information: JP22K09695 Japan Society for the Promotion of Science
Contributed Indexing: Keywords: 1D‐CNN; GRBAS scale; machine learning; noise resilience; voice disorders
Entry Date(s): Date Created: 20240127 Date Completed: 20240712 Latest Revision: 20240712
Update Code: 20240712
DOI: 10.1002/lary.31303
PMID: 38280184
Database: MEDLINE
Description
ISSN:1531-4995
DOI:10.1002/lary.31303