Academic Journal

On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition

Bibliographic Details
Title: On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition
Authors: Juraj Kacur, Boris Puterka, Jarmila Pavlovicova, Milos Oravec
Source: Sensors, Vol 21, Iss 5, p 1888 (2021)
Publisher Information: MDPI AG, 2021.
Publication Year: 2021
Collection: LCC:Chemical technology
Subject Terms: windows, frequency scales, spectrograms, psychoacoustic filter banks, LPC, cepstral features, Chemical technology, TP1-1185
Description: Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning regarding the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect the accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions with their lengths and overlaps), frequency ranges, frequency scales, processing of whole speech (spectrograms), vocal tract signals (filter banks, linear prediction coefficient (LPC) modeling), excitation signals (inverse LPC filtering), magnitude and phase manipulations, and cepstral features. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross validation, paired t-tests, and rank and Pearson correlations. The results revealed several settings with accuracy in the 75% range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0–8 kHz frequency range. Spectrograms carrying both vocal tract and excitation information also scored well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modifications can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
Document Type: article
File Description: electronic resource
Language: English
ISSN: 1424-8220
Relation: https://www.mdpi.com/1424-8220/21/5/1888; https://doaj.org/toc/1424-8220
DOI: 10.3390/s21051888
Access URL: https://doaj.org/article/7e67c19cad9c452aab601d2da9fb8344
Accession Number: edsdoj.7e67c19cad9c452aab601d2da9fb8344
Database: Directory of Open Access Journals
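
Illustrative note: the description above names pre-emphasis, segmentation, window types, and frequency scales as basic processing steps that can strongly affect recognition accuracy. The following is a minimal sketch of those steps, not the authors' implementation. The function names, the pre-emphasis coefficient 0.97, the Hamming window, the 25 ms frame with a 10 ms hop, the 16 kHz sampling rate, and the mel formula are all assumptions chosen for illustration; the record does not state which settings or psychoacoustic scales the paper used.

```python
import math


def pre_emphasize(signal, coeff=0.97):
    """First-order high-pass filter y[n] = x[n] - coeff * x[n-1].

    The coefficient 0.97 is a commonly used value, assumed here.
    """
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(length):
    """Symmetric Hamming window of the given length."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping, Hamming-windowed frames.

    A trailing segment shorter than frame_len is dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames


def hz_to_mel(f_hz):
    """Map frequency in Hz to the mel scale (one common psychoacoustic scale)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


if __name__ == "__main__":
    sample_rate = 16000  # assumed; enough to cover the 0-8 kHz range
    # One second of a 440 Hz tone as a stand-in for recorded speech.
    tone = [math.sin(2.0 * math.pi * 440.0 * n / sample_rate)
            for n in range(sample_rate)]
    emphasized = pre_emphasize(tone)
    # 25 ms frames (400 samples) with a 10 ms hop (160 samples), both assumed.
    frames = frame_signal(emphasized, frame_len=400, hop=160)
    print(len(frames), "frames of", len(frames[0]), "samples")
    print("8 kHz on the mel scale:", round(hz_to_mel(8000.0), 1))
```

Changing `coeff`, `frame_len`, `hop`, or the window function in a sketch like this corresponds to the kind of setting variation the study evaluates; spectral features such as filter bank energies would be computed per frame afterwards.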