Hybrid BBO_PSO and higher order spectral features for emotion and stress recognition from natural speech

Bibliographic Details
Title: Hybrid BBO_PSO and higher order spectral features for emotion and stress recognition from natural speech
Authors: Sazali Yaacob, Abdul Hamid Adom, Ruzelita Ngadiran, Kemal Polat, Muthusamy Hariharan, C K Yogesh
Source: Applied Soft Computing. 56:217-232
Publication Details: Elsevier BV, 2017.
Publication Year: 2017
Subject Terms: business.industry, Computer science, Feature vector, Speech recognition, Particle swarm optimization, Pattern recognition, Feature selection, 02 engineering and technology, Set (abstract data type), Support vector machine, 030507 speech-language pathology & audiology, 03 medical and health sciences, Dimension (vector space), 0202 electrical engineering, electronic engineering, information engineering, Feature (machine learning), 020201 artificial intelligence & image processing, Artificial intelligence, 0305 other medical science, business, Software, Bicoherence
Description: We proposed higher-order spectral features based on the bispectrum and bicoherence for multi-class emotion/stress recognition from the speech signal. Utterances from three emotional speech databases, namely BES, SAVEE and SUSAS, have been used in this work. Multi-cluster feature selection and a hybrid of biogeography-based optimization and particle swarm optimization (HBBO_PSO) are used for feature selection. Experimental results show the effectiveness and efficiency of the proposed method, which yields higher emotion/stress recognition rates.

The aim of the present study is to select a set of higher-order spectral features for an emotion/stress recognition system. A total of 50 higher-order spectral features, 28 bispectral and 22 bicoherence, were extracted from the speech signal and its glottal waveform. These features were combined with the Interspeech 2010 features to further improve the recognition rates. Feature subset selection (FSS) was carried out with the objective of maximizing the subject-independent emotion recognition rate using a minimal number of features. FSS comprises two stages. In Stage 1, multi-cluster feature selection was adopted to reduce the feature space and identify a relevant subset of the Interspeech 2010 features. In Stage 2, biogeography-based optimization (BBO), particle swarm optimization (PSO) and the proposed hybrid BBO_PSO optimization were applied to further reduce the dimensionality of the feature space and identify the most relevant feature subset, i.e., the one with a higher ability to discriminate between different emotional states. The proposed method was tested on three databases: the Berlin emotional speech database (BES), the Surrey audio-visual expressed emotion database (SAVEE) and the simulated domain of Speech Under Simulated and Actual Stress (SUSAS).

The proposed feature set was evaluated in subject-independent (SI), subject-dependent (SD), gender-dependent male (GD-male), gender-dependent female (GD-female), text-independent pairwise speech (TIDPS) and text-independent multi-style speech (TIDMSS) experiments using SVM and ELM classifiers. The proposed method attained accuracies of 93.25% (SI), 100% (SD), 93.75% (GD-male) and 97.58% (GD-female) for BES; 62.38% (SI) and 76.19% (SD) for SAVEE; and 90.09% (TIDMSS), 97.04% (TIDPS Angry vs. Neutral), 98.89% (TIDPS Lombard vs. Neutral) and 99.07% (TIDPS Loud vs. Neutral) for SUSAS.
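Bicoherence is one of the two higher-order spectral feature families named in the abstract. As an illustration only, the sketch that follows computes the squared bicoherence at a single bifrequency with a generic segment-averaged direct (DFT-based) estimator. This record does not state the authors' segment length, windowing, normalization, or how their 22 bicoherence features are derived from the bicoherence surface, so all of those are assumptions here, and the helper names (`dft_bin`, `bicoherence`, `make_segments`) are hypothetical. A naive DFT keeps the example dependency-free.

```python
import cmath
import math
import random


def dft_bin(x, k):
    """Return the k-th DFT coefficient of the real sequence x (naive O(N))."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def bicoherence(segments, k1, k2):
    """Squared bicoherence at DFT bin pair (k1, k2), averaged over segments.

    Normalization (an assumption; one common choice):
        |sum X(k1) X(k2) X*(k1+k2)|^2
        / (sum |X(k1) X(k2)|^2 * sum |X(k1+k2)|^2)
    By the Cauchy-Schwarz inequality the result lies in [0, 1].
    """
    num = 0j         # running (unnormalized) bispectrum sum
    den_pair = 0.0   # running sum of |X(k1) X(k2)|^2
    den_sum = 0.0    # running sum of |X(k1+k2)|^2
    for seg in segments:
        if k1 + k2 >= len(seg):
            raise ValueError("k1 + k2 must be a valid bin index")
        x1, x2, x12 = dft_bin(seg, k1), dft_bin(seg, k2), dft_bin(seg, k1 + k2)
        num += x1 * x2 * x12.conjugate()
        den_pair += abs(x1 * x2) ** 2
        den_sum += abs(x12) ** 2
    if den_pair == 0.0 or den_sum == 0.0:
        return 0.0
    return abs(num) ** 2 / (den_pair * den_sum)


def make_segments(coupled, n_seg=64, n=64, k1=5, k2=9, seed=0):
    """Hypothetical test signal: three tones at bins k1, k2 and k1 + k2.

    If coupled, the third tone's phase equals the sum of the first two
    (quadratic phase coupling); otherwise it is drawn independently.
    """
    rng = random.Random(seed)
    segments = []
    for _ in range(n_seg):
        a = rng.uniform(0.0, 2.0 * math.pi)
        b = rng.uniform(0.0, 2.0 * math.pi)
        c = a + b if coupled else rng.uniform(0.0, 2.0 * math.pi)
        segments.append([
            math.cos(2 * math.pi * k1 * t / n + a)
            + math.cos(2 * math.pi * k2 * t / n + b)
            + math.cos(2 * math.pi * (k1 + k2) * t / n + c)
            for t in range(n)
        ])
    return segments


print(round(bicoherence(make_segments(True), 5, 9), 3))   # 1.0
print(bicoherence(make_segments(False), 5, 9) < 0.3)      # True
```

With quadratic phase coupling the estimate at (5, 9) is essentially 1, while an independently random third phase drives it toward 0 as more segments are averaged. Sensitivity to this kind of phase coupling is the property bicoherence-based features are generally intended to capture; a practical feature extractor would typically evaluate the whole bifrequency plane with an FFT rather than the naive per-bin DFT used here.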
ISSN: 1568-4946
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_________::310a865f55fedd73e4cb283d28b1998d
https://doi.org/10.1016/j.asoc.2017.03.013
Rights: CLOSED
Accession Number: edsair.doi...........310a865f55fedd73e4cb283d28b1998d
Database: OpenAIRE