Academic Journal

3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms.

Bibliographic Details
Title: 3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms.
Authors: Hajarolasvadi N; Department of Electrical and Electronics Engineering, Eastern Mediterranean University, 99628 Gazimagusa, North Cyprus, via Mersin 10, Turkey., Demirel H; Department of Electrical and Electronics Engineering, Eastern Mediterranean University, 99628 Gazimagusa, North Cyprus, via Mersin 10, Turkey.
Source: Entropy (Basel, Switzerland) [Entropy (Basel)] 2019 May 08; Vol. 21 (5). Date of Electronic Publication: 2019 May 08.
Publication Type: Journal Article
Language: English
Journal Information: Publisher: MDPI Country of Publication: Switzerland NLM ID: 101243874 Publication Model: Electronic Cited Medium: Internet ISSN: 1099-4300 (Electronic) Linking ISSN: 10994300 NLM ISO Abbreviation: Entropy (Basel) Subsets: PubMed not MEDLINE
Imprint: Original Publication: Basel, Switzerland : MDPI, 1999-
Abstract: Detecting human intentions and emotions helps improve human-robot interactions. Emotion recognition has been a challenging research direction in the past decade. This paper proposes an emotion recognition system based on analysis of speech signals. Firstly, we split each speech signal into overlapping frames of the same length. Next, we extract an 88-dimensional vector of audio features, including Mel Frequency Cepstral Coefficients (MFCC), pitch, and intensity, for each of the respective frames. In parallel, the spectrogram of each frame is generated. In the final preprocessing step, by applying k-means clustering to the extracted features of all frames of each audio signal, we select the k most discriminant frames, namely keyframes, to summarize the speech signal. Then, the sequence of the corresponding spectrograms of the keyframes is encapsulated in a 3D tensor. These tensors are used to train and test a 3D Convolutional Neural Network using a 10-fold cross-validation approach. The proposed 3D CNN has two convolutional layers and one fully connected layer. Experiments are conducted on the Surrey Audio-Visual Expressed Emotion (SAVEE), Ryerson Multimedia Laboratory (RML), and eNTERFACE'05 databases. The results are superior to the state-of-the-art methods reported in the literature.
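The keyframe-selection step described in the abstract can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the function name `kmeans_keyframes` is hypothetical, a minimal pure-Python k-means stands in for whatever library the paper used, and short toy vectors stand in for the 88-dimensional MFCC/pitch/intensity feature vectors. Each cluster contributes one keyframe: the frame whose feature vector lies closest to the cluster centroid.

```python
import math
import random

def kmeans_keyframes(features, k, iters=50, seed=0):
    """Select k 'keyframes' from a list of per-frame feature vectors.

    Runs Lloyd's k-means on the feature vectors, then keeps, from each
    cluster, the index of the frame nearest that cluster's centroid.
    Returns the sorted list of selected frame indices.
    (Hypothetical helper; the paper's actual implementation may differ.)
    """
    rng = random.Random(seed)
    # Initialize centroids from k distinct randomly chosen frames.
    centroids = [list(features[i]) for i in rng.sample(range(len(features)), k)]

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    for _ in range(iters):
        # Assignment step: attach each frame to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(f, centroids[c]))
                  for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [features[i] for i, l in enumerate(labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]

    # Keyframe of each cluster = member frame closest to the centroid.
    keyframes = []
    for c in range(k):
        member_idx = [i for i, l in enumerate(labels) if l == c]
        if member_idx:
            keyframes.append(min(member_idx,
                                 key=lambda i: dist(features[i], centroids[c])))
    return sorted(keyframes)
```

In the paper's pipeline, the spectrograms of the frames at the returned indices would then be stacked in order into a 3D tensor (k spectrograms deep) and fed to the 3D CNN.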
Grant Information: BAP-C-02-18-0001 BAP-C project, Eastern Mediterranean University
Contributed Indexing: Keywords: 3D convolutional neural networks; deep learning; k-means clustering; spectrograms; speech emotion recognition
Entry Dates: Date Created: 20201203 Latest Revision: 20201207
Update Code: 20221213
PubMed Central ID: PMC7514968
DOI: 10.3390/e21050479
PMID: 33267193
Database: MEDLINE