
An Improved Time-Frequency Based Deep Learning Algorithm for Speech Emotion Recognition

  • Ajman University
  • University of Sfax

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Scopus citations

Abstract

Emotion identification can be very useful in diverse applications, including medical diagnosis, social interaction and marketing. Nevertheless, emotions are complex, and recognizing them still poses numerous challenges. In this work, a new framework is developed to improve on state-of-the-art methods for identifying emotion types from speech data. We improve the representation of the input speech data by adding new features to the time-frequency (TF) spectrogram of audio segments, and show that this modified, image-like TF representation has a major impact on emotion identification accuracy. For each audio segment, a sequence of spectrogram images was generated, and three additional spectral features were estimated, namely spectral centroid, pitch frequency and spectral roll-off. These features were superimposed as layers onto the spectrogram images. The improved spectrogram images, after an appropriate augmentation step, were then fed to different types of deep networks. The best detection accuracy, 92.05%, was obtained with the VGG network, outperforming state-of-the-art methods by more than 6% on average.
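The abstract does not specify frame sizes, the pitch tracker, or exactly how the three features are overlaid on the spectrogram. The NumPy sketch below is one possible reading, not the authors' implementation. It assumes 512-sample Hann-windowed frames with a 256-sample hop, an 85% roll-off threshold, a crude autocorrelation pitch estimate searched over 60 to 400 Hz, and "layers" realized as binary channels that mark, in each frame, the frequency bin nearest to each feature value. All function names are hypothetical.

```python
import numpy as np

# Assumed (not from the paper): frame_len=512, hop=256, 85% roll-off,
# autocorrelation pitch in [60, 400] Hz, binary "curve" layers per feature.


def frame_signal(signal, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames, shape (n_frames, frame_len)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])


def magnitude_spectrogram(frames):
    """Hann-windowed magnitude STFT, shape (frame_len // 2 + 1, n_frames)."""
    window = np.hanning(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, axis=1)).T


def spectral_centroid(mag, freqs):
    """Magnitude-weighted mean frequency of each frame (Hz)."""
    return (freqs[:, None] * mag).sum(axis=0) / (mag.sum(axis=0) + 1e-10)


def spectral_rolloff(mag, freqs, percent=0.85):
    """Lowest frequency below which `percent` of each frame's magnitude lies (Hz)."""
    cumulative = np.cumsum(mag, axis=0)
    idx = np.argmax(cumulative >= percent * cumulative[-1], axis=0)
    return freqs[idx]


def pitch_autocorr(frames, sr, fmin=60.0, fmax=400.0):
    """Crude per-frame pitch (Hz) from the autocorrelation peak.

    Searches lags in [sr/fmax, sr/fmin]; frame length must exceed sr/fmin.
    """
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    pitches = []
    for frame in frames:
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        lag = lag_min + np.argmax(ac[lag_min:lag_max + 1])
        pitches.append(sr / lag)
    return np.array(pitches)


def feature_layer(values, freqs):
    """Binary layer with a 1 at the frequency bin nearest each frame's feature value."""
    layer = np.zeros((len(freqs), len(values)))
    bin_width = freqs[1] - freqs[0]
    rows = np.clip(np.round(values / bin_width).astype(int), 0, len(freqs) - 1)
    layer[rows, np.arange(len(values))] = 1.0
    return layer


def layered_spectrogram(signal, sr, frame_len=512, hop=256):
    """Stack [log-spectrogram, centroid, pitch, roll-off] into a 4-channel image."""
    frames = frame_signal(signal, frame_len, hop)
    mag = magnitude_spectrogram(frames)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    log_spec = np.log1p(mag)
    log_spec /= log_spec.max() + 1e-10  # scale the base layer to [0, 1]
    layers = [
        feature_layer(spectral_centroid(mag, freqs), freqs),
        feature_layer(pitch_autocorr(frames, sr), freqs),
        feature_layer(spectral_rolloff(mag, freqs), freqs),
    ]
    return np.stack([log_spec] + layers)


# Usage: a one-second 200 Hz tone sampled at 16 kHz.
sr = 16000
tone = np.sin(2 * np.pi * 200.0 * np.arange(sr) / sr)
image = layered_spectrogram(tone, sr)
print(image.shape)  # (4, 257, 61)
```

Stacking the features as extra channels, rather than appending them as scalar inputs, keeps the result image-like, so an image CNN such as VGG can consume it once its first layer accepts four input channels. A real pipeline would more likely use a dedicated pitch tracker and a library's spectral-feature routines.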

Original language: English
Title of host publication: 2024 21st International Multi-Conference on Systems, Signals and Devices, SSD 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 334-339
Number of pages: 6
ISBN (Electronic): 9798350374131
State: Published - 2024
Event: 21st International Multi-Conference on Systems, Signals and Devices, SSD 2024 - Erbil, Iraq
Duration: 22 Apr 2024 to 25 Apr 2024

Publication series

Name: 2024 21st International Multi-Conference on Systems, Signals and Devices, SSD 2024

Conference

Conference: 21st International Multi-Conference on Systems, Signals and Devices, SSD 2024
Country/Territory: Iraq
City: Erbil
Period: 22/04/24 to 25/04/24

Keywords

  • CNN
  • Deep Learning
  • DenseNet
  • Emotion Detection
  • IEMOCAP
  • Pitch Frequency
  • Spectral Centroid
  • Spectral Roll-off
  • Spectrogram
  • Speech Emotion Recognition
  • Traditional CNN
  • VGG
