
A multi-modal deep learning system for Arabic emotion recognition

  • Jordan University of Science and Technology

Research output: Contribution to journal › Article › peer-review

5 Scopus citations

Abstract

Emotion analysis is divided into emotion detection, where the system determines whether an emotional state is present, and emotion recognition, where the system identifies the label of the emotion. In this paper, we present a multimodal system for emotion detection and recognition using an Arabic dataset. We evaluated audio and visual data separately as unimodal systems, then examined the impact of integrating both information sources into one model. We also examined the effect of gender identification on performance. Our results show that identifying the speaker’s gender beforehand increases emotion recognition performance, especially for models that rely on audio data. Comparing the audio-based system with the visual-based system shows that each model performs better on specific emotion labels: the audio model correctly predicted 70% of the angry labels, compared with 63% for the visual model, while accuracy for the surprise class was 40.6% with the audio model and 56.2% with the visual model. Combining both modalities improves accuracy. The final multimodal system achieved 75% on the emotion detection task and 60.11% on the emotion recognition task. These results are among the top results achieved in this field and the first to focus on Arabic content. Specifically, the novelty of this work lies in exploiting deep learning and multimodal models for emotion analysis and applying them to a natural audio and video dataset of Arabic speakers.
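
The abstract does not say how the audio and visual sources are integrated. As a rough illustration only, the sketch below shows one common option, late fusion, in which each unimodal model outputs per-label probabilities and the two are combined by a weighted average. The function names, the `audio_weight` parameter, the label set, and the probability values are all hypothetical and are not taken from the paper.

```python
from typing import Dict


def fuse_late(
    audio_probs: Dict[str, float],
    visual_probs: Dict[str, float],
    audio_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted average of per-label probabilities from two unimodal models.

    Assumption: both models score the same label set. This is a generic
    late-fusion sketch, not the fusion scheme used in the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    if audio_probs.keys() != visual_probs.keys():
        raise ValueError("both models must score the same labels")

    fused = {
        label: audio_weight * audio_probs[label]
        + (1.0 - audio_weight) * visual_probs[label]
        for label in audio_probs
    }
    total = sum(fused.values())
    if total <= 0.0:
        raise ValueError("probabilities must sum to a positive value")
    # Renormalize so the fused scores sum to 1.
    return {label: score / total for label, score in fused.items()}


def predict(fused_probs: Dict[str, float]) -> str:
    """Return the label with the highest fused probability."""
    return max(fused_probs, key=fused_probs.get)


# Made-up scores for a single clip (illustrative values only).
audio = {"angry": 0.6, "surprise": 0.3, "other": 0.1}
visual = {"angry": 0.2, "surprise": 0.7, "other": 0.1}

fused = fuse_late(audio, visual, audio_weight=0.5)
label = predict(fused)
```

With equal weights, the visual model's stronger evidence for surprise outweighs the audio model's preference for angry in this made-up example. Shifting `audio_weight` toward 1 would let the audio model dominate, which is one simple way to reflect that each modality is more reliable for different labels.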

Original language: English
Pages (from-to): 123-139
Number of pages: 17
Journal: International Journal of Speech Technology
Volume: 26
Issue number: 1
State: Published - Mar 2023
Externally published: Yes

Keywords

  • Emotion recognition
  • Emotional Arabic datasets
  • Facial expressions
  • Multimodalities
  • Speech emotion recognition
