TY - GEN
T1 - A two-stage hierarchical multilingual emotion recognition system using hidden Markov models and neural networks
AU - Abo Absa, Ahmed H.
AU - Deriche, M.
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2018/8/27
Y1 - 2018/8/27
N2 - Speech emotion recognition continues to attract considerable research attention, especially for mixed-language speech. Here, we show that emotion is culture/language dependent. In this paper, we propose a two-stage emotion recognition system that first identifies the language and then uses a dedicated language-dependent recognition system to identify the type of emotion. The system accurately recognizes the four main types of emotion, namely neutral, happy, angry, and sad, which are widely used in practical setups. To keep the computational complexity low, we identify the language using a feature vector consisting of energies from a basic wavelet decomposition of the speech signal. A Hidden Markov Model (HMM) is then used to track the changes in this energy feature vector to identify the language, achieving a recognition accuracy close to 100%. Once the language is identified, a set of traditional speech processing features, including pitch, formants, and MFCCs, is used with a basic neural network architecture to identify the type of emotion. The results show that identifying the language first can substantially improve the overall accuracy of emotion identification. The overall accuracy achieved with the proposed hierarchical system was above 93%. The work shows the strong correlation between language/culture and type of emotion, and can be extended to other scenarios such as gender-based, facial-expression-based, and age-based recognition.
AB - Speech emotion recognition continues to attract considerable research attention, especially for mixed-language speech. Here, we show that emotion is culture/language dependent. In this paper, we propose a two-stage emotion recognition system that first identifies the language and then uses a dedicated language-dependent recognition system to identify the type of emotion. The system accurately recognizes the four main types of emotion, namely neutral, happy, angry, and sad, which are widely used in practical setups. To keep the computational complexity low, we identify the language using a feature vector consisting of energies from a basic wavelet decomposition of the speech signal. A Hidden Markov Model (HMM) is then used to track the changes in this energy feature vector to identify the language, achieving a recognition accuracy close to 100%. Once the language is identified, a set of traditional speech processing features, including pitch, formants, and MFCCs, is used with a basic neural network architecture to identify the type of emotion. The results show that identifying the language first can substantially improve the overall accuracy of emotion identification. The overall accuracy achieved with the proposed hierarchical system was above 93%. The work shows the strong correlation between language/culture and type of emotion, and can be extended to other scenarios such as gender-based, facial-expression-based, and age-based recognition.
KW - Hidden Markov model
KW - Language recognition
KW - Neural network
KW - Speech emotion recognition
UR - https://www.scopus.com/pages/publications/85053881542
U2 - 10.1109/IEEEGCC.2017.8448155
DO - 10.1109/IEEEGCC.2017.8448155
M3 - Conference contribution
AN - SCOPUS:85053881542
SN - 9781538627563
T3 - 2017 9th IEEE-GCC Conference and Exhibition, GCCCE 2017
BT - 2017 9th IEEE-GCC Conference and Exhibition, GCCCE 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th IEEE-GCC Conference and Exhibition, GCCCE 2017
Y2 - 8 May 2017 through 11 May 2017
ER -