Speech Emotion Recognition using Extreme Machine Learning


  • Valli Madhavi Koti, GIET Degree College
  • Krishna Murthy, Indira Gandhi National Tribal University
  • M Suganya, Sri Sairam Engineering College
  • Meduri Sridhar Sarma, GIET Degree College
  • Gollakota V S S Seshu Kumar, GIET Degree College
  • Balamurugan N, Mohan Babu University




Speech Emotion Recognition, Machine Learning Algorithm, Gaussian Mixture Model, GMM


Speech Emotion Recognition (SER) is the task of detecting the underlying emotion in spoken language. It is a challenging task, as emotions are subjective and highly contextual. Machine learning algorithms have been widely used for SER, and one such algorithm is the Gaussian Mixture Model (GMM). The GMM is a statistical model that represents the probability distribution of a random variable as a weighted sum of Gaussian distributions, and it has been widely used for speech recognition and classification tasks. In this article, we propose a method for SER using Extreme Machine Learning (EML) with the GMM algorithm. EML is a type of machine learning that uses randomization to achieve high accuracy at a low computational cost, and it has been applied effectively in various classification tasks. The proposed approach comprises two steps: feature extraction and emotion classification. Mel-Frequency Cepstral Coefficients (MFCCs), which are commonly used in speech processing and represent the spectral envelope of the speech signal, are used as features. The GMM algorithm is used for emotion classification: the input features are modelled as a mixture of Gaussians, and the emotion is classified based on the likelihood of the input features belonging to each Gaussian. The proposed method was evaluated on the Berlin Database of Emotional Speech (EMO-DB) and achieved an accuracy of 74.33%. In conclusion, the proposed approach to SER using EML and the GMM algorithm shows promising results. It is a computationally efficient and effective approach to SER and can be used in various applications, such as speech-based emotion detection for virtual assistants, call centre analytics, and emotional analysis in psychotherapy.
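
The likelihood-based classification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one diagonal-covariance GMM per emotion fitted with a plain EM loop, uses synthetic 13-dimensional vectors as stand-ins for real MFCC frames, and omits the EML stage entirely. The function names, emotion labels, and all parameter values are hypothetical.

```python
import numpy as np

def component_log_probs(X, weights, means, variances):
    """Return log(w_k) + log N(x | mu_k, diag(var_k)) for every frame and component."""
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return np.log(weights)[None, :] + log_gauss

def fit_diag_gmm(X, n_components=2, n_iter=50, seed=0):
    """Fit a diagonal-covariance GMM to frames X of shape (n_frames, n_dims) with EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    # Initialise means from random frames and variances from the global variance.
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each frame.
        log_p = component_log_probs(X, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1, keepdims=True))
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        variances = np.maximum((resp.T @ X**2) / nk[:, None] - means**2, 1e-6)
    return weights, means, variances

def utterance_log_likelihood(X, gmm):
    """Sum of per-frame log-likelihoods of an utterance under one GMM."""
    return np.logaddexp.reduce(component_log_probs(X, *gmm), axis=1).sum()

def classify(X, models):
    """Pick the emotion whose GMM gives the utterance the highest log-likelihood."""
    return max(models, key=lambda label: utterance_log_likelihood(X, models[label]))

# Synthetic stand-ins for MFCC frames (assumption: 13 coefficients per frame).
rng = np.random.default_rng(42)
train = {
    "anger": rng.normal(loc=2.0, scale=1.0, size=(300, 13)),
    "sadness": rng.normal(loc=-2.0, scale=1.0, size=(300, 13)),
}
models = {label: fit_diag_gmm(frames) for label, frames in train.items()}

# A held-out "utterance" drawn from the same distribution as the anger frames.
test_utterance = rng.normal(loc=2.0, scale=1.0, size=(40, 13))
predicted = classify(test_utterance, models)
print(predicted)
```

In practice the synthetic arrays would be replaced by MFCC frames extracted from labelled recordings such as those in EMO-DB, and the number of mixture components would need tuning per emotion.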








How to Cite

V. M. Koti, K. Murthy, M. Suganya, M. S. Sarma, G. V. S. S. Seshu Kumar, and B. N, “Speech Emotion Recognition using Extreme Machine Learning”, EAI Endorsed Trans IoT, vol. 10, Nov. 2023.