Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms




Keywords: Audio Feature Extraction, Emotion Detection, Gradient Boosting, Machine Learning, Speech Emotion Recognition, Sentiment Analysis



INTRODUCTION: Language serves as the primary conduit for human expression, extending into communication media such as email and text messaging, where emoticons are frequently employed to convey nuanced emotions. In the digital landscape of long-distance communication, the detection and analysis of emotions assume paramount importance. The task is inherently challenging, however, because emotions are subjective and there is no universal consensus on how to quantify or categorize them.

OBJECTIVES: This research proposes a novel speech recognition model for emotion analysis, leveraging diverse machine learning techniques together with a three-layer feature extraction approach. It also sheds light on the robustness of these models on balanced and imbalanced datasets.

METHODS: The proposed three-layered feature extractor computes Chroma, MFCC, and Mel-spectrogram features, and passes them to classifiers such as K-Nearest Neighbour, Gradient Boosting, Multi-Layer Perceptron, and Random Forest.

RESULTS: Among the classifiers in the framework, the Multi-Layer Perceptron (MLP) emerges as the top-performing model, achieving accuracies of 99.64%, 99.43%, and 99.31% on the Balanced TESS Dataset, the Imbalanced TESS (Half) Dataset, and the Imbalanced TESS (Quarter) Dataset, respectively. K-Nearest Neighbour (KNN) is the second-best classifier, surpassing MLP only on the Imbalanced TESS (Half) Dataset, where it reaches 99.52%.
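The abstract does not specify how the Half and Quarter TESS variants were constructed; one common way to induce class imbalance, shown here purely as an illustration, is to downsample a subset of emotion classes to a fraction of their original size. The function name `make_imbalanced` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def make_imbalanced(X, y, minority_classes, fraction, seed=0):
    """Keep only `fraction` of the samples of each class listed in
    `minority_classes`; every other class is kept whole."""
    rng = np.random.default_rng(seed)
    keep = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        if cls in minority_classes:
            n = max(1, int(len(idx) * fraction))
            idx = rng.choice(idx, size=n, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]

if __name__ == "__main__":
    # 7 emotion classes (as in TESS), 100 clips each, 180-dim features
    X = np.random.rand(700, 180)
    y = np.repeat(np.arange(7), 100)
    X_half, y_half = make_imbalanced(X, y, minority_classes=[0, 1, 2], fraction=0.5)
    print(np.bincount(y_half))  # three classes reduced to 50 samples, the rest keep 100
```

Evaluating the same classifiers on the full set and on such downsampled variants is what allows the robustness comparison reported above.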

CONCLUSION: This research contributes valuable insights into effective emotion recognition through speech, shedding light on the nuances of classification in imbalanced datasets.






How to Cite

Shah VN, Shah DR, Shetty MU, Krishnan D, Ravi V, Singh S. Investigation of Imbalanced Sentiment Analysis in Voice Data: A Comparative Study of Machine Learning Algorithms. EAI Endorsed Scal Inf Syst [Internet]. 2024 Apr. 22 [cited 2024 May 18];. Available from:


