Human Emotion Recognition with an Advanced Vision Transformer Model
DOI: https://doi.org/10.4108/eetcasa.8101

Keywords: facial expression, facial emotion detection, face recognition, Vision Transformer, ViT, EfficientViT-M5

Abstract
This paper proposes a novel deep-learning technique that leverages the Efficient Vision Transformer M5 (EfficientViT-M5) model, improving on existing designs by offering a more computationally economical variant that maintains strong performance, making it highly suitable for practical applications. Transfer learning with weights pre-trained on the ImageNet dataset substantially enhances the model's accuracy and efficiency. The proposed method trains the EfficientViT-M5 model on three widely recognized facial emotion recognition datasets: FER2013+, AffectNet, and RAF-DB. A comprehensive data augmentation pipeline increases the diversity of the training data and bolsters the model's robustness. The trained model achieved accuracy rates of 94.28% (FER2013+), 94.69% (AffectNet), and 97.76% (RAF-DB). These results underscore the strength and effectiveness of the proposed model in identifying facial emotions across diverse datasets, showcasing its potential for practical use in emotion-aware computing, security, and health diagnostics. The research advances facial emotion recognition by introducing a reliable and practical way of recognizing emotions with cutting-edge deep learning techniques, and the results point to richer, more flexible interaction between humans and computers, highlighting the efficacy of sophisticated deep learning models in addressing complex computer vision problems.
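As a rough illustration of the transfer-learning recipe the abstract describes, the sketch below fine-tunes an ImageNet-pretrained EfficientViT-M5 for emotion classification in PyTorch. The model name follows the timm library's registry; the class count, augmentation choices, and hyperparameters are illustrative assumptions, not the paper's reported configuration.

```python
import timm
import torch
from torchvision import transforms

# Illustrative sketch, not the authors' exact setup: fine-tune an
# ImageNet-pretrained EfficientViT-M5 (timm registry name) on a
# facial emotion recognition dataset.

NUM_CLASSES = 7  # assumed: the basic emotion categories, e.g. in RAF-DB

# Pretrained backbone with a fresh classification head.
model = timm.create_model("efficientvit_m5", pretrained=True,
                          num_classes=NUM_CLASSES)

# A data augmentation pipeline in the spirit of the abstract
# (specific transforms and magnitudes are assumptions).
train_tf = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = torch.nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised fine-tuning step; returns the batch loss."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this pattern the pretrained weights supply generic visual features, and only modest fine-tuning is needed to adapt the compact EfficientViT-M5 head to emotion labels, which is what makes the approach computationally economical.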
License
Copyright (c) 2024 EAI Endorsed Transactions on Context-aware Systems and Applications

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.