A spatio-temporal attention fusion model for students behaviour recognition

Authors

  • Xiaoli Wang, SanMenXia College of Social Administration

DOI:

https://doi.org/10.4108/eai.3-9-2021.170905

Keywords:

student behavior, spatio-temporal attention, channel information, multi-spatial attention, CNN

Abstract

Student behaviour analysis can reflect students' learning situation in real time, providing an important basis for optimizing classroom teaching strategies and improving teaching methods. Exploring how to use big data to detect and recognize student behaviour is an important task for the smart classroom. Traditional recognition methods suffer from defects such as low efficiency, blurred edges, and high time cost. In this paper, we propose a new student behaviour recognition method based on a spatio-temporal attention fusion model. It makes full use of the key spatio-temporal information in a video and alleviates the problem of spatio-temporal information redundancy. Firstly, a channel attention mechanism is introduced into the spatio-temporal network; it calibrates channel information by modeling the dependencies between feature channels, which improves the expressive ability of the features. Secondly, a temporal attention model based on a convolutional neural network (CNN) is proposed, which uses few parameters to learn an attention score for each frame and focuses on the frames with large motion amplitude. Meanwhile, a multi-spatial attention model is presented to compute an attention score for each position in each frame from different angles, extract several salient behaviour regions, and fuse the spatio-temporal features to further enhance the feature representation of the video. Finally, the fused features are fed into the classification network, and the recognition result is obtained by combining the two output streams with different weights. Experimental results on the HMDB51 and UCF101 datasets and on eight typical classroom behaviours of students show that the proposed method can effectively recognize behaviours in videos, with accuracy higher than 90% on HMDB51, UCF101, and the real classroom data.
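
The abstract outlines three attention components (channel recalibration, temporal frame scoring, and multi-spatial region scoring) followed by a weighted two-stream fusion. The paper's exact formulations are not reproduced on this page, so the PyTorch sketch below only illustrates how such modules are commonly built; every class name, layer size, and fusion weight is an assumption rather than the authors' implementation.

# Minimal sketch of the attention modules described in the abstract.
# All hyper-parameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style recalibration that models
    dependencies between feature channels."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # global average pool -> (B, C)
        return x * w.unsqueeze(-1).unsqueeze(-1) # rescale each channel

class TemporalAttention(nn.Module):
    """Lightweight CNN that scores each frame so the model focuses
    on frames with large motion amplitude."""
    def __init__(self, feat_dim: int):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, 1, kernel_size=3, padding=1)

    def forward(self, x):                        # x: (B, T, D) per-frame features
        scores = self.conv(x.transpose(1, 2))    # (B, 1, T)
        alpha = F.softmax(scores.squeeze(1), dim=1)        # per-frame weights
        return (x * alpha.unsqueeze(-1)).sum(dim=1), alpha # attention-weighted pooling

class MultiSpatialAttention(nn.Module):
    """Several parallel spatial attention maps ("different angles"),
    each highlighting one salient region of a frame."""
    def __init__(self, channels: int, num_maps: int = 4):
        super().__init__()
        self.score = nn.Conv2d(channels, num_maps, kernel_size=1)

    def forward(self, x):                        # x: (B, C, H, W)
        b, c, h, w = x.shape
        maps = F.softmax(self.score(x).view(b, -1, h * w), dim=-1)    # (B, K, HW)
        feats = torch.bmm(maps, x.view(b, c, h * w).transpose(1, 2))  # (B, K, C)
        return feats.flatten(1)                  # concatenate the K region descriptors

def fuse_two_streams(rgb_logits, flow_logits, w_rgb=0.6, w_flow=0.4):
    """Late fusion of the two output streams with (assumed) fixed weights."""
    return w_rgb * rgb_logits + w_flow * flow_logits

In a full pipeline of this kind, the channel and spatial modules would sit inside the appearance and motion backbones, the temporal module would pool the per-frame features, and the two classification outputs would be combined by a weighted late fusion such as fuse_two_streams; the actual network used in the paper may differ in all of these choices.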

Published

03-09-2021

How to Cite

1. Wang X. A spatio-temporal attention fusion model for students behaviour recognition. EAI Endorsed Scal Inf Syst [Internet]. 2021 Sep. 3 [cited 2024 Nov. 14];9(34):e1. Available from: https://publications.eai.eu/index.php/sis/article/view/353