Student Behavior Detection in Classroom Environments Using Deep Learning Models
DOI: https://doi.org/10.4108/airo.11141
Keywords: Computer Vision, Student Behavior Detection, YOLO, ByteTrack, RT-DETR
Abstract
Observing student behavior in classroom environments provides valuable insights into attention and engagement. However, manual monitoring is labor-intensive and often inconsistent, particularly in large lecture settings. This study proposes a deep learning–based framework for automatic student behavior analysis in real classroom scenarios. A dedicated dataset of 3,373 images was constructed and annotated with five attention-related behaviors (“Focused”, “Raising Hand”, “Distracted”, “Sleeping”, and “Using Phone”), totaling 9,659 labeled instances. The dataset captures diverse real-world conditions, including variations in classroom layout, camera viewpoint, and occlusion. We systematically evaluated state-of-the-art object detection models, including YOLOv8–YOLOv12 and the Real-Time Detection Transformer (RT-DETR). Experimental results show that YOLOv8m achieved the highest localization accuracy (mAP@0.5 = 0.920), YOLOv11s/m demonstrated the best overall performance (mAP@0.5:0.95 = 0.726), and RT-DETR-X achieved the highest F1-score (0.886). Notably, larger model size does not necessarily translate into better performance. In addition to accuracy, inference speed was evaluated to assess real-time applicability. Lightweight models such as YOLOv11s achieved a favorable balance between performance and efficiency, enabling real-time processing on resource-constrained hardware. Furthermore, YOLOv11s, with high precision (0.890) and only 9.4M parameters, was integrated with ByteTrackV2 to perform behavior tracking and temporal analysis in classroom environments. This enables the generation of behavior distribution charts that provide interpretable insights into student engagement over time. These findings demonstrate the potential of automated behavior recognition systems for classroom analytics and data-driven teaching improvement.
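The behavior-distribution step described above can be sketched in a few lines. The snippet below is a minimal, hypothetical illustration (not the authors' implementation): it assumes a detector-plus-tracker front end (e.g. YOLOv11s with ByteTrackV2) has already produced per-frame `(track_id, behavior)` pairs, and it aggregates them into per-window behavior fractions of the kind shown in a behavior distribution chart. The input format and window size are illustrative assumptions; only the five class names come from the paper.

```python
from collections import Counter

# The paper's five attention-related behavior classes.
BEHAVIORS = ["Focused", "Raising Hand", "Distracted", "Sleeping", "Using Phone"]

def behavior_distribution(frames, window=30):
    """Aggregate tracked detections into per-window behavior fractions.

    frames: list of per-frame lists of (track_id, behavior) tuples,
            as a detector + tracker pipeline might emit them.
    window: number of consecutive frames pooled into one chart bin.
    Returns one {behavior: fraction} dict per window.
    """
    charts = []
    for start in range(0, len(frames), window):
        counts = Counter(
            behavior
            for frame in frames[start:start + window]
            for _track_id, behavior in frame
        )
        total = sum(counts.values()) or 1  # avoid division by zero on empty windows
        charts.append({b: counts.get(b, 0) / total for b in BEHAVIORS})
    return charts

# Toy usage: two frames with three tracked students.
frames = [
    [(1, "Focused"), (2, "Using Phone"), (3, "Focused")],
    [(1, "Focused"), (2, "Distracted"), (3, "Focused")],
]
charts = behavior_distribution(frames, window=2)
```

In a real deployment the per-frame pairs would be read from the tracker's output stream, and each window's dictionary would feed one bar of the behavior distribution chart.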
References
[1] Yu M., Xu J., Zhong J., Liu W. and Cheng W. (2017) Behavior detection and analysis for learning process in classroom environment. In 2017 IEEE Frontiers in Education Conference (FIE), pp. 1–4. IEEE.
[2] Yin Albert C.C., Sun Y., Li G., Peng J., Ran F., Wang Z. and Zhou J. (2022) Identifying and monitoring students’ classroom learning behavior based on multisource information. Mobile Information Systems, 2022(1), 9903342.
[3] Lin L., Yang H., Xu Q., Xue Y. and Li D. (2024) Research on student classroom behavior detection based on the real-time detection transformer algorithm. Applied Sciences, 14(14).
[4] Li Y., Qi X., Saudagar A.K.J., Badshah A.M., Muhammad K. and Liu S. (2023) Student behavior recognition for interaction detection in the classroom environment. Image and Vision Computing, 136, 104726.
[5] Song W., He T., Zhang H., Shi Z., Chen H., Shangguan C. and Hao C. (2026) Intelligent recognition and analysis of student behavior in real-classroom scenarios: a comprehensive survey, exploration, and future perspectives. Journal of King Saud University – Computer and Information Sciences.
[6] Yang F., Wang T. and Wang X. (2023) Student classroom behavior detection based on YOLOv7+ BRA and multimodel fusion. In International Conference on Image and Graphics, pp. 41–52. Springer Nature Switzerland.
[7] Chen H., Zhou G. and Jiang H. (2023) Student behavior detection in the classroom based on improved YOLOv8. Sensors, 23(20), 8385.
[8] Sheng X., Li S. and Chan S. (2025) Real-time classroom student behavior detection based on improved YOLOv8s. Scientific Reports, 15(1), 14470.
[9] Abozeid A., Alrashdi I. and Al-Makhlasawy R.M. (2026) Intelligent recognition of students’ behavior for smart learning environments. Scientific Reports.
[10] Li T., Wang J., Xu C., Xu B., An N. and Zhang J. (2026) CBHA-DETR: multi-kernel attention and deformable fusion network for behavior recognition in classroom monitoring. Multimedia Systems, 32(2), 112.
[11] Lin F.C., Ngo H.H., Dow C.R., Lam K.H. and Le H.L. (2021) Student behavior recognition system for the classroom environment based on skeleton pose estimation and person detection. Sensors, 21(16), 5314.
[12] Jocher G., Chaurasia A. and Qiu J. (2023) Ultralytics YOLOv8. Version 8.0.0. Available at: https://github.com/ultralytics/ultralytics
[13] Wang C.-Y., Yeh I.-H. and Liao H.-Y.M. (2024) YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv:2402.13616. Available at: https://arxiv.org/abs/2402.13616
[14] Wang A., Chen H., Liu L., Chen K., Lin Z., Han J. and Ding G. (2024) YOLOv10: Real-Time End-to-End Object Detection. arXiv:2405.14458. Available at: https://arxiv.org/abs/2405.14458
[15] Khanam R. and Hussain M. (2024) YOLOv11: An Overview of the Key Architectural Enhancements. arXiv:2410.17725. Available at: https://arxiv.org/abs/2410.17725
[16] Tian Y., Ye Q. and Doermann D. (2025) YOLOv12: Attention-Centric Real-Time Object Detectors. arXiv:2502.12524. Available at: https://arxiv.org/abs/2502.12524
[17] Zhao Y., Lv W., Xu S., Wei J., Wang G., Dang Q., Liu Y. and Chen J. (2024) DETRs beat YOLOs on real-time object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 16965–16974.
[18] Zhang Y., Sun P., Jiang Y., Yu D., Weng F., Yuan Z., Luo P., Liu W. and Wang X. (2022) ByteTrack: Multi-object tracking by associating every detection box. In European Conference on Computer Vision, pp. 1–21. Springer Nature Switzerland.
[19] Yang F. (2023) SCB-dataset: A dataset for detecting student classroom behavior. arXiv:2304.02488. Available at: https://arxiv.org/abs/2304.02488
[20] Yang F. and Wang T. (2023) SCB-dataset3: A benchmark for detecting student classroom behavior. arXiv:2310.02522. Available at: https://arxiv.org/abs/2310.02522
[21] Hukkelås H. and Lindseth F. (2023) Does image anonymization impact computer vision training? In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 140–150.
[22] Xue Y., Chinapah V. and Zhu C. (2025) A comparative analysis of AI privacy concerns in higher education: News coverage in China and Western countries. Education Sciences, 15(6), 650. DOI: 10.3390/educsci15060650.
[23] Otani M., Togashi R., Nakashima Y., Rahtu E., Heikkilä J. and Satoh S. (2022) Optimal correction cost for object detection evaluation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21107–21115.
[24] Sajja R., Sermet Y., Cwiertny D. and Demir I. (2025) Integrating AI and learning analytics for data-driven pedagogical decisions and personalized interventions in education. Technology, Knowledge and Learning, pp. 1–31.
[25] Cabral L., Pinto R. and Gonçalves G. (2025) AI-powered learning analytics dashboards: a systematic review of applications, techniques, and research gaps. Discover Education, 4(1), 525.
[26] Liu Q., Jiang X. and Jiang R. (2025) Classroom behavior recognition using computer vision: A systematic review. Sensors, 25(2), 373.
[27] Yang R., Tian T. and Tian J. (2025) Versatile teacher: A class-aware teacher–student framework for cross-domain adaptation. Pattern Recognition, 158, 111024.
License
Copyright (c) 2025 Thi-Nguyen Nguyen, Dinh-Thai Kim, Anh-Phuong Pham

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.