Hardware-Aware INT8 Quantization and FPGA Deployment of MobileNetV2 for Real-Time Facial Landmark Detection
DOI: https://doi.org/10.4108/airo.10070
Keywords: facial landmark detection, MobileNetV2, quantization-aware training, post-training quantization, edge computing, AMD/Xilinx Kria KV260
Abstract
Facial landmark detection is a key component of always-on edge vision systems, but practical deployment requires balancing localization accuracy, model size, throughput, and power consumption. This study proposes a two-stage, hardware-aware framework for MobileNetV2-based facial landmark detection. In Stage I, a lightweight detector is developed in PyTorch and evaluated in FP32 and INT8 using post-training quantization (PTQ) and quantization-aware training (QAT). In Stage II, the quantized model is realized on the AMD/Xilinx Kria KV260 FPGA and assessed in terms of real-time throughput, power consumption, and hardware resource utilization. INT8 quantization reduces the model size from 6.59 MB to 1.65 MB, and QAT retains accuracy more effectively than PTQ (91.74% vs. 90.94%) relative to the FP32 baseline (92.42%). The hardware implementation achieves approximately 30 FPS at about 3 W while using 14.8% of LUTs, 8.1% of FFs, 16.3% of BRAM, and 4.5% of DSPs. Among the evaluated platforms, the KV260 delivers the highest measured energy efficiency, whereas the RTX 4060 delivers the highest throughput. Within the evaluated setup, these results support the practicality of explicitly separating software-stage quantization analysis from hardware-stage realization for real-time, low-power facial landmark detection on FPGA-based edge platforms.
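To make Stage I concrete, the sketch below shows how INT8 PTQ and QAT could be set up for a landmark regressor with PyTorch's eager-mode quantization API (torch.ao.quantization). It is a minimal illustration under stated assumptions, not the authors' implementation: the LandmarkNet module, its layer sizes, the "fbgemm" backend, and the training hyperparameters are placeholders chosen for the example, and the actual KV260 deployment in Stage II would additionally pass through the Vitis AI quantizer and compiler rather than this generic flow.

import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qconfig, get_default_qat_qconfig,
    prepare, prepare_qat, convert,
)

class LandmarkNet(nn.Module):
    # Hypothetical stand-in for the MobileNetV2 backbone plus landmark regression head.
    def __init__(self, num_landmarks=68):
        super().__init__()
        self.quant = QuantStub()        # float -> int8 boundary at the input
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_landmarks * 2)
        self.dequant = DeQuantStub()    # int8 -> float boundary at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.features(x)
        x = self.head(torch.flatten(x, 1))
        return self.dequant(x)

def ptq_int8(model, calib_loader):
    # Post-training quantization: calibrate activation ranges, then convert to INT8.
    model.eval()
    model.qconfig = get_default_qconfig("fbgemm")
    prepared = prepare(model)
    with torch.no_grad():
        for images, _ in calib_loader:
            prepared(images)            # observers record activation statistics
    return convert(prepared)

def qat_int8(model, train_loader, epochs=1):
    # Quantization-aware training: fine-tune through fake-quant nodes, then convert.
    model.train()
    model.qconfig = get_default_qat_qconfig("fbgemm")
    prepared = prepare_qat(model)
    optimizer = torch.optim.SGD(prepared.parameters(), lr=1e-4)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for images, targets in train_loader:
            optimizer.zero_grad()
            criterion(prepared(images), targets).backward()
            optimizer.step()
    prepared.eval()
    return convert(prepared)

The practical difference between the two paths is that PTQ only observes activation statistics on a small calibration set, whereas QAT continues to simulate INT8 rounding during fine-tuning; this is the usual reason QAT recovers more of the FP32 accuracy, consistent with the 91.74% vs. 90.94% result reported above.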
License
Copyright (c) 2025 Van-Khoa Pham, Manh-Dung Do, Trung-Nghia Dang, Long Tran

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
This is an open access article distributed under the terms of the CC BY-NC-SA 4.0 license, which permits copying, redistributing, remixing, transforming, and building upon the material in any medium, so long as the original work is properly cited.