SHELF: Combination of Shape Fitting and Heatmap Regression for Landmark Detection in Human Face

Authors

  • Ngo Thi Ngoc Quyen, Viettel Cyberspace Center
  • Tran Duy Linh, Viettel Cyberspace Center
  • Vu Hong Phuc, Viettel Cyberspace Center
  • Nguyen Van Nam, Thuyloi University

DOI:

https://doi.org/10.4108/eetinis.v10i3.3863

Keywords:

facial landmarks, heatmap regression, shape fitting, coordinate regression

Abstract

Today, facial emotion recognition is widely adopted in many intelligent applications, including driver monitoring systems, smart customer care, and e-learning systems. Human emotions can be well represented by facial landmarks, which are hard to detect in images because of the large number of discrete landmarks and the variation in the shape and pose of human faces in the real world. Over the decades, many methods have been proposed for facial landmark detection, including shape fitting and coordinate regression approaches such as ASMNet and AnchorFace. However, their accuracy and efficiency are still limited for real-time applications. In this paper, we propose a novel method called SHELF, which is the first to combine the shape fitting and heatmap regression approaches for landmark detection in human faces. The heatmap model aims to generate landmarks that fit common face shapes. The method has been evaluated on three datasets, 300W-Challenging, WFLW, and 300VW-E, comprising 31,557 images, and achieved normalized mean errors (NME) of 6.67%, 7.34%, and 12.55%, respectively, outperforming most existing methods. On the first two datasets, the method is also comparable to the state-of-the-art AnchorFace, which achieves NMEs of 6.19% and 4.62%, respectively.
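The abstract names the two ingredients, heatmap regression and shape fitting, and reports accuracy as NME, but does not describe how they are combined. The following is a generic, hypothetical sketch of those building blocks under stated assumptions, not the SHELF implementation: landmarks are read off per-landmark heatmaps by argmax, snapped onto a PCA shape model in the spirit of active shape models (with coefficients clipped to ±3 standard deviations), and scored with NME normalized by the distance between two eye landmarks. The function names, the argmax decoding, the clipping rule, and the inter-ocular normalization are all assumptions; inter-ocular normalization is a common choice for 300W and WFLW, but the normalization behind the figures above is not stated here. Training shapes are assumed to be pre-aligned (no Procrustes step).

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Turn (K, H, W) per-landmark heatmaps into a (K, 2) array of (x, y) peaks.

    Hypothetical helper: plain argmax decoding, no sub-pixel refinement.
    """
    k, h, w = heatmaps.shape
    flat_idx = heatmaps.reshape(k, -1).argmax(axis=1)  # hottest pixel per landmark
    ys, xs = np.unravel_index(flat_idx, (h, w))
    return np.stack([xs, ys], axis=1).astype(float)


def fit_shape_model(train_shapes: np.ndarray, n_components: int):
    """PCA shape model from (N, K, 2) training shapes, assumed already aligned.

    Returns the flattened mean shape, the top principal directions (as rows),
    and the variance captured along each of those directions.
    """
    flat = train_shapes.reshape(len(train_shapes), -1)
    mean = flat.mean(axis=0)
    _, s, vt = np.linalg.svd(flat - mean, full_matrices=False)
    variances = s**2 / (len(flat) - 1)
    return mean, vt[:n_components], variances[:n_components]


def project_to_shape(points, mean, basis, variances, max_sd=3.0):
    """Snap a (K, 2) prediction onto the shape model, active-shape-model style.

    Shape coefficients are clipped to +/- max_sd standard deviations (an
    assumed limit) so the result stays close to the training shapes.
    """
    b = basis @ (points.reshape(-1) - mean)
    limit = max_sd * np.sqrt(variances)
    b = np.clip(b, -limit, limit)
    return (mean + basis.T @ b).reshape(-1, 2)


def nme(pred, gt, left_eye_idx, right_eye_idx):
    """Mean point-to-point error divided by the distance between two eye landmarks."""
    per_point = np.linalg.norm(pred - gt, axis=1)
    norm = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return per_point.mean() / norm
```

The projection step is what gives shape fitting its robustness: a heatmap peak that lands on a wrong face part pulls the prediction off the learned shape subspace, and projecting back discards most of that error. In a real pipeline the decoded points would usually be refined to sub-pixel precision and mapped back to image coordinates, and which eye landmarks define the normalization depends on the annotation scheme (68 points for 300W, 98 for WFLW).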


Author Biographies

Ngo Thi Ngoc Quyen, Viettel Cyberspace Center

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam

Tran Duy Linh, Viettel Cyberspace Center

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam

Vu Hong Phuc, Viettel Cyberspace Center

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam

Nguyen Van Nam, Thuyloi University

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam

References

Nam, N.V. and Quyen, N.T.N. (2023) Flash: Facial landmark detection using active shape model and heatmap regression. In The 9th EAI International Conference on Industrial Networks and Intelligent Systems.

He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’16 (IEEE): 770–778. URL http://ieeexplore.ieee.org/document/7780459. DOI: https://doi.org/10.1109/CVPR.2016.90

Tan, M. and Le, Q.V. (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R. [eds.] Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA (PMLR), Proceedings of Machine Learning Research 97: 6105–6114. URL http://proceedings.mlr.press/v97/tan19a.html.

Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A. and Chen, L. (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018 (Computer Vision Foundation / IEEE Computer Society): 4510–4520. DOI: https://doi.org/10.1109/CVPR.2018.00474

Ma, N., Zhang, X., Zheng, H.T. and Sun, J. (2018) ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Ferrari, V., Hebert, M., Sminchisescu, C. and Weiss, Y. [eds.] Computer Vision – ECCV 2018 (Cham: Springer International Publishing): 122–138. DOI: https://doi.org/10.1007/978-3-030-01264-9_8

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M. et al. (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 (OpenReview.net). URL https://openreview.net/forum?id=YicbFdNTTy.

Cootes, T., Baldock, E. and Graham, J. (2000) An introduction to active shape models. Image processing and analysis 328: 223–248.

Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active appearance models. In Burkhardt, H. and Neumann, B. [eds.] Computer Vision — ECCV’98 (Berlin, Heidelberg: Springer Berlin Heidelberg): 484–498. DOI: https://doi.org/10.1007/BFb0054760

Cristinacce, D. and Cootes, T. (2006) Feature detection and tracking with constrained local models. 41: 929–938. DOI: https://doi.org/10.5244/C.20.95

Asthana, A., Zafeiriou, S., Cheng, S. and Pantic, M. (2013) Robust discriminative response map fitting with constrained local models. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13 (USA: IEEE Computer Society): 3444–3451. DOI: https://doi.org/10.1109/CVPR.2013.442

Liu, Y., Jourabloo, A., Ren, W. and Liu, X. (2017) Dense face alignment. In Proceedings of the IEEE International Conference on Computer Vision Workshops: 1619–1628. DOI: https://doi.org/10.1109/ICCVW.2017.190

Dong, X., Yan, Y., Ouyang, W. and Yang, Y. (2018) Style aggregated network for facial landmark detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 379–388. DOI: https://doi.org/10.1109/CVPR.2018.00047

Zhao, Y., Liu, Y., Shen, C., Gao, Y. and Xiong, S. (2019) MobileFAN: Transferring deep hidden representation for face alignment. Pattern Recognition 100: 107114. DOI: https://doi.org/10.1016/j.patcog.2019.107114

Xu, Z., Li, B., Yuan, Y. and Geng, M. (2021) AnchorFace: An anchor-based facial landmark detector across large poses. In Proceedings of the AAAI Conference on Artificial Intelligence, 35: 3092–3100. DOI: https://doi.org/10.1609/aaai.v35i4.16418

Jolliffe, I.T. and Cadima, J. (2016) Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374(2065): 20150202. DOI: https://doi.org/10.1098/rsta.2015.0202

Trigeorgis, G., Snape, P., Nicolaou, M.A., Antonakos, E. and Zafeiriou, S. (2016) Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 4177–4187. URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2016.453. DOI: https://doi.org/10.1109/CVPR.2016.453

Lv, J., Shao, X., Xing, J., Cheng, C. and Zhou, X. (2017) A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 3691–3700. DOI: https://doi.org/10.1109/CVPR.2017.393

Feng, Z., Kittler, J., Awais, M., Huber, P. and Wu, X. (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 2235–2245. URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00238. DOI: https://doi.org/10.1109/CVPR.2018.00238

Wang, X., Bo, L. and Fuxin, L. (2019) Adaptive wing loss for robust face alignment via heatmap regression. In The IEEE International Conference on Computer Vision (ICCV). DOI: https://doi.org/10.1109/ICCV.2019.00707

Xiong, Y., Zhou, Z., Dou, Y. and Su, Z. (2021) Gaussian Vector: An Efficient Solution for Facial Landmark Detection. 70–87. DOI: https://doi.org/10.1007/978-3-030-69541-5_5

Huang, Y., Yang, H., Li, C., Kim, J. and Wei, F. (2021) ADNet: Leveraging error-bias towards normal direction in face alignment. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV): 3060–3070. DOI: https://doi.org/10.1109/ICCV48922.2021.00307

Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y. and Zhou, Q. (2018) Look at boundary: A boundary-aware face alignment algorithm. In CVPR. DOI: https://doi.org/10.1109/CVPR.2018.00227

Fard, A.P., Abdollahi, H. and Mahoor, M.H. (2021) ASMNet: A lightweight deep neural network for face alignment and pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021, virtual, June 19-25, 2021 (Computer Vision Foundation / IEEE): 1521–1530. DOI: https://doi.org/10.1109/CVPRW53098.2021.00168

Newell, A., Yang, K. and Deng, J. (2016) Stacked hourglass networks for human pose estimation. In Leibe, B., Matas, J., Sebe, N. and Welling, M. [eds.] Computer Vision – ECCV 2016 (Cham: Springer International Publishing): 483–499. DOI: https://doi.org/10.1007/978-3-319-46484-8_29

Le, V., Brandt, J., Lin, Z., Bourdev, L. and Huang, T.S. (2012) Interactive facial feature localization. In Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y. and Schmid, C. [eds.] Computer Vision – ECCV 2012 (Berlin, Heidelberg: Springer Berlin Heidelberg): 679–692. DOI: https://doi.org/10.1007/978-3-642-33712-3_49

Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J. and Kumar, N. (2013) Localizing parts of faces using a consensus of exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(12): 2930–2940. DOI: https://doi.org/10.1109/TPAMI.2013.23

Köstinger, M., Wohlhart, P., Roth, P.M. and Bischof, H. (2011) Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops): 2144–2151. DOI: https://doi.org/10.1109/ICCVW.2011.6130513

Fard, A.P., Abdollahi, H. and Mahoor, M. (2021) ASMNet: A lightweight deep neural network for face alignment and pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 1521–1530. DOI: https://doi.org/10.1109/CVPRW53098.2021.00168

Dong, X., Yu, S.I., Weng, X., Wei, S.E., Yang, Y. and Sheikh, Y. (2018) Supervision-by-Registration: An unsupervised approach to improve the precision of facial landmark detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 360–368. DOI: https://doi.org/10.1109/CVPR.2018.00045

Zhu, S., Li, C., Loy, C.C. and Tang, X. (2015) Face alignment by coarse-to-fine shape searching. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 4998–5006. DOI: https://doi.org/10.1109/CVPR.2015.7299134

Miao, X., Zhen, X., Liu, X., Deng, C., Athitsos, V. and Huang, H. (2018) Direct shape regression networks for end-to-end face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: https://doi.org/10.1109/CVPR.2018.00529

Xiao, S., Feng, J., Liu, L., Nie, X., Wang, W., Yan, S. and Kassim, A. (2017) Recurrent 3D-2D dual learning for large-pose facial landmark detection. In 2017 IEEE International Conference on Computer Vision (ICCV): 1642–1651. DOI: https://doi.org/10.1109/ICCV.2017.181

Honari, S., Yosinski, J., Vincent, P. and Pal, C. (2016) Recombinator networks: Learning coarse-to-fine feature aggregation. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on (IEEE). DOI: https://doi.org/10.1109/CVPR.2016.619

Kumar, A. and Chellappa, R. (2018) Disentangling 3D pose in a dendritic CNN for unconstrained 2D face alignment. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 430–439. URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00052. DOI: https://doi.org/10.1109/CVPR.2018.00052

Ding, H., Zhou, P. and Chellappa, R. (2020) Occlusion adaptive deep network for robust facial expression recognition. In 2020 IEEE International Joint Conference on Biometrics (IJCB) (IEEE Press): 1–9. DOI: https://doi.org/10.1109/IJCB48548.2020.9304923

Cao, X., Wei, Y., Wen, F. and Sun, J. (2012) Face alignment by explicit shape regression. In 2012 IEEE Conference on Computer Vision and Pattern Recognition: 2887–2894. DOI: https://doi.org/10.1109/CVPR.2012.6248015

Xiong, X. and De la Torre, F. (2013) Supervised descent method and its applications to face alignment. In 2013 IEEE Conference on Computer Vision and Pattern Recognition: 532–539. DOI: https://doi.org/10.1109/CVPR.2013.75


Published

26-09-2023

How to Cite

Quyen, N. T. N., Linh, T. D., Phuc, V. H., & Nam, N. V. (2023). SHELF: Combination of Shape Fitting and Heatmap Regression for Landmark Detection in Human Face. EAI Endorsed Transactions on Industrial Networks and Intelligent Systems, 10(3), e3. https://doi.org/10.4108/eetinis.v10i3.3863