SHELF: Combination of Shape Fitting and Heatmap Regression for Landmark Detection in Human Face

Ngo Thi Ngoc Quyen; Tran Duy Linh; Vu Hong Phuc; Nguyen Van Nam

doi:10.4108/eetinis.v10i3.3863

Authors

Ngo Thi Ngoc Quyen Viettel Cyberspace Center
Tran Duy Linh Viettel Cyberspace Center
Vu Hong Phuc Viettel Cyberspace Center
Nguyen Van Nam Thuyloi University

DOI:

https://doi.org/10.4108/eetinis.v10i3.3863

Keywords:

facial landmarks, heatmap regression, shape fitting, coordinate regression

Abstract

Facial emotion recognition is now widely adopted in intelligent applications such as driver monitoring systems, smart customer care, and e-learning systems. Human emotions can be well represented by facial landmarks, but these landmarks are hard to detect in images because of their large number and the variation in face shape and pose in the real world. Over the decades, many facial landmark detection methods have been proposed, including shape fitting and coordinate regression approaches such as ASMNet and AnchorFace. However, their accuracy and efficiency are still limited for real-time applications. In this paper, we propose a novel method called SHELF, the first to combine the shape fitting and heatmap regression approaches for landmark detection in the human face. The heatmap model aims to generate landmarks that fit common face shapes. The method was evaluated on three datasets, 300W-Challenging, WFLW, and 300VW-E, comprising 31,557 images, and achieved a normalized mean error (NME) of 6.67%, 7.34%, and 12.55%, respectively, outperforming most existing methods. On the first two datasets, the method is also comparable to the state-of-the-art AnchorFace, which achieves an NME of 6.19% and 4.62%, respectively.
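The abstract does not specify how SHELF couples its heatmap and shape-fitting stages, so the following is only a generic sketch of the two ideas it names and of the NME metric it reports, not the authors' implementation. It assumes argmax decoding of per-landmark heatmaps, an active-shape-model-style point distribution model (PCA over pre-aligned training shapes, in the spirit of Cootes et al.) whose coefficients are clamped to ±3√λ, and an NME that divides the mean point-to-point error by a caller-supplied normalizing distance (benchmarks differ on this choice, e.g. inter-ocular distance). The names `decode_heatmaps`, `PointDistributionModel`, `nme`, and `gaussian_heatmaps` are hypothetical, and the similarity alignment between image and model frames that a full ASM performs is omitted.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Turn (L, H, W) heatmaps into (L, 2) (x, y) coordinates by per-channel argmax."""
    num_landmarks, height, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_landmarks, -1).argmax(axis=1)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    return np.stack([cols, rows], axis=1).astype(float)


class PointDistributionModel:
    """ASM-style shape model: mean shape plus the top-k PCA modes of variation."""

    def __init__(self, train_shapes: np.ndarray, num_modes: int):
        # train_shapes: (N, L, 2), assumed to be pre-aligned in a common frame.
        flat = train_shapes.reshape(len(train_shapes), -1)
        self.mean = flat.mean(axis=0)
        # Right singular vectors of the centred data are the covariance eigenvectors.
        _, s, vt = np.linalg.svd(flat - self.mean, full_matrices=False)
        self.modes = vt[:num_modes]  # (k, 2L), orthonormal rows
        self.eigvals = s[:num_modes] ** 2 / max(len(flat) - 1, 1)

    def fit(self, shape: np.ndarray, limit: float = 3.0) -> np.ndarray:
        """Project an (L, 2) shape onto the model, clamping each b_i to +/- limit*sqrt(lambda_i)."""
        b = self.modes @ (shape.reshape(-1) - self.mean)
        bound = limit * np.sqrt(self.eigvals)
        b = np.clip(b, -bound, bound)
        return (self.mean + self.modes.T @ b).reshape(-1, 2)


def nme(pred: np.ndarray, gt: np.ndarray, norm_dist: float) -> float:
    """Mean point-to-point Euclidean error divided by a normalising distance."""
    return float(np.linalg.norm(pred - gt, axis=1).mean() / norm_dist)


def gaussian_heatmaps(points: np.ndarray, size: int, sigma: float = 1.5) -> np.ndarray:
    """Render one Gaussian blob per (x, y) point on a size x size grid (demo helper)."""
    ys, xs = np.mgrid[0:size, 0:size]
    return np.stack([np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
                     for x, y in points])


# Demo on synthetic data: a hypothetical 5-point "face" on a 32 x 32 heatmap grid.
rng = np.random.default_rng(0)
template = np.array([[10, 12], [22, 12], [16, 18], [12, 24], [20, 24]], dtype=float)
train = template + rng.normal(0.0, 1.0, size=(200, 5, 2))
pdm = PointDistributionModel(train, num_modes=4)

gt = template.copy()
heatmaps = gaussian_heatmaps(gt, size=32)
heatmaps[3] = gaussian_heatmaps(np.array([[28.0, 2.0]]), size=32)[0]  # simulate one bad peak
raw = decode_heatmaps(heatmaps)
fitted = pdm.fit(raw)
interocular = np.linalg.norm(gt[0] - gt[1])  # here: distance between the two 'eye' points
print(f"NME raw: {nme(raw, gt, interocular):.3f}, after shape fit: {nme(fitted, gt, interocular):.3f}")
```

In this toy demo one heatmap peaks at a wrong location; projecting the decoded points onto the shape model pulls that landmark back toward a plausible configuration, which illustrates the kind of shape constraint that heatmap regression alone does not enforce.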

Author Biographies

Ngo Thi Ngoc Quyen, Viettel Cyberspace Center

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam
Tran Duy Linh, Viettel Cyberspace Center

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam
Vu Hong Phuc, Viettel Cyberspace Center

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam
Nguyen Van Nam, Thuyloi University

Viettel Cyberspace Center (VTCC), Viettel Group, 7 Alley, TonThatThuyet Street, CauGiay district, Hanoi, Vietnam

References

Nam, N.V. and Quyen, N.T.N. (2023) Flash: Facial landmark detection using active shape model and heatmap regression. In The 9th EAI International Conference on Industrial Networks and Intelligent Systems.

He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’16 (IEEE): 770–778. doi:10.1109/CVPR.2016.90, URL http://ieeexplore.ieee.org/document/7780459.

Tan, M. and Le, Q.V. (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R. [eds.] Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA (PMLR), Proceedings of Machine Learning Research 97: 6105–6114. URL http://proceedings.mlr.press/v97/tan19a.html.

Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A. and Chen, L. (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018 (Computer Vision Foundation / IEEE Computer Society): 4510–4520. doi:10.1109/CVPR.2018.00474.

Ma, N., Zhang, X., Zheng, H.T. and Sun, J. (2018) ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Ferrari, V., Hebert, M., Sminchisescu, C. and Weiss, Y. [eds.] Computer Vision – ECCV 2018 (Cham: Springer International Publishing): 122–138.

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M. et al. (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 (OpenReview.net). URL https://openreview.net/forum?id=YicbFdNTTy.

Cootes, T., Baldock, E. and Graham, J. (2000) An introduction to active shape models. Image processing and analysis 328: 223–248.

Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active appearance models. In Burkhardt, H. and Neumann, B. [eds.] Computer Vision — ECCV’98 (Berlin, Heidelberg: Springer Berlin Heidelberg): 484–498.

Cristinacce, D. and Cootes, T. (2006) Feature detection and tracking with constrained local models. 41: 929–938. doi:10.5244/C.20.95.

Asthana, A., Zafeiriou, S., Cheng, S. and Pantic, M. (2013) Robust discriminative response map fitting with constrained local models. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13 (USA: IEEE Computer Society): 3444–3451. doi:10.1109/CVPR.2013.442, URL https://doi.org/10.1109/CVPR.2013.442.

Liu, Y., Jourabloo, A., Ren, W. and Liu, X. (2017) Dense face alignment. In Proceedings of the IEEE International Conference on Computer Vision Workshops: 1619–1628.

Dong, X., Yan, Y., Ouyang, W. and Yang, Y. (2018) Style aggregated network for facial landmark detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 379–388.

Zhao, Y., Liu, Y., Shen, C., Gao, Y. and Xiong, S. (2019) MobileFAN: Transferring deep hidden representation for face alignment. Pattern Recognition 100: 107114. doi:10.1016/j.patcog.2019.107114.

Xu, Z., Li, B., Yuan, Y. and Geng, M. (2021) AnchorFace: An anchor-based facial landmark detector across large poses. In Proceedings of the AAAI Conference on Artificial Intelligence, 35: 3092–3100.

Jolliffe, I.T. and Cadima, J. (2016) Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374(2065): 20150202.

Trigeorgis, G., Snape, P., Nicolaou, M.A., Antonakos, E. and Zafeiriou, S. (2016) Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 4177–4187. doi:10.1109/CVPR.2016.453, URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2016.453.

Lv, J., Shao, X., Xing, J., Cheng, C. and Zhou, X. (2017) A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 3691–3700. doi:10.1109/CVPR.2017.393.

Feng, Z., Kittler, J., Awais, M., Huber, P. and Wu, X. (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 2235–2245. doi:10.1109/CVPR.2018.00238, URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00238.

Wang, X., Bo, L. and Fuxin, L. (2019) Adaptive wing loss for robust face alignment via heatmap regression. In The IEEE International Conference on Computer Vision (ICCV).

Xiong, Y., Zhou, Z., Dou, Y. and Su, Z. (2021) Gaussian Vector: An Efficient Solution for Facial Landmark Detection, 70–87. doi:10.1007/978-3-030-69541-5_5.

Huang, Y., Yang, H., Li, C., Kim, J. and Wei, F. (2021) ADNet: Leveraging error-bias towards normal direction in face alignment. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV): 3060–3070.

Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y. and Zhou, Q. (2018) Look at boundary: A boundary-aware face alignment algorithm. In CVPR.

Fard, A.P., Abdollahi, H. and Mahoor, M.H. (2021) ASMNet: A lightweight deep neural network for face alignment and pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021, virtual, June 19-25, 2021 (Computer Vision Foundation / IEEE): 1521–1530. doi:10.1109/CVPRW53098.2021.00168.

Newell, A., Yang, K. and Deng, J. (2016) Stacked hourglass networks for human pose estimation. In Leibe, B., Matas, J., Sebe, N. and Welling, M. [eds.] Computer Vision – ECCV 2016 (Cham: Springer International Publishing): 483–499.

Le, V., Brandt, J., Lin, Z., Bourdev, L. and Huang, T.S. (2012) Interactive facial feature localization. In Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y. and Schmid, C. [eds.] Computer Vision – ECCV 2012 (Berlin, Heidelberg: Springer Berlin Heidelberg): 679–692.

Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J. and Kumar, N. (2013) Localizing parts of faces using a consensus of exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(12): 2930–2940. doi:10.1109/TPAMI.2013.23.

Köstinger, M., Wohlhart, P., Roth, P.M. and Bischof, H. (2011) Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops): 2144–2151. doi:10.1109/ICCVW.2011.6130513.

Fard, A.P., Abdollahi, H. and Mahoor, M. (2021) ASMNet: A lightweight deep neural network for face alignment and pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 1521–1530.

Dong, X., Yu, S.I., Weng, X., Wei, S.E., Yang, Y. and Sheikh, Y. (2018) Supervision-by-Registration: An unsupervised approach to improve the precision of facial landmark detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 360–368.

Zhu, S., Li, C., Loy, C.C. and Tang, X. (2015) Face alignment by coarse-to-fine shape searching. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 4998–5006. doi:10.1109/CVPR.2015.7299134.

Miao, X., Zhen, X., Liu, X., Deng, C., Athitsos, V. and Huang, H. (2018) Direct shape regression networks for end-to-end face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

Xiao, S., Feng, J., Liu, L., Nie, X., Wang, W., Yan, S. and Kassim, A. (2017) Recurrent 3D-2D dual learning for large-pose facial landmark detection. In 2017 IEEE International Conference on Computer Vision (ICCV): 1642–1651. doi:10.1109/ICCV.2017.181.

Honari, S., Yosinski, J., Vincent, P. and Pal, C. (2016) Recombinator networks: Learning coarse-to-fine feature aggregation. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on (IEEE).

Kumar, A. and Chellappa, R. (2018) Disentangling 3D pose in a dendritic CNN for unconstrained 2D face alignment. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 430–439. doi:10.1109/CVPR.2018.00052, URL https://doi.ieeecomputersociety.org/10.1109/CVPR.2018.00052.

Ding, H., Zhou, P. and Chellappa, R. (2020) Occlusion adaptive deep network for robust facial expression recognition. In 2020 IEEE International Joint Conference on Biometrics (IJCB) (IEEE Press): 1–9. doi:10.1109/IJCB48548.2020.9304923, URL https://doi.org/10.1109/IJCB48548.2020.9304923.

Cao, X., Wei, Y., Wen, F. and Sun, J. (2012) Face alignment by explicit shape regression. In 2012 IEEE Conference on Computer Vision and Pattern Recognition: 2887–2894. doi:10.1109/CVPR.2012.6248015.

Xiong, X. and De la Torre, F. (2013) Supervised descent method and its applications to face alignment. In 2013 IEEE Conference on Computer Vision and Pattern Recognition: 532–539. doi:10.1109/CVPR.2013.75.
