SHELF: Combination of Shape Fitting and Heatmap Regression for Landmark Detection in Human Face
DOI: https://doi.org/10.4108/eetinis.v10i3.3863
Keywords: facial landmarks, heatmap regression, shape fitting, coordinate regression
Abstract
Today, facial emotion recognition is widely adopted in many intelligent applications, including driver monitoring systems, smart customer care, and e-learning systems. Human emotions can be well represented by facial landmarks, which are hard to detect in images because of the large number of discrete landmarks and the variation in face shape and pose in the real world. Over the decades, many methods have been proposed for facial landmark detection, including shape fitting and coordinate regression approaches such as ASMNet and AnchorFace. However, their performance is still limited for real-time applications in terms of both accuracy and efficiency. In this paper, we propose a novel method called SHELF, the first to combine the shape fitting and heatmap regression approaches for landmark detection in human faces. The heatmap model aims to generate landmarks that fit common face shapes. The method has been evaluated on three datasets, 300W-Challenging, WFLW, and 300VW-E, comprising 31,557 images, and achieved normalized mean errors (NME) of 6.67%, 7.34%, and 12.55%, respectively, outperforming most existing methods. On the first two datasets, the method is also comparable to the state-of-the-art AnchorFace, which achieves NMEs of 6.19% and 4.62%, respectively.
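The abstract names two techniques without spelling them out, decoding landmark positions from heatmaps and constraining them with a statistical shape model, and it reports accuracy as NME. The Python/NumPy sketch below illustrates each idea generically. It is not the authors' SHELF pipeline: the argmax decoding, the PCA shape model with coefficients clamped to ±3√λ (the classic active shape model rule), and the function names `decode_heatmaps`, `fit_shape_model`, and `nme` are illustrative assumptions. The NME normalizing distance is left as a parameter because the abstract does not state which one is used; the inter-ocular distance is a common choice for 300W and WFLW.

```python
import numpy as np


def decode_heatmaps(heatmaps: np.ndarray) -> np.ndarray:
    """Return the (x, y) position of the peak in each landmark heatmap.

    heatmaps: array of shape (L, H, W), one channel per landmark.
    Returns an (L, 2) float array of pixel coordinates.
    """
    num_landmarks, _, width = heatmaps.shape
    flat_idx = heatmaps.reshape(num_landmarks, -1).argmax(axis=1)
    ys, xs = np.divmod(flat_idx, width)  # row index = y, column index = x
    return np.stack([xs, ys], axis=1).astype(float)


def fit_shape_model(landmarks, mean_shape, components, eigenvalues, k=3.0):
    """Constrain landmarks to a PCA shape model, active-shape-model style.

    landmarks:   (L, 2) points, assumed already in the model's reference frame
    mean_shape:  (2L,) flattened mean shape
    components:  (2L, M) matrix with orthonormal columns (principal modes)
    eigenvalues: (M,) variance captured by each mode
    Each shape coefficient is clamped to +/- k * sqrt(eigenvalue).
    """
    x = landmarks.reshape(-1)
    b = components.T @ (x - mean_shape)          # project onto the modes
    limit = k * np.sqrt(eigenvalues)
    b = np.clip(b, -limit, limit)                # keep the shape plausible
    return (mean_shape + components @ b).reshape(-1, 2)


def nme(pred, gt, norm_dist):
    """Mean Euclidean landmark error divided by a normalizing distance."""
    return float(np.linalg.norm(pred - gt, axis=1).mean() / norm_dist)


if __name__ == "__main__":
    # Two 8x8 heatmaps with peaks at (x=2, y=5) and (x=6, y=1).
    hm = np.zeros((2, 8, 8))
    hm[0, 5, 2] = 1.0
    hm[1, 1, 6] = 1.0
    print(decode_heatmaps(hm).tolist())  # [[2.0, 5.0], [6.0, 1.0]]

    # Toy one-mode model: the only mode moves landmark 0 along x.
    mean = np.array([2.0, 5.0, 6.0, 1.0])
    modes = np.array([[1.0], [0.0], [0.0], [0.0]])
    variances = np.array([1.0])
    noisy = np.array([[12.0, 5.0], [6.0, 3.0]])
    fitted = fit_shape_model(noisy, mean, modes, variances)
    print(fitted.tolist())  # [[5.0, 5.0], [6.0, 1.0]]
```

In the toy example the large x offset of landmark 0 is clamped to three standard deviations, and the y offset of landmark 1, which no mode can explain, is removed entirely. A full active shape model fit would first align the points to the model frame (translation, rotation, scale) before projecting; that step is omitted here for brevity.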
References
Nam, N.V. and Quyen, N.T.N. (2023) Flash: Facial landmark detection using active shape model and heatmap regression. In The 9th EAI International Conference on Industrial Networks and Intelligent Systems.
He, K., Zhang, X., Ren, S. and Sun, J. (2016) Deep Residual Learning for Image Recognition. In Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’16 (IEEE): 770–778. DOI: https://doi.org/10.1109/CVPR.2016.90
Tan, M. and Le, Q.V. (2019) EfficientNet: Rethinking model scaling for convolutional neural networks. In Chaudhuri, K. and Salakhutdinov, R. [eds.] Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA (PMLR), Proceedings of Machine Learning Research 97: 6105–6114. URL http://proceedings.mlr.press/v97/tan19a.html.
Sandler, M., Howard, A.G., Zhu, M., Zhmoginov, A. and Chen, L. (2018) MobileNetV2: Inverted residuals and linear bottlenecks. In 2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018 (Computer Vision Foundation / IEEE Computer Society): 4510–4520. DOI: https://doi.org/10.1109/CVPR.2018.00474
Ma, N., Zhang, X., Zheng, H.T. and Sun, J. (2018) ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Ferrari, V., Hebert, M., Sminchisescu, C. and Weiss, Y. [eds.] Computer Vision – ECCV 2018 (Cham: Springer International Publishing): 122–138. DOI: https://doi.org/10.1007/978-3-030-01264-9_8
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M. et al. (2021) An image is worth 16x16 words: Transformers for image recognition at scale. In 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021 (OpenReview.net). URL https://openreview.net/forum?id=YicbFdNTTy.
Cootes, T., Baldock, E. and Graham, J. (2000) An introduction to active shape models. Image processing and analysis 328: 223–248.
Cootes, T.F., Edwards, G.J. and Taylor, C.J. (1998) Active appearance models. In Burkhardt, H. and Neumann, B. [eds.] Computer Vision — ECCV’98 (Berlin, Heidelberg: Springer Berlin Heidelberg): 484–498. DOI: https://doi.org/10.1007/BFb0054760
Cristinacce, D. and Cootes, T. (2006) Feature detection and tracking with constrained local models. 41: 929–938. DOI: https://doi.org/10.5244/C.20.95
Asthana, A., Zafeiriou, S., Cheng, S. and Pantic, M. (2013) Robust discriminative response map fitting with constrained local models. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, CVPR ’13 (USA: IEEE Computer Society): 3444–3451. DOI: https://doi.org/10.1109/CVPR.2013.442
Liu, Y., Jourabloo, A., Ren, W. and Liu, X. (2017) Dense face alignment. In Proceedings of the IEEE International Conference on Computer Vision Workshops: 1619–1628. DOI: https://doi.org/10.1109/ICCVW.2017.190
Dong, X., Yan, Y., Ouyang, W. and Yang, Y. (2018) Style aggregated network for facial landmark detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition: 379–388. DOI: https://doi.org/10.1109/CVPR.2018.00047
Zhao, Y., Liu, Y., Shen, C., Gao, Y. and Xiong, S. (2019) MobileFAN: Transferring deep hidden representation for face alignment. Pattern Recognition 100: 107114. DOI: https://doi.org/10.1016/j.patcog.2019.107114
Xu, Z., Li, B., Yuan, Y. and Geng, M. (2021) AnchorFace: An anchor-based facial landmark detector across large poses. In Proceedings of the AAAI Conference on Artificial Intelligence, 35: 3092–3100. DOI: https://doi.org/10.1609/aaai.v35i4.16418
Jolliffe, I.T. and Cadima, J. (2016) Principal component analysis: a review and recent developments. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374(2065): 20150202. DOI: https://doi.org/10.1098/rsta.2015.0202
Trigeorgis, G., Snape, P., Nicolaou, M.A., Antonakos, E. and Zafeiriou, S. (2016) Mnemonic descent method: A recurrent process applied for end-to-end face alignment. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 4177–4187. DOI: https://doi.org/10.1109/CVPR.2016.453
Lv, J., Shao, X., Xing, J., Cheng, C. and Zhou, X. (2017) A deep regression architecture with two-stage re-initialization for high performance facial landmark detection. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 3691–3700. DOI: https://doi.org/10.1109/CVPR.2017.393
Feng, Z., Kittler, J., Awais, M., Huber, P. and Wu, X. (2018) Wing loss for robust facial landmark localisation with convolutional neural networks. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 2235–2245. DOI: https://doi.org/10.1109/CVPR.2018.00238
Wang, X., Bo, L. and Fuxin, L. (2019) Adaptive wing loss for robust face alignment via heatmap regression. In The IEEE International Conference on Computer Vision (ICCV). DOI: https://doi.org/10.1109/ICCV.2019.00707
Xiong, Y., Zhou, Z., Dou, Y. and Su, Z. (2021) Gaussian Vector: An Efficient Solution for Facial Landmark Detection, 70–87. DOI: https://doi.org/10.1007/978-3-030-69541-5_5
Huang, Y., Yang, H., Li, C., Kim, J. and Wei, F. (2021) ADNet: Leveraging error-bias towards normal direction in face alignment. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV): 3060–3070. DOI: https://doi.org/10.1109/ICCV48922.2021.00307
Wu, W., Qian, C., Yang, S., Wang, Q., Cai, Y. and Zhou, Q. (2018) Look at boundary: A boundary-aware face alignment algorithm. In CVPR. DOI: https://doi.org/10.1109/CVPR.2018.00227
Fard, A.P., Abdollahi, H. and Mahoor, M.H. (2021) ASMNet: A lightweight deep neural network for face alignment and pose estimation. In IEEE Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2021, virtual, June 19-25, 2021 (Computer Vision Foundation / IEEE): 1521–1530. DOI: https://doi.org/10.1109/CVPRW53098.2021.00168
Newell, A., Yang, K. and Deng, J. (2016) Stacked hourglass networks for human pose estimation. In Leibe, B., Matas, J., Sebe, N. and Welling, M. [eds.] Computer Vision – ECCV 2016 (Cham: Springer International Publishing): 483–499. DOI: https://doi.org/10.1007/978-3-319-46484-8_29
Le, V., Brandt, J., Lin, Z., Bourdev, L. and Huang, T.S. (2012) Interactive facial feature localization. In Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y. and Schmid, C. [eds.] Computer Vision – ECCV 2012 (Berlin, Heidelberg: Springer Berlin Heidelberg): 679–692. DOI: https://doi.org/10.1007/978-3-642-33712-3_49
Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J. and Kumar, N. (2013) Localizing parts of faces using a consensus of exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence 35(12): 2930–2940. DOI: https://doi.org/10.1109/TPAMI.2013.23
Köstinger, M., Wohlhart, P., Roth, P.M. and Bischof, H. (2011) Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops): 2144–2151. DOI: https://doi.org/10.1109/ICCVW.2011.6130513
Fard, A.P., Abdollahi, H. and Mahoor, M. (2021) ASMNet: A lightweight deep neural network for face alignment and pose estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition: 1521–1530. DOI: https://doi.org/10.1109/CVPRW53098.2021.00168
Dong, X., Yu, S.I., Weng, X., Wei, S.E., Yang, Y. and Sheikh, Y. (2018) Supervision-by-Registration: An unsupervised approach to improve the precision of facial landmark detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 360–368. DOI: https://doi.org/10.1109/CVPR.2018.00045
Zhu, S., Li, C., Loy, C.C. and Tang, X. (2015) Face alignment by coarse-to-fine shape searching. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR): 4998–5006. DOI: https://doi.org/10.1109/CVPR.2015.7299134
Miao, X., Zhen, X., Liu, X., Deng, C., Athitsos, V. and Huang, H. (2018) Direct shape regression networks for end-to-end face alignment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). DOI: https://doi.org/10.1109/CVPR.2018.00529
Xiao, S., Feng, J., Liu, L., Nie, X., Wang, W., Yan, S. and Kassim, A. (2017) Recurrent 3D-2D dual learning for large-pose facial landmark detection. In 2017 IEEE International Conference on Computer Vision (ICCV): 1642–1651. DOI: https://doi.org/10.1109/ICCV.2017.181
Honari, S., Yosinski, J., Vincent, P. and Pal, C. (2016) Recombinator networks: Learning coarse-to-fine feature aggregation. In Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on (IEEE). DOI: https://doi.org/10.1109/CVPR.2016.619
Kumar, A. and Chellappa, R. (2018) Disentangling 3D pose in a dendritic CNN for unconstrained 2D face alignment. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (Los Alamitos, CA, USA: IEEE Computer Society): 430–439. DOI: https://doi.org/10.1109/CVPR.2018.00052
Ding, H., Zhou, P. and Chellappa, R. (2020) Occlusion adaptive deep network for robust facial expression recognition. In 2020 IEEE International Joint Conference on Biometrics (IJCB) (IEEE Press): 1–9. DOI: https://doi.org/10.1109/IJCB48548.2020.9304923
Cao, X., Wei, Y., Wen, F. and Sun, J. (2012) Face alignment by explicit shape regression. In 2012 IEEE Conference on Computer Vision and Pattern Recognition: 2887–2894. DOI: https://doi.org/10.1109/CVPR.2012.6248015
Xiong, X. and De la Torre, F. (2013) Supervised descent method and its applications to face alignment. In 2013 IEEE Conference on Computer Vision and Pattern Recognition: 532–539. DOI: https://doi.org/10.1109/CVPR.2013.75
License
Copyright (c) 2023 EAI Endorsed Transactions on Industrial Networks and Intelligent Systems
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
This is an open-access article distributed under the terms of the Creative Commons Attribution CC BY 3.0 license, which permits unlimited use, distribution, and reproduction in any medium so long as the original work is properly cited.