A Systematic Literature Review on the Accuracy of Face Recognition Algorithms

Real-time facial recognition systems have been increasingly used, making it relevant to address the accuracy of these systems given the credibility and trust they must offer. Therefore, this article seeks to identify the algorithms currently used by facial recognition systems through a Systematic Literature Review that considers recent scientific articles, published between 2018 and 2021. From the initial collection of ninety-three articles, a subset of thirteen was selected after applying the inclusion and exclusion procedures. One of the outstanding results of this research corresponds to the use of algorithms based on Artificial Neural Networks (ANN) considered in 21% of the solutions, highlighting the use of Convolutional Neural Network (CNN). Another relevant result is the identification of the use of the Viola-Jones algorithm, present in 19% of the solutions. In addition, from this research, two specific facial recognition solutions associated with access control were found considering the principles of the Internet of Things, one being applied to access control to environments and the other applied to smart cities.


Introduction
Facial recognition is a prominent Artificial Intelligence (AI) solution [1] characterized by a multidisciplinarity of concepts, resources and technologies, covering studies on pattern recognition, image processing, computer vision and artificial neural networks (ANN) [2]. Its applications are present in different segments of industry and business, such as biometrics, access control systems, security and surveillance systems, among others [3].
For [4], facial recognition must identify patterns such as the shape of the face, nose and mouth, and the distance between the eyes, among others. For [5], the collection of these data is categorized as a non-intrusive method, as it is performed without the need for direct interaction with individuals: it requires only a focused camera capable of capturing their image and a system to perform facial recognition and subsequent identification.
* Corresponding author. Email: rogeriorossi8@gmail.com
According to [6], the facial recognition process considers three phases: 1) Face detection: responsible for identifying and locating a human face in the image; 2) Feature extraction: deals with the vectorization of the human face, extracting and converting the patterns of the individual's face into data; and 3) Facial recognition: when the recognition test takes place, that is, the comparison of the data obtained (phases 1 and 2) with the database for later decision-making on the identity of that human face. In summary, the recognition process is based on identifying, verifying and recognizing the human face, as represented in Figure 1.
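The three phases can be illustrated with a minimal sketch. Everything in it is hypothetical: `detect_face` and `extract_features` are placeholder stubs standing in for a real detector and feature extractor, the gallery and the distance threshold are invented for illustration, and Euclidean distance is only one possible way to compare feature vectors.

```python
import math

def detect_face(image):
    """Phase 1 (stub): pretend the whole image is the face region."""
    return image

def extract_features(face):
    """Phase 2 (stub): flatten the face region into a feature vector."""
    return [float(pixel) for row in face for pixel in row]

def recognize(features, gallery, threshold):
    """Phase 3: return the closest enrolled identity, or None if too far."""
    best_name, best_dist = None, math.inf
    for name, enrolled in gallery.items():
        dist = math.dist(features, enrolled)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

# Hypothetical database of enrolled feature vectors.
gallery = {
    "alice": [10.0, 20.0, 30.0, 40.0],
    "bob": [200.0, 180.0, 160.0, 140.0],
}
probe = [[12, 19], [31, 42]]  # tiny 2x2 "image"
features = extract_features(detect_face(probe))
identity = recognize(features, gallery, threshold=25.0)
```

The threshold is what lets the system answer "unknown" instead of always returning the nearest enrolled face.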
Among the characteristics of a facial recognition system, possibly the most relevant is the identification of a person and the guarantee that this person belongs to a database used as a repository of faces to carry out the verification. For this, facial recognition systems depend on models that offer and prove to have low error rates, which characterizes the accuracy of these systems. Accuracy is relevant to ensure the success of systems based on facial recognition technologies, because the closer to the 100% accuracy rate, the better is the result of the system.

Figure 1. Facial Recognition Process
Ref. [7] interpreted accuracy as the degree of proximity between the tested values and the reference or true value, while precision is related to the dispersion of the values around the mean. Thus, accuracy is a statistical quantity tied to the value of the parameter, that is, the true or reference value, whereas precision is intrinsically associated with the dispersion of the observed distributions.
However, in facial recognition, accuracy measures the percentage of registered faces that the system recognizes when the person is present in the database. Therefore, the measured accuracy should approach 100%, favoring the reliability of the system for its users, since accuracy summarizes the results that the facial recognition system delivers to them.
From this introductory context and by a Systematic Literature Review (SLR), the objective of this article is to present which are the current algorithms used to treat, in real time, the accuracy of facial recognition systems. To present the research results, this article is organized as follows: section two presents the methodological procedures for carrying out the SLR; section three presents the results from the data extracted according to the queries carried out; section four presents the analysis and discussion of the results according to the research questions defined for the SLR; and, finally, section five presents the conclusions.

Research Methodology
A Systematic Literature Review (SLR) is investigative research that aims to approach knowledge about a specific topic, starting with the identification of the state of the art, then understanding existing gaps and concluding with the creation of arguments useful to propose future research [8].
This type of research is generally comprehensive and should not be biased in its preparation, as it considers all the relevant articles on the central theme of the research; these may be present in books, periodicals, historical records, government reports, theses, dissertations, among others [9].
Ref. [8] proposed a protocol for carrying out an SLR divided into three phases: 1) Planning: dedicated to developing research questions based on the objective of the article; 2) Conduction: dedicated to data collection; and 3) Presentation of results: carried out according to the parameterization determined in the two previous phases. In this work, phases 1 and 2 are detailed in this section and phase 3 is detailed in section 3.

Planning Phase
The purpose of this SLR is to map scientific articles on the accuracy of facial recognition algorithms in real time and identify responses that can collaborate with the research objective presented for this article.
For this, two research questions were defined: Q1. What are the most used algorithms for real-time facial recognition? Q2. What are the accuracy rates of real-time facial recognition algorithms?
The criteria for inclusion and exclusion are fundamental steps within an SLR [8]. For the present SLR, these criteria include the period of publication of the articles, considering the articles published between 2018 and 2021 and that present information capable of answering the research questions. It was also defined that the data sources for obtaining the articles are the following scientific article data sources: Elsevier, Springer Link, World Scientific and IEEE Xplore Digital Library.
In summary, the inclusion criteria defined for this SLR are: 1) period of publication of articles between the years 2018 and 2021; 2) belong to one of the scientific article databases; and 3) present studies on the accuracy of a specific algorithm for facial recognition.
Likewise, the exclusion criteria are defined considering all the situations that exclude the inclusion criteria.

Conduction Phase
Solutions based on real-time facial recognition technology need accuracy in order not to compromise the operation and also ensure that legal norms and rules are obeyed. It is important to highlight that such solutions are subject to the same dependencies as human eyes. Some dependencies that can be considered are: viewing angle, distance from the target, sensitivity to lighting and, the dependency on a database, which will give an image with close similarity to the one stored internally.
In order not to limit the search and to obtain the greatest possible breadth, the search for articles was performed using the following strings: "face recognition in real time" AND "accuracy". An adjustment was also made for the period, considering exactly the period between January
The final result is given after applying the inclusion criteria, which allowed identifying a subset of thirteen articles, highlighted as results of this SLR.

Research Results
Based on the observations of the criteria defined for including and excluding articles, a subset of thirteen articles was identified, which are presented in Table 2. These are subject to a thorough analysis, and a summary of each of them is presented further on. The analysis and discussion of the results based on what is seen in Table 2 and the summaries in this section are highlighted in section 4.
Among the 13 articles identified in the Conduction Phase, the different real-time facial recognition techniques presented by each, which result in different levels of accuracy, are addressed. These techniques and other relevant points of each article are highlighted in their abstracts, as shown below.
Ref. [10] (item 1 of Table 2) used the HAAR Cascade algorithm for face recognition, and the Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) statistical data classifiers to build the methods of facial recognition. From the experiment, they performed comparative tests, and concluded that both have the same level of accuracy of 98%, which is the best result presented for the classification and recognition of faces.
Ref. [11] (item 2 of Table 2) developed a system whose objective is to control and to automate the process of checking students in classrooms. The image repository for the system is created separately and upon registration. The authors used the HAAR Cascade algorithm to train face detection in real time, using videos of classrooms where students were present. The best test result was 50% accuracy.
Ref. [12] (item 3 of Table 2) proposed the use of computer vision applied to surveillance systems that consider the Internet of Things (IoT) for real-time facial recognition. The experiment was implemented on Raspberry Pi 2 or 3 hardware, using the Viola-Jones algorithm for training with the Histogram of Oriented Gradients (HOG) feature, with the objective of extracting facial features by deep learning. The result of the experiment, which is based on deep learning, showed better accuracy in facial identification and subsequent recognition of people through a video stream.
Ref. [13] (item 4 of Table 2) carried out studies and experiments to meet a need that arose during the COVID-19 pandemic, in the period between 2020 and 2021: controlling student access to virtual classrooms, typical of Distance Learning systems. The authors used detection and recognition techniques to control and authenticate the presence of these students. For this, they proposed using the AdaBoost and Face Counter algorithms together with a Convolutional Neural Network (CNN) for face detection. The method used was Three-Level Wavelet Decomposition Principal Component Analysis with Mahalanobis Distance (3WPCA-MD), which showed better response time compared to PCA. The accuracy of the experiment to find the facsimile images in the database was 98%.
Ref. [14] (item 5 of Table 2) proposed a facial recognition system using a Deep Neural Network (DNN) face detector with accuracy superior to current standards. Three classifiers were applied, and for Support Vector Machine (SVM) and Multilayer Perceptron Networks (MLP), the PCA and LDA classification algorithms were associated; in the case of the third classifier, CNN, the images were fed directly. The test results reached 87%, 86.5% and 98% accuracy in their respective self-generated databases. An important highlight is presented regarding external factors, such as lighting and camera quality, since these interfere with the accuracy of the final result. In this case, the CNN classifier continues to present better accuracy results, being 89% versus 56% for the other classifiers.
Ref. [15] (item 6 of Table 2) innovated with the proposal to add pedestrian movement recognition to the entire facial recognition process, in order to improve the performance and accuracy of the system. With the application of the dual approach to identify the individual, there was a reduction in the classification of false positives, and the accuracy of the solution reached 77.4% compared to 67.7% in solutions that use only one approach.
Ref. [16] (item 7 of Table 2), motivated by the COVID-19 pandemic that started in 2020, proposed methods capable of improving the accuracy in the process of recognizing a person's identity and detecting the use of a face mask in humans. This is a useful solution to check the use of masks in public places, applicable and necessary in schools, hospitals and companies. The study used HAAR Cascade and MobileNet to perform face detection, adding the cosine distance method.
Subsequently, in the facial recognition process, the Visual Geometry Group
Ref. [17] (item 8 of Table 2) presented a proposal to use the Local Binary Pattern Histogram (LBPH) algorithm for real-time facial recognition in conditions in which the resolution of the image captured by the camera is low; according to the study, a resolution of 35px is considered low. The authors created their own database, named LR500, to train and to classify faces in different positions and situations. The results from the tests were 94% for 45px resolution and 90% for 35px resolution.
Ref. [18] (item 9 of Table 2) highlighted that the objective of their research is to study the importance of response time for the operational success of the system and also consider this factor as a major challenge for a real-time face recognition system. Thus, the research was based on the simplest process to encode facial recognition, following 3 steps: a) face detection, b) feature selection, and c) facial recognition. Their results and conclusions highlight several methodologies for each step, and the CNN algorithm corresponds to the state of the art for face detection. For feature selection, the challenge is to separate and remove redundant features; the tools currently useful are Redundancy-constrained Features Selection (RCFS), minimum Redundancy Maximum Relevance (mRMR) and Global Redundancy Minimization (GRM). To conclude, in the case of facial recognition, the LBPH method proved to be better adapted, responding better to the system variables, such as lighting, distance from the object and age of the person.
Ref. [19] (item 10 of Table 2) developed a solution capable of improving the accuracy of facial recognition identification of very small images, typical of intelligent security control systems. To do so, the researchers used a CNN as a basis and expanded the concept to a new architecture, which is now known as the Deep Convolutional Neural Network (DEEP-CNN) model. This new model has more than 30 convolution layers as its main feature. The results are considered satisfactory by the researchers, as they surpass any other existing method based on the accuracy of identification.
Ref. [20] (item 11 of Table 2) studied face recognition in video streams, typical of security and access control systems. For that, a system was developed based on the Viola-Jones algorithm, used to detect people in a sequence of video images and local binary templates using the Python programming language to classify the detected people, and an Open-Source Computer Vision library (OpenCV). The testing of the developed system showed results in approximately 93% of recognizable people when processing video stream from a webcam in real time.
Ref. [21] (item 12 of Table 2) described a feature and classification scheme to recognize the face in low quality videos. The technique consists in using the Viola-Jones algorithm to detect the faces in the video frames. Once the face part is detected from the input frame, the second phase starts, which corresponds to the resolution of the face part detected using the super face resolution method based on interpolation. In the third phase, face recognition is performed. The features representing the facial parts are extracted using the Local Directional Pattern (LDP) also called Wavelet-based Local Directional Pattern (SW-LDP) by using scattering and wavelet transform, since the LDP does not consider the directional strength of the edge pixels. Finally, the proposed Fractional Krill-Lion (Fractional-KL) algorithm, based on Actor Critic Neural Network (ACNN) or Krill-Lion Actor Critic Neural Network (KL-ACNN), was applied and tested on a standard FAMED database. The results were satisfactory, and the best accuracy result corresponded to 95%.
Ref. [22] (item 13 of Table 2) highlighted that the technologies used in a smart city improve the quality of life, but pointed out that there may be an increase in information security risks resulting from the great demand caused, for example, by access control systems used by different services in the city. They also considered that Internet of Things (IoT) oriented systems are the basis for the success of smart cities cases, directing special attention to the authentication system, in addition to suggesting the use of the NIST 800-63 standard that can be used in surveillance cameras, both for detection and for real-time facial recognition.
The articles selected and presented in Table 2 were chosen because they meet the minimum requirements for inclusion. The next step of the SLR is the analysis and discussion of the results, based on the readings and summaries presented in this section, to highlight the information of interest according to the proposed research questions; this analysis is presented in section 4.

Analysis and Discussion
After completing the selection of articles, the stage of analysis and discussion of the results is carried out, followed by the understanding from these analyses that favor the argumentation and proposition of future considerations.
To deal with the analysis of the results and to present the discussions that allow a better understanding of the research objective, this section initially presents some common characteristics of the selected articles and then highlights the analysis of the results according to the research questions that were defined for the SLR.

Selected articles and their common characteristics
The analysis of the selected articles allows verifying the similarities between the proposals for algorithmic solutions for real-time facial recognition, which result in different levels of accuracy. A summary of the relevant characteristics of the identified solutions is presented in Table 3. The characteristics are categorized as follows: C1: the solution presents the result of accuracy; C2: the solution uses neural networks in its construction; and C3: the solution is directly associated with the Internet of Things (IoT).
When analysing the characteristics and particularities of each solution, relevant evidence regarding the solutions is that they use a composition of different technologies. This situation has multiple purposes; however, the most prominent is the search for a solution with the best accuracy. Therefore, the characteristics categorized and presented in Table 3, according to their relevance for each article, consider the following denominations: (LR) Low Relevance for the article; (MR) Medium Relevance to the article; and (HR) High Relevance to the article. Table 3 summarizes the articles identified for this SLR and considers them according to their common characteristics, as well as associating with each of them an indication of relevance regarding the characteristic identified for each article.
Subsequently, the questions defined for the SLR, the answers and analyses for each are presented.

Analysis according to the research questions
This SLR initially identified two questions for the research that were highlighted in section 2. The two questions are highlighted below, as well as the answers identified for them according to the sample of articles collected and the analyses carried out.
The first question addresses the most used algorithms for real-time facial recognition, as highlighted below, followed by its considerations.

Q1. What are the most used algorithms for real-time facial recognition?
To answer this first research question, it is relevant to remember that the facial recognition process is based on three specific phases: 1) Face detection, 2) Feature extraction, and 3) Facial recognition [6]. This statement offers two possibilities of analysis: the first highlights that it is possible to have more than one algorithm that meets and responds to the need for the research question; and the second is that there is a composition of technologies for the solution that results in better accuracy.
Analysing the selected articles, specifically those using ANN, there are two possible analyses. The first is that the authors are concerned with separating the technologies applied to each of the stages of the facial recognition process. The second is that there is a wide variety of solutions that are heterogeneously composed. As a result, the authors had to carry out tests and find out where, among the different stages of the facial recognition process, the algorithm best adapts, and then analyse and achieve better levels of accuracy for the entire solution.
The second most used algorithm is Viola-Jones. This algorithm has a high detection rate and high performance in image processing, which benefits real-time facial recognition [23]. It was also applied together with the ANN algorithms to perform the face detection step [24]. Figure 2 presents the most frequently identified algorithms according to the solutions presented in the set of thirteen articles.
The second research question developed for this SLR addresses the accuracy rates identified in algorithms for real-time facial recognition, which is highlighted below, followed by its considerations.

Figure 2. Most frequently identified algorithms

Q2. What are the accuracy rates of real-time facial recognition algorithms?
Accuracy is a percentage factor, and represents how much the system can correctly respond to the face samples presented to it. In short, the closer to the value of 100%, the accuracy is considered better, since it recognizes the entire sample according to what is identified in the database.
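As a minimal sketch of this definition, accuracy can be computed as the share of face samples for which the identity returned by the system matches the true identity. The function name and the sample labels below are hypothetical and serve only as an illustration; they are not taken from any of the reviewed articles.

```python
def accuracy_percent(predicted, actual):
    """Percentage of samples whose predicted identity equals the true one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

# Hypothetical test run: five face samples, one misidentified.
actual = ["ana", "ben", "carla", "ben", "ana"]
predicted = ["ana", "ben", "ben", "ben", "ana"]
rate = accuracy_percent(predicted, actual)
```

Because the value depends entirely on which samples are presented, rates measured on different datasets are not directly comparable.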
According to the set of selected articles, the best accuracy corresponds to 98% and was found in three of the articles. Worth highlighting is [10], which presents a solution using HAAR Cascade to classify and detect faces, then applying the Eigenface and Fisherface algorithms to control access and attendance of students in classrooms. The lowest accuracy found is 50%, a result of the experiment of [11], who use the HAAR Cascade algorithms for classification/detection, extraction and facial recognition to identify students in the classroom to confirm attendance.
It is important to highlight that the 98% accuracy is also presented by [10], [13] and [14], who built their solutions using HAAR Cascade for detection and ANN for carrying out the complementary steps. It should be noted that the 3 articles that present an accuracy of 98% are from different authors who consider different scenarios and datasets. Ref. [10] uses the Olivetti dataset, [13] does not present the dataset used, and [14] built their own dataset with 11 people in different directions of gaze, containing a total of 234 images of 244x244. Considering the accuracy according to a subset of selected articles, the average among the values was 93%, as shown in Figure 3.

Real time facial recognition algorithms
According to the articles selected, for real-time facial recognition, there are a total of twelve different algorithms used in different solutions. Table 4 presents the algorithms used in each of the three stages of the facial recognition process.
It is possible to verify that there is a greater presence of the ANN and Viola-Jones algorithms, as they represent approximately 40% of the total (Figure 2). The tool most used by the authors is the open-source library OpenCV, which has useful characteristics for carrying out the experiments, in addition to including some of these algorithms (ANN, Viola-Jones, Eigenface, Fisherface and LBPH), which facilitates and streamlines the development of most solutions. It is not possible to conclude exactly which method is most used, especially across the different phases of real-time facial recognition.

Artificial Neural Network (ANN)
ANNs correspond to computational techniques that present a mathematical model inspired by the neural structure of intelligent organisms [25] being comparable to the human brain and with the ability to learn from training, store knowledge and later reproduce and replicate such knowledge [26].
The first article on ANN was presented in 1943 by McCulloch and Pitts [27] and, since then, several models have been developed to meet different purposes. There are several ANN architectures, including the Convolutional Neural Network (CNN), the Multilayer Perceptron Network (MLP) and the Recurrent Neural Network (RNN) [28].
Regarding CNNs, they are recognized for being artificial neural networks that have excellent performance to classify images, group them by similarity and perform object recognition within scenes. These are algorithms that can identify faces, individuals, and many other aspects of visual data such as street signs, fruits, and animals. In short, CNN is a deep learning approach with many hierarchical layers trained, which try to represent the structure in relation to the recognition of an image [29].
Hence, a CNN consists of a set of features extracted from the input image, which are layered using convolutions and subsampling, and in the end it infers which class the input image belongs to.
According to [30], a CNN has at least three layers: Convolutional Layer: responsible for filtering and extracting features starting from small portions of the input data, and then passing them on to the next layer in the form of feature maps; Pooling Layer: layer dedicated to reducing the dimensions of the data received from the Convolutional layer, applying grouping layers; Fully Connected Layer: present at the end of the network, performing its classification. It uses the features extracted from convolutions performed previously to perform the classification output of the network.
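The three layer types can be sketched in plain Python as follows. This is an illustrative toy, not a trained network: the input image, the edge-detecting kernel, the fully connected weights and the two class meanings are all assumptions chosen by hand, whereas a real CNN learns its kernels and weights from data and usually adds non-linear activations between layers.

```python
def convolve2d(image, kernel):
    """Convolutional layer: slide the kernel over the image (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Pooling layer: keep the maximum of each non-overlapping block."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

def fully_connected(inputs, weights, biases):
    """Fully connected layer: one weighted sum per output class."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Toy 5x5 image containing a bright vertical stripe.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]  # hand-picked kernel that responds to vertical edges

feature_map = convolve2d(image, kernel)   # 4x4 feature map
pooled = max_pool(feature_map)            # reduced to 2x2
flat = [v for row in pooled for v in row]

# Hypothetical weights: class 0 = "edge present", class 1 = "no edge".
weights = [[1, 1, 1, 1], [-1, -1, -1, -1]]
biases = [0.0, 1.0]
scores = fully_connected(flat, weights, biases)
predicted_class = scores.index(max(scores))
```

The pooling step halves each spatial dimension here, which is the dimensionality reduction role described for the Pooling Layer.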
Currently, CNN has shown good levels of accuracy for applications such as face and image detection, video recognition and voice recognition; thus, becoming a relevant tool in the field of machine learning for these types of applications [30].

Viola-Jones
The Viola-Jones algorithm was developed by researchers Paul Viola and Michael Jones in 2001 [31]. It is a low computational cost algorithm and is characterized by high performance in facial recognition in real time [32]. Its operation is based on the so-called HAAR filters, which represent the image in a feature space called HAAR features, to extract representative features of the face or a variety of objects [33]. Also known as HAAR Cascade, the Viola-Jones algorithm is implemented in the OpenCV library [34] and in OpenBR [35].
The facial recognition performed using the algorithm proposed by Paul Viola and Michael Jones is divided into three steps. In the first step, the image is converted into an intermediate representation, the integral image, over which the HAAR filters are evaluated. The second step is training using the Boosting classification method, which, in the case of the Viola-Jones algorithm, uses AdaBoost, a high-precision training classifier, to later obtain the most relevant characteristics of the integral image. Finally, the third step is the creation of the HAAR Cascade, which is a tree structure also known as cascade classifiers [33]. The Viola-Jones algorithm is also referred to as HAAR Cascade in OpenCV, because it uses the resources of HAAR features, which are masks that change the luminosity to characterize an object [36]. The masks capture variations in different directions and amplitudes to be trained using the AdaBoost algorithm to generate classifiers, one for each HAAR feature.
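The integral image mentioned in these steps can be sketched as a summed-area table: each entry stores the sum of all pixels above and to the left, so the sum of any rectangle is obtained from four lookups. The function names and the 3x3 sample image below are illustrative assumptions, and the extra row and column of zeros is a common convenience rather than a requirement.

```python
def integral_image(image):
    """Return an (h+1) x (w+1) table; ii[r][c] is the sum of image[0:r][0:c]."""
    h, w = len(image), len(image[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for r in range(h):
        row_sum = 0
        for c in range(w):
            row_sum += image[r][c]
            ii[r + 1][c + 1] = ii[r][c + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of the pixels in a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
ii = integral_image(image)
total = rect_sum(ii, 0, 0, 3, 3)          # whole image
lower_right = rect_sum(ii, 1, 1, 2, 2)    # pixels 5, 6, 8, 9
```

The cost of `rect_sum` does not depend on the rectangle size, which is what makes evaluating many rectangle features per window affordable.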
In their work Viola and Jones [31] present the three basic types of masks, as follows: Characteristics of two rectangles: the numerical value is the difference between the sums of pixels contained in both rectangles. The regions have the same area and are adjacent; Characteristics with three rectangles: the numerical value is the calculation of the difference between the outer and inner rectangles multiplied by a weight to compensate for the difference in areas; Characteristics of four rectangles: the numerical value is the calculation of the difference between diagonal pairs of rectangles.
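The three mask types can be sketched directly from these descriptions. For readability the sketch sums pixels with plain loops rather than an integral image; the function names, the window layout (rectangles placed side by side) and the weight of 2 on the inner rectangle are assumptions made for illustration.

```python
def region_sum(image, top, left, height, width):
    """Sum of the pixels inside one rectangle."""
    return sum(image[r][c]
               for r in range(top, top + height)
               for c in range(left, left + width))

def two_rect_feature(image, top, left, height, width):
    """Difference between two adjacent rectangles of equal area."""
    left_sum = region_sum(image, top, left, height, width)
    right_sum = region_sum(image, top, left + width, height, width)
    return left_sum - right_sum

def three_rect_feature(image, top, left, height, width):
    """Outer rectangles minus the inner one, weighted to balance the areas."""
    outer = (region_sum(image, top, left, height, width)
             + region_sum(image, top, left + 2 * width, height, width))
    inner = region_sum(image, top, left + width, height, width)
    return outer - 2 * inner  # assumed weight: outer area is twice the inner area

def four_rect_feature(image, top, left, height, width):
    """Difference between the two diagonal pairs of rectangles."""
    tl = region_sum(image, top, left, height, width)
    tr = region_sum(image, top, left + width, height, width)
    bl = region_sum(image, top + height, left, height, width)
    br = region_sum(image, top + height, left + width, height, width)
    return (tl + br) - (tr + bl)

# Toy patch: bright, dark, bright vertical bands.
patch = [
    [9, 9, 1, 1, 9, 9],
    [9, 9, 1, 1, 9, 9],
]
edge = two_rect_feature(patch, 0, 0, 2, 2)
line = three_rect_feature(patch, 0, 0, 2, 2)
checker = four_rect_feature([[5, 1], [1, 5]], 0, 0, 1, 1)
```

Large absolute values indicate that the patch contains the contrast pattern encoded by the mask; values near zero indicate a uniform region.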
In summary, Viola-Jones, or Haar Cascade, is an algorithm for object detection that uses several classifiers.
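The idea of using several classifiers in sequence can be sketched as a chain of stages in which a window is rejected as soon as one stage fails. This is a simplified illustration under stated assumptions: each stage here is a single weighted sum with a threshold, and all weights, thresholds and feature values are invented, whereas the stages of a trained cascade are produced by boosting.

```python
def run_cascade(window_features, stages):
    """Return True only if every stage accepts the window."""
    for weights, threshold in stages:
        score = sum(w * f for w, f in zip(weights, window_features))
        if score < threshold:
            return False  # rejected early; later stages are never evaluated
    return True

# Hypothetical stages: (weights over three feature values, acceptance threshold).
stages = [
    ([1.0, 0.0, 0.0], 0.5),
    ([0.5, 0.5, 0.0], 0.6),
    ([0.2, 0.3, 0.5], 0.7),
]
face_like = [0.9, 0.8, 0.7]
background = [0.1, 0.9, 0.9]

is_face = run_cascade(face_like, stages)
is_background_face = run_cascade(background, stages)
```

Since most windows in an image are background, rejecting them in the first inexpensive stages is what keeps the overall computational cost low.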

Eigenface
Eigenface is the set of eigenvectors of a covariance matrix of a set of faces. According to [37], it is a method that seeks to delimit a set of characteristics independent of the geometric shapes of the face, such as the eyes, mouth, nose and ear and, for that, it uses the information of representation of the face.
Eigenface uses the PCA algorithm for dimensionality reduction, which is very convenient for reducing the magnitude of the data and, consequently, optimizing the number of images in the dataset. Also known as Karhunen-Loeve methods, PCA, when applied to facial recognition, is called Eigenface [38].
The PCA, from statistical analysis based on the existing redundancy and variance in the data, reduces the dimensionality of the data while preserving most of the information, so that the result is not significantly affected. According to [39], the Eigenface algorithm is an appearance-based method, as it does not require prior knowledge about what will be recognized; a special detail is that the algorithm searches for the main components at the time of recognition: the eigenvectors that describe a person's face.
EAI Endorsed Transactions on Internet of Things 05 2022 - 09 2022 | Volume 8 | Issue 30 | e5
However, it is important to highlight that the Eigenface algorithm is sensitive to lighting conditions and also to some types of noise, which compromises its efficiency and worsens the accuracy of the system [40].
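The Eigenface idea described in this section can be sketched on a toy dataset: subtract the mean face, build the covariance matrix, take its leading eigenvector as the first eigenface and project each face onto it. The sketch makes several assumptions: the three-pixel "faces" are invented, only one component is kept, and power iteration is used here merely as a simple way to obtain the leading eigenvector, not as the method of the reviewed articles.

```python
import math

def mean_vector(vectors):
    """Average face: the per-pixel mean over all faces."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def covariance_matrix(centered):
    """Covariance matrix of mean-centered face vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(x[i] * x[j] for x in centered) / n for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, iterations=100):
    """Leading eigenvector by power iteration (assumes the uniform start
    vector is not orthogonal to it)."""
    d = len(matrix)
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec

# Hypothetical faces flattened to three pixels; the third pixel never varies.
faces = [
    [1.0, 1.0, 5.0],
    [2.0, 2.0, 5.0],
    [3.0, 3.0, 5.0],
    [4.0, 4.0, 5.0],
]
mean_face = mean_vector(faces)
centered = [[x - m for x, m in zip(face, mean_face)] for face in faces]
eigenface = top_eigenvector(covariance_matrix(centered))

# Each face is now described by one number instead of three pixels.
weights = [sum(c * e for c, e in zip(row, eigenface)) for row in centered]
```

Recognition would then compare the weights of a new face with the stored weights, which is where the sensitivity to lighting noted above becomes relevant, since lighting changes shift the pixel values that feed the projection.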

Fisherface
The Fisherface algorithm is considered an evolution of the Eigenface algorithm [41]. Fisherface uses the LDA algorithm [42], which is an alternative to the use of PCA. The LDA algorithm is also known as Fisher Linear Discriminant Analysis (FLDA) [41], which is based on linear combinations of variables based on weight factors that determine which group the object belongs to [42]. Ref. [43] also highlighted that the Fisherface algorithm is a useful statistical method for reducing dimensionality, preserving the information as much as possible.
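The weight factors of a linear discriminant can be sketched for the simplest case of two classes and two features, where the Fisher direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The sample points, the function names and the restriction to two dimensions are assumptions made to keep the example short; Fisherface itself works on many classes and high-dimensional face vectors.

```python
def mean2(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter2(points, mean):
    """2x2 scatter matrix of the points around their class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Weight vector w = Sw^-1 (mean_a - mean_b) for two classes."""
    ma, mb = mean2(class_a), mean2(class_b)
    sa, sb = scatter2(class_a, ma), scatter2(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(point, w):
    """Reduce a 2-D point to a single discriminant score."""
    return point[0] * w[0] + point[1] * w[1]

# Hypothetical feature pairs for two people.
person_a = [[1, 1], [1, 3], [3, 1], [3, 3]]
person_b = [[6, 1], [6, 3], [8, 1], [8, 3]]
w = fisher_direction(person_a, person_b)
scores_a = [project(p, w) for p in person_a]
scores_b = [project(p, w) for p in person_b]
```

After projection, every score of the first person lies above every score of the second, so a single threshold on one number separates the two groups.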

LBPH
The Local Binary Pattern (LBP), initially presented by [44] and later by [45], is a descriptor based on texture features extracted from regions of the face, which uses a binary pattern. When the LBP pattern is integrated into the HOG classifier, it is called LBPH, considering the composition of the LBP algorithm with the histogram.
LBPH is based on the local binary operator, which assigns a label to each pixel of each image by comparing it with its neighbors: if the neighboring pixel value is greater than the central pixel, a value of '1' is returned, but if the neighboring pixel is smaller, the value '0' is returned. At the end of all the comparisons, there is a binary number, which, converted to decimal, is used to form a histogram [46].
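The operator can be sketched for a 3x3 neighborhood as follows. The clockwise reading order starting at the top-left neighbor and the treatment of equal values as '0' are assumptions, since the description above does not fix them; the sample image is invented.

```python
def lbp_code(image, r, c):
    """Label of pixel (r, c): one bit per neighbor, read clockwise."""
    center = image[r][c]
    # Assumed order: clockwise, starting at the top-left neighbor.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    bits = ""
    for dr, dc in offsets:
        # '1' if the neighbor is greater than the center, otherwise '0'.
        bits += "1" if image[r + dr][c + dc] > center else "0"
    return int(bits, 2)  # binary number converted to decimal

def lbp_histogram(image):
    """Count how often each of the 256 possible labels occurs."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

image = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 8, 3],
]
code = lbp_code(image, 1, 1)      # bits 01010100
histogram = lbp_histogram(image)
```

In practice the face is usually divided into regions and one histogram is computed per region, so that the descriptor keeps some information about where each texture pattern occurs.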

OpenCV
The open-source library OpenCV stands out in interfacing and working with facial recognition algorithms: Eigenface, Fisherface and LBPH [47], found in various applications.
OpenCV was created and developed by Intel Corporation for computer vision applications. The highlight of this library is its efficiency in real-time computer vision applications. It is compatible with the Windows, macOS and Linux operating systems, is also available for the iOS and Android platforms, and has interfaces for the Python, C++ and Java languages [47]. Its more than 500 functions are divided into five groups: image processing, structural analysis, motion analysis and object tracking, pattern recognition, camera calibration and 3D reconstruction.

Conclusion
Facial recognition, as described in this article, is divided into three stages, starting with detection, followed by feature extraction, and ending with facial recognition itself. A greater presence of Artificial Neural Networks (ANN) is identified for the facial recognition stage, with greater emphasis on the Convolutional Neural Network (CNN). However, the other steps are marked by well-defined algorithms, and for the detection phase, Viola-Jones is the most used. For the feature extraction phase, the Eigenface and Fisherface algorithms are very present; they are based on PCA and LDA, respectively. The use of composite solutions aims to improve the response time of the entire facial recognition system, achieve operational simplification and reduce the False Rejection Rate (FRR) and False Acceptance Rate (FAR).
Finally, the accuracy found in the solutions identified in the articles selected from the research had an expressive value, with a maximum of 98%, in three different cases, and 50% as the lowest accuracy value. A mean accuracy result of 93% was identified. However, it is not possible to determine which the best solution is, since these accuracies were obtained from different experimental models, that is, without standardization, which makes it difficult to determine a relationship between them.