A survey on gait recognition in IoT applications

In IoT applications, identity recognition is a basic and critical requirement. In recent years, IoT technology has developed rapidly, and IoT devices such as wearable devices, environmental sensors, and WiFi devices have become widespread. IoT applications need identity recognition methods that are unobtrusive, low-cost, and continuous. Gait recognition not only performs well but is also difficult to forge or hide, so it has excellent potential for intelligent identity recognition in IoT. This review discusses the implementation of gait analysis, representative datasets, and algorithms. Finally, we discuss the challenges of gait analysis in IoT applications.


Introduction
Pervasive Computing was proposed [1] as early as 1991, emphasizing the integration of the physical and digital worlds. In Pervasive Computing, one of the most basic needs is identity recognition. However, limited perception ability kept this technology from being widely applied and developed. In recent years, with the rapid development of IoT (Internet of Things) technology based on comprehensive perception [2,3], identity recognition has obtained new development opportunities. With greatly improved perception ability, we can expect unobtrusive, cost-effective, and continuous sensing that meets the specific requirements of IoT applications.
IoT emphasizes the interconnection of all things and pursues perception-based computing, communication, and control to integrate the physical and digital worlds [4]. In recent years, the rapid spread of IoT devices such as cameras, smartphones, wearables, and WiFi devices has laid the foundation for privacy-safe, accurate, and unobtrusive identity recognition methods.
The IoT has changed people's lives greatly. A typical example of an IoT application is smart spaces [5], which integrate perception technology into the environment. In a specific application scenario, identity recognition is a many-to-many mapping task, which requires an authentication scheme with high complexity and high robustness [9]. However, the present challenge is that gait recognition is affected by many factors, such as different walking speeds [25][26][27], different clothes [30], different sensor viewpoints [28,29,31], and others [18,32], which make the recognition process more difficult and greatly affect the recognition results.
Although many recognition methods have achieved good user recognition accuracy, they still have deficiencies in specific IoT applications [10]. Passwords and PINs can be obtained through various attacks, for example shoulder surfing and recording attacks [11][12][13], thermal attacks [14,15], and smudge attacks [16,17]. They are also hard to manage, since people can only remember a few passwords. A survey [18] shows that 49% of users save their passwords somewhere, and 67% of users never change their passwords. Face recognition can be bypassed with photos of the face. Fingerprint recognition can be defeated with fake fingerprints made by conductive printing or silicon-like glue [19], or with deep learning methods [20]. Voice-based identification is also challenged by many malicious spoofing attacks [21,22].
Existing methods are obtrusive because they require users to cooperate actively. For example, in facial identification the user needs to walk to a specific location and look at the camera, which is unnatural. Recognition should be less obtrusive and should complete without the user's extra attention. Image-based and voice-based methods cannot protect users' privacy, and the user's information may be leaked during transmission. Besides, recognition based on images and video is costly and needs a lot of computing resources.
Gait recognition is an emerging technology that uses people's walking patterns for recognition. It is well suited to identity recognition in IoT applications. Firstly, the most prominent feature of gait recognition is that it can achieve unobtrusive measurement [23], because it uses sensors embedded in the ambient environment, which minimizes disturbance during recognition and gives users the best experience [24]. People can be recognized without cooperative action, or even without noticing. Secondly, gait recognition works at long distance: even when the user is far away and the resolution of the target is low, gait still gives good accuracy. Thirdly, because gait recognition is based on dynamic rather than static biological information, it is more difficult to attack than other methods [11][12][13][14][15][16][17], and gait will not be forgotten, stolen, hidden, or forged. Wireless-based gait recognition offers excellent privacy because it does not transmit biological information. In real complex scenes, identification should not be a one-off task; continuous recognition has become a basic requirement, which sensor-based gait recognition can also meet. Finally, gait recognition can use existing IoT devices and public devices (WiFi, cameras), so it is cost-effective and easy to deploy. These characteristics make gait recognition a competitive method for identity recognition in IoT.
At present, there are many reviews in the field of gait analysis, covering wireless-based [74], vision-based [68,77], and sensor-based [75] methods. There is also a review of identity authentication in IoT. In that review, Liang et al. claimed that IoT authentication needs to be continuous, unobtrusive, and cost-effective, and discussed the challenges for future work. However, that review focused on behavioral biometrics in general rather than on gait alone. In addition, there have been new breakthroughs in gait recognition in recent years. At present, no review focuses solely on IoT gait recognition. Given the rapid development of IoT and gait recognition and the importance of identity recognition, a paper focusing on IoT gait recognition is needed.
In this paper, the second part introduces the wireless-based method, discusses its implementation in four steps, and introduces several related works. The third part introduces the types of sensor-based methods, focusing on the acceleration sensor. The fourth part presents the vision-based method, introduces its implementation according to the feature extraction method [68], and reviews the related work. In the fifth part, we discuss the current challenges in IoT gait recognition. The sixth part concludes this paper.
EAI Endorsed Transactions on Internet of Things 04 2022 - 04 2022 | Volume 7 | Issue 28 | e3

Wireless-based method
Currently, wireless-based methods are mainly based on WiFi. WiFi devices are prevalent in daily life. Using existing infrastructure such as WiFi for gait recognition has many advantages in cost, ease of use, privacy, and universality. It can be widely used for personal identification in smart homes, offices, and public places, and it is the most studied direction at present. Recently, some studies have realised gait recognition based on RFID [33] and millimetre wave [34]. In this section, we mainly discuss the implementation of WiFi-based sensing technology.
At present, research on using WiFi signals for identity recognition is mainly based on action recognition, which senses identity by analysing how gait disturbs the signal. The process comprises four steps: signal acquisition, preprocessing, feature extraction, and identity recognition. Firstly, signal data reflecting the motion of the human body is obtained from a specific WiFi receiving device. Secondly, the collected data is preprocessed to reduce signal noise. Then, the effective segments containing action information are separated by an algorithm, and features are extracted from them. Finally, a trained classifier is used for identity recognition. The following subsections introduce, step by step, the concepts used by the methods discussed in this section.

Signal Acquisition
Signal acquisition is the first step of sensing. Whether the collected data is effective directly determines the quality of identity recognition. The acquisition setup generally comprises a transmitting end and a receiving end. The transmitting end is usually a commercial WiFi device, and the receiving end is typically a computer with a wireless network card. At present, there are two main sources of WiFi sensing data. One is RSSI (Received Signal Strength Indication) from the MAC layer, and the other is CSI (Channel State Information) from the physical layer. RSSI is easily disturbed and has low stability, and in complex indoor environments it is seriously affected by the multipath effect [35]. CSI is more sensitive than RSSI and provides finer-grained changes and richer information, such as the scattering, fading, and Doppler shift of the transmitted signal [36]. CSI is also less susceptible to multipath effects because the computer receiver uses OFDM (Orthogonal Frequency Division Multiplexing) technology for modulation. RSSI was used for identification in early research, but all recently proposed WiFi-based gait recognition methods use CSI because of its superiority.
Although CSI is generated at the receiver, it cannot be obtained directly from commercial equipment. To address this, the CSI Tool [37] was proposed in 2010 to read CSI directly from commercial equipment. This set of tools is modified from the driver of the Intel WiFi Link 5300 network card and provides a reading interface for CSI. Since then, CSI has been widely used in WiFi sensing. In 2015, the Atheros CSI Tool was proposed [38], which made CSI even more widely available, but current WiFi-based gait recognition is still mainly based on the CSI Tool [37].
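To make the form of the data concrete, the following minimal sketch assumes that one CSI measurement is available as a list of complex channel responses, one per subcarrier. The function name and the input layout are illustrative assumptions and do not reproduce the output format of any particular CSI tool.

```python
import cmath

def csi_amplitude_phase(csi_packet):
    """Split one CSI measurement into per-subcarrier amplitude and phase.

    csi_packet: list of complex numbers, one per subcarrier (assumed layout).
    """
    amplitudes = [abs(h) for h in csi_packet]
    phases = [cmath.phase(h) for h in csi_packet]  # radians in (-pi, pi]
    return amplitudes, phases

# Hypothetical packet with two subcarriers.
amplitudes, phases = csi_amplitude_phase([3 + 4j, 1j])
print(amplitudes)  # [5.0, 1.0]
```

Amplitude series of this kind, collected packet by packet, are the time series that the later preprocessing and feature extraction steps operate on.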

Preprocessing
After acquisition, the signal, whether RSSI or CSI, contains a lot of noise caused by interference from the equipment or the environment and needs preprocessing. Preprocessing methods include outlier removal, filtering, PCA, and so on.
Internal changes of the equipment, such as changes in transmission power and transmission rate, lead to abrupt differences in the signal. These obviously abnormal points are outliers, which need to be removed [39].
Filtering is the next step. Human actions are generally concentrated at low frequencies, so high-frequency components and the DC component need to be filtered out. The pass band varies with the frequency of the human activity of interest; for gait, the components between 0.3 Hz and 2 Hz are the most prominent [40].
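One common way to remove such outliers is a Hampel-style filter, which replaces a sample with the local median when it deviates too far from it. The sketch below is an illustration under our own assumptions: the window size and threshold are arbitrary, and the source does not state which outlier removal method [39] uses.

```python
from statistics import median

def hampel_filter(samples, half_window=3, n_sigmas=3.0):
    """Replace outliers with the local median (Hampel identifier)."""
    k = 1.4826  # scales the MAD to a standard-deviation estimate for Gaussian data
    cleaned = list(samples)
    for i in range(len(samples)):
        lo = max(0, i - half_window)
        hi = min(len(samples), i + half_window + 1)
        window = samples[lo:hi]
        med = median(window)
        mad = median(abs(x - med) for x in window)  # median absolute deviation
        if abs(samples[i] - med) > n_sigmas * k * mad:
            cleaned[i] = med
    return cleaned

# Hypothetical amplitude series with one spike caused by a power change.
csi_amplitude = [1.0, 1.1, 0.9, 1.0, 9.0, 1.1, 1.0, 0.9, 1.0]
print(hampel_filter(csi_amplitude))  # the spike at index 4 is replaced by 1.0
```

The median and the median absolute deviation are used instead of the mean and standard deviation because a single large spike barely shifts them.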
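The band selection described above can be sketched by zeroing all DFT bins outside the pass band. This is a minimal illustration, assuming a uniformly sampled signal and a short window; practical systems typically use a designed filter such as a Butterworth band-pass, and the function name and default band are our assumptions.

```python
import cmath
import math

def bandpass_dft(signal, fs, f_low=0.3, f_high=2.0):
    """Keep only the DFT bins whose frequency lies in [f_low, f_high] Hz."""
    n = len(signal)
    # Forward DFT (naive O(n^2); acceptable for a short illustrative window).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bin k and bin n-k carry the same physical frequency.
        freq = min(k, n - k) * fs / n
        if not (f_low <= freq <= f_high):
            spectrum[k] = 0
    # Inverse DFT; the output is real because the mask is symmetric.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]

# Hypothetical 5 s window at 20 Hz: DC offset, 1 Hz gait component, 8 Hz noise.
fs = 20
raw = [5 + math.sin(2 * math.pi * t / fs) + 0.5 * math.sin(2 * math.pi * 8 * t / fs)
       for t in range(100)]
filtered = bandpass_dft(raw, fs)  # only the 1 Hz component remains
```

In this example the DC offset and the 8 Hz component fall outside the band and are removed, while the 1 Hz component passes unchanged.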

Signal Processing
After obtaining effective data, it is necessary to extract features from the signal. In WiFi sensing, the commonly used features include statistical features in the time domain and frequency domain, Doppler frequency shift features, wavelet transform features, and time-frequency map features. Statistical time-domain features are extracted directly from the waveform, and statistical frequency-domain features are extracted after applying the FFT to the original waveform.
Statistical features are intuitive to calculate and relatively simple to process, but they can easily miss effective feature information. Doppler frequency shift features have good discrimination. Wavelet transform can realise signal
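As an illustration of the two kinds of statistical features, the sketch below computes a few time-domain statistics directly from the waveform and the dominant frequency from a one-sided DFT. The feature set and the function names are assumptions chosen for brevity; they are not the feature sets of the works surveyed here.

```python
import cmath
import math
from statistics import mean, pstdev

def time_domain_features(x):
    """Statistics computed directly from the waveform."""
    return {"mean": mean(x), "std": pstdev(x), "peak_to_peak": max(x) - min(x)}

def dominant_frequency(x, fs):
    """Frequency (Hz) of the strongest one-sided DFT bin after removing the mean."""
    n = len(x)
    mu = mean(x)
    centred = [v - mu for v in x]  # remove the DC component
    magnitudes = [
        abs(sum(centred[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]
    peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * fs / n

# Hypothetical 2 s window sampled at 30 Hz containing a 1.5 Hz oscillation.
window = [math.sin(2 * math.pi * 1.5 * t / 30) for t in range(60)]
features = time_domain_features(window)
features["dominant_freq_hz"] = dominant_frequency(window, fs=30)
```

The resulting dictionary is one feature vector; a classifier is then trained on such vectors collected from many walking segments.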

Classification and Identification
After extracting features and establishing a database, it is necessary to recognise the signal. For human identification, existing works use conventional classifiers such as SVM [41], KNN [42], and decision trees [40], and deep learning-based classifiers including CNN [43], RNN [44], MLP [45], and AE [46]. Conventional classification relies on feature engineering with hand-crafted features, which can be time-consuming. At the same time, biometric features are expected to change when the user's behaviour changes [], and conventional machine learning-based methods cannot adapt to such changes. To mitigate this problem, some researchers [41] reduced the feature dimension to get better performance, since not all hand-crafted features contribute noticeably. They found that reducing variability in feature engineering could enhance the authentication mechanisms [47]. Many researchers have also turned to deep learning methods in order to increase the scale, accuracy, and universality of sensing.
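As a minimal example of conventional classification over hand-crafted features, the sketch below implements a k-nearest-neighbour vote with Euclidean distance. The two-dimensional feature vectors and the labels are made-up illustrations, not data from any of the cited systems.

```python
import math
from collections import Counter

def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training samples closest to query."""
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (gait speed, step length) features for two enrolled users.
train_features = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05),
                  (1.0, 1.1), (1.1, 1.0), (0.9, 1.0)]
train_labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(train_features, train_labels, (0.95, 1.0)))  # B
```

With k=1 this reduces to a nearest neighbour classifier.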

Related Works based on WiFi
WiWho [40], proposed in 2016, is the first research on WiFi-based identity recognition. This method can identify a person from a small group of people through WiFi. WiWho uses multipath elimination and bandpass filtering to remove noise and performs walking detection in the frequency domain. When a person starts walking, it carries out gait recognition. After extracting the step and walk features, WiWho uses a decision tree for classification. WiWho was evaluated experimentally in multiple places. In most cases, a walking distance of 2-3 meters is enough for identification. The experimental results show that WiWho can recognise one person in a group of 2 to 6 people with an average accuracy of 92% to 80%.
WiFi-ID [48] uses CSI amplitude information for identification. It mainly uses CSI data on the LOS (line-of-sight) path because this part of the data contains the most significant features of human gait. The experimental results show that the accuracy of WiFi-ID for identifying a single person in a group of 2 to 6 people is 93% to 77%.
FreeSense [42] also identifies people on the LOS path. It uses PCA and DWT (Discrete Wavelet Transform) to extract the shape features of the LOS path waveform and performs identity recognition with a nearest neighbor classifier. The experimental results show that the accuracy of FreeSense for identifying a single person in a group of 2 to 6 people is 94.5% to 88.9%.
WifiU [41] is also a human gait recognition system based on WiFi. WifiU obtains CSI signals from commercial WiFi devices, uses PCA to denoise them, and uses STFT (Short Time Fourier Transform) to transform them into the joint time-frequency domain. It then uses an SVM to classify the extracted gait characteristics, such as gait speed and step length. For the gait data of 50 people in a 50 m² room, the recognition accuracy is 79.82%.
CrossSense [45] is a system that extends WiFi sensing to new scenes and a wider range. With CrossSense, an existing application can be transferred and reused across sites and can solve larger sensing problems. CrossSense uses an MLP to generate virtual samples: a feed-forward fully connected network with seven hidden layers takes data from two domains and learns the mapping relation between them.
AGait [46] is a cycle-independent human gait recognition and walking direction estimation system using an attention-based RNN. AGait is implemented in three different indoor environments on commercial WiFi devices and achieves average F1 scores of 97.32% to 89.77% for groups of 4 to 10 people.

Related Works based on other passive sensing
Luo et al. [33] realised RFID-based gait recognition by monitoring the interruption of the RFID signal when the user passes between the transmitter and the receivers [53]. By designing an edge-cloud service architecture (GRaas), they achieved fast response and low cost, with a robust recognition rate of 96.3%.
Zhen et al. [34] realised gait recognition based on millimeter waves. Firstly, they established a millimeter wave gait dataset, which contains the gait data of 95 volunteers in two scenarios. Based on this dataset, they proposed mmGaitNet, a deep learning-based model. mmGaitNet achieved 90% accuracy for single-person scenarios and 88% accuracy for scenarios with five co-existing persons.
All the methods and their details are listed in Table 1. WiWho requires subjects to move in a fixed direction within a specified area, and its accuracy decreases rapidly as the number of people increases. WiFi-ID does not take non-LOS paths into consideration and performs poorly for large groups of people. The main innovation of WifiU is converting the signal into a spectrogram, which can be analysed with image processing; its disadvantage is that it is not ideal for large groups. FreeSense is robust and improves the recognition accuracy for large groups. WiDIGR [49] mainly achieves direction-independent recognition, so people can be recognized no matter which direction they walk in. Among the latest works, TransferSense [50] performs well for large groups and has strong transfer ability. AGait achieves the best performance for both small and large groups by using an attention-based RNN. As for methods based on other signals, GRaas achieves both accurate recognition and low cost. Because it uses a different signal carrier, mmGaitNet has more advantages for large-scale recognition.

Sensor-based method
Sensor-based methods include accelerometer and gyroscope-based methods, pressure sensor-based methods, electromyography (EMG)-based approaches, and ground reaction force (GRF) measurements. With the development of WIoT (Wearable Internet of Things) technology, various sensor technologies have been integrated into wearable IoT devices, such as smartwatches, smartphones, smart glasses, smart headphones, and so on.
Among them, accelerometers and gyroscopes are widely used. Unobtrusive acceleration-based gait recognition has gained new development opportunities and is the current focus of sensor-based gait recognition research. Taking the mobile phone as an example, the phone is fixed on a part of the subject's body during the experiment. The acceleration measured by the sensor and the acceleration pattern on the three axes become the features, and the gyroscope provides the rotation angle about the three axes. Pressure and GRF sensors measure the force applied to the sensor and have already been used for gait recognition [51,52]. The EMG sensor measures voluntary or involuntary muscle contraction. The signal is obtained by electrodes on the skin surface and provides different gait information. The acceleration sensor-based method and GRF measurement are suitable for IoT applications because they are unobtrusive and low-cost, reusing existing IoT devices.
There are few public accelerometer datasets at present [53,54], and researchers generally establish their own databases. The experimental settings differ considerably, the place where the sensor is fixed on the body differs, and the number of volunteers in each database varies greatly, so it is difficult to compare performance directly.
According to medical research [56], walking is driven by the nerve output from the brain to the muscles, joints, and so on. For any kind of exercise, it is necessary to translate the brain's intentions into specific muscle patterns for walking [55]. The reaction force generated by the force applied to the ground supports the body to walk. The periodic movement of the legs during walking is the essence of gait periodicity. Each gait cycle can be divided into two stages and eight configurations [56], as shown in Figure 2.

Figure 2. Walk cycle dynamics [56]
According to a study, the basic movement pattern is constant over a fixed range of motion speeds. Gait characteristics are often unique and naturally stereotyped [57]. For example, differences in pelvic sizes, age, and muscle will affect gait information greatly. In the exercise process, it is necessary to coordinate the muscles of the whole body to save energy as much as possible under the condition of maintaining stability. The final movement patterns often have significant differences, which can be used for identity recognition.

Related Works
Maesico et al. [58] proposed a mobile accelerometer gait recognition method based on a single consumer device, using a new segmentation method that segments the gait signal into periods and steps. It achieves a 93% recognition rate on the ZJU-GaitAcc dataset.
The accuracy of accelerometer gait recognition declines when the walking speed changes. To solve this problem, Sun et al. [59] proposed a speed-adaptive gait cycle segmentation method and an individualized matching threshold generation method. They achieved average gait recognition and user authentication rates of 96.9% and 91.75%, respectively, on the ZJU-GaitAcc dataset and a self-collected dataset sampled at various walking speeds.
However, traditional methods also face many challenges. The above two results are based on closed-set identification and were achieved under the most constrained conditions, such as limiting the walks in the gallery and in the probe to a similar number of steps. None of these methods performs well in the open set, and the many influencing factors in practice make them difficult to apply. For example, the mobile phone is always fixed on the body in experiments, but in reality people put it in a pocket or hold it in their hands. To solve this problem, Zou et al. [60] proposed a deep learning method for learning and modelling gait features from gait data. When the dataset was collected, the subjects were not constrained, and neither direction nor speed was limited. The method achieved accuracies higher than 93.5% and 93.7% in person identification and authentication, respectively.
Terrier et al. [52] proposed a gait recognition method based on GRF. Thirty people were asked to wear shoes without high heels and walk for 30 minutes on the force platform of a treadmill without prompting. The collected data were used to train a CNN. The data of another six people were used for transfer learning. After fine-tuning with only a few samples, the accuracy reached 100%. This showed that, given a pretrained CNN, only a few steps are enough to learn and recognize a new user's gait.
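Cycle segmentation needs an estimate of the gait cycle length. The sketch below is a generic baseline and not the method of [58] or [59]: it assumes three-axis accelerometer samples at a fixed rate and picks the lag that maximizes the autocorrelation of the acceleration magnitude. The function names and the lag search range are our assumptions.

```python
import math

def acceleration_magnitude(ax, ay, az):
    """Orientation-independent magnitude of three-axis acceleration."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(ax, ay, az)]

def estimate_cycle_length(magnitude, min_lag, max_lag):
    """Return the lag (in samples) with the highest autocorrelation."""
    n = len(magnitude)
    mu = sum(magnitude) / n
    centred = [m - mu for m in magnitude]

    def autocorr(lag):
        return sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)

    return max(range(min_lag, max_lag + 1), key=autocorr)

# Hypothetical recording: gravity on the z axis plus a periodic component
# that repeats every 50 samples.
az = [9.8 + math.sin(2 * math.pi * t / 50) for t in range(300)]
zeros = [0.0] * 300
magnitude = acceleration_magnitude(zeros, zeros, az)
print(estimate_cycle_length(magnitude, min_lag=20, max_lag=80))  # 50
```

Re-estimating the lag on successive windows is one simple way to follow changes in walking speed, since a faster walk shortens the cycle.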

Vision-based method
Vision-based methods differ from WiFi-based and sensor-based methods in that public datasets are widely used. Evaluation protocols are generally divided into subject-dependent and subject-independent. In the subject-independent test protocol, the data are divided into test sets and training sets whose subjects are disjoint. After the model extracts the features, the classifier compares the test features with the train features, and the two most similar features are marked as the same person. At present, both protocols are widely used in gait recognition.
At present, there are a large number of datasets covering different parameters, including clothes, observation angle, subjects' self-occlusion, and ambient brightness. Generally speaking, the more samples and parameters a dataset contains, the better, since a better dataset enables a model with stronger performance. As shown in Table 2, the current datasets mainly include the ISIR datasets (MVLP [69], LP [64], LP-bag [66]) from Osaka University, the CASIA-B [72] dataset from the Chinese Academy of Sciences, and the USF [67] dataset from the University of South Florida.
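The matching step can be sketched as a rank-1 identification test. The sketch assumes that the features have already been extracted by a model and that the evaluated sequences are split into an enrolled gallery and a set of probes; the cosine similarity measure, the function names, and the toy vectors are our assumptions.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank1_accuracy(gallery, probes):
    """Fraction of probes whose most similar gallery entry has the same identity.

    gallery, probes: lists of (subject_id, feature_vector) pairs.
    """
    correct = 0
    for true_id, feature in probes:
        best_id, _ = max(gallery, key=lambda entry: cosine_similarity(entry[1], feature))
        correct += best_id == true_id
    return correct / len(probes)

# Hypothetical two-dimensional features for two subjects.
gallery = [("s1", [1.0, 0.0]), ("s2", [0.0, 1.0])]
probes = [("s1", [0.9, 0.1]), ("s2", [0.2, 0.8]), ("s2", [0.7, 0.3])]
accuracy = rank1_accuracy(gallery, probes)  # two of the three probes match
```

The third probe is deliberately closer to the wrong subject, which shows how a change in appearance or viewpoint turns into an identification error.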

Related datasets
CASIA-B is currently the most widely used gait dataset, providing RGB and silhouette images from 124 subjects. The data were captured from eleven viewpoints under three different walking conditions: normal walking (NM), walking with a coat (CL), and walking with a bag (BG). Models are trained with the data of 74 people and tested with the data of the remaining 50 people.
OU-MVLP is the largest dataset of gait sequences at present (259,013 sequences). The data were obtained from fourteen observation angles in the ranges 0°-90° and 180°-270°. The data of 5153 people are designated as the training set and the rest as the test set.
In addition to these two datasets, there are datasets that take many other parameters into account, including outdoor gait information [67], different walking speeds [70], multiple clothing combinations [63], carrying different items [66], and human skeleton information [61]. Since 2015, after deep neural networks became widely used, neural networks have been applied to gait recognition. In recent years, the classification methods of gait recognition have shown a clear trend from non-deep learning to deep learning.

Related works
According to a deep gait recognition focused taxonomy [68], classification methods can be divided into four dimensions: body representation, temporal representation, feature representation and neural architectures.
(i) Body representation. Body representation is mainly based on the body contour or skeleton information, of which the most commonly used is the silhouette. To obtain the silhouette, we need to detect and segment the pedestrian, separate it from the background image, and binarize the result. This representation focuses on gait information, but it is not ideal when the appearance changes. The pose-estimation method obtains skeleton information, so it adapts better to changes of viewing angle, but it is very sensitive to occlusion.
(ii) Temporal representation. This dimension is based on time information. It summarizes the walking information of a sequence of silhouettes in a single map. In deep gait recognition architectures, gait silhouettes can be aggregated in the initial layer of a network (Figure 3.a), which is known as a temporal template. Gait silhouettes can alternatively be aggregated in an intermediate layer of the network after several convolution and pooling layers (Figure 3.b), which is known as a convolutional template. A typical and widely used temporal template is GEI [73], which averages gait silhouettes over one period. An example of a convolutional template is GCEM [76], which averages the convolutional maps obtained by several convolution and pooling layers. The frame rate does not affect this type of method. GaitSet [62] takes a set perspective: it regards gait as a group of independent frames, so it is not affected by frame permutation and can integrate frames of different videos from different scenes. GaitSet achieved accuracy rates of 95% (NM), 87.2% (BG), and 70.4% (CL) on CASIA-B. Only a few frames are needed to achieve good accuracy: on CASIA-B, the accuracy with 7 frames is 82.5%.
3DCNNGait [65] proposed a novel multiple-temporal-scale gait recognition framework, which uses both frame and interval fusion information. It realizes the interval-level representation with a local transformation module and applies 3D CNN to the temporal scales. This model achieved 96.7% (NM), 93.0% (BG), and 81.5% (CL) on CASIA-B.
EV-GATE-3DGRAP [71] uses dynamic vision sensors (event cameras), which have ultra-low resource consumption, an extremely high sampling rate, and greater resolution and dynamic range. The information obtained is an asynchronous event stream. Based on graph and image-like representations of the event stream, this work proposes a new event-based gait recognition method using GCN and CNN. EV-GATE-3DGRAP achieved 94.9% on the self-built DVS128-Gait dataset.
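The GEI temporal template mentioned under temporal representation is a pixel-wise average of the silhouettes in one gait period. The sketch below assumes that the silhouettes are already segmented, binarized, size-normalized, and aligned, and that each is given as a list of rows of 0/1 values; the function name is our own.

```python
def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes pixel by pixel.

    silhouettes: list of frames, each a list of rows of 0/1 values.
    """
    n = len(silhouettes)
    height = len(silhouettes[0])
    width = len(silhouettes[0][0])
    return [
        [sum(frame[r][c] for frame in silhouettes) / n for c in range(width)]
        for r in range(height)
    ]

# Hypothetical 2x2 silhouettes from two frames of one gait period.
frames = [
    [[1, 0], [1, 1]],
    [[1, 1], [0, 1]],
]
print(gait_energy_image(frames))  # [[1.0, 0.5], [0.5, 1.0]]
```

Pixels that are foreground in every frame keep the value 1, while pixels covered only part of the time, such as swinging limbs, take intermediate values.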

Challenges
With the development of IoT technology, gait recognition has new development opportunities, but IoT applications also bring higher requirements and challenges. In vision-based methods, although the accuracy for NM is already high, recognition under BG and CL conditions has not been solved well (the SOTA algorithm achieves 93.0% on BG and 81.5% on CL). In reality, more factors affect the results, such as occlusion or cross-view angles. Moreover, mood, alcohol consumption [78], and health status can also affect people's gait.
Although methods based on deep learning achieve higher accuracy, they incur a higher computational cost. Even the best current algorithm may not achieve continuous gait recognition on edge computing devices with weak computing performance, such as mobile phones and wearable devices. The deep learning model needs to be compressed, with techniques including tensor decomposition [79], pruning [82], and parameter sharing [80]. At the same time, deep learning methods rely heavily on data, and the current datasets are not big enough. Models based on the sketched distribution are not robust. IoT systems also face numerous security threats at the physical, protocol, communication, and application layers [82].

Conclusion
In IoT applications, identity recognition is a fundamental and critical requirement. With the development of IoT technologies such as wearable devices and environmental sensors, identity recognition can meet the new and higher requirements of practical IoT applications. This paper reviewed the implementation of gait recognition along three lines: wireless-based, sensor-based, and vision-based methods. We also reviewed representative and recent datasets and algorithms and compared the performance of the algorithms. Finally, we discussed the current challenges and pointed out several promising future research directions in IoT gait recognition. We expect this survey to provide insight into the landscape of IoT gait recognition and to guide researchers in advancing future research.