Video Shot Boundary Detection and Sports Video Classification Algorithm Based on Particle Filter

INTRODUCTION: Sports video is an essential information resource. Classifying sports videos accurately can effectively improve users' browsing and query experience. This project studies a motion video classification algorithm based on deep learning and particle filters, to address the strong subjectivity and low accuracy of existing motion video classification algorithms. A keyframe extraction method based on similarity is proposed, and the motion video classification algorithm is built on a deep learning coding model. Examples of various types of sports videos are analyzed. The overall performance of the proposed algorithm is much better than that of other existing motion video classification algorithms, and it can significantly improve the classification performance of motion video.


Introduction
With the rapid development of multimedia and Internet technology, video data on the Internet is growing explosively. For such a large amount of video, manual annotation is impossible. In addition, because manual labels are subjective, text-based information retrieval has been unable to meet users' needs. Therefore, it is essential to process and query video effectively and to classify it automatically. Automatic video classification is also of great significance in medical care, network supervision, intelligent video surveillance, and other fields: for example, classifying videos to screen out harmful ones meets network supervision needs and ensures online video quality [1]. In medical treatment, an image classification algorithm is used for wireless endoscopy images: the images are divided into four classes (entrance, stomach, small intestine and colon), which reduces the workload of medical staff. In 2001, a group of researchers proposed a new approach to multimodal monitoring, with a multi-view algorithm for pedestrian detection and classification applied to long-distance environments.
Fisher first proposed content-based classification of video types in 1995. The method divides video into categories such as news, sports, commercial advertising and animation, and classifies video in three steps. The first step extracts visual and audio features of video frames [2]. The second step analyzes the image segmentation, camera motion and target motion obtained in the first step. The third step classifies the video according to its features. Since then, researchers have continuously improved classifiers and video features, which has advanced video classification. Some scholars extract text from user-generated data. Some researchers have recently proposed an acyclic support vector machine (SVM) that fuses multiple kinds of image information [3]; its features are editing, color, and motion characteristics. Previous studies have combined Mel-frequency cepstral coefficients with the color histogram of the image to classify different types of videos. One doctoral dissertation identified four types of videos. First, football, badminton, basketball and table tennis are grouped by the color information of the court. Then, basketball and table tennis are distinguished by the type of motion. Finally, football and badminton are identified by the texture features of the image. Some researchers have proposed image recognition algorithms based on temporal patterns [4]. First, the different actions that can take place in the video are defined. The videos are then classified using machine learning. Finally, a video classification algorithm based on time series patterns is obtained.
However, existing video classification algorithms share three problems. 1) Classification rules are often designed from prior knowledge, which limits their applicability. 2) The classifier learns from and classifies the complete video, which inevitably introduces errors and reduces recognition accuracy. 3) The classification setting is too idealized: a video file can contain segments of different types, whereas these methods can generally classify only a single type of video. Given these shortcomings of existing motion video recognition algorithms, this project studies a particle filter algorithm for motion video recognition [5]. The results are expected to provide a reference for classifying motion video and even images.

Feature Extraction
A feature is a quantifiable attribute of an object. Motion video has two kinds of features: general features and domain-specific features. Considering the speed of keyframe extraction, the image features of motion video are set to the color histogram and the color layout descriptor [6]. Color characterization of video frames depends on the choice of color space. This paper presents a method to determine the similarity of keyframes using the shot boundary. With the image in the HSV color space, the histogram is indexed first. In the similarity between two frames, $Q^{\gamma}_{jh}$ represents the $h$-th discrete cosine coefficient of the $\gamma$ component in frame $f_a$, and $\varpi$ indicates the weight.
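The histogram side of this feature extraction can be illustrated with a minimal Python sketch. It is not the similarity formula of this paper: the 8 x 3 x 3 quantization of the HSV space and the use of histogram intersection as the similarity measure are assumptions made for illustration, and `hsv_histogram` and `histogram_intersection` are hypothetical helper names.

```python
import colorsys
import numpy as np

def hsv_histogram(rgb_pixels, bins=(8, 3, 3)):
    """Quantized, normalized HSV histogram of a frame.

    rgb_pixels: iterable of (r, g, b) tuples with components in [0, 1].
    bins: assumed number of quantization levels for H, S and V.
    """
    hb, sb, vb = bins
    hist = np.zeros(hb * sb * vb)
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        # Map each component to its bin; clamp the value 1.0 into the last bin.
        hi = min(int(h * hb), hb - 1)
        si = min(int(s * sb), sb - 1)
        vi = min(int(v * vb), vb - 1)
        hist[(hi * sb + si) * vb + vi] += 1
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 means identical normalized histograms."""
    return float(np.minimum(h1, h2).sum())

# Two synthetic frames: one pure red, one pure green.
red_frame = [(1.0, 0.0, 0.0)] * 100
green_frame = [(0.0, 1.0, 0.0)] * 100
print(histogram_intersection(hsv_histogram(red_frame), hsv_histogram(green_frame)))
```

A per-pixel loop is slow for real frames; it is kept here only so that the quantization step is easy to follow.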

Edge detection of motion video
The shot boundary factor is set according to the shot transition characteristics of motion video. Suppose the width of the window is $2N+1$. For frame $a$, the difference between adjacent windows is then computed. Frame differences across a shot boundary are larger than those within a shot. Let the distance between two arbitrary frames be $Q_s$. If $N = 3$ and an abrupt shot change occurs between frames $G$ and $G+1$, then the constant sequence $K_{sw}(b)$ is (1, 3, 6, 6, 3, 1). The shot boundary similarity factor of frame $a$ is then defined. At a transition between adjacent frames, the shot boundary similarity coefficient is close to 1; elsewhere, it takes a small value between 0 and 1.
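The behaviour described above (a factor close to 1 at an abrupt shot change and small elsewhere) can be illustrated with a simple windowed ratio. The Python sketch below is an assumption-laden stand-in, not the boundary factor of this paper: it divides each adjacent-frame histogram difference by the sum of differences in a window of up to $2N+1$ neighbouring differences, and it does not use the weight sequence (1, 3, 6, 6, 3, 1). The names `make_shot`, `boundary_factors` and `detect_cuts`, and both thresholds, are hypothetical.

```python
import numpy as np

def make_shot(base, count, rng, jitter=0.01):
    """Synthetic shot: `count` normalized histograms scattered around `base`."""
    h = np.asarray(base, dtype=float) + rng.uniform(0, jitter, size=(count, len(base)))
    return h / h.sum(axis=1, keepdims=True)

def adjacent_differences(histograms):
    """Half the L1 distance between consecutive normalized histograms, in [0, 1]."""
    h = np.asarray(histograms, dtype=float)
    return 0.5 * np.abs(h[1:] - h[:-1]).sum(axis=1)

def boundary_factors(diffs, n=3, eps=1e-12):
    """Share of each difference within its local window of differences."""
    factors = np.zeros_like(diffs)
    for a in range(len(diffs)):
        lo, hi = max(0, a - n), min(len(diffs), a + n + 1)
        factors[a] = diffs[a] / (diffs[lo:hi].sum() + eps)
    return factors

def detect_cuts(histograms, n=3, min_factor=0.5, min_diff=0.2):
    """Indices a such that a cut is declared between frame a and frame a+1."""
    diffs = adjacent_differences(histograms)
    factors = boundary_factors(diffs, n)
    # Require a large absolute change too, so that noise in a static shot
    # cannot produce a high ratio on its own.
    mask = (factors > min_factor) & (diffs > min_diff)
    return [int(i) for i in np.flatnonzero(mask)]

rng = np.random.default_rng(0)
frames = np.vstack([make_shot([0.7, 0.2, 0.1], 10, rng),
                    make_shot([0.1, 0.2, 0.7], 10, rng)])
print(detect_cuts(frames, n=3))
```

The absolute-difference guard is a design choice of this sketch: a pure ratio can be large inside a nearly static shot where all differences are tiny.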

Keyframe sequence clustering
The same scenes recur within a sports video, which leads to repeated keyframes. K-means clustering is performed on the keyframe sequence to reduce the redundancy of the final keyframe sequence [8]. The value of K is determined by a cluster validity method. In the clustering validity criterion, $F$ represents the clustering result of the keyframe sequence in the motion video. $Scat(a)$ and $dis(a)$ characterize the clusters of the keyframe sequence, and the distance between $i_a$ and $i_b$ indicates the spacing between clusters. A weighting factor is set since the value ranges of the two terms differ greatly. $d_{\max}$ represents the preset maximum number of clusters. The value of $d$ at which the criterion reaches its minimum is the optimal number of clusters.
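The selection of the cluster count can be sketched as follows. This Python example is an illustration under assumptions, since the validity formula itself is not legible in the text: it combines a within-cluster scatter term and a between-cluster separation term, weights the scatter term by the separation value at $d_{\max}$, and picks the $d$ with the smallest index. The function names and the farthest-point initialization of k-means are choices made for this sketch.

```python
import numpy as np

def assign(points, centers):
    """Index of the nearest center for every point."""
    dist = np.linalg.norm(points[:, None] - centers[None], axis=2)
    return np.argmin(dist, axis=1)

def kmeans(points, k, iters=50):
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = assign(points, centers)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, assign(points, centers)

def scatter(points, centers, labels):
    """Mean norm of per-cluster variance, relative to the whole data set."""
    total = np.linalg.norm(points.var(axis=0))
    within = [np.linalg.norm(points[labels == j].var(axis=0))
              for j in range(len(centers)) if np.any(labels == j)]
    return float(np.mean(within) / total)

def separation(centers, eps=1e-12):
    """Grows when centers are unevenly spaced or close together."""
    d = np.linalg.norm(centers[:, None] - centers[None], axis=2)
    off_diagonal = d[~np.eye(len(centers), dtype=bool)]
    ratio = off_diagonal.max() / (off_diagonal.min() + eps)
    return float(ratio * np.sum(1.0 / (d.sum(axis=1) + eps)))

def choose_cluster_count(points, d_max):
    """Return the d in [2, d_max] minimizing weight * scatter + separation."""
    terms = {}
    for d in range(2, d_max + 1):
        centers, labels = kmeans(points, d)
        terms[d] = (scatter(points, centers, labels), separation(centers))
    weight = terms[d_max][1]  # balances the value ranges of the two terms
    index = {d: weight * s + sep for d, (s, sep) in terms.items()}
    return min(index, key=index.get), index

# Synthetic keyframe features: three well-separated groups in 2-D.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(c, 0.3, size=(20, 2))
                      for c in [(0, 0), (10, 0), (0, 10)]])
best_d, index = choose_cluster_count(features, d_max=5)
print(best_d)
```

In practice the rows of `features` would be keyframe descriptors rather than synthetic 2-D points.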

Depth coding method of motion video image
The keyframe information of the motion video is extracted, and the motion video is classified by deep learning combined with the particle filter. The specific process is as follows:
1) Set the keyframe feature database of the motion video to $F$, where $N$ is the number of features.
2) $F$ is encoded using an unsupervised restricted Boltzmann machine (RBM). The codebook is obtained from the converged parameters.
3) Error backpropagation is carried out using the available label information, which realizes supervised fine-tuning of the RBM neural network. $F$ is then encoded, the optimal dictionary is obtained, and the image features are extracted.
4) An SVM classifier is trained on the deep coding vectors of the motion video images to complete the motion video classification.
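Step 2 of this process can be illustrated with a small restricted Boltzmann machine trained by one-step contrastive divergence (CD-1). The Python sketch below is a generic textbook-style RBM, not the network used in this paper: the layer sizes, learning rate, binary toy data and the class name `RBM` are assumptions, and the supervised fine-tuning of step 3 and the SVM of step 4 are left out.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal Bernoulli RBM trained with one-step contrastive divergence."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def encode(self, v):
        """Hidden activation probabilities, used as the coding vector."""
        return sigmoid(v @ self.W + self.b_h)

    def decode(self, h):
        """Visible reconstruction probabilities."""
        return sigmoid(h @ self.W.T + self.b_v)

    def train_step(self, v0, lr=0.1):
        """One CD-1 update on a batch; returns the mean reconstruction error."""
        h0 = self.encode(v0)
        h_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = self.decode(h_sample)
        h1 = self.encode(v1)
        n = len(v0)
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += lr * (v0 - v1).mean(axis=0)
        self.b_h += lr * (h0 - h1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))

# Toy feature database: 40 binary feature vectors of length 12.
F = (np.random.default_rng(1).random((40, 12)) > 0.5).astype(float)
rbm = RBM(n_visible=12, n_hidden=5)
for _ in range(200):
    rbm.train_step(F)
codes = rbm.encode(F)  # one 5-dimensional coding vector per keyframe
print(codes.shape)
```

The rows of `codes` play the role of the coding vectors on which a classifier such as an SVM would then be trained.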

Improve the AdaBoost algorithm based on the particle filter method
AdaBoost is a promising method with a rigorous theoretical basis and high prediction accuracy. In each cycle, it strengthens the ability to distinguish the samples misclassified in the previous cycle, but after several cycles the base classifier becomes biased toward the majority classes [9]. Therefore, the final classifier cannot predict well on an unbalanced sample set. The classic AdaBoost algorithm is therefore improved. The specific steps of the modified AdaBoost algorithm are given below, and its flow is shown in Figure 1.
The method aims at the smallest classification error $\varepsilon_n$. In formula (4), the AdaBoost algorithm increases the weight of the misclassified samples. The weight distribution $S_{n+1}$ is then obtained by updating the weights [10]. The concept of cost sensitivity is used to improve the method: a cost sensitivity coefficient $\zeta$ is introduced into the sample weight update formula. In the modified formula, $Z_n$ is the normalization factor, and $\zeta$ has a great influence on the final classification effect. During the iteration, the particle swarm optimization (PSO) method is used to adjust the cost sensitivity coefficient $\zeta_a$. PSO is simple to construct and easy to control, and is a heuristic global optimization method. A fitness function is set, and the particles are optimized cooperatively according to the best fitness values of the population and of the particles themselves. This approach converges quickly, and a single outlier does not affect the global optimal solution of PSO, so it is very stable. The specific update formulas are formulas (7) and (8). The $t$ in formulas (7) and (8) represents the number of
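The particle swarm step can be illustrated with the textbook PSO update, in which each particle's velocity is pulled toward its own best position and toward the best position of the swarm. The Python sketch below assumes this standard form; the inertia weight `w`, the acceleration constants `c1` and `c2`, the search range and the quadratic toy fitness standing in for the classification error are all illustrative assumptions, not values from this paper.

```python
import numpy as np

def pso_minimize(fitness, lower, upper, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize a scalar function of one variable with a basic PSO."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lower, upper, size=n_particles)   # positions
    v = np.zeros(n_particles)                         # velocities
    p_best = x.copy()                                 # best position per particle
    p_val = np.array([fitness(xi) for xi in x])
    g_best = p_best[np.argmin(p_val)]                 # best position of the swarm
    for _ in range(n_iters):
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        # Velocity update: inertia + pull to personal best + pull to global best.
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        # Position update, kept inside the search range.
        x = np.clip(x + v, lower, upper)
        val = np.array([fitness(xi) for xi in x])
        improved = val < p_val
        p_best[improved] = x[improved]
        p_val[improved] = val[improved]
        g_best = p_best[np.argmin(p_val)]
    return float(g_best), float(p_val.min())

# Toy fitness: pretend the classification error is smallest at zeta = 2.
zeta_best, error = pso_minimize(lambda z: (z - 2.0) ** 2, lower=0.0, upper=5.0)
print(round(zeta_best, 2))
```

In an actual system the fitness function would train or evaluate the boosted classifier for a candidate coefficient and return its error.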

Experimental results and analysis
Windows XP is used as the test platform and Matlab 2010 as the implementation platform of the system. Three data sets are used in the experiment to check the correctness and validity of the algorithm. Their basic features are listed in Table 1. Each category in Data Set A contains more than 66 videos. Data Set B has 18 videos per category. Data Set C is a collection of videos that combine various sports.

Video image classification
The classification results of sports video frames based on the particle filter model are presented. This training database has seven categories of motion videos, including two class-labelled shot categories and nine non-class-labelled shot categories. The training set contains 9 × 100 frames, a total of 900 frames. The test set contains 9 × 40 frames, a total of 360 frames.
Figure 2 shows the frame classification results for class-labelled and non-class-labelled shots based on particle filtering (image cited from Sports Videos in the Wild (SVW): A Video Data Set for Sports Analysis). The diagonal of Figure 2 gives the recall rate of each video frame class. Each other entry, in row i and column j, gives the probability of a video frame being incorrectly identified as type i. Overall, frames from class-labelled shots are classified better than frames from non-class-labelled shots. This is because non-class-labelled shots are highly varied, whereas class-labelled shots vary less.

Tests to Exclude Irrelevant Frames
It can be seen from Figure 2 that representative frame classification based on the particle filter model achieves good results, but some representative frames are still misclassified. The method is tested by randomly selecting a segment from the data set for each type of motion video. Recall rate and accuracy are then used to evaluate the effect of eliminating isolated frames in the time series. The results of the experiment are shown in Table 2. In Table 2, A represents the number of all classified frames, B the number of isolated frames correctly removed by the convolutional neural network algorithm, and C the number of interference frames eliminated by the convolutional neural network algorithm.
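The idea of eliminating isolated frames in a time series of frame labels can be sketched with a sliding-window majority vote. This Python example is only an illustration: the text does not specify how isolated frames are detected, so the majority filter, the window radius and the function name `smooth_labels` are assumptions.

```python
from collections import Counter

def smooth_labels(labels, radius=2):
    """Replace each frame label by the most common label in its neighbourhood.

    labels: per-frame class labels in temporal order.
    radius: number of frames considered on each side of the current frame.
    """
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - radius): i + radius + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed

# One tennis frame misclassified inside a football segment.
predicted = ["football"] * 4 + ["tennis"] + ["football"] * 4
print(smooth_labels(predicted))
```

A larger radius removes longer runs of misclassified frames but also blurs genuine transitions between sports in a mixed video.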

Single sports video classification experiment
A single sports video is a special case of a mixed sports video, so this method is also suitable for classifying single sports videos. In this paper, recall rate and accuracy are used as performance indexes for both the convolutional neural network and the BP neural network. The experimental results show that combining the convolutional and BP neural networks influences single sports video training. This experiment used the data set as a test video collection. The threshold for the BP neural network is T = 8. The resulting classification results are shown in Table 3. The convolutional neural network and BP neural network algorithms were used to classify several sports videos: basketball, badminton, football, table tennis, snooker, tennis and volleyball. Good results were obtained. For a single sports video, the recall rate and accuracy of the BP neural network are slightly lower than those of the convolutional neural network method.
The algorithm in this paper is tested on Data Set A and Data Set B and compared with the results in the literature [11]. Four types of sports videos are classified and tested: badminton, table tennis, basketball and football. Figure 3 shows the results of the comparison. Literature [12] provides recall rates for classifying basketball, football and tennis videos, so the convolutional neural network algorithm is also tested on Data Set B and the recall rates of these three types of videos are compared.
It can be seen from Figure 3 that on Data Set A and Data Set B, both the convolutional neural network and the BP neural network achieve good results. However, because the tennis videos in Data Set B contain a large number of interference shots, the false detection rate on these shots is relatively high. Therefore, compared with basketball and football, the classification of tennis videos by the convolutional neural network algorithm is slightly inferior [13]. This also shows that interference shots have a great impact on recognition. For single sports videos, the convolutional neural network gives better classification results than the BP neural network, but for mixed sports videos its results are not ideal. The BP neural network method is suitable for both single and mixed sports videos, so it is more versatile than the convolutional neural network. Each method thus has advantages and disadvantages in practical application. The convolutional neural network method classifies single sports videos effectively, making it suitable when the video is known in advance to contain a single sport [14]. The BP neural network method is more suitable when this information is unknown. The sports videos to be classified can include a single sport or multiple sports [15]. Chakraborty et al. proposed a new shot boundary detection method that optimizes the weights of feedforward neural networks (FNNs). To improve the performance of the system, the output of the hybrid technique is analyzed again by forming a continuity matrix (phi) [16]. Video annotation, as the foundation of video indexing and retrieval, has important application prospects and research value. In semantic detection, the detection and tracking of moving targets are the foundation [17].

Experiment on image recognition effect of interference lens
$f_a$ and $f_b$ of the motion video histogram $\Xi(h, s, v)$ can be expressed as follows: a significant color column difference between

$\varepsilon_n$ is based on the smallest classification error in each cycle. Here $x_i$ is the actual label of sample $i$, $y_i$ is the value predicted by the classifier for sample $i$, and $w_{mi}$ is the weight of sample $i$ in the cycle of $F_n$. The classification error $\varepsilon_n$ of $F_n$ is the weighted sum of the samples misclassified by $F_n$. The results show that the weight reassignment of classifier $F_n$ on the training set has a great influence on its classification effect. 3) Calculate the weighting factor of $F_n$. When the classification error $\varepsilon_n$ falls below the threshold or the maximum number of iterations is reached, the iteration terminates. Otherwise, update the weight distribution over the training data set and go back to step 2.
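One boosting cycle of this kind can be sketched in Python. The sketch assumes the classic AdaBoost formulas (weighted error, classifier weight $\tfrac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, exponential reweighting followed by normalization). The `cost` argument is a hypothetical stand-in for a per-sample cost sensitivity coefficient that scales the weight increase of misclassified samples; it is one plausible form chosen for illustration, not necessarily the update used in this paper.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred, cost=None):
    """One weight update of AdaBoost with an optional cost coefficient.

    weights: current sample weights, summing to 1.
    y_true, y_pred: actual and predicted labels of the samples.
    cost: assumed per-sample coefficient (default 1 = classic AdaBoost).
    Returns (weighted error, classifier weight, normalized new weights).
    """
    weights = np.asarray(weights, dtype=float)
    miss = np.asarray(y_true) != np.asarray(y_pred)
    cost = np.ones_like(weights) if cost is None else np.asarray(cost, dtype=float)
    # Weighted sum of the misclassified samples, kept away from 0 and 1.
    eps = float(np.clip(weights[miss].sum(), 1e-10, 1 - 1e-10))
    alpha = 0.5 * np.log((1 - eps) / eps)
    # Misclassified samples gain weight, correctly classified ones lose it.
    new_w = weights * np.exp(np.where(miss, alpha * cost, -alpha))
    return eps, float(alpha), new_w / new_w.sum()

# Five samples with uniform weights; the last one is misclassified.
w0 = np.full(5, 0.2)
eps, alpha, w1 = adaboost_round(w0, [1, 1, -1, -1, 1], [1, 1, -1, -1, -1])
print(eps, w1)
```

With uniform weights and one error in five samples, the misclassified sample ends up carrying half of the total weight after normalization, which shows how strongly the next cycle focuses on it.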

Figure 2. Video frame classification results

$R_{s1}$ is the recall rate of the video frame classifier in ordinary frame classification. $R_{s2}$ is the recall rate obtained when isolated frames are eliminated by the convolutional neural network during frame classification. The experimental results show that this method improves the recall rate of the frame classifier and effectively reduces the influence of frame misclassification on video classification.

Figure 3. Comparison of sports video test results

Figure 4. The effect of the hybrid motion video classification method based on BP neural network under T-value

Two control experiments are carried out in this paper. In one, the video frame training library contains only the class-labelled shot categories. In the other, the training library also contains the non-class-labelled shot category. The analysis shows that adding a non-class-labelled shot category can effectively improve the recall and accuracy of the method. Data Set A is used as the test sample to study the influence of interference shots on the performance of the convolutional neural network, with recall rate and accuracy as evaluation indexes. The results of the comparison between the two test groups are shown in Figure 5. It can be seen from Figure 5 that adding non-class-labelled shot categories to the video frame training library can significantly improve the classification performance of the proposed algorithm.

Figure 5. The effect of interference shots on video recognition

Table 1. Experimental data

Recall rate and accuracy are measured, where a is the number of positive cases, b is the number of positive cases in the test sample, and c is the number of positive examples obtained by classification.
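The two indexes can be computed from these three counts as in the Python sketch below. The formulas are not given in the text, so the sketch assumes the usual reading: a counts the positive cases that are classified correctly, the recall rate is a / b and the accuracy (precision) is a / c. The function name and the example counts are hypothetical.

```python
def recall_and_precision(a, b, c):
    """Recall and precision from three counts.

    a: positive cases classified correctly (assumed meaning of a).
    b: positive cases present in the test sample.
    c: cases labelled positive by the classifier.
    """
    if b <= 0 or c <= 0:
        raise ValueError("b and c must be positive")
    return a / b, a / c

# Hypothetical counts: 45 correct out of 50 true positives and 60 predictions.
recall, precision = recall_and_precision(45, 50, 60)
print(recall, precision)
```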

Table 2. Test results with isolated frames removed

Table 3. Results of single video classification