Application of Decision Tree Classification Algorithm in Quality Assessment of Distance Learning in Colleges

INTRODUCTION: Quality assessment is a critical technology for identifying the quality of distance education in colleges and universities. It helps improve the quality of distance teaching, advances the existing means and methods of distance education, and places distance teaching in colleges and universities on a more scientific footing. OBJECTIVES: Existing evaluation methods in higher education institutions suffer from insufficient objectivity and comprehensiveness of the evaluation system, a single evaluation process, and inadequate quantitative analysis; this work aims to address these problems. METHODS: A college distance teaching quality assessment method based on a decision tree and an intelligent optimization algorithm is proposed. Firstly, kernel principal component analysis is used to reduce the dimensionality of the index system of college distance teaching quality assessment; then, the decision tree parameters are optimized through the marine predator algorithm to construct a college distance teaching quality assessment model; finally, the robustness and efficiency of the proposed method are verified through simulation experiments. RESULTS: The results show that the proposed method improves the accuracy of the assessment model. CONCLUSION: The proposed method addresses the insufficient objectivity and scientific rigor, and the low precision, of existing distance teaching quality assessment methods in colleges and universities.


Introduction
With the development of information technology and Internet technology, online distance education continues to innovate and develop, and 100 countries worldwide have carried out online distance education. Europe, the United States, and other countries have researched intelligent distance education in combination with artificial intelligence technology [1]. In China, diversified educational models continue to emerge in the new era, with distance education as a focus of innovation. Accelerating the promotion of distance education and expanding the coverage of high-quality educational resources has become a direction of national strategic development [2]. Distance education is a form of education whose essential characteristic is the quasi-separation of teachers and students, and which systematically uses various media to carry out education and teaching [3]. Distance education in colleges and universities combines higher education with distance education; it offers a fast and convenient way to find information and greatly enriches the diversity of students' education [4]. Whether distance education in higher education is good or bad, and whether it can meet the learning needs of students in colleges and universities, has become a research focus for experts and scholars in the field of education. College distance education quality assessment technology, as a critical technology for identifying the quality of distance education in colleges and universities, helps improve the quality of distance teaching and the existing means and methods of distance education, so that distance teaching in colleges and universities becomes more scientific [5]. Therefore, research on the quality assessment of distance education in colleges and universities, to guide the deepening reform and innovative development of distance education, is an urgent task.
In recent years, many experts and scholars have studied distance education quality assessment and teaching effectiveness analysis in depth and have proposed several teaching assessment methods [6]. Currently, distance education quality assessment methods include fuzzy theory [7], hierarchical analysis [8], Markov theory [9], and machine learning methods [10], of which machine learning methods include neural networks [10], support vector machines [11], and decision trees [12]. Literature [7] adopts a fuzzy method to comprehensively process the judgments of distance teaching assessment and puts forward a distance teaching quality assessment method based on fuzzy theory; literature [8] combines the candidate's performance and puts forward a distance teaching quality assessment model based on the Markov method; literature [9] adopts the hierarchical analysis method to reduce the dimensionality of the indicators of distance teaching quality assessment and constructs a concise distance teaching quality assessment method. To overcome the limitations of linear methods, [10] proposed a neural network-based distance teaching quality assessment method; [11] proposed a support vector machine-based distance teaching quality assessment method to overcome "overfitting" under the condition of limited data samples; [12] analyzed the factors affecting the quality of distance teaching in colleges and universities and put forward a distance teaching quality assessment model based on a decision tree classification algorithm; literature [13] uses an optimized BP neural network to construct a teaching quality assessment model for colleges and universities; literature [14] combines a convolutional neural network and a long short-term memory network to build an online education evaluation method. Given the above literature analysis, the existing distance teaching quality assessment methods have the following defects: 1) the evaluation index systems are not objective and comprehensive enough; 2) the evaluation method is single, and the optimization performance is poor; 3) there are few quantitative analyses [15].
A decision tree is a basic classification and regression method [16], representing a mapping relationship between object attributes and values. The decision tree model has a tree structure; in a classification problem it classifies instances based on their features, so it can be applied to distance learning quality evaluation. Intelligent optimization algorithms are a class of optimization algorithms based on the phenomena of biological evolution and group intelligence in nature [17]; they solve complex optimization problems by simulating natural growth processes and group behavior, with good results. Using an intelligent optimization algorithm to improve the decision tree algorithm can improve the accuracy of distance teaching quality assessment. Aiming at the above problems of distance teaching quality evaluation methods, this paper proposes a distance teaching quality evaluation method based on a decision tree and an intelligent optimization algorithm. The main contributions of this paper are: (1) kernel principal component analysis is used to extract distance teaching quality assessment indexes; (2) the marine predator algorithm is used to optimize the decision tree algorithm, yielding a distance teaching quality assessment method for colleges and universities based on a decision tree optimized by the marine predator algorithm; (3) experimental analysis verifies that the proposed method has a higher assessment accuracy.

Related Theory

Kernel Principal Component Analysis
Kernel principal component analysis (KPCA) [18] is an improvement of the principal component analysis method that uses a kernel function to capture complex nonlinear structure in the data. The core idea of KPCA is to use the kernel function to map the raw data to a high-dimensional feature space and then perform PCA in that space. The specific steps are as follows:
Step 1: Normalize the indicator features. To eliminate the difference in magnitude between different influencing factors, the original data matrix $X = (x_{ki})_{n \times p}$ is standardized with the Z-Score method to obtain the standardized matrix $Z = (z_{ki})_{n \times p}$:
$$z_{ki} = \frac{x_{ki} - \bar{x}_i}{s_i}$$
where $n$ is the number of samples, $p$ is the dimension of the sample indicator features, and $\bar{x}_i$ and $s_i$ are the mean and standard deviation of the ith indicator.
Step 2: Calculate the correlation coefficient between each pair of indicators:
$$r_{ij} = \frac{\operatorname{cov}(z_i, z_j)}{\sqrt{\operatorname{cov}(z_i, z_i)\,\operatorname{cov}(z_j, z_j)}}, \qquad \operatorname{cov}(z_i, z_j) = \frac{1}{n-1}\sum_{k=1}^{n}(z_{ki} - \bar{z}_i)(z_{kj} - \bar{z}_j)$$
where $z_{ki}$ denotes the standardized value of the ith indicator of the kth sample, $\bar{z}_i$ is the average value of the ith indicator, and $\operatorname{cov}(z_i, z_j)$ is the covariance of the vectors $z_i$ and $z_j$.
Step 3: Select the Gaussian kernel function as the kernel function and calculate the kernel function values:
$$K_{kl} = \exp\left(-\frac{\lVert \mathbf{z}^{(k)} - \mathbf{z}^{(l)} \rVert^2}{2\sigma^2}\right)$$
where $\mathbf{z}^{(k)}$ is the kth standardized sample (the kth row of $Z$) and $\sigma$ is the Gaussian kernel function parameter, which controls the distribution of data points in the high-dimensional space.
Step 4: Diagonalize the symmetric positive definite kernel matrix $K$ to obtain its characteristic roots (eigenvalues) $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$.
Step 5: Determine the contribution rate. Calculate the contribution rate of each principal component:
$$\eta_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$$
Step 6: Determine the number of principal components. Sort the components by contribution rate and set the information retention threshold $\theta$ after decoupling; if the cumulative contribution rate of the first k components is greater than $\theta$, then the number of principal components is k.
Step 7: Output the principal components associated with the individual indicator features.
The advantage of KPCA is that it can deal with data that are not linearly separable while retaining the nonlinear characteristics of the original data. However, KPCA has a high computational complexity and requires the computation of the kernel matrix, so the computation time is longer for large-scale datasets.
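The standardization, kernel, and component-count steps above can be illustrated with a minimal sketch using only the Python standard library. The function names, the `2 * sigma ** 2` scaling of the Gaussian kernel, and the kernel-centering helper are assumptions made for illustration (centering is a common KPCA step that the text does not list); the eigenvalue decomposition of Step 4 is left to a numerical library and is not implemented here.

```python
import math

def zscore(data):
    """Step 1: standardize each column (indicator) to zero mean, unit sample std."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in data) / (n - 1))
            for j in range(d)]
    return [[(row[j] - means[j]) / stds[j] for j in range(d)] for row in data]

def gaussian_kernel_matrix(z, sigma):
    """Step 3: K[k][l] = exp(-||z_k - z_l||^2 / (2 * sigma^2)) (assumed scaling)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    n = len(z)
    return [[math.exp(-sqdist(z[k], z[l]) / (2 * sigma ** 2)) for l in range(n)]
            for k in range(n)]

def center_kernel(k):
    """Center the kernel matrix in feature space (assumed extra step)."""
    n = len(k)
    row_mean = [sum(row) / n for row in k]
    total_mean = sum(row_mean) / n
    return [[k[i][j] - row_mean[i] - row_mean[j] + total_mean for j in range(n)]
            for i in range(n)]

def num_components(eigenvalues, threshold):
    """Steps 5-6: smallest k whose cumulative contribution rate exceeds threshold."""
    lam = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(lam)
    cumulative = 0.0
    for k, value in enumerate(lam, start=1):
        cumulative += value
        if cumulative / total > threshold:
            return k
    return len(lam)
```

In practice the eigenvalues passed to `num_components` would come from a symmetric eigensolver applied to the centered kernel matrix.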

Decision Tree
A decision tree is a tree-structured predictive model representing a mapping relationship between object attributes and values. It consists of nodes and directed edges, with two types of nodes: internal nodes and leaf nodes, where an internal node represents a feature or attribute and a leaf node represents a class. Decision tree learning induces a set of classification rules from the training set to obtain a decision tree that contradicts the data set as little as possible and, at the same time, has good generalization ability. The loss function for decision tree learning is usually a regularized maximum likelihood function, and heuristics are traditionally used to approximate the solution of this optimization problem. The decision tree learning algorithm consists of feature selection, decision tree generation, and decision tree pruning. A decision tree represents a conditional probability distribution, so decision trees of different depths correspond to probabilistic models of varying complexity. Decision tree generation corresponds to local selection (local optimization) of the model, and decision tree pruning corresponds to global selection (global optimization). The commonly used decision tree algorithms are ID3, C4.5, and CART. The decision tree algorithm used in this paper is CART [19], as shown in Figure 1, with the following steps:
Step 1: Feature selection. The CART decision tree uses the Gini index as the criterion for feature selection, which is calculated as follows:
$$\operatorname{Gini}(p) = \sum_{k=1}^{K} p_k (1 - p_k) = 1 - \sum_{k=1}^{K} p_k^2$$
where $K$ is the number of categories and $p_k$ denotes the probability that a sample point belongs to the kth category. The Gini index reflects the probability that two samples randomly selected from the dataset belong to different categories. Therefore, the smaller the Gini index, the higher the purity of the dataset. During feature selection, the CART decision tree calculates the Gini index of each feature and selects the feature with the smallest Gini index as the division criterion.
Step 2: Decision tree generation. Starting from the root node, the Gini index of each candidate feature and split point is calculated, and the feature and split point with the smallest Gini index are selected as the optimal feature and cut-off point. This procedure is applied recursively until the stopping condition is satisfied and the CART decision tree is generated.
Step 3: Decision tree pruning. To prevent overfitting, CART uses the Cost-Complexity Pruning (CCP) algorithm, which views the cost complexity of the tree as a function of the number of leaf nodes in the tree and the error rate of the tree. Starting at the bottom of the tree, for each internal node N, calculate the cost complexity of the subtree at N and the cost complexity at N after that subtree is pruned (i.e., replaced by a leaf node). Compare these two values: if pruning the subtree at node N results in a smaller cost complexity, prune that subtree; otherwise, keep it.
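As a worked illustration of Steps 1 and 2, the following minimal sketch computes the Gini index of a label set and searches one numeric feature for the split point with the smallest weighted Gini index. The function names and the midpoint rule for candidate thresholds are illustrative assumptions, not details given in the paper.

```python
from collections import Counter

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) of the best binary split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values
    (an assumed convention). Returns (None, gini(labels)) if no split helps.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best
```

For example, `best_split([1, 2, 3, 4], ["lo", "lo", "hi", "hi"])` separates the two classes perfectly at the threshold 2.5, where the weighted Gini index is 0.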

Marine Predator Algorithm
The Marine Predators Algorithm (MPA) [20] is a novel metaheuristic optimization algorithm inspired by the theory of survival of the fittest in the ocean, in which a marine predator chooses its optimal foraging strategy by switching between Lévy motion and Brownian motion, as shown in Figure 2. The algorithm finds the optimal solution by simulating the foraging behavior of marine predators. Its main features include a strong ability to find the optimal solution, fast convergence speed, and easy implementation. The algorithm has been applied in several fields, such as wireless sensor network coverage optimization.

Figure 2 Decision schematic diagram of the marine predator algorithm
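The marine predator algorithm is a population-based intelligent optimization algorithm. To ensure the search quality, the initial solutions are distributed in the search space as evenly as possible:
$$X_0 = X_{\min} + \mathrm{rand} \otimes (X_{\max} - X_{\min})$$
where $X_{\min}$ and $X_{\max}$ denote the lower and upper boundaries of the search space, and $\mathrm{rand}$ is a uniformly distributed random vector in $[0, 1]$.
MPA is mainly divided into three phases: the exploration phase, the equilibrium phase, and the exploitation phase, which take place in the early, middle, and late stages of the optimization search, respectively.
Exploration phase: This phase occurs at the beginning of the iteration to search more of the space; the predator's motion mainly obeys Brownian motion, whose large step size favors the exploration ability of the algorithm. The mathematical model of the exploration phase is as follows. While $iter < \frac{1}{3} iter_{\max}$:
$$\mathrm{stepsize}_i = R_B \otimes (\mathrm{Elite}_i - R_B \otimes \mathrm{Prey}_i), \qquad \mathrm{Prey}_i = \mathrm{Prey}_i + P \cdot R \otimes \mathrm{stepsize}_i$$
where $iter$ denotes the current iteration number and $iter_{\max}$ denotes the maximum number of iterations; $R_B$ is a random vector obeying a normal distribution based on Brownian motion, $\mathrm{Prey}_i$ denotes the position of the ith individual, $\mathrm{Elite}_i$ denotes the position of the global optimal individual, $P$ is a constant taking the value 0.5, and $R$ is a uniformly distributed random vector.
Equilibrium phase: In this phase, the predator has to both explore and exploit the search space, so the population is divided into two parts. One part relies on the large step size of Brownian motion to conduct a wide-ranging search, and the other uses the smaller step size of the Lévy distribution to perform an in-depth search. The mathematical model of this phase is as follows. While $\frac{1}{3} iter_{\max} < iter < \frac{2}{3} iter_{\max}$, for the first part of the population, exploitation behavior is mainly carried out:
$$\mathrm{stepsize}_i = R_L \otimes (\mathrm{Elite}_i - R_L \otimes \mathrm{Prey}_i), \qquad \mathrm{Prey}_i = \mathrm{Prey}_i + P \cdot R \otimes \mathrm{stepsize}_i$$
where $R_L$ is a random vector obeying a Lévy distribution. For the second part of the population, exploration behavior is mainly performed:
$$\mathrm{stepsize}_i = R_B \otimes (R_B \otimes \mathrm{Elite}_i - \mathrm{Prey}_i), \qquad \mathrm{Prey}_i = \mathrm{Elite}_i + P \cdot CF \otimes \mathrm{stepsize}_i, \qquad CF = \left(1 - \frac{iter}{iter_{\max}}\right)^{2\,iter/iter_{\max}}$$
where $CF$ is an adaptive parameter that controls the predator step size.
Exploitation phase: In the final stage of the search, the predator locally exploits the search space; the mathematical model of this phase is as follows. While $iter > \frac{2}{3} iter_{\max}$:
$$\mathrm{stepsize}_i = R_L \otimes (R_L \otimes \mathrm{Elite}_i - \mathrm{Prey}_i), \qquad \mathrm{Prey}_i = \mathrm{Elite}_i + P \cdot CF \otimes \mathrm{stepsize}_i$$
In addition, predators are easily attracted to Fish Aggregating Devices (FADs) and thus lose their natural prey. Therefore, to escape FADs, a larger step size is used for movement. The mathematical model of this behavior is described as follows:
$$\mathrm{Prey}_i = \begin{cases} \mathrm{Prey}_i + CF\,[X_{\min} + R \otimes (X_{\max} - X_{\min})] \otimes U, & r \le FADs \\ \mathrm{Prey}_i + [FADs\,(1 - r) + r]\,(\mathrm{Prey}_{r1} - \mathrm{Prey}_{r2}), & r > FADs \end{cases}$$
where $FADs = 0.2$ denotes the probability of being affected by FADs, $r$ is a uniformly distributed random number in $[0, 1]$, and $U$ is a binary vector of 0s and 1s. To construct $U$, a random vector with elements between 0 and 1 is generated; elements less than 0.2 are set to 0, and the others are set to 1. $\mathrm{Prey}_{r1}$ and $\mathrm{Prey}_{r2}$ are two randomly selected individuals.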
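The three phases and the FADs perturbation can be sketched compactly in pure Python. This is a simplified illustration under several assumptions that are not stated in the text: the objective is minimized, Lévy steps are drawn with Mantegna's method (exponent 1.5), candidates are clipped to the bounds, a candidate replaces an individual only if it is better, and the FADs draw is made once per individual. All function and parameter names are hypothetical.

```python
import math
import random

def levy_step(rng, beta=1.5):
    """One Lévy-distributed step via Mantegna's method (assumed generator)."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    return rng.gauss(0, sigma) / abs(rng.gauss(0, 1)) ** (1 / beta)

def mpa_minimize(objective, lower, upper, dim, pop_size=20, max_iter=100,
                 p=0.5, fads=0.2, seed=0):
    """Simplified marine predator search; returns (best_position, best_value)."""
    rng = random.Random(seed)
    # Uniform initialization: X0 = Xmin + rand * (Xmax - Xmin)
    prey = [[lower + rng.random() * (upper - lower) for _ in range(dim)]
            for _ in range(pop_size)]
    fit = [objective(x) for x in prey]
    best = min(range(pop_size), key=lambda i: fit[i])
    elite, elite_fit = prey[best][:], fit[best]
    for it in range(max_iter):
        cf = (1 - it / max_iter) ** (2 * it / max_iter)  # adaptive step control
        for i in range(pop_size):
            cand = []
            for j in range(dim):
                x, e, r = prey[i][j], elite[j], rng.random()
                rb, rl = rng.gauss(0, 1), levy_step(rng)
                if it < max_iter / 3:                      # exploration (Brownian)
                    v = x + p * r * rb * (e - rb * x)
                elif it < 2 * max_iter / 3 and i < pop_size // 2:
                    v = x + p * r * rl * (e - rl * x)      # equilibrium, Lévy half
                elif it < 2 * max_iter / 3:
                    v = e + p * cf * rb * (rb * e - x)     # equilibrium, Brownian half
                else:
                    v = e + p * cf * rl * (rl * e - x)     # exploitation (Lévy)
                cand.append(v)
            r = rng.random()
            if r <= fads:  # FADs effect: jump scaled by CF, masked by binary U
                cand = [v + cf * (lower + rng.random() * (upper - lower))
                        * (0 if rng.random() < fads else 1) for v in cand]
            else:          # otherwise move along the difference of two random prey
                a, b = rng.sample(range(pop_size), 2)
                cand = [v + (fads * (1 - r) + r) * (prey[a][j] - prey[b][j])
                        for j, v in enumerate(cand)]
            cand = [min(max(v, lower), upper) for v in cand]
            f = objective(cand)
            if f < fit[i]:          # assumed greedy replacement
                prey[i], fit[i] = cand, f
            if f < elite_fit:
                elite, elite_fit = cand[:], f
    return elite, elite_fit
```

To maximize a quantity such as assessment accuracy, the objective passed in could be its negative.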

Indicators of distance learning quality assessment
By analyzing domestic and foreign distance teaching quality evaluation index methods, combined with practical experience from distance education work, a distance teaching quality assessment index based on four dimensions is proposed. The four dimensions proposed in this paper are teaching quality, teaching attitude, teaching content, and teaching methods [21].
(1) Teaching quality. Distance teaching quality generally refers to the professional quality of teachers' teaching, including teaching mode, professional level, and lecture rationality.

Construction of evaluation index system
The distance teaching quality assessment system takes teaching quality, teaching attitude, teaching content, teaching methods, and other vital elements [22] as the first-level indicators, and 13 influencing factors, namely teaching mode, professionalism, lecture organization, information feedback, Q&A, enthusiasm for lectures, teaching objectives, explanations, training content, vital and challenging points, scientific setting of doubts, teaching methods, and teaching according to the material, as the second-level indicators [23], which fully reflect the whole process of distance teaching. A scientific, objective, and comprehensive quality assessment system for distance teaching in colleges and universities is thus constructed, as shown in Figure 3.
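The two-level index system can be written down as a small data structure. The sketch below is illustrative only: the assignment of each second-level indicator to a first-level dimension is an assumption inferred from the dimension descriptions in this paper, and the names `INDEX_SYSTEM` and `second_level_indicators` are hypothetical.

```python
# First-level dimensions mapped to second-level indicators.
# The grouping is an assumed reading of the dimension descriptions.
INDEX_SYSTEM = {
    "teaching quality": [
        "teaching mode", "professionalism", "lecture organization",
    ],
    "teaching attitude": [
        "information feedback", "Q&A", "enthusiasm for lectures",
    ],
    "teaching content": [
        "teaching objectives", "explanations", "training content",
        "vital and challenging points",
    ],
    "teaching methods": [
        "scientific setting of doubts", "teaching methods",
        "teaching according to the material",
    ],
}

def second_level_indicators(system):
    """Flatten the index system into an ordered list of second-level indicators."""
    return [name for names in system.values() for name in names]
```

Flattening the structure yields the 13 second-level indicators that form the raw feature vector before dimensionality reduction.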

MPA-optimized decision tree-based quality assessment method for distance teaching in colleges and universities
This paper adopts the marine predator algorithm to optimize the decision tree algorithm in order to improve the accuracy of the decision tree-based university distance teaching quality assessment method. The data assessment weights are selected as the decision variables to be optimized, and the assessment accuracy of the decision tree is taken as the objective function to construct the optimized decision tree. The objective function of the method is
$$\max f = \mathrm{accuracy}$$
where accuracy denotes the assessment accuracy of the decision tree.
The CART decision tree assessment model based on the kernel principal component analysis method and the MPA algorithm (KPCA-MPA-CART) is mainly divided into the data preprocessing module, the optimization parameter module, and the CART decision tree algorithm module, as shown in Figure 4. First, KPCA is used to screen the distance teaching quality assessment indexes in colleges and universities. The MPA algorithm then optimizes the CART decision tree hyperparameters, which are decoded to construct the CART decision tree; next, the training data from the data module are used to train the optimal CART decision tree, and the test set is used for evaluation to obtain the error between the actual value and the model output value. The steps of the CART decision tree evaluation method based on kernel principal component analysis and the marine predator algorithm are as follows:
Step 1: Preprocess the raw data with the Z-Score method. Use KPCA to screen the indicators for assessing the quality of distance teaching in colleges and universities, and divide the selected data into a training set and a test set;
Step 2: Encode the initial parameters of the decision tree with the marine predator algorithm and initialize the algorithm parameters, such as the population parameters and the number of iterations; initialize the population and calculate the value of the fitness function;
Step 3: Generate new population positions using the position update strategy of the marine predator algorithm;
Step 4: Calculate the fitness function value and update the optimal solution;
Step 5: Judge whether the termination condition is satisfied. If it is, exit the iteration, output the optimal decision tree weight parameters, and execute Step 6; otherwise, return to Step 3;
Step 6: Decode the decision tree weight parameters optimized by the marine predator algorithm to obtain the decision tree weight parameters;
Step 7: Construct the MPA-CART evaluation model, train the model using the training set to obtain the evaluation model, and input the test set into the model to get the evaluation results.
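The encoding, decoding, and fitness evaluation in Steps 2, 4, and 6 can be sketched as follows. The paper does not specify which decision tree parameters are encoded, so the two hyperparameters below (`max_depth`, `min_samples_leaf`), their ranges, and the `train_and_predict` callback are purely hypothetical placeholders; only the use of assessment accuracy as the fitness value follows the text.

```python
def decode(position, depth_range=(1, 20), leaf_range=(1, 50)):
    """Map a position in [0, 1]^2 to integer hyperparameters (hypothetical encoding)."""
    def scale(u, low, high):
        u = min(max(u, 0.0), 1.0)  # keep the coded value inside [0, 1]
        return low + round(u * (high - low))
    return {
        "max_depth": scale(position[0], *depth_range),
        "min_samples_leaf": scale(position[1], *leaf_range),
    }

def accuracy(predicted, actual):
    """Fraction of samples whose predicted grade equals the actual grade."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def fitness(position, train_and_predict, train, valid):
    """Fitness of one candidate: accuracy of the tree built from its decoded parameters.

    train_and_predict(params, train, x_valid) is a user-supplied callback
    (hypothetical) that trains a CART tree and returns predictions for x_valid.
    """
    x_valid, y_valid = valid
    predicted = train_and_predict(decode(position), train, x_valid)
    return accuracy(predicted, y_valid)
```

An optimizer that minimizes its objective would be given the negative of this fitness, since the method seeks the parameters with the highest accuracy.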

Experiment and result analysis
To verify the performance of the university distance teaching quality assessment method proposed in this paper, this section analyzes and discusses the assessment results of the proposed algorithm on the distance teaching evaluation data of the case university.

Simulation Environment Setting
In this paper, MATLAB 2021a is used to write the program. The test environment is a Windows 10 system with an AMD Ryzen 9 5900HX processor with Radeon Graphics and 16.0 GB of RAM. The experimental dataset is the 2022 distance teaching evaluation data of the case university and contains 520 groups of data, of which 400 groups form the training set of the assessment model and 120 groups form the test set. The specific parameter settings of the university distance teaching quality assessment method proposed in this paper and of the comparison assessment methods are shown in Table 1.

Simulation Analysis
Following the kernel principal component analysis method for college distance teaching quality assessment indexes proposed in this paper, the index data are analyzed and the indexes that contribute most to college distance teaching quality assessment are selected through dimensionality reduction. The results of the kernel principal component analysis are shown in Figure 5. As can be seen from Figure 5, the cumulative contribution rate of the first 13 leading component indicators to the teaching evaluation indicators of higher education institutions reaches 95%. The results show that the first ten indicators can represent the leading indicators, covering teaching quality, teaching attitude, teaching content, teaching means, and other aspects of distance teaching in higher education across the whole teaching process, which further indicates that the teaching evaluation index system proposed in this paper is objective and comprehensive. Therefore, the feature dimension of the data in this paper is 10.

Figure 4 Analysis results of the KPCA method
To verify the effectiveness of the quality assessment model of college distance teaching based on KPCA-MPA-CART, this subsection uses the test set to compare its performance with that of the MPA-CART, BP, GA-BP, SVM, and CART methods. Figure 5 gives the evaluation results of the different algorithms on the test sample set for assessing the quality of distance teaching in universities.
As can be seen from Figure 5, the evaluation accuracy of the MPA-optimized decision tree algorithm is better than that of the other algorithms. The statistical results show that the evaluation results of the MPA-optimized decision tree model on the test set data are closer to the actual values than those of the other evaluation models, indicating that the model has a high evaluation accuracy.
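How close a model's output lies to the actual value can be quantified per test sample. The sketch below uses a common definition of relative error, the absolute deviation divided by the absolute actual value; the paper does not state its own formula, so this definition and the function names are assumptions.

```python
def relative_errors(predicted, actual):
    """Per-sample relative error |predicted - actual| / |actual| (assumed definition)."""
    return [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]

def mean_relative_error(predicted, actual):
    """Average relative error over the test samples."""
    errors = relative_errors(predicted, actual)
    return sum(errors) / len(errors)
```

A model whose outputs track the actual scores more closely yields smaller values on both measures; actual values of zero would need separate handling.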

Conclusion
Aiming at the problems of current university distance teaching quality assessment methods, such as insufficient objectivity and comprehensiveness of the evaluation system, a single process, and inadequate quantitative analysis, this paper proposes a university distance teaching quality assessment method based on a decision tree optimized by the MPA algorithm. The method uses kernel principal component analysis to reduce the dimensionality of the index system of college distance teaching quality assessment, adopts the MPA algorithm to optimize the decision tree, and constructs the college distance teaching quality assessment model. Simulation shows that the accuracy of the decision tree evaluation model based on MPA optimization is better than that of the other models. The performance of the proposed evaluation model on some samples still leaves room for improvement, and further improving the robustness of KPCA-MPA-CART is the focus of future research.

Figure 1 Decision tree flowchart

(2) Teaching attitude. Distance teaching attitude refers to the attitudes and behaviors shown by online teachers in distance teaching, including guiding, answering questions, respecting, understanding, supporting, and encouraging students. Indicators of distance teaching attitude generally include information feedback, answering questions, enthusiasm for teaching, and other indicators.
(3) Teaching content. Distance teaching content generally includes teaching objectives, methods, explanation content, teaching points of emphasis, etc. Factors affecting the assessment of distance teaching quality include indicators of teaching objectives, explanation, teaching content, points of focus, etc.
(4) Teaching methods. Distance teaching methods are the means, such as the Internet, questioning, and lecturing, by which educational and learning activities are realized. Indicators of distance teaching methods generally include scientific questioning, education means, and teaching according to materials.

Figure 3 Distance learning quality assessment system

Figure 4 KPCA-MPA-CART model diagram

Figure 5 Evaluation results of test samples of different modeling methods
Figure 6 gives the evaluation results on the test sample set for the university distance teaching quality assessment based on different algorithms. From Figure 6, it can be seen that the evaluation accuracy of KPCA-MPA-CART is better than that of the other algorithms. Comparing the results of KPCA-MPA-CART with those of MPA-CART and CART shows that the kernel principal component analysis method helps improve the evaluation results of the quality assessment of distance teaching in colleges.

Figure 6 Relative error of test samples for different modeling methods