Deep Biased Matrix Factorization for Student Performance Prediction

In universities that use the academic credit system, selecting elective courses is a crucial task that can have a significant impact on a student's academic performance. Students who perform poorly in their courses may receive formal warnings or even face expulsion from the university. Thus, a well-designed study plan from a course recommendation system can play an essential role in achieving good academic performance. Additionally, early warnings regarding challenging courses can help students better prepare and improve their chances of success. Therefore, predicting student performance is a vital component of both the course recommendation system and the academic advisor's role. To this end, numerous studies have addressed the prediction of student performance using various approaches such as association rules, machine learning, and recommender systems. More recently, personalized machine learning approaches, particularly the matrix factorization technique, have been used in course recommendation systems. However, the accuracy of these approaches in predicting student performance still needs improvement. To address this issue, this study proposes an approach called Deep Biased Matrix Factorization, which carries out deep factorization through multiple layers to enhance prediction accuracy. Experimental results on an educational dataset have demonstrated that the proposed approach can significantly improve the accuracy of student performance prediction. By using this approach, universities can better recommend elective courses to their students as well as predict student performance, which can help them make informed decisions and achieve better academic outcomes.


Introduction
In a university setting, student performance prediction refers to the process of using data about a student's past academic performance to predict their future performance in a particular course or program. The goal is to identify students who may be at risk of not performing well and provide them with additional support or resources to help them succeed. In addition, universities can identify courses and programs that are likely to be a good fit for the student and align with their academic goals and interests [1]. In practice, however, many students struggle to choose appropriate courses, owing to factors such as complex curriculums and mismatched interests. This issue is particularly problematic in the context of elective courses, where students have a wide range of options to choose from and often lack guidance and knowledge about the potential benefits and challenges of each course.
To address this problem, course recommendation systems have been proposed as a solution to provide students with personalized guidance on selecting courses that fit their interests, goals, and abilities. These systems use various methods, including collaborative filtering, machine learning, and data mining, to analyze student performance data, course attributes, and other relevant information and generate recommendations that optimize the match between the student and the course [3]. The potential benefits of such systems are numerous, including enhancing academic performance, reducing dropout rates, improving student satisfaction, and reducing the burden on academic advisors. However, the accuracy and effectiveness of these systems depend on several factors, such as the quality of the data, the complexity of the algorithm, and the validity of the evaluation metrics. Therefore, there is a need for ongoing research and development to improve the performance and usability of course recommendation systems in universities.
Several studies employ the learner's behavior or grades to predict student performance [4] [5]. While the former is an implicit method that predicts student performance by observing learning activities through the application system, the latter is a more explicit and straightforward method that relies on the student grading systems available in all schools. As a result, grades are widely used as a predictor of student performance in many research studies.
Effective academic advising is important for improving student retention, and new techniques for predicting student performance and the risk of failing or dropping a class can help address this issue. Personalized prediction approaches based on recommender systems are effective for accurately forecasting student grades in future courses and in-class assessments. For example, a study [6] with 772 students found that personalized prediction was much more effective than general rule prediction for the whole group of students, using video learning analytics and data mining techniques to predict students' overall performance at the end of the semester. This demonstrates that recommender systems for personalized advising are better than traditional data mining for suggesting general predictions. While many researchers have addressed student performance prediction using recommender systems, there is still room for improvement in these approaches [7].
In the field of artificial intelligence, specifically in machine learning, having a sufficiently large dataset is crucial for effective data mining. Fortunately, an educational data mining competition was held, providing a free dataset aligned with the goals of promoting educational data mining. Although the competition has ended, researchers interested in this field can still obtain the dataset with permission [8]. As a result, many studies in educational data mining have utilized this dataset, including the present one.
Deep Learning (DL) has become increasingly popular and has shown significant improvement in various domains, such as cybersecurity, natural language processing, bioinformatics, robotics, and medical information processing. As such, there is a growing need to have a more comprehensive understanding of DL and its techniques. In a recent study [9], the authors propose a holistic approach to provide a more appropriate starting point for developing a complete understanding of DL. They discuss the importance of DL and present the architecture of DL techniques and networks, which can be applied to other methods. This approach can be useful in developing advanced models for predicting student performance and recommending elective courses. With the increasing availability of educational data and the application of DL techniques, it is possible to develop accurate and personalized course recommendation systems that can greatly benefit students and universities.
The architecture of deep learning techniques has been successfully applied in various fields, including entertainment, and has the potential to be adapted to other methods. For instance, a previous study utilized deep learning architecture in matrix factorization to improve prediction accuracy in the entertainment industry [10]. As every method is suitable for a particular issue and data sample, the approach demonstrated positive results in the entertainment field and may be similarly effective in the education sector [11]. To this end, this study proposes a deep learning-based approach, called Deep Biased Matrix Factorization, to enhance the accuracy of student performance prediction. This method involves using multilayered deep factorization to improve prediction accuracy, potentially enhancing course recommendation systems in educational settings.

Related work
In the field of educational data mining, predicting student performance is a critical task. A wide range of data mining techniques has been employed for this purpose, each with its own strengths and limitations. A survey was conducted to provide a comprehensive overview of the intelligent models and paradigms used in education [12]. The survey identifies various challenges in predicting student performance, such as the high dimensionality of educational data, class imbalance, and the lack of labelled data. It also discusses the pros and cons of several data mining algorithms, including traditional methods like decision trees, Bayesian models, and instance-based models, as well as more recent techniques like neural networks and support vector machines. Finally, the survey provides recommendations for future research in educational data mining, including the need for more transparent and interpretable models, the use of multimodal data, and the development of hybrid models that combine different techniques.
Many studies have also explored the use of decision tree algorithms for predicting student performance. One such study, conducted by the author of [13], compared three different decision tree algorithms and found that J48 was the most effective for classifying and predicting student actions. The study also noted that the structure of the decision tree graphs was influenced by the number of input attributes and the end class attributes. Another study examined using k-NN and decision tree classification methods to predict employee performance using internal data. Additionally, researchers have conducted comparative studies, such as [14], which compared the effectiveness of decision tree and Bayesian network algorithms for predicting student academic performance. These studies highlight the importance of selecting appropriate algorithms for specific prediction tasks and suggest avenues for further research in educational data mining.
In recent years, predicting student performance in higher education has gained significant attention, and various techniques have been proposed for this task [15]. One such technique is the association rule mining method in data mining, which has been employed to identify hidden relationships and patterns among student data to predict their academic performance. This method is effective in improving the accuracy of student performance prediction and can be used to provide personalized academic advising and support for students. For example, in a study [16], an association rule mining-based model was developed to predict student performance using data collected from a learning management system. The results demonstrated that the proposed model outperformed other commonly used machine learning algorithms in predicting student performance. This highlights the potential of using association rule mining for student performance prediction and the importance of exploring this technique further.
Additionally, another study has proposed an academic advising system using case-based reasoning (CBR) that recommends the most suitable major for each student by comparing the historical case with the student case, thus providing personalized advice [17].
The authors in [18] address the development of a recommender system that suggests a set of learning objects to multiple students. In order to handle recommendations for groups, they represent the recommendation process as a noncooperative game, intending to achieve a Nash equilibrium. They showcase the efficacy of their model through a case study experiment. In a related effort, a hybrid multi-criteria recommendation system incorporating genetic optimization techniques was developed to help university students select elective courses [19].
Rivera A.C. et al. [20] conducted a systematic mapping study of education recommender systems (RS) and identified several statistical methods to address the problem of predicting student performance using RS. In their paper [21], the authors propose several methods for building course recommendation systems, including student k-nearest neighbours, course-kNN, standard MF, and biased MF. These methods are analyzed and validated using an actual data set before selecting the appropriate methods. The authors presented a comprehensive framework for building a course recommendation system. However, it is important to note that this study mainly focuses on the application systems and relies on baseline methods. Further research is needed to explore more advanced techniques and evaluate their performance.
There has been increasing interest in integrating social networks into RS, as studies such as [22] have shown that prediction accuracy can be improved by utilizing information from users' social networks. To this end, various methods have been proposed to integrate social networks into the MF. Several experiments have confirmed that social networks can provide independent sources of information that can be effectively utilized to enhance the quality of recommendations. Moreover, the study [23] demonstrated that incorporating the relationship between classroom members into the training model can significantly improve prediction accuracy. However, it should be noted that this algorithm only applies to a dataset with user relationships.
In addition, another paper [24] proposed a novel approach for incorporating the relationships between courses (e.g., knowledge/skills) into the MF, which can help to solve the PSP problem. This approach involves gathering information about course relationships and using this information to enrich the recommendation system. Such methods have shown great potential in improving the performance of RS, particularly in situations where traditional methods may not be effective.
Recently, there has been a growing interest in applying knowledge transfer between domains. For example, Tsiakmaki M. et al. [25] investigated the use of transfer learning in improving the performance of a learning model. Transfer learning is a machine learning approach that leverages knowledge obtained from one problem to improve the performance of a learning model on another related problem. In another study [26], deep learning models, such as Long Short-Term Memory and Convolutional Neural Networks, were proposed to predict student performance in educational data mining. The authors optimized these models by using various data preprocessing techniques, such as Quantile Transforms and MinMax Scaler, and applied robust machine learning approaches, such as Linear Regression, for prediction tasks. However, these studies did not incorporate personalized prediction using data mining techniques.
Collaborative filtering is a widely used technique for making recommendations in recommendation systems. To improve the quality of predictions and recommendations made to the user, the Matrix Factorization (MF) technique has been extended with a Deep Learning paradigm to create Deep Matrix Factorization (DeepMF) [27]. DeepMF uses a layered architecture that refines an MF model successively, with each layer using the knowledge acquired from the previous layer as input. The main objective of DeepMF is to learn high-level representations of user-item interactions and to capture the non-linear relationships between users and items. DeepMF effectively improves the accuracy of recommendation systems, making it a promising approach for enhancing collaborative filtering. Another study has validated the effectiveness of DeepMF in a student performance prediction system [11].
The main aim of this study is to enhance the precision of predicting student performance through the extension of the Deep Matrix Factorization (DeepMF) model using Biased Matrix Factorization instead of the standard Matrix Factorization in the education domain. To achieve this aim, the paper offers an overview of the problem formulations used in predicting students' grades and explains the fundamental methods employed, including matrix factorization and biased-matrix factorization. The proposed approach integrates a deep factorizing architecture with the Biased Matrix Factorization to achieve a higher accuracy rate. The study then presents the findings and comparative analysis to evaluate the effectiveness of the proposed method in enhancing the precision of student performance prediction.

Problem Definition
The problem of predicting student performance has been mapped to a recommendation prediction task in several studies, including [21], [23], [24]. In a recommender system, there are three primary elements: user, item, and rating, where the task is to predict the user's rating for all unrated items and recommend the top-N highest predicted scores. Similarly, in the PSP problem, there are three corresponding objects: student, course, and performance (correct or incorrect), and the task is to predict the outcomes of the courses or tasks that the student has not yet taken or solved. Figure 1 depicts this mapping between the PSP and RS, with student, course, and grading becoming user, item, and rating, respectively. The availability of student scoring management systems in universities presents an opportunity to predict student performance using computer science methods. Despite the potential benefits, these systems have not been effectively utilized for this purpose. A common challenge is the diversity of dataset structures, though they typically contain three key fields: student ID, course ID, and performance. By leveraging computer science methods, such as machine learning and data mining, it is possible to uncover valuable insights that can aid in predicting student performance. However, it is important to carefully select the appropriate techniques and ensure the data is properly processed to achieve accurate results.
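The mapping from PSP records to recommender-system triples can be sketched as follows (the field names and sample records are illustrative, not taken from any specific dataset):

```python
# Map (student, course, performance) records onto (user, item, rating) triples,
# mirroring the PSP-to-RS correspondence described above.
def psp_to_rs(records):
    """Each PSP record becomes one recommender-system triple."""
    return [{"user": r["student_id"], "item": r["course_id"], "rating": r["grade"]}
            for r in records]

records = [
    {"student_id": "s1", "course_id": "c101", "grade": 1.0},  # solved correctly
    {"student_id": "s1", "course_id": "c102", "grade": 0.0},  # solved incorrectly
    {"student_id": "s2", "course_id": "c101", "grade": 1.0},
]
triples = psp_to_rs(records)
```

The prediction task then becomes: estimate the missing "rating" for every (user, item) pair not present in the triples.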
The process of factorizing students and problems based on their performance is illustrated in Figure 2. Throughout the rest of the paper, we will use the terms "course," "problem," and "task" interchangeably.
In the past few years, the research community has dedicated significant efforts to enhance the accuracy of predictive models by incorporating information from independent sources. This information integration has been demonstrated to improve results in several studies. Furthermore, some researchers have successfully applied models from other techniques to supplement their existing models and achieved positive outcomes. These endeavours have advanced the field's understanding of how various data sources and models can be combined to enhance prediction performance, showcasing the potential for further research in this direction.

Biased Matrix Factorization Method
There are two main types of recommender systems: collaborative filtering and content-based filtering [28]. Content-based filtering systems use information about both the user and the item to find matches between users and items. On the other hand, collaborative filtering systems rely on the past behavior of users to recommend items in the future. Collaborative filtering can be further divided into two types: user-based and item-based collaborative filtering. In user-based collaborative filtering, similar users are identified, and recommendations are made based on the items they rated highly. In item-based collaborative filtering, items are compared, and recommendations are made based on similar items that a user has shown interest in. Collaborative filtering approaches can also be classified into two main categories: memory-based and model-based collaborative filtering. Memory-based approaches, also known as neighborhood-based methods, rely on the user's or item's neighborhood for prediction; examples include the User-kNN and Item-kNN methods. Model-based approaches, on the other hand, typically use statistical models with some parameterization, such as latent factor models. Among the model-based approaches, matrix factorization (MF) has gained popularity, particularly after being used in the Netflix Prize competition.
The Matrix Factorization technique has gained popularity due to its scalability and high predictive performance. This approach involves creating a low-rank approximation of the rating matrix, which can be enhanced by integrating additional terms to improve prediction accuracy. By combining MF with bias terms, a more advanced model called Biased Matrix Factorization (Biased-MF) can be created [21]. The main idea of matrix factorization is to approximate the matrix X (of size |S| × |C|) by an inner product of two smaller matrices W and H. Biased-MF additionally incorporates the two bias vectors (b_s, b_c) and the global mean μ to produce the predicted ratings. A model that factorizes the matrix with the bias terms and generates a predicted grading for a student learning a course is shown in Fig. 3. The gradings in this model comprise the global mean, a user-specific bias, an item-specific bias, and the true interaction between users and items.
In the context of educational data mining, a popular method for predicting student performance involves the use of a matrix factorization technique. This approach uses a matrix X = (x_sc) ∈ R^{|S|×|C|} that collects the grades earned by students s for courses c. Typically, these grades are represented as float values between 0 and 1, or as an empty value ∅ if the student has not taken the course. The goal is to factorize X into two smaller matrices, W and H, of dimensions |S| × k and k × |C|, respectively, where k is a rank lower than |S| and |C|. These matrices can be seen as projections or co-projections of the students and courses into a k-dimensional latent space. Their elements, the vectors w_s and h_c respectively, can be used to predict the grade/mark x_sc for a student s learning a course c. To achieve this prediction, the dot product of the corresponding row of W and column of H is computed, which represents the predicted grade/mark.
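As a concrete illustration, the factorization and prediction step can be sketched with numpy; the matrix sizes below are illustrative toy values, not taken from any real dataset:

```python
import numpy as np

# Minimal sketch of the plain MF prediction step: a grade is the dot product
# of a student's latent-factor row and a course's latent-factor column.
rng = np.random.default_rng(0)
n_students, n_courses, k = 4, 5, 2               # k latent factors, k << |S|, |C|

W = rng.normal(scale=0.1, size=(n_students, k))  # one row w_s per student
H = rng.normal(scale=0.1, size=(k, n_courses))   # one column h_c per course

# Predicted grade of student s on course c: dot product of row w_s and column h_c
s, c = 1, 3
predicted = W[s] @ H[:, c]

# The full predicted grading matrix is the product W . H
X_hat = W @ H
```

In practice, W and H are learned from the observed grades rather than drawn at random; the training procedure is described next.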
In order to capture the interplay between users and items, the Biased-MF approach can be used within the context of Recommender Systems. This approach can address the "user effect" (also known as "user bias") and "item effect" (or "item bias") that can be present. In the realm of education, the user and item biases refer to the student and course biases, respectively. The student bias represents how proficient or unskilled a student is in completing a course, while the course bias captures how challenging or simple the course is to complete successfully. By incorporating these biases into the model, prediction accuracy can be improved. The prediction function for a given student s on a given course c can be formulated as follows:

x̂_sc = μ + b_s + b_c + w_s · h_c (2)

where x̂_sc represents the predicted performance of student s on course c, μ is the overall average performance in (3), b_s is the bias term for student s in (4), b_c is the bias term for course c in (5), w_s is the weight vector for student s, and h_c is the weight vector for course c (transposed in the product).
The Root Mean Square Error (RMSE) is a widely used criterion for optimizing the W and H parameter values in matrix factorization models. RMSE measures the difference between the predicted and actual values of the ratings or grades in a recommender system. By minimizing the RMSE, we can obtain the optimal parameter values that lead to the best predictive performance of the model. The RMSE value is computed by taking the square root of the average of the squared differences between the predicted and actual ratings:

RMSE = √( (1/|T|) Σ_{(s,c)∈T} (x_sc − x̂_sc)² )

where T is the set of observed (student, course) pairs. The Matrix Factorization (MF) technique [18] involves optimizing the W and H parameters during the model's training. The W and H matrices are initialized with random values, typically drawn from a normal distribution, and are trained to minimize the regularized error function

E = Σ_{(s,c)} (x_sc − x̂_sc)² + λ (‖W‖²_F + ‖H‖²_F)

where ‖·‖_F is a Frobenius norm and λ is a regularization weight. The error function can be derived with respect to w_s and h_c, resulting in the following updated rules for learning the model parameters (where β is the learning rate, e_sc = x_sc − x̂_sc, w'_s is the updated value of w_s, and h'_c is the updated value of h_c):

w'_s = w_s + β (2 e_sc h_c − λ w_s)
h'_c = h_c + β (2 e_sc w_s − λ h_c)

The updates of w'_s and h'_c are carried out respectively.
The bias terms are typically initialized to 0 and then updated during the model training process, with the magnitude of the update being proportional to the learning parameters used for updating the main matrices W and H. The updated bias terms are computed as:

b'_s = b_s + β (e_sc − λ b_s)
b'_c = b_c + β (e_sc − λ b_c)

where β is the learning rate. We update the values of W, H, and the bias terms iteratively until the error converges to its minimum (E_{t−1} − E_t < ε) or a predefined number of iterations is reached. Finally, the performance of student s on course c is determined by equation (12):

x̂_sc = μ + b_s + b_c + w_s · h_c (12)

Algorithm Biased-MF-StudentPerformancePrediction
1. Let s ∈ S be a student, c ∈ C a course, x_sc ∈ X a grade, and μ be the average grade from X.
2. Let w_s, h_c be latent factors of students, courses
3.
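The Biased-MF training loop described above — random initialization of W and H, bias terms starting at 0, and iterative SGD updates — can be sketched in Python (a minimal sketch; function names and hyper-parameter defaults are illustrative, not the values used in the study):

```python
import numpy as np

def train_biased_mf(X, k=2, beta=0.01, lam=0.02, epochs=500, seed=0):
    """Biased-MF trained by SGD. X: grading matrix with np.nan where the
    (student, course) pair is unobserved."""
    rng = np.random.default_rng(seed)
    n_s, n_c = X.shape
    W = rng.normal(scale=0.1, size=(n_s, k))   # student latent factors w_s
    H = rng.normal(scale=0.1, size=(k, n_c))   # course latent factors h_c
    b_s, b_c = np.zeros(n_s), np.zeros(n_c)    # bias terms initialized to 0
    mu = np.nanmean(X)                         # global mean grade
    obs = np.argwhere(~np.isnan(X))            # observed (s, c) pairs
    for _ in range(epochs):
        for s, c in obs:
            e = X[s, c] - (mu + b_s[s] + b_c[c] + W[s] @ H[:, c])
            w_old = W[s].copy()                # use pre-update w_s for h_c's step
            W[s]    += beta * (2 * e * H[:, c] - lam * W[s])
            H[:, c] += beta * (2 * e * w_old   - lam * H[:, c])
            b_s[s]  += beta * (e - lam * b_s[s])
            b_c[c]  += beta * (e - lam * b_c[c])
    return mu, b_s, b_c, W, H

def predict(mu, b_s, b_c, W, H, s, c):
    """Equation (12): global mean + student bias + course bias + interaction."""
    return mu + b_s[s] + b_c[c] + W[s] @ H[:, c]
```

A fixed number of epochs stands in for the convergence test E_{t−1} − E_t < ε; a production implementation would check the error after each epoch.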

Deep Biased Matrix Factorization method
In the current landscape of predicting student performance, a common strategy is to enhance the accuracy of the predictions by utilizing various techniques of Matrix Factorization, including Biased-MF [21], Social-MF [23], and CRMF [24]. These techniques aim to pose an optimization problem by defining an error function that integrates multiple sources of information to evaluate the model's performance. The error function measures the divergence of the model from the actual data, and the objective is to minimize this error. Ultimately, the model with lower errors is considered more effective in predicting student performance.
One approach to improving the accuracy of matrix factorization-based recommendation systems is to use a deep factorizing architecture that combines multiple stages of biased matrix factorization methods. This approach involves recursively applying biased matrix factorization to the input matrix, with each iteration refining the approximation until the desired level of accuracy is achieved. The complete model can be viewed as a stack of models, with each stage using a biased matrix factorization technique to estimate the optimal values of the latent factors. By integrating multiple stages of biased matrix factorization, the model can capture more complex patterns in the data and improve the quality of recommendations.
Our objective is to expand on the paradigm of Deep Matrix Factorization as presented in the research papers [11] [27] by introducing a novel integrated model named Deep Biased Matrix Factorization (DBMF). Figure 4 provides a visual illustration of the functioning of the DBMF model. The model's initialization is based on the input of the Biased Matrix Factorization in a Collaborative Filtering-based recommender system, which involves a matrix X comprising the grades/marks of students for various courses/exercises, along with their corresponding bias terms. The DBMF model improves the quality of predictions by repeatedly training and refining the output.
As in the Biased-MF method, this matrix will also be called X = X_1, which is the beginning of the process. Approximating the matrix X_1 ∈ R^{|S|×|C|} by a product of two smaller matrices, W_1 and H_1, takes the form X̂_1 ≈ W_1 · H_1. The matrix X̂_1 provides all the predicted gradings stored in the stack at the first step. At this step, the recursion begins. A new matrix X_2 = X_1 − X̂_1 is built by computing the errors between the original grading matrix and the predicted gradings stored in X̂_1. A factorization again approximates this new matrix X_2 by two new small-rank matrices, X̂_2 ≈ W_2 · H_2, which produces the errors at the second step, X_3 = X_2 − X̂_2. This process is repeated many times by generating and factorizing successive error matrices X_1, …, X_L. Presumably, this sequence of error matrices converges to zero, so we get more precise predictions as we add new layers to the model. As in the Biased-MF method presented in the baseline methods section above, two small k-rank matrices are trained such that the product W_1 · H_1 is a good approximation of the rating matrix X_1 = X in the usual Euclidean distance. The term W_1 ∈ R^{|S|×k_1} is a matrix in which each row is a vector w_s (rendering the student s) and has k_1 latent factors.

Similarly, the term H_1 ∈ R^{k_1×|C|} is a matrix in which each column is a vector h_c (rendering the course c) and has k_1 latent factors. The approximation can be expressed as:

X_1 ≈ X̂_1 = W_1 · H_1 (13)

To implement the deep factorizing model, we subtract the approximation X̂_1 from the original matrix X to obtain a new sparse matrix X_2 that contains the prediction errors at the first iteration:

X_2 = X_1 − X̂_1 (14)

Note that positive values in the matrix X_2 mean that the prediction is too low and needs to be increased. Similarly, negative values in the matrix X_2 mean that the prediction is too high and needs to be decreased. Indeed, this adjustment is the main idea of the deep factorizing approach. To apply it, we perform a new factorization of the error matrix X_2 in such a way that

X_2 ≈ X̂_2 = W_2 · H_2 (15)

The approximation process in each step is performed similarly. However, the two matrices W_2 and H_2 have orders |S| × k_2 and k_2 × |C| for a definite number of latent factors k_2. Note that we should take k_2 ≠ k_1 to get various resolutions in the factorization. In the general case, having computed L − 1 steps of the deep factorizing, the L-th matrix of errors can be expressed as:

X_L = X_{L−1} − X̂_{L−1} (16)

At step L, we again factorize X_L into matrices W_L and H_L with k_L latent factors, continuing until the sequence of error matrices converges to zero:

X_L ≈ X̂_L = W_L · H_L (17)

Once the deep factorization process ends after L steps, the original grading matrix can be reconstructed by adding the estimates of the errors:

X ≈ X̂ = X̂_1 + X̂_2 + ⋯ + X̂_L (18)

For any step l = 1, …, L, the factorization X_l ≈ W_l · H_l is sought by the standard method of minimizing the Euclidean distance between X_l and W_l · H_l by gradient descent with regularization. The error function for the DBMF now becomes:

E_l = Σ_{(s,c)} (x^(l)_sc − x̂^(l)_sc)² + λ_l (‖W_l‖²_F + ‖H_l‖²_F)

where λ_l is the regularization hyper-parameter of step l, used to avoid overfitting. The error function can be derived with respect to w_s and h_c, resulting in the following updated rules for training the model parameters (where e^(l)_sc = x^(l)_sc − x̂^(l)_sc, and w'_s, h'_c are the updated values of w_s, h_c):

w'_s = w_s + β_l (2 e^(l)_sc h_c − λ_l w_s)
h'_c = h_c + β_l (2 e^(l)_sc w_s − λ_l h_c)

With the new error function, the values of w'_s, h'_c, and the bias terms are updated in the same way, where β_l is the learning-rate hyper-parameter of step l that controls the learning speed. In this way, after finishing the nested factorization, all the predicted ratings are collected in the matrix X̂ = (x̂_sc), where the predicted grading of the student s on the course c is given by:

x̂_sc = Σ_{l=1}^{L} w^(l)_s · h^(l)_c + Σ_{l=1}^{L} μ_l + Σ_{l=1}^{L} b^(l)_s + Σ_{l=1}^{L} b^(l)_c

Note that this method consists of successive repetitions of the Biased-MF process, using the results of the previous Biased-MF as input. All the parameters are stored in the stack, so a recursive implementation is easily obtained.

Proposed Algorithm -DBMF
The proposed method, which integrates deep factorizing architecture, is described in detail in the function "Deep-Biased-Matrix-Factorization (DBMF)". The algorithm takes the original matrix X and the model parameters as inputs. The output of the algorithm is a stack that contains, for each layer, the set ⟨μ, b_s, b_c, W, H⟩.
It is worth noting that the parameters used in each factorization of the DBMF algorithm are stacked in a way that each factorization uses a different set of parameters. The parameters used in the first factorization are placed at the top of the stack, followed by the parameters used in the second factorization and so on, until the parameters used in the last factorization are placed at the bottom of the stack. This enables us to set the stopping criterion of the algorithm as the depth, typically around four layers.
The DBMF employs a recursive process to factorize student and course data using stacks of stochastic gradient descent runs, each with k latent factors, a β learning rate, a λ regularization weight, a stopping condition, and a depth. At each depth, the parameters are popped to carry out the block of Biased-MF statements in lines 1-17, which includes the standard MF method and the bias terms. The complete trained models are recursively pushed to the stack in lines 20-21.
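The recursive stack-based procedure can be sketched as a self-contained Python program, where each layer runs Biased-MF on the residual left by the previous layers and the per-layer parameters ⟨μ, b_s, b_c, W, H⟩ are kept on a stack. Function names and hyper-parameter defaults are illustrative, not taken from the paper's experiments:

```python
import numpy as np

def biased_mf_layer(R, k, beta, lam, epochs, rng):
    """One Biased-MF layer trained by SGD on matrix R (np.nan = unobserved)."""
    n_s, n_c = R.shape
    W = rng.normal(scale=0.1, size=(n_s, k))
    H = rng.normal(scale=0.1, size=(k, n_c))
    b_s, b_c = np.zeros(n_s), np.zeros(n_c)
    mu = np.nanmean(R)
    obs = np.argwhere(~np.isnan(R))
    for _ in range(epochs):
        for s, c in obs:
            e = R[s, c] - (mu + b_s[s] + b_c[c] + W[s] @ H[:, c])
            w_old = W[s].copy()
            W[s]    += beta * (2 * e * H[:, c] - lam * W[s])
            H[:, c] += beta * (2 * e * w_old   - lam * H[:, c])
            b_s[s]  += beta * (e - lam * b_s[s])
            b_c[c]  += beta * (e - lam * b_c[c])
    return mu, b_s, b_c, W, H

def train_dbmf(X, ks=(3, 2), beta=0.01, lam=0.02, epochs=300, seed=0):
    """Depth = len(ks); using a different k per layer gives varied resolutions."""
    rng = np.random.default_rng(seed)
    stack, R = [], X.copy()          # X_1 = X
    for k in ks:
        layer = biased_mf_layer(R, k, beta, lam, epochs, rng)
        mu, b_s, b_c, W, H = layer
        pred = mu + b_s[:, None] + b_c[None, :] + W @ H
        R = R - pred                 # X_{l+1} = X_l - X_hat_l (NaNs stay NaN)
        stack.append(layer)          # push the trained layer onto the stack
    return stack

def predict_dbmf(stack, s, c):
    """Final grading: sum of every layer's biased prediction."""
    return sum(mu + b_s[s] + b_c[c] + W[s] @ H[:, c]
               for mu, b_s, b_c, W, H in stack)
```

The iterative loop is equivalent to the recursive formulation in the paper; each pass through the loop body corresponds to one recursive call that factorizes the current error matrix.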

Dataset
The ASSISTments dataset, made available by the ASSISTments Platform, is a collection of data gathered from a web-based tutoring system that aims to help students learn mathematics and provide teachers with an assessment of their students' progress. This platform allows teachers to write customized questions, each of which includes associated hints, solutions, and web-based videos. After preprocessing, the ASSISTments dataset consists of 1,011,079 gradings (ratings) given by 8,519 students (users) on 35,978 tasks (items).
This investigation uses data from ASSISTments, a web-based math tutoring system initially developed in 2004 through a collaboration between Worcester Polytechnic Institute and Carnegie Mellon University. The system was designed to provide students with personalized assistance and automated, fine-grained assessments of their performance [7]. ASSISTments is a popular tool that middle and high school students use for daily learning, homework, and preparation for the MCAS (Massachusetts Comprehensive Assessment System) tests. A snapshot of the dataset is depicted in Figure 5, which includes the essential fields required for data mining, such as "User_id," "Problem_id," and "Correct."

Evaluation
The primary objective of this study is to predict student marks via rating prediction, i.e., under explicit feedback. As the evaluation metric we chose Root Mean Squared Error (RMSE), a standard measure of model performance in the Recommender Systems (RS) field. We adopted the hold-out approach for the experiments, splitting the dataset into two parts: two-thirds of the data for training and one-third for testing.
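The evaluation protocol described above can be sketched in a few lines; the helper names are illustrative, not from the paper's code:

```python
import math
import random

def holdout_split(ratings, train_frac=2 / 3, seed=42):
    """Shuffle rating triples and split them into 2/3 train, 1/3 test."""
    data = list(ratings)
    random.Random(seed).shuffle(data)  # fixed seed for reproducibility
    cut = int(len(data) * train_frac)
    return data[:cut], data[cut:]

def rmse(pairs):
    """Root Mean Squared Error over (actual, predicted) pairs."""
    return math.sqrt(sum((a - p) ** 2 for a, p in pairs) / len(pairs))
```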
Prediction accuracy depends heavily on the parameters fed into the algorithm: unsuitable parameters produce inaccurate predictions even from a theoretically correct method. Selecting the best set of hyper-parameters means finding values that balance overfitting and underfitting on the training data, and this choice can significantly affect the model's predictive power, so identifying the optimal parameters is of utmost importance.
Optimizing hyper-parameters is critical to improving the performance of machine learning models. In this study, we employed a parameter optimization technique known as hyper-parameter search [21]. The method involves two stages: a raw search, which scans wide, coarse ranges to locate promising hyper-parameter values, and a smooth search, which refines the search over a narrow range around those values. We used RMSE as the evaluation metric, and the hyper-parameter search results for the ASSISTments dataset are presented in Table 1. Once the optimal hyper-parameters have been determined, they are used for training and testing the individual models. Note, however, that incorporating deep factorization increases the training time, which grows with the depth of the model.
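The two-stage scheme can be sketched as a coarse grid pass followed by a local refinement around the best coarse point. The grid helper, the `evaluate` callback, and the example values below are illustrative assumptions, not the paper's actual search ranges:

```python
import itertools

def grid(points):
    """Expand {name: [values]} into a list of parameter dictionaries."""
    keys = sorted(points)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(points[k] for k in keys))]

def two_stage_search(evaluate, coarse, refine):
    """Raw search over a wide grid, then a smooth search near its optimum.

    evaluate(params) should return the validation RMSE for one setting;
    refine(best) should return a narrow grid around the coarse winner.
    """
    best_coarse = min(grid(coarse), key=evaluate)   # stage 1: raw search
    return min(grid(refine(best_coarse)), key=evaluate)  # stage 2: smooth search
```

In practice `evaluate` would train a model on the training split with the given parameters and return its RMSE on a validation split.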

Experimental Result
In this study, we evaluated our proposed method, which integrates the deep factorizing architecture into Biased-MF (DBMF), against two baseline methods. The baselines, MF and Biased-MF, are widely used and implemented in various open-source libraries, such as LibRec, MyMediaLite, and Collaborative Filtering For Java (CF4J). Leveraging these existing libraries allowed us to implement DBMF as an extension and compare its performance with the baselines.
We carried out three experiments to compare DBMF with the other methods. The results, presented in Figure 6, show that the proposed approach achieves the smallest RMSE on the dataset, 0.332. Since a smaller RMSE indicates a better model, DBMF is the most accurate of the compared methods for predicting student performance.

Conclusion
The present study introduces an approach to improving the accuracy of student performance prediction by applying a deep factorizing architecture to the Biased-MF method. Successive training under the deep factorizing principle refines the model's output and significantly improves the prediction results. Experimental results on the published dataset show that this method is highly effective and outperforms the baseline approaches, MF and Biased-MF.
While applying the deep factorizing architecture to matrix factorization may slow down the training process, this can be addressed by implementing parallel algorithms. This study focuses on the deep factorizing architecture for biased matrix factorization without other complex integration techniques. Further research could explore metadata integration to enhance the approach's effectiveness; another potential avenue for future work is using graphics processing units (GPUs) to expedite the algorithm's execution.
In summary, the current work highlights the potential of utilizing deep factorizing architecture to enhance the accuracy of student performance prediction. By leveraging this technique, researchers and educators can better understand students' learning outcomes and tailor their teaching methods to meet individual needs. The proposed approach is simple to implement and can significantly improve prediction accuracy, making it a promising avenue for future research and development in this field.