Movie Recommender System Using Machine Learning

In this research, we propose a movie recommender system that can recommend movies to both new and existing users. It searches movie databases for the data required for a recommendation, such as popularity and appeal. We apply both content-based and collaborative filtering and evaluate their advantages and disadvantages. To build a system that delivers more accurate movie recommendations, we employ hybrid filtering, which combines the outcomes of these two approaches. Recommendation engines are also used for business purposes and to shape organizational strategy. As the demands of customers and users grow, recommender systems play an increasingly important role. They also help us use our time well in a busy world by returning more relevant results. Such systems are commonly deployed on movie websites and in many commercial applications. The proposed system makes movie suggestions more relevant to the needs of its users.


Introduction
Our main goal is to create an improved recommender system that provides precise recommendations to the user. Recommendation engines generally combine content-based filtering and collaborative filtering.
Collaborative filtering is based on users' previous queries and experiences [1] [2]. We anticipate what a user would like from what other users with similar experiences have watched, and recommend those items. In other words, recommendations are made from past experiences and behaviour. For instance, if you have added an item to your cart or purchased it from a website, the system shows recommendations based on those selections.
Content-based filtering makes recommendations based on movie characteristics such as genre, director, actor, and plot [3]. We can tune the recommendation by putting more emphasis on a specific attribute, and we also include popularity and rating. We implement content-based filtering with cosine similarity and a Term Frequency-Inverse Document Frequency (TF-IDF) vectorizer. This technique gives recommendations based on the user's own interests: if the user has searched for or viewed an item in the past, the filter suggests content similar to what the user browsed earlier. It therefore depends entirely on the taste of the individual user. Today, the two filtering techniques are commonly combined into what is called hybrid filtering, which gives a more precise approach to recommendation.
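One simple way to combine the outcomes of the two filters is a weighted blend of their scores. The sketch below is only an illustration under stated assumptions: the function name hybrid_scores, the weight alpha, and the sample scores are hypothetical and are not taken from the system described in this paper.

```python
def hybrid_scores(cf_scores, content_scores, alpha=0.5):
    """Blend two {movie title: score} dictionaries.

    Assumes both score sets are already normalised to [0, 1].
    alpha is the (hypothetical) weight given to collaborative filtering.
    """
    titles = set(cf_scores) | set(content_scores)
    return {
        t: alpha * cf_scores.get(t, 0.0) + (1 - alpha) * content_scores.get(t, 0.0)
        for t in titles
    }

# Made-up scores for illustration only.
cf = {"Movie A": 0.9, "Movie B": 0.4}
content = {"Movie B": 0.8, "Movie C": 0.6}

blended = hybrid_scores(cf, content, alpha=0.5)
# Titles ordered from highest to lowest blended score.
ranking = sorted(blended, key=blended.get, reverse=True)
```

A movie scored by only one filter still receives a blended score, because the missing score is treated as zero. Other combination rules, such as re-ranking one filter's output with the other, are equally plausible.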

Literature Review
In [4], the authors suggested a practical method of applying data mining to create a recommendation list. Their method was based on pairs of items, which made it faster than standard association rule mining (ARM). On average the recommendations scored 88.94%. In [5], the author used collaborative filtering, which rests primarily on the premise that users who have bought a particular product will have needs similar to those of other users who also bought the product. Based on customers' past purchases, browsing habits, and user segments, the author analyses and evaluates three variants of a CF-based recommender system.
Sonika Malik
In [6], the authors used collaborative filtering with the Pearson correlation coefficient as the similarity measure. The dataset was MovieLens 100K, and only ratings above 2.5 were considered for recommendations. In [7], a movie recommendation system based on a modified user similarity metric and opinion mining is presented. The primary goal of that paper is to identify the type of each movie opinion (positive, negative, or neutral) and to recommend a top-k list to users. The system derives its scores from ratings and reviews, and makes recommendations based on patterns and similarities among users. It was validated using multiple evaluation criteria and is reported to outperform existing systems. In [8], the author examines e-commerce big data, concentrating on the K-means clustering algorithm. Geographic location and the customer's unique identification number are used as clustering constraints in that study. Mining such data is difficult. One of the most significant mining tasks is clustering related objects or data, which is extremely beneficial for classification and modelling. K-means is a prominent partition-based clustering method that produces high-quality results.
In [9], a movie recommendation system was devised and implemented. The world of movies spans many genres, cultures, and languages. Users can be recommended a set of movies based on their interests or on the popularity of the films. According to a poll, about 600 films are released in Hollywood per year on average. Recommendation algorithms are critical for streaming movie services like Netflix, because they help customers discover new movies to watch. A substantial amount of work has been done in this area, but there is still room for improvement. In [10], the authors implemented movie recommendation using collaborative filtering. The system is built with Apache Mahout and evaluates ratings to produce movie recommendations. It displays the raw output of the collaborative filtering technique, recommends 10 movies to the user, and returns the nearest neighbours, that is, the users whose taste preferences are most similar to the user's.

Methodology
Machine learning offers several algorithms for organizing and classifying information. The filtering techniques used in this work are described in the following subsections.

Collaborative Filtering
Collaborative filtering is based on the idea that there is a relationship between products and people's interests. Many recommendation systems use collaborative filtering to uncover these relationships and recommend a product that the consumer is likely to enjoy or be interested in. The relation between user-based and item-based filtering is shown in Fig. 1.

Figure 1. User based & Item based CF
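As an illustration of the user-based variant, the following minimal sketch predicts a rating as a similarity-weighted average of other users' ratings. The toy ratings, the function names, and the choice of cosine similarity between users are assumptions made for this example; they are not the implementation used in this paper. The item-based variant follows the same pattern with the roles of users and items swapped.

```python
import math

# Toy ratings: user -> {movie: rating}. All values are invented.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2, "D": 5},
    "u3": {"A": 1, "B": 2, "C": 5, "D": 1},
}

def user_similarity(a, b):
    """Cosine similarity over the movies that both users have rated."""
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    dot = sum(ratings[a][m] * ratings[b][m] for m in common)
    norm_a = math.sqrt(sum(ratings[a][m] ** 2 for m in common))
    norm_b = math.sqrt(sum(ratings[b][m] ** 2 for m in common))
    return dot / (norm_a * norm_b)

def predict(user, movie):
    """Similarity-weighted average of other users' ratings for `movie`."""
    num = den = 0.0
    for other in ratings:
        if other == user or movie not in ratings[other]:
            continue
        sim = user_similarity(user, other)
        num += sim * ratings[other][movie]
        den += abs(sim)
    return num / den if den else None

sim_u1_u2 = user_similarity("u1", "u2")
sim_u1_u3 = user_similarity("u1", "u3")
pred = predict("u1", "D")  # u1 has not rated movie D
```

Because u1 rates movies much more like u2 than like u3, the prediction for movie D is pulled towards u2's high rating rather than u3's low one.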
Singular Value Decomposition (SVD) is a linear algebra method that is widely applied in machine learning for dimensionality reduction. It is a matrix factorization technique that reduces the number of features in a dataset by reducing the dimension of the space from N to K (where K < N). In recommender systems, SVD is used as a collaborative filtering algorithm. The data are organized as a matrix in which each row represents a user and each column represents an item, and the elements of the matrix are the ratings that users give to items. Fig. 2 shows the root mean square error (RMSE) and mean absolute error (MAE) computed for the SVD model.
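The idea of factorizing the user-item rating matrix into K latent factors can be sketched as follows. Note the swap: this is not a full SVD but a matrix factorization trained by stochastic gradient descent, a form often used for rating prediction. The toy data, the hyperparameters (k, lr, reg, number of epochs), and the function names are all assumptions for illustration and do not reproduce the experiment reported in this paper.

```python
import math
import random

random.seed(0)

# (user, item, rating) triples; invented toy data.
data = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 2, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 3, 2.0),
    (2, 2, 5.0), (2, 3, 4.0), (2, 0, 1.0),
    (3, 2, 4.0), (3, 3, 5.0), (3, 1, 1.0),
]
n_users, n_items, k = 4, 4, 2  # k latent factors, fewer than the 4 items

mu = sum(r for _, _, r in data) / len(data)  # global mean rating
# Small random user factors P and item factors Q.
P = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
Q = [[random.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

def predict(u, i):
    """Estimated rating: global mean plus the dot product of the factors."""
    return mu + sum(P[u][f] * Q[i][f] for f in range(k))

def errors(samples):
    """Return (RMSE, MAE) of the current model on `samples`."""
    diffs = [r - predict(u, i) for u, i, r in samples]
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mae = sum(abs(d) for d in diffs) / len(diffs)
    return rmse, mae

rmse_before, mae_before = errors(data)

lr, reg = 0.01, 0.02  # learning rate and L2 regularisation (assumed values)
for _ in range(500):
    for u, i, r in data:
        err = r - predict(u, i)
        for f in range(k):
            pu, qi = P[u][f], Q[i][f]
            P[u][f] += lr * (err * qi - reg * pu)
            Q[i][f] += lr * (err * pu - reg * qi)

rmse_after, mae_after = errors(data)
estimate = predict(0, 3)  # rating estimate for a pair not in the data
```

Here the errors are measured on the training ratings only, to keep the sketch short. A real evaluation would hold out part of the ratings or use cross-validation.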

Content Based Filtering
Content-based filtering works from a product's description or other data about it. The algorithm determines how similar products are based on their context or description, and the user's previous purchases are taken into account when recommending related products.

Content Based Filtering using Cosine Similarity
Cosine similarity is a metric that measures how similar two vectors are. It is the cosine of the angle between two non-zero vectors of an inner product space. Mathematically, it is the dot product of the two vectors divided by the product of their Euclidean norms (magnitudes). Fig. 3 shows the relation between collaborative filtering and content-based filtering.

Figure 3. CF & Content Based Filtering
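For reference, the cosine similarity of two n-dimensional feature vectors can be written as follows. The symbols A and B and the index i are notation introduced here for illustration.

```latex
\[
\cos(\theta)
  = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
\]
```

For vectors of non-negative word counts the value lies between 0 (no words in common) and 1 (same direction).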
Libraries such as scikit-learn provide built-in cosine similarity functions, which is of great use. In Fig. 4, a count vectorizer matrix is used to count the occurrences of each word in the movie descriptions, after which we compute the cosine similarity between the different movies.
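The count-then-compare procedure can be sketched with the Python standard library alone. The film titles, descriptions, and function names below are invented for illustration. In practice scikit-learn's CountVectorizer and cosine_similarity perform the same two steps on sparse matrices.

```python
import math
from collections import Counter

# Invented one-line descriptions; real plot overviews are much longer.
descriptions = {
    "Film A": "crime family mafia boss crime",
    "Film B": "mafia crime boss betrayal",
    "Film C": "space robot adventure",
}

def count_vector(text):
    """Map each word to the number of times it occurs in `text`."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vectors = {title: count_vector(text) for title, text in descriptions.items()}

def recommend(title, top_n=2):
    """Rank the other movies by similarity to `title`, most similar first."""
    scores = {
        other: cosine(vectors[title], vec)
        for other, vec in vectors.items()
        if other != title
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

recs = recommend("Film A")
```

Film B shares three words with Film A and is ranked first, while Film C shares none and scores zero.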

Content Based Filtering using TF-IDF
Term Frequency-Inverse Document Frequency (TF-IDF) is a commonly used algorithm for converting text into a numerical representation that is suitable for a prediction algorithm. Term frequency is the number of times a term occurs in a document. Inverse document frequency measures how informative a term is: terms that appear in many documents receive a lower weight, and terms that appear in few documents receive a higher weight. Together they indicate how important a term is to a document within the collection.

Figure 5. Computing the TF-IDF Vectorizer
In Fig. 5, the TF-IDF vectorizer matrix is computed on the descriptions of the movies provided in the database.
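A minimal sketch of the TF-IDF weighting is given below, assuming the textbook definitions tf = count / document length and idf = log(N / df). The film titles and descriptions are invented, and the function name tfidf is hypothetical. scikit-learn's TfidfVectorizer uses a smoothed and normalised variant by default, so its numbers differ from the ones produced here.

```python
import math
from collections import Counter

# Invented descriptions for illustration only.
docs = {
    "Film A": "the mafia family the boss",
    "Film B": "the space robot",
    "Film C": "the mafia betrayal",
}

tokens = {title: text.split() for title, text in docs.items()}
n_docs = len(tokens)

# Document frequency: in how many descriptions each term appears.
df = Counter(term for words in tokens.values() for term in set(words))

def tfidf(words):
    """Return {term: tf * idf} with tf = count / len(words), idf = log(N / df)."""
    counts = Counter(words)
    return {
        term: (count / len(words)) * math.log(n_docs / df[term])
        for term, count in counts.items()
    }

weights = {title: tfidf(words) for title, words in tokens.items()}
```

In this toy collection the word "the" occurs in every description, so its weight is zero. The rarer word "family" outweighs "mafia", which appears in two of the three descriptions.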

Result & Discussion
After studying the above methods, we implemented them on a dataset taken from the MovieLens website, which has a large database of movies. We used the TMDB ratings in our implementation and took into consideration genres, cast, crew, reviews, and ratings. We applied both collaborative filtering and content-based filtering.
In Fig. 6, the graph shows the number of ratings given to the movies in the database. Fig. 7 is a bar graph used to study the ratings and review counts: it shows the number of reviews at each rating value. Fig. 8 shows the number of reviews for the first 25 movies in the dataset. This bar graph gives a clear picture of the number of reviews corresponding to each of these movies and can be used to analyse how widely they have been reviewed. Fig. 9 shows the ratings given by user ID 1 to different movies.
In Fig. 10, we get an estimated prediction of 2.584 for movie ID 202 using collaborative filtering. This recommender has one notable property: it does not rely on genre or on what people have watched. It works purely on the ratings that users have given to items.
In Fig. 11, we obtain the movies recommended for the input movie 'The Godfather' using content-based filtering with cosine similarity. The graph in Fig. 12 shows these movies with respect to their ratings. The traditional cosine similarity method recommends movies with lower ratings.
In content-based filtering using the TF-IDF vectorizer, English stop words that are not required for our recommendations are eliminated. Eliminating these words makes the recommendations more precise than those of the traditional cosine similarity method. In Figure 13, the output of content-based filtering using the TF-IDF vectorizer is more precise than that of the traditional cosine similarity method. With this method we can readily obtain suggestions that match the user's needs.

Figure 13. Content based filtering using TF-IDF vectorizer
Figure 14 shows the movies with respect to their ratings. The results are more precise than those of the traditional cosine similarity method, because the TF-IDF vectorizer used here is more efficient than the previous approach.

Conclusion
In this paper, we built a movie recommendation system based on machine learning algorithms. With collaborative filtering, users receive better suggestions based on their prior experiences and activities. To suggest movies to the user, we used the SVD algorithm for collaborative filtering. The fundamental problem with collaborative filtering is that if a new user has no previous history, the recommender cannot provide relevant recommendations. It is also possible that collaborative filtering will fail to produce meaningful suggestions if the data becomes too large. Content-based filtering makes suggestions by comparing the attributes of the specified item with those of other items. A TF-IDF vectorizer and cosine similarity were employed for content-based filtering. Because it takes into account every word in the movie genres, actors, and directors, the TF-IDF vectorizer provides better results than the traditional cosine similarity method.