Machine Learning based Disease and Pest detection in Agricultural Crops

INTRODUCTION: Most Indians rely on agriculture as their primary means of support, making it an essential part of the country’s economy. Disasters and the expected loss of farmland by 2050 as a result of global population growth raise concerns about food security in that year and beyond. Smart agricultural technologies such as the Internet of Things (IoT), Big Data and Analytics can help farmers improve their operations and make better decisions. OBJECTIVES: In this paper, a machine learning-based system is developed to address crop disease and pest prediction, with the chilli crop as a case study. METHODS: The performance of the proposed system is assessed using accuracy, Mean Squared Error (MSE), Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). RESULTS: The experimental results show that the proposed method obtained an accuracy of 0.90, an MSE of 0.37, an MAE of 0.15 and an RMSE of 0.61. CONCLUSION: The model predicts pests and diseases and notifies farmers, using the Random Forest Classifier, the AdaBoost Classifier, K Nearest Neighbour, and Logistic Regression. Random Forest is the most accurate model.


Introduction
By 2050, food demand is projected to rise by 70 percent, putting pressure on natural resources and land for food production [1]. Complicating factors include deforestation, global warming, water scarcity, and soil erosion. Indian agriculture has great potential but has not been developed to its fullest because of a lack of up-to-date technology. This work aims to address these problems by creating a machine learning-powered platform that helps farmers monitor and prevent crop damage from pests and diseases.
The purpose is to help farmers increase agricultural yields and strengthen resistance to pests and diseases. Agriculture is more crucial than ever because the present food supply is insufficient to meet projected demand by 2050. IoT, robotics, drones, and AI are examples of technologies used in "smart farming", which can help farmers make more informed decisions and optimise their operations to produce more and higher-quality crops with less manual effort. The concept of "smart farming" covers the use of data and communication technologies to optimise agricultural operations. Technologies like sensors, telecommunications, data analytics, and satellites are all part of the intelligent farm.
Many parts of the United Kingdom are devoted to arable farming, which entails the cultivation of cereal grains such as wheat, barley, maize, rice, and millets. The practice of "continuous fallowing" can help boost soil nitrogen naturally. Subsistence farming is done on a modest scale and not for profit. It is the most common type of conventional farming because it does not call for any special machinery or technology. Commercial farmers cultivate land and raise cattle for financial gain, typically on a large scale and with sophisticated machinery. Techniques like monoculture, in which a single cash crop is grown instead of several varieties at once, are best suited to those with access to larger plots of land. To boost crop yields, farmers can engage in either extensive farming, in which more area is put under cultivation, or intensive farming, in which a smaller plot of land receives a greater investment of labour and resources. "Sedentary farming" is a method in which the farmer stays on the same plot of land and uses it repeatedly over several years to grow crops and even raise livestock. This is one of the earliest forms of farming still practised today. Plantation crops including coffee, tea, spices, and rubber are grown on large plantations in either temperate or tropical climates, and plantation farming is vital in supplying domestic and foreign markets. Growing several different types of crops on the same piece of land is known as "multiple cropping" or "polyculture". This strategy helps increase productivity in a single growing season by matching the crops according to their season and water requirements. Recent years have seen the development of methods for getting the most out of every crop, such as optimal spacing schemes.
Intercropping, where crops are cultivated in between one another rather than in separate rows, and agroforestry, where annual crops are produced with perennial trees, are further examples of multiple cropping.
The chilli fruit is used primarily to enhance the overall flavour of a dish. India ranks first in chilli production, followed by China, Peru, Spain, and Mexico. Indian chillies have a fiery flavour and vibrant hue. The larger kinds that are commonly used as vegetables are known as bell peppers. Peppers thrive in a temperature range of 20-25°C, in warm, humid, and dry conditions. Extreme heat and dry air may prevent flowers from opening and stunt fruit growth. To thrive, chillies require a consistently moist environment, specifically black soil or a sandy loam soil that drains well and is rich in organic matter. Before planting, the soil in some regions is amended with gravel and sand. Aphids, thrips, fruit borer, broad mites, melon mush, and root knot nematodes are just some of the pests and diseases that can affect a chilli crop.

Related Works
The following studies provided an overview of the field. One study examines how robots and drones use artificial intelligence in agriculture, reviewing robot- and drone-based agricultural automation and weeding technologies. The literature also explores how IoT, wireless sensor networks, and machine learning might forecast apple scab in orchards and enhance farming [2]. It surveyed local farmers about precision agriculture and the challenges of adopting new technology. User-friendly software notifies farmers of apple orchard conditions in real time, and orchard sensor nodes capture and evaluate data in real time for immediate intervention. The study used the data to create a prediction model and examined real-time application issues [3].
The goal of another study is to improve agricultural output by collecting and analysing data on environmental factors such as climate, precipitation, soil, seed, crop yields, humidity, and wind speed. Data from a crop prediction system is analysed using MapReduce and k-means clustering. A GUI lets users enter data and get recommendations. It uses k-means clustering visualisation graphs to calculate crop yields for a variety of crops, soil conditions, and seed kinds, and 2D and 3D graphs illustrate the varying relationships. Being scalable, it can recommend crops for different states [4].
Another study discusses how machine learning benefits agriculture at every stage; SVM and ANN models are used in species, field condition, crop, and livestock management [5]. In a study of agricultural respondents for 2018-19, marginal, small, and major farmers paid 393952.74, 416867.20, and 420907.37 per hectare, respectively. Human labour utilisation, especially in harvesting, was inversely associated with farm size, and farm size adversely affected cattle and family labour utilisation [6].
A vegetable research institute studied the seasonal occurrence of chilli crop pests with respect to climate elements in 2018. Aphids, thrips, whitefly, and borers first emerged in the 43rd, 45th, and 1st standard meteorological weeks (SMWs), respectively. The 44th SMW saw spiders and ladybirds in the field. Borers peaked in the 7th SMW, while sucking pests (aphids, thrips, and whitefly) peaked in the 6th. The 6th SMW had the most predators. Aphids and thrips increased with mean temperature, relative humidity, and rainfall, while borers correlated with rainfall. Maximum temperature and sunshine hours greatly affected spider and ladybird populations [7].
Another study examines how plant height, maturity, and meteorological conditions (sunshine duration, wind speed, rainfall amount, and relative humidity) affect whitefly (WF) populations on chilli plants. Adult WF preferred upper leaves to middle and lower leaves, whereas the number of immature WF (larvae and pupae) was highest at 112 days after transplanting (DAT) [8]. DAT has the fewest adult and immature WF. Adult and immature WF at each sampling date correlated positively with plant height and maturity. Adult and immature WF abundance was unrelated to sunshine duration, wind speed, rainfall, or relative humidity. These findings recommend implementing WF control immediately after transplanting chilli plants.
A further study examines how climate change affects Indian production of dry chilli, a spice crop. Researchers simulated the rainfed yield of two important chilli cultivars in Tamil Nadu using CCSM4 model data. These cultivars exhibited negative yield deviations in most Tamil Nadu agro-climatic zones, although the southern zone yielded 7% and 5% by the end of the century [9].
Another study examines India's diminishing crop output due to plant diseases, notably in chilli plants, which are susceptible to microbes and pests. Chilli plants are frequently evaluated for pesticide use and leaf features to determine their health. Pesticides prevent disease, but they must be tailored to each plant. The study covers pesticides, chilli plant diseases, and leaf features [10].
In one work, Python was used to assess agricultural data. The dataset spans the years 2000-2018 and contains information on rainfall, temperature, crop, ET, acreage, and output. Precision was increased with the help of K-means clustering, K-nearest neighbours, support vector machines, and Bayesian network techniques [11]. Another work provides a baseline for monitoring and assessing Indian agriculture and is a source for learning about it [12]. One paper analyses Indian agriculture's biggest problems and offers remedies, addressing seeds, fertilisers, manures, irrigation, and small, unorganised landholdings; the study uses secondary data without statistics [13].
According to another paper, smart farming uses computers to boost crop productivity. It covers smart farming technology, with IoT, irrigation, and agriculture automation as examples, and helps future smart farming researchers understand current technology [14]. Another work uses photos to identify affected crops with ML algorithms; KNN, Naive Bayes, SVM, and ANN (the most accurate) are covered. Its drawback is that no model is built and only approaches from past research articles are reviewed [15]. One article employs a residual neural network (ResNet) to categorise crop leaves and predict plant diseases. 54,306 public images of damaged and healthy plant leaves were used to train the model, and it correctly detected 26 diseases. However, this approach can only detect diseases after they have harmed the plant and does not offer any treatments [16]. Another project uses a Random Forest classifier to identify diseased or healthy leaves. Random forests are versatile enough for classification and regression, and they outperformed other techniques with less image data [17].
One experiment evaluated three synthetic pesticides against sucking insect pests of the chilli crop: Delegate (Spinetoram), Novastar (Bifenthrin + Abamectin), and Transformer (Sulfoxaflor). Spinetoram and Bifenthrin + Abamectin were best at reducing thrips and whiteflies, whereas Sulfoxaflor was best at managing aphids. In 2015, Advanta-509 chilli was tested at the Bozdar Agriculture Farm in Tando Allahyar. Insecticides were applied periodically after planting [18][19]. According to the findings of another study, India spends approximately Rs. 100 billion annually on weed management in arable agriculture. Rotary weeders make it possible to remove weed roots from the ground accurately, which also loosens the soil for aeration. Rotary power weeders consume less draught and perform better in fields. Two, four and six blades per flange of a 25-centimetre-wide portable backpack power weeder were put to the test in a chilli crop [20][21][22][23][24][25][26].

Proposed Method and System
This study provides a technique to help farmers combat pests and diseases. Pest attacks are forecast from environmental readings such as humidity and temperature. The mechanism helps farmers learn about the pests that attack their fields and offers remedies before an attack occurs. The algorithm can also predict crop pests from user input. Figure 2 is a block diagram of the proposed system.

Figure 2: Schematic Representation of Suggested System
Crop: Pest and disease detection is studied using chilli, a widely grown crop, since this will be most helpful for chilli cultivators.
Data Collection Sensors: The information used is received from the crops through different sensors.
Database: After the data is collected, it is stored in a database.
Data pre-processing: Handling null values and erroneous values before applying a model is an important step.

Modelling and Training:
In this study, we employ multiple classification models, namely Random Forest, AdaBoost, Logistic Regression, and K Nearest Neighbours, and compare the efficacy of these machine learning techniques.

Materials and Methods
Machine learning is the field of study concerned with enabling computers to acquire new abilities without being explicitly programmed. It has four categories. Supervised learning maps inputs to outputs using labelled data. Unsupervised learning finds patterns in data that has no labels. Reinforcement learning learns through rewards and punishments, with the aim of obtaining the highest possible payoff. Semi-supervised learning learns from both labelled and unlabelled data [27]; it bridges the gap between supervised and unsupervised learning.

Random Forest
The Random Forest algorithm is one of the most well-known supervised machine learning techniques and is used for both classification and regression problems. In general, the more trees there are, the more accurate and stable the predictions. The method relies on ensemble learning, which pools the outputs of numerous classifiers to maximise accuracy. Random Forest is helpful because it shortens training times and lessens the possibility of overfitting. It also has a high degree of accuracy, providing reliable predictions even on large datasets with gaps in the data.
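The pooling of many classifiers can be illustrated with a minimal sketch of the voting step alone. The three "trees" below are hypothetical hand-written rules, not trained models, and the feature names (`max_temp`, `min_hum`, `max_hum`) and thresholds are assumptions made for illustration.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the class that receives the most votes from the trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained trees: each is a single hand-written rule.
trees = [
    lambda s: "pest risk" if s["max_temp"] > 33 else "healthy",
    lambda s: "pest risk" if s["min_hum"] < 35 else "healthy",
    lambda s: "pest risk" if s["max_hum"] > 90 else "healthy",
]

sample = {"max_temp": 35, "min_hum": 40, "max_hum": 92}
prediction = forest_predict(trees, sample)
print(prediction)  # two of the three rules vote "pest risk"
```

In a real random forest each tree is trained on a bootstrap sample of the data with a random subset of features; only the majority vote is shown here.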

AdaBoost
AdaBoost is an ensemble approach that employs one-level decision trees known as "decision stumps". All data points initially receive the same weight; after each round, the points that were misclassified are given more weight, so that the next model treats them as more important. Models continue to be trained until the error is sufficiently small.
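The reweighting of misclassified points can be sketched for a single boosting round. This assumes the classic discrete AdaBoost update, which the paper does not spell out; the function name `reweight` and the five-point example are hypothetical.

```python
import math


def reweight(weights, misclassified):
    """One AdaBoost round: return (alpha, normalised new weights)."""
    # Weighted error of the current stump.
    err = sum(w for w, wrong in zip(weights, misclassified) if wrong)
    if not 0.0 < err < 1.0:
        raise ValueError("weighted error must lie strictly between 0 and 1")
    # Say of the stump in the final vote.
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Raise the weight of misclassified points, lower the rest.
    raw = [
        w * math.exp(alpha if wrong else -alpha)
        for w, wrong in zip(weights, misclassified)
    ]
    total = sum(raw)
    return alpha, [w / total for w in raw]


weights = [0.2] * 5                      # all points start equally important
misclassified = [False, False, True, False, False]
alpha, new_weights = reweight(weights, misclassified)
print(alpha, new_weights)
```

After this update the single misclassified point carries half of the total weight, which is why the next stump concentrates on it.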

K-Nearest Neighbour
K-Nearest Neighbours (K-NN) is a supervised machine learning method that assigns a new point to the category it most resembles, based on its similarity to existing data points. K-NN is typically employed in classification tasks, but it can also be used for regression. It is a "lazy" learning algorithm that makes no assumptions about the data being studied. Figure 3(a) depicts the data before KNN is applied, and Figure 3(b) shows a newly arrived data point assigned to a category after KNN is applied.
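A minimal sketch of the assignment of a new point to a category, assuming Euclidean distance and a majority vote among the k nearest points. The (temperature, humidity) readings and the two labels are made up for illustration.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical (temperature in °C, relative humidity in %) readings.
train_points = [(20, 60), (21, 62), (22, 65), (34, 30), (35, 28), (36, 25)]
train_labels = ["healthy", "healthy", "healthy",
                "pest risk", "pest risk", "pest risk"]

print(knn_predict(train_points, train_labels, (33, 32)))
print(knn_predict(train_points, train_labels, (21, 61)))
```

Because K-NN stores the training data and defers all work to prediction time, there is no fitting step in the sketch.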

Logistic Regression
Logistic regression is a popular machine learning approach for predicting a categorical dependent variable from a set of independent variables. It is suited to classification problems, as opposed to the regression tasks handled by linear regression. Logistic regression differs from linear regression in that, instead of fitting a straight line to the data, it fits an "S"-shaped logistic function whose output lies between two limiting values (e.g. 0 and 1). Whether cells are malignant or not, or whether a mouse is obese given its weight, are examples of events that can be represented by this curve. Logistic regression is valuable because it handles continuous and discrete data, provides probabilities, classifies new data, and helps determine which variables are most useful for classification.
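The S-shaped curve and the way it turns a linear score into a class can be sketched as follows. The weights, bias, and 0.5 threshold are hypothetical values chosen for illustration, not parameters fitted in this study.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted class)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(z)
    return probability, int(probability >= threshold)


# Hypothetical coefficients for (max temperature, min humidity).
weights, bias = [0.4, -0.05], -10.0
probability, label = predict([35.0, 40.0], weights, bias)
print(probability, label)
```

The linear score here is 2, which the logistic function maps to a probability of about 0.88, so the point is assigned to class 1.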

Tools and Dataset used
Python is an interpreted, object-oriented language with dynamic semantics, which makes it suitable both as a glue language and for developing applications quickly. It is also well suited to Exploratory Data Analysis (EDA) for finding gaps in machine learning datasets. As part of our internship, we collected data from industry for analysis and modelling. The dataset has 7 columns and 2,473 rows. The columns are as follows:
Index number: contains the dataset's index number.
Minimum Temperature: reports the lowest temperature experienced by the chilli crop.
MaxTemp: reports the highest temperature measured for the chilli crop.
Minimum humidity: provides the minimum humidity for the chilli crop.
MaxHum: provides the maximum humidity for the chilli crop.
Problems: lists the problems associated with the chilli crop's minimum and maximum temperatures and humidity.
Agrometeorological warnings: lists the remedy corresponding to each chilli crop issue (Figure 4).

Implementation and Results
This study was carried out using Windows 11, Microsoft Edge, IDLE Jupyter Notebook with Anaconda, and Python 3.10. The hardware was an Intel Core i5-1135G7 processor. Python libraries such as numpy, pandas, matplotlib, seaborn, and sklearn were employed, and Power BI was used for visualisation. The following steps were taken to put the research into practice:

Implementation
Step (i) Importing required packages.
Step (ii) Loading the dataset.
Step (iii) Describing the dataset using describe() and obtaining dataset information using info().
Step (iv) Finding missing values using the isnull() function and handling them. The missing values were handled using the mode() and fillna() functions in order to get a more precise data set. In the final data set, all blanks were filled in with zeros.
Step (v) Replacing the categorical values using the label_encoder() function.
Step (vi) Data Visualization: Different types of plots are used for better understanding and evaluation of the dataset.
Step (vii) Feature Engineering: This is the process of creating new variables from existing data in order to improve model accuracy and simplify data transformations. In this case, the 'StandardScaler' function was used for feature scaling.
Step (viii) Train-test Split and Model Fitting: To assess the efficacy of a machine learning model, it is customary to split the dataset into a training set, which is used for fitting the model, and a test set, which is not used for training.
Step (ix) In this study, we apply various models to the dataset and assess their performance. The models are Random Forest, K-Nearest Neighbour, AdaBoost Classifier, and Logistic Regression.
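Steps (iv) to (ix) can be sketched end to end with pandas and scikit-learn. This is an illustration under stated assumptions, not the authors' code: the data is synthetic, the column names `MinTemp` and `MinHum` and the labelling rule are invented, the label encoder is assumed to be scikit-learn's `LabelEncoder`, and all hyperparameters are arbitrary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Synthetic stand-in for the chilli dataset; names and values are assumed.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "MinTemp": rng.integers(15, 26, n).astype(float),
    "MaxTemp": rng.integers(25, 41, n).astype(float),
    "MinHum": rng.integers(30, 61, n).astype(float),
    "MaxHum": rng.integers(60, 96, n).astype(float),
})
df["Problems"] = np.where(df["MaxTemp"] > 33, "thrips", "aphids")  # toy rule
df.loc[::25, "MaxHum"] = np.nan  # inject a few gaps to impute

# Step (iv): count missing values, then fill each gap with the column mode.
missing_before = int(df.isnull().sum().sum())
for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])

# Step (v): encode the categorical target as integers.
encoder = LabelEncoder()
y = encoder.fit_transform(df["Problems"])
X = df[["MinTemp", "MaxTemp", "MinHum", "MaxHum"]]

# Step (viii): hold out a test set, then scale with training statistics only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Step (ix): fit each model and record its test accuracy.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "K Nearest Neighbour": KNeighborsClassifier(n_neighbors=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
print(scores)
```

The sketch fits the scaler after the split so that test data does not influence the scaling; this is a design choice of the example, and the order of steps (vii) and (viii) in the study may differ.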

Accuracy Assessment
Accuracy in classification problems is expressed as a percentage, derived by dividing the number of accurate predictions by the total number of predictions.
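As a worked example of this ratio, the sketch below computes accuracy for ten made-up predictions, assuming integer-encoded class labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical labels: nine of the ten predictions are correct.
y_true = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]
y_pred = [0, 1, 2, 1, 0, 2, 1, 0, 1, 1]
print(accuracy(y_true, y_pred))
```

Multiplying the result by 100 gives the percentage form.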

Logistic Regression 52
Random Forest has the highest accuracy, 0.90, compared with Logistic Regression and K Nearest Neighbour, as shown in Table 1. Figure 5 gives a visual depiction of the accuracy.

Evaluation of MSE
MSE quantifies how closely predictions match a given set of data points and is calculated by averaging the squared errors. Table 2 lists the MSE of the machine learning techniques employed. The MSE is 0.37 for Random Forest and 1.01 for K Nearest Neighbour. The MSE values obtained are plotted in Figure 6.

Evaluation of Mean Absolute Error (MAE)
MAE is the average absolute difference between the predicted and the observed values (Table 3). Random Forest shows the lowest mean absolute error, 0.15, and K Nearest Neighbour the highest, 0.62. The mean absolute error obtained is plotted in Figure 7.

Evaluation of Root Mean Squared Error (RMSE)
RMSE is the square root of the MSE. Random Forest shows the lowest root mean squared error, 0.61, and K Nearest Neighbour the highest, 1.008. The root mean squared error obtained is plotted in Figure 8.
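The three error measures reported in this section can be computed side by side. The sketch below assumes integer-encoded labels and uses made-up predictions; the function name `error_metrics` is hypothetical.

```python
import math


def error_metrics(y_true, y_pred):
    """Return MSE, MAE and RMSE for paired true and predicted values."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse)}


# Hypothetical encoded labels: one prediction is off by 2, one by 1.
y_true = [0, 1, 2, 1, 0, 2, 1, 0]
y_pred = [0, 1, 0, 1, 0, 2, 2, 0]
metrics = error_metrics(y_true, y_pred)
print(metrics)
```

Squaring penalises the error of size 2 more heavily than the error of size 1, which is why the MSE (0.625) exceeds the MAE (0.375) in this example.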

Conclusion and future scope
This research work has produced a useful model to aid agriculturalists. Finding an appropriate dataset with which to train and evaluate the system using IoT sensors is the first step. Next, the data must be processed so that useful insights can be gleaned and predictions made to aid farmers. The model predicts pests and diseases and notifies farmers, using the Random Forest Classifier, the AdaBoost Classifier, K Nearest Neighbour, and Logistic Regression. Random Forest is the most accurate model. Eventually, this model will be used to build a web application using pickle and Flask for widespread release. Pest identification, weather forecasting, crop recommendations, transplantation suggestions, and crop yield prediction are just some of the agricultural issues that have been addressed.

Smart farming procedures are depicted in Figure 1 below.

Figure 1: Smart Farming

Regression and classification are two examples of supervised learning. Discovering patterns in data without labelled outputs is what unsupervised learning is all about; dimensionality reduction and clustering are two such examples.

Figure 8: RMSE

Using accuracy, MSE, MAE, and RMSE as criteria, Random Forest was found to outperform the other machine learning models.

Table 1: Accuracy Evaluation Table