Smartagb: Aboveground Biomass Estimation of Sorghum Based on Spatial Resolution, Machine Learning and Vegetation Index

This work explores the feasibility of estimating the aboveground biomass (AGB) of sorghum from multispectral images captured by unmanned aerial vehicles (UAVs), clarifies the quantitative relationship between vegetation indexes and sorghum AGB at different spatial resolutions, and builds AGB estimation models based on UAV multispectral images and vegetation indexes at those resolutions. Combining spatial resolution, vegetation indexes, and machine learning, the models are trained on a training set and verified on a verification set to select the best prediction model for each spatial resolution. The best prediction models at all three spatial resolutions are classic machine learning models: 1) at a spatial resolution of 0.017m, the random forest model achieves R²=0.8961, MAE=26.4340, and RMSE=32.2459; 2) at a spatial resolution of 0.024m, the Lasso model achieves R²=0.8826, MAE=31.106, and RMSE=40.2937; 3) at a spatial resolution of 0.030m, the decision tree model achieves R²=0.8568, MAE=30.3373, and RMSE=40.8082; and 4) the accuracy of the best model decreases as the spatial resolution becomes coarser. The results show that combining spatial resolution, vegetation indexes, and machine learning algorithms is an effective, fast, and accurate prediction method.


Introduction
Aboveground biomass (AGB) is an important parameter of crop growth: it is directly related to the final yield and directly reflects how the crop is growing [1,2,3]. A rapid and accurate method for monitoring AGB therefore makes it possible to track crop growth in time and to predict yield, which is of great significance to agricultural management [1,4]. However, traditional methods for measuring AGB are not only inefficient but also damage the crop during measurement, so a data acquisition method applicable to large areas is needed. Zhang Xianxian et al. [18] built an AGB prediction model for winter wheat by combining vegetation indexes with machine learning and neural networks based on UAV digital images, and the results showed that machine learning performed well. Cheng Yifeng et al. [19] built a multivariate compound yield estimation model from the relationship between LAI, the normalized difference vegetation index (NDVI), and the actual yield of cotton at the flowering and boll stage, and used it to predict cotton yield in northern Xinjiang. Cui Rixian et al. [20] estimated the AGB of winter wheat from UAV digital images using neural networks and regression combined with vegetation indexes, and the results showed that the neural network performed better. Tao Huilin et al. [21] combined machine learning with vegetation indexes, based on UAV digital imagery, to estimate the AGB of winter wheat and obtained good prediction results. These studies show that UAV technology can effectively monitor crop AGB. At present, however, few studies have reported on estimating sorghum AGB by coupling UAV multispectral image data at different spatial resolutions with machine learning models.
Therefore, this study obtains UAV multispectral image data at different spatial resolutions, estimates sorghum AGB by combining spatial resolution, vegetation indexes, and machine learning, and compares the generalization ability and prediction accuracy of the resulting models. The aim is to identify suitable prediction methods and to provide new theoretical support and technical means for data collection, phenotype monitoring, intelligent planting management, and yield estimation of sorghum.

Test Zone Overview
The study area is located in Wujiabao Village, Taigu District, Jinzhong City, Shanxi Province (112°30′51″E, 37°26′41″N), as shown in Figure 1. The altitude of this area is about 795 to 805m, and the average annual frost-free period is 160 to 190 days. The annual average temperature is 10.6℃, and the annual precipitation is 400mm to 600mm, falling mainly from July to August. The annual average sunshine duration is 1810 to 2100 hours, which is suitable for sorghum growth.

Experimental Design
The tested sorghum variety was "Jinza No. 22". It was sown by manual spot sowing on April 25, 2021, with a row spacing of 0.25 to 0.3m and a plant spacing of 0.2m, and it was harvested on October 13, 2021. Conventional field management was applied throughout. Because this study targets real-time monitoring of sorghum growth under conventional planting, quadrats were set up in a way that ensured the normal growth of the crop. Each quadrat had an area of 1.0m². White PVC pipe with a diameter of 5cm was used as the quadrat frame, and the aboveground part of the frame was 2.7m high, as shown in Figure 2.

Multispectral Image Acquisition by UAV
The UAV multispectral system consists of the UAV platform and a multispectral sensor. The UAV platform is a DJI (Dajiang) Phantom 4 Pro four-axis UAV, which consists of a flight control system, power supply system, stabilized gimbal, remote control, and display. The UAV system is shown in Figure 3, and the UAV parameter specifications are shown in Table 1. The multispectral sensor is a MicaSense RedEdge MX multispectral camera, which is light and easy to use. It has five spectral bands (blue, green, red, red edge, and near-infrared) and is equipped with an SD memory card to store the remote-sensing image data. The multispectral sensor is shown in Figure 3, and the parameters of the multispectral camera are shown in Table 2 and Table 3. To ensure the accuracy and effectiveness of image acquisition, the image data were acquired on sunny, windless, and cloudless days, during a period of moderate light intensity and stable radiation intensity.
Before the UAV took off, the relevant parameters were set in advance: the flight speed was 2m/s, and the flight altitudes were 25m, 35m, and 45m. The corresponding spatial resolutions are shown in Table 4. Both the forward (along-track) overlap rate and the side (between flight lines) overlap rate were 80%. The planned flight route extended beyond the study area. Because of the large aerial photography range and the high accuracy requirements, the UAV used its autonomous aerial photography mode to plan the route and take vertical photos of the study area, with the shooting interval of the multispectral camera set to 2s. Before each mission, the multispectral sensor was operated manually and calibrated in preparation for later image correction.
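The link between flight altitude and spatial resolution can be illustrated with the usual nadir-camera relation, ground sample distance = altitude × pixel pitch / focal length. The sketch below is only an illustration: the pixel pitch and focal length are placeholder assumptions, not values taken from Table 2 or Table 3, and the function name is hypothetical. The spatial resolutions actually used in this study are those listed in Table 4.

```python
def ground_sample_distance(altitude_m, pixel_pitch_um, focal_length_mm):
    """Ground sample distance (metres per pixel) for a camera pointing straight down."""
    pixel_pitch_m = pixel_pitch_um * 1e-6
    focal_length_m = focal_length_mm * 1e-3
    return altitude_m * pixel_pitch_m / focal_length_m


# ASSUMED sensor parameters, for illustration only (not from the paper's tables).
ASSUMED_PIXEL_PITCH_UM = 3.75
ASSUMED_FOCAL_LENGTH_MM = 5.4

for altitude in (25, 35, 45):  # flight altitudes used in the study, in metres
    gsd = ground_sample_distance(
        altitude, ASSUMED_PIXEL_PITCH_UM, ASSUMED_FOCAL_LENGTH_MM
    )
    print(f"altitude {altitude} m -> about {gsd:.4f} m per pixel")
```

Because the relation is linear in altitude, flying higher makes each pixel cover proportionally more ground, which is why the higher flights correspond to the coarser resolutions.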

Image Processing
In this study, Agisoft PhotoScan software is used to preprocess the multispectral images, and ArcGIS 10.7 is used to extract the reflectance of each band. First, the images are screened one by one before stitching: damaged images and images outside the study area, including those taken during UAV takeoff and landing, are removed, and only the image data of the test area are retained. The preprocessed image data are then imported into ArcGIS, and the reflectance values of the five spectral bands are extracted for each pixel.

Determination of Aboveground Biomass
Biomass is the most important indicator of a crop's ability to obtain energy, so measuring crop biomass is of great significance for studying crop phenotype [1,22]. In this study, sorghum plants with similar growth status, both around and inside each quadrat, were selected, and each selected plant was destructively sampled whole. After enzyme deactivation in an oven at 105℃ for 30 minutes, the plants were dried at 80℃ for more than 24 hours until the mass of each part was constant. This constant mass is the dry matter of the sample, which was then converted into AGB per unit area [23,24].
EAI Endorsed Transactions on Internet of Things 01 2023 -04 2023 | Volume 9 | Issue 1 | e1

Model Evaluation
The correlation between each vegetation index and AGB is expressed by the correlation coefficient R, and the models are evaluated by the coefficient of determination (R²), the mean absolute error (MAE), and the root mean square error (RMSE). The closer R² is to 1, the better the model predicts. The smaller the MAE and RMSE, the better the agreement between the predicted and measured values, that is, the more accurate the model verification results [25,26,27]. R², MAE, and RMSE are calculated as follows:

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (1)$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \quad (2)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2} \quad (3)$$

In the above formulas, $n$ is the number of model samples, $y_i$ is the measured value, $\bar{y}$ is the mean of the measured values, $\hat{y}_i$ is the estimated value, and the unit of RMSE is g/m².
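As a minimal sketch, the three evaluation metrics can be computed in plain Python. It assumes the standard definitions (R² as one minus the ratio of the residual sum of squares to the total sum of squares, MAE as the mean absolute residual, RMSE as the square root of the mean squared residual). The function names and the sample values are hypothetical and are not data from this study.

```python
import math


def r2_score(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in measured)
    return 1.0 - ss_res / ss_tot


def mae(measured, predicted):
    """Mean absolute error, in the same unit as the measurements."""
    return sum(abs(y - p) for y, p in zip(measured, predicted)) / len(measured)


def rmse(measured, predicted):
    """Root mean square error, in the same unit as the measurements."""
    squared = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


# Hypothetical AGB values in g/m^2, for illustration only.
measured = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 320.0, 380.0]

print(r2_score(measured, predicted), mae(measured, predicted), rmse(measured, predicted))
```

MAE weights all residuals equally, whereas RMSE penalizes large residuals more heavily, so reporting both gives a fuller picture of the error distribution.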

Vegetation Index
Based on previous research results, 10 vegetation indexes with the potential to estimate AGB were selected. The values of the five spectral bands (blue, green, red, red edge, and near-infrared) are extracted from the UAV multispectral images, and from them the following indexes are calculated: the normalized difference vegetation index (NDVI), normalized difference red edge index (NDRE), green normalized difference vegetation index (GNDVI), perpendicular vegetation index (PVI), green optimized soil adjusted vegetation index (GOSAVI), chlorophyll vegetation index (CVI), modified triangular vegetation index (MTVI), modified green red vegetation index (MGRVI), normalized green-blue difference index (NGBDI), and visible-band difference vegetation index (VDVI). Detailed information on these vegetation indexes is given in Table 5.
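The following sketch shows how several of these indexes can be computed from per-band reflectance. It uses the commonly published formulas for six of the ten indexes; the definitions in Table 5 remain the authoritative ones for this study, and the remaining indexes (PVI, GOSAVI, CVI, MTVI) are omitted because they depend on coefficients that are not restated in the text. The function names and reflectance values are hypothetical.

```python
def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b)."""
    return (a - b) / (a + b)


def vegetation_indices(blue, green, red, red_edge, nir):
    """Compute six common vegetation indexes from band reflectances (0 to 1)."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDRE": normalized_difference(nir, red_edge),
        "GNDVI": normalized_difference(nir, green),
        "NGBDI": normalized_difference(green, blue),
        "MGRVI": normalized_difference(green ** 2, red ** 2),
        "VDVI": (2 * green - red - blue) / (2 * green + red + blue),
    }


# Hypothetical reflectance values for one vegetated pixel, for illustration only.
sample = vegetation_indices(blue=0.04, green=0.08, red=0.05, red_edge=0.30, nir=0.45)
for name, value in sample.items():
    print(f"{name}: {value:.4f}")
```

Note that NGBDI, MGRVI, and VDVI use only the visible bands, while NDVI, NDRE, and GNDVI also rely on the near-infrared band.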

Model Selection
In this study, the unary linear regression (ULR) model and multiple machine learning models were used to predict the AGB of sorghum. The machine learning models are ridge regression and the least absolute shrinkage and selection operator (Lasso), both generalized linear regression methods; the support vector machine (SVM); the decision tree (DT); the random forest (RF), a bagging algorithm; and AdaBoost, a boosting algorithm. Ridge regression (Ridge) is an improved, biased estimation regression method based on least squares estimation. In this study, we choose the ridge regression machine learning model based on generalized linear regression.
Lasso is a shrinkage (compression) estimation method. By forcing the sum of the absolute values of the regression coefficients to be less than a fixed value, it shrinks some coefficients exactly to zero. It is also a biased estimate. In this study, the Lasso model based on generalized linear regression is selected. The support vector machine (SVM) is one of the common kernel learning methods and can perform nonlinear classification and regression. After parameter optimization, the SVM regression model with a linear kernel is selected in this study.
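The difference between the ridge and Lasso penalties can be illustrated with the simplest possible case: one centered predictor and no intercept, where both estimators have a closed form. This is a sketch under stated assumptions, not the configuration used in this study: the objectives are taken as (1/2n)·Σ(y − βx)² plus either (α/2)·β² for ridge or α·|β| for Lasso, library implementations may scale the penalty differently, and the data and function names are hypothetical.

```python
import math


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def one_feature_estimates(x, y, alpha):
    """Return (OLS, ridge, Lasso) slopes for one centered predictor, no intercept."""
    n = len(x)
    s_xx = sum(v * v for v in x) / n
    s_xy = sum(a * b for a, b in zip(x, y)) / n
    ols = s_xy / s_xx
    ridge = s_xy / (s_xx + alpha)              # coefficient shrinks but stays nonzero
    lasso = soft_threshold(s_xy, alpha) / s_xx  # coefficient can become exactly zero
    return ols, ridge, lasso


# Hypothetical centered data, for illustration only.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [-1.0, -0.4, 0.1, 0.5, 0.8]

for alpha in (0.5, 1.0):
    ols, ridge, lasso = one_feature_estimates(x, y, alpha)
    print(f"alpha={alpha}: OLS={ols:.3f}, ridge={ridge:.3f}, Lasso={lasso:.3f}")
```

With several correlated vegetation indexes as inputs, this zeroing behaviour is what lets Lasso act as a variable selector, whereas ridge keeps every index with a reduced weight.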
The decision tree (DT) is a tree-structured model that recursively splits the samples on feature thresholds and predicts from the samples that fall in each leaf node. In machine learning, the decision tree is the basis and a necessary component of many other algorithms.
Random forest (RF) is a machine learning algorithm composed of multiple decision trees. Each input sample is passed through every trained tree, and the final result combines the outputs of the trees, by voting for classification or by averaging for regression. It is a classical bagging algorithm, and bagging is a very important ensemble learning technique.
AdaBoost is an iterative algorithm whose core idea is to combine multiple weak learners into a strong learner. It is a classical, effective, and practical boosting algorithm, and boosting is another important ensemble learning technique.

Data Processing
During the half-year-long experiment, 146 valid samples were obtained in total. The data set was divided into a training set and a verification set, with the verification set accounting for 30%. The Python programming language was used for data set partitioning, data processing, and analysis.
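A 70/30 partition of 146 samples can be sketched as follows. The random seed, the shuffling strategy, and the rounding rule for the verification-set size are assumptions made for illustration; the text does not state how the split was drawn, and the function name is hypothetical.

```python
import random


def train_validation_split(samples, validation_fraction=0.3, seed=0):
    """Randomly split samples into (train, validation) lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumed)
    n_validation = round(len(samples) * validation_fraction)
    validation_idx = set(indices[:n_validation])
    train = [s for i, s in enumerate(samples) if i not in validation_idx]
    validation = [s for i, s in enumerate(samples) if i in validation_idx]
    return train, validation


# Stand-in sample identifiers; the real samples pair vegetation indexes with AGB.
samples = list(range(146))
train, validation = train_validation_split(samples)
print(len(train), len(validation))
```

Fixing the seed keeps the same samples in the verification set across models, so that differences in R², MAE, and RMSE reflect the models rather than the partition.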

Model Construction and Verification under the Different Spatial Resolutions
The Spatial Resolution is 0.017m
At this spatial resolution, the flight height of the UAV is 25m, and the vegetation index with the highest correlation with AGB is NGBDI, with a correlation coefficient of 0.80. Using NGBDI as the independent variable and AGB as the dependent variable, the ULR model is established on the training set and verified on the verification set. The fitting equation obtained is shown in Formula 4.
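The two steps described here, picking the vegetation index most correlated with AGB and fitting a unary linear regression on it, can be sketched in plain Python. The index values and AGB values below are hypothetical, so the resulting slope and intercept are not those of Formula 4; the function names are also assumptions.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def fit_ulr(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


# Hypothetical training data (AGB in g/m^2), for illustration only.
agb = [120.0, 180.0, 260.0, 310.0, 380.0]
candidates = {
    "NGBDI": [0.10, 0.15, 0.20, 0.25, 0.30],
    "NDVI": [0.70, 0.60, 0.75, 0.72, 0.80],
}

# Select the index with the largest absolute correlation, then fit the ULR model.
best_index = max(candidates, key=lambda name: abs(pearson_r(candidates[name], agb)))
slope, intercept = fit_ulr(candidates[best_index], agb)
print(best_index, slope, intercept)
```

The fitted line is then applied to the verification samples, and its predictions are scored with R², MAE, and RMSE.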
The values of R², MAE, and RMSE of the ULR model are calculated, as shown in Table 6. The fitting equation (4) is also used to predict the samples in the verification set, and the comparison between the predicted and actual values is shown in Figure 4. The machine learning algorithms described above are trained on the training set, and the resulting models are evaluated on the verification set. The specific results are shown in Table 7. The trained machine learning models are used to predict the samples of the verification set, and the comparison between the predicted and actual values is shown in Figure 5. The random forest model performs best, with an R² of 0.8961, an MAE of 26.4340, and an RMSE of 32.2459. At this spatial resolution, therefore, the random forest regression model has the best prediction effect among the classical machine learning models.
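The train-then-verify comparison across models can be sketched with scikit-learn. This is an illustrative sketch rather than the code used in this study: the data are synthetic stand-ins for the ten vegetation indexes and AGB, and the hyperparameter grids, cross-validation setting, and random seeds are all assumptions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: 146 samples, 10 features (NOT the study's data).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(146, 10))
y = 400.0 * X[:, 0] + 150.0 * X[:, 1] + rng.normal(0.0, 20.0, size=146)

# 30% of the samples are held out as the verification set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Each model is paired with an ASSUMED hyperparameter grid for the grid search.
candidates = {
    "Ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "Lasso": (Lasso(max_iter=10000), {"alpha": [0.01, 0.1, 1.0]}),
    "SVM": (SVR(kernel="linear"), {"C": [1.0, 10.0, 100.0]}),
    "DT": (DecisionTreeRegressor(random_state=0), {"max_depth": [3, 5, None]}),
    "RF": (RandomForestRegressor(random_state=0), {"n_estimators": [50, 100]}),
    "AdaBoost": (AdaBoostRegressor(random_state=0), {"n_estimators": [50, 100]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Tune on the training set only, then score once on the verification set.
    search = GridSearchCV(estimator, grid, cv=5, scoring="r2")
    search.fit(X_train, y_train)
    pred = search.predict(X_val)
    results[name] = {
        "R2": r2_score(y_val, pred),
        "MAE": mean_absolute_error(y_val, pred),
        "RMSE": float(np.sqrt(mean_squared_error(y_val, pred))),
    }

best = max(results, key=lambda name: results[name]["R2"])
for name, scores in results.items():
    print(name, {k: round(v, 4) for k, v in scores.items()})
print("best by R2:", best)
```

Because the synthetic target is linear in two features, the ranking printed here says nothing about which model suits real sorghum data; only the workflow carries over.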

The Spatial Resolution is 0.024m
At this spatial resolution, the flight height of the UAV is 35m, and the vegetation index with the highest correlation with AGB is NGBDI, with a correlation coefficient of 0.82. With NGBDI as the independent variable and AGB as the dependent variable, a unary linear regression (ULR) model is established on the training set and verified on the verification set. The fitting equation obtained is shown in Formula 5. The values of R², MAE, and RMSE of the ULR model are shown in Table 8. The fitting equation (5) is also used to predict the samples in the verification set, and the comparison between the predicted and actual values is shown in Figure 6. The machine learning algorithms described above are trained on the training set, and the resulting models are evaluated on the verification set. The specific results are shown in Table 9. The trained machine learning models are used to predict the samples of the verification set, and the comparison between the predicted and actual values is shown in Figure 7. The Lasso model performs best, with an R² of 0.8826, an MAE of 31.1061, and an RMSE of 40.2937. At this spatial resolution, therefore, the Lasso regression model has the best prediction effect among the classical machine learning models.

The Spatial Resolution is 0.030m
At this spatial resolution, the flight height of the UAV is 45m, and the vegetation index with the highest correlation with AGB is MGRVI, with a correlation coefficient of 0.82. Using MGRVI as the independent variable and AGB as the dependent variable, a unary linear regression (ULR) model is established on the training set and verified on the verification set. The fitting equation obtained is shown in Formula 6.
The values of R², MAE, and RMSE of the ULR model are shown in Table 10. The fitting equation (6) is also used to predict the samples in the verification set, and the comparison between the predicted and actual values is shown in Figure 8. The machine learning algorithms described above are trained on the training set, and the resulting models are evaluated on the verification set. The specific results are shown in Table 11. The trained machine learning models are used to predict the samples of the verification set, and the comparison between the predicted and actual values is shown in Figure 9. The best model is the DT model, with an R² of 0.8568, an MAE of 30.3373, and an RMSE of 40.8082. At this spatial resolution, therefore, the DT regression model has the best prediction effect among the classical machine learning models.
The unary linear regression model was compared with the classic machine learning models, and the models with the best prediction effect on sorghum AGB at each spatial resolution are summarized in Table 12. In general, at a given spatial resolution, the machine learning models predict AGB more accurately than the linear regression model. Moreover, as the spatial resolution becomes coarser (that is, as the UAV flight altitude increases), the accuracy of the best AGB prediction model also declines.

Results and Discussion
Machine learning algorithms offer high flexibility and computational efficiency and have been widely used in AGB modeling and prediction. In this study, the optimal prediction model at each of the three spatial resolutions is a machine learning model, and the R² of each is higher than that of the corresponding linear regression model.
Model accuracy decreases as the spatial resolution becomes coarser for two reasons. First, as the UAV flight altitude increases, the spatial resolution decreases, and with it the ability to capture and resolve detail of the ground crops. Second, at coarser spatial resolution, the acquisition of multispectral images is more vulnerable to background interference such as clouds, which introduces more noise and error into the data.
Although this study further verifies the potential of machine learning algorithms that combine spatial resolution with vegetation indexes to predict AGB, it has several deficiencies. First, the grid search method is used to tune the model parameters automatically, but this method still has shortcomings; in the future, AI-based automatic optimization and parameter adjustment algorithms could be used to tune the models. Second, the field operating environment is unstable and difficult to structure, which leaves certain deficiencies in the collected test data. Third, the observation times are discontinuous, the observations cover only one season, and relatively few samples were collected; available open-source data will be integrated, or more data will be collected, for further correction and verification. Fourth, the research object is only one variety of sorghum, so the target is relatively narrow. Extending the method to other sorghum varieties, and to other crops such as corn and millet, still requires further experiments, discussion, and study.

Conclusion
AGB refers to the total amount of organic matter per unit area in a certain period. It is an important indicator for crop growth monitoring. It not only represents the quality of crop planting but also represents the efficiency of photosynthesis and the accumulation of photosynthetic substances. It is an important basic condition for yield formation [38,39].
At present, AGB prediction models rely mainly on field-measured data for verification and evaluation. In this study, the data set is used for training, modeling, and verification by combining spatial resolution with vegetation indexes, based on random forest, Lasso regression, decision tree, and other machine learning algorithms. The main conclusions are as follows: 1) When the spatial resolution is 0.017m, the random forest algorithm is the best model for predicting AGB, with R²=0.8961, MAE=26.4340, and RMSE=32.2459; 2) When the spatial resolution is 0.024m, the Lasso algorithm is the best model for predicting AGB, with R²=0.8826, MAE=31.106, and RMSE=40.2937; 3) When the spatial resolution is 0.030m, the decision tree algorithm is the best model for predicting AGB, with R²=0.8568, MAE=30.3373, and RMSE=40.8082; 4) Combining spatial resolution and vegetation indexes with a classical machine learning model yields AGB prediction models of higher accuracy than the linear regression model. Overall, the combination of spatial resolution, vegetation indexes, and machine learning algorithms is a robust, fast, and accurate prediction method.