Kriging interpolation model: The problem of predicting the number of deaths due to COVID-19 over time in Vietnam

The COVID-19 pandemic can be considered a human disaster that has claimed the lives of many people. We only know the number of deaths due to COVID-19 through government statistics, but on days when there are no statistics, how do we know whether people died that day or not? This study aims to predict the number of new deaths per day due to COVID-19 in Vietnam on days when observational data are not available, and to predict the number of deaths in the future. The study used COVID-19 data from the World Health Organization (WHO). A total of 260 days were collected, and the author processed and standardized the data. Based on the available data, the author uses the Kriging interpolation statistical method to build a forecast model. As a result, the author has selected a highly reliable prediction model suited to the data set: the regression coefficient and correlation coefficient are close to 1, and the error between the model's predictions and the data is small. On some days the prediction error is almost zero. The study has built a future forecast map of the number of new deaths per day due to COVID-19. The article concludes that applying the Kriging statistical method is appropriate for COVID-19 data. This research opens up new research directions for related fields such as earthquakes, mining, groundwater, environment, etc.


Introduction
The COVID-19 pandemic is a matter of great concern around the world. It is an infectious disease pandemic caused by the SARS-CoV-2 virus and its variants that is taking place on a global scale. The first outbreak occurred in late December 2019 in Wuhan city in central China, among a group of people with pneumonia of unknown cause. On March 11, 2020, the World Health Organization (WHO) declared COVID-19 a global pandemic.
Governments around the world have responded to protect the health of people and communities with measures including: restricting movement, imposing lockdowns and quarantine, declaring a state of emergency, using curfews, implementing social distancing, canceling mass events, closing schools and non-essential business and service establishments, encouraging people to raise their own awareness of prevention, wear a mask, and limit going out when not necessary, and at the same time moving business, study and work from traditional to online models. The worldwide effects of the COVID-19 pandemic include loss of life and economic and social instability.
Worldwide, there have been many studies related to COVID-19. These often focus on issues such as research into the origin of transmission [12]. The Kriging statistical method has also been applied to temporal forecasting in several studies, such as: using geostatistics to analyze spatial and temporal variations of groundwater levels [1], [6]; analysis of the spatio-temporal variation of groundwater levels using geostatistics [7]; using the CoKriging method to predict air pollution [10], [11]; application of geostatistics to forecast DO pollutants in water [9]; and analyzing incomplete spatial data in air pollution prediction [8].
In Vietnam, in response to the outbreak of the COVID-19 pandemic, the state has taken measures to control the epidemic, such as limiting mass gatherings, restricting travel, social distancing, wearing masks, disinfection and vaccination. Until now, there have been no domestic or international studies on the problem of predicting the number of new deaths per day due to COVID-19; the latest research stops at estimating the number of infections. Therefore, in this study, the author introduces the Kriging statistical method and its application in predicting the number of new deaths per day due to COVID-19. The research objectives are:
1. Build models based on measured data.
2. Forecast the number of new deaths per day due to COVID-19 on days without statistics.
3. Use the Kriging statistical method to forecast the number of new deaths per day due to COVID-19 in Vietnam.

Kriging
In geostatistics, Kriging, also known as Gaussian process regression, is a method of interpolation based on Gaussian processes governed by prior covariances. Under suitable assumptions on the prior, Kriging gives the best linear unbiased prediction (BLUP) at unsampled locations [3].
Interpolation methods based on other criteria such as smoothness (e.g., smoothing splines) may not yield the BLUP. The method is widely used in the domains of spatial analysis and computer experiments.
The theoretical basis for the method was developed by the French mathematician Georges Matheron in 1960, based on the master's thesis of Danie G. Krige, the pioneering plotter of distance-weighted average gold grades at the Witwatersrand reef complex in South Africa. Krige sought to estimate the most likely distribution of gold based on samples from a few boreholes.
In general, previous studies have mainly focused on building spatial predictive models using the Co-Kriging method, but no studies have built predictive models for time series.

Gaussian process
A Gaussian process is a stochastic process (a collection of random variables indexed by time or space) such that every finite set of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables [2].
In this study, I applied the Kriging method as a management and decision support tool to analyze the temporal variations of the COVID-19 epidemic, in order to gain a better picture of the COVID-19 death toll over a long period of time.

Research Questions:
With the aim of understanding how the number of deaths due to the COVID-19 pandemic changed over time, the study sought to answer the following questions:
1. Can the number of deaths be predicted for days without statistics?
2. Will the number of deaths from COVID-19 be high in the future?
3. Is the forecasting model reliable enough?
4. Is the COVID-19 epidemic in Vietnam really under control?

Data Collection and Analysis
The data for this study are COVID-19 data from the World Health Organization (WHO), with a total of 880 days collected (July 31, 2020 to December 29, 2022). The convenience sampling method was used with a 95% confidence level and a 5% margin of error, and after removing invalid values, we obtained a sample of 279 values collected on consecutive days from July 26, 2021 to April 30, 2022 (Table 1). Of these, 259 days of observed data were included in model building; for the remaining 20 days, data were not available (NA). An overview of the COVID-19 death toll data is given in Table 2.
The author then processes and normalizes the data; the histograms are shown in Figure 1. In Figure 1, the left histogram, before normalization, is skewed to the left, while the right histogram, after normalization, is more balanced.
The experimental variogram is computed from the observations following [1], that is:
$$\gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ Z(u_i) - Z(u_i + h) \right]^2$$
where $h$ is the distance between points in space, $Z(u_i)$ and $Z(u_i + h)$ are the observed values at positions $u_i$ and $u_i + h$, and $N(h)$ is the number of pairs of points separated by $h$.
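As an illustration of how an experimental variogram can be computed for a daily series with missing days, here is a minimal Python sketch. It is not the author's implementation: the function name `experimental_variogram`, the lag tolerance `tol`, and the toy data are assumptions made for this example, and days marked NA are simply dropped before pairs are formed.

```python
import numpy as np

def experimental_variogram(t, z, lags, tol):
    """Semivariance gamma(h) for each lag h; NaN observations are ignored."""
    t = np.asarray(t, dtype=float)
    z = np.asarray(z, dtype=float)
    keep = ~np.isnan(z)                      # drop days with no data (NA)
    t, z = t[keep], z[keep]
    i, j = np.triu_indices(len(t), k=1)      # every pair of days, counted once
    dist = np.abs(t[i] - t[j])               # separation h between the two days
    sqdiff = (z[i] - z[j]) ** 2              # [Z(u_i) - Z(u_i + h)]^2
    gamma = np.full(len(lags), np.nan)
    counts = np.zeros(len(lags), dtype=int)
    for k, h in enumerate(lags):
        in_bin = np.abs(dist - h) <= tol     # pairs whose separation is close to h
        counts[k] = in_bin.sum()             # N(h)
        if counts[k] > 0:
            gamma[k] = sqdiff[in_bin].sum() / (2.0 * counts[k])
    return gamma, counts

# Toy series (hypothetical values): day 2 has no statistics.
days = np.arange(6)
deaths = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
gamma, counts = experimental_variogram(days, deaths, lags=[1, 2], tol=0.5)
print(gamma, counts)
```

On real data the lag spacing and tolerance would be chosen so that each lag bin contains enough pairs for a stable estimate.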
According to the second-order stationarity conditions [13], [4], one obtains a constant mean, $E[Z(u)] = m$, and the covariance $C(h) = E[Z(u)\,Z(u + h)] - m^2$, which depends only on the separation $h$.
In geostatistics, the four most commonly used variogram models are Linear, Spherical, Exponential and Gaussian. Based on the experimental variogram, a variogram model that fits the data is selected using a technique called cross-validation.
The cross plot between the estimated values and the actual values gives the coefficient r². The best-fit variogram model is the one with the highest r², approximately equal to 1.
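The four variogram model families named above can be written as short functions. The sketch below uses one common parameterization, which is an assumption here because the text does not state the exact formulas: `nugget` is the value approached as the lag goes to zero, `psill` is the partial sill (sill minus nugget), and `rng` is the range parameter. All function and parameter names are illustrative.

```python
import numpy as np

def linear(h, nugget, slope):
    """Linear model: semivariance grows without bound."""
    return nugget + slope * np.asarray(h, dtype=float)

def spherical(h, nugget, psill, rng):
    """Spherical model: reaches the sill exactly at h = rng."""
    h = np.asarray(h, dtype=float)
    inside = nugget + psill * (1.5 * h / rng - 0.5 * (h / rng) ** 3)
    return np.where(h < rng, inside, nugget + psill)

def exponential(h, nugget, psill, rng):
    """Exponential model: approaches the sill asymptotically."""
    h = np.asarray(h, dtype=float)
    return nugget + psill * (1.0 - np.exp(-h / rng))

def gaussian(h, nugget, psill, rng):
    """Gaussian model: parabolic near the origin, asymptotic sill."""
    h = np.asarray(h, dtype=float)
    return nugget + psill * (1.0 - np.exp(-(h / rng) ** 2))

# Evaluate each model on a few lags (hypothetical parameter values).
h = np.array([0.0, 5.0, 10.0, 20.0])
sph = spherical(h, 0.1, 2.0, 10.0)
gau = gaussian(h, 0.1, 2.0, 10.0)
exp_ = exponential(h, 0.1, 2.0, 10.0)
lin = linear(h, 0.1, 0.2)
print(sph, gau, exp_, lin)
```

Other texts and software scale the range differently (for example with a factor of 3 in the exponent), so fitted range values are only comparable within one convention.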
The Kriging method is a group of geostatistical methods used to interpolate the value of a random field at an unknown point from known values at neighboring points [13]. There are two main types: Simple Kriging and Ordinary Kriging. Simple Kriging is the Kriging method for which the mean µ is known in advance; the estimator is as follows:
$$Z^{*}(u) = \mu + \sum_{i=1}^{n} \lambda_i \left[ Z(u_i) - \mu \right]$$
where $\lambda_i$ are the Kriging weights assigned to the $n$ neighboring observations $Z(u_i)$.
Ordinary Kriging is the Kriging method for an unknown mean, based on the hypothesis of an intrinsically stationary random function.
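To make the Ordinary Kriging idea concrete for a one-dimensional time axis, here is a minimal sketch under stated assumptions. It solves the standard Ordinary Kriging system in variogram form, with a Lagrange multiplier forcing the weights to sum to 1. The function names, the Gaussian variogram parameters, and the toy data are hypothetical and are not taken from the study.

```python
import numpy as np

def gaussian_variogram(h, nugget, psill, rng):
    """Gaussian variogram model with gamma(0) = 0 by definition."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-(h / rng) ** 2))
    return np.where(h == 0.0, 0.0, g)

def ordinary_kriging_1d(t_obs, z_obs, t_new, variogram):
    """Ordinary Kriging predictions and kriging variances at times t_new."""
    t_obs = np.asarray(t_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    n = len(t_obs)
    # Left-hand side: variogram between observations, bordered by ones
    # for the unbiasedness constraint (weights sum to 1).
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(np.abs(t_obs[:, None] - t_obs[None, :]))
    A[n, n] = 0.0
    preds, variances = [], []
    for t0 in np.atleast_1d(np.asarray(t_new, dtype=float)):
        b = np.ones(n + 1)
        b[:n] = variogram(np.abs(t_obs - t0))   # variogram to the target day
        sol = np.linalg.solve(A, b)
        weights, mu = sol[:n], sol[n]           # kriging weights, Lagrange multiplier
        preds.append(weights @ z_obs)
        variances.append(weights @ b[:n] + mu)  # ordinary kriging variance
    return np.array(preds), np.array(variances)

# Toy series (hypothetical): day 2 has no statistics and is predicted.
t_obs = [0, 1, 3, 4]
z_obs = [2.0, 3.0, 5.0, 4.0]
model = lambda h: gaussian_variogram(h, 0.0, 1.0, 2.0)
pred, var = ordinary_kriging_1d(t_obs, z_obs, [1.0, 2.0], model)
print(pred, var)
```

At an observed day the prediction reproduces the observation and the kriging variance is zero, which is a quick sanity check on any implementation of this system.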

Results
To check the anisotropy of the COVID-19 data, the author compares the variogram in several directions [5]. In this study, four main directions were used, namely 0°, 45°, 90°, and 135°, with an angular tolerance of ±45°, to determine anisotropy.
Figure 3 shows that, of the four models, the Gaussian model curve is in good agreement with the experimental data. Table 3 gives the results for the four models. The best model is chosen based on two important criteria: RSS (residual sum of squares) and r² (coefficient of determination). The RSS provides an accurate measure of how well the model fits the variogram data: the lower the RSS, the better the fit. Among the four models, the Gaussian model has the smallest RSS (26.2). The r² value also indicates how well the model fits the variogram data; however, r² is not as robust as the RSS value [8], [9], [10], [11].
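The two selection criteria can be sketched in a few lines of Python. The formulas below are the usual definitions of the residual sum of squares and the coefficient of determination, which is an assumption since the text does not spell them out. The candidate names and numbers are invented for illustration and are not the values in Table 3.

```python
import numpy as np

def rss(observed, fitted):
    """Residual sum of squares between experimental and model semivariances."""
    observed, fitted = np.asarray(observed, float), np.asarray(fitted, float)
    return float(np.sum((observed - fitted) ** 2))

def r_squared(observed, fitted):
    """Coefficient of determination, 1 - RSS / total sum of squares."""
    observed = np.asarray(observed, float)
    ss_tot = float(np.sum((observed - observed.mean()) ** 2))
    return 1.0 - rss(observed, fitted) / ss_tot

# Hypothetical experimental semivariances and two candidate model fits.
gamma_obs = np.array([1.0, 2.0, 3.0, 4.0])
candidates = {
    "model_a": np.array([1.0, 2.0, 3.0, 5.0]),
    "model_b": np.array([1.5, 2.0, 3.0, 4.0]),
}
scores = {name: (rss(gamma_obs, fit), r_squared(gamma_obs, fit))
          for name, fit in candidates.items()}
best = min(scores, key=lambda name: scores[name][0])  # smallest RSS wins
print(scores, best)
```

Ranking by RSS first and using r² as a secondary check mirrors the priority described in the paragraph above.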

Model Testing
Whether the selected model fits the data is evaluated using the regression coefficient and the correlation coefficient. The model test results are shown in Table 4 and Figure 4. The regression coefficient and the correlation coefficient are close to 1, and the standard error is approximately zero. The comparison between the estimated values and the actual values is shown in Figure 4. We conclude that the selected model is suitable and the error is small. The scatter plot of the New deaths parameter is shown in Figure 5.
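A minimal sketch of this kind of check is given below, assuming the regression coefficient is the slope of a least-squares line of actual values on estimated values and the correlation coefficient is Pearson's r. Both readings are assumptions, since the text does not define them, and the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical cross-validation output: model estimates and actual values.
estimated = np.array([1.0, 2.0, 3.0, 4.0])
actual = np.array([1.1, 1.9, 3.2, 3.8])

# Slope and intercept of the least-squares line actual = slope * estimated + intercept.
slope, intercept = np.polyfit(estimated, actual, 1)

# Pearson correlation coefficient between estimates and actual values.
r = np.corrcoef(estimated, actual)[0, 1]

# Root-mean-square error as a simple summary of the prediction error.
rmse = float(np.sqrt(np.mean((estimated - actual) ** 2)))

print(f"slope={slope:.3f}, r={r:.3f}, rmse={rmse:.3f}")
```

A slope and correlation near 1 together with a small error indicate that the estimates track the actual values without systematic over- or under-prediction.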

Discussion
Based on the maps in Figures 6 and 7, we see that the closer we get to the last data point, day 279 (April 30, 2022), the lower the number of deaths; by then there are almost no deaths due to COVID-19 in Vietnam. This is consistent with the fact that, after April 30, 2022, the COVID-19 situation in Vietnam has been strictly controlled.
We compare the Kriging method with other traditional methods such as the sample mean and inverse distance methods. The sample mean is the average of the sample values close to the location to be estimated. Inverse distance weighting (IDW) is a deterministic method for multivariate interpolation with a known set of scattered points. The values assigned to the unknown points are calculated as a weighted average of the available values at the known points. Table 5 shows the forecast results for the Kriging method alongside the traditional methods. The error between the predicted value and the actual value with the Kriging method is smaller than the error when using the IDW method.
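For comparison with Kriging, inverse distance weighting on a time axis can be sketched as follows. The function name `idw_1d`, the exponent `power=2.0`, and the toy series are assumptions for this example; the text does not state which exponent or neighborhood was used.

```python
import numpy as np

def idw_1d(t_obs, z_obs, t_new, power=2.0):
    """Inverse distance weighted estimate at each time in t_new."""
    t_obs = np.asarray(t_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    preds = []
    for t0 in np.atleast_1d(np.asarray(t_new, dtype=float)):
        d = np.abs(t_obs - t0)
        if np.any(d == 0.0):
            # Target coincides with an observed day: return that observation.
            preds.append(z_obs[d == 0.0][0])
            continue
        w = 1.0 / d ** power                 # closer days get larger weights
        preds.append(np.sum(w * z_obs) / np.sum(w))
    return np.array(preds)

# Toy series (hypothetical): day 2 has no statistics and is estimated.
t_obs = [0, 1, 3, 4]
z_obs = [2.0, 3.0, 5.0, 4.0]
idw_pred = idw_1d(t_obs, z_obs, [1.0, 2.0])
print(idw_pred)
```

Unlike Kriging, the weights here depend only on distance and not on a fitted variogram, so IDW provides no estimate of prediction variance.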
The model's forecast results also contain errors, which may be due to inaccurate statistics on the number of new deaths per day due to COVID-19 on some days, or to deaths caused by underlying diseases. Therefore, in the next study, the author will examine the effect of underlying diseases on the COVID-19 mortality rate in order to reduce the prediction error.

Conclusion
For the data set studied, the Gaussian model is suitable, and its indicators are better than those of the remaining models. The error between the estimated values and the actual values of the model is very small (0.017). The regression coefficient is 0.988 and the correlation coefficient is 0.933 (approximately 1.0), showing that the choice of interpolation model is appropriate.
Using the Kriging method to predict COVID-19 deaths on days with no observed data or with missing data (NA), and to predict future COVID-19 deaths, gives a very small error between the estimated values and the actual values. The study shows that the Kriging method is an efficient, reasonable, and highly reliable way to build a predictive model. When building models, attention should be paid to model error values, object-specific data, and the results of model selection, so that a model suitable for the actual data is chosen from among models of differing accuracy. Experience in model selection therefore plays an important role in research results.

Figure 2 shows the distribution of the data over time: the horizontal and vertical axes represent time in days, and the transverse axis shows the number of deaths from COVID-19. Different symbols and colours represent different death tolls.

Research Methodology
In geostatistics, the main tool is the variogram, which represents the spatial dependence between observations. The variogram $2\gamma(h)$ is defined as the expectation of the random variable $[Z(u) - Z(u + h)]^2$, that is, $2\gamma(h) = E\{[Z(u) - Z(u + h)]^2\}$.

Figure 2. Distribution of COVID-19 death toll data over time

From Figure 3, we have the values of the Gaussian model [Nugget = 0.04; Sill = 18.01; Range = 79.3279; r² = 0.95]. Figure 3 shows the best-fit omnidirectional variogram of deaths over time, obtained by cross-validation. Based on the variogram map of the New deaths parameter, the isotropic variogram model is suitable. The values for the four models are presented in Table 3.

Figure 3. Fitted variogram for the temporal analysis of New deaths

Figures 6 and 7 are Kriging interpolation maps of COVID-19 deaths. The highest number of deaths is shown in white, decreasing to the lowest in blue. Within the same color, the number of deaths is nearly equal, with only a small difference between values.

Figure 4 .Figure 5 .
Figure 4. Test results fail to predict new deaths

Figure 6. 2D Kriging interpolation map of the number of new deaths per day due to COVID-19

Figure 7. 3D Kriging interpolation map of the number of new deaths per day due to COVID-19

Table 1. Data on COVID-19 deaths by day

Table 2. Overview of COVID-19 death toll data

Table 3. Isotropic variogram values of New deaths

Table 4. Model test results

Table 5. Forecast of COVID-19 deaths in Vietnam

Due to drastic measures taken by the government to control the disease and people's awareness of disease prevention, the COVID-19 situation has been strictly controlled, and the death toll from COVID-19 has almost disappeared.