A comparison of different methods for building Bayesian kriging models

Kriging is a statistical approach for analyzing computer experiments. Kriging models can be used as fast running surrogate models for computationally expensive computer codes. Kriging models can be built using different methods, the maximum likelihood estimation method and the leave-one-out cross validation method. The objective of this paper is to evaluate and compare these different methods for building kriging models. These evaluations and comparisons are achieved via some measures that test the assumptions that are used in building kriging models. We apply kriging models that are built based on the two different methods on a real high dimensional example of a computer code. We demonstrate our evaluation and comparison through some measures on this real computer code.


Introduction
Computer codes have been widely used in different sciences to describe physical systems. A computer code is a function y = η(x) where x ∈ Ω ∈ R p and y is usually a scalar. Computer codes of physical systems are typically complex and so they need a high number of runs to understand how the output of the model changes as the input variables change. However, running computer codes at a high number is computationally very intensive in that obtaining a value of y may take many hours or even many days of computational time. Thus, only a small number of runs can be conducted and analyzed.
One solution to this time-consuming problem of running computer codes is to use a statistical kriging model as a surrogate of the computer code. Kriging models have increasingly become popular for analyzing computer experiments. They can be used to obtain approximations and predictions of complex computer codes. In addition, they can be used to quantify the uncertainty in these predictions. The idea is to build a statistical kriging model of the objective computer code based on a small number of runs of the computer code. Thus, for any point x , the aim of kriging models is to predict a value y = η(x ) conditional on the available outputs. This kriging model is assumed to be smooth and continuous.
The first use of kriging models was by Sacks et al. (1989). Then, kriging models have been used in many fields of science. They have become an essential tool and provide accurate results as surrogate models for many physical systems. For example, in engineering, Li and Sudjianto (2005) constructed a kriging model to obtain approximations of an expensive power cylinder model. Forrester et al. (2007) also used a kriging model as a surrogate for approximating a multi-variable aircraft wing model. Hung et al. (2009) developed a kriging model for optimizing a hard turning process which is a metal cutting process for producing machined parts out of hard materials. Le Gratiet Drignei (2008) presented a multidimensional kriging model that can be used for predicting the development of a virtual brain tumor model.
Kriging models can be built using two different methods. By building the kriging model, we mean estimating the model parameters and constructing the model. The first method for estimating the correlation parameters is the maximum likelihood estimation (MLE) method. The second method is the leave-one-out cross validation (LOO CV) method. The LOO CV method requires more computations than the MLE method in estimating the model parameters.
We always find in the literature for building kriging models that the kriging models for complex computer codes are built based on one method for estimating the correlation parameters. People often choose a method for estimating the correlation parameters in building kriging models without providing much explanation for their choice. The two methods for estimating the parameters, however, may give different estimated values. The set of kriging model parameters may affect the performance of kriging models especially with a small number of design points. This means that if the parameters were not estimated well, we may not obtain good approximations for the computer code outputs or the variances may not be quantified well.
Therefore, the objective of our paper is to compare and evaluate the two methods for building kriging models. This will be done using some measures that are based on the differences between the predictions and the true values of the computer code. The paper is organized as follows. In Section 2, we review the process of building kriging models for complex computer codes. Section 3 presents the existing methods for estimating the kriging model parameters. In Section 4, we discuss the design of experiment that is used in the study. Measures for validating kriging models are presented in Section 5. We demonstrate our comparison through a real example of a computer code in Section 6. Finally, the conclusion is given in Section 7.

Building kriging models
In this section, we describe how kriging models can be constructed under a Bayesian perspective. Kriging models are based on the mathematical representation of the outputs of the computer code. The computer code is assumed as a random process that can be represented by a Gaussian process as a prior distribution. Thus, the kriging model is a probability distribution that can be used for obtaining predictions of computer codes and quantifying the uncertainty in these predictions.ŷ where β is a vector of unknown coefficient parameters and z(x) is assumed to be a realization of a stochastic Gaussian process with zero mean and non-zero covariance, where σ 2 is a constant variance. In this work, we use the Matérn correlation function that is given by where θ i ≥ 0. The correlation parameter, θ i , determines the smoothness of the function predictions whereas the parameter, ν = 5 2 manages the derivative existence of the computer code.
For a given input x , we assume that the output y = η(x) is a realization of a normal random variable Y ∼ N (µ, σ 2 ). Suppose now we have a set of input points X = (x 1 , . . . , x n ) for the computer code. We put all the outputs of the computer code at the n input points in a random vector y = {y 1 = η(x 1 ), . . . , y n = η(x n )}.
Conditional on y , the linear predictor of the output at a point x is given bŷ for x ∈ Ω. We considerŷ(x ) as an uncertain variable. Then, we find the best linear kriging predictor (BLKP),ŷ(x ), in equation (4) that minimizes the mean squared error (MSE) of the predictions under the constraint, . . , η k (x )} T be a set of regression functions. In this case, the kriging model is known as universal kriging whereas it will be called ordinary kriging if f(x ) = {1}. In this work, universal kriging models are used. Furthermore, let F = [f(x 1 ), . . . , f(x n )] T be a vector of regression functions that are evaluated at the inputs points, x 1 , . . . , x n . Also, suppose R is the correlation matrix for the n input points This correlation matrix should be positive semi-definite and satisfies that R(x, x) = 1.
Therefore, conditional on the true values of the outputs, y , the prediction of the output, which is the BLKP ofŷ(x ), at the point x is a conditional normal with posterior mean and the posterior variance, V * 1 (·, ·), is (Martin and Simpson, 2005). From now, for simplicity, we will refer toŷ(x ) byŷ.
A comparison of different methods for building Bayesian kriging models

Estimating the correlation parameters
To find the distribution of Y, we have to find values of the parameters, β,σ 2 and θ. The constant variance, σ 2 , and the coefficient parameters, β, can be found analytically. The likelihood is given by By taking the logarithm and using a flat prior for β,σ 2 , the posterior mode of β iŝ and the posterior mode of σ 2 is given byσ The correlation parameters θ, however, cannot be found analytically. Thus, we can obtain the maximum likelihood estimate of the correlation parameters,θ, from the likelihood function and then assume them as the true values of θ.
The maximum likelihood estimateθ of the true values of the correlation parameters θ can be found by optimizing the following equation (Santner et al., 2003) θ = arg min The leave-one-out cross validation (LOO CV) method can also be used as an alternative method for estimating the correlation parameters. The LOO CV method is based on iterating the space of the parameter for trying to minimize the error of the model. The optimal parameters are determined by minimizing the following equation (Santner et al., After obtaining estimates of the correlation parameters, the kriging model is built based on these estimated values. Kriging models can be implemented using a package in R called DiceKriging by Roustant et al. (2012).

Design of experiment
In this section, we give details of the design that is used in this study. We used the maximin Latin hypercube design (maximin LHD). The maximin LHD is a space-filling design which depends on measuring the distance between the design points. We first calculate the minimum distance between design points. Then, the maximin LHD maximizes the minimum distance between the design points. The maximin LHD is given by Therefore, the maximin LHD ensures that the design points are uniformly spread in the input space (Fang et al., 2010).

Validating kriging models
Kriging models are statistical models that are based on some assumptions. Therefore, it is necessary to use measures to see the performance of kriging models and see whether the assumptions that are used in building kriging models are reasonable or not. In this work, we use two measures for validating kriging models. The first measure is the root relative squared error (RRSE) that is based on the differences between the kriging predictions and the true values of the computer code. The root relative squared error is given by A comparison of different methods for building Bayesian kriging models whereȳ is the mean of the true values of the computer code (Overstall and Woods, 2016). The RRSE tells us how far away are the predicted values from the true value of the computer code. As RRSE → 0, the kriging predictions will be accurate.
The second measure is the individual standardized errors (ISE) that are given by where σ 2 y is the posterior variance of the kriging model given by equation (8) (Kolachalama et al., 2007). The individual standardized errors should be between -2 and 2 if the kriging model is valid (Bastos and O'Hagan, 2009). Thus, if large values of ISE are found, then the predictions of the kriging model will not be accurate and so the kriging model will not be valid.
The ISE are not only based on the differences between the kriging predictions and the true values of the computer code, but also depend on the uncertainty in the kriging predictions. This is because the differences between the kriging predictions and the true values of the computer code are normalized by the squared root of the kriging variance.

Illustrative Example
In this section, the Robot Arm function is considered as an example of a computer code. The Robot Arm function was used by An and Owen (2001) as an example of implementing quasi-regression methods. The Robot Arm function models the robot arm position that has four segments. The shoulder is considered fixed at the origin, whereas each of the four segments has a length L i and is set at ϑ i angle for i = 1, . . . , 4. The Robot Arm function is given by where The output of the Robot Arm function is the distance between the origin and the end of the robot arm. The Robot Arm function has eight input variables and they are x = (ϑ 1 , ϑ 2 , ϑ 3 , ϑ 4 , L 1 , L 2 , L 3 , L 4 ). The ranges of the input variables are: ϑ i ∈ [0, 2π] for i = 1, . . . , 4 where ϑ 1 , ϑ 2 , ϑ 3 and ϑ 4 are the angles of the first, second, third and fourth segments of the arm respectively, L i ∈ [0, 1] for i = 1, . . . , 4 where L 1 , L 2 , L 3 and L 4 are the lengths of the first, second, third and fourth segments of the arm respectively.
We generated a set of 40 design points, x 1 , . . . , x 40 , by the maximin LHD. Then, the outputs of the Robot Arm function were obtained at these 40 points, y = {y 1 = η(x 1 ), . . . , y n = η(x 40 )}. Before we fit the kriging models, the domain of the Robot Arm function was mapped to be in [0, 1] 8 .
Based on the 40 design points, the correlation parameters were estimated using the two methods, the MLE method equation (15), and the LOO CV method, equation ( Conditional on theθ M LE andθ CV , kriging models were built using two methods for estimating the correlation pa-rameters, the MLE and the LOO CV. We used a zero mean function and the Matérn correlation function equation (3) in the package DiceKriging. Then, a set of 16 points, generated using maximin LHD, was used for validating the kriging models, conditioned on y. Figure 1 shows the kriging models for the Robot Arm function with 95% credible intervals based on the MLE method and the LOO CV method. The kriging predictions are reasonable as the majority of them lie around the y = x line. Furthermore, the uncertainty, which is represented by the errorbars, of many points seems to be small. This indicates that the predictions of the kriging models are good approximations of the computer code outputs with 95% credible intervals. However, several points seem to be not close to the y = x line, especially with the LOO CV method. This means that the kriging models are not valid in some parts of the input space. This may also indicate that the predictions that are obtained by the MLE method are more accurate than those that are obtained by the LOO CV method.

Measures for the kriging models
Now, we compare and investigate more closely the performance of the kriging models that were built using the different methods for estimating the correlation parameters, the MLE method and the LOO CV method. This comparison between the kriging models is conducted via some measures. We used two different set sizes of the design points, 5p and 10p, where p is the dimension of the input. The size of the validation points was set to be 2p.

The root relative squared error
The first measure that we calculated for validating and comparing the kriging models is the root relative squared error (RRSE) equation (18). The RRSE measures the overall performance of the kriging models. The RRSE was calculated for the kriging models that were built based on the MLE and the LOO CV methods. We replicated each of the combinations of the design points and the validation points 100 times with different sets of design and validation points for each replication.   It can be seen from Figure 2 that the values of the RRSE are small for kriging models of both methods. The kriging models that are based on the MLE method show smaller values of the RRSE than those of the kriging models that are based on the LOO CV method. Moreover, the box-plots of the RRSE for the kriging models show some outliers but the outliers for the kriging models that are based on the LOO CV method are more than those for the kriging models that are based on the MLE method.
We increased the set size of the design points to become 80 and the processes were repeated again. Figure 3 shows the box-plots of the RRSE for the kriging models that are based on the MLE method and for the kriging models that are based on the LOO CV method with 80 design points and a set of 16 validation points. The box-plots are also based on 100 values of the RRSE.  Figure 3 that the values of the RRSE are very small for kriging models that are based on the MLE method and the LOO CV method. Moreover, the values of the RRSE are smaller than before for the two methods. Figure 3 also shows that the values of the RRSE for the kriging models that are based on the MLE method are also smaller than those from the kriging models that are based on the LOO CV method. However, the values of the RRSE for the kriging models that are based on the LOO CV method are now very close to those for the kriging models that are based on the MLE method. This indicates that the predictions of the kriging models that are based on the LOO CV method were improved by increasing the design points.

Now, it can be shown from
We can conclude that the values of the RRSE, for kriging models that are based on the MLE method and for kriging models that are based on the LOO CV method, have become smaller by increasing the design points. Furthermore, based on the values of the RRSE, the performance of the kriging models that are based on the MLE method is better than that of the kriging models that are based on the LOO CV method with a small number of design points. This is because the predictions of the kriging models that are based on the MLE method are more closer to the true values of the Robot Arm function than those of the kriging models that are based on the LOO CV method.
The performance of the kriging models that are based on the LOO CV method, however, was improved by increasing the design points. The predictions of the kriging models that are based on the LOO CV method tend to be as good as the predictions of the kriging models that are based on the MLE method by increasing the design points.

The individual standardized errors
We applied another measure that doed not only depend on the differences between the predictions and the true values of the Robot Arm function, but also considers the uncertainty in the predictions of the kriging model. We calculated the individual standardized errors (ISE) equation (19). Thus, we consider the plot of the ISE against the index to compare the performance of the kriging models that are based on the two different methods for estimating the correlation parameters. Figure 4 shows the ISE against the index for the kriging model that is based on the MLE method and for the kriging model that is based on the LOO CV method with 40 design points and a set of 16 validation points. The left panel of Figure 4 shows the ISE for the kriging model that is based on the MLE method. It can be seen that majority of the ISE are small and only one error lies outside the bounds, but close to the bounds. The right panel of Figure 4 shows the ISE for the kriging model that is based on the LOO CV method. It can be shown that several large ISE are seen in the plot.
We also increased the set size of the design points to be 80 and the processes were repeated again. Figure 5 shows the plots of the ISE for the kriging model that is based on the MLE method and for the kriging model that is based on the LOO CV method with 80 design points and a set of 16 validation points.  Figure 5: The plots of the ISE for the kriging model that is based on the MLE method and for the kriging model that is based on the LOO CV method with 80 design points and a set of 16 validation points. The ISE seem to be reasonable with two errors are seen in the plot of the kriging model that is based on the LOO CV method. Now, the ISE for the kriging model that is based on the MLE method are also small where all of them lie inside the bounds. The ISE for the kriging model that is based on the LOO CV method in the right panel of Figure 5 are now small with only two ISE lie outside the bounds. These two errors, however, are very close to the bounds.
We can conclude from the plots of the ISE that the kriging model that is based on the LOO CV method may not perform well with a small number of design points. The performance the kriging model that is based on the LOO CV method may become as good as that of the kriging model that is based on the MLE method when an adequate set size of design points is used. This is because the ISE for the kriging model that is based on the LOO CV method have become smaller with 80 design points. We have seen an adequate number of the design points should be no less than ten times the input dimension, 10p.

Conclusion
In this work, we reviewed a popular statistical method, called kriging, for approximating and analyzing complex computer codes. Estimating the kriging model parameters can be conducted using two different methods, the maximum likelihood estimation method and the leave-one-out cross validation method. According to our results, the use of the leave-one-out cross-validation method for estimating the parameters in building kriging models is not very reliable with a small set size of design points. The performance of the leave-one-out cross-validation method may become as good as the performance the maximum likelihood estimation method when a reasonable set size of design points is used in building kriging models. Another disadvantage of estimating the parameters by the leave-out-out cross validation method is that it takes a longer time than the maximum likelihood estimation method to achieve the calculations. This is because we need to calculate the distance between all the design points and this will be computationally expensive with high-dimensional computer codes.