A Graphical Method for Model Selection

In this paper, we present a graphical method for selecting a model from among many competing models. The proposed method not only selects a model but also tests whether the models have equal prediction accuracy.


Introduction
Model selection among many competing models is one of the crucial tasks in regression and time series analysis. Most model selection criteria attempt to find the model for which the predicted values tend to be closest to the true expected values, in some average sense. In this paper, we discuss the selection of a model from among several models based on their out-of-sample forecasting errors. The proposed method is a two-step procedure. In the first step, we test the statistical significance of the models against the overall mean; in the second step, we select a good model, namely one with the minimum measure of error. Section 2 presents various procedures for model selection. Section 3 presents a graphical method for model selection. Section 4 presents an empirical study of three models with an equal number of parameters. Section 5 presents the conclusion.

Methods for Model Selection
Many methods have been proposed for model selection. Some of these techniques are presented below.

Model Selection using $R^2$
The use of the coefficient of determination, $R^2$, in model selection is a common practice in regression analysis and time series analysis. However, maximizing $R^2$ is not a sensible criterion for selecting a model, because the most complicated model will have the largest $R^2$ value. This reflects the fact that $R^2$ has an upward bias as an estimator of the population value of $R^2$. This bias is small for large $n$ but can be considerable with small $n$ or with many predictors. The major criticism of $R^2$ is that the addition of an explanatory variable cannot cause this statistic to fall. In comparing the predictive power of different models, it is often more helpful to use the adjusted $R^2$,
$$R^2_{adj} = 1 - \frac{s^2}{s_y^2},$$
where $s^2$ is the estimated conditional error variance (i.e. the mean squared error) and $s_y^2$ is the sample variance of $y$. Unlike ordinary $R^2$, if an explanatory variable that is not especially useful is added to a model, then $R^2_{adj}$ may even decrease. This happens when the new model has poorer predictive power, in the sense of a larger value of the mean squared error. One possible criterion for selecting a model is to choose the one having the greatest value of $R^2_{adj}$. This is, equivalently, selection of the model with the smallest mean squared error value.
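As a minimal sketch of this comparison, the Python function below computes $R^2$ and adjusted $R^2$ from observed and fitted values, taking $s^2 = SSE/(n-p)$ and $s_y^2 = SST/(n-1)$. The function name, the argument `p` (number of estimated parameters, intercept included) and the sample numbers are illustrative assumptions, not part of the original study.

```python
def r_squared(y, y_hat, p):
    """Return (R^2, adjusted R^2).

    y     : observed values
    y_hat : fitted values from the model
    p     : number of estimated parameters, including the intercept (assumed convention)
    """
    n = len(y)
    y_bar = sum(y) / n
    sse = sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))  # residual sum of squares
    sst = sum((obs - y_bar) ** 2 for obs in y)                 # total sum of squares
    r2 = 1.0 - sse / sst
    s2 = sse / (n - p)      # estimated conditional error variance
    s2_y = sst / (n - 1)    # sample variance of y
    return r2, 1.0 - s2 / s2_y


# Hypothetical data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]
r2, r2_adj = r_squared(y, y_hat, p=2)
print(r2, r2_adj)
```

Because $s^2$ divides by $n-p$, adding a parameter that barely reduces the residual sum of squares raises $s^2$ and lowers the adjusted value, which is the behaviour described above.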

Model Selection using Index of Agreement (d)
The index of agreement ($d$) was proposed by Willmott (1981) to overcome the insensitivity of $R^2$ to differences in the observed and predicted means and variances. The index of agreement is based on the ratio of the mean square error to the potential error (Willmott, 1982) and is defined as
$$d = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} \left( |\hat{y}_i - \bar{y}| + |y_i - \bar{y}| \right)^2}.$$
The potential error in the denominator represents the largest value that the squared difference of each pair can attain, while the numerator is proportional to the mean square error. The range of $d$ is similar to that of $R^2$ and lies between 0 (no agreement) and 1 (perfect agreement). Select the model which has the maximum index of agreement.
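The index of agreement can be sketched in a few lines of Python, using Willmott's form $d = 1 - \sum (y_i - \hat{y}_i)^2 / \sum (|\hat{y}_i - \bar{y}| + |y_i - \bar{y}|)^2$. The function name and the toy inputs are assumptions made for illustration.

```python
def index_of_agreement(y, y_hat):
    """Willmott's index of agreement d for observed y and predicted y_hat."""
    y_bar = sum(y) / len(y)
    # Numerator: sum of squared prediction errors.
    num = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    # Denominator: potential error, built from distances to the observed mean.
    den = sum((abs(pred - y_bar) + abs(obs - y_bar)) ** 2
              for obs, pred in zip(y, y_hat))
    return 1.0 - num / den


# Hypothetical usage: a perfect forecast and a constant (mean) forecast.
print(index_of_agreement([1, 2, 3], [1, 2, 3]))
print(index_of_agreement([1, 2, 3], [2, 2, 2]))
```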

Model Selection using Measures of Error
One method for evaluating a forecasting technique uses the summation of the absolute errors. The mean absolute error (MAE) measures forecast accuracy by averaging the magnitudes of the forecast errors (i.e. the absolute values of each error). With $e_t = y_t - \hat{y}_t$ denoting the forecast error and $n$ the number of forecasts,
$$\mathrm{MAE} = \frac{1}{n} \sum_{t=1}^{n} |e_t|.$$
MAE is most useful when the analyst wants to measure forecast error in the same units as the original series.
The mean squared error (MSE) is another method for evaluating a forecasting technique:
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} e_t^2.$$
This approach penalizes large forecasting errors, since the errors are squared. This is important because a technique that produces moderate errors may well be preferable to one that usually has small errors but occasionally yields extremely large ones.
The root mean squared error (RMSE) is given as
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}.$$
The mean absolute percentage error (MAPE) is a relative error statistic, measured as the average percent error over the historical data points, and is most appropriate when the cost of the forecast error is more closely related to the percentage error than to the numerical size of the error. MAPE is computed as the average of the absolute percentage error values:
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{e_t}{y_t} \right|.$$
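The four error measures can be computed together. The sketch below assumes the usual definitions (error $e_t = y_t - \hat{y}_t$, MAPE expressed in percent and undefined when some $y_t = 0$); the function name and example series are hypothetical.

```python
import math


def error_measures(y, y_hat):
    """Return MAE, MSE, RMSE and MAPE (in percent) for actuals y and forecasts y_hat."""
    errors = [actual - forecast for actual, forecast in zip(y, y_hat)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE divides by the actual values, so every y_t must be non-zero.
    mape = 100.0 * sum(abs(e / actual) for e, actual in zip(errors, y)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical series, for illustration only.
print(error_measures([100.0, 200.0, 400.0], [110.0, 190.0, 400.0]))
```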

Model Selection using Percentage Better Statistic
There are several commonly used types of scale-independent statistic. The first type essentially relies on pairwise comparisons. If method A and method B, say, are tried on a number of different series, then it is possible to count the number of series where method A gives better forecasts than B (using any sensible measure of accuracy). Alternatively, each method can be compared with a standard method, such as the random walk forecast (where all forecasts equal the latest observation), and the number of times each method outperforms the standard is counted. Then the percentage of times a method is better than a standard method can readily be found. This statistic is usually called 'Percent Better'.
Let $r_t = e_t / e_t^*$ denote the relative error, where $e_t^*$ is the forecast error obtained from the base method. Usually, the base method is a benchmark method or the naive method where $\hat{y}_t$ is equal to the last observation.
The percentage better statistic is
$$\mathrm{PB} = 100 \cdot \mathrm{mean}\left( I\{ |r_t| < 1 \} \right),$$
where $I(u) = 1$ if $u$ is true and 0 otherwise. We select the model which has the maximum percentage better performance compared to the other models (De Gooijer and Hyndman, 2006).
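A small sketch of the percentage better statistic follows. It counts how often a method's absolute error is strictly smaller than that of the base method, which is equivalent to $|r_t| < 1$ with $r_t = e_t / e_t^*$ but avoids dividing by a zero base error. The function name and the error values are illustrative assumptions.

```python
def percent_better(errors, base_errors):
    """Percentage of forecasts on which |e_t| < |e_t*| (the base method's error)."""
    wins = sum(1 for e, e_base in zip(errors, base_errors) if abs(e) < abs(e_base))
    return 100.0 * wins / len(errors)


# Hypothetical forecast errors of a method and of a base (e.g. naive) method.
method_errors = [1.0, -2.0, 0.5, 3.0]
base_errors = [2.0, 1.0, 1.0, 3.0]
print(percent_better(method_errors, base_errors))
```

With a strict inequality, ties count for neither method, so the two pairwise percentages sum to 100 only when there are no ties.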

Model Selection using AIC or SBC
An approach to model selection that considers both the model fit and the number of parameters has been developed. The information criterion of Akaike, or AIC, selects the best model from a group of candidate models as the one that minimizes
$$\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2p,$$
where $\hat{\sigma}^2$ is the residual variance, $n$ is the number of residuals and $p$ is the number of parameters in the model.
The Bayesian information criterion developed by Schwarz, or SBC, selects the model that minimizes
$$\mathrm{SBC} = n \ln \hat{\sigma}^2 + p \ln n.$$
The second term in both AIC and SBC is a penalty factor for including additional parameters in the model. Since the SBC criterion imposes a greater penalty for the number of parameters than does the AIC criterion, use of minimum SBC for model selection will result in a model whose number of parameters is no greater than that chosen by AIC. Often, the two criteria produce the same result. We select the model which has the minimum AIC and SBC values (Akaike, 1974; Schwarz, 1978).
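The two criteria differ only in the penalty term, which the following sketch makes explicit using $\mathrm{AIC} = n \ln \hat{\sigma}^2 + 2p$ and $\mathrm{SBC} = n \ln \hat{\sigma}^2 + p \ln n$. The helper `best_by` and the candidate values are hypothetical and only illustrate how a minimum-criterion choice could be made.

```python
import math


def aic(resid_var, n, p):
    """AIC = n ln(sigma^2) + 2p."""
    return n * math.log(resid_var) + 2 * p


def sbc(resid_var, n, p):
    """SBC = n ln(sigma^2) + p ln(n)."""
    return n * math.log(resid_var) + p * math.log(n)


def best_by(criterion, candidates, n):
    """Return the name of the candidate minimizing the criterion.

    candidates maps a model name to (residual variance, number of parameters).
    """
    return min(candidates, key=lambda name: criterion(candidates[name][0], n, candidates[name][1]))


# Hypothetical candidates: a small model and a larger one with a slightly better fit.
candidates = {"small": (1.00, 2), "large": (0.95, 6)}
print(best_by(aic, candidates, n=28), best_by(sbc, candidates, n=28))
```

Since $\ln n > 2$ once $n \ge 8$, the SBC penalty per parameter exceeds the AIC penalty for all but very short series.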

Model Selection using Friedman Statistic
Friedman's test is used to compare multiple forecasting models with respect to their squared or absolute errors and to infer whether there are significant general differences in the performance of the models. Friedman's test is a nonparametric test designed to detect differences among two or more groups. Operating on the rank sums $R_j$, it considers the null hypothesis that all models are equivalent in performance (have similar mean ranks). Under the null hypothesis, the statistic
$$\chi_F^2 = \frac{12}{n k (k+1)} \sum_{j=1}^{k} R_j^2 - 3 n (k+1)$$
is approximately distributed as $\chi^2$ with $k-1$ degrees of freedom, where $k$ is the number of models and $n$ is the number of observations for each model. The null hypothesis of equal prediction accuracy of the models is tested using the Friedman test. If there is a significant difference among the models, we select the model which has the first rank. To discover the overall winner among the competing models, the above procedure should be repeated after eliminating the weakest model, that is, the one most often assigned the largest rank (Adil Korkmaz and Burak Onemli, 2011).
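To make the ranking step concrete, the sketch below ranks the models within each observation (ties share the average rank), sums the ranks per model, and evaluates $\frac{12}{nk(k+1)} \sum_j R_j^2 - 3n(k+1)$. It applies no tie correction, which statistical packages often add, and the function names and the toy loss matrix are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for i in order[pos:end + 1]:
            ranks[i] = avg
        pos = end + 1
    return ranks


def friedman_statistic(losses):
    """losses[t][j] is the loss (e.g. absolute error) of model j on observation t.

    Returns (statistic, rank sums); the statistic uses no tie correction.
    """
    n, k = len(losses), len(losses[0])
    rank_sums = [0.0] * k
    for row in losses:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    stat = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    return stat, rank_sums


# Hypothetical absolute errors of three models on four observations.
losses = [[0.3, 0.1, 0.2], [0.4, 0.2, 0.5], [0.2, 0.1, 0.3], [0.6, 0.2, 0.4]]
print(friedman_statistic(losses))
```

The statistic is then compared with a $\chi^2$ critical value on $k-1$ degrees of freedom.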

Model Selection using Principle of Parsimony
All things being equal, simple models are preferred to complex models. This is known as the "principle of parsimony". With a limited amount of data, it is relatively easy to find a model with a large number of parameters that fits the data well. However, forecasts from such a model are likely to be poor, because much of the variation in the data that is due to random error is modeled. The goal is to develop the simplest model that provides an adequate description of the major features of the data (Chatfield, 1991).

A Graphical Method for Model Selection
In this section, we propose a graphical procedure using the bootstrap method for the selection of a good model among several competing models. The bootstrap has been the object of much research in statistics since its introduction by Efron (1979). The bootstrap is a method for estimating the distribution of an estimator or test statistic by resampling one's data. It amounts to treating the data as if they were the population for the purpose of evaluating the distribution of interest. Under mild regularity conditions, the bootstrap yields an approximation to the distribution of an estimator or test statistic that is at least as accurate as the approximation obtained from first-order asymptotic theory (Efron and Tibshirani, 1993).
Let $e_{ij}$ be the forecasting error of the $i$-th model on the $j$-th forecast. The bootstrap graphical procedure for selecting a model among the adequate models is given in the following steps:

4. The lower decision line (LDL) and the upper decision line (UDL) for the comparison of each of the $d_i$ are obtained from the bootstrap distribution of the means.

5. Plot the $d_i$ against the decision lines. If any of the plotted points lies outside the decision lines, the hypothesis of equal prediction performance of the models is rejected at the $\alpha$ level, and we may conclude that the prediction performance of the models is not the same.

6. If any of the points are plotted above the UDL, the corresponding models are considered inefficient and may be eliminated from the analysis. If points are plotted below the LDL, the corresponding models can be considered efficient for prediction, and we select the model which is closest to the x-axis (zero). If points fall between the LDL and the UDL, the corresponding models can be treated as equally efficient in their prediction accuracy.
This method not only tests for a significant difference among the models but also identifies the source of heterogeneity among them. The proposed method depends only on the supplied information and does not require any distributional assumptions.
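The procedure can be sketched in Python as follows. The exact decision-line formulas are not reproduced in this text, so the sketch substitutes one plausible variant and should not be read as the authors' formulas: losses are pooled under the hypothesis of equal performance, $B$ bootstrap means of size $m$ are drawn from the pool, the CDL is taken as the average of those means, and the LDL and UDL are taken as the ordered bootstrap means at positions $[B\alpha/(2k)]$ and $[B(1-\alpha/(2k))]$ (a Bonferroni-style percentile rule). All function names, defaults and data are hypothetical.

```python
import random


def bootstrap_decision_lines(losses, B=2000, alpha=0.05, seed=0):
    """Return (LDL, CDL, UDL) from pooled-loss bootstrap means.

    losses maps a model name to its list of loss values d_ij = g(e_ij).
    ASSUMPTION: percentile decision lines at level alpha/(2k); the source
    formulas are not available in this text.
    """
    rng = random.Random(seed)
    k = len(losses)
    pooled = [d for values in losses.values() for d in values]
    m = len(next(iter(losses.values())))  # forecasts per model
    # Means of B bootstrap samples of size m, sorted for percentile lookup.
    boot_means = sorted(sum(rng.choices(pooled, k=m)) / m for _ in range(B))
    cdl = sum(boot_means) / B
    lower = int(B * alpha / (2 * k))                    # integer part [x]
    upper = min(int(B * (1 - alpha / (2 * k))), B - 1)
    return boot_means[lower], cdl, boot_means[upper]


def classify_models(losses, **kwargs):
    """Compare each model's mean loss d_i with the decision lines."""
    ldl, cdl, udl = bootstrap_decision_lines(losses, **kwargs)
    labels = {}
    for name, values in losses.items():
        d_i = sum(values) / len(values)
        if d_i > udl:
            labels[name] = "inefficient"
        elif d_i < ldl:
            labels[name] = "efficient"
        else:
            labels[name] = "equally efficient"
    return (ldl, cdl, udl), labels


# Hypothetical absolute errors for three models, 30 forecasts each.
demo = {"A": [1.0] * 30, "B": [0.1] * 30, "C": [1.0] * 30}
print(classify_models(demo))
```

Plotting each $d_i$ as a point against three horizontal lines at the LDL, CDL and UDL gives the chart described in steps 5 and 6.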

Empirical Study
Table 1 presents the out-of-sample data of size 28 together with the forecasts and errors generated from three adequate models A, B and C, each with p = 2 estimated parameters (source: Naveen Kumar Boiroju, 2011). We compute the error statistics for the three models and the results are presented in Table 2. The mean absolute errors $d_A$, $d_B$ and $d_C$ are computed for the models A, B and C respectively. By applying the bootstrap procedure explained in Section 3, the LDL, CDL and UDL are obtained as 0.074, 0.088 and 0.102 respectively. Prepare a chart as in Figure 1 with the above decision lines and plot the points $d_A$, $d_B$ and $d_C$. From Figure 1, we observe that $d_B$ lies outside the decision lines. Hence, $H_0$ may be rejected and it may be concluded that the mean absolute errors of the three forecasting models are not equal.

Conclusion
The proposed method, being a graphical procedure, simultaneously demonstrates statistical significance and identifies the source of heterogeneity without requiring knowledge of the underlying distribution of the errors. The procedure depends on prediction performance measured as distances on out-of-sample data, and it can be treated as an alternative procedure for testing the equal prediction accuracy of several models. It classifies the available prediction models into three categories: inefficient models, equally efficient models and efficient models. The proposed graphical method can therefore be used to test the equal prediction accuracy of the models, to classify them into these categories, and to choose an efficient model among several models.
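In the notation of Section 3, $e_{ij}$ is the $j$-th error generated by the $i$-th model, where $m$ is the number of forecasts generated by the $i$-th model, and $d_{ij} = g(e_{ij})$, $g$ being some specified loss function, for example $g(e) = |e|$ or $g(e) = e^2$. The mean of the error function of the $i$-th model is $d_i = \frac{1}{m} \sum_{j=1}^{m} d_{ij}$, and the mean of the $b$-th bootstrap sample from the $i$-th model is $d_{ib}^* = \frac{1}{m} \sum_{j=1}^{m} d_{ijb}^*$, $b = 1, 2, \ldots, B$. The sampling distribution of the mean is obtained from the $B$ bootstrap estimates and is used to compute the central decision line (CDL). In the decision-line formulas, $[x]$ represents the integer part of $x$.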
From Figure 1 it is also observed that $d_A$ and $d_C$ lie within the LDL and UDL, which indicates that the prediction performance of the models A and C is the same. Since the $d_B$ value lies below the LDL, the corresponding model B is selected, and we may conclude that model B is an efficient model among the models.

Figure 1: Comparison of forecasting models

MAPE provides an indication of how large the forecast errors are in comparison to the actual values of the series. A model is said to be good if the MAPE value is not greater than five. Select the model which has the minimum MAE, RMSE and MAPE values (De Gooijer and Hyndman, 2006).

Table 1: Out-of-sample data, forecasts and errors (recoverable column headers: $\hat{Y}_A$, $\hat{Y}_B$, $\hat{Y}_C$, $e_A$)

Table 2: Measures of Errors
From the above table, it is clear that model B has the maximum index of agreement and the minimum MAE, MSE, RMSE, MAPE, AIC and SBC values. Hence model B is selected among the models. The results of the percentage better statistics for the selected models are presented in the following table.

Table 3: Percentage Better Performance of the Models
From the above table, it is observed that model A is better than models B and C in 21.43% and 46.43% of the forecasts respectively. Model B is better than models A and C in 78.57% and 64.29% of the forecasts respectively. Model C is better than models A and B in 53.57% and 35.71% of the forecasts respectively. Therefore the most suitable model for forecasting is model B, which has the maximum percentage better performance compared to the other models. We apply the Friedman test to the absolute errors of the models; the mean ranks are 2.304, 1.589 and 2.107 for the models A, B and C respectively. The following table shows the Friedman test statistic and its asymptotic significance probability.

Table 4: Friedman Test
Since the test is significant, the null hypothesis of equal prediction performance of the models is rejected, and we may conclude that the prediction performance of the models is not the same. Thus model B is selected, since it has the first rank among the models.
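As a rough cross-check, the Friedman statistic can be recomputed from the reported mean ranks (2.304, 1.589 and 2.107 with 28 observations). The sketch below uses the uncorrected formula $\frac{12}{nk(k+1)} \sum_j R_j^2 - 3n(k+1)$ with $R_j = n \bar{R}_j$, and the exact $\chi^2$ upper-tail probability $e^{-x/2}$ that holds for 2 degrees of freedom. Because the mean ranks are rounded and no tie correction is applied, the result may differ slightly from the value in Table 4; the function names are assumptions.

```python
import math


def friedman_from_mean_ranks(mean_ranks, n):
    """Uncorrected Friedman statistic from the mean ranks of k models over n observations."""
    k = len(mean_ranks)
    rank_sums = [n * r for r in mean_ranks]
    return 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2.0)


stat = friedman_from_mean_ranks([2.304, 1.589, 2.107], n=28)
p_value = chi2_upper_tail_df2(stat)
print(round(stat, 3), round(p_value, 4))
```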