Robust Estimation Methods of Generalized Exponential Distribution with Outliers

This paper discusses robust point estimation of the shape and scale parameters of the generalized exponential (GE) distribution using a complete dataset in the presence of various percentages of outliers. It is known that, when outliers are present, classical methods such as maximum likelihood estimation (MLE), least squares (LS) and maximum product spacing (MPS) cannot reach the best estimator. To confirm this fact, these classical methods were applied to the data of this study and compared with non-classical estimation methods. The non-classical (robust) methods, namely least absolute deviations (LAD) and M-estimation (using the M. Huber (MH) weight and the M. Bisquare (MB) weight), were introduced to obtain the best estimation method for the parameters of the GE distribution. The comparison was carried out numerically using a Monte Carlo simulation study. The application to two real datasets confirmed that the M-estimation method is well suited to estimating the GE parameters. We conclude that M-estimation with the Huber objective function is a suitable method for estimating the parameters of the GE distribution for a complete dataset in the presence of various percentages of outliers.


Introduction
The two-parameter generalized exponential (GE) distribution is one of the most popular distributions and has been used quite effectively for analyzing lifetime datasets. A random variable $X$ has the GE distribution with shape parameter $\alpha$ and scale parameter $\theta$ if its cumulative distribution function (CDF), probability density function (pdf) and quantile function are, respectively,
$$F(x; \alpha, \theta) = \left(1 - e^{-(x/\theta)}\right)^{\alpha}, \quad x > 0,\ \alpha, \theta > 0, \tag{1.1}$$
$$f(x; \alpha, \theta) = \frac{\alpha}{\theta}\, e^{-(x/\theta)} \left(1 - e^{-(x/\theta)}\right)^{\alpha - 1}, \tag{1.2}$$
$$x_u = -\theta \ln\left(1 - u^{1/\alpha}\right), \quad 0 < u < 1. \tag{1.3}$$
The common estimation methods are not appropriate for the parameter estimation problem when the data contain outliers or extreme observations. For more information on common estimation methods, see Ahmad and Almetwally (2020) and Basheer et al. (2020). We therefore need alternative estimation methods that can handle outliers or extreme observations; these are called robust estimation methods.
Almetwally and Almongy (2018) discussed six methods of estimation for a regression model in order to reach the best estimates of the model parameters. An alternative robust estimation method based on M-estimation for the parameters of the Burr III distribution was proposed by Wang and Lee (2010, 2014). Kantar and Yildirim (2015) considered various robust estimators for the extended Burr Type III distribution for complete data with outliers. Aydın et al. (2018) investigated the robustness properties of the least squares, maximum likelihood and modified likelihood estimators of the location and scale parameters of the shifted Gompertz distribution.
The aim of this paper is to assess the effectiveness of alternative robust estimation methods in determining the parameters of the GE distribution. LAD and M-estimation with the Bisquare and Huber weights are used as alternatives to the common estimation methods. MLE, LS and MPS, the more common estimation methods for the GE parameters, are also considered. To evaluate the performance of the estimators, a Monte Carlo simulation study is carried out. The final motivation of the paper is to develop a guideline for choosing the best estimation method for the GE distribution when the data contain outliers or extreme observations.
The paper is organized as follows. Section 2 is devoted to estimating the GE parameters by the MLE, LS and MPS methods, while Section 3 considers robust estimation. In Section 4, we present a Monte Carlo simulation study to compare the performance of the estimators of the GE distribution parameters for all the estimation methods used. An application to real data is given in Section 5. Finally, we present the results and conclusions of the study in Section 6.

The Classical Estimation Methods
In this section, parameter estimation by the MLE, LS and MPS methods is discussed.

MLE Method
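The likelihood function of the GE distribution is
$$L(\Theta) = \prod_{i=1}^{n} \frac{\alpha}{\theta}\, e^{-(x_i/\theta)} \left(1 - e^{-(x_i/\theta)}\right)^{\alpha - 1},$$
and the log-likelihood function is given as
$$\ln L(\Theta) = n \ln \alpha - n \ln \theta - \frac{1}{\theta} \sum_{i=1}^{n} x_i + (\alpha - 1) \sum_{i=1}^{n} \ln\left(1 - e^{-(x_i/\theta)}\right). \tag{2.1}$$
To obtain the normal equations for the unknown parameters, we differentiate (2.1) partially with respect to the parameter vector $\Theta = (\alpha, \theta)$ and equate the derivatives to zero. The estimators $\hat{\alpha}$ and $\hat{\theta}$ can be obtained as the solution of the following equations:
$$\frac{\partial \ln L}{\partial \alpha} = \frac{n}{\alpha} + \sum_{i=1}^{n} \ln\left(1 - e^{-(x_i/\theta)}\right) = 0, \tag{2.2}$$
$$\frac{\partial \ln L}{\partial \theta} = -\frac{n}{\theta} + \frac{1}{\theta^{2}} \sum_{i=1}^{n} x_i - \frac{\alpha - 1}{\theta^{2}} \sum_{i=1}^{n} \frac{x_i\, e^{-(x_i/\theta)}}{1 - e^{-(x_i/\theta)}} = 0. \tag{2.3}$$
The above nonlinear equations have no closed-form solution; hence iterative procedures such as Newton-Raphson type algorithms, which are available in various statistical software packages, can be used to obtain the solution.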
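The numerical maximization can be illustrated with a minimal sketch. It assumes the scale form $F(x) = (1 - e^{-x/\theta})^{\alpha}$ of the GE distribution and swaps Newton-Raphson for a simpler profile-likelihood search: for a fixed $\theta$, setting the derivative with respect to $\alpha$ to zero gives $\alpha = -n / \sum \ln(1 - e^{-x_i/\theta})$ in closed form, so only a one-dimensional golden-section search over $\theta$ remains. The function names, the search bracket and the simulated sample are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def log_one_minus_exp(z):
    """Numerically stable log(1 - exp(-z)) for z > 0."""
    if z < math.log(2.0):
        return math.log(-math.expm1(-z))
    return math.log1p(-math.exp(-z))


def ge_loglik(alpha, theta, data):
    """GE log-likelihood under the assumed form F(x) = (1 - exp(-x/theta))**alpha."""
    n = len(data)
    log_terms = sum(log_one_minus_exp(x / theta) for x in data)
    return (n * math.log(alpha) - n * math.log(theta)
            - sum(data) / theta + (alpha - 1.0) * log_terms)


def alpha_given_theta(theta, data):
    """Closed-form root of the alpha score equation for a fixed theta."""
    return -len(data) / sum(log_one_minus_exp(x / theta) for x in data)


def ge_mle(data, tol=1e-8):
    """Golden-section search over log(theta) on the profile log-likelihood.

    Assumes the profile has a single peak inside the bracket chosen below.
    """
    mean = sum(data) / len(data)
    low, high = math.log(mean / 50.0), math.log(mean * 50.0)  # assumed bracket

    def neg_profile(log_theta):
        theta = math.exp(log_theta)
        return -ge_loglik(alpha_given_theta(theta, data), theta, data)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while high - low > tol:
        left = high - ratio * (high - low)
        right = low + ratio * (high - low)
        if neg_profile(left) < neg_profile(right):
            high = right  # the minimum lies in [low, right]
        else:
            low = left    # the minimum lies in [left, high]
    theta_hat = math.exp((low + high) / 2.0)
    return alpha_given_theta(theta_hat, data), theta_hat


# Usage on a simulated sample drawn by inverting the assumed CDF.
rng = random.Random(2020)
true_alpha, true_theta = 2.5, 1.5
sample = [-true_theta * math.log1p(-(rng.random() ** (1.0 / true_alpha)))
          for _ in range(1000)]
alpha_hat, theta_hat = ge_mle(sample)
print(alpha_hat, theta_hat)
```

Profiling out $\alpha$ keeps the search one-dimensional, which avoids choosing starting values for a two-parameter Newton iteration; a full Newton-Raphson scheme would instead need the second derivatives of the log-likelihood.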
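
LS Method
Let $x_{(1)} < \cdots < x_{(n)}$ be the ordered observations of a random sample of size $n$ from any distribution with CDF $F(\cdot)$. Then
$$E\left[F(x_{(i)})\right] = \frac{i}{n + 1}. \tag{2.4}$$
The least squares estimators are obtained by minimizing
$$\sum_{i=1}^{n} \left[F(x_{(i)}) - \frac{i}{n + 1}\right]^{2}. \tag{2.5}$$
Putting the CDF of the GE distribution in Equation (2.5), we get
$$LS(\alpha, \theta) = \sum_{i=1}^{n} \left[\left(1 - e^{-(x_{(i)}/\theta)}\right)^{\alpha} - \frac{i}{n + 1}\right]^{2}. \tag{2.6}$$
After differentiating Equation (2.6) with respect to the parameters $\alpha$ and $\theta$ and equating to zero, and writing $u_i = 1 - e^{-(x_{(i)}/\theta)}$, the normal equations are given as
$$\frac{\partial LS}{\partial \alpha} = 2 \sum_{i=1}^{n} \left[u_i^{\alpha} - \frac{i}{n + 1}\right] u_i^{\alpha} \ln u_i = 0, \tag{2.7}$$
$$\frac{\partial LS}{\partial \theta} = -\frac{2\alpha}{\theta^{2}} \sum_{i=1}^{n} \left[u_i^{\alpha} - \frac{i}{n + 1}\right] x_{(i)}\, e^{-(x_{(i)}/\theta)}\, u_i^{\alpha - 1} = 0. \tag{2.8}$$
The above nonlinear equations cannot be solved analytically, so the estimators $\hat{\alpha}$ and $\hat{\theta}$ of $\alpha$ and $\theta$ can be obtained by an iterative procedure such as a Newton-Raphson type algorithm.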
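As a rough illustration of the least squares criterion, the sketch below evaluates the sum of squared differences between the GE CDF at the ordered observations and $i/(n+1)$, and minimizes it with a derivative-free compass search, which is swapped in here for Newton-Raphson purely for brevity. The scale form $F(x) = (1 - e^{-x/\theta})^{\alpha}$, the function names, the starting point and the data are all assumptions made for the example.

```python
import math


def ge_cdf(x, alpha, theta):
    """GE CDF under the assumed scale form (1 - exp(-x/theta))**alpha."""
    return (-math.expm1(-x / theta)) ** alpha


def ls_objective(params, data):
    """Sum of squared gaps between F(x_(i)) and i/(n+1)."""
    alpha, theta = params
    ordered = sorted(data)
    n = len(ordered)
    return sum((ge_cdf(x, alpha, theta) - i / (n + 1.0)) ** 2
               for i, x in enumerate(ordered, start=1))


def compass_search(objective, start, step=0.5, tol=1e-6):
    """Derivative-free search: try +/- step on each coordinate, halve on failure."""
    point = list(start)
    best = objective(point)
    while step > tol:
        improved = False
        for j in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[j] += delta
                if trial[j] <= 0.0:  # both GE parameters must stay positive
                    continue
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return point, best


# Usage: hypothetical data placed at the GE quantiles i/(n+1) for
# alpha = 2.5 and theta = 1.5, so the criterion is zero at those values.
n = 30
data = [-1.5 * math.log1p(-((i / (n + 1.0)) ** (1.0 / 2.5)))
        for i in range(1, n + 1)]
estimate, value = compass_search(lambda p: ls_objective(p, data), (1.0, 1.0))
print(estimate, value)
```

The compass search needs only function values, so the same routine could be reused for criteria that are not differentiable everywhere; it is generally slower than a Newton-type method on smooth problems.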

Maximum Product Spacing
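The maximum product spacing (MPS) method, introduced by Cheng and Amin (1983) and presented by Almetwally and Almongy (2019) and El-Sherpieny et al. (2020), is based on
$$G(\alpha, \theta) = \left[\prod_{i=1}^{n+1} D_i(\alpha, \theta)\right]^{1/(n+1)}, \tag{2.9}$$
where $G$ is defined as the geometric mean of the product spacing function and
$$D_i(\alpha, \theta) = F(x_{(i)}; \alpha, \theta) - F(x_{(i-1)}; \alpha, \theta), \quad F(x_{(0)}) = 0,\ F(x_{(n+1)}) = 1, \tag{2.10}$$
which for the GE distribution becomes
$$D_i(\alpha, \theta) = \left(1 - e^{-(x_{(i)}/\theta)}\right)^{\alpha} - \left(1 - e^{-(x_{(i-1)}/\theta)}\right)^{\alpha}. \tag{2.11}$$
The natural logarithm of the product spacing function is given as
$$\ln G(\alpha, \theta) = \frac{1}{n + 1} \sum_{i=1}^{n+1} \ln\left[D_i(\alpha, \theta)\right]. \tag{2.12}$$
To obtain the normal equations for the unknown parameters, we differentiate Equation (2.12) partially with respect to the parameters $\alpha$ and $\theta$ and equate to zero. Writing $u_i = 1 - e^{-(x_{(i)}/\theta)}$, with the boundary terms at $i = 0$ and $i = n + 1$ equal to zero, the estimators $\hat{\alpha}$ and $\hat{\theta}$ can be obtained as the solution of the following equations:
$$\frac{\partial \ln G}{\partial \alpha} = \frac{1}{n + 1} \sum_{i=1}^{n+1} \frac{1}{D_i} \left[u_i^{\alpha} \ln u_i - u_{i-1}^{\alpha} \ln u_{i-1}\right] = 0, \tag{2.13}$$
$$\frac{\partial \ln G}{\partial \theta} = -\frac{\alpha}{(n + 1)\,\theta^{2}} \sum_{i=1}^{n+1} \frac{1}{D_i} \left[x_{(i)}\, e^{-(x_{(i)}/\theta)}\, u_i^{\alpha - 1} - x_{(i-1)}\, e^{-(x_{(i-1)}/\theta)}\, u_{i-1}^{\alpha - 1}\right] = 0. \tag{2.14}$$
The solutions of the nonlinear normal equations (2.13) and (2.14) can be obtained numerically.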
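To make the product spacing criterion concrete, the following sketch computes the logarithm of the geometric mean of the spacings for given parameter values. It assumes the scale form $F(x) = (1 - e^{-x/\theta})^{\alpha}$ and the usual conventions $F(x_{(0)}) = 0$ and $F(x_{(n+1)}) = 1$; the function names and the small floor that guards against zero spacings from tied observations are illustrative choices.

```python
import math


def ge_cdf(x, alpha, theta):
    """GE CDF under the assumed scale form (1 - exp(-x/theta))**alpha."""
    return (-math.expm1(-x / theta)) ** alpha


def ge_quantile(u, alpha, theta):
    """Inverse of ge_cdf for 0 < u < 1."""
    return -theta * math.log1p(-(u ** (1.0 / alpha)))


def mps_log_objective(alpha, theta, data):
    """Mean of the log spacings D_i = F(x_(i)) - F(x_(i-1)), i = 1, ..., n+1."""
    ordered = sorted(data)
    cdf_values = [0.0] + [ge_cdf(x, alpha, theta) for x in ordered] + [1.0]
    spacings = [hi - lo for lo, hi in zip(cdf_values, cdf_values[1:])]
    floor = 1e-300  # avoids log(0) when tied observations give a zero spacing
    return sum(math.log(max(d, floor)) for d in spacings) / len(spacings)


# Usage: hypothetical data at the GE quantiles i/(n+1), which makes all
# n+1 spacings equal at the generating parameter values.
n = 9
data = [ge_quantile(i / (n + 1.0), 2.5, 1.5) for i in range(1, n + 1)]
at_truth = mps_log_objective(2.5, 1.5, data)
elsewhere = mps_log_objective(1.0, 1.0, data)
print(at_truth, elsewhere)
```

Because the spacings always sum to one, their geometric mean is largest when they are all equal, which is why the value at the generating parameters exceeds the value at any other parameter pair in this constructed example. Maximizing this function over $(\alpha, \theta)$ with any numerical optimizer gives the MPS estimates.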

Robust Estimation Method
When a dataset is contaminated with a single outlier or a few outliers, parameter estimation faces a serious problem.

Least Absolute Deviations (LAD) Method
Fang and Zhao (2006) introduced the least absolute deviations (LAD) estimation method, which is robust in the presence of outliers and asymmetric error terms. As noted by Dodge (2008), the computational ease of the least squares method made it much more popular than LAD in regression analysis. With recent advances in statistical computing, however, the LAD method can now be applied easily. The LAD method seeks the parameter estimates that minimize the sum of the absolute values of the residuals. Hence, it limits the influence that outliers exert on the LS estimator through the residual sum of squares. Moreover, the LAD estimator is asymptotically unbiased and normally distributed, and it has a lower asymptotic variance when the distribution is non-normal. The parameter estimates by least absolute deviations regression are given as
$$\hat{\Theta} = \arg\min_{\Theta} \sum_{i=1}^{n} |e_i|, \tag{3.1}$$
where $e_i$ is the $i$-th residual. For the GE distribution, the least absolute deviations estimators are obtained by minimizing
$$LAD(\alpha, \theta) = \sum_{i=1}^{n} \left|\left(1 - e^{-(x_{(i)}/\theta)}\right)^{\alpha} - \frac{i}{n + 1}\right|. \tag{3.2}$$
After differentiating Equation (3.2) with respect to the parameters $\alpha$ and $\theta$, and writing $u_i = 1 - e^{-(x_{(i)}/\theta)}$ and $e_i = u_i^{\alpha} - i/(n + 1)$, we get
$$\frac{\partial LAD}{\partial \alpha} = \sum_{i=1}^{n} \operatorname{sgn}(e_i)\, u_i^{\alpha} \ln u_i, \tag{3.3}$$
$$\frac{\partial LAD}{\partial \theta} = -\frac{\alpha}{\theta^{2}} \sum_{i=1}^{n} \operatorname{sgn}(e_i)\, x_{(i)}\, e^{-(x_{(i)}/\theta)}\, u_i^{\alpha - 1}. \tag{3.4}$$
Equating Equations (3.3) and (3.4) to zero, we obtain two nonlinear equations that cannot be solved analytically. Hence, iterative procedures such as Newton-Raphson algorithms can be used to obtain $\hat{\alpha}$ and $\hat{\theta}$ numerically.
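A small sketch may help to show how the LAD criterion can be evaluated in practice. It assumes that the residuals are the gaps between the GE CDF at the ordered observations and $i/(n+1)$, with the scale form $F(x) = (1 - e^{-x/\theta})^{\alpha}$. Because the absolute value is not differentiable at zero, the example uses a plain grid search instead of Newton-Raphson; the grid ranges, function names and data are hypothetical.

```python
import math


def ge_cdf(x, alpha, theta):
    """GE CDF under the assumed scale form (1 - exp(-x/theta))**alpha."""
    return (-math.expm1(-x / theta)) ** alpha


def lad_objective(alpha, theta, data):
    """Sum of absolute gaps between F(x_(i)) and i/(n+1)."""
    ordered = sorted(data)
    n = len(ordered)
    return sum(abs(ge_cdf(x, alpha, theta) - i / (n + 1.0))
               for i, x in enumerate(ordered, start=1))


def grid_minimize(objective, alphas, thetas):
    """Return (value, alpha, theta) for the best pair on a rectangular grid."""
    return min((objective(a, t), a, t) for a in alphas for t in thetas)


# Usage: hypothetical data at the GE quantiles i/(n+1) for alpha = 2.5 and
# theta = 1.5; the grid below is an arbitrary illustrative choice.
n = 20
data = [-1.5 * math.log1p(-((i / (n + 1.0)) ** (1.0 / 2.5)))
        for i in range(1, n + 1)]
alphas = [k / 10.0 for k in range(5, 41)]   # 0.5, 0.6, ..., 4.0
thetas = [k / 10.0 for k in range(5, 31)]   # 0.5, 0.6, ..., 3.0
best_value, best_alpha, best_theta = grid_minimize(
    lambda a, t: lad_objective(a, t, data), alphas, thetas)
print(best_value, best_alpha, best_theta)
```

A grid search is crude but makes no smoothness assumption; in practice the grid minimum would typically serve as a starting value for a finer local search.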

M-Estimation Method
The most common general method for robust regression is M-estimation, introduced by Huber (1964). The M-estimation method is regarded as a generalization of maximum likelihood estimation in the context of location models. Its principle is to minimize a function of the residuals rather than the sum of squared errors as the objective function. The M-estimation method for estimating the GE distribution parameters is defined by minimizing the objective function of all invariant errors $e_i$, as follows:
$$\sum_{i=1}^{n} \rho(e_i). \tag{3.5}$$
To estimate the two unknown parameters of the GE distribution, a simple comparison between two different objective functions is used. The selected objective functions are Tukey's Bisquare and Huber's weight:
$$\rho_{H}(e) = \begin{cases} \dfrac{e^{2}}{2}, & |e| \le k, \\[4pt] k|e| - \dfrac{k^{2}}{2}, & |e| > k, \end{cases} \qquad \rho_{B}(e) = \begin{cases} \dfrac{k^{2}}{6}\left\{1 - \left[1 - \left(\dfrac{e}{k}\right)^{2}\right]^{3}\right\}, & |e| \le k, \\[4pt] \dfrac{k^{2}}{6}, & |e| > k, \end{cases}$$
where $k$ is a tuning constant; for the Bisquare function, $k = 4.685$.
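The estimators of the parameters can be obtained for the two objective functions of M-estimation by differentiating Equation (3.5) with respect to the scale and shape parameters of the GE distribution. Writing $\psi = \rho'$ for the derivative of the objective function $\rho$ and $e_i(\Theta)$ for the invariant errors, the simultaneous equations are
$$\sum_{i=1}^{n} \psi\left(e_i(\Theta)\right) \frac{\partial e_i(\Theta)}{\partial \alpha} = 0, \qquad \sum_{i=1}^{n} \psi\left(e_i(\Theta)\right) \frac{\partial e_i(\Theta)}{\partial \theta} = 0.$$
For the Huber function, $\psi(e) = e$ when $|e| \le k$ and $\psi(e) = k \operatorname{sgn}(e)$ otherwise; for the Bisquare function, $\psi(e) = e\left[1 - (e/k)^{2}\right]^{2}$ when $|e| \le k$ and $\psi(e) = 0$ otherwise, where $k$ is the tuning constant. To solve the above equations, the Newton-Raphson method can be employed.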
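The two objective functions and their derivatives can be written down directly, as in the sketch below. The Bisquare constant 4.685 is the value quoted in the text; the Huber constant 1.345 is a commonly used default and is an assumption here, since the text does not state it. The function names and the residuals in the usage lines are hypothetical.

```python
import math

HUBER_K = 1.345     # assumed default; the source does not state the Huber constant
BISQUARE_K = 4.685  # tuning constant quoted in the text


def huber_rho(e, k=HUBER_K):
    """Huber objective: quadratic near zero, linear in the tails."""
    return 0.5 * e * e if abs(e) <= k else k * abs(e) - 0.5 * k * k


def huber_psi(e, k=HUBER_K):
    """Derivative of huber_rho: the residual clipped to [-k, k]."""
    return e if abs(e) <= k else math.copysign(k, e)


def bisquare_rho(e, k=BISQUARE_K):
    """Tukey Bisquare objective: constant beyond the tuning constant."""
    if abs(e) > k:
        return k * k / 6.0
    return (k * k / 6.0) * (1.0 - (1.0 - (e / k) ** 2) ** 3)


def bisquare_psi(e, k=BISQUARE_K):
    """Derivative of bisquare_rho: zero for residuals beyond k."""
    if abs(e) > k:
        return 0.0
    return e * (1.0 - (e / k) ** 2) ** 2


def m_objective(residuals, rho):
    """Sum of rho over the residuals, the quantity minimized in M-estimation."""
    return sum(rho(e) for e in residuals)


# Usage with hypothetical residuals, one of which is a gross outlier.
residuals = [0.2, -0.5, 1.0, 10.0]
print(m_objective(residuals, huber_rho), m_objective(residuals, bisquare_rho))
print(huber_psi(10.0), bisquare_psi(10.0))
```

The derivative functions show the difference between the two choices: the Huber function caps the influence of a large residual at $k$, while the Bisquare function removes it entirely once the residual exceeds the tuning constant.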

Simulation Study
A Monte Carlo simulation study is carried out to compare the non-robust and robust estimation methods. The non-robust methods are the maximum likelihood estimation (MLE) method, the least squares (LS) method and the maximum product spacing (MPS) method. The robust methods are the least absolute deviations (LAD) method, the M. Huber (MH) method and the M. Bisquare (MB) method. Complete data with outliers are randomly generated from the GE distribution with specified parameter values. The R statistical software is used to program the Monte Carlo simulation.

Design of the Simulation Study
Monte Carlo experiments were carried out as follows. Random samples are generated from the GE distribution using Equation (1.3). The error term ($\varepsilon$) is obtained from a normal distribution with mean 0 and variance $\sigma^{2}$, where $\sigma^{2}$ = 0.25, 0.5 and 0.75. Outliers are generated as a random sample from the uniform distribution Uniform($\bar{x} + 4S$, $\bar{x} + 7S$), where $\bar{x}$ is the sample mean of $x \sim \mathrm{GE}(\alpha, \theta)$ and $S$ is the sample standard deviation of $x$ (Wang and Lee, 2010). The sample sizes are $n$ = 20, 40 and 100. To investigate the robustness of the methods against outliers, we randomly generate different percentages of outliers ($P$ = 5%, 10%, 15% and 20%). The parameters are set to $\Theta = (\alpha, \theta)$ = (2.5, 1.5) and (0.5, 1.8), and all simulation results are based on 10000 replications. The simulation results are compared using the bias and mean square error (MSE).
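The data-generating step of this design can be sketched as follows. The example draws a GE sample by inverting the CDF, assumed here to have the scale form $F(x) = (1 - e^{-x/\theta})^{\alpha}$, and then replaces a chosen share of the observations with draws from Uniform($\bar{x} + 4S$, $\bar{x} + 7S$). Replacing randomly selected observations is an assumption about how the outliers enter the sample, and the normal error term is left out because the text does not say how it is combined with the data. The helper for bias and MSE follows the usual definitions; all names are illustrative.

```python
import math
import random
import statistics


def ge_quantile(u, alpha, theta):
    """GE quantile under the assumed scale form of the CDF."""
    return -theta * math.log1p(-(u ** (1.0 / alpha)))


def contaminated_sample(n, alpha, theta, outlier_share, rng):
    """Draw n GE values, then overwrite a share of them with uniform outliers."""
    clean = [ge_quantile(rng.random(), alpha, theta) for _ in range(n)]
    mean, sd = statistics.mean(clean), statistics.stdev(clean)
    n_out = round(outlier_share * n)
    positions = rng.sample(range(n), n_out)  # assumed: replace at random positions
    sample = list(clean)
    for pos in positions:
        sample[pos] = rng.uniform(mean + 4.0 * sd, mean + 7.0 * sd)
    return sample, sorted(positions)


def bias_and_mse(estimates, true_value):
    """Bias and mean square error of a list of replicated estimates."""
    bias = statistics.mean(estimates) - true_value
    mse = statistics.mean([(est - true_value) ** 2 for est in estimates])
    return bias, mse


# Usage: one replication with n = 40 and 10% outliers.
rng = random.Random(1)
sample, positions = contaminated_sample(40, 2.5, 1.5, 0.10, rng)
print(len(sample), positions)
# Hypothetical estimates from three replications, true value 2.0.
print(bias_and_mse([1.0, 2.0, 3.0], 2.0))
```

In a full study, each estimation method would be applied to every contaminated sample, and `bias_and_mse` would then summarize the replicated estimates of each parameter.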

Summary and Conclusions of Simulation Results:
The simulation results are presented in Tables 2 to 6 and Figures 1 and 2. The tables show the numerical results of the robust and non-robust estimation methods for different percentages of outliers ($P$), different variances of the normal error term ($\sigma^{2}$) and different sample sizes ($n$). We observe that an increase in the sample size leads to lower MSE values for both the robust and non-robust estimation methods. We also observe that higher percentages of outliers lead to higher bias and MSE values for both the robust and non-robust estimation methods. Furthermore, a larger variance of the normal error term leads to higher bias and MSE values for the robust estimation methods.

Application of Real Data Analysis
In this section, we present the results of the GE distribution parameter estimation using the robust and non-robust estimation methods on two real data sets.
Data set I: The first data set consists of the active repair times (hours) for an airborne communication transceiver. This data set was analyzed by Jorgensen (2012). As shown in Table 7, the best result comes from the MH method, since it has the smallest standard deviation, followed by the MB method and the LAD method, and so on. For the goodness of fit of the models, the Kolmogorov-Smirnov test shows that the overall fit of the models with outliers improves with the robust methods, as shown in the accompanying figure.
Table 8 provides the parameter estimates for the robust and non-robust estimation methods, the standard deviations of the parameter estimates, the Kolmogorov-Smirnov (KS) statistic and the associated p-values, as shown in Figure 4. Table 8 shows that the MOAPL fits the economic data better than the MOL, APL, PL, KGL, EL and Lomax models based on different criteria such as the AIC, CAIC, BIC and HQIC values.
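The Kolmogorov-Smirnov statistic used for the goodness-of-fit comparison can be computed as in the sketch below: it is the largest gap between the empirical distribution function and the fitted GE CDF, checked just before and just after each ordered observation. The scale form $F(x) = (1 - e^{-x/\theta})^{\alpha}$, the function names and the data are assumptions for illustration; the real data sets are not reproduced here and the p-value is not computed.

```python
import math


def ge_cdf(x, alpha, theta):
    """GE CDF under the assumed scale form (1 - exp(-x/theta))**alpha."""
    return (-math.expm1(-x / theta)) ** alpha


def ks_statistic(data, alpha, theta):
    """Largest distance between the empirical CDF and the fitted GE CDF."""
    ordered = sorted(data)
    n = len(ordered)
    distance = 0.0
    for i, x in enumerate(ordered, start=1):
        fitted = ge_cdf(x, alpha, theta)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        distance = max(distance, i / n - fitted, fitted - (i - 1) / n)
    return distance


# Usage: hypothetical data at the GE quantiles (i - 0.5)/n for alpha = 2.5 and
# theta = 1.5, which gives the smallest possible distance, 0.5/n.
n = 10
data = [-1.5 * math.log1p(-(((i - 0.5) / n) ** (1.0 / 2.5)))
        for i in range(1, n + 1)]
print(ks_statistic(data, 2.5, 1.5))
print(ks_statistic(data, 1.0, 1.0))
```

A smaller statistic indicates a closer fit, so evaluating `ks_statistic` at the estimates from each method gives one way to rank the methods on a given data set.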

Conclusion
In this paper, we presented various parameter estimation methods for the generalized exponential distribution using a complete dataset in the presence of outliers. We assessed the ability of classical and robust estimation methods to determine the parameter estimates of the GE distribution using complete datasets with various percentages of outliers. The classical estimation methods are the MLE, LS and MPS methods, and the non-classical estimation methods are LAD and M-estimation. The M-estimation method minimizes the invariant errors through the Bisquare objective function and the Huber objective function. The Monte Carlo simulation results showed that the M-estimation method using the Huber objective function outperformed the other methods in terms of bias and MSE values. The simulation results also showed that the MLE method, which is a classical method, is more suitable than the LS and MPS methods for estimating the GE parameters. The application to two real datasets confirmed that the M-estimation method is well suited to estimating the GE parameters. We conclude that M-estimation with the Huber objective function is the robust method of choice for estimating the parameters of the GE distribution for a complete dataset in the presence of various percentages of outliers. We also conclude that, before analyzing their data, researchers should examine them (for example, with boxplots and goodness-of-fit checks) and verify the presence or absence of outliers in order to choose the most appropriate method for estimating the model's parameters.