Marshall-Olkin Generalized Pareto Distribution: Bayesian and Non Bayesian Estimation

A new generalization of the generalized Pareto distribution is obtained using the Marshall-Olkin (1997) generator. The new distribution, MOGP, is more flexible and can be used to model non-monotonic failure rate functions. MOGP includes six different sub-models: the generalized Pareto, exponential, uniform, Pareto type I, Marshall-Olkin Pareto and Marshall-Olkin exponential distributions. We consider different methods for estimating the model parameters, namely the maximum likelihood estimator, maximum product spacing, the least squares method, the weighted least squares method and the Bayesian method. The Bayesian method is considered under the quadratic and Linex loss functions. A simulation analysis using the MCMC technique is performed to compare the proposed point estimation methods. The usefulness of MOGP is illustrated by means of a real data set, which shows that this generalization fits better than the Pareto, generalized Pareto and Marshall-Olkin Pareto distributions.

The GP distribution was used as a model for excesses over thresholds. Its applications include environmental extreme events, ozone levels in the upper atmosphere, large insurance claims or large fluctuations in financial data, and reliability studies. Its areas of application are successfully addressed in several books, such as those by Castillo et al. (2005) and Kotz and Nadarajah (2000). Alice and Jose (2004) considered the Marshall-Olkin Pareto and Marshall-Olkin semi-Pareto distributions and developed time series models with a modification structure. Ghitany (2005) considered the MO extension of the Pareto distribution and studied some of its statistical properties and its hazard rate; he also showed that the limiting distributions of the sample extremes are of exponential and Fréchet type. Almetwally and Almongy (2019b) used different estimation methods to estimate the parameters of a new Weibull-Pareto distribution, and Bdair and Haj Ahmad (2019) discussed different point and interval estimation methods for the MO Pareto distribution. In this work we introduce an extension of the generalized Pareto distribution based on the MO extension method and make inferences on the parameters of the MO generalized Pareto (MOGP) distribution. We study different classical point estimation methods for the unknown parameters of the MOGP distribution as well as the Bayesian method. Some properties of the density function are discussed. Numerical methods are used to solve the resulting nonlinear equations. Simulations are used to compare these methods and to determine which method is more efficient according to the bias and mean square error (MSE). The rest of the paper is organized as follows: In section 2 we introduce the MOGP distribution. Classical point estimation methods for the unknown parameters are discussed in section 3, while in section 4 the Bayesian estimation method is considered.
In section 5, a simulation study and a real-life data analysis are presented, together with comparison results among all estimation methods.

Probability Density Function
Let F̄(x) = 1 − F(x) denote the survival function of a continuous random variable X, and let f(x) be the probability density function (pdf) associated with the cumulative distribution function (cdf) F(x). Then the MO extended distribution has survival function

Ḡ(x) = α F̄(x) / (1 − ᾱ F̄(x)),  α > 0,  (1)

where ᾱ = 1 − α. For α = 1, G(x) = F(x); hence F(x) is a special case of G(x). The pdf corresponding to Eq. (1) takes the form

g(x) = α f(x) / [1 − ᾱ F̄(x)]².

Consider the generalized Pareto (GP) distribution with pdf given by

f(x; θ, λ) = (1/λ)(1 + θx/λ)^(−1/θ − 1),

where θ is the shape parameter and λ > 0 is a scale parameter. For θ < 0 the range of x is 0 < x < −λ/θ, and for θ > 0 the range is x > 0. The GP distribution becomes the uniform distribution for θ = −1 and the exponential distribution for θ = 0 (taken as the limit). The GP distribution has mean λ/(1 − θ) and variance λ²/[(1 − θ)²(1 − 2θ)], provided θ < 0.5. In this work we consider the case θ > 0. The cdf is given by

F(x; θ, λ) = 1 − (1 + θx/λ)^(−1/θ).

The corresponding pdf of MOGP is

g(x; α, θ, λ) = α (1/λ)(1 + θx/λ)^(−1/θ − 1) / [1 − ᾱ(1 + θx/λ)^(−1/θ)]²,  (2)

for α > 0, θ > 0, λ > 0, and x ≥ 0.
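As a rough sketch (not the authors' code), the MOGP pdf and cdf can be implemented directly from the parametrization above; the function and parameter names below are illustrative:

```python
import numpy as np

def gp_sf(x, theta, lam):
    """Survival function of the GP distribution, theta > 0: (1 + theta*x/lam)^(-1/theta)."""
    return (1.0 + theta * x / lam) ** (-1.0 / theta)

def mogp_pdf(x, alpha, theta, lam):
    """MOGP density g(x) = alpha * f(x) / (1 - (1-alpha) * Fbar(x))^2."""
    abar = 1.0 - alpha
    f = (1.0 / lam) * (1.0 + theta * x / lam) ** (-1.0 / theta - 1.0)
    return alpha * f / (1.0 - abar * gp_sf(x, theta, lam)) ** 2

def mogp_cdf(x, alpha, theta, lam):
    """MOGP cdf G(x) = 1 - alpha * Fbar(x) / (1 - (1-alpha) * Fbar(x))."""
    abar = 1.0 - alpha
    s = gp_sf(x, theta, lam)
    return 1.0 - alpha * s / (1.0 - abar * s)
```

Setting alpha = 1 recovers the GP distribution, which gives a quick sanity check of the implementation.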

Monotonicity of MOGP distribution
In this subsection we study the monotonicity of the MOGP density. For that purpose, let V(x) = (1 + θx/λ). Then the logarithm of the MOGP pdf in Eq. (2) can be rewritten as

log g(x) = log α − log λ − (1/θ + 1) log V(x) − 2 log(1 − ᾱ V(x)^(−1/θ)),  (3)

where x ≥ 0, α > 0, θ > 0, λ > 0. The monotonicity of Eq. (3) determines the monotonicity of the MOGP density; taking the derivative with respect to x, we obtain

d/dx log g(x) = −(θ + 1)/(λ V(x)) − (2ᾱ/λ) V(x)^(−1/θ − 1) / [1 − ᾱ V(x)^(−1/θ)].

Since V(x) and V′(x) are both positive, if 0 < α < 1 (so ᾱ > 0) the above derivative is negative; hence the MOGP density is decreasing for x > 0. If α > 1 (so ᾱ < 0) the derivative has the single root

r₀ = (λ/θ)[(ᾱ − 2ᾱ/(θ + 1))^θ − 1],

so the MOGP density is increasing for x ∈ [0, r₀) but decreasing for x ∈ (r₀, ∞). Figure (1) illustrates the shapes of the MOGP density function. Since the GP density is a decreasing function, the MOGP distribution is more appropriate for non-monotone models.
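The mode r₀ reconstructed above can be checked numerically; the sketch below (illustrative names, parametrization as in Eq. (2)) locates the grid maximum of the density for α > 1 and verifies monotone decrease for α < 1:

```python
import numpy as np

def mogp_pdf(x, alpha, theta, lam):
    # MOGP density from Eq. (2)
    abar = 1.0 - alpha
    sf = (1.0 + theta * x / lam) ** (-1.0 / theta)
    f = (1.0 / lam) * (1.0 + theta * x / lam) ** (-1.0 / theta - 1.0)
    return alpha * f / (1.0 - abar * sf) ** 2

def mode_r0(alpha, theta, lam):
    # candidate interior mode r0 = (lam/theta)*[(abar - 2*abar/(theta+1))**theta - 1]
    abar = 1.0 - alpha
    return (lam / theta) * ((abar - 2.0 * abar / (theta + 1.0)) ** theta - 1.0)
```

For example, with (α, θ, λ) = (5, 0.5, 1) the formula gives r₀ ≈ 0.309, which should coincide with the numerical argmax of the density.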

Hazard Rate Function
In this subsection, we provide the hazard rate (failure rate) function of the MOGP distribution, together with graphs of these functions for some parameter values that illustrate their properties. The hazard rate of the GP distribution is decreasing for all x > 0 and is given by

h(x; θ, λ) = f(x)/F̄(x) = 1/(λ + θx),

while the MOGP distribution has a hazard rate function that takes different shapes. The hazard rate of MOGP is given by

h*(x; α, θ, λ) = g(x)/Ḡ(x) = h(x)/[1 − ᾱ F̄(x)] = 1/{(λ + θx)[1 − ᾱ(1 + θx/λ)^(−1/θ)]}.
In order to determine the shape of the hazard rate function h*(x; α, θ, λ), it is enough to determine the shape of log h*(x; α, θ, λ). The MOGP hazard rate may take several shapes according to the following cases: 1. If 0 < α < 1, then the derivative of log h*(x; α, θ, λ) is negative, and hence the hazard rate function is decreasing.
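A small sketch (illustrative, not the authors' code) of both hazard functions makes the comparison concrete; it assumes the reconstructed forms h(x) = 1/(λ + θx) and h*(x) = h(x)/(1 − ᾱF̄(x)):

```python
import numpy as np

def gp_hazard(x, theta, lam):
    # GP hazard h(x) = 1/(lam + theta*x), decreasing in x for theta > 0
    return 1.0 / (lam + theta * x)

def mogp_hazard(x, alpha, theta, lam):
    # MOGP hazard h*(x) = h(x) / (1 - (1-alpha) * Fbar(x))
    abar = 1.0 - alpha
    sf = (1.0 + theta * x / lam) ** (-1.0 / theta)
    return gp_hazard(x, theta, lam) / (1.0 - abar * sf)
```

For α = 1 the MOGP hazard collapses to the GP hazard, and for 0 < α < 1 it should be decreasing, matching case 1 above.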

Classical Point Estimation Methods
In this section we consider different methods of point estimation for the MOGP parameters. Numerical techniques are helpful in obtaining the estimated values of these parameters, then simulation is used to compare between these estimation methods in order to decide which method is more efficient.

Maximum Likelihood Estimation
The maximum likelihood estimation (MLE) method is widely used in inferential statistics since it has many attractive properties, such as invariance, consistency, and asymptotic normality. It is based on maximizing the likelihood function of the MOGP distribution. Let X₁, X₂, …, Xₙ be a random sample from the MOGP distribution. Then the log-likelihood function for the vector of parameters γ = (α, θ, λ) can be expressed as

ℓ(γ) = n log α − n log λ − (1/θ + 1) ∑_(i=1)^n log(1 + θxᵢ/λ) − 2 ∑_(i=1)^n log[1 − ᾱ(1 + θxᵢ/λ)^(−1/θ)].

The resulting likelihood equations are not in closed form; therefore an iterative procedure, such as a conjugate-gradient algorithm, must be used to obtain the numerical solution.
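As a hedged illustration (not the authors' implementation), the log-likelihood above can be maximized with a general-purpose optimizer in place of the conjugate-gradient scheme; the sample is generated by the inverse-cdf method, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

def mogp_quantile(u, alpha, theta, lam):
    # inverse MOGP cdf: x = (lam/theta)*[((1-(1-alpha)u)/(1-u))**theta - 1]
    abar = 1.0 - alpha
    return (lam / theta) * (((1.0 - abar * u) / (1.0 - u)) ** theta - 1.0)

def neg_loglik(par, x):
    # negative MOGP log-likelihood; +inf outside the parameter space
    a, t, l = par
    if a <= 0 or t <= 0 or l <= 0:
        return np.inf
    v = 1.0 + t * x / l
    sf = v ** (-1.0 / t)
    return -(len(x) * np.log(a / l)
             - (1.0 / t + 1.0) * np.log(v).sum()
             - 2.0 * np.log(1.0 - (1.0 - a) * sf).sum())

x = mogp_quantile(rng.uniform(size=500), 2.0, 0.5, 1.0)
res = minimize(neg_loglik, x0=[2.0, 0.5, 1.0], args=(x,), method="Nelder-Mead")
```

Nelder-Mead is a derivative-free stand-in for the conjugate-gradient algorithms mentioned in the text; any optimizer that respects the positivity constraints would do.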

Maximum Product Spacing
According to Cheng and Amin (1983), the maximum product spacing (MPS) method is defined as follows. Let x_(1) < ⋯ < x_(n) be the ordered sample and define the spacings

Dᵢ(γ) = G(x_(i); γ) − G(x_(i−1); γ), i = 1, …, n + 1,

where G(x_(0); γ) = 0 and G(x_(n+1); γ) = 1. The MPS estimator maximizes the geometric mean of the spacings,

GM(γ) = [∏_(i=1)^(n+1) Dᵢ(γ)]^(1/(n+1)).

It is easy to see that ∑_(i=1)^(n+1) Dᵢ(γ) = 1. The natural logarithm of the product spacing function is

log GM(γ) = (1/(n+1)) ∑_(i=1)^(n+1) log Dᵢ(γ).  (4)

To obtain the normal equations for the unknown parameters, we differentiate Eq. (4) partially with respect to the parameter vector γ and equate the derivatives to zero; see Appendix A. The estimators of γ can be obtained by solving this system of nonlinear equations, so the MPS estimates of α, θ and λ can be found using an iterative procedure such as a conjugate-gradient algorithm.
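The spacing construction can be sketched as follows (illustrative names; MOGP cdf as reconstructed in section 2); the telescoping identity ∑Dᵢ = 1 gives a direct check:

```python
import numpy as np

def mogp_cdf(x, alpha, theta, lam):
    abar = 1.0 - alpha
    sf = (1.0 + theta * x / lam) ** (-1.0 / theta)
    return 1.0 - alpha * sf / (1.0 - abar * sf)

def spacings(x_sorted, alpha, theta, lam):
    # D_i = G(x_(i)) - G(x_(i-1)), with G(x_(0)) = 0 and G(x_(n+1)) = 1
    u = np.concatenate(([0.0], mogp_cdf(x_sorted, alpha, theta, lam), [1.0]))
    return np.diff(u)

def log_mps(x_sorted, alpha, theta, lam):
    # log geometric mean of the n+1 spacings, Eq. (4)
    return np.log(spacings(x_sorted, alpha, theta, lam)).mean()
```

Note that tied observations produce zero spacings, which is why MPS breaks down on data with repeated values, as seen in the data analysis section.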

Least Square Method
Swain et al. (1988) introduced the least squares (LS) method of estimation. Based on the ordered sample x_(1) < ⋯ < x_(n) from the MOGP distribution, the least squares estimates are obtained by minimizing

S(γ) = ∑_(i=1)^n [G(x_(i); γ) − i/(n + 1)]².  (5)

After differentiating Eq. (5) with respect to the parameters α, θ and λ and equating the derivatives to zero, we obtain the normal equations. These nonlinear equations cannot be solved analytically, so the least squares estimates of α, θ and λ are obtained by a numerical technique such as a conjugate-gradient algorithm. Swain et al. (1988) also proposed a weighted version of this criterion, considered next.
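A minimal sketch of the LS criterion in Eq. (5), with illustrative names and the MOGP cdf as reconstructed earlier:

```python
import numpy as np

def mogp_cdf(x, alpha, theta, lam):
    abar = 1.0 - alpha
    sf = (1.0 + theta * x / lam) ** (-1.0 / theta)
    return 1.0 - alpha * sf / (1.0 - abar * sf)

def ls_objective(par, x_sorted):
    # S(gamma) = sum_i [G(x_(i)) - i/(n+1)]^2
    a, t, l = par
    n = len(x_sorted)
    i = np.arange(1, n + 1)
    return np.sum((mogp_cdf(x_sorted, a, t, l) - i / (n + 1.0)) ** 2)
```

By construction, if the ordered sample is placed exactly at the MOGP quantiles of i/(n+1), the objective vanishes at the true parameters.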

Weighted Least Squares Method
Swain et al. (1988) also introduced the weighted least squares (WLS) estimators. We use the WLS procedure for estimating the parameters α, θ and λ of the MOGP distribution. The WLS estimates of the unknown parameters minimize

W(γ) = ∑_(i=1)^n wᵢ [G(x_(i); γ) − i/(n + 1)]², wᵢ = (n + 1)²(n + 2)/[i(n − i + 1)],  (6)

with respect to α, θ and λ. After differentiating Eq. (6) with respect to the parameters α, θ and λ and equating the derivatives to zero, we obtain the normal equations.
The above nonlinear equations cannot be solved analytically, so the WLS estimates of α, θ and λ are obtained using a conjugate-gradient algorithm.

Bayesian Estimation Method
Here we find Bayes estimates for the unknown parameters α, θ and λ. Two different loss functions are assumed: the squared error (SE) and the Linex loss functions. In the Bayesian method all parameters are random variables with a certain distribution called the prior distribution. If prior information is not available, which is usually the case, we need to select a prior distribution. Since the selection of the prior distribution plays an important role in the estimation of the parameters, our choice for the priors of α, θ and λ are independent gamma distributions, i.e. G(a₁, b₁), G(a₂, b₂) and G(a₃, b₃) respectively, where a₁, a₂, a₃, b₁, b₂ and b₃ are the hyperparameters of the prior distributions. The joint prior of α, θ and λ is

k(α, θ, λ) ∝ α^(a₁−1) θ^(a₂−1) λ^(a₃−1) e^(−b₁α − b₂θ − b₃λ), α, θ, λ, a₁, a₂, a₃, b₁, b₂, b₃ > 0,

while the joint posterior of α, θ and λ is given by

π(α, θ, λ | x) ∝ L(x | α, θ, λ) k(α, θ, λ),

where L(x | α, θ, λ) is the likelihood function of the MOGP distribution. Under the quadratic (SE) loss function the Bayes estimate is the posterior mean, which is not easy to obtain for α, θ and λ unless numerical approximation methods are used. Several approximation methods are available in the literature for this kind of problem. Here we consider the Markov chain Monte Carlo (MCMC) approximation method, see Karandikar (2006), which replaces the ratio of integrals by a single numerical summary computed from posterior samples. The Bayes estimates of the unknown parameters α, θ and λ under the Linex loss function can be calculated through

γ̂ = −(1/ν) log[(1/L) ∑_(l=1)^L e^(−νγ⁽ˡ⁾)],

where ν reflects the direction and degree of asymmetry and L is the number of draws in the MCMC process; the reader may refer to Almetwally and Almongy (2018) for more details. A wide variety of MCMC schemes are available.
An important sub-class of MCMC methods is Gibbs sampling and, more generally, Metropolis-within-Gibbs samplers. The MCMC samples may be used to completely summarize the posterior uncertainty about the parameters α, θ and λ through a kernel estimate of the posterior distribution, something that is often unavailable with MLE; the same is true of any function of the parameters. To generate samples from the posterior distribution under the MOGP model, we use the Metropolis-Hastings method (Metropolis et al. (1953)) with a normal proposal distribution. For details regarding the implementation of the Metropolis-Hastings algorithm, the reader may refer to Robert and Casella (2004) and Almetwally et al. (2018).
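A minimal random-walk Metropolis-Hastings sketch under the gamma priors above, together with the SE and Linex summaries, is given below. This is an illustration under assumed hyperparameters (all aᵢ = bᵢ = 1) and illustrative names, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(7)

def mogp_quantile(u, alpha, theta, lam):
    return (lam / theta) * (((1.0 - (1.0 - alpha) * u) / (1.0 - u)) ** theta - 1.0)

def log_post(par, x, a0=(1.0, 1.0, 1.0), b0=(1.0, 1.0, 1.0)):
    # log posterior up to a constant: MOGP log-likelihood + independent gamma log-priors
    a, t, l = par
    if min(a, t, l) <= 0.0:
        return -np.inf
    v = 1.0 + t * x / l
    sf = v ** (-1.0 / t)
    ll = (len(x) * np.log(a / l) - (1.0 / t + 1.0) * np.log(v).sum()
          - 2.0 * np.log(1.0 - (1.0 - a) * sf).sum())
    lp = sum((ai - 1.0) * np.log(p) - bi * p for p, ai, bi in zip(par, a0, b0))
    return ll + lp

def mh_chain(x, n_iter=3000, step=0.05, start=(2.0, 0.5, 1.0)):
    # random-walk Metropolis-Hastings with a normal proposal
    cur = np.array(start, dtype=float)
    cur_lp = log_post(cur, x)
    draws = np.empty((n_iter, 3))
    for k in range(n_iter):
        prop = cur + step * rng.normal(size=3)
        prop_lp = log_post(prop, x)
        if np.log(rng.uniform()) < prop_lp - cur_lp:
            cur, cur_lp = prop, prop_lp
        draws[k] = cur
    return draws

x = mogp_quantile(rng.uniform(size=300), 2.0, 0.5, 1.0)
chain = mh_chain(x)[1000:]          # discard burn-in
post_mean = chain[:, 0].mean()      # Bayes estimate of alpha under SE loss
nu = 1.5
linex_est = -(1.0 / nu) * np.log(np.mean(np.exp(-nu * chain[:, 0])))
```

For ν > 0, Jensen's inequality guarantees that the Linex estimate does not exceed the posterior mean, which gives a quick consistency check on the two summaries.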

Simulation Study and Data Analysis
In this section, we provide a complete algorithm for the Monte Carlo simulation (MCS) study used to compare the different estimation methods proposed in the previous sections; we then apply these computations to a real-life example.
Step 3: Generate random samples from the MOGP distribution by using the quantile function x = (λ/θ)[((1 − ᾱu)/(1 − u))^θ − 1], where u is uniform on (0, 1).
Step 4: Solve the estimating equations of each estimation method to obtain the estimators of the parameters of the MOGP distribution, i.e. compute α̂, θ̂ and λ̂.
Step 5: Repeat this experiment (L − 1) times, using the same values of the parameters in each experiment. The generated random values naturally vary from experiment to experiment even though the sample size n does not change.
Finally, we have L values of the bias and MSE. We compute the average biases and average MSEs over 10,000 runs; this number of runs gives accuracy of the order ±0.01 (see Karian and Dudewicz (1999)). Therefore, we report all the results up to three decimal places. Recall that the bias of an estimator is Bias = γ̂ − γ, where γ̂ is the estimated value of γ, and the mean squared error of the estimator is MSE = Mean(γ̂ − γ)².
The results are presented in six tables, where the MSE is given in each cell and the corresponding bias is reported within parentheses. It is observed from Tables 1 to 6 that all of the estimators usually overestimate for small values of α, while for α > 1 most of the estimators tend to underestimate. One can also observe that for each estimation method the MSEs decrease as the sample size increases. For small sample sizes all of the estimators usually overestimate for all values of α, except the Bayes estimates in Tables 5 and 6 under Linex 0.75 and Linex 1.5, which usually underestimate when α > 1. For large sample sizes the estimator tends to underestimate, especially for WLS, as shown in Table 5. Also, in Table 6 we observe underestimation for WLS and for the Bayes estimates under SE, Linex 0.75 and Linex 1.5. It is also observed that all estimates decrease as the sample size increases.
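The quantile-based generation in Step 3 and the bias/MSE computation can be sketched as follows (illustrative names; the quantile is the inverse of the MOGP cdf reconstructed in section 2):

```python
import numpy as np

def mogp_quantile(u, alpha, theta, lam):
    # inverse MOGP cdf: Q(u) = (lam/theta)*[((1-(1-alpha)u)/(1-u))**theta - 1]
    return (lam / theta) * (((1.0 - (1.0 - alpha) * u) / (1.0 - u)) ** theta - 1.0)

def mogp_cdf(x, alpha, theta, lam):
    sf = (1.0 + theta * x / lam) ** (-1.0 / theta)
    return 1.0 - alpha * sf / (1.0 - (1.0 - alpha) * sf)

def bias_mse(estimates, true_value):
    # average bias and MSE over the L simulation runs
    e = np.asarray(estimates, dtype=float)
    return e.mean() - true_value, ((e - true_value) ** 2).mean()
```

Applying the cdf to the quantile function should return the original uniform draws, which verifies the inversion.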
One can observe that for each estimation method the MSEs decrease as the sample size increases, and that they also depend on the parameter values. It is observed from Tables 1 and 2 that the MLE underestimates, while in Table 4, for large sample sizes, the MLE, LS and WLS estimators underestimate. Notice that in Table 6, where α > 1, most estimators underestimate, but the MPS method overestimates.
Among all classical methods of estimation and with respect to bias, the best method for estimating α is the MPS method when α < 1, while for α > 1 the best method is the MLE. Regarding θ, the best estimation method for small sample sizes is the LS method, while for large samples the WLS method serves better. For estimating λ, all classical methods perform well for different values of α and θ and for different sample sizes, except the MPS method, which seems not appropriate for estimating λ.

Now, for the non-classical Bayes point estimation method, the best loss function for estimating α, θ and λ is Linex 1.5 in most cases where α < 1, while the squared error loss works better for α > 1.

Data Analysis
We present the numerical results of the parameter estimation of the MOGP distribution for a real data set. This data set was reported by Birnbaum and Saunders (1969). The MOGP model is compared with special-case models such as GP, Pareto type I and MOP. The comparison was conducted using the Kolmogorov-Smirnov (KS) test with the corresponding p-values. We also use the Akaike information criterion, AIC = −2L(γ̂) + 2p, where p is the number of parameters in the model and L(γ̂) is the maximized value of the log-likelihood function; given a set of candidate models, the preferred model is the one with the minimum AIC value. The Bayesian information criterion, BIC = −2L(γ̂) + p ln(n), where n is the sample size, is used as well; as a model selection criterion, the model with the minimum BIC value should be chosen. The last criterion is the consistent Akaike information criterion, CAIC = −2L(γ̂) + p(ln n + 1); again, the model with the minimum CAIC value is preferred. The MLEs of α, θ and λ are computed numerically using the function optim in the R statistical package. The values of the KS statistic with p-values, AIC, CAIC, BIC and the Hannan-Quinn information criterion (HQIC) are reported in Table (7). Comparing the values of KS between MOGP and the sub-models GP, Pareto type I and MOP, the minimum KS is obtained for MOGP, equal to 0.054 with a p-value of 0.927. This indicates that the MOGP distribution fits the data set well and better than the other distributions, and it also indicates the need for new distributions in managing some data sets. The same result is found using the AIC, CAIC and HQIC criteria, since the smallest of these values is attained by MOGP among the considered distributions, while the minimum BIC value is attained by MOP, with a small difference from the BIC of MOGP.
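The three information criteria defined above can be computed in a few lines; this sketch (illustrative names) takes the maximized log-likelihood, the number of parameters p, and the sample size n:

```python
import numpy as np

def info_criteria(loglik, p, n):
    # AIC = -2 logL + 2p; BIC = -2 logL + p ln n; CAIC = -2 logL + p (ln n + 1)
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + p * np.log(n)
    caic = -2.0 * loglik + p * (np.log(n) + 1.0)
    return aic, bic, caic
```

Note that CAIC always exceeds BIC by exactly p, so the two criteria can rank models differently from AIC but never from each other.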
So, in general, we can say that the new distribution is superior to the other sub-models. We have estimated the parameters of the MOGP distribution and computed the SE under the different estimation methods, as shown in Table (8). According to this real data set, WLS estimates α and θ better, while Linex 0.75 is better for estimating λ. The MPS method is based on the differences of the cumulative distribution function between consecutive observations, so it cannot be used for these data because of the tied values they contain.

Conclusions
In this study we considered a generalization of the GP distribution, namely MOGP. This new distribution proved to be more flexible and more appropriate for non-monotone lifetime data. The pdf of MOGP is monotone or non-monotone depending on the values of the parameters. We also found that the hazard rate can be an increasing-decreasing function, which can be widely used to model different real data. We studied different classical and non-classical point estimation methods and conducted a numerical comparison between them. Among the classical methods we concluded that the best method for estimating α is the MPS method; for estimating θ, the LS and WLS methods, depending on the sample size; while for estimating λ we found that the MLE and WLS are the most appropriate methods. Our choice of the best estimation method was based on the minimum absolute bias. Regarding the Bayes methods, the Linex loss function was better than the squared error loss function for estimating all parameters of the MOGP distribution. The flexibility of this distribution was illustrated in an application to a real data set.