A Numerical Comparison of Three Procedures Used in Failure Model Discrimination

Three selection procedures, namely the RML, S and F-procedures, are reviewed with application to the exponential, Weibull, Pareto, and finite range models. Some inaccurate results found in the article of Pandey et al. (1991) are illustrated and corrected. A simulation study is developed to compare the three procedures numerically by obtaining the probability of correct selection.


Introduction
To model time-to-failure data for estimating reliability, several families of probability distributions are commonly used. In this paper, two pairs of distributions are considered: first, the Weibull and exponential distributions; second, the finite range and Pareto distributions. The problem considered in this article may be described as follows. Suppose that we have a complete random sample $T_1, T_2, \ldots, T_n$ of lifetime data on a parent random variable $T$ with distribution function $F$. It is required to decide that $F$ is a member of one of a set of separate families of distribution functions, say $F_1, F_2, \ldots, F_k$, without specifying the actual values of all or some of the parameters of the model. Consequently, we require a selection rule for deciding which of these $k$ families best fits the sample. In practice, if $k = 2$, we would like to choose between the two hypotheses $H_0$: $t_1, t_2, \ldots, t_n$ was sampled from $F_1$ versus $H_1$: $t_1, t_2, \ldots, t_n$ was sampled from $F_2$.
The concept of separate families of probability distributions was introduced by Cox (1961), who considered this problem from the hypothesis testing viewpoint. Atkinson (1970) considered this problem from the discrimination viewpoint, in which the two hypotheses are treated symmetrically. Dumonceaux et al. (1973) proposed the use of the ratio of the maximized likelihoods (RML) statistic and applied it to discriminate between normal and Cauchy distributions, and between normal and exponential distributions. Dumonceaux and Antle (1973) developed the application of this approach to discriminate between lognormal and Weibull distributions. Bain and Engelhardt (1980) recommended a procedure based on the ratio of the maximized likelihoods statistic (RML-procedure) for choosing between the Weibull and gamma models. Gupta and Kundu (2003, 2004) derived the asymptotic distributions of the RML statistic under the null hypotheses in discriminating between generalized exponential and Weibull models and between generalized exponential and gamma models. Kundu et al. (2005) derived the asymptotic distributions of the RML statistic in discriminating between generalized exponential and log-normal models. In addition, the S-procedure was proposed by Quesenberry and Kent (1982) to choose among exponential, gamma, Weibull, and lognormal models. Also, Pandey et al. (1991) derived the F-procedure to choose between exponential and Weibull models, and between finite range and Pareto models.
The present paper deals exclusively with discrimination between the exponential and Weibull distributions, and between the finite range and Pareto distributions. Their densities are given in Table I, together with the scale and shape parameters of each model.
It is necessary to point out that the exponential and Weibull models are two important failure models used frequently in reliability. Since the exponential distribution is a special case of the Weibull distribution, a method is needed to discriminate between them. In our case of study, the exponential model has only a scale parameter, whereas the Weibull model has a scale and a shape parameter. Thus, when the Weibull shape parameter is unknown, the Weibull model is flexible enough to fit any data drawn from the exponential model. In other words, it is not useful to discriminate between the exponential and Weibull models when the Weibull shape parameter is unknown.
The finite range and Pareto models share the property that the domain of their random variables depends on the scale parameter. This common property makes discriminating between the two models difficult, and many methods have therefore been developed to discriminate between them.
The selection procedures are compared numerically by obtaining the probability of correct selection (PCS), which is defined as the probability that a selection procedure selects the correct distribution.

Ratio of Maximized Likelihoods (RML)-Procedure
Dumonceaux et al. (1973) proposed the ratio of the maximized likelihoods (RML) statistic, which is given by the ratio $L_1 / L_2$ with each likelihood evaluated at the maximum likelihood estimators of its parameter vector, where $L_1$ and $L_2$ are the likelihood functions for $F_1$ and $F_2$ respectively. This formula is easily rewritten in logarithmic form. Bain and Engelhardt (1980) recommended a procedure based on the ratio of the maximized likelihoods statistic (called the RML-procedure) for choosing between two models, where the acceptance or rejection of any hypothesis depends only on the value of the RML statistic. In practice, $F_1$ best fits the data if the RML statistic is greater than one (equivalently, if its natural logarithm is positive); otherwise $F_2$ best fits the data.
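As an illustration of the rule above, the following minimal sketch computes the logarithm of the RML statistic for the exponential model against a Weibull model with known shape. It is not the expression of Pandey et al. (1991), which is not reproduced here. It assumes the common Weibull density $(\beta/\theta)(t/\theta)^{\beta-1}\exp\{-(t/\theta)^{\beta}\}$, and the function names are hypothetical.

```python
import math

def log_rml_exp_vs_weibull(t, shape):
    """ln(max exponential likelihood / max Weibull likelihood), shape known.

    Assumed parametrization (not taken from the paper):
      exponential: f(t) = (1/theta) * exp(-t/theta)
      Weibull:     f(t) = (b/theta) * (t/theta)**(b-1) * exp(-(t/theta)**b)
    """
    n = len(t)
    mean_t = sum(t) / n                            # MLE of the exponential scale
    mean_t_pow = sum(x ** shape for x in t) / n    # MLE of theta**shape for the Weibull
    sum_log_t = sum(math.log(x) for x in t)
    loglik_exp = -n * math.log(mean_t) - n
    loglik_wei = (n * math.log(shape) - n * math.log(mean_t_pow)
                  + (shape - 1) * sum_log_t - n)
    return loglik_exp - loglik_wei

def select_by_rml(t, shape):
    """Select the exponential model when ln RML > 0, otherwise the Weibull model."""
    return "exponential" if log_rml_exp_vs_weibull(t, shape) > 0 else "Weibull"
```

For a tightly clustered sample such as `[0.9, 1.0, 1.1, 1.05, 0.95]` with known shape 5, the log ratio is negative and the Weibull model is selected.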
To discriminate between the exponential and Weibull models using the RML-procedure, suppose that the scale parameter of the exponential distribution is unknown, and that the Weibull distribution has an unknown scale parameter and a known shape parameter. Pandey et al. (1991) obtained the natural logarithm of the ratio of the maximized likelihoods for this pair in terms of the maximum likelihood estimators of the unknown parameters.
To discriminate between the finite range and Pareto models using the RML-procedure, suppose that both the scale parameter and the shape parameter of the finite range distribution are unknown. The corresponding log ratio involves the maximum likelihood estimators of the scale and shape parameters of the finite range model (subscript fr) and of the Pareto model (subscript p).

S-Procedure
Quesenberry and Kent (1982) used a selection procedure based on statistics that are invariant under scale transformations of the data for choosing between the hypotheses mentioned in Section 1. Their statistic has the property of being independent of the actual value of the scale parameter. The proposed selection rule obtains the S statistic for each family: the distribution family $F_1$ is selected if its S statistic exceeds that of $F_2$; otherwise the distribution family $F_2$ is selected.
To discriminate between the exponential and Weibull models using the S-procedure, Quesenberry and Kent (1982) obtained the natural logarithm of the S statistic for the exponential distribution and for the Weibull distribution. Thus the exponential model is selected if its log S statistic exceeds that of the Weibull distribution; otherwise the Weibull model is selected.
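The sketch below illustrates the S-procedure for this pair. It assumes the scale-invariant form of the statistic, $S_i = \int_0^\infty f_i(\lambda t_1, \ldots, \lambda t_n)\,\lambda^{n-1}\,d\lambda$, with the Weibull shape known. The closed forms in the code were worked out from that integral for this illustration and may differ in presentation from those printed by Quesenberry and Kent (1982). The function names are hypothetical. `math.lgamma` is used so that $\ln\Gamma(n)$ is obtained without evaluating $\Gamma(n)$ itself.

```python
import math

def log_s_exponential(t):
    """ln S for the exponential family: ln Gamma(n) - n * ln(sum t_i)."""
    n = len(t)
    return math.lgamma(n) - n * math.log(sum(t))

def log_s_weibull(t, shape):
    """ln S for the Weibull family with known shape b:
    (n-1) ln b + ln Gamma(n) + (b-1) sum ln t_i - n ln(sum t_i**b)."""
    n = len(t)
    return ((n - 1) * math.log(shape) + math.lgamma(n)
            + (shape - 1) * sum(math.log(x) for x in t)
            - n * math.log(sum(x ** shape for x in t)))

def select_by_s(t, shape):
    """Select the family with the larger S statistic."""
    if log_s_exponential(t) > log_s_weibull(t, shape):
        return "exponential"
    return "Weibull"
```

Rescaling every observation by the same positive constant shifts both log statistics by the same amount, so the selected family does not depend on the unit of time.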
To discriminate between the finite range and Pareto models using the S-procedure, Pandey et al. (1991) obtained the natural logarithm of the S statistic for the Pareto distribution. However, the formula that Pandey et al. (1991) gave on page 1379 for the logarithm of the S statistic of the finite range distribution is incorrect, and a direct derivation proves our corrected formula.
Thus the Pareto model is selected if its log S statistic exceeds that of the finite range distribution; otherwise the finite range model is selected.

F-Procedure
Pandey et al. (1991) proposed the F-procedure for choosing between the hypotheses mentioned in Section 1. The distribution families $F_1$ and $F_2$ are each fitted using a regression equation. Next, the left-hand side, $y$, is equated to the corresponding empirical estimate of the right-hand side, which is a function of $R(t)$ or $H(t)$, and the corresponding values $y_1, y_2, \ldots, y_n$ are calculated. Finally, the F-statistic of the fitted regression is computed. In practice, both $F_1$ and $F_2$ are fitted to the sample data and an F-statistic is calculated for each. The distribution family $F_1$ is selected when its F-statistic is the larger of the two; otherwise the distribution family $F_2$ is selected.
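The F-procedure is applied in the same way to both pairs of models. To discriminate between the exponential and Weibull models, the F-statistics of the two fitted regressions are compared. To discriminate between the finite range and Pareto models, the finite range model is selected when the comparison favours it, while the Pareto model is selected otherwise.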

A Numerical Comparison
In our case of study, we consider two pairs of failure models: exponential versus Weibull, and Pareto versus finite range. To carry out the comparison, programs were used to calculate the probability of correct selection (PCS) for each of the three selection procedures. The comparison covers two cases.

Case 1: Exponential versus Weibull Model
PCS values for exponential versus Weibull are computed and reported in Table II. The numerical comparison in this case is carried out as follows. When the data come from an exponential distribution, we consider $n = 5, 10, 20, 30, 50$. Then, for each value of $n$, we generated a random sample of size $n$ from the exponential distribution with scale parameter equal to 1 and checked whether a given procedure correctly selects the distribution. This process is replicated 10,000 times to obtain the PCS value for each value of $n$ with each procedure. We notice that, when the data are drawn from an exponential distribution, the PCS values for the RML and S-procedures decrease as the shape parameter increases towards 1, and then increase until they reach certainty for shape parameter values greater than or equal to 10. For the F-procedure, in contrast, the PCS values remain constant as the shape parameter increases.
When the data come from a Weibull distribution, we consider $n = 5, 10, 20, 30, 50$ and shape parameter values 0.25, 0.5, 0.75, 1.25, 1.5, 1.75, 2, 5, 10. Then, for each combination of $n$ and the shape parameter, we generated a random sample of size $n$ from the Weibull distribution with that shape parameter and scale parameter equal to 1, and checked whether a given procedure correctly selects the distribution. This process is replicated 10,000 times to obtain the PCS value for each combination of $n$ and the shape parameter with each procedure. We notice that, when the data are drawn from a Weibull distribution, the PCS values for the RML and S-procedures decrease as the shape parameter increases towards 1, and then increase until they reach certainty for shape parameter values greater than or equal to 10. For the F-procedure, the PCS values increase with the shape parameter until they reach certainty for shape parameter values greater than or equal to 5.
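The simulation scheme of Case 1 can be sketched as follows for the RML-procedure only. This is a minimal illustration, not the program used for Table II. It assumes the Weibull density $(\beta/\theta)(t/\theta)^{\beta-1}\exp\{-(t/\theta)^{\beta}\}$ with known shape, uses Python's standard generators, and the function names, the seed and the model labels are hypothetical.

```python
import math
import random

def log_rml(t, shape):
    """ln(max exponential likelihood / max Weibull likelihood), shape known."""
    n = len(t)
    return (n * math.log(sum(x ** shape for x in t) / n)
            - n * math.log(sum(t) / n)
            - n * math.log(shape)
            - (shape - 1) * sum(math.log(x) for x in t))

def pcs_rml(true_model, n, shape, reps=10_000, seed=0):
    """Monte Carlo estimate of the probability of correct selection.

    true_model: "exponential" or "Weibull" (scale fixed at 1 in both cases).
    shape: the known Weibull shape used by the procedure, and also the
           shape of the generated data when true_model is "Weibull".
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(reps):
        if true_model == "exponential":
            t = [rng.expovariate(1.0) for _ in range(n)]
            correct += log_rml(t, shape) > 0      # exponential correctly chosen
        else:
            t = [rng.weibullvariate(1.0, shape) for _ in range(n)]
            correct += log_rml(t, shape) <= 0     # Weibull correctly chosen
    return correct / reps
```

Calling `pcs_rml("Weibull", 20, 5.0)` estimates the PCS for Weibull data of size 20 with shape 5. Looping over the sample sizes and shape values listed above would give a table with the same layout as Table II for this one procedure.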

Case 2: Finite Range versus Pareto Model
PCS values for finite range versus Pareto are computed and reported in Table III. Similarly, the numerical comparison in this case is carried out as follows. When the data come from a finite range distribution, we consider $n = 5, 10, 20, 30, 50$ and shape parameter values 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5, 10. Then, for each combination of $n$ and the shape parameter, we generated a random sample of size $n$ from the finite range distribution with that shape parameter and scale parameter equal to 1, and checked whether a given procedure correctly selects the distribution. This process is replicated 10,000 times to obtain the PCS value for each combination of $n$ and the shape parameter with each procedure. We notice that when the data are drawn from a finite range distribution, the PCS values for the RML, S and F-procedures remain constant as the shape parameter increases.
For data from a Pareto distribution, the scale parameter is likewise set equal to 1, and we check whether a given procedure correctly selects the distribution. This process is replicated 10,000 times to obtain the PCS value for each combination of $n$ and the shape parameter with each procedure. We notice that when the data are drawn from a Pareto distribution, the PCS values for the RML, S and F-procedures remain constant as the shape parameter increases.
In addition, we generated 10,000 samples in all cases, which led to results essentially different from those of Pandey et al. (1991). The number 10,000 was chosen because the results are stable when more than 10,000 samples are generated. On the other hand, Pandey et al. (1991) neither fixed the number of generated samples nor gave an acceptable reason for varying it from one sample size to another.
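The constancy of the PCS in the shape parameter reported for Case 2 is what one would expect under the following assumption, which is not taken from Table I: the finite range model is the power-function density $(\beta/\theta)(t/\theta)^{\beta-1}$ on $0 < t < \theta$, and the Pareto density is $\beta\theta^{\beta}/t^{\beta+1}$ on $t > \theta$. Under that assumption the log RML reduces to $n \ln(\hat\beta_{fr}/\hat\beta_{p})$, which depends on the data only through ratios of logarithmic spacings. The sketch below, with hypothetical names, shows that the same uniform draws transformed with two different shape values give the same log ratio.

```python
import math
import random

def log_rml_fr_vs_pareto(t):
    """ln(max finite range likelihood / max Pareto likelihood).

    Assumed densities (an assumption of this sketch, not the paper's Table I):
      finite range: (b/theta) * (t/theta)**(b-1),   0 < t < theta
      Pareto:       b * theta**b / t**(b+1),        t > theta
    """
    n = len(t)
    logs = [math.log(x) for x in t]
    lo, hi = min(logs), max(logs)
    shape_fr = n / sum(hi - v for v in logs)   # MLE of shape, finite range (scale = max t)
    shape_p = n / sum(v - lo for v in logs)    # MLE of shape, Pareto (scale = min t)
    return n * (math.log(shape_fr) - math.log(shape_p))

# Same uniform draws, two different finite range shapes (scale 1): T = U**(1/shape)
rng = random.Random(1)
u = [rng.random() for _ in range(20)]
sample_shape_half = [x ** (1 / 0.5) for x in u]
sample_shape_five = [x ** (1 / 5.0) for x in u]
ratio_half = log_rml_fr_vs_pareto(sample_shape_half)
ratio_five = log_rml_fr_vs_pareto(sample_shape_five)
```

The two values `ratio_half` and `ratio_five` agree up to rounding error, so the decision taken by the RML rule, and hence its PCS, does not change with the shape parameter under these assumed densities.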

Conclusions
Comparing the RML, S and F-procedures for discriminating between the exponential and Weibull models, we have the following two situations:
1. The actual distribution is exponential: using graph I, when the data are drawn from the exponential distribution, we notice that RML is more efficient when the shape parameter is less than 1, while S is more efficient when the shape parameter lies between 1 and 5; when the shape parameter is 5 or more, RML and S are equivalent.
2. The actual distribution is Weibull: using graph II, when the data are drawn from the Weibull distribution, we notice that S may be preferred to the other procedures when the shape parameter is less than 1, while RML may be preferred to the other procedures when the shape parameter lies between 1 and 5; when the shape parameter is 5 or more, RML, S and F are equivalent.
Similarly, comparing the RML, S and F-procedures for discriminating between the finite range and Pareto models, we have the following two situations:
1. The actual distribution is finite range: using graph III, when the data are drawn from the finite range distribution, we notice that the RML and F-procedures, which are equivalent, are preferred to the S-procedure for all values of the shape parameter.
2. The actual distribution is Pareto: when the data are drawn from the Pareto distribution, we notice that RML is preferred to the other procedures.
We still need to know which of the three procedures is the most powerful and reliable. For this purpose we use Table V, which shows the percentage of all samples correctly selected by each procedure in our two cases of selection. We notice that, in discriminating between the Weibull and exponential models, the RML and S-procedures are equivalent and preferred, while in discriminating between the finite range and Pareto models the RML-procedure is preferred to the others. Finally, all these conclusions lead us to trust the RML-procedure in general.


Also, the scale and shape parameters of the Pareto probability distribution, both unknown, are written with the subscript p. Pandey et al. (1991) obtained the natural logarithm of the ratio of the maximized likelihoods for the finite range and Pareto models.


Using equation (2), the finite range model is selected if the logarithm of the RML statistic is positive; otherwise, if it is negative, the Pareto model is selected.
When the data come from a Pareto distribution, we consider $n = 5, 10, 20, 30, 50$ and shape parameter values 0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2, 5, 10. Then, for each combination of $n$ and the shape parameter, we generated a random sample of size $n$ from the Pareto distribution with that shape parameter.

Work of Pandey et al. (1991)
Pandey et al. (1991) carried out a numerical investigation to compute PCS values for the RML, S and F-procedures in discriminating between exponential and Weibull models and between finite range and Pareto models. They generated 5000 samples for each of the sizes $n = 5, 10$ and $20$, 3000 samples of size $n = 30$, and 2000 samples of size $n = 50$. They attributed these varied numbers of generated samples to the excessive increase in execution time with larger sample sizes. In addition, they excluded the sample size 50 when discriminating between exponential and Weibull models because the S statistic of the exponential distribution involves the gamma function $\Gamma(n)$, which was too large to be evaluated for $n = 50$ on their PC. The problems that Pandey et al. (1991) faced are now solved: we could use samples of size $n = 50$, since a recent IBM machine is able to evaluate the gamma function $\Gamma(n)$ for $n \le 171$.
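The limit $n \le 171$ quoted above corresponds to the largest argument for which $\Gamma(n)$ fits in an IEEE double-precision number. The short sketch below, with a hypothetical helper name, checks this with Python's standard library and shows that the logarithm of the gamma function can be computed directly, which avoids the overflow when only $\ln \Gamma(n)$ is needed, as in a log S statistic.

```python
import math

def gamma_fits_in_double(n):
    """Return True if Gamma(n) can be evaluated as a finite double."""
    try:
        return math.isfinite(math.gamma(n))
    except OverflowError:
        return False

fits_171 = gamma_fits_in_double(171)   # Gamma(171) = 170! is still representable
fits_172 = gamma_fits_in_double(172)   # Gamma(172) = 171! exceeds the double range
log_gamma_172 = math.lgamma(172)       # ln Gamma(172), computed without overflow
```

Working with `math.lgamma` throughout removes the restriction on the sample size altogether.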

Table II: PCS of Exponential Distribution versus Weibull Distribution with unknown scale parameter (10,000 generated samples)
Pak.j.stat.oper.res.Vol.X No.1 2014 pp107-119

Table III: PCS of Finite Range Distribution versus Pareto Distribution with unknown shape and scale parameters (10,000 generated samples)