The Weibull-G Poisson Family for Analyzing Lifetime Data

We study a new family of distributions defined as the minimum of a Poisson-distributed random number of independent and identically distributed random variables having a general Weibull-G distribution (see Bourguignon et al. (2014)). Some mathematical properties of the new family, including ordinary and incomplete moments, quantile and generating functions, mean deviations, order statistics, reliability and entropies, are derived. Maximum likelihood estimation of the model parameters is investigated. Three special models of the new family are discussed. We present three applications to real data sets to show the potential of the proposed family.

bigger than the mean. All of the above merits motivate the search for a mixture of the Weibull-G family of distributions, proposed and studied by Bourguignon et al. (2014) (which adds more flexibility to the Weibull model itself), with possibly a discrete probability distribution with the same support (0, ∞). This is why we combine the Poisson distribution with Weibull-G type models to capture more flexibility. We hope that the proposed model will be better at capturing several patterns in the data and at describing the associated reliability structure appropriately, in particular in those cases where an individual member of the Weibull-G family or the Poisson distribution alone might not be a good model.
The well-known generators include the following: beta-G by Eugene et al. (2002), Kumaraswamy-G by Cordeiro and de Castro (2011), exponentiated generalized-G by Cordeiro et al. (2013) and generalized transmuted-G by Nofal, among others.
We motivate our model by considering a typical system failure in a reliability context. We envision a scenario in which the data are a mixture of discrete and continuous types. We begin by assuming a system that consists of a random number of independent subsystems, where this number has a zero inflated Poisson distribution. We discard the scenario in which all components fail simultaneously, which is theoretically viable but not realistic.
Suppose $X_1, \ldots, X_N$ are independent and identically distributed (iid) random variables with the common Weibull-G CDF, and let $N$ be a random variable with the Poisson-type distribution described above. The CDF of $X = \min\{X_1, \ldots, X_N\}$, Equation (1), is called the Weibull-G Poisson (WGP) distribution. Several new models can be generated by considering special distributions for the baseline CDF. The corresponding PDF is Equation (2), and the reliability function (rf) of $X$ is one minus the CDF (1). The family has two positive shape parameters in addition to the parameters of the baseline distribution. A random variable $X$ with PDF (2) is denoted by $X \sim$ WGP.
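The construction above can be illustrated numerically. The following is a minimal sketch under explicit assumptions that are not taken from the text: the random count is treated as a zero-truncated Poisson variable with rate `lam`, and the Weibull-G CDF is taken in the one-parameter form H = 1 - exp(-(G/(1-G))^beta). Under those assumptions the CDF of the minimum is (1 - exp(-lam*H)) / (1 - exp(-lam)). The function names and the unit exponential baseline are hypothetical.

```python
import math

def weibull_g_cdf(G, beta):
    """Assumed Weibull-G CDF: H = 1 - exp(-(G / (1 - G))**beta)."""
    if G <= 0.0:
        return 0.0
    if G >= 1.0:
        return 1.0
    return 1.0 - math.exp(-((G / (1.0 - G)) ** beta))

def wgp_cdf(x, lam, beta, base_cdf):
    """CDF of min(X_1, ..., X_N), N zero-truncated Poisson(lam) (assumption)."""
    H = weibull_g_cdf(base_cdf(x), beta)
    return (1.0 - math.exp(-lam * H)) / (1.0 - math.exp(-lam))

def wgp_pdf(x, lam, beta, base_cdf, base_pdf):
    """Derivative of wgp_cdf with respect to x."""
    G = base_cdf(x)
    if G <= 0.0 or G >= 1.0:
        return 0.0
    odds = G / (1.0 - G)
    tail = math.exp(-(odds ** beta))  # equals 1 - H
    # Density of the assumed Weibull-G distribution.
    h = beta * base_pdf(x) * G ** (beta - 1.0) / (1.0 - G) ** (beta + 1.0) * tail
    return lam * h * math.exp(-lam * (1.0 - tail)) / (1.0 - math.exp(-lam))

# Hypothetical baseline for illustration: unit exponential.
def exp_cdf(x):
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def exp_pdf(x):
    return math.exp(-x) if x > 0 else 0.0
```

The reliability function is then `1 - wgp_cdf(...)`, and the hazard rate is the ratio of `wgp_pdf` to that quantity.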
The rest of the paper is organized as follows. In Section 2, we provide a useful mixture representation for the PDF of the new family. In Section 3, we define three special models and give some plots of their PDFs and hazard rate functions. In Section 4, we derive some general mathematical properties of the family, including quantile and generating functions, ordinary and incomplete moments, mean deviations, entropies, order statistics, residual and reversed residual life and the stress-strength model. Maximum likelihood estimation of the model parameters is addressed in Section 5. In Section 6, simulation results to assess the performance of the proposed maximum likelihood estimation procedure are discussed. In Section 7, we provide three applications to real data to illustrate the importance and flexibility of the new family. Finally, some concluding remarks are presented in Section 8.

Linear representation
In this section, we provide a useful representation for (2) using the concept of exponentiated-G (Exp-G) distributions. The WGP family density in (2) can be expanded and, after rearranging terms, rewritten as Equation (3). Equation (3) reveals that the WGP density function is a mixture of Exp-G densities. Thus, some mathematical properties of the new family can be derived from the corresponding properties of the Exp-G class. By integrating (3), we obtain the same mixture representation for the CDF of the WGP family, in which each term is the CDF of the Exp-G family with the corresponding power parameter.

Special WGP distributions
The PDF (2) allows greater flexibility in its tails and can be applied in many areas of statistics. Now, we define and study three special models of the WGP family by taking the following baseline distributions: gamma (G), log-logistic (LL) and exponentiated exponential (EE). The PDF (2) is most tractable when the baseline CDF and PDF have simple analytic expressions.

The WGP distribution
The G distribution with two positive parameters has a PDF and a CDF defined for $x > 0$. The plots in Figures 1 and 2 show some possible shapes of the density and hazard rate functions of the WGP distribution.

The WLLP distribution
The LL distribution with two positive parameters has a closed-form PDF and CDF.

The WEEP distribution
The EE distribution has a positive scale parameter and a positive shape parameter, and both its PDF and CDF are available in closed form. Inserting them into (2) gives the WEEP density function. Figures 5 and 6 display some possible shapes of the density and hazard rate functions of this distribution.

Mathematical properties
In this section, we derive some general mathematical properties of the new family. Evaluating established explicit expressions for statistical measures can be more efficient than computing the measures directly by numerical integration.
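Simulating the WGP random variable is straightforward. If $U$ is a uniform variate on the unit interval (0,1), then the random variable $X = Q(U)$, where $Q$ is the quantile function (qf) of the family, follows (2). Equivalently, for $U \sim U(0,1)$, a WGP variate is the solution $x$ of the nonlinear equation $F(x) = U$, where $F$ is the CDF (1).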
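Inverse-transform sampling of this kind might look as follows. This is only a sketch: it assumes the zero-truncated Poisson form F = (1 - exp(-lam*H)) / (1 - exp(-lam)) with H = 1 - exp(-(G/(1-G))^beta), which is an assumption about Equation (1), and it uses the standard exponentiated exponential baseline G(x) = (1 - exp(-a*x))^b. All function and parameter names are hypothetical.

```python
import math
import random

def ee_cdf(x, a, b):
    """Exponentiated exponential CDF with scale a and shape b."""
    return (1.0 - math.exp(-a * x)) ** b if x > 0 else 0.0

def ee_quantile(p, a, b):
    """Inverse of ee_cdf."""
    return -math.log(1.0 - p ** (1.0 / b)) / a

def wgp_cdf(x, lam, beta, base_cdf):
    """Assumed WGP CDF (zero-truncated Poisson count, one-parameter Weibull-G)."""
    G = base_cdf(x)
    if G <= 0.0:
        return 0.0
    if G >= 1.0:
        return 1.0
    H = 1.0 - math.exp(-((G / (1.0 - G)) ** beta))
    return (1.0 - math.exp(-lam * H)) / (1.0 - math.exp(-lam))

def wgp_quantile(u, lam, beta, base_quantile):
    """Solve F(x) = u in closed form, layer by layer."""
    H = -math.log(1.0 - u * (1.0 - math.exp(-lam))) / lam   # undo the Poisson layer
    odds = (-math.log(1.0 - H)) ** (1.0 / beta)             # G / (1 - G)
    return base_quantile(odds / (1.0 + odds))                # undo the baseline

def wgp_sample(n, lam, beta, base_quantile, seed=0):
    """Draw n variates by applying the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [wgp_quantile(rng.random(), lam, beta, base_quantile) for _ in range(n)]
```

When the baseline quantile function has no closed form, the last step can be replaced by a numerical root finder applied to the baseline CDF.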
Here, we provide two formulae for the mgf $M(t) = E(e^{tX})$ of $X$. Clearly, the first one can be derived from the mixture representation.

Ordinary and incomplete moments
The $r$th moment of $X$, say $\mu'_r$, follows from (3) as a weighted sum of the $r$th moments of Exp-G random variables with the corresponding power parameters. The variance, skewness and kurtosis measures can then be calculated from the ordinary moments using well-known relations. The $n$th central moment of $X$, say $\mu_n$, follows from the ordinary moments, and the cumulants of $X$ follow recursively from them.
The main applications of the first incomplete moment refer to the mean deviations and the Bonferroni and Lorenz curves. These curves are very useful in economics, reliability, demography, insurance and medicine. The $s$th incomplete moment of $X$ can likewise be expressed from (3) as a weighted sum of Exp-G incomplete moments.
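The well-known relations between ordinary moments, central moments and cumulants can be sketched as below. The code uses the standard binomial expansion for central moments and the standard cumulant recursion; the function names are hypothetical, and the ordinary moments are supplied by the caller (for the WGP family they would come from the mixture representation or from numerical integration).

```python
from math import comb

def central_moments(raw):
    """raw[r] = E[X**r] for r = 0..R (raw[0] = 1); returns central moments 0..R."""
    m1 = raw[1]
    return [
        sum((-1) ** k * comb(n, k) * m1 ** k * raw[n - k] for k in range(n + 1))
        for n in range(len(raw))
    ]

def cumulants(raw):
    """kappa_r = raw[r] - sum_{k=1}^{r-1} C(r-1, k-1) * kappa_k * raw[r-k]."""
    kappa = [0.0] * len(raw)
    for r in range(1, len(raw)):
        kappa[r] = raw[r] - sum(
            comb(r - 1, k - 1) * kappa[k] * raw[r - k] for k in range(1, r)
        )
    return kappa

def skewness_kurtosis(raw):
    """Skewness mu_3 / mu_2**1.5 and kurtosis mu_4 / mu_2**2 (needs raw[0..4])."""
    mu = central_moments(raw)
    return mu[3] / mu[2] ** 1.5, mu[4] / mu[2] ** 2
```

For the unit exponential distribution, whose ordinary moments are 1, 1, 2, 6, 24, this gives variance 1, skewness 2 and kurtosis 9.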

Mean Deviations
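The mean deviations, such as the mean deviation about the mean, can be obtained from the first incomplete moment $m_1(t)$ of $X$. A first general formula for $m_1(t)$ follows from (3) as a weighted sum of integrals of $x$ times an Exp-G density over $(-\infty, t]$, each of which is the first incomplete moment of an Exp-G distribution. A second general formula for $m_1(t)$ involves integrals that can be computed numerically. These equations for $m_1(t)$ can be applied to construct the Bonferroni and Lorenz curves, defined for a given probability $\pi$ by $B(\pi) = m_1(q)/(\pi \mu'_1)$ and $L(\pi) = m_1(q)/\mu'_1$, respectively, where $\mu'_1 = E(X)$ and $q = Q(\pi)$ is the qf of $X$ at $\pi$.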
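As a numerical illustration, the first incomplete moment can be evaluated by quadrature and plugged into the curves above. The sketch below uses a composite Simpson rule and the standard identity E|X - mu| = 2*mu*F(mu) - 2*m_1(mu); the helper names are hypothetical, and any density, CDF and quantile function with support starting at `lower` can be passed in.

```python
import math

def simpson(func, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def first_incomplete_moment(pdf, t, lower=0.0):
    """m_1(t): integral of x * f(x) from lower to t."""
    return simpson(lambda x: x * pdf(x), lower, t) if t > lower else 0.0

def mean_deviation_about_mean(pdf, cdf, mean):
    """E|X - mean| = 2 * mean * F(mean) - 2 * m_1(mean)."""
    return 2.0 * mean * cdf(mean) - 2.0 * first_incomplete_moment(pdf, mean)

def bonferroni(pdf, quantile, mean, p):
    """B(p) = m_1(Q(p)) / (p * mean)."""
    return first_incomplete_moment(pdf, quantile(p)) / (p * mean)

def lorenz(pdf, quantile, mean, p):
    """L(p) = m_1(Q(p)) / mean."""
    return first_incomplete_moment(pdf, quantile(p)) / mean
```

For a unit exponential distribution the mean deviation about the mean is 2/e, which gives a quick sanity check of the quadrature.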

Entropies
The Rényi entropy of a random variable $X$ represents a measure of variation of the uncertainty. The Rényi entropy is defined by $I_\delta(X) = \frac{1}{1-\delta}\log\int_{-\infty}^{\infty} f(x)^{\delta}\,dx$, $\delta > 0$ and $\delta \neq 1$.
Expanding $f(x)^{\delta}$ and simplifying after some algebra, the Rényi entropy of the WGP family can be expressed explicitly. The $q$-entropy can be obtained in a similar way. The Shannon entropy of a random variable $X$ is defined by $E\{-\log[f(X)]\}$. It is a special case of the Rényi entropy when $\delta \uparrow 1$ and follows by taking the limit of $I_\delta(X)$ as $\delta$ tends to 1.
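The relation between the Rényi and Shannon entropies can be checked numerically. The following sketch evaluates the standard definition of the Rényi entropy by a composite Simpson rule on a truncated range; the function names, the truncation bounds and the exponential test density are illustrative assumptions, not part of the derivation for the WGP family.

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def renyi_entropy(pdf, delta, lower, upper):
    """(1 / (1 - delta)) * log of the integral of f(x)**delta over [lower, upper]."""
    integral = simpson(lambda x: pdf(x) ** delta, lower, upper)
    return math.log(integral) / (1.0 - delta)

def shannon_entropy(pdf, lower, upper):
    """Integral of -f(x) * log f(x) over [lower, upper]."""
    def integrand(x):
        fx = pdf(x)
        return -fx * math.log(fx) if fx > 0.0 else 0.0
    return simpson(integrand, lower, upper)
```

For an exponential density with rate 2, the Shannon entropy is 1 - log 2, and `renyi_entropy` with `delta` close to 1 approaches that value.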

Order statistics
Order statistics make their appearance in many areas of statistical theory and practice. Let $X_1, \ldots, X_n$ be a random sample from the WGP family of distributions. The PDF of the $i$th order statistic, say $X_{i:n}$, can be written as Equation (7). Using (1), (2) and (7), we obtain Equation (8). Substituting (8) in (7), the PDF of $X_{i:n}$ can be expressed as a linear combination of Exp-G densities. Hence, the density function of the WGP order statistics is a mixture of Exp-G densities, and the properties of $X_{i:n}$ follow from those of the corresponding Exp-G random variables. For example, the moments of $X_{i:n}$ can be expressed as weighted sums of Exp-G moments.
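For reference, the generic density of the $i$th order statistic and its binomial expansion in powers of the CDF can be sketched as follows. These are the standard textbook forms, which are presumably what Equation (7) expresses; the function names are hypothetical, and any parent `pdf` and `cdf` may be supplied.

```python
from math import comb, factorial

def order_stat_pdf(x, i, n, pdf, cdf):
    """f_{i:n}(x) = n! / ((i-1)! (n-i)!) * f(x) * F(x)**(i-1) * (1 - F(x))**(n-i)."""
    c = factorial(n) / (factorial(i - 1) * factorial(n - i))
    F = cdf(x)
    return c * pdf(x) * F ** (i - 1) * (1.0 - F) ** (n - i)

def order_stat_pdf_expanded(x, i, n, pdf, cdf):
    """Same density after expanding (1 - F)**(n-i) by the binomial theorem."""
    c = factorial(n) / (factorial(i - 1) * factorial(n - i))
    F = cdf(x)
    series = sum((-1) ** j * comb(n - i, j) * F ** (j + i - 1) for j in range(n - i + 1))
    return c * pdf(x) * series
```

The expanded form is the one that leads to a mixture representation, because each term is the parent density multiplied by a power of the parent CDF.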

Stress-strength model
The stress-strength model is the most widely used approach for reliability estimation. This model is used in many applications in physics and engineering, such as strength failure and system collapse. In stress-strength modeling, $R = P(X_1 > X_2)$ is a measure of reliability of the system when it is subjected to random stress $X_2$ and has strength $X_1$.
The system fails if and only if the applied stress is greater than its strength, and the component functions satisfactorily whenever $X_1 > X_2$. $R$ can be considered a measure of system performance and arises naturally in electrical and electronic systems. In other words, the reliability of the system is the probability that the system is strong enough to overcome the stress imposed on it.
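For independent strength and stress variables, the probability that strength exceeds stress equals the integral of the strength density times the stress CDF. A minimal numerical sketch, with hypothetical function names and an exponential example chosen only because its answer is known in closed form, is:

```python
import math

def simpson(func, a, b, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

def stress_strength(pdf_strength, cdf_stress, upper):
    """P(strength > stress) = integral of f_strength(x) * F_stress(x) over [0, upper]."""
    return simpson(lambda x: pdf_strength(x) * cdf_stress(x), 0.0, upper)

# Illustration: exponential strength with rate 1 and exponential stress with rate 3.
reliability = stress_strength(
    lambda x: math.exp(-x),
    lambda x: 1.0 - math.exp(-3.0 * x),
    upper=60.0,
)
```

With these two exponential distributions the exact value is 3 / (1 + 3) = 0.75, so the quadrature can be checked directly. The upper limit truncates the integral and should be chosen so that the neglected tail is negligible.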

Estimation
Several approaches for parameter estimation have been proposed in the literature, but the maximum likelihood method is the most commonly employed. Here, we consider the estimation of the unknown parameters of the new family from complete samples only by maximum likelihood. Let $x_1, \ldots, x_n$ be a random sample from the WGP family, and let $\Theta$ be the $p \times 1$ parameter vector. The MLE $\hat{\Theta}$ of $\Theta$ maximizes the log-likelihood function $\ell = \ell(\Theta)$, the sum of the logarithms of the density (2) over the sample. It is usually more convenient to adopt nonlinear optimization methods such as the quasi-Newton algorithm to maximize $\ell$ numerically. For interval estimation of the parameters, we obtain the $p \times p$ observed information matrix $J(\Theta)$, formed from the second-order partial derivatives of $\ell$ with respect to the parameters, whose elements can be computed numerically. Under standard regularity conditions, when $n \to \infty$, the distribution of $\hat{\Theta}$ can be approximated by a multivariate normal distribution with covariance matrix $J(\hat{\Theta})^{-1}$ to obtain confidence intervals for the parameters. Here, $J(\hat{\Theta})$ is the total observed information matrix evaluated at $\hat{\Theta}$. The elements of $J(\Theta)$ are given in Appendix A.
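The estimation workflow (numerical maximization, observed information, normal-approximation interval) can be illustrated on a deliberately simple case. The sketch below is not the WGP likelihood: it uses a hypothetical one-parameter exponential model, the secant method on a numerical score (the one-dimensional analogue of a quasi-Newton iteration), and a finite-difference second derivative for the observed information. All names are illustrative.

```python
import math

def exp_loglik(rate, data):
    """Log-likelihood of a hypothetical one-parameter exponential model."""
    return len(data) * math.log(rate) - rate * sum(data)

def numeric_score(loglik, theta, h=1e-5):
    """Central-difference approximation of the first derivative."""
    return (loglik(theta + h) - loglik(theta - h)) / (2.0 * h)

def secant_mle(loglik, t0, t1, tol=1e-8, max_iter=100):
    """Find a root of the score by secant updates from two starting values."""
    s0, s1 = numeric_score(loglik, t0), numeric_score(loglik, t1)
    for _ in range(max_iter):
        if s1 == s0:
            break
        t2 = t1 - s1 * (t1 - t0) / (s1 - s0)
        t0, s0 = t1, s1
        t1, s1 = t2, numeric_score(loglik, t2)
        if abs(t1 - t0) < tol:
            break
    return t1

def observed_information(loglik, theta, h=1e-4):
    """Minus the second derivative of the log-likelihood, by finite differences."""
    return -(loglik(theta + h) - 2.0 * loglik(theta) + loglik(theta - h)) / h ** 2

def wald_interval(theta_hat, info, z=1.959964):
    """Approximate 95% interval theta_hat +/- z / sqrt(info)."""
    half = z / math.sqrt(info)
    return theta_hat - half, theta_hat + half
```

For a multi-parameter model such as a WGP special case, the same steps apply with a vector of parameters, a multivariate quasi-Newton optimizer and the inverse of the observed information matrix in place of the scalar `1 / info`.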

Simulation study
In this section, we evaluate the performance of the MLEs by Monte Carlo simulation for different sample sizes and different parameter values. We choose the WEEP model for this purpose. The simulation study is repeated 10,000 times for each of the sample sizes $n$ = 25, 50, 75, 100, 200, 400 and the parameter combinations I: (0.5, 0.5, 1) and II: (0.5, 1.5, 2). ... increases. The coverage probabilities (CP) of the confidence intervals are quite close to the nominal level of 95%. Therefore, the MLEs and their asymptotic results can be used for estimating the parameters and constructing confidence intervals even for reasonably small sample sizes.

Applications
In this section, we consider applications to three real data sets to illustrate the flexibility of the new family of distributions. We also analyze the hazard rates of these three data sets. In order to identify the shapes of the hazard functions, we consider the graphical method based on the total time on test (TTT) transform, introduced by Barlow and Campo (1975). An empirical illustration of the TTT transform is given by Aarset (1987).
The first data set presents an increasing-shaped (unimodal) hazard function, while the second and third data sets present upside-down bathtub shaped hazard functions. From Figure 3(a), the TTT plot for data set 1 is concave, indicating an increasing hazard function τ(x), while in Figures 4(c) and 5(e) the TTT plots for data sets 2 and 3 are first concave and then convex, indicating an upside-down bathtub shape. Hence, the WGP family could in principle be an appropriate model for fitting these data sets.
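The scaled empirical TTT transform behind such plots is simple to compute. In the sketch below (the function name is hypothetical), the point at r/n is the sum of the r smallest observations plus (n - r) times the r-th smallest, divided by the sample total; plotting these points against r/n gives the TTT plot.

```python
def scaled_ttt(data):
    """Return the points (r/n, T(r/n)) of the scaled empirical TTT transform."""
    y = sorted(data)
    n = len(y)
    total = sum(y)
    points = []
    running = 0.0
    for r, value in enumerate(y, start=1):
        running += value  # sum of the r smallest observations
        points.append((r / n, (running + (n - r) * value) / total))
    return points

# Small illustrative sample (not one of the data sets analyzed here).
example_points = scaled_ttt([1.0, 2.0, 3.0, 4.0])
```

For this small sample the points lie above the diagonal and form a concave curve, the pattern associated with an increasing hazard rate.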
In Figures 8, 10 and 12, we consider kernel density estimation (a non-parametric approach) with a Gaussian kernel. Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed (iid) random variables following an unknown density $f$. The kernel density estimator is given by $\hat{f}_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right)$, where $K(\cdot)$ is the kernel function, usually symmetric with $\int_{-\infty}^{\infty} K(x)\,dx = 1$, and $h > 0$ is a smoothing parameter, also known as the bandwidth.
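A Gaussian kernel density estimator of this form takes only a few lines. The sketch below is illustrative: the function names are hypothetical, and the normal-reference bandwidth rule 1.06 * sd * n^(-1/5) is an assumption, since the bandwidth used for the figures is not stated.

```python
import math
import statistics

def normal_reference_bandwidth(data):
    """Rule-of-thumb bandwidth 1.06 * sd * n**(-1/5) (an assumed choice)."""
    return 1.06 * statistics.stdev(data) * len(data) ** (-0.2)

def gaussian_kde(data, h):
    """Return the estimator x -> (1 / (n h)) * sum of K((x - x_i) / h), K standard normal."""
    n = len(data)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def estimate(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)

    return estimate
```

A call such as `gaussian_kde(sample, normal_reference_bandwidth(sample))` returns a function that can be evaluated on a grid and overlaid on a histogram of the data.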
The MLEs are calculated, and goodness-of-fit statistics including the log-likelihood function evaluated at the MLEs, the Akaike information criterion (AIC), and the Kolmogorov-Smirnov (K-S) statistic with its P-value are determined to compare the fitted models. The required computations are carried out in the R language.
The first data set (Crowder et al. [21]) refers to the failure stresses of single carbon fibers (length 1 mm). From the figures in Table 1, we conclude that all the models provide an adequate fit, with EW and GEE providing the best fit, followed by WEEP and WLLP. The summary statistics and figure indicate that the first data set is approximately symmetric. This indicates that the new family of distributions is able to fit data sets with a symmetric shape. The P-P plot given in Figure 13 also supports the results of Table 3.
The second data set is taken from Smith and Naylor (1987). The summary statistics of the second data set are: n = 46, x̄ = 1.13, s = 0.2713, skewness = 0.7935 and kurtosis = 0.5995. From the figures in Table 3, we conclude that the WGP and WEEP models provide an adequate fit, whereas GEE does not provide a good fit. The summary statistics and figure indicate that the second data set is approximately left skewed. This indicates that the new family of distributions can fit data sets with left-skewed characteristics. The P-P plot given in Figure 14 also supports the results of Table 3.
The third data set describes the stress-rupture lives of 101 Kevlar 49/epoxy strands, which were subjected to constant sustained pressure at the 90% stress level until all had failed, so that we have complete data with exact times of failure. The failure times (in hours) are given in Cooray and Ananda (2008). The summary statistics of the third data set are: n = 101, x̄ = 1.0248, s = 1.1193, skewness = 3.00172 and kurtosis = 13.7089. From the figures in Table 5, we verify that WGLLP, WEEP and WGP provide the best fit. A close look at the summary statistics and Figure 15 indicates that the third data set is right skewed. So, the proposed family is able to fit right-skewed data. The P-P plot also supports the results in Table 5.
Figure 11: TTT plot for data set 3.
Figure 12: Gaussian kernel density estimation for data set 3.
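Two of the comparison statistics mentioned in this section are easy to reproduce. The sketch below computes the AIC as 2k - 2*loglik and the one-sample Kolmogorov-Smirnov distance between the empirical CDF and a fitted CDF; the function names are hypothetical, and the P-value of the K-S test is not computed here.

```python
def aic(loglik, n_params):
    """Akaike information criterion: 2 * k - 2 * maximized log-likelihood."""
    return 2.0 * n_params - 2.0 * loglik

def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and the fitted CDF."""
    y = sorted(data)
    n = len(y)
    d = 0.0
    for i, value in enumerate(y, start=1):
        F = cdf(value)
        # Compare the fitted CDF with the empirical CDF just after and just before the jump.
        d = max(d, i / n - F, F - (i - 1) / n)
    return d
```

Smaller AIC and K-S values indicate a better fit, which is how competing fitted models are ranked in comparisons of this kind.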

Conclusions
In this paper, we present a new Weibull-G Poisson (WGP) family of distributions, which extends the Weibull-G family by adding one extra shape parameter. Some mathematical properties of the new family, including explicit expressions for the ordinary and incomplete moments, quantile and generating functions, mean deviations, entropies and order statistics, are provided. The model parameters are estimated by maximum likelihood, and the observed information matrix is determined. We perform a Monte Carlo simulation study to assess the finite sample behavior of the maximum likelihood estimators. We show empirically by means of three real data sets that some special models of the WGP family can give better fits than other models generated from well-known families.