The Negative Binomial-New Generalized Lindley Distribution for Count Data: Properties and Application

In this paper, a new mixture distribution for count data, namely the negative binomial-new generalized Lindley (NB-NGL) distribution, is proposed. The NB-NGL distribution has four parameters and is a flexible alternative for analyzing count data, especially when the data are over-dispersed. The proposed distribution includes the negative binomial-Lindley (NB-L), negative binomial-gamma (NB-G), and negative binomial-exponential (NB-E) distributions as special cases. Some properties of the proposed distribution are derived, namely the moments and the order statistics density function. The unknown parameters of the NB-NGL distribution are estimated by maximum likelihood estimation. The results of the simulation study show that the maximum likelihood estimators give estimates close to the true parameter values when the sample is large. The NB-NGL distribution is applied to three samples: medical data, industry data, and insurance data. The results show that the proposed distribution provides a better fit for count data than the Poisson and negative binomial distributions and its sub-models.


Introduction
A Poisson distribution is typically used to fit count data when the events being counted occur randomly over the time and/or space in which they are observed. Equality of the mean and variance is characteristic of the Poisson distribution. Let $X$ be a Poisson random variable with the parameter $\lambda > 0$. Then, the probability mass function (pmf) of $X$ is given by
$$f(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, 2, \ldots \tag{1}$$
The negative binomial (NB) distribution with parameters $r > 0$ and $0 < p < 1$ has the pmf
$$f(x; r, p) = \binom{r+x-1}{x} p^{r} (1-p)^{x}, \quad x = 0, 1, 2, \ldots \tag{2}$$
The NB distribution is better for overdispersed count data that are not necessarily heavy-tailed. An extremely heavy tail implies overdispersion, but the converse does not hold (Wang, 2011). Traditional statistical distributions or models, such as the Poisson and NB distributions, cannot be used effectively for count data with a heavy tail. The Poisson distribution tends to underestimate the number of zeros given the mean of the data, while the NB distribution may overestimate zeros and underestimate observations with a positive count (Lord and Geedipally, 2011).
Many researchers have proposed mixture distributions, since mixing is one of the most important ways to obtain new probability distributions in applied probability and operational research (Gómez-Déniz et al., 2008). In this study, we consider a mixed NB distribution as a more flexible alternative for analyzing count data, especially count data with over-dispersion. It is a mixture of the NB distribution and a lifetime distribution.
Elbatal et al. (2013) proposed a new generalized Lindley (NGL) distribution as an alternative for modeling lifetime data in many areas. Let $\lambda$ be a random variable distributed as the NGL distribution with parameters $\alpha$, $\beta$ and $\theta$, i.e., $\lambda \sim \mathrm{NGL}(\alpha, \beta, \theta)$. Then the probability density function (pdf) of $\lambda$ is given by
$$g(\lambda; \alpha, \beta, \theta) = \frac{1}{1+\theta}\left[\frac{\theta^{\alpha+1}\lambda^{\alpha-1}}{\Gamma(\alpha)} + \frac{\theta^{\beta}\lambda^{\beta-1}}{\Gamma(\beta)}\right] e^{-\theta\lambda}, \quad \lambda > 0, \; \alpha, \beta, \theta > 0. \tag{3}$$
The corresponding moment generating function (mgf) of $\lambda$ is
$$M_{\lambda}(t) = \frac{1}{1+\theta}\left[\frac{\theta^{\alpha+1}}{(\theta-t)^{\alpha}} + \frac{\theta^{\beta}}{(\theta-t)^{\beta}}\right], \quad t < \theta. \tag{4}$$
The NGL distribution has three sub-models: (i) if $\alpha = 1$ and $\beta = 2$, we get the Lindley distribution with a parameter $\theta$, which was proposed by Lindley in 1958 (see Ghitany et al., 2008); (ii) if $\alpha = \beta$, we get the gamma distribution with parameters $\alpha$ and $\theta$ (Jambunathan, 1954); and (iii) if $\alpha = \beta = 1$, we get the exponential distribution with a parameter $\theta$ (Gupta and Kundu, 1999).
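The sub-model relations of the NGL distribution can be checked numerically. The following is only a minimal sketch, assuming the NGL density of Elbatal et al. (2013) in its two-component gamma mixture form with a common rate $\theta$; the function names are hypothetical and are not taken from the paper.

```python
import math

def ngl_pdf(lam, alpha, beta, theta):
    """Assumed NGL density: a gamma(alpha) and a gamma(beta) component, common rate theta."""
    if lam <= 0.0:
        return 0.0
    first = theta ** (alpha + 1) * lam ** (alpha - 1) / math.gamma(alpha)
    second = theta ** beta * lam ** (beta - 1) / math.gamma(beta)
    return (first + second) * math.exp(-theta * lam) / (1.0 + theta)

def ngl_mgf(t, alpha, beta, theta):
    """Assumed NGL mgf, which exists only for t < theta."""
    if t >= theta:
        raise ValueError("the mgf exists only for t < theta")
    q = theta / (theta - t)
    # theta * q**alpha equals theta**(alpha + 1) / (theta - t)**alpha
    return (theta * q ** alpha + q ** beta) / (1.0 + theta)

def lindley_pdf(lam, theta):
    """Lindley density with parameter theta."""
    return theta ** 2 * (1.0 + lam) * math.exp(-theta * lam) / (1.0 + theta)

def gamma_pdf(lam, shape, rate):
    """Gamma density with shape and rate parameters."""
    return rate ** shape * lam ** (shape - 1) * math.exp(-rate * lam) / math.gamma(shape)

# Compare the NGL density with its sub-models at one arbitrary point.
print(ngl_pdf(0.7, 1.0, 2.0, 1.5), lindley_pdf(0.7, 1.5))
print(ngl_pdf(0.7, 2.5, 2.5, 1.5), gamma_pdf(0.7, 2.5, 1.5))
```

Each printed pair should agree up to floating-point rounding, which mirrors cases (i) and (ii) above.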
The rest of the article is structured as follows. In Section 2, a new mixed negative binomial distribution, the negative binomial-new generalized Lindley distribution, is proposed by mixing the NB and NGL distributions. In Section 3, we present some characteristic properties of the proposed distribution. In Section 4, the method for estimating the unknown parameters of the proposed distribution is introduced. In Section 5, we present a simulation study and an application of the proposed distribution to three real data sets. Finally, the conclusion is provided in Section 6.

A new mixture distribution for count data
In this section, we provide the definition and theorem of the new mixed negative binomial distribution. Its sub-models are then provided.
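Let $X \mid \lambda$ follow the NB distribution in (2) with parameters $r$ and $p = \exp(-\lambda)$, where $\lambda \sim \mathrm{NGL}(\alpha, \beta, \theta)$. Then $X$ follows the NB-NGL distribution, denoted $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$. By substituting (7) into (6), we obtain
$$f(x; r, \alpha, \beta, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j} \int_{0}^{\infty} e^{-\lambda(r+j)}\, g(\lambda; \alpha, \beta, \theta)\, d\lambda. \tag{8}$$
By replacing the integral in (8) with the mgf of the NGL distribution in (4) evaluated at $t = -(r+j)$, the pmf of $X$ in (5) is
$$f(x; r, \alpha, \beta, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} \frac{(-1)^{j}}{1+\theta} \left[\frac{\theta^{\alpha+1}}{(\theta+r+j)^{\alpha}} + \frac{\theta^{\beta}}{(\theta+r+j)^{\beta}}\right], \quad x = 0, 1, 2, \ldots,$$
where $r, \alpha, \beta, \theta > 0$. Some pmf plots of the NB-NGL distribution with some fixed values of the parameters $r$, $\alpha$, $\beta$ and $\theta$ are shown in Figure 1. The NB-NGL distribution has three sub-models as follows.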
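The finite alternating sum for the NB-NGL pmf is straightforward to evaluate. The sketch below is illustrative only: it assumes the mixing $p = \exp(-\lambda)$ and the NGL mgf in its gamma mixture form, and the function name `nb_ngl_pmf` and the parameter values are hypothetical.

```python
import math

def nb_ngl_pmf(x, r, alpha, beta, theta):
    """NB-NGL pmf via the finite alternating sum over j = 0, ..., x."""
    # log of the generalized binomial coefficient C(r + x - 1, x), valid for real r > 0
    log_coef = math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
    total = 0.0
    for j in range(x + 1):
        q = theta / (theta + r + j)
        # NGL mgf at t = -(r + j); theta * q**alpha = theta**(alpha+1) / (theta+r+j)**alpha
        mgf = (theta * q ** alpha + q ** beta) / (1.0 + theta)
        total += (-1) ** j * math.comb(x, j) * mgf
    return math.exp(log_coef) * total

# Arbitrary illustrative parameter values (not estimates from the paper).
probs = [nb_ngl_pmf(x, 2.0, 2.0, 3.0, 5.0) for x in range(21)]
print(probs[:5])
print(sum(probs))
```

Because the sum alternates in sign, floating-point cancellation grows with $x$; for large counts, a higher-precision or integral-based evaluation would be a safer choice than this direct form.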
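Let $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$. If $\alpha = 1$ and $\beta = 2$, we get the negative binomial-Lindley (NB-L) distribution with positive parameters $r$ and $\theta$, where the NB-L distribution was proposed by Zamani and Ismail (2010).
Proof. If $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$ and we substitute $\alpha = 1$ and $\beta = 2$ in (5), then the pmf of $X$ is given by
$$f(x; r, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j}\, \frac{\theta^{2}(\theta+r+j+1)}{(1+\theta)(\theta+r+j)^{2}},$$
which is the pmf of the NB-L distribution. The same pmf can be obtained directly. If $\lambda$ has the Lindley distribution (see Zamani and Ismail, 2010), its pdf and mgf are
$$g(\lambda; \theta) = \frac{\theta^{2}}{1+\theta}(1+\lambda)e^{-\theta\lambda}, \quad \lambda > 0,$$
$$M_{\lambda}(t) = \frac{\theta^{2}(\theta-t+1)}{(1+\theta)(\theta-t)^{2}}, \quad t < \theta. \tag{11}$$
Let $X \mid \lambda$ follow the NB distribution with the pmf in (2), where $p = \exp(-\lambda)$, and let $\lambda \sim \mathrm{Lindley}(\theta)$ with the pdf and mgf above, which are (3) and (4) with $\alpha = 1$ and $\beta = 2$, respectively. The pmf of $X$ is obtained by
$$f(x; r, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j} \int_{0}^{\infty} e^{-\lambda(r+j)}\, g(\lambda; \theta)\, d\lambda. \tag{12}$$
By replacing the integral in (12) with the mgf of the Lindley distribution in (11) evaluated at $t = -(r+j)$, the pmf of $X$ is
$$f(x; r, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j}\, \frac{\theta^{2}(\theta+r+j+1)}{(1+\theta)(\theta+r+j)^{2}}.$$
Let $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$. If $\alpha = \beta$, we get the negative binomial-gamma (NB-G) distribution with positive parameters $r$, $\alpha$ and $\theta$. The pmf of the NB-G distribution is
$$f(x; r, \alpha, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j} \left(\frac{\theta}{\theta+r+j}\right)^{\alpha}.$$
Proof. If $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$ and we substitute $\alpha = \beta$ in (5), then the bracketed term reduces to $(1+\theta)\,\theta^{\alpha}/(\theta+r+j)^{\alpha}$, so the pmf of $X$ is the expression above, which is the pmf of the NB-G distribution (Gençtürk and Yiğiter, 2016).
Let $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$. If $\alpha = \beta = 1$, we get the negative binomial-exponential (NB-E) distribution with positive parameters $r$ and $\theta$.
Proof. If $X \sim \text{NB-NGL}(r, \alpha, \beta, \theta)$ and we substitute $\alpha = \beta = 1$ in (5), then the pmf of $X$ is
$$f(x; r, \theta) = \binom{r+x-1}{x} \sum_{j=0}^{x} \binom{x}{j} (-1)^{j}\, \frac{\theta}{\theta+r+j},$$
which is the pmf of the NB-E distribution (Panjer and Willmot, 1981).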
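The three reductions can also be verified numerically. The sketch below is an assumption-laden illustration rather than the authors' code: it writes every mixed NB pmf in terms of the Laplace transform $E[e^{-s\lambda}]$ of the mixing distribution (the mgf at $t = -s$), and all function names are hypothetical.

```python
import math

def mixed_nb_pmf(x, r, laplace):
    """pmf of an NB mixture with p = exp(-lambda); laplace(s) returns E[exp(-s * lambda)]."""
    log_coef = math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
    total = sum((-1) ** j * math.comb(x, j) * laplace(r + j) for j in range(x + 1))
    return math.exp(log_coef) * total

def ngl_laplace(s, alpha, beta, theta):
    """Assumed NGL mgf evaluated at t = -s."""
    q = theta / (theta + s)
    return (theta * q ** alpha + q ** beta) / (1.0 + theta)

def lindley_laplace(s, theta):
    """Lindley mgf evaluated at t = -s."""
    return theta ** 2 * (theta + s + 1.0) / ((1.0 + theta) * (theta + s) ** 2)

def gamma_laplace(s, alpha, theta):
    """Gamma(alpha, rate theta) mgf evaluated at t = -s."""
    return (theta / (theta + s)) ** alpha

def exponential_laplace(s, theta):
    """Exponential(rate theta) mgf evaluated at t = -s."""
    return theta / (theta + s)

r, theta = 2.0, 5.0  # arbitrary illustrative values
for x in range(8):
    nb_l = mixed_nb_pmf(x, r, lambda s: lindley_laplace(s, theta))
    via_ngl = mixed_nb_pmf(x, r, lambda s: ngl_laplace(s, 1.0, 2.0, theta))
    print(x, nb_l, via_ngl)
```

The two printed columns should coincide up to rounding; swapping in `gamma_laplace` with $\alpha = \beta$ or `exponential_laplace` with $\alpha = \beta = 1$ gives the analogous checks for the NB-G and NB-E cases.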

Mathematical properties
Some properties of the proposed distribution, including the moments and the order statistics density function, are introduced in this section.
where $\Gamma(\cdot)$ is the complete gamma function, i.e., $\Gamma(a) = \int_{0}^{\infty} t^{a-1} e^{-t}\, dt$. Using a binomial expansion in the term

Order statistic density function
Let $X_1, X_2, \ldots, X_n$ be $n$ independent and identically distributed (iid) random variables defined on $\Omega$ with the cumulative distribution function (cdf) $F_X(x)$ and the pmf $f_X(x)$. Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote these random variables rearranged in non-descending order of magnitude. Thus, $X_{(k)}$ is the $k$th smallest number in the sample, $k = 1, 2, \ldots, n$. Because order statistics are random variables, it is possible to compute probability values associated with values in their support. The density function of the $k$th order statistic $X_{(k)}$ is given in, e.g., Casella and Berger (2002). If $X_1, X_2, \ldots, X_n$ are $n$ iid random variables with the pmf $f_X(x)$ as in (5), the density function of $X_{(k)}$ is given by (19), where $k = 1, 2, \ldots, n$; $s, x = 0, 1, 2, \ldots$ with $s \le x$; and $r, \alpha, \beta, \theta > 0$.
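For a discrete distribution, the pmf of the $k$th order statistic can be computed from the cdf. The sketch below uses the standard identity $P(X_{(k)} \le x) = \sum_{m=k}^{n} \binom{n}{m} F_X(x)^{m} (1 - F_X(x))^{n-m}$ and differences it at consecutive integers. This is an assumption about the form of the result and may differ in presentation from expression (19); the function names and parameter values are hypothetical.

```python
import math

def nb_ngl_pmf(x, r, alpha, beta, theta):
    """NB-NGL pmf via the finite alternating sum (assumed form)."""
    log_coef = math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
    total = 0.0
    for j in range(x + 1):
        q = theta / (theta + r + j)
        total += (-1) ** j * math.comb(x, j) * (theta * q ** alpha + q ** beta)
    return math.exp(log_coef) * total / (1.0 + theta)

def nb_ngl_cdf(x, r, alpha, beta, theta):
    """cdf as the sum of the pmf over s = 0, ..., x."""
    return sum(nb_ngl_pmf(s, r, alpha, beta, theta) for s in range(x + 1))

def order_stat_pmf(x, k, n, cdf):
    """P(X_(k) = x) for n iid discrete variables on 0, 1, 2, ... with the given cdf."""
    def cdf_k(v):
        if v < 0:
            return 0.0
        big_f = cdf(v)
        # probability that at least k of the n observations are <= v
        return sum(math.comb(n, m) * big_f ** m * (1.0 - big_f) ** (n - m)
                   for m in range(k, n + 1))
    return cdf_k(x) - cdf_k(x - 1)

params = (2.0, 2.0, 3.0, 5.0)  # arbitrary illustrative values

def model_cdf(v):
    return nb_ngl_cdf(v, *params)

# pmf of the sample median (k = 2) in a sample of size n = 3
median_pmf = [order_stat_pmf(x, 2, 3, model_cdf) for x in range(6)]
print(median_pmf)
```

Passing the cdf as a function keeps `order_stat_pmf` independent of the NB-NGL model, so the same helper works for any of the sub-models.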

Maximum likelihood estimation
In this study, the unknown parameters of the proposed distribution are estimated via maximum likelihood estimation (MLE). Let $x = (x_1, x_2, \ldots, x_n)$ be a random sample of size $n$ from the NB-NGL distribution with parameters $\Theta = (r, \alpha, \beta, \theta)$. From the pmf of the NB-NGL distribution in (5), the log-likelihood function is $\ell(\Theta) = \sum_{i=1}^{n} \log f(x_i; r, \alpha, \beta, \theta)$. The maximum likelihood estimators $\hat{r}$, $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\theta}$ are obtained by solving the expression (20). In this study, $\hat{r}$, $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\theta}$ are obtained by numerical optimization with the nlm function in the stats package in R (R Core Team, 2018).
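To make the estimation step concrete, the sketch below evaluates the NB-NGL log-likelihood and maximizes it with a simple derivative-free pattern search on the log-parameters. This is not the nlm routine used in the study; the search is swapped in only so that the example runs with the Python standard library. The function names, the starting values, and the count sample are hypothetical and serve purely as illustration.

```python
import math

def nb_ngl_logpmf(x, r, alpha, beta, theta):
    """log of the NB-NGL pmf (assumed form); -inf if rounding makes the sum non-positive."""
    log_coef = math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
    total = 0.0
    for j in range(x + 1):
        q = theta / (theta + r + j)
        total += (-1) ** j * math.comb(x, j) * (theta * q ** alpha + q ** beta)
    total /= 1.0 + theta
    return log_coef + math.log(total) if total > 0.0 else -math.inf

def log_likelihood(data, r, alpha, beta, theta):
    """Sum of log pmf values over the sample."""
    return sum(nb_ngl_logpmf(x, r, alpha, beta, theta) for x in data)

def fit_pattern_search(data, start, step=0.5, tol=1e-4, max_iter=200):
    """Coordinate-wise search on log-parameters, which keeps all four parameters positive."""
    logp = [math.log(v) for v in start]
    best = log_likelihood(data, *start)
    iterations = 0
    while step > tol and iterations < max_iter:
        iterations += 1
        improved = False
        for i in range(len(logp)):
            for delta in (step, -step):
                trial = list(logp)
                trial[i] += delta
                value = log_likelihood(data, *[math.exp(v) for v in trial])
                if value > best:
                    logp, best, improved = trial, value, True
        if not improved:
            step /= 2.0  # shrink the step once no coordinate move helps
    return [math.exp(v) for v in logp], best

# Hypothetical toy counts, made up for illustration only (not data from the paper).
toy_counts = [0, 0, 0, 0, 1, 0, 2, 0, 1, 3, 0, 0, 5, 1, 0, 2, 0, 0, 1, 7]
start = (1.0, 1.0, 2.0, 1.0)
estimates, max_loglik = fit_pattern_search(toy_counts, start)
print(estimates, max_loglik)
```

With four parameters and a small sample, the likelihood surface can be nearly flat in some directions, so the result of such a search depends on the starting values; a Newton-type optimizer, as used in the study, has the same sensitivity.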

Simulation
A simulation study of parameter estimation is carried out to verify the MLE performance before the application to real data. We conducted Monte Carlo simulation studies to assess the finite-sample behavior of the maximum likelihood estimators of $r$, $\alpha$, $\beta$ and $\theta$. All results were obtained from 1000 replications ($T = 1000$), and the simulations were carried out using the statistical software package R. In each replication, a random sample of size $n$ was drawn from the NB-NGL distribution. The results are reported in Table 1, and we notice that the RMSE values of the maximum likelihood estimators of $r$, $\alpha$, $\beta$ and $\theta$ decay toward zero as the sample size increases, as expected. The results show that the maximum likelihood estimators give estimates close to the true parameter values when the sample is large ($n = 200$).
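Random samples from the NB-NGL distribution can be drawn through its mixture representation. The sketch below is a hedged illustration, not the authors' R code: it assumes the NGL distribution is a gamma mixture with weights $\theta/(1+\theta)$ and $1/(1+\theta)$, that $p = \exp(-\lambda)$, and, for simplicity, that $r$ is a positive integer so that an NB draw is a sum of $r$ geometric counts. The function name and parameter values are hypothetical.

```python
import math
import random

def draw_nb_ngl(r, alpha, beta, theta, rng):
    """One NB-NGL draw; r must be a positive integer in this simplified sketch."""
    # Step 1: lambda ~ NGL, drawn as gamma(alpha) w.p. theta/(1+theta), else gamma(beta).
    shape = alpha if rng.random() < theta / (1.0 + theta) else beta
    lam = rng.gammavariate(shape, 1.0 / theta)  # second argument is the scale 1/theta
    # Step 2: X | lambda ~ NB(r, p = exp(-lambda)) as a sum of r geometric failure counts.
    p = math.exp(-lam)
    if p >= 1.0:
        return 0  # lambda is numerically zero, so no failures occur
    if p <= 0.0:
        raise OverflowError("lambda is too large for this simple sampler")
    log_q = math.log1p(-p)  # log(1 - p), strictly negative here
    total = 0
    for _ in range(r):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        total += int(math.floor(math.log(u) / log_q))
    return total

rng = random.Random(12345)
sample = [draw_nb_ngl(2, 2.0, 3.0, 5.0, rng) for _ in range(20000)]
zero_share = sample.count(0) / len(sample)

# Theoretical P(X = 0) is the NGL mgf at t = -r.
q0 = 5.0 / (5.0 + 2.0)
zero_prob = (5.0 * q0 ** 2.0 + q0 ** 3.0) / (1.0 + 5.0)
print(zero_share, zero_prob)
```

Repeating such draws for each sample size and refitting the model in every replication would give the bias and RMSE summaries of a Monte Carlo study; that outer loop is omitted here.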

Application to real data sets
We apply the NB-NGL distribution to three real data sets to show its importance for count data analysis. These data sets are fitted with the Poisson, NB, NB-E, NB-G, NB-L and NB-NGL distributions. The Kolmogorov-Smirnov (K-S) goodness-of-fit test is used to decide whether a sample comes from a population with a specific distribution. For a discrete distribution, we used the ks.test function in the dgof package in R to find the value of the K-S statistic ( Table 1. The K-S test in Table 2 indicates that the NB-NGL distribution is a strong competitor to the others considered for fitting the data set (see Figure 2 (a)).
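The K-S statistic for a distribution on the non-negative integers is the largest gap between the empirical and fitted cdfs, and it suffices to check the integers from 0 to the sample maximum. The sketch below computes only this statistic and does not reproduce the p-value calculation of the dgof package; the function names, parameter values, and count sample are hypothetical.

```python
import math

def nb_ngl_pmf(x, r, alpha, beta, theta):
    """NB-NGL pmf via the finite alternating sum (assumed form)."""
    log_coef = math.lgamma(r + x) - math.lgamma(x + 1) - math.lgamma(r)
    total = 0.0
    for j in range(x + 1):
        q = theta / (theta + r + j)
        total += (-1) ** j * math.comb(x, j) * (theta * q ** alpha + q ** beta)
    return math.exp(log_coef) * total / (1.0 + theta)

def ks_statistic_discrete(data, cdf):
    """max over x = 0, ..., max(data) of |empirical cdf(x) - model cdf(x)|."""
    n = len(data)
    gap = 0.0
    for x in range(max(data) + 1):
        empirical = sum(1 for v in data if v <= x) / n
        gap = max(gap, abs(empirical - cdf(x)))
    return gap

def model_cdf(x):
    # Arbitrary illustrative parameter values, not fitted estimates.
    return sum(nb_ngl_pmf(s, 2.0, 2.0, 3.0, 5.0) for s in range(x + 1))

# Hypothetical toy counts, made up for illustration only (not data from the paper).
toy_counts = [0, 0, 0, 0, 1, 0, 2, 0, 1, 3, 0, 0, 5, 1, 0, 2, 0, 0, 1, 7]
d_stat = ks_statistic_discrete(toy_counts, model_cdf)
print(d_stat)
```

A smaller statistic indicates a closer agreement between the fitted and empirical cdfs, which is the sense in which the competing distributions are compared in the tables.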

Data set II:
The data set is an uncensored data set from Sankaran (1970) on the number of mistakes in copying groups of random digits; it is used to illustrate the fit of the Poisson, NB, NB-E, NB-G, NB-L and NB-NGL distributions. The value of the K-S test statistic in Table 3 indicates that the NB-NGL distribution is a strong competitor to the others considered for fitting this data set (see Figure 2 (b)).

Conclusions
In this paper, we proposed a new four-parameter distribution called the negative binomial-new generalized Lindley (NB-NGL) distribution, which mixes the negative binomial distribution (Greenwood and Yule, 1920) and the new generalized Lindley distribution (Elbatal et al., 2013). Its moments and order statistics density function are derived. The negative binomial-Lindley (Zamani and Ismail, 2010), negative binomial-gamma (Gençtürk and Yiğiter, 2016), and negative binomial-exponential (Panjer and Willmot, 1981) distributions are special cases of the NB-NGL distribution. Maximum likelihood estimation is used to estimate the unknown parameters of the proposed distribution. The results of the simulation study show that the maximum likelihood estimators give estimates close to the true parameter values when the sample is large. Finally, real data sets are used to illustrate the fit of the proposed distribution. The value of the K-S test statistic indicates that the NB-NGL distribution is a strong competitor to the others considered for fitting the data sets. We expect that the NB-NGL distribution will be a flexible alternative for count data analysis.