Nadarajah-Haghighi Model for Survival Data With Long Term Survivors in the Presence of Right Censored Data

In this paper, a new long term survival model called Nadarajah-Haghighi model for survival data with long term survivors was proposed. The model is used in fitting data where the population of interest is a mixture of individuals that are susceptible to the event of interest and individuals that are not susceptible to the event of interest. The statistical properties of the proposed model including quantile function, moments, mean and variance were provided. Maximum likelihood estimation procedure was used to estimate the parameters of the model assuming right censoring. Furthermore, Bayesian method of estimation was also employed in estimating the parameters of the model assuming right censoring. Simulations study was performed in order to ascertain the performances of the MLE estimators. Random samples of different sample sizes were generated from the model with some arbitrary values for the parameters for 5%, 1.3% and 1.5% cure fraction values. Bias, standard error and mean square error were used as discrimination criterions. Additionally, we compared the performance of the proposed model with some competing models. The results of the applications indicates that the proposed model is more efficient than the models compared with. Finally, we fitted some models considering type of treatment as a covariate. It was observed that the covariatee have effect on the shape parameter of the proposed model.


Introduction
The mixture cure rate model also known as the standard cure rate model assumes that the population of study is a mixture of two individuals: susceptible individuals who experience the event of interest and non-susceptible (also known as cured or immune) that will never experience the event of interest. Hence, the population of studies consists of individuals that are at risk and individuals that are not at risk (cured group) with respect to the event of interest. To model the proportion of non-susceptible, different approaches; parametric, semi-parametric and non-parametric have been used by different researchers. Interested readers can refer to the work of Boag (1949), Berkson and Gage (1952), Farewell (1986), Meeker (1987), Gamel et al. (1990), Cantor and Shuster (1992), Ng and McLachlan (1998) Kutal and Qian (2018) to mention but a few.
According to Kannan et al. (2010), cure rate models have applications in many areas such as health, reliability, criminology and many more areas. For instance, the model have been applied in reliability by Nelsen (2007) while in criminology, a study on recidivism times of prisoners release from prison in western Australia have been reported by Maller and Zhou (1996).
In the present article, we introduced a cure fraction model for survival data based on the Nadarajah-Haghighi distribution in the presence of cure fraction, censoring and covariate. Properties of the model were studied and an application of the model with leukaemia data of Kersey et al. (1987) was provided. The rest of the paper is organized as follows: In section 2, we describe the Nadarajah-Haghighi distribution and provide the survival function, probability density function and cumulative distribution function of the model. Section 3 introduces the Maximum Likelihood Estimation and Bayesian methods of estimation in estimating the parameters of the model assuming right censoring. In section 4, statistical properties of the model such as quantile function and moments of the model were provided. Simulation studies and applications of the model were provided in sections 5 and 6 respectively and finally, we conclude in section 7.

The Nadarajah-Haghighi Distribution
The Nadarajah-Haghighi distribution was introduced by Nadarajah and Haghighi (2011), as an extension of the exponential distribution. The distribution serves as an alternative to the Weibull, generalized exponential and gamma distributions in modeling real life scenarios. Nadarajah-Haghighi exponential distribution reduces to the exponential distribution when the shape parameter takes the value one. The various shapes of the pdf and hazard rate function of the distribution for some selected values can be depicted in figures 1 and 2 respectively. The density of this distribution can have decreasing and unimodal shapes while the hazard rate has a decreasing shape, increasing shape and constant. Hence, the shape of the hazard rate function of the NH distribution is similar to that of the Generalized exponential distribution. A random variable t with parameters β and λ is said to follow the Nadarajah-Haghighi exponential distribution if its pdf, cdf, survival function and hazard rate function are respectively given by: and where β > 0 is the scale parameter and λ > 0 is the shape parameter.

The Nagarajah-Haghighi Mixture Cure Rate Model
The mixture cure rate model also referred to as the standard cure rate model was first proposed by Boag (1949). The model assumes that a certain portion p in the population is cured and the remaining portion 1 − p is uncured. Hence, the model consists of two components: a component representing the proportion of non-susceptible in the population and a distribution representing the survival experience of the susceptible group. The survival function for the entire population of study is expressed as: where 0 < p < 1 is the mixing parameter and S u (t) is the survival function for the uncured group.  (5), the corresponding probability density function (pdf ) is given by: where f u (t) is the pdf for the uncured individuals. From equations (2) and (3), the survival function, pdf and distribution function of the Nadarajah-Haghighi mixture cure rate model are respectively given by: and where β > 0 is the scale parameter, λ > 0 is the shape parameter and p is the proportion of uncured individuals.

Estimation Procedure
In this section, classical and non-classical methods of estimation were used in estimating the parameters of the Nadarajah-Haghighi mixture cure rate (NHMCR) model.

Maximum Likelihood Estimation Method
Consider a random sample of lifetimes (t i , δ i , i = 1, 2, · · · , n) under the assumption of right censoring, the likelihood function of (t i , δ i , i = 1, 2, · · · , n) is given by: where Θ = (β, λp), δ i is a censoring indicator, δ i = 1 if the observed lifetime t i is uncensored and δ i = 0 if the observed lifetime is censored. Substituting equations (7) and (8) in equation (10) and taking natural logarithm gives the observed full log-likelihood function as: To obtain the score function, we differentiate the log-likelihood function partially with respect to β, λ and p. This gives: and where If the log-likelihood function has a global maximizer, then the M LE is the solution of the equations in (12), (13) and (14). However, the system of equations in (12) to (14) are non-linear, hence, numerical methods such as Newton-Raphson method are used in obtaining the M LE. To obtain interval estimates and test for hypothesis, the observed information matrix J (Φ) is obtain. The elements of the observed information matrix J (Φ) are given by: where the diagonal elements of J (Φ) −1 are the variances of the parameters λ, β and p respectively, while the off diagonal elements are the covariances. The elements of J (Φ) are: It is important to note that, the asymptotic distribution of where J Φ is the total observed information matrix evaluated at Φ. The asymptotic 100 (1 − γ) % confidence interval for the parameters β, λ and p are β ± Z γ In the presence of a vector of covariates x = (x 1 , x 2 , · · · , x m ) affecting the parameters β, λ and p, we assumed the regression model For instance, assuming one covariate x i for i = 1, 2, · · · , n affecting the parameter β, λ and p, the link function for the parameters becomes

Bayesian Method of Estimation
Consider the NHMCR model proposed in sections 2.2, let Θ be the vector of unknown parameters. The joint distribution of the model parameters is obtain by combining the joint prior distribution of the parameters and the likelihood function. A gamma prior for the shape and scale parameters of the proposed models are assumed, where Gamma (r, q) denotes gamma distribution with mean r q and variance r q 2 , r and q are hyper-parameters. On the other hand, the proportion of cure parameter is assumed to follow the beta prior. That is, p ∼ Beta(s, v), where s and v are hyper-parameters. To be specific, p ∼ Beta(1, 1) since 0 < p < 1. We further assume independent prior among the parameters of the model.
In the presence of m covariates x = (x 1 , x 2 , · · · , x m ) affecting the parameters of the proposed models, a link function for the parameters β, λ and p is assumed. That is, for β, λ and p respectively. To be specific, assume the NHMCR model with scale parameter β, shape parameter λ and a proportion of cure p are affected by the presence of a covariate x i for i = 1, 2, · · · , n, then the link function are assumed for the scale parameter β, shape parameter λ and proportion of cure p respectively. Inferences considering covariate effect are obtain by replacing β, λ and p with these link functions. Furthermore, normal prior distribution will be assumed for the effect of covariates. That is, N µ, σ 2 will be assumed, where N µ, σ 2 is normal distribution with parameters µ and σ.

Statistical Properties of the Model
In this section, some statistical properties such as quantile function, median and moments of the NHMCR model were discussed.

Quantile Function and Simulations
In simulation studies, quantile function is used in obtaining random realizations from a given model. The quantile function of the NHMCR model is: where u is a random number generated from uniform distribution with parameters zero and one. That is u ∼ U (0, 1).
The quantile function can be use in obtaining the first, second and third quantiles of the NHMCR model. This is done by letting u = 1 4 , 1 2 and 3 4 respectively for the first, second and third quantiles. For instance, to obtain the median, let u = 1 2 , we obtain the median of the NHMCR model as: To simulate a random sample of size n from the NHMCR model with right censored data, the following algorithm is followed: 1. Generate a random sample u i , i = 1, 2, · · · , n from U (0, 1).

For a cure fraction p, return
3. Generate a sample of the censoring times c i , i = 1, 2, · · · , n from the NH distribution.
n} are realizations from the NHMCR model with right censoring.

Moments of the Model
Let T be a random variable that follows the NHMCR with pdf given by (8). The rth moment about the origin of the NHMCR model is obtain as follows: substituting m = (1 + βt) λ the integral can be written as: is the complementary incomplete gamma function. Hence, the first four moments of the model are: Hence, the variance of the NHMCR model is obtain by σ 2 = E T 2 − (E (T )) 2 which is given by:

Simulation Study
In this section, simulation studies was conducted in order to ascertain the performance of the maximum likelihood estimator of Θ (where Θ = (β, λ, p) ) discussed in section 3. The algorithm discussed in section 4.1 was used in generating right censored survival times.
We generated random samples of size n = 100, 200, 300, 400 and 500 from the NHMCR model with some arbitrary parameters β = 1.5, λ = 2.0 and β = 2.5, λ = 2.0 with different values for cure fraction. On the other hand, the censoring variable follows the NH distribution. Inference results from the simulation study were based on 1000 replications. These results were obtained under the MLE estimators. Performance measures such as bias, standard error and mean square error were employed so as to ascertain the performances of the estimates.   Tables 1 to 3 gives the simulation results based on 1000 replications for each parameter setting. The Table gives the mean estimate of the parameters together with the bias, standard error and mean square error (MSE). It is observed that, the estimates are satisfactory closed to the true parameter values in all the different simulation settings. It is also observed that the estimates of bias, standard error and mean square error (MSE) decreases as sample size increases for all the different settings of all examined parameter values. Hence, the results reveal that the proposed model has a good performance.

Real data Analysis
In this section, a data set consisting of 90 observations that was provided by Kersey et al. (1987) and was analyzed by Shao and Zhou (2004), Coelho-Barros et al. (2017) and Kutal and Qian (2018) were use to fit the mixture cure rate model studied in previous sections and compared its performance with that of some competing mixture models. The data recorded the times to recurrence of leukaemia for patients after one of allogeneic transplant or autologous transplant. Forty-six (46) patients were treated by allogeneic transplant while the remaining forty-four (44) were treated by autologous transplant. However, there are thirty-three (33) and thirty-five patients treated by allogeneic transplant and autologous transplant respectively that suffered a recurrence of leukaemia at different ranges of time. It is also observed that thirteen(13) and nine(9) patients from these respective groups have no record of recurrence, that is they are censored. Maximum Likelihood method of estimation and Bayesian method of estimation discussed in section 3 were used to analyzed this data. The data is fitted in three categories: Models without cure fraction assuming right censoring, models assuming cure Nadarajah  Then, the NHMCR model was fitted and compared with the fits of Weibull (WMCR), Generalized Gompertz (GGMCR) and Rayleigh (RMCR) mixture cure rate models. Likelihood ratio test was then performed to ascertain the importance of the addition of the cure fraction parameter.

Maximum Likelihood Result
In this section, the results of the fitted models under MLE procedure is discussed. Different performance measures such as Akaike information criteria (AIC), Bayesian information criteria (BIC) and Consistent Akaike information criteria (CAIC) were used in finding the best fitted model. The model with the least statistic values of these criteria is considered to be the best model. Tables 4 and 5 gives the MLE results of the models without cure fraction and the models with cure fraction respectively. These tables also contain the AIC, BIC and CAIC values of the fitted models. These results reveal that, all the estimates of the models with cure fraction fits the data more efficient than the models without cure fraction.  Hence, the mixture cure rate models are better than the models without cure fraction parameter. This can clearly be seen in figures 3 and 4. Likelihood ratio test was conducted to ascertain the significance of the addition of the cure  On the other hand, comparing between the fitted mixture cure rate models (NHMCR, WMCR, GGMCR and RMCR), we conclude that the proposed NHMCR model is more efficient than WMCR, GGMCR and RMCR models since it has the least values for AIC, BIC and CAIC.

Bayesian Result
In this section, Bayesian method of estimation was used in fitting the model proposed in the previous section and compared its performance with that of the aforementioned models in section 6.1. In analyzing the data, the prior distribution for the shape and scale parameters were assumed to be Gamma(1, 1) while the prior distribution for the cure fraction parameter was assumed to be Beta (1,1). Additionally, in the presence of covariates, a normal prior was assumed (N(0, 100)) for the covariate effects. To compare between the fitted models under the Bayesian frame work, deviance information criteria (DIC) and expected Bayesian information criteria ((EBIC) were used.  Table 7 gives the fits of NH, GG, W and R distributions without cure fraction. The results gives the posterior mean, standard deviation (SD), 95% credible interval and the statistics: DIC and EBIC. Figure 5 gives the Kaplan-Meier curve together with that of NH, W, GG and R distributions. Similar to the fits in figure 3, these fits does not properly mimic that of the empirical survival function. Additionally, the presence of plateau close to 0.3 suggest cure fraction models as an alternatives to these fitted models. Table 8 gives the posterior summaries of the NHMCR, GGMCR, WMCR and RMCR models. It can easily be seen that the cure rate models fits the data better than the models without cure fraction. It is also observed that our proposed NHMCR model is a very strong competitor since it has the least values of DIC and EBIC. This can also be observed from the Kaplan-Meier fits together with that of the fitted mixture models as shown in figure 6.

Covariate Models
In

Conclusion
The presence of cure fraction and covariates usually occur in lifetime data analysis especially medical applications. To analyze this type of data, the cure fraction model is used. This paper proposed a mixture cure fraction model with NH susceptible distribution for right censored data. Results of simulation study shows that, the proposed model has a good performance. Moreover, a real data was used in comparing the performance of the proposed model with WMCR, GGMCR and RMCR models. It was found that the proposed model fits the data more efficient than the aforementioned models. Finally, we analyzed the data assuming type of treatment as a covariate. Six different models were fitted and it was found that model II was better fitted to the data under the MLE and Bayesian method of estimations.