A competing risks cure frailty model: An application to relapse-free survival of breast cancer patients

If there is unobserved heterogeneity among susceptible patients in competing risks cure models, applying the methods that do not consider this heterogeneity may lead to invalid results. Therefore, this study aimed to introduce a model to cover the above properties of survival studies. We introduced a unified model by combining a parametric mixture cure gamma frailty model and vertical modeling of competing risks. We obtained estimates of parameters by an iterative method and Laplace transform technique. Then, we calculated the cumulative incidence functions (CIFs) and related confidence bounds using a bootstrap approach. We conducted an extensive simulation study to evaluate the performance of the proposed model. The results of the simulation study showed the superior performance of our proposed competing risks cure frailty model. Finally, we applied the proposed method to analyze a real dataset of breast cancer patients.


Introduction
Survival modeling is a tool in biostatistics and epidemiology that can help one to predict the risk of disease. In medical studies, identifying the risk of failure for a given disease is remarkable for patients as well as physicians; this would benefit patients to choose the lifestyle, and it would be guidelines to physicians selecting a specific treatment approach. The need to know these risk factors requires the analysis of outcomes to be specific to that disease. Still, the prior occurrence of different types of events may change the probability of observing the event of interest. From the statistical perspective, the analysis is incorporated in the competing risks framework in this situation, and the usual techniques may give distorted results (Pepe and Mori, 1993). For modeling competing risks, several approaches such as the cause-specific model by Prentice et al. (1978), the Larson and Dinse's model by Larson and Dinse (1985), the Fine and Gray's model by Fine and Gray (1999), and the vertical modeling by Nicolaie, van Houwelingen, and Putter (2010) have been introduced. Competing risks models usually assume that all patients are susceptible; if there is a sufficient follow-up, all patients will experience any of the possible outcomes (Pintilie, 2011). However, due to advances in the early detection of diseases and their treatment, some patients may be long-term event-free survivors (cured fraction). When it is confirmed that there is a cured fraction in the study population, typical survival models lead to overestimating the survival of the susceptible subjects, and deployment cure models are recommended (Corbière et al., 2009). A popular and well-known class of cure models is the mixture cure model (Berkson and Gage, 1952;Boag, 1949;Farewell, 1982; Kuk and Chen, 1992). In a mixture cure model, the patients consist of two parts, including susceptible patients (uncured patients) and long-term survivor patients (cured patients). The relation between population survival time (S(t)) and the survival time of susceptible (uncured) patients (Su(t)) is defined as follows ( ) = . ( ) + 1 − .
(1) In the above equation, 1-π is the probability of being cured, and it is evident that the survival probability of cured patients is embedded as one. In equation (1), ( ) is referred to as the latency and is also referred to the incidence part of the mixture model. Modeling the competing risks in the presence of cured fractions has not yet developed very well. Gee (2004) extended a cause-specific competing risks model that a subpopulation of patients could be cured of the event of interest. He introduced two models based on exponential distribution and proportional hazards and employed expectationmaximization (EM) algorithm and weighted risk set approaches to estimate parameters in a semi-parametric model. Basu and Tiwari (2010) unified the mixture cure and competing risks models in a Bayesian framework. Several authors extended the competing risks model of Larson and Dinse (1985) to a competing risks cure model. It is noteworthy that in the approach of Larson and Dinse, the joint distribution of time of failure (T) and failure type (D) decomposes as: ( , ) = ( | ). ( ).
(2) For example, Cai (2013) presented a new estimation method for a semi-parametric mixture cure model with competing risks data. A disadvantage of the proposed approach of Larson and Dines (1985) is from an interpretational point of view. In this approach, the cause of failure is determined from the beginning, but its estimated distribution will depend on the length of follow-up, which is hard to adapt with the implied existence of such a distribution from the outset (Nicolaie, van Houwelingen, and Putter, 2010). Nicolaie, Taylor, and Legrand (2018) combined competing risks and cured fractions in vertical modeling in which the cured fraction meant the individuals who are immune to all causes of failures. In the idea of vertical modeling, a new decomposition of the joint distribution of time (T) and cause of failure (D) is utilized as: ( , ) = ( ). ( | ).
(3) From an interpretation point of view, the decomposition (3) does not have the disadvantage of the decomposition (2). Interestingly, the decomposition (3) consists of two observable quantities: the total hazard and the relative causespecific hazard. In this approach, the marginal event time distribution, P(T), can be estimated considering covariate effects on the total hazard rate using a semiparametric or a parametric regression model for time-to-event data, treating events of any type as failures. On the other hand, some unobserved information exists among individuals in the medical and epidemiological studies, which cannot be explained via observed covariates in survival analysis; so frailty models are recommended in this situation. In a univariate frailty model, the hazard at time t for a person with frailty w is: where W follows a probability distribution with mean equal to one and variance 2 , is a vector of coefficients, X is a vector of covariates with the same dimension of coefficients and 0 ( ) is the baseline hazard function. Patients with a high W value tend to have a high rate of the event; thus, the variance of this distribution is interpretable as a measure of heterogeneity across the population (Wienke, 2010). Hence, developing a competing risks model to include cured fraction and frailty is critical for analyzing a dataset with such properties. In this study, we intend to extend the modeling of competing risks when there are cured fractions as well as heterogeneity due to unobserved covariates. The paper is organized as follows: In Sections 2 to 6, we give details of our method and an extensive simulation scenario. Section 7 presents the application of the proposed model on the real dataset of breast cancer patients. Finally, in Sections 8 and 9, some points for discussion and concluding remarks are made, respectively.

Competing risks-cure-frailty model
In a survival study, consider n patients who may experience one of the K competing events or be right-censored. Furthermore, suppose that a fraction of individuals are long-term survivors (cured fraction) so that they do not experience any of the competing risks events by the end of sufficient follow-up. In these circumstances, patients can belong to any of the two subpopulations, susceptible and non-susceptible groups. Moreover, there may be heterogeneity due to unobserved covariates between susceptible patients so that patients with similar characteristics experience the failure at different times. To handle the above conditions, we want to develop the vertical modeling of competing risks with cured fraction, earlier proposed by Nicolaie, Taylor, and Legrand (2018). The vertical modeling of competing risks with cured fraction includes two parts, namely the incidence and latency part. In the latency part, we determine whether an individual is susceptible, in the sense that she experiences an event finally; in the latency part, conditional on an individual being susceptible, we determine when and which event might occur. We want to extend the vertical modeling approaches to the mixture cure frailty framework for the latency part. Let Y be a binary variable so that Y=1 refers to a susceptible patient and Y=0 refers to a non-susceptible patient. If ̃ denotes the time of the event, D is the type of the event where D=1,2,…,k and C is the right-censoring time, the observed data for an individual i is = ( , Δ , ) where = min (̃, ) is the observed time, ∆ = 1{̃< } is an indicator for the type of event which is equal to zero in the case of censoring and is a vector of covariates measured at baseline for i=1,2,…,n. We assume that data from different patients are independent; also, the time of events and censoring times are independent given Q. Similar to mixture cure models, equation (1), the incidence part is specified by P(Y) so that ( = 1) = , and 1 − represents the fraction of patients who are cured. Following the vertical modeling approach to competing risks, when is a frailty term, the conditional joint distribution ( So we assume that W is conditionally independent of D|T,Y=1 and Y; therefore, equation (5) The population survival function can also be estimated as

Assessing covariates effects on latency and incidence
For assessing the effects of covariates on the incidence, ( ) = ( = 1| ), where ⊆ we postulate a logistic model as A competing risks cure frailty model: An application to relapse-free survival of breast cancer patients 594 where * = (1, ) and denotes a vector of unknown regression parameters.
For assessing the effects of covariates on the latency, we need two models; a model for conditional total hazard and a model for relative cause-specific hazard. For modeling the conditional total hazard, we postulate a Weibull Gamma frailty model as: where, ⊆ , stands for a vector of unknown regression parameters, ℎ 0 * ( | = 1)follows a Weibull distribution with shape parameter k and scale parameter λ and has a gamma distribution with mean equal to one and variance equal to 2 . Furthermore, for the relative cause-specific hazard we specify where, ( ) is an r-vector of pre-specified time functions, ⊆ is a vector of independent variables, and stand for m-vectors of unknown regression parameters and j=1,…,K is jth cause of failure. We denote all regression parameters ( , , , … , , , … , ) by .

Likelihood
For convenience, we omit covariates for a moment. It is worth noting that we assumed is conditionally independent of | , = 1 and Y. The observed likelihood (conditional on frailty term, ) is the product of contributions of patients from two categories: a patient who experiences an event with type at time contributes Therefore, based on the non-informative censoring assumption, the observed likelihood is proportional to In terms of cause-specific hazard and conditional total hazard, the observed likelihood can be written as which can be separated in the following way:

Estimation
We wish to estimate the regression parameters in the incidence, latency, and relative cause-specific hazard parts of the model. In cure models, the EM algorithm is usually used to estimate the parameters (Peng and Dear, 2000; Sy and Taylor, 2000); when the baseline hazard is specified, the Newton-Raphson iterative procedure can be applied (Rondeau et al., 2013). We obtained the asymptotic variance of regression parameters by inverting the observed information matrix, and we used the asymptotic properties of maximum likelihood estimators for inferences. An interesting fact in the likelihood (14) is that the maximum likelihood estimations of the parameters can be obtained by maximizing the 1 ( , , ) and 2 ( , … , , , … , ) independently. Therefore, to estimate the corresponding parameters in 2 ( , … , , , … , ), standard softwares such as the glm function in R can be used. We employed the optim function in R and utilized the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method to estimate the corresponding parameters in 1 ( , , ) as well. By accepting the estimates of parameters, we calculated the CIFs and obtained their confidence intervals using the bootstrapping method. In the bootstrapping method, we re-sampled the original data 200 times and estimated the CIFs in each iteration. We used a quantile-based bootstrap method to compute 95% confidence bands for CIFs.

Simulation
We conducted a simulation study of the competing risks-cure-frailty model to assess the performance of the estimators when the proposed model is correct and compare the competing risks-cure-frailty model with a competing risks cure one (without frailty term). We assumed that there are 500 patients, each of whom can experience event 1 (the event of interest) or event 2 (competing event) or can be subjected to right-censoring. We also assumed a fraction of patients could be cured (which means they are immune to two causes of failure). The study cut-off time is 16 years after inclusion in the study. For simplicity, we only considered two covariates, and , for the incidence and latency parts, respectively. The covariates and were both generated from a Bernoulli distribution with the probability of success equal to 0.5. For each patient ( ), the survival data based on the competing risks-cure-frailty model were generated as follows: The censoring times were generated for each patient from a uniform distribution with a maximum of 16 years.
The parameter values for incidence were specified as ( 0 , 1 ) ∈ {(−0.9 ,2), (1, −1), (1, −2.4)}, leading to three scenarios with various amounts of cure proportions, 25, 50 and 80% for = 1, respectively. The parameter values for latency were specified as = 0.5, = 0.5, = 2, 2 ∈ {0.5,1,1.5}, leading to three scenarios with various heterogeneity amounts. Since the parameters in relative cause-specific hazard were independently estimated from other parameters and we used the standard software (glm function in R) to estimate these parameters, we did not include these parameters in the simulation. We fitted a competing risks-cure-frailty model and a competing risks-cure model to the simulated data. In the competing risks-cure-frailty model, the incidence and latency components of the model were assumed to have the same form as the correct model. In the competing risks-cure model, the incidence part and the relative cause-specific hazard had the same form as the correct model, but for the conditional (on = 1) total hazard, we employed a Weibull model (without frailty term).

Low heterogeneity
In the low level of heterogeneity ( 2 = 0.5 ) and various levels of cured fraction, the bias and MSEs of regression parameters ( ) in the latency component for competing risks cure-frailty model were less than the model without frailty term. Further, the bias of the regression parameter in the incidence part ( 1 ) for the model without frailty term was a little less than the competing risks-cure-frailty model, although its MSE was almost the same ( Table 1). The biases and RMSEs of CIFs in the competing risks-cure-frailty model were less than the model without frailty term in all of the time points, but the difference decreased in the last follow-up times ( Table 2).

Moderate heterogeneity
In the moderate heterogeneity ( 2 = 1) and the low level of cured fraction, the biases and MSEs of regression parameters in the latency and incidence parts of the competing risks-cure-frailty model were less than the model without frailty. However, in the moderate and high levels of cured fraction, the biases and MSEs of the parameters in the incidence part of the competing risks-cure-frailty model were a little more than the model without frailty ( Table  1). The biases and MSEs of regression parameters in the latency part of the competing risks-cure-frailty model were smaller compared to the model without frailty (Table 1). Comparison of the biases and RMSEs of CIFs in the competing risks-cure-frailty model and the model without frailty term indicated that the model with frailty term was more efficient than the model without frailty term in all of the time points. However, the differences were more evident in the early time of follow-up (Table 2).

High heterogeneity
In the high heterogeneity ( 2 = 1.5) and all levels of cured fraction, the biases and MSEs of competing risks-curefrailty model were less than the model without frailty term ( Table 1). The biases and RMSEs of CIFs in the competing risks-cure-frailty model were less than the model without frailty term in all of the time points, and the difference was more evident in the early time of follow-up (Table 2).

Analysis of breast cancer data
The data resource of this paper was part of a historical cohort study that included 550 breast cancer patients who had been referred to Qaem or Metastatic breast cancer patients were excluded from the study. According to the conditions, all patients received adjuvant therapies, including radiotherapy, chemotherapy, and hormone therapy. The start time of the study was the date of operation and the end time was the loco-regional relapse or distant metastasis or the last confirmed date of breast cancer disease-free status. The endpoint of interest was the time from operation to loco-regional relapse (cause 1) in the presence of a competing cause, that is, distant metastasis (cause 2). Additionally, some prognostic medical factors, including hormone receptor status, tumor size, and lymph node status (the number and location of lymph nodes with cancer), were gathered. The tumor size and lymph node status were reported based on the American Joint Committee on Cancer classification (Cancer, 2002). These prognostic factors are reported in Table 3. The maximum follow-up time was 15.6 years. Out of 550 patients, 171 individuals (30.9%) experienced loco-regional or distant metastasis. In the Kaplan-Meier plot of disease-free survival, the curve reaches a plateau after ten years, indicating the presence of cured fraction (Figure 1). Also, the central assumptions of cure models, that is sufficient follow-up time and the presence of cured fraction, were tested and confirmed by Maller and Zhou methods (Maller and Zhou, 1996). Owing to the properties of the data, the presence of cured fraction and competing risk (distant metastasis), and for evaluation of heterogeneity due to unobserved covariates, we used the introduced model in section 2.2, and a similar model without frailty term. We fitted these models without selecting variables; the results of fitting these models are presented in Table 4. Based on a likelihood ratio test, the model with a frailty term was better than the model without frailty (χ 2 = 13.43, p < 0.001). Further, the Akaike information criterion (AIC) in the frailty model and the model without frailty were 1373.83 and 1385.26, respectively, which confirmed that the frailty model was more suitable than the other. An interesting point in the compared results of these models is that the effect of tumor stage in the latency part was significant in the model with frailty but was not significant in the model without frailty (Table 4). According to the competing risks cure-frailty model in the incidence part of the model, the N-stage and T-stage significantly affected being susceptible (uncured). However, the hormone receptor did not have any significant effect (Table 4). After estimating the parameters of the competing risks-cure-frailty model, the CIFs with bootstrapping confidence intervals were plotted for evaluating the cause-specific cumulative incidence of time to loco-regional relapse. For example, in Figure 2, we compared the cumulative incidence of time to loco-regional relapse with the cumulative incidence of time to distant relapse in patients with positive hormone receptors, tumor size more than 5 cm, and more than 4 lymph nodes metastases. Table 5 shows the estimated relative cause-specific hazard implied by the fitted model with associated standard errors for patients with tumor size more than 5 cm, more than 4 lymph nodes metastases, and positive hormone receptors. The dominating cause of failure was distant metastasis, and the probability of relapse in the first quartile was near twice the probability in other quartiles.  Figure 2: The estimated cumulative incidences of time to loco-regional relapse and time to distant relapse based on competing risks cure frailty model for patients with the tumor size more than 5 cm, more than three lymph nodes metastases and positive hormone receptors, with the 95% confidence intervals.

Discussion
Based on our search, no study has combined competing risks models with cure models and univariate frailty models; although few articles have been published regarding the combination of competing risks models with cure models (Choi, Huang, and Cormier, 2015; Eloranta et al., 2014; Nicolaie, Taylor, and Legrand, 2018). The competing riskscure-frailty model is suitable for conditions in which there is clinical evidence of a cured fraction (non-susceptible patients) in a cohort. Also, there is heterogeneity among susceptible patients, which causes some individuals to experience failure faster than other patients with similar prognostic factors. Furthermore, the introduced model can estimate the cumulative incidence of time to a specific cause of failure in the presence of competing risks. Therefore, in the competing risks-cure-frailty model, besides the test of heterogeneity within susceptible patients, we can also infer about the effect of covariates on the chance of cure and the risk of failure, regardless of the cause of failure. In addition, we can set parameters on the relative position of each failure type among all failure types. An appealing technical characteristic of our model that has been adapted from the vertical modeling of competing risks is that the parameters of the two components in likelihood (14) can be estimated independently (Nicolaie, van Houwelingen, and Putter, 2015; Nicolaie, van Houwelingen and Putter, 2010). So, the standard packages such as glm in R can be used for relative cause-specific hazards. For maximizing the other part of the likelihood function (14), the optim function in R can be employed (R Team, 2008). In this study, we used individual frailty, but it is possible to extend this model by using shared frailty for recurrent events or cluster situations. Because vertical modeling can be naturally employed for complex competing risks data such as competing risks with the missing cause of failure, the competing risks-curefrailty model can be extended similarly for masked competing risks (Nicolaie, van Houwelingen, and Putter, 2015).
In this study, we used a Weibull Gamma frailty model, but other parametric or semi-parametric models can be employed for different real data. Assessment of the real data of Iranian female breast cancer patients in this study showed the necessity to consider competing risk, cured fraction, and heterogeneity in modeling. The introduced competing risks-cure-frailty model covered these complex situations as well. Based on AIC, it was necessary to include a frailty term in the model. Compared to the study of Ghavami et al., 2017, we reported the results of both the competing risks cure model and competing risks cure frailty model to emphasize the importance of examining heterogeneity. We can see the effect of the tumor stage on the time of failure (loco-regional relapse or metastasis) in susceptible patients was not significant in the competing risks cure model while it is significant in the competing risks cure frailty model. It is closer to many studies that have demonstrated that the tumor stage is a prognostic factor for disease-free survival (Forse et al., 2013;Rondeau et al., 2013). In our study, the increased positive lymph nodes raised the hazard of failure for susceptible patients and decreased the chance of cure, which agrees with another study in Japan (Asano et al., 2013). An impressive result in our study was that the patients with positive hormone receptors had a significant effect on short-term disease-free survival but did not have any significant effect on long-term diseasefree survival. It is in accordance with medical findings in some studies that have shown that the hazard of recurrence in breast cancer patients with negative hormone receptor is more than patients with positive hormone receptor tumors in the first years after diagnosis, but this difference begins to lower after this time and finally disappears (Bentzon et al., 2008; Moffat, 2014).

Conclusion
In a survival study with competing risks and cured fraction, heterogeneity due to unobserved covariates between susceptible patients can destroy the estimates of parameters in the latency part of the cure model and the CIFs. In this complex situation, the competing risks-cure-frailty model introduced in this study can be recommended for assessing homogeneity in the susceptible population. This model can estimate the rate of cure and determine the prognostic factors that influence short-term and long-term survival. Additionally, the proposed model can estimate the causespecific cumulative incidences.