Regression Modeling of Competing Risks Survival Data in the Presence of Covariates Based on a Generalized Weibull Distribution: A Simulation Study

In survival analysis or medical studies each person can be exposed to more than one type of outcomes which occurrence of one of them prevents the other outcomes' occurrence; this situation is called the competing risks. Assessing the effect of covariates on the survival time (or failure time) is one of the purposes in competing risks analysis. In this paper, we study a competing risks model in the presence of covariates when the causes of failures follow generalized Weibull distributions. Covariates are entered to the model through the scale parameter of this distribution. Also in this study the competing risks are considered to be independent. Parameter estimation has been done by the maximum likelihood approach , in a real data set and a simulation study has shown the advantages of proposed model.


Introduction
In many survival studies, patients can be exposed to more than one outcome. For example, if we are interested in the analysis of data until death occurrence due to heart failure (the outcome of interest), some patients in the study may die due to causes other than heart failure (the competing event). These different causes of death are called competing risks (see Putter et al. (2007) and Pintilie (2007)). Indeed, the competing events are those which prevent the observation of main outcome or change the probability of its occurrence. The main feature of competing risks data is the presence of failure type according to event type in addition to failure time or survival time (see Porta Bleda et al. (2007)).
Modeling competing risks survival data can be carried out using a semi-parametric, nonparametric or parametric survival models (see Fine and Gray (1999)). The parametric models are studied assuming that the competing risks follow different lifetime distributions such as exponential, Lognormal and Weibull (see Sarhan (2007), Cox (1959), Pascual (2010), Yáñez et al. (2014), Iskandar and Gondokaryono (2016)). One of the advantages of using the parametric approaches rather non-parametric and semiparametric approaches is as follows: when the parametric model has been chosen correctly, it is possible to predict the event occurrence probability in future and have a clear picture of survival time and hazard function. Also as the survival pattern follows a special parametric model, the acquired estimates are more accurate than non-or semiparametric approaches (see Jeong (2006)).
One of the distributions widely used in cancer research is the Weibull distribution (see Martinez et al. (2013)). However, only monotonically increasing and decreasing hazard functions can be generated from the classic two-parameter Weibull distribution. Mudholkar et al. (1996) and Mudholkar and Srivastava (1993) introduced the generalized Weibull model which includes an additional form parameter. Probability density, survival, and hazard functions of this distribution have a closed form and it is flexible so it can produce different forms of hazards such as bathtub and unimodel hazards.
When the failure pattern is analyzed considering a special cause, the researchers may care the assessment of covariate effects on failure probability of a special cause. Gray (1988) introduced a non-parametric inference for analyzing the competing risks data also (see Prentice et al. (1978)). Fine and Gray used the Cox proportional hazard model for competing risks data and presented inferences for evaluating the treatment influence and other covariates (see Kleinbaum and Klein (1996)). Also many other studies used the  Weibull model in order to analyze a set of real data but their analyses were based on main parameters of generalized Weibull model and they did not enter the covariates into the model. however, as we observed in the literature, the generalized weibull distribution has not been considered for modeling the competing risks data in presence of covariates.
The aim of this paper is to study a competing risks data in presence of covariates using generalized weibull distribution. The paper is organized as follows: in section 2 we will explain the formulation of the likelihood functions in presence of covariates based on the generalized weibull distribution. Then in section 3, based on a simulation study, we will assess the accuracy of estimates acquired from generalized Weibull model. In section 4 we use a competing risk real data set (medical data set) with covariates to test the generalized Weibull distribution against the Weibull distribution for fitting the data in this practical situation. Section 5 presents discussion and conclusions.

Model Formulation
Generalized Weibull distribution including third parameter γ, in addition to two parameters of ( , β), possesses enough flexibility and can cover different forms of hazard function. This is an important characteristic of generalized weibull distribution that differs from the two-parameter Weibull distribution. Assumed that the random variable T has the generalized Weibull distribution, then the probability density function of this random variable with three parameters (α,β,γ) is as follows: Here, and are the shape parameters, while is the scale parameter of generalized weibull distribution (GWD).
Weibull distribution can be considered as a special case of this distribution, because when γ=1, this distribution is changed to be Weibull distribution with two parameters ( , β). Also generalized exponential distribution is a special case of this distribution when β=1.
In analyzing the competing risks data for each person there exists one type of failure (type of event:) in addition to failure time(survival time). The failure time (T) is assumed to be a continuous and positive random variable, while the failure cause (k) takes values in the finite set (k ≥2) j=1,…,k. Also we assume that there is just one failure time and just one cause of failure, and competing risks are independent. Here the survival time for each person is defined as follows (see Gray (1988) and Beyersmann et al. (2011)): We assume that survival time of each competing risks has generalized Weibull distribution. The probability density function for each of the competing risks (the cause of failure of j) is defined as follows: The hazard function is and the survival function is Which in the above equations > 0 , > 0 , > 0 , ≥ 0 and j=1,2,…,k(k≥2).
In order to assess the covariate effects on the survival time, we define the scale parameter j in the form of a linear combination of covariate as follows: Thus the parameters vector is ∅ = (∅ 1 , ∅ 2 , . . , ∅ ) , ∅ ( 0 , 1 , … , , , ) which is the vector of generalized Weibull distribution parameters for j cause of failure.
Preserving the totality of the subject matter, we can estimate the regression coefficients for the two causes of failure (j=1, 2) using likelihood function as follows, see Jeong and Fine (2006): Where ∅ 1 = ( 1 , 1 , 1 ) and ∅ 2 = ( 2 , 2 , 2 ), and the vector of ( 1 , 2 ) considers the values (0,1), (1,0), and (0,0) for the observations of the cause of first failure, the cause of second failure and the observation which does not experience no event and are censored respectively.
When the survival time of each cause follows the generalized Weibull distribution, the likelihood function is defined as follows: In the equation above 1 and 2 will be defined as the linear combination of independent variables. 1 = ( 01 + 11 11 + ⋯ + 1 1 ) , 2 = ( 02 + 12 12 Therefore, the log-likelihood function can be derived as in the following form: Then in order to acquire the parameters estimates, the first derivation has been calculated from the first likelihood function toward each of the parameters. As there are no closed form solutions for these equations, the numerical method technique such as Newton-Raphson method, is used for computing the MLE of the parameters.
Also the variance of each of the parameters estimated is equal to the inverse of the observed Fisher information matrix, that is, the inverse of the matrix of second derivatives of the log-likelihood function locally at ̂,̂ and ̂ .

A Simulation Study
A simulation study was done in order to assess the accuracy of the estimated parameters from generalized Weibull model and to compare their performance with the Weibull model under different sample sizes. The simulation program has been written in the software R. For the amount of different samples ( n=200, n=500 , n=1000 and n=2000) in 500 repetition ,first it is the time to each of the two competing events ( j =1 and j =2) was assumed to depend on one covariate X and to define the scale parameter j in the form of a linear combination of this covariate.
thus an covariate X from the normal distribution with the mean 0.6 and the standard deviation 0.1 was produced and then 1 = exp( 0 + 1 ) and 2 = exp( 2 + 3 ) were calculated in which 0 = 0, 1 = 1.2, 2 = 0.1, and 3 = 0.5. Also the values of where u is a random variable that has a exponential distribution with parameter 1[u~ Exp(1)] .Then the root of the following equation which is the same survival time (t) can be calculated and it was obtained with the "uniroot" R function. (12) 2-For the time value which is simulated in step1, we can determine the functions ℎ 1 ( ), ℎ 2 ( ) as functions of time which are dependent on covariate (X). for cause of failure 1(event type 1), in this way if the produced random number equals 1, the first cause (j=1), otherwise the second cause of failure (j=2) will be determined.

4-
For the production of times of censor, if the survival time is greater than 2, the acquired time is considered as censoring times.
Thus, we can fit the two models of generalized Weibull and Weibull on the simulated data. The maximum likelihood estimates of the parameter vectors were calculated by "optim" R function and by applying the Newton-Raphson algorithm. The simulation results for scenarios 1 and scenarios 2 are shown in table1 and table2 , respectively. The average estimates, MSE (mean of square error) and (SD) standard deviation acquired from the simulation of generalized Weibull model are shown in tables 1 and 2.
Tables1 shows, for all parameters ( 0 , 1 , 1 , 1 , 2 , 3 , 2 , 2) as it can be seen the SD and MSE of the estimates have been decreased when the sample size increases. Also the average estimates is very close to the true values which show the generalized Weibull model as a good one in estimating the parameters. In The results of the second simulation are summarized in Table 2. The average estimates, the SD and the MSE of the parameters estimates were obtained with the Weibull model for the simulated data. As it be seen, MSE of the estimates is decreased when the sample size increases except in the case where 1 and 2.

A Real Data Set Example
In this section the generalized Weibull model has been applied on a real data set and also it has been compared with Weibull model. colorectal cancer patients that have been registered that have registered in the cancer registry center of Research institute of Gastroenterology and Liver Disease, Shahid Beheshti University of Medical Sciences, Tehran, Iran from 2004 to 2013 were used in this study and their survival situation was identified.
For colorectal cancer data, the first cause of failure (death due to colorectal cancer, j=1) considered as the main event in survival model, and the second cause(death due to other events, j=2) considered as the competing risks. Also the patients who were alive till the end of the study had been considered as the right censoring. We assessed the effects of sex(x1), age at diagnosis(x2), family history (x3), and BMI (x4) on the survival time have been studied. We use the generalized Weibull distribution against the Weibull distribution for fitting the data in the Presence of these covariates.
The estimate of parameters was done through the maximum likelihood approach using the software R; and as there are no closed form solutions for the equations, the numerical Newton-Raphson approach was applied. In this study, in order to compare (Akaike, 1983) generalized Weibull competing risks model and Weibull model the Akaike Information Criterion (AIC) has been used.
From 388 patients, it was observed censored time for 271 patients (69.8%), death due to colorectal cancer for 104 patients (26.8%), and death due to other events (j=2) for 13 patients (3.4%). The maximum likelihood estimate and standard deviation of each parameters of the model are presented in table 3. According to the results of table 3 in generalized Weibull competing risks model, all the variables exist in the model have an impact on colorectal cancer mortality. While in the Weibull competing risks, just the effect of sex was statistically significant for mortality of colorectal cancer.
According to the results of table 3 the AIC of generalized Weibull is lesser than the Weibull model and thus the generalized Weibull model is more suitable for this data set. Moreover, based on the likelihood ratio test statistics (LR), the efficiency of competing risk generalized Weibull model for the real data set of colorectal cancer was evaluated. Following this, the null hypothesis and the alternative hypothesis are defined as follows: 0 : 1 = 1, 2 = 1 , Competing risks with Weibull distributions (W) 1 : 1 ≠ 1, 2 ≠ 1 , Competing risks with generalized Weibull distributions(GW) Under the null hypothesis, The value = 2 − 2 = 26.08 has a chi-square distribution with two degrees of freedom and the p-value is less than 0.001. Here, , and are the log-likelihood functions under 0 , and 1 , respectively. Hence, based on the values of , and p-value, the generalized weibull distribution fits better than the weibull distribution for these data.

Discussion and Conclusion
In this paper, we extend Sarhan and Wahed approach that is the parametric approach to the analysis of competing risks data based on generalized Weibull model. A set of real data and simulation study were used for illustrative purposes.
Our simulation study showed that the estimate of generalized Weibull model parameters are very close to real values and MSE and SD of model parameter drop as the sample size has been increased. The analysis of real data of colorectal cancer showed that based on the AIC s, generalized Weibull distribution is more suitable than Weibull distribution for this set of data and the (se) of estimated data of generalized Weibull model is lesser than the Weibull model. Other studies that analyzed the competing risks using generalized Weibull, included just the main parameters without the presence of independent variables. As evaluating the influence of covariates on survival time is one of the purposes of competing risks analysis, thus in this study we tried to evaluate the covariate effects on the survival time in generalized Weibull model, we defined the scale parameter in the form of a linear combination of covariate. Also and we estimated the parameters of each failure using the maximum likelihood approach.
Although Bayesian estimation has been used widely in survival analysis, but this approach so far has been limited to single risk models. Therefore for further research, in order to estimate the unknown parameters of generalized Weibull model data in the presence of competing risks, the Bayesian approach can be applied instead the maximum likelihood approach. Typically for parametric survival models, the scale parameter is reparameterized in terms of covariates (predictor variables), we suggest the parameters γ and β will be defined based on a linear combination of covariates instead parameter scale.