Joint Modeling of a Longitudinal Measurement and Parametric Survival Data with Application to Primary Biliary Cirrhosis Study

Although longitudinal and survival data are collected in the same study, they are usually analyzed separately. Measurement errors and missing data problems arise because of separate analysis of these two data. Therefore, joint model should be used instead of separate analysis. The standard joint model frequently used in the literature is obtained by combining the linear mixed effect model of longitudinal data and Cox regression model with survival data. Nevertheless, to use the Cox regression model for survival data, the assumption of proportional hazards must be provided. Parametric survival sub-models should be used instead of the Cox regression model for the survival sub-model of the JM where the assumption is not provided. In this article, parametric joint modeling of longitudinal data and survival data that do not provide the assumption of proportional hazards are investigated. For the survival data, the model with Exponential, Weibull, Log-normal, Log-logistic, and Gamma accelerated failure time models and the linear mixed effect model are combined with random effects and the models were applied in primary biliary cirrhosis data set obtained from Mayo Clinic. After determining the best parametric joint model according to Akaike and Bayesian information criterions, the best available model was compared with standard joint model and of separate analysis of survival data and longitudinal data. As a result, in the studies where longitudinal and survival data are obtained together, it is seen that the parametric joint model gives more better results than the standard joint model when the proportional hazard assumption is not provided.


Introduction
The JM is used to investigate the relationship between longitudinal and survival data and the effects of longitudinal data on survival. The standard joint model (JM), which is frequently used in the literature, is obtained by combining the linear mixed effect model of longitudinal measurement (longitudinal sub-model), and Cox regression model of survival data (survival sub-model). However, in order to analyze survival data with Cox regression, the assumption of proportional hazard is required. In studies where survival data for which this assumption is not provided, and longitudinal observation, parametric survival sub-models should be used for the survival sub-model of the JM. The JM was developed for the unbiased and effective estimates of the relationship between longitudinal and survival data after 1990s. JM was first applied in 1992 by Self and Pawitan to obtain the JM parameter estimates of the linear mixed effect (LME) model for longitudinal data and Cox regression model for survival data (Self and Pawitan, 1992).
In the literature particularly in the last 20 years, studies on JM appear. Primary studies on basic JM with one covariate survival and longitudinal data were performed by Self and Pawitan (1992), DeGruttola and Tu (1994), Tsiatis,

Pakistan Journal of Statistics and Operation Research
DeGruttola and Wulfsoh (1995), Faucett and Thomas (1996) and Wulfsohn and Tsiatis (1997). Henderson, Diggle, and Dobson (2000) have developed a general model that adopts two stationary Gaussian processes, including sequential correlation and random effects for longitudinal measurements and survival times. Tsiatis and Davidian (2001) proposed a simple model for estimating JM parameters that do not require distribution assumptions over random effects. Tseng, Hsieh and Wang (2005) studied the maximization of JM likelihood function in the presence of longitudinal data including measurement errors and accelerated failure time. Elashoff, Li and Li (2008) and Hu, Li and Li (2009) have done studies on JM in the event of competing risks and multiple failures. Rizopoulos (2010) developed the "JM" package in the R program to obtain the JM estimates of the longitudinal and survival data. Su and Wang (2012) investigated the JM of left truncation survival data and longitudinal observations.
In the next section, we introduce notation and describe the parametric joint models. In addition, we have defined how model parameters are obtained. In Section 3, parametric and standard JM have been applied to primary biliary cirrhosis (PBC) data obtained from Mayo Clinic in order to determine the effect of longitudinal observation on the survival time of patients with liver cirrhosis. And finally in section 4, we provide final comments for compared to standard JM, parametric JM and separate analysis for longitudinal and survival data and investigated the effects of using the standard JM on parameter estimation when the assumption of proportional hazards is not provided.

Material and Method
JM are two fundamental compact the formed longitudinal sub-model and survival sub-model. Standard JM is composed of LME model for longitudinal sub-model and Cox regression model for survival sub-model.
Standard JM can be obtained as follows; where Equation (1)  shows the covariates (e.g. treatment indicator, disease history etc) and regression coefficient vector corresponding to covariates, respectively. ℎ 0 (. )is the basic risk function and can be defined in a parametric and nonparametric way as in the Cox model (Cox 1972;Bahçecitapar 2018). in Equation (2) is the association parameter for longitudinal sub-model and survival sub-model. There is no relationship between two process when = 0 (Rizopoulos, Verbeke and Molenberghs 2010; Rizopoulos 2012).
In order to apply Cox regression model for survival data, the assumption of proportional hazard (PH) must be provided. Parametric models should be used in cases where the assumption is not provided, and the survival time corresponds to a certain distribution. Parametric models are divided into PH and accelerated failure time (AFT) models. Parametric PH models are parametric version of Cox regression, and commonly used PH models; Exponential, Weibull and Gompertz (Cleves, Gould and Marchenko, 2016). Although parametric PH models are widely used for survival data, there is little distribution that can be used in modelling. In this case, AFT models are used an alternative to parametric PH models. In addition, AFT models are also used as an alternative to PH models to overcome the statistical problems caused by the violation of the assumption of PH (Faruk, 2018). Therefore, survival sub-model for JM should be done by parametric regression methods where survival data do not provide the assumption of PH and have a certain parametric distribution. Let we redefine the survival sub-model given in Equation (2) for parametric AFT models as follows (Pericleous 2016

other parametric models
From this form of ℎ( ), risk function ℎ( ) i. for unit can be written as follows; ℎ( ) = ( ) + × (3) where and indicates the scale parameter and errors from appropriate survival distribution, respectively and ( ) can be defined as follows, Here, parametrical survival sub-models can be written in general form as follows; The survival and basic hazard functions for the parametric sub models of the JM are given in Table 2.1.

Parameter estimates methods for joint modelling
Two methods are used for the parameter estimation of the JMs, namely the two-stage procedure and the solution of joint probability. In literature, different two-stage methods have been suggested (Self and Pawitan 1992; Tsiatis, DeGruttola and Wulfsohn 1995; Wu et al. 2012). Two-stage method has mainly been defined as follows; 1) The longitudinal covariates are modeled with the LME, and the unit-specific values of the covariates are estimated.
2) The survival model is estimated by the values obtained in stage 1.
This approach reduces bias in parameter estimation of the Cox regression model (Wulfsohn and Tsiatis 1997). The approach is simple to implement and allows parameter estimation using existing software (Wu et al. 2012). However, this method cannot simultaneously use information from the survival and longitudinal process at the estimation stage of each model (Wu et al. 2012). The use of only longitudinal results in the first stage may comprise biased estimates in the LME, and as a result, biased and ineffective results may arise in parameter estimates of survival analysis (Ibrahim, Chu and Chen 2010). Methods based on the joint probability function allow for unbiased parameter estimates. The JM paradigm includes different model strategies such as pattern-mixture models and selection models (Pericleous 2016 Among these methods, a random effect model (also is named shared parameter model) that connects survival and longitudinal process with random effect is preferred dominantly (Pericleous 2016). Therefore, in this article is used the random effect model for linked longitudinal and survival sub model.
where = ( ′ , ′ , ′ ) has all parameter vector, and , and shows survival data parameters, longitudinal data parameters, and covariance matrix parameters of random effects, respectively. ( , | ; , ) shows the conditional density of survival sub-model, and ( | ; ) ( ; ) indicates joint density function for longitudinal measurement and random effects. Integral approaches, such as, Gauss-Hermite, Laplace, EM algorithm are needed because the integrals of shared parameter models (Equation 7) are very complex and difficult to calculate analytically to obtain parameter estimates (Rizopoulos 2010).

Results and Discussion
PBC data in the literature were used to examine and compare different parametric JM structures, standard JM structure, and separate analysis of two data. The data were collected to examine the progress of PBC in 312 patients in the Mayo Clinic in 1924-1984 years (Murtaugh et al. 1994). This data was obtained from "JM" package in the R program. PBC is a chronic liver disease with no effective treatment other than liver transplantation (Gordon 1987). Of the 312 patients in the PBC data set, 158 received the D-penicillin drug and the other 154 patients were identified as the placebo group. Serum bilirubin levels were measured repeatedly at specific time intervals (6 months and after each year) and total 1945 consist of unbalanced longitudinal measurements. The main aim of the study was to research the effect of treatments on the relationship between serum bilirubin (longitudinal measurement) and time of death.
Since the patients were started to be followed, the survival time (year) were taken as the period until death. Here the failure is considered as death and other patients expressed as censored. At the end of study 140 of the 312 patients (44.9%) died and 172 (55.1%) was censored. The data set includes clinical, biochemical, and demographic risk factors for each patient. Demographic factors; age and sex of patients, Biochemical factors; drug (D-penicillin and placebo group), ascites (accumulation of water in the abdomen due to liver failure-no/yes), hepatomogaly (liver growth statusno/ yes), skin disorder (blood vessel disorders in the skin-no/yes), edema (swelling condition in hands and feet-no/yes, edema despite drug use/ yes) and consists of histological stage. Serum bilirumin (mg/dl) values were taken as biochemical properties and because of having a left skewed distribution, the logarithmic transformations of bilirubin. The covariates and levels used are given in Table 3.1.   However, the results of the standard JM to compare with the parametric JM results are given in Table 3.2. Since the data did not fit the Cox regression model, parametric survival models were examined for JM. Accelerated Failure Time (AFT) models with more probability distribution for parametric regression models have been used and to determine which of the models are best appropriate have been used the model comparison criteria (AIC and BIC). The AIC and BIC values of the JM of the linear mixed effect model with Exponential, Weibull, Log-normal, Loglogistic and Gamma regression models from parametric models are given in Table 3.3. According to the model comparison criteria given in Table 3.3, the best model was determined as parametric JM of LME model with Weibull regression model. According to the results given in Table 3.4, gender, acid, skin defect, edema and histological stage4 covariates in longitudinal sub-model and age, ascites, hepatomegaly and edema covariates were found statistically significant in Weibull sub-model. According to Weibull sub-model, survival time is shortened as age progresses, survival time in patients with hepatic insufficiency as fluid accumulation in the abdomen is 1.41 times shorter than those without. Survival time in patients with liver growth is 2 times longer than in those without liver growth and in patients with hand and foot swelling despite the use of the drug, survival time is 1.9 times shorter than those without hand and foot swelling, survival time in patients with hand and foot swelling is 4.5 times shorter than those without. The Assoct coefficient shows the association parameter between the longitudinal and survival data. Accordingly, it can be said that the increase in serum bilirubin measurements will result in a shortening of survival time.
To compare the results obtained from the JM, the results of separate analysis of the longitudinal and survival data are given in Table 3.5 and Table 3.6, respectively. According to the results of Weibull parametric model, it is seen that the hazard ratios and standard errors of parameter estimations are higher than the Weibull parametric JM. In addition, the relationship between time-dependent serum bilirubin values and survival was found to be lower than the Weibull parametric JM.
The hazard ratios and standard errors of parameter estimates in the standard JM were found to be higher than the Weibull parametric JM. In addition, the relation between serum bilirubin values and life expectancy was higher than the Weibull parametric model according to standard combined model results.

Conclusion
Survival and longitudinal data obtained in the same study should be analyzed by JM in case there is a relationship between them. The standard JM construction which is generally used is based on combining linear mixed effect model of longitudinal observations and Cox regression model of survival data by using shared parameter models. Nevertheless, on account of using the Cox regression model for survival data, the assumption of PH should be provided. In cases where the assumption is not provided, to obtain unbiased and reliable results Exponential, Weibull, Log-normal, Log-logistic and Gamma parametric models should be used for the JM survival sub-model.
In this study, standard JM, parametric JMs and separate analysis of longitudinal and survival data were performed by the data obtained from Mayo Clinic in order to investigate the effect of treatments on the relationship between serum bilirubin and time of death in PBC patients. Parametric joint model has the indicators that yield better results than separate analysis of longitudinal and survival data and the standard joint model when assumption of PH is not provided. The standard error and hazard ratio of parameter obtained by from parametric joint model are slower than the standard error and hazard ratio of parameter obtained from standard joint model and separate analysis of longitudinal and survival data. Furthermore, when the result of separate analyzes are examined, it is seen that the effect of longitudinal observation on survival data is calculated as very low. Especially on important issues such as human health, it is important to accurately calculate the relationship between longitudinal measurement and survival data and the risk factors of covariates. Therefore, in cases where the assumption of proportional hazards is not provided, parametric survival models should be used for survival partial of the joint model in order to obtain unbiased and effective results.