Evaluation of Diagnostic Accuracy and its Standard Error using Constant Shape Weibull Mixture ROC Curve

The Receiver Operating Characteristic (ROC) curve is a widely used classification technique in medical diagnosis, which classifies healthy and diseased individuals on the basis of an optimal cut-off value of the biomarker. In this article, we propose the Constant Shape Weibull Mixture ROC (CSWMROC) model. The properties of the CSWMROC curve are discussed, and expressions for the AUC, its variance and its confidence interval are derived. The estimates of the AUC of the CSWMROC curve are obtained using the Method of Moments (MOM). A numerical example is considered to support the proposed theory.


Introduction
The Weibull mixture distribution is very useful in medical diagnosis because it takes many shapes for different values of the shape and scale parameters, which helps in modeling different types of data. Here, we keep the shape parameter constant to obtain a proper CSWMROC curve, that is, one that never crosses the chance diagonal; a curve that crosses the diagonal loses its diagnostic value.
Only a limited literature is available on mixtures of distributions. Books on finite mixture distributions have been written by Everitt and Hand [1981], Titterington et al. [1985] and McLachlan and Peel [2000]. Newcomb [1886] studied finite mixture distributions for outliers, and Pearson [1894] estimated the parameters of the two-component normal mixture distribution by the method of moments. Apart from these monographs, several works deal with the Weibull mixture distribution. Kao [1959] derived estimates of the parameters of the Weibull mixture distribution using the method of moments. Bucar et al. [2003] studied the finite Weibull mixture distribution in reliability theory. Arfa [2008] compared the estimates of the parameters of the two-component Weibull mixture distribution obtained by MOM and by the graphical method of estimation. Dwidayati et al. [2013] discussed the cure rate model in breast cancer patients through the Weibull mixture distribution. Dewan and Nandi [2009] estimated the parameters of the bivariate Weibull distribution under random censoring using the EM algorithm. Erisoglu and Erisoglu [2014] studied and compared the estimates of the Weibull mixture distribution for heterogeneous data using the EM algorithm, the L-moment method and the MLE method. They compared the bias, mean absolute error, total mean error and completion time of the algorithm under the different methods of estimation by simulation studies. Pundir and Amala [2014] proposed the constant shape Weibull ROC curve and discussed its characteristics.
The ROC curve is a graph of the true positive rate (y(t)) against the false positive rate (x(t)) for cut-off value t. Many authors, such as Green and Swets [1966], Egan [1975], Zhou et al. [2002] and Krzanowski and Hand [2002], have discussed the ROC curve for univariate distributions in the case of continuous data. They developed the theory of estimation for the ROC curve and its AUC, and applied statistical inference to the ROC curve.
In practice, medical data are often heterogeneous or consist of sub-populations. This fact is generally ignored and the existing ROC models are applied without checking for heterogeneity, which gives misleading results. Hence, there is a need for mixture ROC models, which give a more accurate assessment of the diagnostic test with a smaller standard error.
Only a few authors have discussed the mixture ROC curve. The first article on the mixture ROC curve is by Dass and Kim [2011], who discussed the multivariate bi-normal mixture ROC curve. Gonen [2013] also studied the ROC curve and AUC using the bi-normal mixture distribution. It was found that, when heterogeneity is present in the data, the bi-normal mixture ROC curve is smoother than the bi-normal ROC curve. Pundir and Azharuddin [2014] studied the Exponential Mixture ROC curve and compared the estimates of its AUC obtained by the Method of Moments and by MLE. Pundir and Azharuddin [2016] studied the Normal Mixture ROC curve along with its properties and found the maximum likelihood estimates of the parameters of the AUC and the confidence interval of the AUC of the Normal Mixture ROC curve.
A mixture distribution can be applied if a population contains two or more sub-populations, that is, in the presence of heterogeneity. A random variable X is said to follow a mixture distribution if its probability density function is

$$f(x) = \sum_{i=1}^{k} p_i f_i(x), \qquad p_i \ge 0, \quad \sum_{i=1}^{k} p_i = 1,$$

where $p_i$ is the weight of the i-th component of the mixture distribution and $f_i(x)$ is the density of the i-th component.
Identifiability is a necessary assumption for the estimation of mixture distributions. Without checking identifiability, one cannot estimate the parameters of a mixture distribution. Several authors have studied the identifiability of mixture distributions. Teicher (1961, 1963) studied the identifiability of finite mixture distributions. Yakowitz and Spragins (1968) discussed the identifiability of mixtures of exponential-family distributions. Atienza et al. (2006) gave a new condition for the identifiability of finite mixture distributions and discussed the identifiability of Log Normal, Gamma and Weibull mixture distributions. In this paper, we consider the Constant Shape Weibull Mixture distribution; since the Weibull distribution with a fixed shape parameter is a member of the exponential family, this mixture is also identifiable.
A class N of mixtures is said to be identifiable if and only if, for all $f(x) \in N$, the equality of two representations

$$\sum_{i=1}^{n} p_i f_i(x) = \sum_{j=1}^{n'} p'_j f'_j(x)$$

implies that $n = n'$ and that for all i there exists some j such that $p_i = p'_j$ and $f_i = f'_j$.

A random variable X is said to follow the two-component Weibull mixture distribution if its probability density function is a weighted sum of two Weibull densities. Its cumulative distribution function is the corresponding weighted sum of the two Weibull distribution functions. Each component has its own shape and scale parameters.
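The density and distribution function of the two-component Weibull mixture can be written out explicitly. The following is a sketch under an assumed parameterization: the symbols $p$ (mixing weight), $\beta_i$ (shape) and $\sigma_i$ (scale), and the scale form $(x/\sigma_i)^{\beta_i}$, are assumptions made here for illustration and may differ from the notation used by the authors.

```latex
f(x) = p\,\frac{\beta_1}{\sigma_1}\left(\frac{x}{\sigma_1}\right)^{\beta_1-1}
         e^{-(x/\sigma_1)^{\beta_1}}
     + (1-p)\,\frac{\beta_2}{\sigma_2}\left(\frac{x}{\sigma_2}\right)^{\beta_2-1}
         e^{-(x/\sigma_2)^{\beta_2}}, \qquad x > 0,

F(x) = 1 - p\,e^{-(x/\sigma_1)^{\beta_1}} - (1-p)\,e^{-(x/\sigma_2)^{\beta_2}}, \qquad x > 0.
```

Under this parameterization, the constant shape case corresponds to $\beta_1 = \beta_2 = \beta$.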
In this paper, the shape parameter is kept constant across the components of the mixture.

The paper is organized as follows. In section 2, we study the CSWMROC model and its properties. The AUC and the optimal cut-off value of the biomarker using the CSWMROC model are also derived. The moment estimates of the AUC of the CSWMROC curve are obtained in section 3. In section 4, the variance of the AUC of the CSWMROC model and its confidence interval (CI) are derived using the delta method. In section 5, the AUC, its variance, Standard Error (SE), Mean Square Error (MSE), confidence interval and the test of AUC are examined through simulation studies. The last section gives the conclusion.

Constant Shape Weibull Mixture ROC model
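Let X be a random variable from healthy controls which follows the Constant Shape Weibull Mixture distribution, with a common shape parameter and component-specific scale parameters. The CSWMROC model is defined in terms of the false positive rate x(t) = P(X > t) and the true positive rate y(t) at the cut-off value t.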
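As an illustration, the two rates of the CSWMROC model can be written out as survival functions of the healthy and disease mixtures. This is a sketch, not the authors' displayed equation: the subscripts H (healthy) and D (disease), the weights $p_H$ and $p_D$, the common shape $\beta$, the scales $\sigma$, and the scale form $(t/\sigma)^{\beta}$ are all assumed notation, and larger test scores are assumed to indicate disease.

```latex
x(t) = P(X > t) = p_H\, e^{-(t/\sigma_{H1})^{\beta}} + (1-p_H)\, e^{-(t/\sigma_{H2})^{\beta}},

y(t) = P(Y > t) = p_D\, e^{-(t/\sigma_{D1})^{\beta}} + (1-p_D)\, e^{-(t/\sigma_{D2})^{\beta}}.
```

Because $\beta$ is common to all components, the substitution $u = t^{\beta}$ turns both rates into mixtures of exponential survival functions in $u$, which is what makes the constant shape case tractable.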

Assumptions:
(1) The mean of disease cases should be greater than the mean of healthy cases for the CSWMROC curve.
(2) The shape parameter should be fixed to obtain a proper CSWMROC curve.
(3)

Proof of monotonicity: A function is monotonically increasing if its first derivative is greater than zero. From (2.1), the first derivative of the CSWMROC curve is positive, so the curve is monotonically increasing.

Proof of concavity: A function is concave if its second derivative is less than zero. From (2.2), the second derivative of the CSWMROC curve is negative, so the curve is concave.

Proof of TPR asymmetry: Let f(x) be the comparison distribution and g(x) the reference distribution, and let KL(f, g) and KL(g, f) be given by (2.4) and (2.5). From (2.4) and (2.5), we can see that KL(g, f) > KL(f, g), i.e. the CSWMROC curve is TPR asymmetric.
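The monotonicity and concavity properties can be checked numerically for a given parameter set. The sketch below is illustrative only: the function names, the scale parameterization exp(-(t/scale)^shape), and the parameter values (weight 0.7, shape 2, healthy scales 1 and 2, disease scales 3 and 6) are assumptions, chosen so that every disease scale exceeds every healthy scale. It checks one configuration and is not a proof.

```python
import math

def mixture_survival(t, p, scale1, scale2, shape):
    """P(T > t) for a two-component Weibull mixture with a common shape."""
    return (p * math.exp(-((t / scale1) ** shape))
            + (1.0 - p) * math.exp(-((t / scale2) ** shape)))

def roc_points(healthy, disease, shape, t_max=40.0, steps=4000):
    """(FPR, TPR) pairs on a grid of cut-offs, ordered by increasing FPR."""
    points = []
    for k in range(steps + 1):
        t = t_max * k / steps
        points.append((mixture_survival(t, *healthy, shape),
                       mixture_survival(t, *disease, shape)))
    points.reverse()  # large cut-offs give small FPR, so reverse the order
    return points

def is_increasing(points):
    """True if TPR never decreases as FPR increases."""
    return all(b[1] >= a[1] for a, b in zip(points, points[1:]))

def is_concave(points, rel_tol=1e-9):
    """True if successive chord slopes never increase (up to rounding)."""
    slopes = [(b[1] - a[1]) / (b[0] - a[0])
              for a, b in zip(points, points[1:])]
    return all(s2 <= s1 * (1.0 + rel_tol)
               for s1, s2 in zip(slopes, slopes[1:]))

# Hypothetical parameters: (weight, scale of component 1, scale of component 2)
healthy = (0.7, 1.0, 2.0)
disease = (0.7, 3.0, 6.0)
pts = roc_points(healthy, disease, shape=2.0)
print("increasing:", is_increasing(pts), "concave:", is_concave(pts))
```

The concavity check compares chord slopes rather than a second derivative, which avoids differentiating the curve numerically twice.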
The AUC of the CSWMROC curve is given by (2.7).
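To make the AUC concrete, the sketch below compares a closed form with a numerical area under the curve. It assumes the scale parameterization exp(-(t/scale)^shape), under which the shape-th power of each component is exponential and AUC = P(Y > X) becomes a weighted sum of terms scale_D^shape / (scale_H^shape + scale_D^shape). This closed form follows from that assumption and is not necessarily the authors' expression (2.7); all names and parameter values are hypothetical.

```python
import math

def mixture_survival(t, p, scale1, scale2, shape):
    """P(T > t) for a two-component Weibull mixture with a common shape."""
    return (p * math.exp(-((t / scale1) ** shape))
            + (1.0 - p) * math.exp(-((t / scale2) ** shape)))

def auc_closed_form(healthy, disease, shape):
    """AUC = P(Y > X), summed over all healthy/disease component pairs."""
    p, h1, h2 = healthy
    q, d1, d2 = disease
    total = 0.0
    for w_h, s_h in ((p, h1), (1.0 - p, h2)):
        for w_d, s_d in ((q, d1), (1.0 - q, d2)):
            total += w_h * w_d * s_d ** shape / (s_h ** shape + s_d ** shape)
    return total

def auc_trapezoid(healthy, disease, shape, t_max=40.0, steps=20000):
    """Trapezoidal area under the (FPR, TPR) curve traced by the cut-off t."""
    area = 0.0
    x_prev = mixture_survival(0.0, *healthy, shape)  # FPR = 1 at t = 0
    y_prev = mixture_survival(0.0, *disease, shape)  # TPR = 1 at t = 0
    for k in range(1, steps + 1):
        t = t_max * k / steps
        x = mixture_survival(t, *healthy, shape)
        y = mixture_survival(t, *disease, shape)
        area += (x_prev - x) * (y_prev + y) / 2.0
        x_prev, y_prev = x, y
    return area

# Hypothetical parameters: (weight, scale of component 1, scale of component 2)
healthy = (0.7, 1.0, 2.0)
disease = (0.7, 3.0, 6.0)
print("closed form:", round(auc_closed_form(healthy, disease, 2.0), 4))
print("trapezoid  :", round(auc_trapezoid(healthy, disease, 2.0), 4))
```

Agreement between the two values is a useful sanity check on any closed-form AUC expression before it is used in variance calculations.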

Optimal cut-off value
In medical diagnosis, the optimal cut-off value (t) determines the disease status of a patient. Following Fluss et al. (2005), the optimal cut-off value is defined through the Youden index, which is the maximum difference between the CDFs of healthy and disease cases. The optimal threshold or cut-off value of the biomarker using the CSWMROC curve is given by (2.8).
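The Youden criterion can be illustrated with a simple grid search that maximizes the difference between the healthy and disease CDFs, which equals y(t) - x(t). This is a numerical sketch and not the authors' closed-form expression (2.8); the function names, the scale parameterization and the parameter values are assumptions.

```python
import math

def mixture_survival(t, p, scale1, scale2, shape):
    """P(T > t) for a two-component Weibull mixture with a common shape."""
    return (p * math.exp(-((t / scale1) ** shape))
            + (1.0 - p) * math.exp(-((t / scale2) ** shape)))

def youden_cutoff(healthy, disease, shape, t_max=40.0, steps=40000):
    """Grid search for the cut-off maximizing J(t) = F_X(t) - F_Y(t)."""
    best_t, best_j = 0.0, 0.0
    for k in range(steps + 1):
        t = t_max * k / steps
        # F_X(t) - F_Y(t) equals TPR - FPR when written with survival functions
        j = (mixture_survival(t, *disease, shape)
             - mixture_survival(t, *healthy, shape))
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j

# Hypothetical parameters: (weight, scale of component 1, scale of component 2)
healthy = (0.7, 1.0, 2.0)
disease = (0.7, 3.0, 6.0)
t_opt, j_opt = youden_cutoff(healthy, disease, shape=2.0)
print("optimal cut-off:", round(t_opt, 3), "Youden index:", round(j_opt, 3))
```

A grid search is used here because it needs no derivative of the mixture CDFs; a root finder applied to the equality of the two densities would be an alternative.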

Estimates of parameters of AUC of CSWMROC Curve using Method of Moments
The method of moments is one of the oldest and simplest methods of parameter estimation. The r-th sample moment of a mixture distribution is given by (3.1). The shape parameter is constant for both sub-populations, and the two scale parameters are those of the components of the Weibull mixture distribution. Putting r = 1, 2, 3, 4 in (3.2) gives the first four moment equations.
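The moment equations equate sample moments with their theoretical counterparts. The sketch below computes both for r = 1, 2, 3, 4, assuming the scale parameterization in which the r-th raw moment of a Weibull component is scale^r * Gamma(1 + r/shape). It only illustrates the two sides of the moment equations and does not solve them; the function names and parameter values are hypothetical.

```python
import math
import random

def theoretical_moment(r, p, scale1, scale2, shape):
    """r-th raw moment of a two-component constant shape Weibull mixture."""
    return math.gamma(1.0 + r / shape) * (p * scale1 ** r
                                          + (1.0 - p) * scale2 ** r)

def sample_mixture(n, p, scale1, scale2, shape, rng):
    """Draw n values: pick a component with probability p, then sample it."""
    return [rng.weibullvariate(scale1 if rng.random() < p else scale2, shape)
            for _ in range(n)]

def sample_moment(data, r):
    """r-th raw sample moment, (1/n) * sum of x_i^r."""
    return sum(x ** r for x in data) / len(data)

# Hypothetical parameters for illustration only
p, scale1, scale2, shape = 0.7, 1.0, 2.0, 2.0
rng = random.Random(2016)  # fixed seed so the run is repeatable
data = sample_mixture(20000, p, scale1, scale2, shape, rng)
for r in (1, 2, 3, 4):
    print(r, sample_moment(data, r),
          theoretical_moment(r, p, scale1, scale2, shape))
```

Solving the four equations for the weight, the shape and the two scales would require a numerical root finder, which is outside the scope of this sketch.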

Variance of AUC of CSWMROC Curve using delta method
The delta method gives the approximate variance of the AUC of the CSWMROC curve as (4.1). Applying the delta method gives (4.3), and on substituting (4.3) in (4.1) the variance of the AUC is expressed in terms of the variances of the parameter estimates. To determine the variances of the parameter estimates, the Fisher information matrix is used. On substituting (4.9) and (4.10) in (4.1), we get (4.11). Using V(AUC), one can easily find the confidence interval, the MSE and the test of significance for the AUC.
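For orientation, the general first-order delta method that underlies this derivation can be stated as follows. The parameter vector $\theta$, the covariance matrix $\Sigma$ and the information matrix $I(\theta)$ are generic notation assumed here; the specific expressions (4.1) to (4.11) of the paper are not reproduced.

```latex
V\big(\widehat{AUC}\big) \;\approx\;
  \sum_{i}\sum_{j}
  \frac{\partial AUC}{\partial \theta_i}\,
  \frac{\partial AUC}{\partial \theta_j}\,
  \mathrm{Cov}\big(\hat\theta_i, \hat\theta_j\big)
  \;=\; \nabla AUC(\theta)^{\top}\, \Sigma\, \nabla AUC(\theta),
\qquad \Sigma \approx I(\theta)^{-1}.
```

Taking $\Sigma$ as the inverse Fisher information is the usual large-sample approximation; the partial derivatives are evaluated at the parameter estimates.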
(i) The 100(1-α)% confidence interval of the AUC is given as

$$\widehat{AUC} \pm Z_{\alpha/2}\, SE(\widehat{AUC}),$$

where α is the level of significance, $Z_{\alpha/2}$ is the critical value of the confidence interval and SE is the standard error.
(ii) The Mean Square Error (MSE) is used to assess the quality of an estimator. It is defined as

$$MSE(\widehat{AUC}) = E\big[(\widehat{AUC} - AUC)^2\big].$$

The test statistic for the AUC involves N = m + n, where m is the sample size of healthy controls and n is the sample size of disease cases.
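The interval and the test can be sketched in a few lines. The function names and the numbers below are hypothetical, and the Z statistic is assumed to take the common form (estimate minus null value) divided by the standard error, which may differ from the exact statistic used by the authors.

```python
from statistics import NormalDist

def auc_confidence_interval(auc_hat, se, alpha=0.05):
    """Normal-approximation interval: estimate +/- Z_{alpha/2} * SE."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return auc_hat - z * se, auc_hat + z * se

def z_statistic(auc_hat, auc_null, se):
    """Assumed form of the test statistic for H0: AUC = auc_null."""
    return (auc_hat - auc_null) / se

def mse(variance, bias):
    """MSE written as variance plus squared bias."""
    return variance + bias ** 2

# Illustrative numbers only, not results from the paper
auc_hat, se, auc_null = 0.90, 0.02, 0.88
low, high = auc_confidence_interval(auc_hat, se)
print("95% CI:", (round(low, 4), round(high, 4)))
print("Z:", round(z_statistic(auc_hat, auc_null, se), 4))
```

The interval is not clipped to [0, 1]; for estimates close to 1 a transformation such as the logit is sometimes preferred, but that is a design choice beyond what the text describes.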

Simulation Studies
Random numbers are generated from the Weibull mixture distribution with fixed values of the shape and scale parameters of healthy controls and disease cases for the sample sizes N = 10, 20, 30, 100, 200 and 300. The sample sizes are equal for healthy controls and disease cases. The weights of healthy controls and disease cases are also taken as equal, i.e., p = 0.7. The shape parameter is the same for healthy controls and disease cases and is set to 2. All $\hat{A}$ values are greater than 0.88, so we reject the null hypothesis and conclude that the AUC is not equal to 0.88.

Conclusion
In this paper, we have proposed the CSWMROC model and found that the CSWMROC curve is monotonically increasing, concave and TPR asymmetric. The area under the CSWMROC curve, its variance and the optimal cut-off value of the biomarker using the CSWMROC curve are derived. The estimates of the parameters of the AUC are obtained by MOM. The MSE of the AUC, the confidence interval of the AUC and the test for the AUC are also discussed. From the simulation studies, it is concluded that the MOM estimates of the parameters of the AUC of the CSWMROC curve approach the population parameters as the sample size increases. When heterogeneity is found in the data and the Weibull mixture distribution fits the data well, one should use the Weibull mixture ROC model instead of the Bi-Weibull ROC model.

Programs (a) R-Command
The random numbers are generated by using the following command
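As a rough Python counterpart (not the authors' R command), the sketch below draws healthy and disease samples from two-component constant shape Weibull mixtures and computes the empirical AUC. The weight 0.7, shape 2 and group size 300 follow the simulation description; the scale values, the seed and the function names are hypothetical.

```python
import random

def sample_mixture(n, p, scale1, scale2, shape, rng):
    """Draw n values: pick a component with probability p, then sample it."""
    return [rng.weibullvariate(scale1 if rng.random() < p else scale2, shape)
            for _ in range(n)]

def empirical_auc(healthy, disease):
    """Share of (healthy, disease) pairs where the disease score is higher.

    Ties count one half, as in the Mann-Whitney statistic.
    """
    wins = 0.0
    for y in disease:
        for x in healthy:
            if y > x:
                wins += 1.0
            elif y == x:
                wins += 0.5
    return wins / (len(healthy) * len(disease))

rng = random.Random(12345)  # fixed seed so the run is repeatable
n_per_group = 300           # equal sample sizes for the two groups
healthy = sample_mixture(n_per_group, 0.7, 1.0, 2.0, 2.0, rng)
disease = sample_mixture(n_per_group, 0.7, 3.0, 6.0, 2.0, rng)
print("empirical AUC:", round(empirical_auc(healthy, disease), 4))
```

In R, the analogous building block would be `rweibull` combined with a Bernoulli draw for the component label.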

Let Y be another random variable, from disease cases, which follows the Constant Shape Weibull Mixture distribution.
(a) The CSWMROC curve remains unaltered if the test scores undergo a strictly increasing transformation.
(b) The CSWMROC curve is monotonically increasing.
(e) The slope of the CSWMROC curve at the cut-off value t is given by (2.6).
The component densities are the densities of the two sub-populations of the mixture distribution.
The r-th sample moment of the Constant Shape Weibull Mixture distribution is obtained from that of a general mixture distribution.
(iii) Consider the problem of testing the AUC of the CSWMROC curve.

Fig. 5.1 shows the CSWMROC curves for different sample sizes and fixed values of the parameters mentioned above.

Table 5.1: Estimates of parameters of AUC of CSWMROC Curve by MOM for different sample sizes
From the above table, it is observed that, as the sample size increases, the estimates of the CSWMROC model become closer to the parameters. Using the estimators in Table 5.1, one can see that the variance and standard error of the AUC decrease as the sample size increases, because both depend on the sample size. The Z values are used to test the AUC.