High Dimensional Logistic Regression Model using Adjusted Elastic Net Penalty

Reducing high dimensional binary classification data with penalized logistic regression is challenging when the explanatory variables are correlated. Because it estimates the coefficients and performs variable selection simultaneously, the elastic net penalty has been successfully applied in high dimensional binary classification. However, the elastic net has two major limitations. First, it does not encourage grouping effects when the correlation is not high. Second, it is not consistent in variable selection. To address these issues, an adjusted elastic net (AEN) and its adaptive version, the adaptive adjusted elastic net (AAEN), are proposed to take into account small and medium correlations between explanatory variables and to provide consistent variable selection at the same time. Our simulation and real data results show that AEN and AAEN have an advantage for small, medium, and extremely high correlations among variables, in terms of both prediction and variable selection consistency, compared with other existing penalized methods.


Introduction
With the advancement of technology, massive amounts of data with increasing dimensions have been generated in many areas such as genetics, medicine, economics, and the social sciences. The data expand in two dimensions: the number of variables and the number of observations. "High dimensional data" refers to the situation where the number of variables measured is greater than the number of observations in the data. This differs from traditional datasets for statistical analysis, where we have many observations on a few variables. Such high dimensional data pose new challenges to statistical analysis because many conventional statistical methods do not automatically apply to these datasets. For example, the curse of dimensionality makes many classical regression models, such as logistic regression, ineffective: the statistical issues associated with modeling high dimensional data include model overfitting, estimation instability, and computational difficulty (Pourahmadi, 2013).
How to reduce the dimensionality has been an important research question in statistical applications. One way to handle high dimensional data is to perform data reduction. To this end, various penalized methods have been proposed, beginning with the ridge penalty (Hoerl & Kennard, 1970), which estimates the regression coefficients through an $\ell_2$-norm penalty. It is well known that ridge regression shrinks the coefficients of correlated predictor variables toward each other, allowing them to borrow strength from each other (Friedman, Hastie, & Tibshirani, 2010). The least absolute shrinkage and selection operator (LASSO) was proposed by Tibshirani (1996) to estimate the regression coefficients through an $\ell_1$-norm penalty. Zou and Hastie (2005) proposed the elastic net penalty, which combines the LASSO and ridge regression penalties in order to overcome the drawbacks of using either one on its own.
Usually, in high dimensional data the explanatory variables are correlated. If there is a group of highly correlated variables, the LASSO tends to select only one variable from this group, essentially at random, and drop the rest, whereas the elastic net will select the whole group of highly correlated explanatory variables (Zou & Hastie, 2005; Zhou, 2013). Analogously, Bondell and Reich (2008) proposed a penalty called OSCAR to encourage selection of a group of highly correlated explanatory variables. Elastic net often performs better than LASSO in terms of prediction error when there is correlation among variables, and OSCAR performs comparably to elastic net (Zeng & Xie, 2011). Tutz and Ulbricht (2009) proposed a correlation-based penalty to deal with grouping effects; this penalty performs only variable shrinkage rather than variable selection. The elastic net penalty lacks consistent variable selection (the oracle property), even though it outperforms LASSO. Zou and Zhang (2009) proposed the adaptive elastic net to handle grouping effects and enjoy the oracle property simultaneously. El Anbari and Mkhadri (2014) showed through experimental studies that elastic net seems to be slightly less reliable if the correlation between explanatory variables is not so extreme (i.e., below 0.95 in absolute value).
In this paper, an adjusted elastic net (AEN) and its adaptive version, the adaptive adjusted elastic net (AAEN), are proposed to take into account small and medium correlations between explanatory variables and to provide consistent variable selection at the same time. The remainder of this paper is organized as follows. Section 2 covers the penalized logistic regression methods. The AEN and AAEN are described in Section 3. Sections 4 and 5 are devoted to the simulation studies and their results, and Section 6 covers the real data analysis. We end the paper with a conclusion in Section 7.

Penalized Logistic Regression Model
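Logistic regression is a statistical method for modeling a binary classification problem. The regression function has a nonlinear relation with the linear combination of the explanatory variables. In binary classification, the response variable of the logistic regression takes two values, either 1 for the positive class or 0 for the negative class. Let $y_i \in \{0,1\}$, $i = 1, \dots, n$, be the response of the $i$th observation, and let $x_i = (x_{i1}, \dots, x_{ip})^T$ be the corresponding vector of $p$ explanatory variables. The probability of the positive class is modeled as
$$\pi_i = P(y_i = 1 \mid x_i) = \frac{\exp(\beta_0 + x_i^T \beta)}{1 + \exp(\beta_0 + x_i^T \beta)},$$
and the log-likelihood function is
$$\ell(\beta_0, \beta) = \sum_{i=1}^{n} \left\{ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right\}. \quad (1)$$
The predicted class is then obtained by $\hat{y}_i = I(\hat{\pi}_i > 0.5)$, where $I(\cdot)$ is an indicator function.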
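As a minimal sketch of the logistic model described above, the following Python functions compute the positive-class probability, the indicator-based class prediction, and the Bernoulli log-likelihood. The function names, the 0.5 cut-off, and the toy inputs are illustrative assumptions and are not taken from the paper.

```python
import math

def predict_proba(x, beta0, beta):
    """Probability of the positive class for one observation x."""
    eta = beta0 + sum(b * v for b, v in zip(beta, x))  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

def predict_class(x, beta0, beta, cutoff=0.5):
    """Indicator rule: 1 if the probability exceeds the cut-off (assumed 0.5)."""
    return 1 if predict_proba(x, beta0, beta) > cutoff else 0

def log_likelihood(X, y, beta0, beta):
    """Bernoulli log-likelihood summed over all observations."""
    total = 0.0
    for xi, yi in zip(X, y):
        p = predict_proba(xi, beta0, beta)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total
```

For instance, `predict_class([2.0, 0.0], 0.0, [1.0, 1.0])` assigns the observation to the positive class because its linear predictor is positive.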
Penalized logistic regression (PLR) adds a nonnegative penalty term to Eq. (1), so that the size of the coefficients of the explanatory variables in high dimensions can be controlled. Several penalty terms have been discussed in the literature (Li, Jia, & Zhao, 2013; Tibshirani, 1996; Zhenqiu et al., 2007). The $\ell_1$-norm penalty, proposed by Tibshirani (1996), is one of the most popular penalization terms. It performs variable selection and estimation simultaneously by constraining the log-likelihood function of the coefficients. Thus, the PLR is defined as:
$$\mathrm{PLR} = -\ell(\beta_0, \beta) + \lambda P(\beta). \quad (3)$$
The estimate of the vector $\beta$ is obtained by minimizing Eq. (3), where $P(\beta)$ is the penalty term that regularizes the estimates. The penalty term depends on the positive tuning parameter $\lambda$, which controls the tradeoff between fitting the data to the model and the effect of the penalization. In other words, it controls the amount of shrinkage. For $\lambda = 0$, we obtain the maximum likelihood estimation (MLE) solution. In contrast, for large values of $\lambda$ the influence of the penalization term on the coefficient estimates increases. Choosing the tuning parameter is an important part of model fitting. If the focus is on classification, the tuning parameter should strike the right balance between bias and variance to minimize the misclassification error. Without loss of generality, it is assumed that the explanatory variables are standardized, $\sum_{i=1}^{n} x_{ij} = 0$ and $n^{-1} \sum_{i=1}^{n} x_{ij}^2 = 1$ for $j = 1, \dots, p$. As a result, the intercept $\beta_0$ is not penalized.
The estimate of the vector $\beta$ using the LASSO ($\ell_1$-norm penalization) is defined as:
$$\hat{\beta}_{\mathrm{LASSO}} = \arg\min_{\beta} \left[ -\ell(\beta_0, \beta) + \lambda \sum_{j=1}^{p} |\beta_j| \right], \quad (5)$$
where $\lambda$ is a tuning parameter. It reduces to the MLE when $\lambda = 0$. On the other hand, as $\lambda \to \infty$, the penalization term forces all the coefficients of the explanatory variables to zero. In practice, the value of $\lambda$ is often chosen by a cross-validation procedure. Traditional numerical approaches to solving Eq. (5) rely on MLE-type procedures such as the Newton-Raphson algorithm. However, the computation of these methods is prohibitive when the number of explanatory variables is large (Zhu & Hastie, 2004). Equation (5) can be solved efficiently by the coordinate descent algorithm (Friedman et al., 2010).
The elastic net estimator (Zou & Hastie, 2005) is given by
$$\hat{\beta}_{\mathrm{EN}} = \arg\min_{\beta} \left[ -\ell(\beta_0, \beta) + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \right]. \quad (6)$$
As we observe from Eq. (6), the elastic net estimator depends on two non-negative tuning parameters, $\lambda_1$ and $\lambda_2$, which lead to the penalized logistic regression solution. However, elastic net performs well only when the pairwise correlations between variables are very high. El Anbari and Mkhadri (2014) stated that if the absolute correlation between genes is less than 0.95, elastic net may be slightly less reliable. Moreover, elastic net does not take into account the correlation structure among genes (Bühlmann, Rütimann, van de Geer, & Zhang, 2013). Additionally, Zou and Zhang (2009) pointed out that the elastic net fails to achieve the oracle property, although it retains the grouping effect. As a result, the adaptive elastic net was introduced by Zou and Zhang (2009) and Ghosh (2011); it combines the $\ell_2$-norm penalization with the adaptive LASSO.
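The penalized criterion discussed above can be sketched in a few lines of Python. The snippet evaluates the negative log-likelihood plus an $\ell_1$ and an $\ell_2$ term, and includes the soft-thresholding operator that coordinate descent updates typically apply for the $\ell_1$ part. The function names and the parameterization with `lam1` and `lam2` are assumptions made for illustration; this is not the authors' implementation.

```python
import math

def neg_log_likelihood(X, y, beta0, beta):
    """Negative Bernoulli log-likelihood of a logistic model."""
    nll = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(b * v for b, v in zip(beta, xi))
        # log(1 + exp(eta)) - y * eta equals -[y log(pi) + (1 - y) log(1 - pi)]
        nll += math.log1p(math.exp(eta)) - yi * eta
    return nll

def elastic_net_objective(X, y, beta0, beta, lam1, lam2):
    """Penalized criterion; the intercept beta0 is left unpenalized."""
    l1 = sum(abs(b) for b in beta)   # LASSO part
    l2 = sum(b * b for b in beta)    # ridge part
    return neg_log_likelihood(X, y, beta0, beta) + lam1 * l1 + lam2 * l2

def soft_threshold(z, lam):
    """Shrinks z toward zero by lam and sets it to exactly zero inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0
```

Setting `lam2 = 0` gives the LASSO criterion, and setting both parameters to zero gives the unpenalized negative log-likelihood.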

Adjusted Elastic Net Penalty
In this section, we present our proposed adjusted methods, AEN and AAEN, for the logistic regression model. The main idea behind AEN is to incorporate information about the empirical correlation of the data matrix into the $\ell_2$-norm term, which the elastic net does not do. Suppose, without loss of generality, that the explanatory variables are scaled. The AEN estimator is defined in Eq. (7), where $\lambda_1$ and $\lambda_2$ are non-negative tuning parameters and $r_{j,p}$ is the correlation between two distinct explanatory variables $j$ and $p$. The correlation-based quantity in the last term of Eq. (7) helps make AEN reliable when the correlation between explanatory variables is not so extreme.
The last term of Eq. (7) is greater than zero for any nonzero vector $\beta$. Therefore, the matrix of this quadratic form admits a Cholesky decomposition. After suitable data augmentation, Eq. (7) is equivalent to a LASSO problem. The AEN was solved using coordinate descent optimization (Friedman et al., 2010), which is a computationally efficient method for solving this type of convex optimization problem. The optimal AEN model was found by a grid search over the parameters $\lambda_1$ and $\lambda_2$.
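The role of the Cholesky decomposition can be illustrated with a small sketch. For any symmetric positive definite matrix $Q = LL^T$, the quadratic penalty $\lambda_2 \beta^T Q \beta$ equals the squared norm of $\sqrt{\lambda_2} L^T \beta$, so in the squared-error setting it can be absorbed by appending the rows $\sqrt{\lambda_2} L^T$ to the design matrix. The matrix `Q` below is a hypothetical example and is not the AEN penalty matrix, whose exact form is not reproduced here.

```python
import math

def cholesky(Q):
    """Lower-triangular L with Q = L L^T, for a symmetric positive definite Q."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(Q[i][i] - s)
            else:
                L[i][j] = (Q[i][j] - s) / L[j][j]
    return L

def quadratic_form(Q, beta):
    """beta^T Q beta."""
    n = len(beta)
    return sum(beta[i] * Q[i][j] * beta[j] for i in range(n) for j in range(n))

def augmented_rows(L, lam2):
    """Rows of sqrt(lam2) * L^T, the block appended to the design matrix."""
    n = len(L)
    return [[math.sqrt(lam2) * L[j][i] for j in range(n)] for i in range(n)]

# Hypothetical 2 x 2 penalty matrix and coefficient vector
Q = [[2.0, 0.5], [0.5, 1.0]]
beta = [1.0, -2.0]
A = augmented_rows(cholesky(Q), lam2=0.3)
# The squared norm of A @ beta reproduces lam2 * beta^T Q beta
penalty = sum(sum(a * b for a, b in zip(row, beta)) ** 2 for row in A)
```

With the augmented rows in place, only the $\ell_1$ term remains as an explicit penalty, which is why a LASSO solver can then be applied.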
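Furthermore, the adaptive version of AEN, AAEN, is defined by Eq. (8), in which the $\ell_1$ term carries adaptive weights controlled by a positive exponent $\gamma > 0$. For simplicity, $\gamma = 1$ was used for both the simulation study and the real data application.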
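A minimal sketch of adaptive weighting follows, using the form that is conventional for the adaptive LASSO, in which each weight is the inverse of the absolute value of an initial coefficient estimate raised to a positive power. The symbol `gamma`, the guard `eps`, and the function names are assumptions; the exact weight definition used for AAEN is not reproduced here.

```python
def adaptive_weights(initial_beta, gamma=1.0, eps=1e-8):
    """Weights 1 / (|beta_j| + eps)^gamma; eps guards against division by zero."""
    return [1.0 / (abs(b) + eps) ** gamma for b in initial_beta]

def weighted_l1(beta, weights, lam1):
    """Adaptive l1 penalty: lam1 * sum_j w_j * |beta_j|."""
    return lam1 * sum(w * abs(b) for w, b in zip(weights, beta))
```

Coefficients with a small initial estimate receive a large weight and are penalized more heavily, which is the mechanism through which adaptive penalties aim for consistent variable selection.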

Simulation Study
In this section, simulation studies are used to investigate the performance of the proposed AEN and AAEN. Furthermore, we compare AEN and AAEN with the elastic net. For every simulation case and in each replication, we generate training, validation, and testing data. The training data were used for model fitting, the validation data to determine the tuning parameters, and the testing data to evaluate the penalization methods. For each case, the numbers of observations in the corresponding data sets are denoted by training/validation/testing. Based on the simulated data, we used three metrics to evaluate all penalization methods studied in this paper: the misclassification error on the test data ($ME_t$); hits, the number of correctly identified true variables; and false positives (FP), the number of zero variables wrongly selected as true variables.
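The three evaluation metrics can be sketched as follows. The function names, the use of a proportion for the misclassification error, and the representation of the true variables as a set of indices are illustrative assumptions.

```python
def misclassification_error(y_true, y_pred):
    """Proportion of observations whose predicted class differs from the truth."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def hits_and_false_positives(beta_hat, true_support, tol=0.0):
    """Hits: selected variables that are truly non-zero.
    FP: selected variables whose true coefficient is zero."""
    selected = {j for j, b in enumerate(beta_hat) if abs(b) > tol}
    hits = len(selected & true_support)
    false_positives = len(selected - true_support)
    return hits, false_positives
```

For example, with estimated coefficients `[0.3, 0.0, -1.2, 0.1]` and true non-zero indices `{0, 1}`, the function returns one hit and two false positives.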
Since we investigate penalization methods with both variable selection and grouping properties, we use a simulation scenario with different values of the correlation and different numbers of training, validation, and testing observations. Simulation scenario: in this setting, we generate data sets with sample sizes 50/50/100 and 1000 explanatory variables. Four cases are studied. The grouping effects were generated from normally distributed variables, including $N(0,1)$ terms. Case C: similar to case A, but with the variance of a zero-mean normal term set so that the correlations within each group equal 0.5.
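One common way to obtain a prescribed within-group correlation is a latent-factor construction, sketched below under stated assumptions: each variable in a group is a shared standard normal factor plus independent normal noise, so the within-group correlation is 1 / (1 + noise variance). This is not necessarily the generating scheme used in the paper, whose details are not reproduced here; the function names and the seed are hypothetical.

```python
import math
import random

def implied_correlation(noise_var):
    """Within-group correlation of x_j = z + e_j with Var(z) = 1, Var(e_j) = noise_var."""
    return 1.0 / (1.0 + noise_var)

def simulate_group(n, group_size, noise_var, rng):
    """n observations of one group of correlated explanatory variables."""
    sd = math.sqrt(noise_var)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # shared latent factor
        rows.append([z + rng.gauss(0.0, sd) for _ in range(group_size)])
    return rows

def pearson(a, b):
    """Sample Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(0)  # arbitrary seed for reproducibility
X = simulate_group(20000, 2, noise_var=1.0, rng=rng)
empirical = pearson([row[0] for row in X], [row[1] for row in X])
```

Under this construction a noise variance of 1 implies a correlation of 0.5, and smaller noise variances push the correlation toward 1.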

Simulation Results
To examine the performance of the AEN and AAEN penalties, we compare them with the elastic net. For the tuning parameters of elastic net, AEN, and AAEN, a prior value of $\lambda_2$ is required to transform the original training data set into the new augmented training data set. The misclassification error ($ME_t$) is computed as the criterion of evaluation. Figure 1 displays the corresponding boxplots of $ME_t$ for the three methods in the four cases. It is clearly seen that AEN and AAEN have less variability than elastic net, and that AEN and AAEN perform similarly.
Table 1 reveals that the AAEN method produces the smallest median $ME_t$ and standard deviation among all methods in all cases. For example, in case A the median $ME_t$ of AAEN is 5.782 with a standard deviation of 1.561, which is smaller than 5.824 (1.766) and 6.348 (2.010) for the AEN and elastic net methods, respectively. Furthermore, the reduction of $ME_t$ is usually substantial compared to elastic net. For example, the reduction in case A, case B, case C, and case D is 0.67%, 1.60%, 5.01%, and 6.60%, respectively. Moreover, in case A there is high collinearity among variables. Elastic net would be expected to perform better than AEN in this case because it is designed to deal with extremely high correlations. In addition, our method performs well in terms of $ME_t$ when the correlation is small or medium. Overall, the simulation results show that elastic net ranks last among the three methods.
For variable selection accuracy, a penalization method should include all important (non-zero) variables. Hits and FP were used to measure the performance of AAEN, AEN, and elastic net in terms of selecting the non-zero variables. From Table 1, both AAEN and AEN succeed in selecting the true non-zero variables in most cases in terms of hits. For example, AEN selects all nine non-zero variables. Moreover, as the correlation varies from small and medium to extremely high, elastic net selects fewer non-zero variables than AAEN and AEN. Such a result can be expected because elastic net is limited by biased selection. In terms of FP, AAEN and AEN usually select fewer irrelevant variables than elastic net. In summary, our simulation results show that AAEN and AEN perform better than elastic net in terms of $ME_t$, hits, and FP for small, medium, and extremely high correlations, and that they have a greater advantage in variable selection with grouping effects in the logistic regression model.

Real Data Results
To evaluate our proposed method in the field of binary classification, a publicly available, well-known binary cancer classification dataset was used, namely the prostate cancer dataset published by Singh et al. (2002). It consists of 102 samples, 52 prostate tumor samples and 50 non-tumor tissue samples, each with 12600 genes. A subset of 5966 genes was used in the classification. To enable a fair comparison, the dataset was randomly partitioned into a training dataset comprising 70% of the samples and a test dataset comprising the remaining 30%. The partition was repeated 50 times. To obtain the best value of the pair $(\lambda_1, \lambda_2)$, 10-fold cross-validation (CV) was performed on the training dataset. All analyses were conducted in R using the glmnet package.
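The resampling design described above can be sketched as follows. The paper's analysis was carried out in R with glmnet; this Python snippet only illustrates the index bookkeeping for 50 random 70/30 partitions and 10 cross-validation folds. The rounding rule for the training size, the seeds, and the function names are assumptions.

```python
import random

def repeated_splits(n_samples, train_fraction=0.7, n_repeats=50, seed=0):
    """Random train/test index partitions, repeated n_repeats times."""
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)  # assumed rounding rule
    splits = []
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        splits.append((sorted(idx[:n_train]), sorted(idx[n_train:])))
    return splits

def k_fold_indices(indices, k=10, seed=0):
    """Shuffle the training indices and deal them into k roughly equal folds."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

splits = repeated_splits(102)
train_idx, test_idx = splits[0]
folds = k_fold_indices(train_idx, k=10)
```

With 102 samples and this rounding rule, each partition holds 71 training and 31 test samples, and the tuning pair would be chosen by cross-validation within the training indices only.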
Table 2 shows the median number of explanatory variables selected by each of AAEN, AEN, and elastic net on the training data set, together with the corresponding median $ME_t$. It can be seen that AAEN performs best in terms of prediction error: its $ME_t$ is approximately 3.73% lower than that of AEN and 6.39% lower than that of elastic net. Moreover, AAEN selects fewer explanatory variables than the other two methods.

Conclusion
An adjusted elastic net penalty was proposed and applied to the logistic regression model. AAEN and AEN were compared with the elastic net using simulation studies and a real data application. Both the simulation and real data results show that AAEN and AEN outperform the elastic net in terms of $ME_t$ on the test data and variable selection accuracy. We conclude that AAEN and AEN are more reliable for grouping effects across a broader range of correlations between variables when applying the penalized logistic regression model.

 2 
is required to transform the original training data set to the new augmented training data set.A sequence of values for is miss-classification error for the training set ( (Fan & Li, 2001)& Hastie, 2008).The LASSO has an advantage in that it is computationally feasible in high dimensional classification data.On the other hand, the LASSO has three main drawbacks.First of all, if pn  (i.e. the explanatory variables are greater than the number of samples), the LASSO selects at most n variables because of the nature of the convex optimization problem.In addition, the LASSO cannot handle the effect of grouping.When the pairwise correlations among a group of explanatory variables are very high, then the LASSO tends to select only one variable from the whole group and does not take into account which one is selected (Zeny, Xiaojian, Sanjeena, & Paul, 2012).Lastly, the LASSO lacks the oracle properties, as stated in Fan and Li(Fan & Li, 2001).Elastic net is a penalization method for explanatory variables selection, which is introduced by Zou and Hastie (2005) to deal with the first two drawbacks of LASSO.Elastic net tries to merge the 2 -norm and the 1 -norm penalizations, by using ridge regression penalty to deal with high correlation problem while taking advantage of LASSO penalization in variable selection property.The PLR using elastic net penalty is defined by

Table 1
In addition, the median numbers of hits and FP are reported. In each case, bold font indicates the best method in terms of $ME_t$, Std. Dev., hits, and FP.