Performance Analysis of Mixed Logit Models for Discrete Choice Models

The Mixed Logit model (MXL) is derived from the Multinomial Logit model (MNL) for discrete (nominal) data. It removes limitations of the MNL, particularly in estimating the correlation among responses. In the MNL the probability equations have a closed form, whereas in the MXL they do not. Consequently, computing the probability of each alternative is simpler in the MNL, while the MXL requires numerical methods for estimation. In this study, we investigate the performance of maximum likelihood estimation (MLE) in the MXL and MNL in two cases: low and high correlation among responses. Performance is measured by the difference between the actual and estimated values. The simulation study and real cases show that the MXL is more accurate than the MNL and that it can also estimate the correlation among responses. The study concludes that the MXL is recommended when there is a high correlation among responses.


Introduction
Discrete choice models (DCM) describe decision makers' choices among alternatives. The decision makers can be people, households, firms, or any other decision-making unit, and the alternatives might represent competing products, courses of action, or any other options or items over which choices must be made. Each model has advantages and disadvantages. The advantage of the MNL is its simpler form of equation for estimating the model. Its weakness is that the model relies on the independence of irrelevant alternatives (IIA) assumption. The multinomial probit model (MNP) was developed to overcome this weakness. However, the MNP has an open-form equation, so its estimation requires simulation and iterative methods such as Newton-Raphson.
The MXL has been developed to overcome the limitations of both the MNL and the MNP with respect to the IIA assumption and the computational burden. The MXL can be derived under a variety of behavioral specifications, and each derivation provides a particular interpretation. The MXL is defined on the basis of the functional form of its choice probabilities: any behavioral specification whose derived choice probabilities take this form is called an MXL (Train, 2003). Given its wide and popular applications, the MXL has been described as the model of the future (Bhat, 1998; Bolduc, 1999; Lovreglio et al., 2016; Train, 2016), and Train (2016) proposed a flexible procedure to represent the distribution of the random parameters. However, this raises the question of how accurate the MXL is compared with the MNL. In this paper, the ability of the MXL to detect correlation among alternatives in utility models is studied. The model was constructed and applied to simulated data at several levels of dependency (correlation) among alternatives, and R 3.1.1 was used for the computations. Parameters in the MXL and MNL are estimated using MLE, which has good large-sample properties, notably asymptotic efficiency; the asymptotic bias of simulated maximum likelihood estimation has also been studied (Lee, 1995; Horowitz and Savin, 2001; Lee, 1992). Based on these properties, estimator accuracy can be measured by the distance between the estimator and the true parameter value at various levels of correlation.
The aim of this study is to evaluate the effect of correlation among choices on the MXL and MNL models. The paper presents the specification of the MNL and MXL, parameter estimation using MLE, and the accuracy of the parameter estimates in the MNL and MXL using simulated data. Examples of applying the MNL and MXL to real cases are also described.

Utility Model
To fit in a discrete choice framework, the set of alternatives, called the choice set, needs to exhibit three characteristics. First, the alternatives must be mutually exclusive from the decision maker's perspective. The decision maker chooses only one alternative from the choice set. Second, the choice set must be exhaustive, in that all possible alternatives are included. The decision maker necessarily chooses one of the alternatives. Third, the number of alternatives must be finite. A decision maker (respondent), denoted $i$, faces a choice among $J$ alternatives and obtains a certain level of utility (or profit) from each alternative $j$. $U_{ij}$, for $j = 1, \ldots, J$, is the utility that respondent $i$ obtains from alternative $j$; its true value is unknown to the researcher. The decision maker chooses the alternative that provides the greatest utility. Although the researcher does not observe the respondent's utility for each option, the researcher does observe the attributes of each alternative, denoted $Z_{ij}$, and the attributes of the respondent, denoted $X_i$.
A function that relates these observed factors to the decision maker's utility is denoted $V_{ij}$ and is often called representative utility,

$$V_{ij} = \alpha_j + \beta_j X_i + \gamma Z_{ij} \tag{1}$$

where $i = 1, \ldots, n$ and $j = 1, \ldots, J$. Equation (1) is the model constructed and reported by Bolduc (1999). Because the value of $U_{ij}$ is unknown to the researcher, utility is decomposed as

$$U_{ij} = V_{ij} + \varepsilon_{ij} \tag{2}$$

where $V_{ij}$ is the observed factor and $\varepsilon_{ij}$ is the unobserved factor in utility. The vector $\varepsilon_i = (\varepsilon_{i1}, \ldots, \varepsilon_{iJ})'$ is a random variable with density $f(\varepsilon_i)$. The density $f(\varepsilon_i)$ is the distribution of the unobserved portion of utility within the population of people who face the same observed portion of utility. Different choice models are derived under different specifications of the density of unobserved factors, $f(\varepsilon_i)$.

Multinomial Logit Model (MNL)
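The MNL model is derived under the assumption that $\varepsilon_{ij}$ is independent and identically distributed (iid) extreme value for all $j$. The type I extreme value (Gumbel) density is

$$f(\varepsilon_{ij}) = e^{-\varepsilon_{ij}} \exp\left(-e^{-\varepsilon_{ij}}\right) \tag{3}$$

Under this assumption, the probability that respondent $i$ chooses alternative $k$ is

$$P_{ik} = \frac{\exp(V_{ik})}{\sum_{j=1}^{J} \exp(V_{ij})} \tag{5}$$

This formula is called the logit probability (Train, 1998). The parameters $(\alpha, \beta, \gamma)$ can be estimated using MLE.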
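The closed-form logit probability can be illustrated with a short Python sketch. It is only an illustration: the names `alpha`, `beta` and `gamma` are labels assumed here for the intercepts, the coefficients of the respondent attribute and the coefficient of the alternative attribute, and the attribute values are set to zero purely for demonstration.

```python
import math

def representative_utility(alpha, beta, gamma, x, z):
    """V_j = alpha_j + beta_j * x + gamma * z_j for each alternative j."""
    return [a + b * x + gamma * zj for a, b, zj in zip(alpha, beta, z)]

def logit_probabilities(v):
    """Closed-form logit probabilities exp(V_j) / sum_k exp(V_k)."""
    m = max(v)  # subtract the maximum for numerical stability
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

# Illustrative values: three alternatives, the third used as baseline.
alpha = [-1.0, 1.0, 0.0]
beta = [0.5, -0.5, 0.0]
gamma = 1.0

# Hypothetical respondent with x = 0 and all alternative attributes equal to 0.
v = representative_utility(alpha, beta, gamma, x=0.0, z=[0.0, 0.0, 0.0])
probs = logit_probabilities(v)
print(probs)
```

Because the probabilities are available in closed form, no simulation or numerical integration is needed at this step.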

Mixed Logit Model (MXL)
Mixed logit assumes that the unobserved portions of utility are a mixture of an independent and identically distributed (iid) extreme value term and another multivariate distribution selected by the researcher. This general specification allows the MXL to avoid imposing the IIA property on the choice probabilities. Further, the MXL is a flexible tool for examining heterogeneity in respondent behavior through random coefficient specifications. Mixed logit is a very flexible model that can approximate any random utility model (McFadden and Train, 2000). In Equation (2), if a random term $\eta_{ij}$ with density $f(\eta)$ is added, the utility model can be written in the following form:

$$U_{ij} = V_{ij} + \eta_{ij} + \varepsilon_{ij} \tag{6}$$

Generally, a normal distribution is assumed for $f(\eta)$, with $\eta_{ij} \sim N(0, 1)$ and $\eta_i = (\eta_{i1}, \ldots, \eta_{iJ})' \sim N(\mathbf{0}, \Sigma)$.
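It is assumed that $\varepsilon_{ij}$ has an extreme value distribution and is independent of $\eta_i$. The MXL probability is the integral of the standard logit probability over the density of $\eta$. The probability that the $i$-th respondent chooses the $k$-th alternative can be formulated as

$$P_{ik} = \int L_{ik}(\eta_i)\, f(\eta_i)\, d\eta_i \tag{7}$$

where $L_{ik}(\eta_i)$ is the logit probability, which can be written as

$$L_{ik}(\eta_i) = \frac{\exp(V_{ik} + \eta_{ik})}{\sum_{j=1}^{J} \exp(V_{ij} + \eta_{ij})} \tag{8}$$

The probability in the MXL is a weighted mean of the logit probability, with the density $f(\eta)$ as the weighting function. The MXL is thus a mixture of the logit function and the density $f(\eta)$, and the value of the probability in Equation (7) can be approximated by simulation. The steps of the simulation are as follows (Train, 2003):
a. Draw a value of $\eta$ from the density $f(\eta)$ and label it $\eta^{(r)}$, with $r = 1$ for the first draw.
b. Calculate the logit probability $L_{ik}(\eta^{(r)})$ from Equation (8).
c. Repeat steps a and b $R$ times and average the results:

$$\tilde{P}_{ik} = \frac{1}{R} \sum_{r=1}^{R} L_{ik}(\eta^{(r)}) \tag{9}$$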
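The simulation steps above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not taken from the paper: the mixing terms of the first two alternatives are drawn as standard normals with correlation `rho`, the mixing term of the third alternative is fixed at zero, and the function names, the number of draws and the seed are all hypothetical.

```python
import math
import random

def logit_kernel(v, eta, k):
    """Logit probability of alternative k given one draw of the mixing terms."""
    u = [vj + ej for vj, ej in zip(v, eta)]
    m = max(u)  # subtract the maximum for numerical stability
    expu = [math.exp(uj - m) for uj in u]
    return expu[k] / sum(expu)

def draw_eta(rho, rng):
    """Assumed mixing distribution: eta_1 and eta_2 are standard normal
    with correlation rho; eta_3 is fixed at 0 (baseline alternative)."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return [z1, rho * z1 + math.sqrt(1.0 - rho ** 2) * z2, 0.0]

def simulated_probability(v, k, rho, n_draws=2000, seed=0):
    """Average the logit kernel over n_draws draws (steps a to c)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += logit_kernel(v, draw_eta(rho, rng), k)
    return total / n_draws

v = [-1.0, 1.0, 0.0]  # illustrative representative utilities
sim_probs = [simulated_probability(v, k, rho=0.5) for k in range(3)]
print(sim_probs)
```

Using the same seed for every alternative means the same draws are reused, so the simulated probabilities sum to one across alternatives.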

Maximum Likelihood Estimation (MLE) on Multinomial Distribution
Let $Y_1, Y_2, \ldots, Y_n$ be random variables with joint density $f(y_1, \ldots, y_n \mid \theta)$, which depends on the parameters $\theta = (\theta_1, \ldots, \theta_p)$. As the $Y_i$ are independent of each other, we have

$$f(y_1, \ldots, y_n \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta)$$

The likelihood function, denoted $L(\theta \mid y)$, is this density regarded as a function of the parameters of the statistical model given the data:

$$L(\theta \mid y) = \prod_{i=1}^{n} f(y_i \mid \theta)$$

Let $\Theta$ be the set of possible values of the parameter vector $\theta$, called the parameter space. The MLE of $\theta$, denoted $\hat{\theta}$, is the value in $\Theta$ that maximizes the likelihood function $L(\theta \mid y)$ for the data $y$ (Greene, 2005).
If $Y_1, Y_2, \ldots, Y_n$ are random samples with multinomial density, then $f(y_i \mid P_{i1}, \ldots, P_{iJ}) = P_{i1}^{y_{i1}} \cdots P_{iJ}^{y_{iJ}}$ for $i = 1, \ldots, n$. Here $P_{ij}$ is the probability that the $i$-th decision maker chooses the $j$-th alternative, as in Equations (5) and (7), which depend on the parameter $\theta$. Therefore, $P_{ij} = P_{ij}(\theta)$. The likelihood function for the parameter $\theta$ can be constructed as

$$L(\theta) = \prod_{i=1}^{n} \prod_{j=1}^{J} P_{ij}(\theta)^{y_{ij}}$$

The log-likelihood function is

$$\log L(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \log P_{ij}(\theta) \tag{14}$$

The MLE is the value of $\theta$ that maximizes $\log L(\theta)$. In the MXL, $P_{ij}$ is a double integral that can be computed by simulation. Let $\tilde{P}_{ij}$ be the value of $P_{ij}$ calculated by the simulation in Equation (9). The simulated log-likelihood function is obtained by substituting $\tilde{P}_{ij}$ into the log-likelihood function in Equation (14):

$$SLL(\theta) = \sum_{i=1}^{n} \sum_{j=1}^{J} y_{ij} \log \tilde{P}_{ij}(\theta)$$

The estimator $\hat{\theta}$ is the value that maximizes $SLL(\theta)$.
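As a small check of the log-likelihood above, the following Python sketch evaluates it for an intercept-only MNL, a special case assumed here only because its maximizer is known in closed form (the fitted probabilities equal the observed shares). The choice counts are hypothetical, not data from the paper.

```python
import math

def logit_probabilities(v):
    """Closed-form logit probabilities."""
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]

def log_likelihood(alpha, choices):
    """Sum of log P_j over the chosen alternatives.

    alpha   : intercepts, with the last alternative fixed at 0 (baseline)
    choices : index of the chosen alternative for each respondent
    """
    probs = logit_probabilities(alpha)
    return sum(math.log(probs[j]) for j in choices)

# Hypothetical sample: 100 respondents choosing among three alternatives.
choices = [0] * 10 + [1] * 60 + [2] * 30

# Intercept-only case: the maximizer reproduces the observed shares,
# so alpha_j = log(share_j / share_baseline).
alpha_hat = [math.log(10 / 30), math.log(60 / 30), 0.0]

ll_hat = log_likelihood(alpha_hat, choices)
ll_zero = log_likelihood([0.0, 0.0, 0.0], choices)
print(ll_hat, ll_zero)
```

For the MXL, the same sum would be evaluated with simulated probabilities in place of the closed-form ones, and the maximization would be carried out numerically.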

Simulation Studies
In order to detect the influence of correlation among alternatives, multinomial data were generated for $J = 3$. Three alternatives were chosen because they suffice to represent the correlation structure among alternatives for general $J$: any two alternatives $j$ and $j'$ are either correlated or uncorrelated. Therefore, conclusions from the simulation with $J = 3$ can be generalized to any value of $J$. It is assumed that the first and second alternatives are correlated, while the third alternative is uncorrelated with the others.
As an example, consider the choice of transport mode with three alternatives: private car, taxi, and public transport. Taxi is probably related to private car, in the sense that someone who has no private car is likely to choose a taxi, and vice versa. The simulated data are generated using the following utility model:

$$U_{ij} = V_{ij} + \eta_{ij} + \varepsilon_{ij} \tag{16}$$

It is assumed that $V_{ij} = \alpha_j + \beta_j X_i + \gamma Z_{ij}$, where $X_i$ is a characteristic of the individual (decision maker) and $Z_{ij}$ is an attribute of the alternative. The third alternative is taken as the baseline, $\alpha_3 = \beta_3 = 0$. Data were generated with parameters $\alpha_1 = -1$, $\alpha_2 = 1$, $\beta_1 = 0.5$, $\beta_2 = -0.5$, $\gamma = 1$ and several values of $\sigma_{12}$. The parameters were then estimated from these simulated data using the MNL and the MXL, and for each value of $\sigma_{12}$ the MXL estimates were compared with those obtained by the MNL.
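A data-generating step of this kind might look like the Python sketch below. Several details are assumptions made only for illustration and are not stated in the paper: the covariates are drawn as standard normals, the error terms are normal with unit variance (so the covariance `sigma12` between the first two alternatives must lie between -1 and 1), and the third error term is independent of the other two.

```python
import math
import random

def generate_choices(n, sigma12, seed=0):
    """Simulate n choices among three alternatives by utility maximization."""
    rng = random.Random(seed)
    alpha = [-1.0, 1.0, 0.0]   # intercepts, third alternative is the baseline
    beta = [0.5, -0.5, 0.0]    # coefficients of the respondent attribute X
    gamma = 1.0                # coefficient of the alternative attribute Z
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)                      # assumed distribution of X
        z = [rng.gauss(0.0, 1.0) for _ in range(3)]  # assumed distribution of Z
        e1 = rng.gauss(0.0, 1.0)
        e2 = sigma12 * e1 + math.sqrt(1.0 - sigma12 ** 2) * rng.gauss(0.0, 1.0)
        e3 = rng.gauss(0.0, 1.0)                     # uncorrelated with e1, e2
        errors = [e1, e2, e3]
        u = [alpha[j] + beta[j] * x + gamma * z[j] + errors[j] for j in range(3)]
        choice = max(range(3), key=lambda j: u[j])   # highest utility wins
        data.append((x, z, choice))
    return data

data = generate_choices(1000, sigma12=0.5)
shares = [sum(1 for _, _, c in data if c == j) / len(data) for j in range(3)]
print(shares)
```

Repeating the call for a grid of `sigma12` values gives one simulated data set per correlation level.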

Results and Discussion
It is clear from Equation (3) that $\operatorname{Var}(\varepsilon_{ij}) = \pi^2/6$, and it is assumed that $\varepsilon_{ij}$ and $\eta_{ij}$ are independent. Therefore, the covariance among alternatives in Equation (16) is determined by the covariance of the $\eta$ terms, $\sigma_{12}$. Equation (18) shows the relationship between the covariance ($\sigma_{12}$) and the correlation ($\rho$). Data were generated for several values of $\rho$, which are presented in Table 1.

Before running the simulation with the correlation structure in Table 1, it is important to examine the influence of sample size on the parameter estimates. The effect of sample size on the estimators was examined for the independent structure ($\sigma_{12} = 0$, or $\rho = 0$) with n = 50, 100, 500, 1000 and 5000. The parameter estimation was performed using the geepack package in R. The estimation results are presented in Figures 2 to 4.

Figures 2 to 4 show that sample sizes of 100 or less result in a large variance, caused by wide deviations in some samples. It is concluded that for n = 50 and 100 the resulting estimators are unstable. The estimators for n = 500, 1000 and 5000 are stable enough, with no such wide deviation from the target value. The larger the sample size, the closer the estimates are to the true values. Moreover, the larger n is, the more stable the estimator, that is, the smaller the variance of each estimator.
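The covariance step at the start of this section can be written out as follows. This is a sketch under the stated assumptions (the $\varepsilon_{ij}$ are iid with variance $\pi^2/6$ and independent of $\eta_i$, and $V_{ij}$ is treated as fixed). The symbols $\sigma_1^2$ and $\sigma_2^2$ for the variances of $\eta_{i1}$ and $\eta_{i2}$ are notation assumed here, and the last line is the usual correlation between the two utilities, which may differ in form from the paper's Equation (18).

```latex
\begin{align*}
\operatorname{Cov}(U_{i1}, U_{i2})
  &= \operatorname{Cov}(\eta_{i1} + \varepsilon_{i1},\ \eta_{i2} + \varepsilon_{i2})
   = \operatorname{Cov}(\eta_{i1}, \eta_{i2})
   = \sigma_{12}, \\
\operatorname{Var}(U_{ij})
  &= \operatorname{Var}(\eta_{ij}) + \operatorname{Var}(\varepsilon_{ij})
   = \sigma_j^2 + \frac{\pi^2}{6}, \qquad j = 1, 2, \\
\operatorname{Corr}(U_{i1}, U_{i2})
  &= \frac{\sigma_{12}}
          {\sqrt{\left(\sigma_1^2 + \pi^2/6\right)\left(\sigma_2^2 + \pi^2/6\right)}}.
\end{align*}
```

The cross terms vanish because the extreme value terms are independent of each other and of the mixing terms.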
Even though n is large (n = 5000), some bias remains. One reason is that the data were generated from a normal distribution rather than the extreme value distribution, and these two distributions have different variances. The mean and variance of the type I extreme value distribution are 0.5772 and $\pi^2/6$, respectively. The generalized estimating equation (GEE) model, as used by the geepack package, is based on the extreme value distribution.
Furthermore, a simulation was performed to evaluate the influence of $\rho$ on the estimators of all parameters with sample size n = 1000. The actual correlation values and their estimates are presented in Figure 5. Figure 5 shows that the MXL can estimate the correlation parameter, particularly for correlation values above 0.4. Testing the hypothesis $H_0: \rho = \hat{\rho}$ gives a p-value of 0.833413, so there is no significant difference between the actual and estimated correlation values. This suggests that the MXL estimates the correlation parameter well.
Further, the bias of each parameter ($\alpha_1, \alpha_2, \beta_1, \beta_2, \gamma$) is illustrated in Figures 6 to 10. Some conclusions that can be drawn from Figures 5 to 10 are:
a. The correlation parameter can be well estimated by the MXL.
b. In general, the bias of the MNL model is higher than that of the MXL.
c. For the intercept parameters ($\alpha_1, \alpha_2$), when the correlation is high (more than 0.7), the MNL produces a higher bias than the MXL.
d. For the coefficient of X ($\beta_1$), the bias of the MNL model is roughly equal to that of the MXL.
e. For the coefficient of Z ($\gamma$), when the correlation is high (more than 0.5), the MNL produces a higher bias than the MXL.

Applications in Real Cases
In this section, we apply the MNL and MXL to real problems. There are two cases, representing low and high correlation, respectively. The first case, with weak correlation, uses the "Electricity" data, and the second, with high correlation, uses the "Heating" data. Both data sets are from the mlogit package in R 3.1.1.

Case 1. In the "Electricity" data, a sample of residential electricity customers was presented with a series of choice experiments. In each experiment, four hypothetical electricity suppliers were described, and the person was asked which of the four suppliers he or she would choose (j = 1, 2, 3, 4). In the experiments, the characteristics of each supplier were stated. Pf is the fixed price, at a stated number of cents per kWh, for each choice (electricity supplier). Cl is the length of contract that the supplier offered, in years (such as 1 year or 5 years).
As the first supplier (j = 1) is taken as the baseline, the representative utility models can be expressed as $V_{ij} = \alpha_j + \gamma_1 Pf_{ij} + \gamma_2 Cl_{ij}$ for j = 2, 3, 4. The results of estimating the MNL and MXL are presented in Table 2. Based on the results in Table 2, the estimated correlation $\hat{\rho}$ is still low, so the MNL and MXL give relatively similar results. The calculation process takes longer in the MXL than in the MNL.
The likelihood ratio statistic (LR) is an alternative for testing hypotheses about $\theta$, defined by

$$LR = -2 \left[ \log L(0) - \log L(\hat{\theta}) \right]$$

where $\hat{\theta}$ denotes the MLE of $\theta$ and 0 denotes the restriction that $H_0$ is true. $H_0$ states that all parameters $\theta = 0$. $L(\hat{\theta})$ is the value of the likelihood function at the estimated parameters and $L(0)$ is its value when all parameters are set equal to zero. The LR statistic is chi-square distributed with degrees of freedom equal to the number of tested parameters. The value of LR for the MXL model is 1095.7, which is higher than that of the MNL model (266.6). This indicates that the MXL model gives a more reliable conclusion than the MNL model. A similar pattern is obtained from the partial tests: the p-values in the MXL model are all less than 0.04, while in the MNL model the p-values of $\alpha_2$ and $\alpha_3$ are higher than 0.07.
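The LR computation can be illustrated with a short Python sketch. The log-likelihood values below are hypothetical and are not taken from Table 2. To stay within the standard library, the chi-square tail probability is computed with the closed form that holds only for an even number of degrees of freedom; the function names are also assumptions.

```python
import math

def lr_statistic(ll_null, ll_hat):
    """LR = -2 * (log L(0) - log L(theta_hat))."""
    return -2.0 * (ll_null - ll_hat)

def chi2_sf_even(x, df):
    """Upper tail probability of the chi-square distribution, even df only.

    Uses P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Hypothetical log-likelihoods: all parameters zero vs. the fitted model.
lr = lr_statistic(ll_null=-120.0, ll_hat=-100.0)
p_value = chi2_sf_even(lr, df=2)
print(lr, p_value)
```

A small p-value leads to rejecting the hypothesis that all tested parameters are zero.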
Train (2003) reported that the likelihood ratio index can be used to measure how well the model fits the data. The likelihood ratio index is defined as $1 - \log L(\hat{\theta}) / \log L(0)$.

Case 2. In the "Heating" data, Ic is the installation cost for the five alternatives and Oc is the annual operating cost for the five alternatives. As type ec is taken as the baseline, the representative utility models can be expressed as $V_{ij} = \alpha_j + \gamma_1 Ic_{ij} + \gamma_2 Oc_{ij}$ for j = 2, 3, 4, 5.
From 900 observations, the results of estimating the MNL and MXL are presented in Table 3.

Conclusions
Based on the simulated data, the conclusions are that the MXL can estimate the correlation parameter properly (with small bias) and that the MXL is better than the MNL, especially when correlation is present among alternatives. Some suggestions are: a) for practitioners who model discrete responses where there is correlation among alternatives, the MXL is more appropriate than the MNL; if there is no correlation among alternatives, the MNL can be used. b) Statisticians can improve the computational method, since the Mixed Logit model requires a longer calculation time than the MNL model; other estimation methods can also be developed.