Schwarz’s Bayesian Information Criteria: A Model Selection Between Bayesian-SEM and Partial Least Squares-SEM on a Relationship among SDLR, E-learning Readiness and Learning Motivation

This study compares Bayesian-Structural Equation Modelling (B-SEM) with Partial Least Squares-Structural Equation Modelling (PLS-SEM) for the relationship among self-directed learning readiness (SDLR), e-learning readiness, and learning motivation of undergraduate students during the Covid-19 outbreak. The B-SEM is built using prior distributions (inverse-Gamma, inverse-Wishart, and normal) on specific parameters of the model, with 19000 iterations of the Markov Chain Monte Carlo (MCMC) algorithm. The PLS-SEM is established using the Ordinary Least Squares (OLS) method, the PLS algorithm with 300 iterations, and bootstrapping with 5000 subsamples. The objective of this study is to find the model that best represents the relationship among the three latent variables. Schwarz's Bayesian Information Criteria (BIC) is used to select between the two models. Data were obtained from 214 undergraduate students in three majors at the Faculty of Information Technology, Sebelas April University, Indonesia. Both models lead to the same conclusion: self-directed learning readiness significantly affects the students' learning motivation, while e-learning readiness has no significant effect on learning motivation. With the lower BIC value, which is also negative, PLS-SEM fits the influence of self-directed learning readiness and e-learning readiness on students' learning motivation better than the B-SEM model.


Introduction
Model selection is a critical issue in statistical modeling. Statistical modeling is used to develop a model that approximates the substantive structure and distribution of probabilistic events as accurately as possible using observed data (Konishi & Kitagawa, 2008). Once the model is constructed, inference, prediction, knowledge discovery, and decision making can all be carried out with it. In practice, it is hard to capture the true structure and distribution of probabilistic events from limited observed data. Therefore, we need to build several models and select the one that best fits the phenomenon. In this study, we built two structural equation models to examine the relationship between learning motivation, e-learning readiness and self-directed learning readiness (SDLR) of undergraduate students at the Faculty of Information Technology, Sebelas April University, Indonesia during the outbreak of COVID-19. The structural equation models built in this study are the Bayesian-Structural Equation Model (B-SEM) and the Partial Least Squares-Structural Equation Model (PLS-SEM). Both models accommodate small sample sizes and do not depend on the multivariate normality assumption (Marliana, 2020; Marliana et al., 2022; Marliana & Nurhayati, 2020, 2019). Compared to

Pakistan Journal of Statistics and Operation Research
Covariance-Based SEM (CB-SEM), which requires a large sample and the multivariate normality assumption, PLS-SEM estimates the parameters with high efficiency, which leads to higher statistical power (Marliana et al., 2022). In parameter estimation, unlike CB-SEM, which uses the sample covariance matrix, and PLS-SEM, which uses the sample variance, B-SEM uses the raw individual observations, which leads to a direct estimation of the latent variables and gives a more direct interpretation (Marliana et al., 2022). In addition, B-SEM can estimate residual correlations and all cross-loadings simultaneously in a given model, which cannot be done with CB-SEM or PLS-SEM (Marliana et al., 2022). Hence, the objective of this study is to choose the most suitable model for describing the relationship among those latent variables. Two information criteria frequently used for model selection are the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The AIC is often used to assess the goodness of fit of a prediction model (Konishi & Kitagawa, 2008). It is a criterion for evaluating the badness of a model whose parameters are estimated by the maximum likelihood (ML) method, and it indicates that the bias of the log-likelihood is approximately equal to the number of free parameters in the model (Konishi & Kitagawa, 2008). Meanwhile, the BIC is a criterion that evaluates models by their posterior probability (Konishi & Kitagawa, 2008). Several studies have used the BIC to select appropriate models in generalized linear models, multiple regression, and various other kinds of statistical modeling (Bollen et al., 2014; Neath & Cavanaugh, 2012; Weakliem, 1999). The BIC, also called Schwarz's Information Criterion (SIC), was derived by Schwarz in 1978 (Konishi & Kitagawa, 2008; Neath & Cavanaugh, 2012). When the true model is among the candidate models, the BIC selects the true model with probability approaching 1 as the sample size increases (Vrieze, 2012). The AIC, in contrast, minimizes the maximum possible risk in finite sample sizes and chooses the model that minimizes the mean squared error (MSE) of the parameter estimators, but it fails to select the true model with nonvanishing probability as the sample size increases, even when the true model is among the candidate models (Vrieze, 2012). According to Vrieze (2012), the BIC is preferred when the true model is presumed to be parametric and among the candidate parametric models, that is, when there is a fixed, finite-dimensional true model. In contrast, the AIC is preferable in nonparametric modeling, when the true function is deemed too complex to model adequately with a known parametric function and candidate functions are chosen to give the best trade-off between bias and variance in minimizing the loss function, and also when the true model is too complex to be estimated by a parametric method (Vrieze, 2012).
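As a rough illustration of how the two criteria can disagree, the following Python sketch computes AIC and BIC in their usual forms, -2 log L + 2k and -2 log L + k log n. The log-likelihoods and parameter counts are hypothetical values chosen for illustration, not results from this study; only n = 214 is taken from the text.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Schwarz criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


N_OBS = 214  # sample size reported in the paper

# Hypothetical (log-likelihood, number of free parameters) pairs.
candidates = {
    "small_model": (-520.0, 8),
    "large_model": (-515.0, 12),
}

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, N_OBS) for name, (ll, k) in candidates.items()}

# The preferred model is the one with the smallest criterion value.
best_by_aic = min(aic_scores, key=aic_scores.get)
best_by_bic = min(bic_scores, key=bic_scores.get)

print("AIC prefers:", best_by_aic)
print("BIC prefers:", best_by_bic)
```

With these made-up inputs the AIC prefers the larger model, while the BIC, whose penalty grows with log n, prefers the smaller one.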

Method and Materials
Let M1 denote the Bayesian-SEM model and M2 the PLS-SEM model, both of which depict the relationship between learning motivation, e-learning readiness and self-directed learning readiness of undergraduate students at the Faculty of Information Technology, Sebelas April University, Indonesia. Both models use six indicators of e-learning readiness based on Al-araibi et al. (2019), twenty-nine indicators of self-directed learning readiness based on Akkilagunta et al. (2019), and three indicators of learning motivation based on Law & Geng (2019). One of the indicators of e-learning readiness is the ability to learn independently how to use e-learning. The indicators of SDLR measure self-control, self-management, and desire for learning. Meanwhile, one of the indicators of learning motivation examines the clarity of the goal of study. To save space, the full list of the 38 indicators is given in our previous study (Marliana et al., 2022). In addition, we build the M1 Bayesian-SEM model specification (Figure 1) and the M2 PLS-SEM model specification (Figure 2) using the relationship between e-learning readiness and learning motivation shown in Harandi (2015), and the relationship between learning motivation and self-directed learning readiness described in Geng et al. (2019) and Saeid & Eslaminejad (2016).

Candidate Model of Bayesian-SEM Model (M1)
Assume that X = (x1, x2, x3, ..., x35) and Y = (y1, y2, y3) form the data matrix, with ξ1 as E-Learning Readiness, ξ2 as Self-Directed Learning Readiness, and η1 as Learning Motivation. The measurement model (Figure 1) for η1, with i = 1, 2, 3, is given by equation (1); the measurement model for ξ1 and ξ2, with j = 1, 2, 3, …, 35 and k = 1, 2, by equation (2); and the structural model by equation (3). The Bayesian-SEM treats all parameters of the model (equations 1, 2 and 3) as random, each with a certain probability distribution. These probability distributions are called prior distributions. In addition, the Bayesian approach computes all parameters of the model simultaneously (Smid et al., 2020). Parameters of an informative prior are called hyperparameters (Marliana et al., 2022). With θ as the vector of all the parameters of the model, including hyperparameters, the posterior distribution of the M1 model is proportional to the product of the data likelihood, that is, the conditional distribution of the data given θ, and the prior distribution of θ. The Markov Chain Monte Carlo (MCMC) algorithm is used to estimate all the parameters based on the posterior distribution. The MCMC algorithm iteratively generates samples of the parameter θ using the prior distribution and is stopped once convergence is reached, as assessed with trace plots (Marliana & Padmadisastra, 2018). We used the MCMC algorithm and the posterior distribution of Marliana et al. (2022) for the parameter estimation of the candidate model M1.
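In standard SEM notation, equations (1) to (3) and the posterior described above would typically take the following form. The residual symbols ε, δ and ζ, the superscripts on the loadings, and the subscript convention for γ are assumptions made for this sketch, so it should be read as a plausible reconstruction rather than the authors' exact specification.

```latex
\begin{align}
y_i &= \lambda^{y}_{i1}\,\eta_1 + \varepsilon_i, & i &= 1, 2, 3 \tag{1}\\
x_j &= \lambda^{x}_{jk}\,\xi_k + \delta_j, & j &= 1, \dots, 35,\quad k = 1, 2 \tag{2}\\
\eta_1 &= \gamma_{11}\,\xi_1 + \gamma_{21}\,\xi_2 + \zeta_1 \tag{3}
\end{align}

% Posterior: likelihood times prior, up to a normalizing constant
\[
p(\theta \mid X, Y) \;\propto\; p(X, Y \mid \theta)\, p(\theta)
\]
```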

Candidate Model of PLS-SEM Model (M2)
Assume that X = (x1, x2, x3, ..., x38) is the data matrix, with Y1 as E-Learning Readiness, Y2 as Self-Directed Learning Readiness, and Y3 as Learning Motivation. The measurement model (Figure 2) of M2, with i = 1, 2, 3 and j = 1, 2, 3, …, 38, expresses each indicator as the product of a loading and its latent variable plus a residual δ (equation 7), and the structural model relates Y3 to Y1 and Y2 (equation 8). In accordance with Hair (2014), Marliana (2020), Marliana et al. (2022), and Marliana & Nurhayati (2019, 2020), the PLS-SEM model uses the Ordinary Least Squares (OLS) method and the PLS algorithm to estimate the parameters of the model (equations 7 and 8). Unlike in the Bayesian approach, the parameters of the model in the OLS method are assumed to be fixed but unknown (Ong et al., 2018). Further, the PLS algorithm calculates all the unknown parameters in the path model using partial regressions iteratively (Hair, 2014; Marliana, 2020; Marliana & Nurhayati, 2020, 2019). Nevertheless, Hair (2014) noted that PLS-SEM is nonparametric in nature. The PLS algorithm relies on the central limit theorem to handle non-normal data. Moreover, PLS-SEM shows higher convergence than Covariance-Based SEM (CB-SEM), with high statistical power (Hair, 2014; Marliana & Nurhayati, 2019).
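Written out, equations (7) and (8) of the M2 model would plausibly look as follows. The form of (7) follows the surviving "loading times latent variable plus δ" structure, and (8) mirrors the estimated equation reported in the results. The coefficient symbols β1 and β2 and the residual ζ3 are assumptions introduced for this sketch.

```latex
\begin{align}
x_j &= \lambda_{ji}\, Y_i + \delta_j, & i &= 1, 2, 3,\quad j = 1, \dots, 38 \tag{7}\\
Y_3 &= \beta_{1}\, Y_1 + \beta_{2}\, Y_2 + \zeta_3 \tag{8}
\end{align}
```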

Bayesian Information Criteria (BIC) on Model Selection
Both models, the Bayesian-SEM and the PLS-SEM, can be used with small samples that violate multivariate normality (Marliana, 2020; Marliana & Nurhayati, 2019; Smid et al., 2020). The PLS-SEM uses the Ordinary Least Squares (OLS) method for parameter estimation, applying partial regression models iteratively, and handles data that violate multivariate normality through the central limit theorem (Hair, 2014; Marliana et al., 2022; Marliana & Nurhayati, 2020, 2019). Meanwhile, the Bayesian-SEM builds the posterior probability, which requires a prior distribution and the data likelihood, for parameter estimation. Unlike the PLS-SEM, which uses the sample variance matrix, the Bayesian-SEM uses raw data, which has the advantage of estimating the latent variables directly (Anggorowati, 2014; Yanuar, 2014). In addition, the Bayesian-SEM estimates all residual correlations and cross-loadings simultaneously, whereas the PLS-SEM uses partial regression models iteratively with high statistical power (Hair, 2014; Marliana & Nurhayati, 2019; Noudoostbeni et al., 2018). Because both models have their respective advantages, we use the Bayesian Information Criteria (BIC) to choose the model that best represents the relationship among e-learning readiness, self-directed learning readiness, and learning motivation. Like the Bayesian-SEM, the BIC is defined in terms of posterior probability (Bollen et al., 2014; Konishi & Kitagawa, 2008; Neath & Cavanaugh, 2012; Vrieze, 2012; Weakliem, 1999). The BIC does not need specific priors and is easy to calculate from the standard output of nearly all statistical software packages (Bollen et al., 2014). Let Mi, with i = 1, 2, be the two candidate models, and let x be the n × j data matrix, where n is the sample size and j = 1, 2, …, 38 indexes the indicators of the latent variables. If each model is characterized by a distribution fi(x | θi) and the prior distribution of all its parameters is πi(θi), then the marginal likelihood pi(xn) of the n observations under model Mi is obtained by integrating the product of fi(xn | θi) and πi(θi) over θi (Konishi & Kitagawa, 2008). If we regard P(Mi) as the prior probability of the i-th model, then by Bayes' theorem the posterior probability of model Mi is proportional to pi(xn) P(Mi). Following Konishi & Kitagawa (2008), if the prior probabilities P(Mi) are the same for both models, the model that maximizes the marginal likelihood pi(xn) of the data should be chosen. Consequently, if an approximation to the integral that defines the marginal likelihood can be obtained easily, the need to compute the integral itself disappears.
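In the notation of this section, the marginal likelihood, the posterior model probability, and the resulting BIC approximation are usually written as follows. The symbols k_i (number of free parameters of M_i) and θ̂_i (maximum likelihood estimate) are introduced here for illustration, and the last line is the standard large-sample (Laplace) approximation rather than a formula quoted from this paper.

```latex
\begin{align*}
p_i(x_n) &= \int f_i(x_n \mid \theta_i)\, \pi_i(\theta_i)\, d\theta_i \\[4pt]
P(M_i \mid x_n) &= \frac{p_i(x_n)\, P(M_i)}{\sum_{j=1}^{2} p_j(x_n)\, P(M_j)} \\[4pt]
\mathrm{BIC}_i = -2 \log p_i(x_n) &\approx -2 \log f_i(x_n \mid \hat{\theta}_i) + k_i \log n
\end{align*}
```

With equal prior probabilities P(M_i), maximizing the posterior probability is the same as maximizing p_i(x_n), which corresponds to minimizing BIC_i.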
The saturated model is more strongly supported than the hypothesized model when the BIC value is greater than zero. In contrast, when the BIC value is negative, the hypothesized model is more strongly supported.
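The sign rule above matches one common chi-square-based form of the BIC used in SEM, in which the hypothesized model is compared with the saturated model as chi-square minus df times log n. The Python sketch below assumes that form; whether the BIC values in Table 6 were computed exactly this way is not stated in the text, and the chi-square and df inputs are hypothetical.

```python
import math


def bic_vs_saturated(chi_square: float, df: int, n_obs: int) -> float:
    """BIC of a hypothesized model relative to the saturated model.

    Assumed form: chi-square statistic minus df * log(n).
    """
    return chi_square - df * math.log(n_obs)


def supported_model(bic_value: float) -> str:
    """Apply the sign rule: negative favors the hypothesized model."""
    if bic_value < 0:
        return "hypothesized"
    if bic_value > 0:
        return "saturated"
    return "undecided"


# Hypothetical fit statistics; only n = 214 comes from the paper.
good_fit = bic_vs_saturated(chi_square=150.0, df=40, n_obs=214)
poor_fit = bic_vs_saturated(chi_square=300.0, df=40, n_obs=214)

print(round(good_fit, 2), supported_model(good_fit))
print(round(poor_fit, 2), supported_model(poor_fit))
```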

Results and Discussion
Both models used 214 observations from undergraduate students at the Faculty of Information Technology, Sebelas April University, Indonesia, of whom 0.47% majored in Informatics Management, 27.57% in Information System, and 71.96% in Informatics Engineering. The students showed high e-learning readiness, high SDLR, and high learning motivation during the Covid-19 outbreak, with scores of 4724 for e-learning readiness, 23623 for SDLR, and 2602 for learning motivation.

Bayesian-SEM Model (M1) Estimator
The posterior distribution of the M1 model was computed using the blavaan package in R version 4.1.0 and the MCMC algorithm with 19000 iterations and a burn-in period of 9000 samples (Marliana et al., 2022). This model needed more than 7 hours to compute. We used trace plots to evaluate convergence but, to save space, do not show them in this paper. Further, with a t-value greater than 1.96 (Table 2), self-directed learning readiness has a significant direct effect of 7.25 on learning motivation, with a small standard deviation of 0.063. In contrast, with a t-value lower than 1.96 (Table 2), e-learning readiness has no significant effect on learning motivation (direct effect of 0.045), with a standard deviation of 1.894 (Table 2).
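To make the roles of the iteration count and the burn-in period concrete, here is a minimal random-walk Metropolis sketch in Python for a toy problem (the posterior of a normal mean), using the same 19000 iterations and 9000 burn-in draws. It is not meant to reproduce the sampler behind blavaan or the B-SEM model itself; the data, prior scale and step size are hypothetical.

```python
import math
import random


def log_posterior(mu, data, prior_sd=10.0, noise_sd=1.0):
    """Unnormalized log posterior: normal prior on mu, normal likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = sum(-0.5 * ((x - mu) / noise_sd) ** 2 for x in data)
    return log_prior + log_lik


def metropolis(data, n_iter=19000, burn_in=9000, step=0.5, seed=1):
    """Random-walk Metropolis; returns the draws kept after burn-in."""
    rng = random.Random(seed)
    mu = 0.0
    current = log_posterior(mu, data)
    kept = []
    for it in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        if it >= burn_in:
            kept.append(mu)
    return kept


# Hypothetical data: 20 values centered on 1.0.
data = [1.0 + 0.1 * ((i % 5) - 2) for i in range(20)]
draws = metropolis(data)
posterior_mean = sum(draws) / len(draws)

# Closed-form posterior mean for this conjugate toy model, for comparison.
n = len(data)
analytic_mean = sum(data) / (n + 1.0 / 10.0 ** 2)

print(len(draws), round(posterior_mean, 3), round(analytic_mean, 3))
```

Plotting `draws` against the iteration index gives the kind of trace plot used to judge convergence.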

PLS-SEM Model (M2) Estimator
The parameters of the M2 model were estimated using SmartPLS 3, with 300 iterations and a stop criterion of 10^-7 for the PLS algorithm and 5000 subsamples for bootstrapping. The next step in PLS-SEM analysis, after model specification and parameter estimation, is the outer model evaluation, or measurement model assessment. In this step we assess internal consistency using composite reliability, indicator reliability using outer loadings, convergent validity using average variance extracted (AVE), and discriminant validity using cross-loadings.
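The bootstrapping step with 5000 subsamples can be illustrated with a small Python sketch. SmartPLS re-estimates the whole PLS path model on each subsample; as a stand-in, this sketch bootstraps a single OLS slope and forms a t-statistic as the estimate divided by the bootstrap standard error. The data are simulated and every name here is hypothetical.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Slope of the simple OLS regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_t(xs, ys, n_boot=5000, seed=0):
    """Return (estimate, bootstrap standard error, t-statistic)."""
    rng = random.Random(seed)
    n = len(xs)
    estimate = ols_slope(xs, ys)
    slopes = []
    for _ in range(n_boot):
        # Resample cases with replacement and re-estimate the slope.
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    se = statistics.stdev(slopes)
    return estimate, se, estimate / se


# Simulated data with a true slope of 0.75 (hypothetical).
data_rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [0.75 * x + data_rng.gauss(0.0, 0.5) for x in xs]

estimate, se, t_stat = bootstrap_t(xs, ys)
print(round(estimate, 3), round(se, 3), round(t_stat, 2))
```

A path is then judged significant at the 5% level when the absolute t-statistic exceeds 1.96, which is the rule applied in the tables of this study.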
All composite reliability values are higher than 0.708 (Table 3), which shows that the indicators of e-learning readiness, learning motivation, and self-directed learning readiness have sufficient internal consistency (Hair, 2014; Marliana, 2020; Marliana & Nurhayati, 2019). Further, except for self-directed learning readiness, all average variance extracted (AVE) values are higher than 0.5 (Table 3), which shows that each construct explains, on average, more than 50% of the variance of its indicators for e-learning readiness and learning motivation. Although the AVE of self-directed learning readiness is less than 0.5, the gap is only 0.05, so we consider this value still reasonable for convergent validity and keep the indicators to avoid losing important ones. To save space, we do not report the outer loadings and cross-loadings of all indicators. The outer loadings lie between 0.508 and 0.892. A few indicators of self-directed learning readiness have outer loadings below 0.708, but the gap, which lies between 0.015 and 0.2, is still acceptable. Hence, for the same reason, we keep all those indicators. In addition, every indicator loads highest on its own construct compared with its cross-loadings on the other constructs. The last step is the inner model evaluation, or structural model assessment, which tests the significance of the influence of e-learning readiness and self-directed learning readiness on the students' learning motivation.
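For reference, composite reliability and AVE can be computed from standardized outer loadings with the usual formulas, as in the Python sketch below. The three loadings are hypothetical and were chosen so that composite reliability exceeds 0.708 while AVE falls just below 0.5, similar to the situation described for self-directed learning readiness.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error_variance = sum(1.0 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error_variance)


def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)


# Hypothetical standardized outer loadings for one construct.
loadings = [0.8, 0.7, 0.6]

cr = composite_reliability(loadings)
ave = average_variance_extracted(loadings)

print(round(cr, 3), round(ave, 3))
```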
For this assessment, it is necessary to check for collinearity between the variables using the Variance Inflation Factor (VIF) (Table 4). The VIF values need to be less than 5 but higher than 0.2 (Hair, 2014; Marliana, 2020; Marliana & Nurhayati, 2019). The VIF values of both e-learning readiness and self-directed learning readiness with respect to learning motivation are 1.802 (Table 4), which means there is no collinearity between these variables. Next, we carry out the hypothesis significance assessment using p-values and t-statistics (Table 5). With a p-value of 0.151, which is higher than the 5% significance level, and a t-statistic of 1.437, which is less than 1.96, e-learning readiness has no significant influence on the students' learning motivation. In contrast, self-directed learning readiness significantly influences the students' learning motivation, with a p-value below the significance level and a t-statistic higher than 1.96 (Table 5). E-learning readiness shows only a slight direct effect of 0.085 on the students' learning motivation, whereas self-directed learning readiness shows a substantial direct effect of 0.757.
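Two of the numbers above can be checked with a few lines of Python. With only two predictors, each VIF equals 1 / (1 - r^2), where r is the correlation between the predictors, which is why both VIF values are identical. The p-value is approximated here from the t-statistic with a standard normal distribution; this is an assumption, since the software may use a t-distribution with specific degrees of freedom.

```python
import math


def vif_two_predictors(r: float) -> float:
    """VIF shared by both predictors when there are exactly two."""
    return 1.0 / (1.0 - r ** 2)


def implied_correlation(vif: float) -> float:
    """Absolute predictor correlation implied by a two-predictor VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


def two_sided_p_normal(t_stat: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return math.erfc(abs(t_stat) / math.sqrt(2.0))


r = implied_correlation(1.802)       # VIF reported in Table 4
p_value = two_sided_p_normal(1.437)  # t-statistic reported in Table 5

print(round(r, 3), round(p_value, 3))
```

Under this approximation the t-statistic of 1.437 corresponds to a p-value of about 0.151, in line with the reported value.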

BIC of the M1 and M2 Model Selection
Both models (M1 and M2) lead to the same outcomes. With only small differences in composite reliability, AVE, and the parameter estimates, both models show adequate validity and reliability for all the indicators of e-learning readiness, self-directed learning readiness and learning motivation of the students. Furthermore, both models show the same significant influence of self-directed learning readiness on the students' learning motivation and the same non-significant influence of e-learning readiness on learning motivation. Saeid & Eslaminejad (2016) reported a similar result, namely a significant relationship between self-directed learning readiness and accomplishment motivation of students at Payam Noor University. Even though both models give the same result, we still need to choose the model that more accurately approximates the distribution of probabilistic events and the true structure of the phenomenon from the observed data. With a BIC value that is both the lowest and negative (Table 6), the PLS-SEM model fits better and is more strongly supported than the B-SEM model in explaining the relationship between e-learning readiness, self-directed learning readiness and learning motivation of students at the Faculty of Information Technology, Sebelas April University. Moreover, the estimators in the M2 model (Table 5) have smaller standard deviations than those in the M1 model (Table 2). These values indicate that the sample statistics in the M2 model are much closer to the mean of the observed data than those in the M1 model. All the estimated parameters of the M2 model are shown in Figure 3. To save space, we present only the estimated structural model based on equation (8), which can be written as: Y3 = 0.085 Y1 + 0.757 Y2

Conclusion
In this study, the PLS-SEM and B-SEM models give the same outcome in the significance tests, even though they use different approaches and different parameter estimation methods. The significance assessment of both models concludes that self-directed learning readiness significantly influences the learning motivation of students at the Faculty of Information Technology, Sebelas April University, whereas their learning motivation is unaffected by their e-learning readiness. In the B-SEM model, students' self-directed learning readiness has a direct effect of 7.25 on their learning motivation. The PLS-SEM model gives this direct effect on a different scale, 0.757. In addition, students' learning motivation is not significantly influenced by e-learning readiness, with a direct effect of 0.045 in the B-SEM model and 0.085 in the PLS-SEM model. Based on the BIC values, the best model for describing the influence of self-directed learning readiness and e-learning readiness on the learning motivation of students at the Faculty of Information Technology, Sebelas April University is the PLS-SEM model. Its BIC value is negative, which supports the hypothesized model. Furthermore, it has the lowest BIC, which means it can be chosen as the best-fitting model. For further study, we suggest using different prior distributions and different algorithms to build the B-SEM model. We also suggest comparing the models with other information criteria, such as the Haughton Bayesian information criterion (HBIC), the information matrix-based information criterion (IBIC), and the scaled unit information prior Bayesian information criterion (SPBIC).
The BIC value is not always computed by comparison with the saturated model; in that case, the model with the lowest BIC value is the most suitable model (Bollen et al., 2014). In other words, the model that minimizes the BIC is selected as the optimal model for the data (Bollen et al., 2014; Konishi & Kitagawa, 2008; Neath & Cavanaugh, 2012).

Each trace plot shows no fluctuation in the chain, which means that not only λi1, λjk, γ11 and γ21, with i = 1, 2, 3; j = 1, 2, 3, …, 38 and k = 1, 2, but also all the other parameters of the model have converged. In addition, the standardized loadings of ξ1, ξ2 and η1, which lie between 0.52 and 0.79, show the validity of all indicators. Moreover, all composite reliability values (Table 1) show high reliability of ξ1, ξ2 and η1. At the same time, the Average Variance Extracted (AVE) values (Table 1) are higher than 0.5, which strengthens the reliability result, except for Self-Directed Learning Readiness (ξ2), whose value is still acceptable.

Table 1. Composite Reliability and AVE of Bayesian-SEM Model (M1)

Table 3. Composite Reliability and AVE of PLS-SEM Model (M2)

Table 6. BIC of the M1 Model and M2 Model