Gibbs Sampling for Bayesian Prediction of SARMA Processes

In this article we present a Bayesian prediction of multiplicative seasonal autoregressive moving average (SARMA) processes using the Gibbs sampling algorithm. First, we estimate the unobserved errors using the nonlinear least squares (NLS) method to approximate the likelihood function. Second, we employ conjugate priors on the model parameters and initial values and assume the model errors are normally distributed to derive the conditional posterior and predictive distributions. In particular, we show that the conditional posterior distributions of the model parameters and the variance are multivariate normal and inverse gamma respectively, and that the conditional predictive distribution of the future observations is a multivariate normal. Finally, we use these closed-form conditional posterior and predictive distributions to apply the Gibbs sampling algorithm to approximate empirically the marginal posterior and predictive distributions, enabling us to easily carry out multiple-step ahead predictions. We evaluate our proposed Bayesian method using a simulation study and real-world time series datasets.


Introduction
Bayesian analysis of seasonal autoregressive moving average (SARMA) models is difficult since the likelihood function is analytically intractable, which causes problems in the posterior and predictive analysis. Different approaches have been proposed in the literature for the Bayesian analysis of SARMA models, including analytical approximations and approximations based on Markov chain Monte Carlo (MCMC) methods.
By the analytical approximations, we mean the proposed Bayesian methods that approximate the posterior and predictive densities by standard closed-form distributions that are analytically tractable; see for example Newbold (1973), Zellner and Reynolds (1978), Broemeling and Shaarawy (1984), and Amin (2018a, 2018b). However, these methods condition on the initial values, which wastes observations, and treat the SARMA model as an additive rather than a multiplicative model, which can increase the number of unnecessary parameters. In order to address these limitations, MCMC methods, especially the Gibbs sampling algorithm, have been proposed to ease and advance Bayesian time series analysis. Chib and Greenberg (1994) and Marriott et al. (1996) assumed prior distributions on the initial observations and errors and used MCMC methods to develop a Bayesian analysis for ARMA models without dealing with the seasonality feature. Barnett et al. (1996, 1997) estimated the multiplicative seasonal autoregressive (SAR) and ARMA models using MCMC methods. Their algorithm is based on sampling functions of the partial autocorrelations, which makes it relatively complicated compared to other approaches, and it does not consider the prediction problem. These approaches restrict the coefficient space to satisfy the stationarity and invertibility conditions. Vermaak et al. (1998) proposed a Bayesian estimation of SAR models, aiming to model speech production for voiced sounds. Their estimation is based on the state space formulation and the Metropolis-within-Gibbs sampling algorithm, without considering the prediction problem. Amin (2009) and Ismail (2014, 2015) used the Gibbs sampling algorithm to present a Bayesian estimation of multiplicative SARMA and double SAR models, without considering the prediction problem.
In addition, Amin (2017a, 2017b) used the Gibbs sampling algorithm to develop a Bayesian estimation of multiplicative double SMA and double SARMA models respectively. Moreover, Amin (2019a) used the Gibbs sampling algorithm to analyze pure multiplicative double SAR models. Based on our review of the existing Bayesian analysis of SARMA models, no work considers the prediction problem along with the estimation of SARMA models while also dealing with the problems of initial values and multiplicativity.
In order to fill this gap in the literature, we use the Gibbs sampling algorithm to present a Bayesian prediction of multiplicative SARMA processes. The main idea of our proposed Bayesian prediction is that we first estimate the unobserved errors in the multiplicative SARMA processes using the nonlinear least squares (NLS) method to approximate the likelihood function. Assuming a prior distribution on the initial observations and errors, we then use the approximate likelihood function along with the conjugate priors to derive the conditional posterior and predictive distributions that we require to implement the Gibbs sampling algorithm. The main advantages of this work are: (1) the proposed Bayesian analysis is unconditional on the initial values and treats the SARMA model as a multiplicative rather than an additive model to preserve parsimony, (2) the conditional posteriors of the model parameters are standard distributions, i.e. normal and inverse gamma, for which sampling techniques exist, and (3) the conditional predictive distribution of the future observations is a multivariate normal, which is expressed in a form that enables us to easily carry out multiple-step ahead predictions.
The paper is organized as follows. Section 2 describes the multiplicative SARMA models and their Bayesian concepts. Sections 3 and 4 introduce the posterior and predictive analyses of the multiplicative SARMA models respectively. Section 5 presents the implementation details of the proposed Bayesian prediction based on the Gibbs sampling algorithm, including convergence monitoring. The proposed Bayesian analysis is evaluated in Section 6 using a simulation study and two real-world time series datasets. Finally, the paper is concluded in Section 7.
Multiplicative SARMA Models

The multiplicative SARMA model (1) takes the Box-Jenkins form φ_p(B) Φ_P(B^s) y_t = θ_q(B) Θ_Q(B^s) ε_t, where B is the backshift operator, s is the seasonal period, φ_p(B) and θ_q(B) are the nonseasonal AR and MA polynomials of orders p and q with coefficient vectors φ and θ, Φ_P(B^s) and Θ_Q(B^s) are their seasonal counterparts of orders P and Q with coefficient vectors Φ and Θ, and the errors ε_t are normally distributed with variance σ². The model (1) can be expanded and written as model (2), in which 0₁ and 0₂ are (s−p−1) and (s−q−1) row vectors of zeros respectively. From the model (2), we can observe that the multiplicative SARMA model can be written as an ARMA model of orders p + Ps and q + Qs in which some coefficients are zeros and others are products of nonseasonal and seasonal coefficients. Thus, the SARMA model is nonlinear in the coefficients φ, Φ, θ, and Θ, which complicates its Bayesian analysis. For more details about the properties of the SARMA model, see Box and Jenkins (1976).

Bayesian analysis of time series models is based on Bayes' theorem, which combines the prior distribution of the model parameters with the likelihood function of the observed sample to obtain the posterior distribution (Amin, 2019b). Regarding the specification of the prior distribution, assuming the parameters φ, Φ, θ, Θ, y₀, and ε₀ are independent a priori given the model variance σ², the joint prior distribution is given as

ξ(φ, Φ, θ, Θ, σ², y₀, ε₀) = ξ(φ|σ²) × ξ(Φ|σ²) × ξ(θ|σ²) × ξ(Θ|σ²) × ξ(y₀|σ²) × ξ(ε₀|σ²) × ξ(σ²)
= N_p(μ_φ, σ²Σ_φ) × N_P(μ_Φ, σ²Σ_Φ) × N_q(μ_θ, σ²Σ_θ) × N_Q(μ_Θ, σ²Σ_Θ) × N(μ_y₀, σ²Σ_y₀) × N(μ_ε₀, σ²Σ_ε₀) × IG(a, b),   (3)

where N_r(μ, Δ) is the r-variate normal distribution with mean μ and variance Δ, and IG(a, b) is the inverse gamma distribution with parameters a and b. Accordingly, the joint prior distribution ξ(φ, Φ, θ, Θ, σ², y₀, ε₀) can be simplified and written in the combined form (4). The prior distribution (4) is chosen to be conditionally a conjugate prior in order to facilitate the mathematical calculations. Employing a straightforward random variable transformation from ε to y gives the likelihood function (5), and multiplying the joint prior distribution (4) by the likelihood function (5) results in the joint posterior density (6) of φ, Φ, θ, Θ, σ², y₀, and ε₀.
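To make the multiplicative structure concrete, the sketch below multiplies a nonseasonal AR polynomial by a seasonal one and recovers the implied nonmultiplicative AR coefficients; the orders, period, and coefficient values are illustrative, not taken from the paper.

```python
import numpy as np

def expand_sarma(phi, Phi, s):
    """Multiply a nonseasonal AR polynomial (1 - phi_1 B - ... - phi_p B^p)
    by a seasonal one (1 - Phi_1 B^s - ... - Phi_P B^{Ps}) and return the
    coefficients of the implied AR polynomial of order p + P*s."""
    p, P = len(phi), len(Phi)
    # Polynomial coefficients on B^0, B^1, ..., with the convention 1 - sum(...)
    a = np.zeros(p + 1); a[0] = 1.0; a[1:] = -np.asarray(phi)
    b = np.zeros(P * s + 1); b[0] = 1.0
    for j, c in enumerate(Phi, start=1):
        b[j * s] = -c
    full = np.convolve(a, b)   # product of the two polynomials
    return -full[1:]           # implied AR coefficients at lags 1, 2, ...

# A SARMA-style AR part with p = 1, P = 1, s = 4 (illustrative values):
# nonzero coefficients appear only at lags 1, 4, and 5, and the lag-5 term
# is the product of the nonseasonal and seasonal coefficients (sign flipped).
coef = expand_sarma([0.5], [0.6], s=4)
```

This illustrates why the SARMA model is nonlinear in φ and Φ: the cross-term coefficient is their product, not a free parameter.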

Posterior Analysis for SARMA Models
The joint posterior density (6) is analytically intractable for two reasons. First, the elements of the matrix Λ are unknown lagged errors. Second, the likelihood function (5) is a nonlinear function of the coefficients φ, Φ, θ, and Θ. In order to address the first problem, the unknown errors can be estimated recursively from the model, where φ̂, Φ̂, θ̂, and Θ̂ are nonlinear least squares (NLS) estimates obtained by minimizing S(φ, Φ, θ, Θ) = Σ_{t=1}^{n} ε_t² with respect to φ, Φ, θ, and Θ over the stationarity and invertibility regions. Accordingly, the joint posterior density (6) can be approximated as (8), where Λ is the matrix whose t-th row collects the relevant lagged observations and estimated lagged errors.

Since the likelihood function (5) is nonlinear in the coefficients φ, Φ, θ, and Θ, the marginal approximate posteriors of the model coefficients cannot be obtained in closed form. In order to solve this problem, we introduce the Gibbs sampling method, one of the MCMC methods, to approximate the marginal posteriors of the model coefficients. However, one of the main requirements for implementing the Gibbs sampling algorithm is that the full conditional posteriors of the model parameters be standard distributions, or at least available in closed form. In general, we can derive the conditional posterior distribution of each model parameter by grouping together the terms in the approximate joint posterior (8) that depend on that parameter, and then finding the appropriate normalizing constant to form a proper density. In our previous work (Amin, 2009), we showed in detail that all the conditional posteriors are standard distributions: the conditional posteriors of φ, Φ, θ, Θ, y₀, and ε₀ are multivariate normal and the conditional posterior of σ² is inverse gamma, for which sampling techniques exist. In the following, we present these conditional posteriors without proofs; for the details of the proofs see Amin (2009).
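The recursive error estimation and the NLS step can be sketched as follows for a pure AR case; the grid search is a crude stand-in for a proper constrained NLS optimizer, and the simulated series and coefficient values are illustrative.

```python
import numpy as np

def residuals(y, ar, ma):
    """Recursively estimate the unobserved errors of an ARMA(p*, q*)
    representation, conditioning on zero pre-sample values
    (an illustrative simplification of the paper's setup)."""
    p, q = len(ar), len(ma)
    e = np.zeros_like(y)
    for t in range(len(y)):
        ar_part = sum(ar[i] * y[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(ma[j] * e[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        e[t] = y[t] - ar_part - ma_part
    return e

def nls_objective(y, ar, ma):
    """Sum of squared recursive residuals, S = sum_t e_t^2."""
    return float(np.sum(residuals(y, ar, ma) ** 2))

# Simulate an AR(1) series and scan phi over a grid inside the
# stationarity region to locate the minimizing coefficient.
rng = np.random.default_rng(0)
y = np.zeros(300)
for t in range(1, 300):
    y[t] = 0.7 * y[t - 1] + rng.standard_normal()
grid = np.linspace(-0.95, 0.95, 191)
phi_hat = grid[np.argmin([nls_objective(y, [g], []) for g in grid])]
```

The estimated residuals from the minimizing coefficients are then plugged into Λ to form the approximate posterior (8).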

The conditional posterior of φ
The conditional posterior of φ given y, σ², Φ, θ, Θ, y₀, and ε₀ is a multivariate normal with location vector μ*_φ and dispersion matrix Σ*_φ, obtained by completing the square in the terms of the approximate joint posterior (8) that depend on φ.
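The explicit expressions for the location vector and dispersion matrix are not reproduced here; as an illustration, the standard conjugate update for a normal linear model, which has the same general structure as this conditional normal, can be sampled as follows (the design matrix, prior hyperparameters, and data are all illustrative assumptions, not the paper's quantities):

```python
import numpy as np

def draw_coefficient_block(X, z, sigma2, mu0, Sigma0, rng):
    """Draw one coefficient block from its conditional posterior in the
    conjugate normal linear model z = X b + e, e ~ N(0, sigma2 I), with
    prior b ~ N(mu0, sigma2 * Sigma0); the location/dispersion pair has
    the same general structure as the paper's mu*_phi and Sigma*_phi."""
    Sigma0_inv = np.linalg.inv(Sigma0)
    Sigma_star = np.linalg.inv(X.T @ X + Sigma0_inv)     # dispersion (up to sigma2)
    mu_star = Sigma_star @ (X.T @ z + Sigma0_inv @ mu0)  # location
    return rng.multivariate_normal(mu_star, sigma2 * Sigma_star)

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
b_true = np.array([0.5, -0.3])
z = X @ b_true + 0.1 * rng.standard_normal(200)
draws = np.array([
    draw_coefficient_block(X, z, 0.01, np.zeros(2), 10.0 * np.eye(2), rng)
    for _ in range(500)
])
b_mean = draws.mean(axis=0)  # posterior mean as a sample average of draws
```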

The conditional posterior of Φ
The conditional posterior of Φ given y, σ², φ, θ, Θ, y₀, and ε₀ is a multivariate normal with location vector μ*_Φ and dispersion matrix Σ*_Φ, obtained analogously.

The conditional posterior of y₀ and ε₀
Using the model (2), the equations for the elements of y₀ and ε₀ can be expanded, and the error vector e = (ε₁, ε₂, ⋯, ε_{p+Ps})′ has the (p+Ps)-variate normal distribution with zero mean and variance σ²I_{p+Ps}, where I_{p+Ps} is the identity matrix of order (p + Ps). With some manipulations, we derive the conditional posterior of y₀ given y, σ², φ, Φ, θ, Θ, and ε₀ as a multivariate normal with location vector μ*_y₀ and dispersion matrix Σ*_y₀. Similarly, the conditional posterior of ε₀ given y, σ², φ, Φ, θ, Θ, and y₀ is a multivariate normal with location vector μ*_ε₀ and dispersion matrix Σ*_ε₀.
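The variance draw completes the set of conditional posteriors. A hedged sketch using the usual conjugate inverse gamma update (the paper's exact shape and scale expressions are not reproduced; the hyperparameters and residuals below are illustrative):

```python
import numpy as np

def draw_sigma2(resid, a0, b0, rng):
    """Draw sigma^2 from a conditional inverse gamma posterior of the
    standard conjugate form IG(a0 + n/2, b0 + SSE/2); hyperparameters
    a0, b0 are illustrative, not the paper's values."""
    n = len(resid)
    shape = a0 + n / 2.0
    scale = b0 + 0.5 * float(np.sum(resid ** 2))
    # If X ~ Gamma(shape, rate = scale) then 1/X ~ IG(shape, scale);
    # numpy's gamma is parameterized by (shape, scale = 1/rate).
    return 1.0 / rng.gamma(shape, 1.0 / scale)

rng = np.random.default_rng(2)
e = rng.normal(0.0, 2.0, size=2000)   # residuals with true variance 4
draws = np.array([draw_sigma2(e, 0.01, 0.01, rng) for _ in range(2000)])
sigma2_mean = draws.mean()            # should concentrate near 4
```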

Predictive Analysis for SARMA Models
The predictive density of the next m future observations is the conditional density of y_f = (y_{n+1}, y_{n+2}, …, y_{n+m})′ given the time series data y. This predictive density can be obtained by averaging the distribution of the future values with respect to the joint posterior density of the model parameters, equation (9). This predictive density does not have an analytical closed form, which complicates the predictive analysis of SARMA models. Therefore, we obtain the conditional predictive distribution of y_f given y, φ, Φ, θ, Θ, σ², y₀, and ε₀, and then employ the Gibbs sampling method to approximate the predictive density. In order to derive the conditional predictive distribution of y_f, we use the model (2) to obtain the equations of the future observations. It is worth noting that we assume the future errors ε_f have a multivariate normal distribution with zero mean and variance σ²I_m, and that ε_f and y are independent. Using the above-defined matrices and with some manipulations, we derive the conditional predictive distribution of y_f given y, φ, Φ, θ, Θ, σ², y₀, and ε₀ to be a multivariate normal with location vector μ*_f and dispersion matrix Σ*_f. It should be noted that the elements in Λ are unknown and can be estimated as discussed in Section 3. Accordingly, using this derived conditional predictive distribution of y_f along with the conditional posteriors of the model parameters presented in Section 3, we apply the Gibbs sampling method not only to approximate the marginal posteriors but also to approximate the predictive density (9). In the following section we introduce our proposed Gibbs sampling algorithm for the Bayesian analysis of SARMA models.
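Given one draw of the parameters, a draw from the conditional predictive distribution can equivalently be obtained by propagating the model recursion forward with fresh normal errors; averaging such paths over Gibbs iterations approximates the predictive density (9). A sketch for an ARMA(1,1)-type recursion with illustrative history and coefficient values:

```python
import numpy as np

def simulate_future(y_hist, e_hist, ar, ma, sigma2, m, rng):
    """Draw one path of m future observations: propagate the ARMA
    recursion forward, injecting fresh N(0, sigma2) errors (a sketch
    of sampling y_f given the parameters and the history)."""
    y, e = list(y_hist), list(e_hist)
    for _ in range(m):
        eps = rng.normal(0.0, np.sqrt(sigma2))
        ar_part = sum(ar[i] * y[-1 - i] for i in range(len(ar)))
        ma_part = sum(ma[j] * e[-1 - j] for j in range(len(ma)))
        y.append(ar_part + ma_part + eps)
        e.append(eps)
    return np.array(y[len(y_hist):])

rng = np.random.default_rng(3)
paths = np.array([
    simulate_future([1.0, 0.8], [0.0, 0.1], ar=[0.7], ma=[0.4],
                    sigma2=0.04, m=4, rng=rng)
    for _ in range(4000)
])
point_forecasts = paths.mean(axis=0)  # Bayesian forecasts as sample averages
```

With these values the one-step forecast mean is 0.7·0.8 + 0.4·0.1 = 0.60 and subsequent means decay by the AR coefficient.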

Proposed Gibbs Sampling Algorithm for SARMA Models
The proposed Gibbs sampling algorithm for the Bayesian analysis of SARMA models can be implemented in the following steps: 1. Specify starting values {φ^(0), Φ^(0), θ^(0), Θ^(0), (σ²)^(0), y₀^(0), ε₀^(0), y_f^(0)} and set j = 0. These starting values can be obtained from the initial NLS estimates of the model parameters, as discussed in Section 3.
2. Generate one value for each model parameter from its conditional posterior distribution, and one value for the future observations from the conditional predictive distribution, given the current values of all other blocks; then set j = j + 1.

This iterative process is repeated for a large number of iterations and its convergence is monitored. Once the chain has converged, the simulated values from the conditional posterior and predictive distributions are used as samples from the joint posterior and predictive distributions respectively. Accordingly, posterior estimates of the model parameters and Bayesian forecasts of the future observations can be computed directly as sample averages of the simulation output.

Regarding the convergence of the resulting Markov chain, three groups of diagnostics can be used for monitoring: autocorrelation estimates, the Raftery and Lewis diagnostics, and the Geweke diagnostics. First, autocorrelation estimates indicate how much dependence exists in the sequence of draws for each parameter. A high degree of autocorrelation indicates that more draws need to be generated to obtain accurate posterior estimates. Second, the diagnostics proposed by Raftery and Lewis (1992, 1995) include (1) Burn: the number of draws discarded as initial "burn-in" draws before sampling for the purpose of posterior approximation, (2) Total: the total number of draws needed to achieve the desired level of accuracy, (3) Nmin: the number of draws that would be needed if the draws formed an iid chain, and (4) I-stat: the ratio of Total to Nmin. Raftery and Lewis suggested that a convergence problem may be indicated when the I-stat values exceed 5. Third, the diagnostics proposed by Geweke (1992) fall into two groups: 1. The first group includes the numerical standard errors (NSE) and the relative numerical efficiency (RNE). The NSE estimates reflect the variation that can be expected if the simulation were repeated.
The RNE estimates indicate how many draws would be required to achieve the same numerical accuracy if an iid sample were drawn directly from the posterior distribution.
2. The second group of diagnostics includes a test of whether the sampler has attained an equilibrium state. This is done by carrying out a Z-test for the equality of the means of the first and last parts of the draws, from which a chi-squared marginal probability is obtained. Usually, the first part is the first 20% of the draws and the last part is the last 50%.
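The autocorrelation check and the Geweke equal-means test can be sketched as follows; this is a simplified version that omits the spectral-density correction the full Geweke diagnostic applies to the standard errors.

```python
import numpy as np

def autocorr(x, lag):
    """Sample autocorrelation of a chain at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

def geweke_z(x, first=0.2, last=0.5):
    """Z statistic comparing the means of the first 20% and last 50% of
    the draws (simplified: plain standard errors, no spectral correction)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    a, b = x[: int(first * n)], x[int((1 - last) * n):]
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return float((a.mean() - b.mean()) / se)

rng = np.random.default_rng(5)
chain = rng.standard_normal(5000)   # an iid chain: no convergence problem
z = geweke_z(chain)                 # should be small in magnitude
r1 = autocorr(chain, 1)             # should be near zero
```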
These convergence diagnostics are used in the following section to monitor the convergence of the proposed Bayesian method.

Simulation Study and Real-World Application
We have two parts in this section. First, we present a simulation study to evaluate the accuracy of our proposed Bayesian analysis for SARMA models. Second, we apply the proposed Bayesian analysis to real-world time series datasets.

Simulation Study
In this subsection we present four simulations of SARMA models, and Table 1 shows the design of these simulations, including the true parameter values, sample size, seasonality period, model variance, and next future values. These simulations are designed to represent different seasonality patterns and data types. Once the time series datasets are generated from these SARMA models, the Bayesian analysis is performed by assuming a noninformative prior for the parameters φ, Φ, θ, Θ, and σ², and normal priors with zero mean for both the initial observations y₀, with variance σ²I_{p+Ps}, and the initial errors ε₀, with variance σ²I_{q+Qs}. To apply the proposed Gibbs sampler, the starting values for the parameters φ, Φ, θ, Θ, and σ² are obtained using the NLS method, and the starting values for y₀, ε₀, and y_f are set to zero.
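The initialization and iteration of the sampler can be organized as the skeleton below, in which draw_block is a placeholder standing in for the conditional posterior and predictive draws derived in Sections 3 and 4 (the random-walk placeholder and all names are purely illustrative):

```python
import numpy as np

def draw_block(name, state, rng):
    """Placeholder for drawing one block from its full conditional;
    here just a small random perturbation, purely for illustration."""
    return state[name] + 0.1 * rng.standard_normal()

def gibbs_sarma(state0, n_iter, rng):
    """Skeleton of the proposed sampler: cycle through the blocks
    (phi, Phi, theta, Theta, sigma2, y0, e0, y_future), drawing each
    from its conditional given the current values of the others."""
    blocks = ["phi", "Phi", "theta", "Theta", "sigma2", "y0", "e0", "y_future"]
    state = dict(state0)
    draws = []
    for _ in range(n_iter):
        for b in blocks:
            state[b] = draw_block(b, state, rng)
        draws.append(dict(state))   # record the full state each iteration
    return draws

rng = np.random.default_rng(4)
state0 = {b: 0.0 for b in
          ["phi", "Phi", "theta", "Theta", "sigma2", "y0", "e0", "y_future"]}
chain = gibbs_sarma(state0, n_iter=100, rng=rng)
```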
For each dataset, the Gibbs sampler was run for 6,000 iterations, the first 1,000 draws were discarded, and every fifth value in the sequence of the last 5,000 draws was recorded to obtain an approximately independent sample. All the posterior estimates of the model parameters and the Bayesian forecasts are computed directly as sample averages of the Gibbs sampler draws. In the following, we discuss the results and investigate the convergence diagnostics.

Table 2 presents the Bayesian estimates of the model parameters and the Bayesian forecasts of the next future observations, together with the corresponding true values, for Model I. The 95% credible intervals, computed using the 0.025 and 0.975 percentiles of the simulated draws, are also presented in Table 2. From Table 2, it is clear that the Bayesian estimates and forecasts are close to the true values and that the 95% credible interval includes the true value for each parameter and future observation. Sequential plots of the generated sequences of the model parameters together with their marginal posterior densities are displayed in Figure 1, while the marginal predictive densities of the next four future observations are displayed in Figure 2. Using the convergence diagnostics summarized in Section 5, the convergence of the proposed Gibbs sampling algorithm is monitored. In particular, the autocorrelations and the Raftery and Lewis diagnostics are displayed in Table 3, and the Geweke diagnostics are displayed in Table 4. Table 3 shows that the draws for each parameter have small autocorrelations, indicating no convergence problem. This conclusion is confirmed by the Raftery and Lewis diagnostic measures, where the reported Nmin is 994, which is close to the 1,000 draws we used, and the I-stat value is about 1, well below the threshold of 5.
Scanning the entries of Table 4 confirms the convergence of the proposed algorithm: the chi-squared probabilities show that the equal-means hypothesis cannot be rejected, and no dramatic differences between the NSE estimates are found. In addition, the RNE estimates are close to 1, which indicates the near-iid nature of the output sample.
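The burn-in, thinning, and credible-interval computations described above reduce to a few lines; the chain below is a stand-in for one parameter's Gibbs draws, not actual simulation output.

```python
import numpy as np

def summarize_chain(draws, burn=1000, thin=5):
    """Apply the burn-in and thinning scheme used in the paper (discard
    the first 1,000 draws, keep every fifth of the rest) and return the
    posterior mean and the 95% credible interval from the 0.025 and
    0.975 percentiles of the retained draws."""
    kept = np.asarray(draws)[burn::thin]
    mean = float(kept.mean())
    lo, hi = np.percentile(kept, [2.5, 97.5])
    return mean, (float(lo), float(hi)), len(kept)

rng = np.random.default_rng(6)
draws = rng.normal(0.7, 0.05, size=6000)  # stand-in parameter chain
mean, ci, n_kept = summarize_chain(draws)
```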
A similar analysis to that applied to Model I is applied to Models II, III, and IV, and all the Bayesian results are presented in Tables 5, 6, and 7. From all these results, conclusions similar to those of Model I hold for the other three models, which confirms the accuracy of the proposed Gibbs sampling algorithm.

Real-World Application

In the second part, we apply the proposed Bayesian analysis to two real-world time series datasets: the average monthly soil temperature near Zurich and the Federal Reserve Board production index (Shumway and Stoffer, 2006). These two time series datasets are shown in Figure 3, which indicates that the average soil temperature time series is stationary while the production index time series is nonstationary. We used the first (nonseasonal) difference to stationarize the production index time series, but it still appears nonstationary in its seasonal component, as shown in Figure 3; accordingly, we applied the seasonal difference as well, after which the series appears stationary, as displayed in Figure 3. This implies that we apply our proposed Bayesian analysis to the stationary differenced production index, not to its nonstationary raw data.
(a) Average monthly soil temperature near Zurich; (b) Federal Reserve Board production index; (c) nonseasonal difference of the production index; (d) nonseasonal and seasonal differences of the production index

Figure 3: Time-plot of the two real-world time series datasets
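The nonseasonal-plus-seasonal differencing applied to the production index can be sketched as follows; the toy series is illustrative, chosen so that both the trend and the period-12 cycle are removed exactly.

```python
import numpy as np

def ns_and_seasonal_diff(y, s=12):
    """First (nonseasonal) difference followed by a seasonal difference
    at lag s, i.e. (1 - B^s)(1 - B) y_t, as applied to the production
    index series."""
    d1 = np.diff(y)          # (1 - B) y_t
    return d1[s:] - d1[:-s]  # (1 - B^s) applied to the differenced series

# Toy trend-plus-seasonal series: the nonseasonal difference removes the
# linear trend up to a constant, and the seasonal difference removes both
# that constant and the period-12 cycle.
t = np.arange(120)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12)
dd = ns_and_seasonal_diff(y)   # numerically zero for this deterministic toy
```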
We use the first 78 observations of the average soil temperature data and the first 360 observations of the production index data to identify suitable SARMA models and to conduct the posterior and predictive analysis with our proposed algorithm; the remaining observations (i.e. m = 6 for the average soil temperature data and m = 12 for the production index data) are used to evaluate out-of-sample prediction performance. In order to identify the best order of the SARMA models for the two time series, we use the Akaike information criterion (AIC), assuming the maximum order of each of the nonseasonal and seasonal polynomials in the SARMA model (1) is two, following the recommendation of Laing and Smith (1987). We compute the AIC for all combinations of SARMA orders and select the model with the smallest AIC. In particular, the identified models are SARMA(2,2)(1,1)_12 for the average soil temperature data and SARMA(1,1)(2,2)_12 for the differenced production index data.
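The AIC-based order search can be sketched as a grid over the candidate orders with AIC = −2 log L + 2k computed for each fit; the toy log-likelihood below is a placeholder for an actual SARMA fit and is constructed only to exercise the selection loop.

```python
from itertools import product

def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * loglik + 2.0 * n_params

def select_order(loglik_fn, max_order=2):
    """Grid-search the orders (p, q, P, Q) up to max_order, counting
    p + q + P + Q coefficients plus one variance parameter, and keep the
    combination minimizing the AIC. loglik_fn stands in for fitting the
    SARMA model and returning its maximized log-likelihood."""
    best = None
    for p, q, P, Q in product(range(max_order + 1), repeat=4):
        score = aic(loglik_fn(p, q, P, Q), p + q + P + Q + 1)
        if best is None or score < best[0]:
            best = (score, (p, q, P, Q))
    return best[1]

# Toy log-likelihood peaking at orders (1, 1, 2, 2), purely to exercise
# the selection loop (not derived from any real data).
def toy_loglik(p, q, P, Q):
    return -2.0 * (abs(p - 1) + abs(q - 1) + abs(P - 2) + abs(Q - 2))

order = select_order(toy_loglik)
```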
Using the identified models, we apply the proposed Bayesian analysis to conduct the posterior and predictive analysis of the two real-world time series datasets, choosing the hyperparameters as in the simulation study. Table 8 summarizes the Bayesian results for the average soil temperature data, and the corresponding marginal predictive densities are displayed in Figure 4. On the other hand, the Bayesian results for the differenced production index are presented in Table 9, and their marginal predictive densities are displayed in Figure 5. From the Bayesian forecasts of the next six future values of the average soil temperature data (displayed in Figure 4) and those of the next twelve future values of the production index data (displayed in Figure 5), we can confirm the applicability of our proposed Gibbs sampling algorithm to real-world time series.

Conclusions
In this paper we first used the nonlinear least squares (NLS) method to estimate the unobserved errors in multiplicative SARMA models and accordingly approximate the likelihood function. We employed conjugate priors on the model parameters and initial values to show that the conditional posterior distributions of the model parameters and the variance are multivariate normal and inverse gamma respectively, and that the conditional predictive distribution of the future observations is a multivariate normal. Exploiting the fact that the conditional posterior and predictive densities are standard distributions, we used the Gibbs sampling algorithm to present a Bayesian method for estimating the SARMA model parameters and obtaining multiple-step ahead predictions. In particular, we applied the Gibbs sampling algorithm to approximate empirically the marginal posterior and predictive distributions, and monitored its convergence using several diagnostics. Accordingly, we computed the posterior estimates of the model parameters and the Bayesian forecasts of the future values directly as sample averages of the Gibbs sampling chains. The empirical results of the simulated and real-world time series datasets confirmed the accuracy of the proposed Bayesian method. Future work may include model identification using stochastic search variable selection, outlier detection, and extension to multivariate seasonal time series models.