Data-driven Analysis and Prediction of COVID-19 Infection in Southeast Asia using A Phenomenological Model

COVID-19 has spread throughout the world, including Southeast Asia. Many studies have made predictions using various models, but very few of them are data-driven. Because the COVID-19 pandemic is still ongoing, a data-driven approach with phenomenological models is well suited to it. This paper aimed to obtain effective forecasting models and then to predict when COVID-19 in Southeast Asia will peak and end, using daily cumulative case data. The research applied the Richards curve and the logistic growth model, and combined the two to predict COVID-19 cases in Southeast Asia, both in countries with one pandemic wave and in those with more than one. The best prediction results were obtained using the Richards curve with the logistic growth model parameters used as the initial values. In the best scenario, the Southeast Asia region is expected to be free from the COVID-19 pandemic at the end of 2021. These modeling results are expected to provide information for the provision of health facilities and for handling infectious disease outbreaks in the future.

beginning of the pandemic, and the second group consisting of countries that had more than one pandemic wave in the sense that the confirmed cases had decreased, but then there was a significant addition of confirmed cases and so on.
The first COVID-19 case in Southeast Asia was detected in Thailand on January 13, 2020. By April 21, 2021, the confirmed cases of COVID-19 in Southeast Asia had reached 3,172,104 cumulative cases, with 64,182 deaths and 2,814,972 recoveries. The COVID-19 cases in Thailand, Malaysia, and Cambodia were discussed by Okada et al. (2020), Azlan et al. (2020), and Manning et al. (2020). Studies of COVID-19 in Southeast Asia include contact tracing in Singapore, which traced anyone who had been in contact with patients who tested positive in order to minimize overall risk in the community (Pung et al., 2020). Similar measures were reported by Quach and Hoang (2020) and Valitutto et al. (2020) in Vietnam and Myanmar, namely mass testing, direct and expansive contact tracing, self-isolation, and sterilization. According to Wong et al. (2020), the initial findings from the COVID-19 cases in Brunei showed that 12% of cases were asymptomatic and 30% were presymptomatic, with social distancing as a prevention effort. To control the epidemic, the Indonesian government implemented various control policies, including social and physical distancing, working and studying from home, and stricter restrictions in some epicenter areas (Large Scale Social Restriction, abbreviated as PSBB in Bahasa Indonesia) (Sifriyani and Rosadi, 2020).
Many researchers have developed phenomenological models of disease outbreaks because the wider community can quickly understand their results. A phenomenological model of Ebola was previously studied by Pell et al. (2018), who used a logistic growth model to produce short-term forecasts and to determine the basic reproduction number. Determining the basic reproduction number is important for assessing the level of disease spread, as done by Zuhairoh et al. (2021). The Richards model was previously studied by Hsieh (2009, 2010) for SARS and H1N1 disease outbreaks. We have also used the Richards model for the COVID-19 outbreak in South Sulawesi, Indonesia (Zuhairoh and Rosadi, 2020), where the results accurately predicted the pattern of the spread of COVID-19 cases. Furthermore, Wang et al. (2016) developed the gray Richards model to predict the growth of rice leaves and amoeba cell growth.
To facilitate public understanding of the analysis of the COVID-19 data, this paper applied the Richards curve and the logistic growth model. The two models were used to model COVID-19 cases in Southeast Asian countries that have experienced either one or more than one pandemic wave. If a country had more than one pandemic wave, data on the last wave were used to predict when the pandemic will end. The best model was then used for countries that were still at the peak of the pandemic, to predict when the peak will occur, when the pandemic will end, and the maximum number of cases, so that the results can be used as input for dealing with the COVID-19 pandemic in the respective countries. In this paper, the authors used the parameter estimates from the logistic growth model as the initial assumptions for estimating the Richards model parameters, resulting in better parameter estimates with a data-driven approach.

Data
We used ECDC as a data source that provides COVID-19 case data every day in all the countries in the world. We collected data on the daily reported cases from the report's initial date until March 31, 2021. We then estimated the pandemic's trajectory in Southeast Asia, which should lead to the pandemic's peak according to the latest data.
The countries in the Southeast Asian region were divided into two groups: the first group consisting of countries that still experienced one pandemic wave, namely Indonesia, Singapore, Malaysia, and Myanmar. The data used were data from the beginning of the pandemic (when the first case was detected) in each country until March 31, 2021. Meanwhile, the second group consisted of countries that had more than one wave, namely the Philippines, Thailand, Vietnam and Cambodia. The data used in these countries were from the last wave until March 31, 2021, excluding data from the first wave to make the prediction results more accurate.

Models
The research applied two phenomenological models, namely the Logistic Growth Model (LGM) and Richards curve. Both of these models have been widely used to model various other infectious disease outbreaks. Richards curve is an extension of a simple LGM with additional scaling parameters (Hsieh, 2010).
In 1838, Verhulst first introduced the LGM to model population growth (Hsieh, 2009). The differential equation of the LGM is

$$\frac{dC(t)}{dt} = r\,C(t)\left(1 - \frac{C(t)}{K}\right) \qquad (1)$$

where C(t) denotes the number of cumulative cases at time t, K denotes the expected final epidemic size (total number of cases), t denotes time, and r is the per capita growth rate. Because the logistic differential equation is autonomous, its general solution can be obtained by separation of variables. Setting the right-hand side to zero gives the constant solutions C(t) = 0 and C(t) = K. According to the first solution, if there is no virus present, the population will never grow. According to the second solution, once the population reaches its carrying capacity, it will never change. For the non-constant solutions, separating the variables and integrating both sides gives Equation (2), the solution of Equation (1):

$$C(t) = \frac{K}{1 + b\,e^{-rt}} \qquad (2)$$

where C(0) = K/(1 + b) is the initial value.
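As a quick numerical check of the logistic model described above, the following Python sketch compares the closed-form curve C(t) = K / (1 + b e^{-rt}) with the right-hand side of the logistic differential equation dC/dt = r C (1 - C/K). The function names and the parameter values are illustrative assumptions, not quantities taken from this study.

```python
import math

def logistic_cumulative(t, K, r, b):
    """Closed-form logistic curve: C(t) = K / (1 + b * exp(-r * t))."""
    return K / (1.0 + b * math.exp(-r * t))

def logistic_rate(C, K, r):
    """Right-hand side of the logistic ODE: dC/dt = r * C * (1 - C / K)."""
    return r * C * (1.0 - C / K)

# Illustrative (made-up) parameter values, not estimates from the paper.
K, r, b = 1000.0, 0.2, 99.0

# The initial value follows from the closed form: C(0) = K / (1 + b).
c0 = logistic_cumulative(0.0, K, r, b)

# Central finite difference of the closed form at t = 10 ...
t, h = 10.0, 1e-5
numeric_slope = (logistic_cumulative(t + h, K, r, b)
                 - logistic_cumulative(t - h, K, r, b)) / (2.0 * h)

# ... should agree with the ODE evaluated at C(10).
ode_slope = logistic_rate(logistic_cumulative(t, K, r, b), K, r)

print(c0, numeric_slope, ode_slope)
```

The two slopes agree up to finite-difference error, which is what one expects if the closed form solves the differential equation.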
The Richards model is a development of the LGM with an additional parameter a, which measures the deviation from the symmetric simple logistic curve (Roosa et al., 2020). The differential equation of the Richards model is

$$\frac{dC(t)}{dt} = r\,C(t)\left[1 - \left(\frac{C(t)}{K}\right)^{a}\right] \qquad (3)$$

The same procedure used for Equation (2) is followed to obtain Equation (4), the analytic solution of the Richards model (Hsieh, 2009):

$$C(t) = K\left[1 + a\,e^{-ar(t - t_i)}\right]^{-1/a} \qquad (4)$$

The Richards model involves the quantities K, r, a, t, and t_i. The value of C(t) in each country was the cumulative number of cases in that country at time t. K denotes the total case number of the outbreak, r denotes the growth rate of the infected population, a denotes the exponent of deviation from the standard logistic curve, t denotes time, and t_i denotes the inflection point. The Richards curve parameters can be estimated from data on patients who tested positive for COVID-19.
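The relationship between the Richards model and the logistic curve can be illustrated with a short Python sketch. It assumes the differential equation dC/dt = r C [1 - (C/K)^a] and the closed form C(t) = K [1 + a e^{-a r (t - t_i)}]^{-1/a}, which is the solution of that equation whose inflection point lies at t_i; the exact parameterization used in the study may differ. All names and numbers below are illustrative.

```python
import math

def richards_cumulative(t, K, r, a, t_i):
    """Assumed closed form: C(t) = K * (1 + a * exp(-a*r*(t - t_i)))**(-1/a)."""
    return K * (1.0 + a * math.exp(-a * r * (t - t_i))) ** (-1.0 / a)

def richards_rate(C, K, r, a):
    """Right-hand side of the Richards ODE: dC/dt = r * C * (1 - (C/K)**a)."""
    return r * C * (1.0 - (C / K) ** a)

# Illustrative (made-up) parameter values, not estimates from the paper.
K, r, a, t_i = 1000.0, 0.2, 0.5, 30.0

# At the inflection point the curve takes the value K * (1 + a)**(-1/a).
c_inflection = richards_cumulative(t_i, K, r, a, t_i)

# With a = 1 the curve reduces to the symmetric logistic curve.
t = 25.0
richards_a1 = richards_cumulative(t, K, r, 1.0, t_i)
logistic_value = K / (1.0 + math.exp(-r * (t - t_i)))

# Finite-difference check that the closed form satisfies the ODE.
h = 1e-5
numeric_slope = (richards_cumulative(t + h, K, r, a, t_i)
                 - richards_cumulative(t - h, K, r, a, t_i)) / (2.0 * h)
ode_slope = richards_rate(richards_cumulative(t, K, r, a, t_i), K, r, a)

print(c_inflection, richards_a1, logistic_value, numeric_slope, ode_slope)
```

Values of a other than 1 make the curve asymmetric around t_i, which is the extra flexibility the Richards model adds to the LGM.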
The procedure used to obtain the prediction results with the Richards curve model in this study is as follows.
1. Prepare a data set consisting of the time and the number of individuals infected with COVID-19 each day.
2. Determine the initial assumptions of each parameter based on existing data.
3. Estimate the LGM parameters with the non-linear least squares approach.
4. Use the results of step 3 as the initial assumptions in the Richards curve model.
5. Estimate the parameters of the Richards curve model by using the non-linear least squares approach with the initial assumptions from step 4.
6. Substitute the estimated parameter values obtained in step 5 into Equation (4).
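The procedure above can be sketched in Python with `scipy.optimize.curve_fit`, which performs non-linear least squares. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the rule used for the initial guesses is hypothetical, and the Richards closed form is the assumed parameterization with inflection point t_i.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, K, r, b):
    """LGM closed form: C(t) = K / (1 + b * exp(-r * t))."""
    return K / (1.0 + b * np.exp(-r * t))

def richards(t, K, r, a, t_i):
    """Assumed Richards closed form with inflection point t_i."""
    return K * (1.0 + a * np.exp(-a * r * (t - t_i))) ** (-1.0 / a)

# Prepare the data: time and cumulative cases.
# Synthetic data for illustration only, not ECDC data.
t = np.arange(120, dtype=float)
rng = np.random.default_rng(0)
noise = 1.0 + 0.01 * rng.standard_normal(t.size)
cases = richards(t, 50_000.0, 0.12, 0.6, 60.0) * noise

# Initial assumptions derived from the data (a hypothetical rule of thumb).
K0 = 1.1 * cases.max()
r0 = 0.1
b0 = K0 / cases[0] - 1.0  # from C(0) = K / (1 + b)

# Estimate the LGM parameters by non-linear least squares.
lgm_params, _ = curve_fit(logistic, t, cases, p0=[K0, r0, b0])
K_l, r_l, b_l = lgm_params

# Use the LGM estimates as the starting point for the Richards fit.
# With a = 1 and t_i = ln(b) / r the Richards curve equals the fitted LGM.
richards_p0 = [K_l, r_l, 1.0, np.log(b_l) / r_l]
richards_params, _ = curve_fit(richards, t, cases, p0=richards_p0)

def rss(model, params):
    """Residual sum of squares of a fitted model on the data."""
    return float(np.sum((cases - model(t, *params)) ** 2))

rss_lgm = rss(logistic, lgm_params)
rss_richards = rss(richards, richards_params)
print("LGM:", lgm_params, "RSS:", rss_lgm)
print("Richards:", richards_params, "RSS:", rss_richards)
```

Starting the Richards fit at a = 1 means the optimizer begins exactly at the fitted logistic curve, so the extra parameter can only keep or reduce the residual sum of squares.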

Result
The countries in Southeast Asia in this study were grouped according to the number of COVID-19 waves that occurred in each country, as seen in the plot of confirmed COVID-19 cases in Figure 1. A country with only one peak was included in group 1, and a country with more than one peak was included in group 2.
Figure 1: Plot of Confirmed COVID-19 Cases.
Figure 1 shows that the countries that had only one wave were Indonesia, Singapore, Malaysia, and Myanmar. Meanwhile, the countries with more than one wave were the Philippines, Thailand, Vietnam, and Cambodia. In the first step, the initial assumptions used in the LGM, given in Table 1, were obtained from the data on COVID-19 cases in each country.
Using the initial parameters from Table 1, we estimated the LGM parameters using the non-linear least squares approach. The curves were plotted with both the initial and the optimal LGM parameters, and the LGM parameter estimates were then used as the initial assumptions to fit the Richards curve, as shown in Figure 2.
Figure 2 shows that, in group 1, Indonesia and Malaysia were still at the peak of the pandemic, while Singapore and Myanmar were already at the end of the pandemic period. In group 2, the Philippines and Cambodia still experienced an increase in confirmed cases, while Thailand and Vietnam had started to experience a decrease in confirmed cases of COVID-19. The curves of the two models follow the pattern of the actual data, which indicates that the initial assumptions for the parameters were appropriate, so the parameter estimates of each model can be used to predict COVID-19 cases.
The prediction results using both models can be seen in Figure 3. Figure 3 shows that, for the countries that had only one wave of COVID-19, the predictions with the LGM were higher than those with the Richards model. Meanwhile, in group 2 and in the countries that were still at the peak of the pandemic, the predictions with the Richards model were higher than those with the LGM. With the parameter estimates from the LGM used as the initial assumptions in the Richards model, the prediction results for several countries in Southeast Asia with the Richards curve were almost the same as the actual data in the field. However, the predictions were only made for countries that were still at the peak of the pandemic, namely Indonesia, Malaysia, the Philippines, and Cambodia.
Using cumulative data up to March 31, 2021, and the initial value of each parameter from the LGM, together with the additional parameter a, which gives the S-shaped solution curve more flexibility in its curvature, we obtained the prediction curves presented in Figure 3. Figure 3 shows that the pandemic period in Indonesia will end in December 2021, while that in Malaysia will end in July 2021, because the two countries have passed the peak of the pandemic, which occurred in February 2021. Meanwhile, the Philippines and Cambodia were still at the peak of the pandemic, which was expected to occur in March 2021, and the pandemic in these two countries was predicted to end in June 2021.
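To illustrate how a peak date and an end date can be read off a fitted Richards curve, the sketch below computes predicted daily new cases as C(day) - C(day - 1), takes the day with the largest value as the peak, and takes the first later day on which fewer than one new case is predicted as the end. The end-of-pandemic threshold, the start date, and the parameter values are all assumptions made for this example; the text does not state the criterion used in the study.

```python
import math
from datetime import date, timedelta

def richards_cumulative(t, K, r, a, t_i):
    """Assumed Richards closed form with inflection point t_i."""
    return K * (1.0 + a * math.exp(-a * r * (t - t_i))) ** (-1.0 / a)

def daily_new_cases(day, K, r, a, t_i):
    """Predicted new cases on a given day: C(day) - C(day - 1)."""
    return (richards_cumulative(day, K, r, a, t_i)
            - richards_cumulative(day - 1, K, r, a, t_i))

def peak_and_end(K, r, a, t_i, threshold=1.0, horizon=2000):
    """Return (peak_day, end_day); end_day is None if not reached in horizon.

    threshold is a hypothetical cut-off: the pandemic is treated as over on
    the first day after the peak with fewer predicted new cases than this.
    """
    peak_day = max(range(1, horizon),
                   key=lambda d: daily_new_cases(d, K, r, a, t_i))
    end_day = next((d for d in range(peak_day, horizon)
                    if daily_new_cases(d, K, r, a, t_i) < threshold), None)
    return peak_day, end_day

# Illustrative (made-up) parameters and a hypothetical first day of data.
K, r, a, t_i = 1000.0, 0.2, 0.5, 30.0
first_day = date(2020, 3, 1)

peak_day, end_day = peak_and_end(K, r, a, t_i)
peak_date = first_day + timedelta(days=peak_day)
end_date = first_day + timedelta(days=end_day)
print("peak:", peak_date, "end:", end_date)
```

A different threshold, for example a fraction of K, would shift the end date, so any reported end date should be read together with the criterion that produced it.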
For the four countries modeled with the Richards curve, the parameter estimates obtained using the non-linear approach are shown in Table 2. The Richards curve parameter estimates in Table 2 determine when the pandemic in a country will end and when the pandemic peak occurs, by substituting these values into Equation (4). The K value in Table 2 shows the maximum number of COVID-19 cases in each country. Table 3 presents the long-term prediction results, including when the pandemic peak will occur, when the pandemic will end, and the maximum number of cases in countries that were still at the peak of the pandemic, namely Indonesia, Malaysia, the Philippines, and Cambodia. Table 4 presents the short-term prediction results. To assess the accuracy of the short-term predictions, the mean absolute percentage error (MAPE) was calculated for each country; the results are summarized in Table 4. For three countries, namely Indonesia, Malaysia, and the Philippines, a MAPE value of < 10% was obtained, which means a highly accurate prediction result, while for Cambodia a MAPE value of 10% to 20% was obtained, which means a good prediction result.
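For reference, the MAPE can be computed as the mean of |actual - predicted| / |actual|, expressed as a percentage. The Python sketch below uses made-up numbers rather than values from Table 4, and the helper `interpret_mape` is a hypothetical function that only encodes the two accuracy ranges quoted above.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)

def interpret_mape(value):
    """Map a MAPE value to the accuracy labels quoted in the text."""
    if value < 10.0:
        return "highly accurate"
    if value <= 20.0:
        return "good"
    return "outside the ranges quoted in the text"

# Made-up cumulative case counts, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

score = mape(actual, predicted)
label = interpret_mape(score)
print(round(score, 2), label)
```

Because each error is divided by the actual value, the MAPE is undefined when an actual value is zero, which is not a concern for cumulative counts after the first detected case.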

Discussion
The plots of confirmed positive COVID-19 case data for each country in the Southeast Asian region are given in Figure 1. Based on Figure 1, it can be seen that the data plots using both models with the initial assumptions in Table 1 resulted in the same shape. The data used to produce Figure 1 run from the detection of the first confirmed case in each country to the highest number of cases during the pandemic period.
After the parameters were estimated with the non-linear least squares approach, the actual data were entered into the equations of each model, and the results can be seen in Figure 2. The predictions using the Richards curve for the eight countries showed better results than those using the LGM. The Richards formula adds the parameter a, which is the magnitude of the deviation from the standard logistic curve. To determine which of the two models was suitable for each country, we calculated the Akaike Information Criterion (AIC), given in Table 5. Table 5 shows that the AIC value for the Richards curve was smaller than that for the logistic growth model. To get a better prediction result, the model with the smaller AIC value was used (Bozdogan, 2000), so for countries that were still experiencing the pandemic, such as Indonesia, Malaysia, the Philippines, and Cambodia, only the Richards curve was used to find out when the pandemic will end, the estimated maximum number of cases in each country, and the peak period of the pandemic. Using the initial assumptions in Table 1, plots of the actual data and the Richards curves can be obtained using data up to March 31, 2021. The results can be seen in Figure 3. If the curve has not flattened, the spread of the disease will continue to increase. From the four data plots, it can be seen that Singapore and Myanmar had almost passed the peak of the pandemic. We then used the same parameter estimation with the non-linear least squares approach (Hsieh, 2010) to obtain Figure 3. Figure 3 also shows that the peak of the pandemic in Indonesia occurred in February 2021 and that the pandemic will end around December 2021, with the maximum number of cases in Indonesia between 1,500,000 and 2,000,000, while the peak of the pandemic in Malaysia occurred in February 2021 and the pandemic is expected to end in July 2021 with a maximum of 300,000 to 350,000 cases of COVID-19.
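The model comparison described above can be illustrated with a small Python sketch. It assumes the least-squares form of the criterion, AIC = n ln(RSS / n) + 2k, where n is the number of observations, RSS is the residual sum of squares, and k is the number of estimated parameters; the text does not state which form of the AIC was used, and the RSS values below are made up.

```python
import math

def aic_least_squares(rss, n_obs, n_params):
    """AIC for a least-squares fit (assumed form): n * ln(RSS / n) + 2 * k."""
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    return n_obs * math.log(rss / n_obs) + 2 * n_params

# Made-up residual sums of squares for two fits to the same 100 data points.
n_obs = 100
aic_lgm = aic_least_squares(rss=500.0, n_obs=n_obs, n_params=3)       # K, r, b
aic_richards = aic_least_squares(rss=300.0, n_obs=n_obs, n_params=4)  # K, r, a, t_i

# The model with the smaller AIC is preferred.
preferred = "Richards" if aic_richards < aic_lgm else "LGM"
print(round(aic_lgm, 2), round(aic_richards, 2), preferred)
```

The 2k term penalizes the extra parameter of the Richards curve, so it is preferred only when its improvement in fit outweighs that penalty.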
However, because COVID-19 involves a lot of uncertainty, predictions may deviate from what they should be (Santosh, 2020). Some of the main uncertainties may come from unpredictable social factors and natural disasters that cause large numbers of people to gather, thus increasing the number of infected people, and from a lack of understanding of certain factors, e.g., hospital setting/capacity, the number of daily tests, travelers, and other social factors. We observed that the higher the population density, the wider the spread. As a consequence, each prediction did not come close to the actual value, nor did it produce consistent results.

Conclusion
The Richards curve provides predictions that are closer to the actual data, with a smaller AIC value, than the LGM. This was found after analyzing the countries in Southeast Asia, namely Indonesia, Singapore, Malaysia, Myanmar, the Philippines, Thailand, Vietnam, and Cambodia.
The countries that were still at the peak of the pandemic, namely Indonesia, Malaysia, the Philippines, and Cambodia, were predicted using the Richards curve. The best prediction results were obtained using the Richards curve with the LGM parameters as initial assumptions. The prediction results showed that the pandemic in Southeast Asia will end at the end of 2021, while the peak of the pandemic will occur between February and March 2021, with different details for each country depending on the number of confirmed cases.