Generalized Linear Models for Loss Calculation in General Insurance
Pakistan Journal of Statistics and Operation Research

In most cases, loss in general insurance is calculated from claim severity and claim frequency under an assumption of independence. In some cases, however, claim severity depends on claim frequency. This paper derives an aggregate loss calculation in which claim severity and frequency are modeled jointly, so the independence assumption is eliminated. We model average claim severity using claim frequency as a covariate to induce dependence between them, within the framework of generalized linear models. The aggregate loss is obtained after the parameter estimation process.


Introduction
To issue an insurance policy, an insurer needs a comprehensive calculation of the risk it may bear: in particular, how large the aggregate loss required to cover the policyholders' risk will be. Aggregate loss is composed of claim frequency and claim severity. Jorgensen and de Souza (1994) developed the loss calculation mathematically under the assumption that claim frequency and severity are independent; the model was later studied in depth by Quijano-Xacur and Garrido (2015). In practice, however, claim frequency and severity are often dependent. Frees and Wang (2006) introduced dependence between claim frequency and severity. Frees et al. (2011) then modeled average severity using frequency as a predictor. Czado et al. (2012) linked the marginal frequency to the severity using a copula. Shi et al. (2015) modeled the regression of average severities with claim frequency as a covariate and compared it against a mixed copula approach to constructing the joint distribution of frequency and severity.
Generalized Linear Models (GLMs) have commonly been applied to model insurance claims. Montgomery et al. (2012) describe the GLM as a unified regression framework encompassing the usual normal-theory linear regression models and nonlinear models such as logistic and Poisson regression. A fundamental assumption of the GLM is that the distribution of the response variable is a member of the exponential dispersion family (EDF). De Jong and Heller (2008) note that the GLM is popular in insurance because claim data more often follow a member of the EDF than the Normal distribution. Garrido et al. (2016) used GLMs to model the frequency and severity of non-life insurance claims both as independent and as dependent components.
This paper adopts the model developed by Garrido et al. (2016), which assumes that the claim frequency follows a Poisson distribution and the severity follows a Gamma distribution. However, the wide variety of policyholder characteristics affects the claim frequency and can cause over-dispersion: the variance of the data grows larger than the mean, which inflates the residuals. Therefore, the claim frequency distribution in Garrido's model needs further development. This paper uses the Negative Binomial distribution as the counting distribution for claim frequency.
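As a quick numerical illustration of the over-dispersion argument (our own sketch, not part of the original paper), the snippet below compares the mean and variance of a Poisson count with those of a Negative Binomial count of equal mean: for the Poisson the two coincide, while for the Negative Binomial the variance exceeds the mean by the factor 1/p.

```python
# Mean-variance comparison: Poisson vs Negative Binomial (illustrative sketch).
# Poisson(lam):          mean = var = lam.
# NB(r, p), counting failures before the r-th success:
#   mean = r(1-p)/p,  var = r(1-p)/p^2,  so var/mean = 1/p > 1 (over-dispersion).

def poisson_moments(lam):
    return lam, lam

def negbin_moments(r, p):
    mean = r * (1 - p) / p
    var = r * (1 - p) / p ** 2
    return mean, var

m_pois, v_pois = poisson_moments(2.0)
m_nb, v_nb = negbin_moments(r=2, p=0.5)  # chosen to match the Poisson mean
print(m_pois, v_pois)  # 2.0 2.0
print(m_nb, v_nb)      # 2.0 4.0
```

With equal means, any success probability p < 1 makes the Negative Binomial variance strictly larger, which is why it is the natural replacement for the Poisson when the data are over-dispersed.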

Loss Modeling
A loss event experienced by an insurance company is an accumulation of the claims submitted by policyholders. The company's claim frequency is uncertain; it is a discrete random variable $N$ taking non-negative integer values. Each claim has a random amount: the amount of the $j$-th claim is denoted by $X_j$, a continuous random variable. Hence the aggregate loss $S$ is given by
$$S = \sum_{j=1}^{N} X_j,$$
with $S = 0$ when $N = 0$. The successive claims are assumed to be independent and identically distributed. It is important to note that $N$ and the average severity $\bar{X} = S/N$ (defined for $N > 0$) are members of the EDF, which means their probability density functions follow
$$f(y; \vartheta, \varphi) = \exp\left\{\frac{y\vartheta - b(\vartheta)}{a(\varphi)} + c(y, \varphi)\right\},$$
where $a$, $b$, and $c$ are specified functions, $\vartheta$ is the canonical parameter, and $\varphi$ is the dispersion parameter. As in ordinary regression, we have a set of covariates as explanatory variables: let $\mathbf{x} = \{x_1, x_2, \ldots, x_p\}$ denote this set. GLMs for $N$ and $\bar{X}$ are used to describe their relations with the explanatory variables, which are not linear as in the general linear model. In general form, $\mu_N = E(N \mid \mathbf{x})$ and $\mu_{\bar{X}} = E(\bar{X} \mid N, \mathbf{x})$ are given by Eq. (1) and Eq. (2), respectively:
$$\mu_N = \exp(\mathbf{x}\beta), \qquad (1)$$
$$\mu_{\bar{X}} = \exp(\mathbf{x}\gamma + \theta N), \qquad (2)$$
where $\mathbf{x}$ is a $1 \times p$ vector of explanatory variables, and $\beta$ and $\gamma$ are $p \times 1$ vectors of regression coefficients explaining $\mu_N$ and $\mu_{\bar{X}}$, respectively. $\theta \in \mathbb{R}$ is the parameter representing the degree of dependence between $N$ and $\bar{X}$. For some $k \in \{1, 2, \ldots, p\}$, $\beta_k$ or $\gamma_k$ may be set to zero deliberately if the corresponding explanatory variable is known not to affect the given expected value. In this model, a log link relates the explanatory variables to the expected claim frequency and severity. Hence, for the mean of the average claim severity, we have
$$\ln(\mu_{\bar{X}}) = \mathbf{x}\gamma + \theta N, \qquad (3)$$
where $\mu_{\bar{X}}^{*} = \exp(\mathbf{x}\gamma)$ denotes the expected value of the average claim severity when the degree of dependence is $\theta = 0$ (i.e., frequency and severity are independent). Conditioning on $N$, we obtain
$$E(S) = E\!\left[N \, E(\bar{X} \mid N)\right] = \mu_{\bar{X}}^{*} \, E\!\left[N e^{\theta N}\right] = \mu_{\bar{X}}^{*} \, M_N'(\theta), \qquad (4)$$
where $M_N$ is the moment generating function of $N$ based on the GLM and $M_N'$ is its first derivative with respect to $\theta$.
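The identity $E(S) = \mu_{\bar{X}}^{*} M_N'(\theta)$ can be checked numerically. The sketch below is our own illustration with assumed parameter values: it uses a Poisson frequency, whose MGF $M_N(t) = \exp(\lambda(e^t - 1))$ gives $M_N'(\theta) = \lambda e^{\theta} \exp(\lambda(e^{\theta}-1))$ in closed form, and simulates the dependent model in which $\bar{X}\mid N$ is Gamma with mean $\mu^{*} e^{\theta N}$.

```python
import math
import random

random.seed(0)

def sample_poisson(lam):
    """Knuth's multiplication algorithm, adequate for small lam."""
    L, k, prod = math.exp(-lam), 0, random.random()
    while prod > L:
        k += 1
        prod *= random.random()
    return k

# Assumed illustrative parameters (not from the paper).
lam, theta, mu_star, alpha = 2.0, 0.2, 10.0, 3.0

total, n_sims = 0.0, 200_000
for _ in range(n_sims):
    n = sample_poisson(lam)
    if n > 0:
        # Conditional average severity: Gamma with mean mu_star*exp(theta*n)
        # and shape alpha*n (dispersion 1/(alpha*n), the mean of n claims).
        mean = mu_star * math.exp(theta * n)
        xbar = random.gammavariate(alpha * n, mean / (alpha * n))
        total += n * xbar          # S = N * Xbar
mc_mean = total / n_sims

# Theory: E(S) = mu_star * M_N'(theta), with M_N(t) = exp(lam*(e^t - 1)).
theory = mu_star * lam * math.exp(theta) * math.exp(lam * (math.exp(theta) - 1.0))
print(mc_mean, theory)
```

The Monte Carlo mean should agree with the closed form to well under a percent at this sample size, which is a useful sanity check before moving to the Negative Binomial case.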
It is relatively simple to derive the variance of the aggregate claims when $\theta = 0$, but for the dependent model it is more complicated and does not lead to a simple form. By the law of total variance,
$$\operatorname{Var}(S) = E\!\left[\operatorname{Var}(S \mid N)\right] + \operatorname{Var}\!\left(E(S \mid N)\right) = \varphi \, E\!\left[N \, V\!\left(\mu_{\bar{X}}^{*} e^{\theta N}\right)\right] + (\mu_{\bar{X}}^{*})^2 \left[ M_N''(2\theta) - \left(M_N'(\theta)\right)^2 \right],$$
where $\varphi$ is the dispersion parameter of the severity distribution in EDF representation, $V$ is the variance function of the severity, and $M_N''$ is the second derivative of $M_N$ with respect to $\theta$.
Suppose $N \sim \mathcal{NB}(r, p)$ and $\bar{X} \sim \mathcal{G}(\alpha, \lambda)$, where $\mathcal{NB}(r, p)$ denotes the Negative Binomial distribution counting the number of failures before the $r$-th success, with success probability $p$, $0 < p < 1$, and $\mathcal{G}$ denotes a Gamma distribution. Eq. (5) and Eq. (6) below are derived from the moment generating function of $N$,
$$M_N(t) = \left( \frac{p}{1 - (1-p)e^{t}} \right)^{r}, \qquad t < -\ln(1-p),$$
whose first derivative is
$$M_N'(t) = \frac{r(1-p)e^{t} \, p^{r}}{\left(1-(1-p)e^{t}\right)^{r+1}}.$$
Hence the expected value and the variance of the aggregate claims are given by Eq. (5) and Eq. (6), respectively:
$$E(S) = \mu_{\bar{X}}^{*} \, \frac{r(1-p)e^{\theta} \, p^{r}}{\left(1-(1-p)e^{\theta}\right)^{r+1}}, \qquad (5)$$
$$\operatorname{Var}(S) = (\mu_{\bar{X}}^{*})^2 \left[ \varphi \, M_N'(2\theta) + M_N''(2\theta) - \left(M_N'(\theta)\right)^2 \right], \qquad (6)$$
where $\varphi = 1/\alpha$ is the dispersion parameter of the Gamma severity.
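Under this reading of the Negative Binomial MGF, Eq. (5) and Eq. (6) can be verified by simulation. The sketch below is our own illustration with assumed parameter values: it evaluates $M_N'$ and $M_N''$ analytically and compares the closed-form mean and variance of $S$ with Monte Carlo estimates of the dependent Negative Binomial–Gamma model.

```python
import math
import random

random.seed(7)

# Assumed illustrative parameters; theta must satisfy 2*theta < -ln(1-p).
r, p, theta, mu_star, alpha = 2, 0.6, 0.1, 10.0, 3.0
q = 1.0 - p
phi = 1.0 / alpha                      # Gamma dispersion parameter

def M1(t):
    """First derivative of the NB moment generating function."""
    e = q * math.exp(t)
    return p ** r * r * e / (1.0 - e) ** (r + 1)

def M2(t):
    """Second derivative, M''(t) = M'(t) * (1 + (r+1) q e^t / (1 - q e^t))."""
    e = q * math.exp(t)
    return M1(t) * (1.0 + (r + 1) * e / (1.0 - e))

mean_theory = mu_star * M1(theta)                                        # Eq. (5)
var_theory = mu_star ** 2 * (phi * M1(2 * theta)
                             + M2(2 * theta) - M1(theta) ** 2)           # Eq. (6)

def sample_negbin():
    """Failures before the r-th success: sum of r geometric draws."""
    return sum(int(math.log(random.random()) / math.log(q)) for _ in range(r))

draws = []
for _ in range(300_000):
    n = sample_negbin()
    if n == 0:
        draws.append(0.0)
    else:
        m = mu_star * math.exp(theta * n)                 # E(Xbar | N = n)
        draws.append(n * random.gammavariate(alpha * n, m / (alpha * n)))

mc_mean = sum(draws) / len(draws)
mc_var = sum((s - mc_mean) ** 2 for s in draws) / (len(draws) - 1)
print(mean_theory, mc_mean)
print(var_theory, mc_var)
```

Both comparisons should agree within Monte Carlo error; note that the variance formula only exists when $2\theta$ stays below the MGF's radius of convergence, $-\ln(1-p)$.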

Parameter Estimation
As mentioned in the previous section, there are $p$ explanatory variables. Suppose there are $n$ policyholders; for $i \in \{1, 2, \ldots, n\}$, let $S_i$ be the total claim size and $\bar{X}_i = S_i / N_i$ when $N_i > 0$. Based on the GLM structure for the claim frequency and severity components of the aggregate claims, $E(\bar{X}_i \mid N_i)$ and $E(N_i)$ can be expressed through
$$\ln \mu_{N,i} = \mathbf{x}_i \beta, \qquad \ln \mu_{\bar{X},i} = \mathbf{x}_i \gamma + \theta N_i,$$
where $\mu_{N,i}$ and $\mu_{\bar{X},i}$ respectively denote the expected claim frequency and the expected average severity. Denote by $f_N$ and $f_{\bar{X}\mid N}$ the marginal density function of the frequency and the conditional density function of the severity, respectively. The likelihood function of the joint density is given by Eq. (7):
$$L(\beta, \gamma, \theta; \mathbf{N}, \bar{\mathbf{X}}) = \prod_{i=1}^{n} f_{\bar{X}\mid N}(\bar{x}_i \mid n_i) \, f_N(n_i). \qquad (7)$$
Hence, for a general EDF distribution, the log-likelihood of the joint density is
$$\ell(\beta, \gamma, \theta) = \sum_{i=1}^{n} \left[ \ln f_N(n_i) + \ln f_{\bar{X}\mid N}(\bar{x}_i \mid n_i) \right].$$
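The joint log-likelihood in Eq. (7) can be written down directly: each policyholder contributes $\ln f_N(n_i)$ plus, when $n_i > 0$, $\ln f_{\bar{X}\mid N}(\bar{x}_i \mid n_i)$. The sketch below is our own Python illustration (the paper itself reports using R); the dataset, size $r$, and shape $\alpha$ are assumed. It uses a Negative Binomial frequency with mean $\exp(\mathbf{x}_i\beta)$ and a Gamma conditional severity with mean $\exp(\mathbf{x}_i\gamma + \theta n_i)$ and shape $\alpha n_i$.

```python
import math

def negbin_logpmf(n, mu, r):
    """NB log-pmf with mean mu and size r, i.e. p = r/(r + mu), counting failures."""
    p = r / (r + mu)
    return (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
            + r * math.log(p) + n * math.log(1.0 - p))

def gamma_logpdf(x, shape, mean):
    """Gamma log-pdf parameterized by shape and mean (scale = mean/shape)."""
    scale = mean / shape
    return ((shape - 1.0) * math.log(x) - x / scale
            - shape * math.log(scale) - math.lgamma(shape))

def loglik(beta, gamma, theta, data, r=2, alpha=3.0):
    """Eq. (7): sum over policyholders of ln f_N(n_i) + ln f_{Xbar|N}(xbar_i | n_i)."""
    total = 0.0
    for x_freq, x_sev, n, xbar in data:
        mu_n = math.exp(sum(b * v for b, v in zip(beta, x_freq)))
        total += negbin_logpmf(n, mu_n, r)
        if n > 0:
            mu_x = math.exp(sum(g * v for g, v in zip(gamma, x_sev)) + theta * n)
            total += gamma_logpdf(xbar, alpha * n, mu_x)
    return total

# Tiny made-up dataset: (frequency covariates, severity covariates, N_i, Xbar_i).
data = [
    ((1.0, 0.5), (1.0, 0.2), 2, 11.3),
    ((1.0, 1.1), (1.0, 0.7), 0, 0.0),
    ((1.0, 0.3), (1.0, 1.5), 1, 9.8),
]
ll = loglik(beta=(0.1, 0.4), gamma=(2.0, 0.1), theta=0.15, data=data)
print(ll)
```

Because the likelihood factorizes as in Eq. (7), the frequency parameters $\beta$ and the severity parameters $(\gamma, \theta)$ can be maximized separately, which is what fitting two GLMs (the second with $N$ as an extra covariate) accomplishes.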

Results and Discussion
As an illustration of how this model calculates aggregate loss under the assumption that claim frequency and average severity are dependent, with the claim frequency following a Negative Binomial distribution and the average severity following a Gamma distribution, we present a fictive portfolio of 1000 policyholders. We also generated covariates $x_1$ and $x_2$ for claim frequency, and $x_3$ and $x_4$ for claim severity, all following a half-normal distribution.
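The data-generating scheme just described can be sketched as follows. All coefficient values, the size $r$, and the shape $\alpha$ below are assumptions for illustration; the paper's actual simulation values are not reproduced here.

```python
import math
import random

random.seed(1)

n_policy = 1000
r, alpha = 2, 3.0
beta = (0.3, 0.5)      # frequency coefficients (assumed)
gamma = (1.5, 0.4)     # severity coefficients (assumed)
theta = 0.2            # dependence parameter (assumed)

portfolio = []
for _ in range(n_policy):
    # Half-normal covariates: absolute values of standard normal draws.
    x1, x2 = abs(random.gauss(0, 1)), abs(random.gauss(0, 1))   # frequency
    x3, x4 = abs(random.gauss(0, 1)), abs(random.gauss(0, 1))   # severity
    # Frequency: NB(r, p) with mean exp(x beta), so p = r / (r + mean).
    mu_n = math.exp(beta[0] * x1 + beta[1] * x2)
    p = r / (r + mu_n)
    n = sum(int(math.log(random.random()) / math.log(1.0 - p)) for _ in range(r))
    # Average severity, conditional on N: Gamma with mean exp(x gamma + theta*N).
    xbar = 0.0
    if n > 0:
        mu_x = math.exp(gamma[0] * x3 + gamma[1] * x4 + theta * n)
        xbar = random.gammavariate(alpha * n, mu_x / (alpha * n))
    portfolio.append((x1, x2, x3, x4, n, xbar))

print(len(portfolio), sum(row[4] for row in portfolio) / n_policy)
```

A dataset of this shape is exactly what the two GLM fits below consume: the frequency model regresses $N$ on $(x_1, x_2)$, and the severity model regresses $\bar{X}$ on $(x_3, x_4)$ with $N$ as an additional covariate.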
After performing the calculation in R using the glm function with log links, we obtain the estimates; the standard errors and $p$-values for $\hat{\beta}$ and $\hat{\gamma}$ are shown in Table 1 and Table 2, respectively. Looking at the $t$-tests, the null hypothesis $H_0$ that the coefficient is zero cannot be rejected for $x_2$, so we cannot conclude that $x_2$ significantly explains $N$. For the same reason, $\hat{\gamma}_2$ is not significant in explaining $\bar{X}$. This may happen because the simulation used generated observations, or because of an inappropriate link function. On the other hand, we obtain good estimates for $\hat{\beta}_1$, $\hat{\gamma}_1$, and $\hat{\theta}$, which means the value of $\bar{X}$ is well explained. By inspection, $\hat{\theta} > 0$ implies that the average severity is positively correlated with the claim frequency.

Conclusion
This paper has described an aggregate claim model. The model rests on the assumption that claim severity and claim frequency are dependent, achieved by modeling the average claim severity conditional on the claim frequency. This assumption makes the model more flexible for real data than the independent one. Although the model itself is not new, this paper presents a new condition on it: the distribution of the claim frequency follows the Negative Binomial distribution. By its characteristics, the Negative Binomial distribution is better suited than the Poisson model to data with a heavier tail. Poisson regression frequently suffers from over-dispersion; the Negative Binomial distribution helps overcome this, especially when the data do not fit the Poisson distribution well but fit the Negative Binomial better.
In the simulation section, we saw that not all estimated parameters are significant in explaining claim severity and claim frequency. However, the model estimates the degree of dependence very well: the estimate is not rejected by the statistical test even though the observations of $\bar{X}$ and $N$ were generated separately. Theoretically, the model under the Negative Binomial distribution is clearly more complicated than the Poisson one. Nevertheless, it would be worthwhile to develop this model further with other distributions, whether discrete or continuous. It would also be interesting to consider other ways of capturing the dependence between claim frequency and average severity beyond the linear form described in this paper.