The Log-Balakrishnan-Alpha-Skew-Normal Distribution and Its Applications

In this paper, the log-Balakrishnan-alpha-skew-normal distribution is proposed following the methodology of Venegas et al. (2016). Some of its basic distributional properties, including the moments, are also discussed. The appropriateness of this distribution is checked by performing data-fitting experiments and comparing it with some other known distributions using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The likelihood ratio test is used to discriminate between the log-normal and the proposed distributions.


Introduction
The log-normal distribution is preferred over the normal distribution for modelling random variables with positive support. Typical uses of the log-normal distribution are found in descriptions of fatigue failure, failure rates, and other phenomena involving a large range of data. Another common application, in finance, is the analysis of stock prices. A random variable $Y$ follows the log-normal distribution $LN(\mu, \sigma^2)$ with two parameters $(\mu, \sigma^2)$ if its probability density function (pdf) is defined as
$$f(y) = \frac{1}{\sigma y}\,\varphi\!\left(\frac{x-\mu}{\sigma}\right),$$
where $x = \log(y)$, $y > 0$, $\mu \in \mathbb{R}$, $\sigma > 0$ and $\varphi(\cdot)$ is the standard normal pdf. Log-normal distributions are positively skewed with long right tails due to low mean values and high variances in the random variables.
Vistelius (1960) showed that the chemical element concentrations in soil samples follow an asymmetric distribution. Ahrens (1953, 1954a, 1954b) studied chemical element concentrations using many data sets with positive asymmetry. The log-skew-normal distribution, which has positive support and is derived from the skew-normal distribution, was used by Mateu-Figueras et al. (2004) to deal with geochemical data, since the support of the skew-normal distribution is the whole real line. This distribution was also used by Azzalini et al. (2003) for family income data.
Azzalini (1985) introduced the skew-normal distribution with asymmetry parameter $\lambda$ and pdf given by
$$f_Z(z;\lambda) = 2\,\varphi(z)\,\Phi(\lambda z); \quad z, \lambda \in \mathbb{R}, \tag{1}$$
where $\varphi(\cdot)$ is defined above and $\Phi(\cdot)$ is the cumulative distribution function (cdf) of $N(0,1)$. A useful generalization of the skew-normal distribution was proposed by Balakrishnan (2002), as a discussant of Arnold and Beaver (2002), who also studied some of its properties. The pdf of this distribution is
$$f_Z(z;\lambda,n) = \frac{\varphi(z)\,[\Phi(\lambda z)]^n}{C_n(\lambda)},$$
where $n$ is a positive integer and $C_n(\lambda) = E(\Phi^n(\lambda U))$, $U \sim N(0,1)$. If $n = 1$, this Balakrishnan skew-normal distribution reduces to the skew-normal distribution of Azzalini (1985).
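The reduction to Azzalini's density for $n = 1$ can be checked numerically. The following is a minimal sketch, assuming the Balakrishnan density has the form $\varphi(z)[\Phi(\lambda z)]^n / C_n(\lambda)$; the function names (`c_n`, `bsn_pdf`, `sn_pdf`) and the trapezoidal quadrature are illustrative choices, not part of the original paper.

```python
import math

def phi(z):
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard normal cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def c_n(lam, n, lo=-10.0, hi=10.0, steps=20000):
    """Trapezoidal approximation of C_n(lambda) = E[Phi(lambda*U)^n], U ~ N(0,1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * Phi(lam * u) ** n * phi(u)
    return total * h

def bsn_pdf(z, lam, n):
    """Balakrishnan skew-normal pdf (assumed form, see lead-in)."""
    return phi(z) * Phi(lam * z) ** n / c_n(lam, n)

def sn_pdf(z, lam):
    """Azzalini skew-normal pdf of eqn. (1)."""
    return 2.0 * phi(z) * Phi(lam * z)

if __name__ == "__main__":
    # For n = 1 the normalizing constant is 1/2, so both densities coincide.
    print(c_n(1.5, 1), bsn_pdf(0.7, 1.5, 1), sn_pdf(0.7, 1.5))
```

Since $C_1(\lambda) = 1/2$ for every $\lambda$ by symmetry, the factor $1/C_1(\lambda)$ equals the constant 2 in eqn. (1).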
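Using the idea of Huang and Chen (2007), Elal-Olivero (2010) developed a new form of skew distribution, known as the alpha-skew-normal distribution, which exhibits both unimodal and bimodal behavior and has the pdf
$$f_Z(z;\alpha) = \frac{(1-\alpha z)^2+1}{2+\alpha^2}\,\varphi(z); \quad z, \alpha \in \mathbb{R}.$$
The Balakrishnan-type extension of this density, on which the distribution proposed here is built, has the pdf
$$f_Z(z;\alpha) = \frac{[(1-\alpha z)^2+1]^2}{C_2(\alpha)}\,\varphi(z); \quad z, \alpha \in \mathbb{R},$$
where $C_2(\alpha) = 4 + 8\alpha^2 + 3\alpha^4$.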
The aims of this paper are, first, to introduce the log-Balakrishnan-alpha-skew-normal distribution using the idea of Venegas et al. (2016) and discuss its basic properties; second, to apply this new distribution, which is flexible enough to capture both unimodal and bimodal behavior, to real-life datasets; and third, to establish the suitability of the proposed distribution over a few other known distributions.
The rest of this paper is organized as follows. In Section 2, we introduce a new form of the log-alpha-skew-normal distribution and study its mathematical properties. The estimation of parameters and two real-life data modelling applications that illustrate the usefulness of the new distribution are presented in Section 3. Finally, concluding remarks are given in Section 4.

The log-Balakrishnan-alpha-skew-normal distribution
In this section we define a new form of the log-alpha-skew-normal distribution and study some of its distributional properties.
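For reference, the density implied by the log-transformation approach of Venegas et al. (2016) can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation of the paper's equations: it assumes that $Z$ has pdf $[(1-\alpha z)^2+1]^2\varphi(z)/C_2(\alpha)$ with $C_2(\alpha) = 4 + 8\alpha^2 + 3\alpha^4$, that $Y = e^{Z}$ in the one-parameter case, and that $Y = e^{\mu + \sigma Z}$ in the location-scale case. The shorthand $w$ is introduced here for illustration.

```latex
% One-parameter case: Y = exp(Z)
f(y;\alpha) = \frac{\left[(1-\alpha \log y)^2 + 1\right]^2}{C_2(\alpha)\, y}\,
              \varphi(\log y), \qquad y > 0,\ \alpha \in \mathbb{R}.

% Location-scale case: Y = exp(mu + sigma Z), with w = (log y - mu) / sigma
f(y;\mu,\sigma,\alpha) = \frac{\left[(1-\alpha w)^2 + 1\right]^2}{C_2(\alpha)\,\sigma\, y}\,
              \varphi(w), \qquad y > 0,\ \mu,\alpha \in \mathbb{R},\ \sigma > 0.
```

Setting $\alpha = 0$ gives $[(1)^2+1]^2 / C_2(0) = 4/4 = 1$, so both expressions reduce to the log-normal pdf, consistent with the nesting used later in the likelihood ratio test.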

Plots of the pdf
The pdf of the $LBASN_2(\alpha)$ distribution is plotted in Figure 1 for different choices of the parameter $\alpha$. Figure 1 shows that the distribution is positively skewed and that higher skewness and kurtosis occur for $0 < \alpha < 2$. Note that the curves in the two plots of Figure 1 look different because the vertical axes are scaled differently.
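Curves like those in Figure 1 can be produced by evaluating the density on a grid. The sketch below assumes the one-parameter pdf $[(1-\alpha\log y)^2+1]^2\varphi(\log y)/(C_2(\alpha)\,y)$ with $C_2(\alpha) = 4 + 8\alpha^2 + 3\alpha^4$; the names `lbasn2_pdf` and `total_mass` are hypothetical helpers, and the mass check uses the substitution $y = e^{x}$.

```python
import math

def c2(alpha):
    """Normalizing constant C_2(alpha) = 4 + 8 alpha^2 + 3 alpha^4."""
    return 4.0 + 8.0 * alpha**2 + 3.0 * alpha**4

def lbasn2_pdf(y, alpha):
    """Assumed LBASN_2(alpha) density at y (zero for y <= 0)."""
    if y <= 0.0:
        return 0.0
    x = math.log(y)
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return ((1.0 - alpha * x) ** 2 + 1.0) ** 2 * phi / (c2(alpha) * y)

def total_mass(alpha, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoidal integral of the pdf over y = exp(x), x in [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = math.exp(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * lbasn2_pdf(y, alpha) * y  # dy = y dx
    return total * h

if __name__ == "__main__":
    for alpha in (-2.0, 0.5, 1.5, 3.0):
        curve = [lbasn2_pdf(y, alpha) for y in (0.25, 0.5, 1.0, 2.0, 4.0)]
        print(alpha, total_mass(alpha), curve)
```

The grid values can be passed to any plotting tool; the integral check is only a sanity test that the assumed constant normalizes the density.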

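Mode of $LBASN_2(\alpha)$:
Here, we numerically verify that the $LBASN_2(\alpha)$ distribution has at most two modes. First, differentiating the pdf $f(y;\alpha)$ of the $LBASN_2(\alpha)$ distribution with respect to $y$, and writing $x = \log y$, we get
$$f'(y;\alpha) = -\frac{[(1-\alpha x)^2+1]\,\varphi(x)}{C_2(\alpha)\,y^2}\left[(1+x)\{(1-\alpha x)^2+1\} + 4\alpha(1-\alpha x)\right].$$
The contour of the equation $f'(y;\alpha) = 0$ is drawn in Figure 2 to check whether the $LBASN_2(\alpha)$ distribution has at most two modes. There are at most three zeros of $f'(y;\alpha)$, which shows that the $LBASN_2(\alpha)$ distribution has at most two modes. Also, for $-0.70 < \alpha < 1.25$, $LBASN_2(\alpha)$ remains unimodal.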
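The mode count can also be checked without a contour plot. Under the assumed pdf $[(1-\alpha x)^2+1]^2\varphi(x)/(C_2(\alpha)\,y)$ with $x = \log y$, the stationary points are the real zeros of the cubic $\alpha^2 x^3 + (\alpha^2 - 2\alpha)x^2 + (2 - 2\alpha - 4\alpha^2)x + (2 + 4\alpha)$, because the remaining factors of the derivative are strictly positive. A cubic has three distinct real zeros exactly when its discriminant is positive, which gives the sketch below; `mode_cubic` and `n_modes` are illustrative names.

```python
def mode_cubic(alpha):
    """Coefficients (a, b, c, d) of the cubic in x = log(y) whose real
    zeros are the stationary points of the assumed LBASN_2(alpha) pdf."""
    a = alpha**2
    b = alpha**2 - 2.0 * alpha
    c = 2.0 - 2.0 * alpha - 4.0 * alpha**2
    d = 2.0 + 4.0 * alpha
    return a, b, c, d

def n_modes(alpha):
    """Number of modes: 2 if the cubic has three distinct real zeros
    (maximum, minimum, maximum), otherwise 1."""
    a, b, c, d = mode_cubic(alpha)
    if a == 0.0:
        # alpha = 0: the cubic degenerates to 2x + 2 = 0, i.e. the
        # log-normal mode at y = exp(-1).
        return 1
    disc = (18.0 * a * b * c * d - 4.0 * b**3 * d + b**2 * c**2
            - 4.0 * a * c**3 - 27.0 * a**2 * d**2)
    return 2 if disc > 0.0 else 1

if __name__ == "__main__":
    for alpha in (-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 3.0):
        print(alpha, n_modes(alpha))
```

Because the leading coefficient $\alpha^2$ is positive, the derivative is positive for small $y$ and negative for large $y$, so one real zero corresponds to a single maximum and three distinct real zeros correspond to two maxima separated by a minimum. A repeated zero (discriminant exactly zero) is treated as the unimodal boundary case here.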

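Cumulative distribution function
Theorem 1: The cdf of the $LBASN_2(\alpha)$ distribution is given by
$$F(y) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right] + \frac{\alpha\,\varphi(x)}{C_2(\alpha)}\left[8(1+\alpha^2) - \alpha(8+3\alpha^2)\,x + 4\alpha^2 x^2 - \alpha^3 x^3\right], \tag{8}$$
where $x = \log y$, $C_2(\alpha) = 4 + 8\alpha^2 + 3\alpha^4$, $\varphi(\cdot)$ is the standard normal pdf and erf(.) is the error function.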
On simplifying, we get the desired result in eqn. (8).
Corollary 1: If we take the limit $\alpha \to \pm\infty$ of $F(y)$ in eqn. (8), then we get the cdf of the $LBN(4)$ distribution as
$$F(y) = \frac{1}{2}\left[1 + \operatorname{erf}\!\left(\frac{\log y}{\sqrt{2}}\right)\right] - \frac{\log y\,\{3 + (\log y)^2\}}{3}\,\varphi(\log y). \tag{9}$$
The cdf is plotted in Figure 3 to study how its shape varies with the parameter $\alpha$.
The moments of the $LBN(4)$ distribution can be derived easily by taking the limit $\alpha \to \pm\infty$ in the moments of the $LBASN_2(\alpha)$ distribution, so that $E(Y) \to 5.4957$ and $Var(Y) \to 75.7007$.
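A closed-form cdf of this kind can be sanity-checked against direct numerical integration. The sketch below assumes the pdf $[(1-\alpha x)^2+1]^2\varphi(x)/(C_2(\alpha)\,y)$ with $x = \log y$, for which term-by-term integration gives $F(y) = \Phi(x) + \alpha\varphi(x)\,[8(1+\alpha^2) - \alpha(8+3\alpha^2)x + 4\alpha^2x^2 - \alpha^3x^3]/C_2(\alpha)$. The function names are hypothetical, and the limiting form is the one obtained by letting $\alpha \to \pm\infty$ in that expression.

```python
import math

def c2(alpha):
    return 4.0 + 8.0 * alpha**2 + 3.0 * alpha**4

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def lbasn2_cdf(y, alpha):
    """Closed-form cdf under the assumed density (see lead-in)."""
    if y <= 0.0:
        return 0.0
    x = math.log(y)
    poly = (8.0 * (1.0 + alpha**2) - alpha * (8.0 + 3.0 * alpha**2) * x
            + 4.0 * alpha**2 * x**2 - alpha**3 * x**3)
    return Phi(x) + alpha * phi(x) * poly / c2(alpha)

def lbasn2_cdf_numeric(y, alpha, lo=-12.0, steps=20000):
    """Trapezoidal integral of the density of log(Y) from lo to log(y)."""
    hi = math.log(y)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * ((1.0 - alpha * x) ** 2 + 1.0) ** 2 * phi(x)
    return total * h / c2(alpha)

def limit_cdf(y):
    """Limit of the closed form as alpha -> +/- infinity."""
    x = math.log(y)
    return Phi(x) - x * (x * x + 3.0) * phi(x) / 3.0

if __name__ == "__main__":
    for y, alpha in ((0.5, 1.0), (2.0, -1.5), (5.0, 3.0)):
        print(y, alpha, lbasn2_cdf(y, alpha), lbasn2_cdf_numeric(y, alpha))
    print(lbasn2_cdf(4.0, 1e6), limit_cdf(4.0))
```

Agreement between the two evaluations supports the algebra but does not replace the proof; the quadrature range and step count are arbitrary choices.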

Skewness and Kurtosis
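The skewness and kurtosis of the $LBASN_2(\alpha)$ distribution are given by
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}, \qquad \beta_2 = \frac{\mu_4}{\mu_2^2},$$
where $\mu_k$ denotes the $k$th central moment. The bounds for skewness and kurtosis can be obtained by numerically optimizing $\beta_1$ and $\beta_2$ with respect to $\alpha$, giving $15.9462 \le \beta_1 \le 555.709$ and $46.8883 \le \beta_2 \le 1543.16$. To study their behavior, the skewness and kurtosis are plotted in Figure 6 and Figure 7, respectively. These plots also confirm the bounds.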

Remark 4:
If we take the limit $\alpha \to \pm\infty$ in the results for the $LBASN_2(\alpha)$ distribution, then we can derive the skewness and kurtosis of the $LBN(4)$ distribution as $\beta_1 \to 17.1334$ and $\beta_2 \to 48.5346$.
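The limiting values quoted above can be reproduced from raw moments. The sketch assumes the pdf $[(1-\alpha\log y)^2+1]^2\varphi(\log y)/(C_2(\alpha)\,y)$, for which $E(Y^k) = e^{k^2/2}\,[4 - 8\alpha k + 8\alpha^2(k^2+1) - 4\alpha^3(k^3+3k) + \alpha^4(k^4+6k^2+3)]/C_2(\alpha)$, and takes $\beta_1 = \mu_3^2/\mu_2^3$ and $\beta_2 = \mu_4/\mu_2^2$. The helper names are illustrative, and a large finite $\alpha$ stands in for the limit.

```python
import math

def raw_moment(k, alpha):
    """E[Y^k] under the assumed LBASN_2(alpha) density (see lead-in)."""
    c2 = 4.0 + 8.0 * alpha**2 + 3.0 * alpha**4
    poly = (4.0 - 8.0 * alpha * k + 8.0 * alpha**2 * (k**2 + 1)
            - 4.0 * alpha**3 * (k**3 + 3 * k)
            + alpha**4 * (k**4 + 6 * k**2 + 3))
    return math.exp(0.5 * k * k) * poly / c2

def beta1_beta2(alpha):
    """Return (beta_1, beta_2) computed from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, alpha) for k in (1, 2, 3, 4))
    mu2 = m2 - m1**2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1**3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1**2 * m2 - 3.0 * m1**4
    return mu3**2 / mu2**3, mu4 / mu2**2

if __name__ == "__main__":
    big = 1e6  # stand-in for alpha -> infinity
    print(raw_moment(1, big), beta1_beta2(big))
    for alpha in (0.0, 0.5, 1.0, 2.0):
        print(alpha, beta1_beta2(alpha))
```

Scanning `beta1_beta2` over a grid of $\alpha$ values is one way to approximate the bounds reported for $\beta_1$ and $\beta_2$; the grid and its resolution would be the user's choice.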

Method of moments
The moment estimates of $\mu$, $\sigma$ and $\alpha$ are obtained by equating the first three sample raw moments to their population counterparts, that is, by solving the following equations for $k = 1, 2, 3$:
$$\frac{1}{n}\sum_{i=1}^{n} y_i^k = E(Y^k).$$
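To make the moment equations concrete, the population moments can be written out explicitly. The expression below is a sketch under the assumption that $Y = e^{\mu + \sigma Z}$, where $Z$ has pdf $[(1-\alpha z)^2+1]^2\varphi(z)/C_2(\alpha)$; it follows from $E(e^{tZ}) = e^{t^2/2}\,E\{[(1-\alpha(U+t))^2+1]^2\}/C_2(\alpha)$ with $U \sim N(0,1)$ and $t = k\sigma$.

```latex
E(Y^k) = \frac{\exp\!\left(k\mu + \tfrac{1}{2}k^2\sigma^2\right)}{C_2(\alpha)}
\Big[\, 4 - 8\alpha k\sigma + 8\alpha^2\left(k^2\sigma^2 + 1\right)
      - 4\alpha^3\left(k^3\sigma^3 + 3k\sigma\right)
      + \alpha^4\left(k^4\sigma^4 + 6k^2\sigma^2 + 3\right) \Big],
\qquad C_2(\alpha) = 4 + 8\alpha^2 + 3\alpha^4 .
```

For $\alpha = 0$ the bracket equals 4 and the expression reduces to the log-normal moment $\exp(k\mu + k^2\sigma^2/2)$. The three equations are nonlinear in $(\mu, \sigma, \alpha)$ and would normally be solved numerically.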

Maximum likelihood estimation
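Let a random sample $y_1, y_2, \ldots, y_n$ of size $n$ be taken from the $LBASN_2(\mu, \sigma, \alpha)$ distribution of eqn. (6). Writing $w_i = (\log y_i - \mu)/\sigma$ and $C_2(\alpha) = 4 + 8\alpha^2 + 3\alpha^4$, the log-likelihood function for $\theta = (\mu, \sigma, \alpha)$ is given by
$$\ell(\theta) = 2\sum_{i=1}^{n}\log\left[(1-\alpha w_i)^2+1\right] - n\log C_2(\alpha) - n\log\sigma - \sum_{i=1}^{n}\log y_i - \frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{n} w_i^2. \tag{11}$$
Differentiating eqn. (11) partially with respect to the parameters $\mu$, $\sigma$ and $\alpha$ gives the following likelihood equations:
$$\sum_{i=1}^{n}\left[w_i + \frac{4\alpha(1-\alpha w_i)}{(1-\alpha w_i)^2+1}\right] = 0, \quad \sum_{i=1}^{n}\left[w_i^2 - 1 + \frac{4\alpha w_i(1-\alpha w_i)}{(1-\alpha w_i)^2+1}\right] = 0, \quad \sum_{i=1}^{n}\frac{4 w_i(1-\alpha w_i)}{(1-\alpha w_i)^2+1} + \frac{4n\alpha(4+3\alpha^2)}{C_2(\alpha)} = 0. \tag{12}$$
Solving the system of equations in eqn. (12) provides the maximum likelihood estimates of the parameters $\theta = (\mu, \sigma, \alpha)$. The same estimates can also be obtained by numerically maximizing eqn. (11) with respect to $\theta = (\mu, \sigma, \alpha)$.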
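For numerical maximization, the log-likelihood only needs to be coded as a function of $\theta$. The sketch below assumes the three-parameter pdf $[(1-\alpha w)^2+1]^2\varphi(w)/(C_2(\alpha)\,\sigma y)$ with $w = (\log y - \mu)/\sigma$; the function names and the small sample are hypothetical and are not the paper's data.

```python
import math

def lbasn2_loglik(theta, data):
    """Log-likelihood of theta = (mu, sigma, alpha) under the assumed pdf."""
    mu, sigma, alpha = theta
    if sigma <= 0.0:
        return -math.inf  # outside the parameter space
    log_c2 = math.log(4.0 + 8.0 * alpha**2 + 3.0 * alpha**4)
    total = 0.0
    for y in data:
        w = (math.log(y) - mu) / sigma
        total += (2.0 * math.log((1.0 - alpha * w) ** 2 + 1.0)
                  - log_c2 - math.log(sigma) - math.log(y)
                  - 0.5 * math.log(2.0 * math.pi) - 0.5 * w * w)
    return total

def lognormal_loglik(mu, sigma, data):
    """Log-normal log-likelihood, the alpha = 0 special case."""
    total = 0.0
    for y in data:
        w = (math.log(y) - mu) / sigma
        total += (-math.log(sigma) - math.log(y)
                  - 0.5 * math.log(2.0 * math.pi) - 0.5 * w * w)
    return total

if __name__ == "__main__":
    sample = [1.2, 0.8, 3.5, 2.1, 0.6]  # made-up positive values
    print(lbasn2_loglik((0.2, 0.7, 0.0), sample))
    print(lbasn2_loglik((0.2, 0.7, 0.8), sample))
```

The negative of `lbasn2_loglik` can be handed to any general-purpose minimizer. Since the density can be bimodal, trying several starting values for $\alpha$ is a reasonable precaution against local maxima.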

Illustrations with real datasets
Here we have considered two datasets. The first dataset consists of the N latitude degrees in 69 samples from world lakes, which appear in Column 5 of the Diversity data set on the website http://users.stat.umn.edu/sandy/courses/8061/datasets/lakes.lsp. The second dataset consists of the velocities of 82 distant galaxies diverging from our own galaxy. This data set is available at http://www.stats.bris.ac.uk/~peter/mixdata. We then compared the proposed distribution $LBASN_2(\mu, \sigma, \alpha)$ with the log-normal $LN(\mu, \sigma^2)$ distribution, the log-skew-normal $LSN(\mu, \sigma, \lambda)$ distribution, and the log-alpha-skew-normal $LASN(\mu, \sigma, \alpha)$ distribution of Venegas et al. (2016). The MLEs of the parameters are obtained using a numerical optimization routine. AIC and BIC are used for model comparison.
Figure 9: Plots of observed and expected densities for the velocities of 82 distant galaxies diverging from our own galaxy.
From Tables 1 and 2, it is observed that the proposed log-Balakrishnan-alpha-skew-normal $LBASN_2(\mu, \sigma, \alpha)$ distribution provides a much better fit to the data sets under consideration in terms of the log-likelihood, AIC and BIC. The plots of observed densities (histograms) and expected densities (lines) presented in Figure 8 and Figure 9 also confirm these findings.
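The two criteria used in the comparison follow the standard definitions $AIC = 2k - 2\ell$ and $BIC = k\log n - 2\ell$, where $\ell$ is the maximized log-likelihood, $k$ the number of parameters and $n$ the sample size. A minimal sketch is given below; the log-likelihood values in the usage lines are placeholders chosen for illustration, not the values from Tables 1 and 2.

```python
import math

def aic(loglik, k):
    """Akaike Information Criterion: 2k - 2*loglik (smaller is better)."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian Information Criterion: k*log(n) - 2*loglik (smaller is better)."""
    return k * math.log(n) - 2.0 * loglik

if __name__ == "__main__":
    n = 69  # sample size of the lakes data set
    # Placeholder log-likelihoods, for illustration only.
    candidates = {"two-parameter model": (-250.0, 2),
                  "three-parameter model": (-240.0, 3)}
    for name, (ll, k) in candidates.items():
        print(name, aic(ll, k), bic(ll, k, n))
```

Because BIC penalizes each extra parameter by $\log n$ rather than 2, it is the stricter criterion for both sample sizes used here ($n = 69$ and $n = 82$).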

Likelihood Ratio Test
Further, since $LN(\mu, \sigma^2)$ and $LBASN_2(\mu, \sigma, \alpha)$ are nested models, the likelihood ratio (LR) test is used to discriminate between them. The LR test is carried out to test the null hypothesis $H_0: \alpha = 0$, that is, the sample is drawn from $LN(\mu, \sigma^2)$, against the alternative $H_1: \alpha \neq 0$, that is, the sample is drawn from $LBASN_2(\mu, \sigma, \alpha)$. The values of the LR test statistic for the two datasets are 31.117 and 68.442, respectively, both of which exceed the critical value at the 5% level of significance. Thus there is evidence in favor of the alternative hypothesis. Therefore, we may conclude that the sampled data come from $LBASN_2(\mu, \sigma, \alpha)$ and not from $LN(\mu, \sigma^2)$ in both cases.
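The comparison with the critical value can be sketched as follows, assuming the usual chi-square reference distribution with one degree of freedom (one restricted parameter, $\alpha = 0$). For one degree of freedom the upper tail probability is $\operatorname{erfc}(\sqrt{x/2})$, so no statistics library is needed; the function names are illustrative.

```python
import math

def lr_statistic(loglik_null, loglik_alt):
    """LR statistic 2 * (l_alt - l_null) for nested models."""
    return 2.0 * (loglik_alt - loglik_null)

def chi2_sf_1df(x):
    """Upper tail probability of a chi-square variable with 1 df."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

if __name__ == "__main__":
    critical_5pct = 3.841458820694124  # 95% quantile of chi-square(1)
    for lr in (31.117, 68.442):  # statistics reported in the text
        print(lr, lr > critical_5pct, chi2_sf_1df(lr))
```

Both reported statistics are far above the 5% critical value of about 3.84, which matches the conclusion stated in the text under this reference distribution.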

Concluding remark
In this article, the log-Balakrishnan-alpha-skew-normal distribution, which has at most two modes, is introduced and some of its basic properties are investigated. The numerical results from modelling the two real-life data sets considered here show that the proposed distribution $LBASN_2(\mu, \sigma, \alpha)$ provides a much better fit than the log-normal $LN(\mu, \sigma^2)$ distribution, the log-skew-normal $LSN(\mu, \sigma, \lambda)$ distribution and the log-alpha-skew-normal $LASN(\mu, \sigma, \alpha)$ distribution. It is therefore expected that the proposed distribution will be useful for modelling different types of data.