A generating family of unit-Garima distribution: Properties, likelihood inference, and application

In this paper, the unit-Garima distribution is introduced for analysing proportion data. Some statistical properties of the proposed distribution are investigated, including the survival and hazard functions, order statistics, quantile function, and stress-strength reliability measure. A new family of continuous distributions, called the unit-Garima-generated family of distributions, is also provided; it uses the unit-Garima distribution as the main generator. Some sub-models of this family are provided, such as the unit-Garima-beta, unit-Garima-Weibull, and unit-Garima-normal distributions. The method of maximum likelihood is used to estimate the model parameters. A Monte Carlo simulation is used to illustrate the performance of the percentile confidence interval for each parameter of the proposed distributions. Finally, the developed distributions are applied to eight real data sets.


Introduction
The Garima distribution was introduced by Shanker (2016). It has been applied to behavioral science data and compared with some existing continuous distributions, for instance, the Sujatha, Aradhana, Akash, Shanker, Lindley, and exponential distributions (Shanker, 2016). Goodness-of-fit results show that the Garima distribution is a competitive continuous distribution for modelling behavioral science data. Its cumulative distribution function (cdf) and probability density function (pdf) are, respectively,
$$G(y;\theta) = 1-\left(1+\frac{\theta y}{\theta+2}\right)e^{-\theta y} \quad\text{and}\quad g(y;\theta) = \frac{\theta}{\theta+2}\left(1+\theta+\theta y\right)e^{-\theta y}, \qquad y>0,\ \theta>0. \tag{1}$$
However, in many applied scenarios, the phenomenon of interest is measured on a bounded range, and the candidate probability distributions fitted to the observed data should have the same bounded support. For example, when modelling proportions, the random variable takes values in the unit interval (0, 1) and should follow a unit distribution, that is, a distribution with support (0, 1) (Bantan et al., 2020). Transforming a random variable is one of several techniques for constructing unit distributions (Johnson, 1949). Existing unit distributions obtained by transformation include the unit-Lindley distribution (Mazucheli et al., 2019), the unit-Rayleigh distribution (Bantan et al., 2020), the new unit-Lindley distribution (Mazucheli et al., 2020a), and the unit-Weibull distribution (Mazucheli et al., 2020b).
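As a quick numerical check of (1), the following minimal Python sketch (standard library only; the function names are our own, not from the paper, whose computations were done in R) evaluates the Garima cdf and pdf and confirms with a composite Simpson rule that the cdf is the integral of the pdf.

```python
import math


def garima_pdf(y: float, theta: float) -> float:
    """Garima pdf g(y; theta) from (1), for y > 0 and theta > 0."""
    return theta / (theta + 2.0) * (1.0 + theta + theta * y) * math.exp(-theta * y)


def garima_cdf(y: float, theta: float) -> float:
    """Garima cdf G(y; theta) from (1)."""
    return 1.0 - (1.0 + theta * y / (theta + 2.0)) * math.exp(-theta * y)


def simpson(f, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule for the integral of f over [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    theta, y = 1.5, 2.0
    # Integrating the pdf from 0 to y should reproduce the cdf at y.
    print(simpson(lambda s: garima_pdf(s, theta), 0.0, y))
    print(garima_cdf(y, theta))
```

The same check with a large upper limit confirms that the pdf integrates to one.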
In this paper, we propose a new one-parameter unit distribution, called the unit-Garima distribution, as an alternative distribution for modelling continuous data on the unit interval (0, 1). In addition, we develop the unit-Garima-generated family of distributions using the method introduced by Alzaatreh et al. (2013). Over the past two decades or so, many researchers have developed generalized families of distributions. Eugene et al. (2002) used the beta distribution as a generator to develop the so-called family of beta-generated distributions, whose cdf is defined as
$$F(x) = \int_0^{G(x)} r(t)\,dt, \tag{2}$$
where r(t) is the pdf of the beta random variable and G(x) is the cdf of the selected (baseline) random variable. Using this construction, Eugene et al. (2002) also introduced the beta-normal distribution. Later, Jones (2009) and Cordeiro and de Castro (2011) replaced the beta distribution in (2) with the Kumaraswamy distribution, giving the Kumaraswamy-generated distributions, which extend the beta-generated distributions. Both the beta-generated and the Kumaraswamy-generated families use generators with support between 0 and 1. Alzaatreh et al. (2013) developed a new method for generating families of continuous distributions that allows a generator with support (a, b), where −∞ ≤ a < b ≤ ∞. A random variable X, "the transformer", is used to transform another random variable T, "the transformed", and the resulting family is called the T-X family of distributions. The cdf of a T-X random variable X is defined as
$$F(x) = \int_a^{W[G(x)]} r(t)\,dt, \tag{3}$$
where r(t) is the pdf of the random variable T and W[G(x)] is a function of the cdf G(x) of the selected random variable X.
The function W[G(x)] satisfies the following conditions (Alzaatreh et al., 2013): (i) W[G(x)] ∈ [a, b]; (ii) W[G(x)] is differentiable and monotonically non-decreasing; and (iii) W[G(x)] → a as x → −∞ and W[G(x)] → b as x → ∞. If T is any continuous random variable on the interval (0, 1), then W[G(x)] can be defined as G(x) or G^α(x) for α > 0. Several researchers have generated new probability distributions from recent families of distributions by using different choices of G(x).
The paper is organized as follows. Section 2 introduces the unit-Garima distribution, a new one-parameter unit distribution, and gives its properties, including the survival and hazard rate functions, order statistics, quantile function, and stress-strength reliability measure. In Section 3, we construct a generated family of distributions based on the unit-Garima distribution and provide some of its properties.
Section 4 presents some sub-models of the proposed family of distributions. Section 5 presents maximum likelihood estimation of the parameters of the proposed distributions. In Section 6, a simulation study of parameter estimation for the proposed distributions is reported. In Section 7, applications to various real data sets show the performance of the proposed distributions. Finally, Section 8 concludes the paper.

A new unit distribution
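In this section, we introduce the main interest of this paper, a new unit distribution called the unit-Garima distribution, which is defined in Theorem 2.1. Some statistical properties of the proposed distribution are then discussed.
Figure 1: The pdf and cdf of T ∼ UGa(θ) for some specified values of θ.
If T ∼ UGa(θ), then for 0 < t ≤ 1 and θ > 0 its pdf and cdf are, respectively,
$$r(t;\theta) = \frac{\theta(\theta+t)}{(\theta+2)\,t^{3}}\exp\left\{-\theta\left(\frac{1}{t}-1\right)\right\} \quad\text{and}\quad R(t;\theta) = \frac{\theta+2t}{(\theta+2)\,t}\exp\left\{-\theta\left(\frac{1}{t}-1\right)\right\}. \tag{4}$$
Proof: Since T = 1/(1 + Y) is a decreasing function of Y, R(t; θ) = P(Y ≥ 1/t − 1) = 1 − G(1/t − 1; θ), which gives the cdf in (4) after substituting (1). Conversely, integrating r(s; θ) over (0, t] with the substitution u = −θ(1/s − 1), for which du/ds = θ/s², recovers R(t; θ).
Survival and hazard rate functions: Let T ∼ UGa(θ). Its survival and hazard rate functions are, respectively,
$$S(t;\theta) = 1 - \frac{\theta+2t}{(\theta+2)\,t}\exp\left\{-\theta\left(\frac{1}{t}-1\right)\right\} \quad\text{and}\quad h(t;\theta) = \frac{r(t;\theta)}{S(t;\theta)} = \frac{\theta(\theta+t)\exp\left\{-\theta\left(\frac{1}{t}-1\right)\right\}}{(\theta+2)\,t^{3} - t^{2}(\theta+2t)\exp\left\{-\theta\left(\frac{1}{t}-1\right)\right\}}.$$
Order statistics: Let T_1, ..., T_n be a random sample of size n from the UGa distribution with parameter θ, and let T_(1) < T_(2) < ⋯ < T_(n) denote the corresponding order statistics. The pdf and cdf of the kth order statistic are, respectively,
$$f_{(k)}(t) = \frac{n!}{(k-1)!\,(n-k)!}\,[R(t;\theta)]^{k-1}\,[1-R(t;\theta)]^{n-k}\,r(t;\theta) \quad\text{and}\quad F_{(k)}(t) = \sum_{j=k}^{n}\binom{n}{j}[R(t;\theta)]^{j}\,[1-R(t;\theta)]^{n-j},$$
for k = 1, 2, ..., n, where r(t; θ) and R(t; θ) are given in (4).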
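The following Python sketch implements the UGa pdf and cdf in (4) together with the survival, hazard, and order-statistic formulas above. The names (`uga_pdf`, `order_stat_pdf`, and so on) are hypothetical helpers for illustration, not part of any package, and they assume 0 < t ≤ 1 and θ > 0 without input validation.

```python
import math


def uga_pdf(t: float, theta: float) -> float:
    """UGa pdf r(t; theta) from (4), for 0 < t <= 1 and theta > 0."""
    return (theta * (theta + t) / ((theta + 2.0) * t ** 3)
            * math.exp(-theta * (1.0 / t - 1.0)))


def uga_cdf(t: float, theta: float) -> float:
    """UGa cdf R(t; theta) from (4)."""
    return (theta + 2.0 * t) / ((theta + 2.0) * t) * math.exp(-theta * (1.0 / t - 1.0))


def uga_survival(t: float, theta: float) -> float:
    """Survival function S(t) = 1 - R(t)."""
    return 1.0 - uga_cdf(t, theta)


def uga_hazard(t: float, theta: float) -> float:
    """Hazard h(t) = r(t) / S(t); only defined for 0 < t < 1 because S(1) = 0."""
    return uga_pdf(t, theta) / uga_survival(t, theta)


def order_stat_pdf(t: float, theta: float, k: int, n: int) -> float:
    """pdf of the k-th order statistic T_(k) of a UGa(theta) sample of size n."""
    const = math.factorial(n) / (math.factorial(k - 1) * math.factorial(n - k))
    big_r = uga_cdf(t, theta)
    return const * big_r ** (k - 1) * (1.0 - big_r) ** (n - k) * uga_pdf(t, theta)


def order_stat_cdf(t: float, theta: float, k: int, n: int) -> float:
    """cdf of T_(k): sum over j = k..n of C(n, j) R^j (1 - R)^(n - j)."""
    big_r = uga_cdf(t, theta)
    return sum(math.comb(n, j) * big_r ** j * (1.0 - big_r) ** (n - j)
               for j in range(k, n + 1))


if __name__ == "__main__":
    theta = 2.0
    for t in (0.25, 0.5, 0.75):
        print(t, uga_pdf(t, theta), uga_cdf(t, theta), uga_hazard(t, theta))
    # Sample median (k = 3) of a sample of size 5, evaluated at t = 0.5.
    print(order_stat_pdf(0.5, theta, 3, 5), order_stat_cdf(0.5, theta, 3, 5))
```

Useful sanity checks are that a central difference of `uga_cdf` matches `uga_pdf`, and that `order_stat_cdf` with k = 1 equals 1 − (1 − R)^n.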
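Quantile function: Let T ∼ UGa(θ) with cdf R(t; θ) as in (4). We set R(t; θ) = u, where u is a value of a random variable U distributed uniformly on the interval (0, 1), and solve for t. Writing z = θ/t, the equation R(t; θ) = u becomes (z + 2)e^{−z} = (θ + 2)u e^{−θ}, that is, −(z + 2)e^{−(z+2)} = −(θ + 2)u e^{−(θ+2)}. Since z + 2 ≥ θ + 2 > 1, the lower branch of the Lambert W function applies, so z = −2 − W_{−1}(−(θ + 2)u e^{−(θ+2)}). Hence the quantile function is
$$Q(u;\theta) = \frac{-\theta}{2+W_{-1}\left(-(\theta+2)\,u\,e^{-(\theta+2)}\right)},$$
where 0 < u < 1 and W(·) is the Lambert W function, the multi-valued complex function defined as the solution of W(z)e^{W(z)} = z (Corless et al., 1996), and W_{−1}(·) denotes its lower real branch.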
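To make the quantile function concrete, the sketch below evaluates W_{−1} by bisection on the equivalent equation w + ln(−w) = ln(−x) for w < −1, whose left-hand side is increasing on that range, and then uses Q(u; θ) for inverse-transform sampling from UGa(θ). The bisection solver and the helper names are our own illustrative choices; a library routine for the lower Lambert W branch (for example SciPy's `lambertw` with `k=-1`) could be used instead.

```python
import math
import random


def lambert_w_m1(x: float) -> float:
    """Lower real branch W_{-1}(x) for -1/e < x < 0, computed by bisection.

    W_{-1}(x) is the unique w < -1 with w * exp(w) = x. Taking logarithms of
    (-w) * exp(w) = -x gives g(w) = w + log(-w) - log(-x) = 0, and g is
    strictly increasing on (-inf, -1), so a bracketing search is safe.
    """
    if not -1.0 / math.e < x < 0.0:
        raise ValueError("x must lie in (-1/e, 0)")
    target = math.log(-x)

    def g(w: float) -> float:
        return w + math.log(-w) - target

    lo, hi = -2.0, -1.0          # g(-1) > 0 because -x < 1/e
    while g(lo) > 0.0:           # move left until the root is bracketed
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def uga_cdf(t: float, theta: float) -> float:
    """UGa cdf R(t; theta) from (4)."""
    return (theta + 2.0 * t) / ((theta + 2.0) * t) * math.exp(-theta * (1.0 / t - 1.0))


def uga_quantile(u: float, theta: float) -> float:
    """Q(u; theta) = -theta / (2 + W_{-1}(-(theta + 2) u exp(-(theta + 2))))."""
    c = theta + 2.0
    return -theta / (2.0 + lambert_w_m1(-c * u * math.exp(-c)))


def uga_sample(n: int, theta: float, rng: random.Random) -> list:
    """Inverse-transform sampling; 1 - rng.random() lies in (0, 1] and avoids u = 0."""
    return [uga_quantile(1.0 - rng.random(), theta) for _ in range(n)]


if __name__ == "__main__":
    theta = 2.0
    print([round(uga_quantile(u, theta), 4) for u in (0.1, 0.5, 0.9)])
    print(uga_sample(5, theta, random.Random(2022)))
```

Feeding the output of `uga_quantile` back into `uga_cdf` should return the original u up to rounding error.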
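Stress-strength reliability measure: A stress-strength model involves two independent random variables, X and Y, where Y represents the stress and X represents the strength. The reliability measure is defined as P(Y < X), the probability that the strength exceeds the stress.
Theorem 2.2. Let X ∼ UGa(θ1) and Y ∼ UGa(θ2) be independent random variables. Then the stress-strength reliability measure is
$$P(Y<X) = \frac{\theta_1}{(\theta_1+2)(\theta_2+2)}\int_0^1 \frac{(\theta_1+x)(\theta_2+2x)}{x^{4}}\exp\left\{-(\theta_1+\theta_2)\left(\frac{1}{x}-1\right)\right\}dx.$$
Proof: If X and Y are independent random variables, then the stress-strength reliability measure is calculated as (Kotz and Pensky, 2003)
$$P(Y<X) = \int_0^1 f_X(x)\,F_Y(x)\,dx.$$
If X ∼ UGa(θ1) and Y ∼ UGa(θ2), substituting f_X(x) = r(x; θ1) and F_Y(x) = R(x; θ2) from (4) gives the result.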
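Because T = 1/(1 + Y) is decreasing in Y, P(Y < X) for the UGa variables equals P(Y_G > X_G) for the underlying Garima variables X_G ∼ Garima(θ1) and Y_G ∼ Garima(θ2). Expanding ∫_0^∞ g(x; θ1)[1 − G(x; θ2)] dx with (1) gives the following closed form, which we derived for this illustration and which need not match the form stated by the authors:
$$P(Y<X) = \frac{\theta_1}{\theta_1+2}\left[\frac{1+\theta_1}{s} + \frac{(1+\theta_1)c+\theta_1}{s^{2}} + \frac{2\theta_1 c}{s^{3}}\right], \qquad s=\theta_1+\theta_2,\quad c=\frac{\theta_2}{\theta_2+2}.$$
The Python sketch below checks it against a Simpson-rule evaluation of the integral in Theorem 2.2; the function names are hypothetical.

```python
import math


def stress_strength_closed(theta1: float, theta2: float) -> float:
    """P(Y < X) for independent X ~ UGa(theta1), Y ~ UGa(theta2) (derived closed form)."""
    s = theta1 + theta2
    c = theta2 / (theta2 + 2.0)
    return theta1 / (theta1 + 2.0) * ((1.0 + theta1) / s
                                      + ((1.0 + theta1) * c + theta1) / s ** 2
                                      + 2.0 * theta1 * c / s ** 3)


def stress_strength_integral(theta1: float, theta2: float, n: int = 20000) -> float:
    """The integral in Theorem 2.2, evaluated with the composite Simpson rule."""
    s = theta1 + theta2
    const = theta1 / ((theta1 + 2.0) * (theta2 + 2.0))

    def integrand(x: float) -> float:
        return const * (theta1 + x) * (theta2 + 2.0 * x) / x ** 4 * math.exp(-s * (1.0 / x - 1.0))

    a, b = 1e-6, 1.0        # the integrand vanishes extremely fast as x -> 0
    h = (b - a) / n
    total = integrand(a) + integrand(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(a + i * h)
    return total * h / 3.0


if __name__ == "__main__":
    print(stress_strength_closed(0.5, 2.0), stress_strength_integral(0.5, 2.0))
```

As expected, θ1 = θ2 gives P(Y < X) = 1/2, and swapping θ1 and θ2 gives the complementary probability.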

The UGa-generated family of distributions
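In this section, the UGa distribution is used as the generator to construct a family of distributions, called the unit-Garima-generated (UGa-G) family of distributions.
Theorem 3.1. Let X be a random variable distributed as the UGa-G family of distributions with parameters α > 0 and θ > 0 and a baseline parameter vector ξ, denoted X ∼ UGa-G(α, θ, ξ). Then its cdf and pdf are, respectively,
$$F_{\mathrm{UGa\text{-}G}}(x) = \frac{\theta+2G^{\alpha}(x;\xi)}{(\theta+2)\,G^{\alpha}(x;\xi)}\exp\left\{-\theta\left(\frac{1}{G^{\alpha}(x;\xi)}-1\right)\right\}, \tag{8}$$
$$f_{\mathrm{UGa\text{-}G}}(x) = \frac{\alpha\theta\,g(x;\xi)\left[\theta+G^{\alpha}(x;\xi)\right]}{(\theta+2)\,G^{2\alpha+1}(x;\xi)}\exp\left\{-\theta\left(\frac{1}{G^{\alpha}(x;\xi)}-1\right)\right\}, \tag{9}$$
where G(x; ξ) and g(x; ξ) are the cdf and pdf of the baseline distribution.
Proof: Let T ∼ UGa(θ) with pdf r(t) in (4), and let W[G(x)] = G^α(x; ξ), where G(x; ξ) is the cdf of any baseline distribution with parameter vector ξ. By the T-X method of generating families of distributions (Alzaatreh et al., 2013), with the cdf in (3), the cdf of the UGa-G family of distributions is
$$F_{\mathrm{UGa\text{-}G}}(x) = \int_0^{G^{\alpha}(x;\xi)} r(t;\theta)\,dt = R\left(G^{\alpha}(x;\xi);\theta\right),$$
which gives (8). Its corresponding pdf is f_UGa-G(x) = α g(x; ξ) G^{α−1}(x; ξ) r(G^α(x; ξ); θ), which simplifies to (9).
Some properties of the UGa-G family of distributions are discussed as follows.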
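A direct transcription of (8) and (9) in Python, taking the baseline cdf G and pdf g as callables. The exponential baseline in the example is a hypothetical choice used only to exercise the functions; any baseline with 0 < G(x) ≤ 1 at the evaluation points works.

```python
import math
from typing import Callable

Func = Callable[[float], float]


def uga_g_cdf(x: float, alpha: float, theta: float, G: Func) -> float:
    """cdf (8) of the UGa-G family for a baseline cdf G (requires G(x) > 0)."""
    ga = G(x) ** alpha
    return (theta + 2.0 * ga) / ((theta + 2.0) * ga) * math.exp(-theta * (1.0 / ga - 1.0))


def uga_g_pdf(x: float, alpha: float, theta: float, G: Func, g: Func) -> float:
    """pdf (9): alpha*theta*g*(theta + G^alpha) / ((theta + 2) G^(2 alpha + 1)) * exp(...)."""
    big_g = G(x)
    ga = big_g ** alpha
    return (alpha * theta * g(x) * (theta + ga) / ((theta + 2.0) * big_g ** (2.0 * alpha + 1.0))
            * math.exp(-theta * (1.0 / ga - 1.0)))


if __name__ == "__main__":
    # Hypothetical baseline for illustration: the standard exponential distribution.
    exp_cdf = lambda x: 1.0 - math.exp(-x)
    exp_pdf = lambda x: math.exp(-x)
    for x in (0.5, 1.0, 2.0):
        print(x, uga_g_cdf(x, 1.5, 2.0, exp_cdf), uga_g_pdf(x, 1.5, 2.0, exp_cdf, exp_pdf))
```

With α = 1 and the uniform(0, 1) baseline G(x) = x, (8) reduces to the UGa cdf in (4), which is a convenient check of the transcription.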
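The survival and hazard rate functions of the UGa-G family of distributions are, respectively,
$$S_{\mathrm{UGa\text{-}G}}(x) = 1 - F_{\mathrm{UGa\text{-}G}}(x) \quad\text{and}\quad h_{\mathrm{UGa\text{-}G}}(x) = \frac{f_{\mathrm{UGa\text{-}G}}(x)}{1 - F_{\mathrm{UGa\text{-}G}}(x)},$$
where F_UGa-G(x) is the cdf in (8) and f_UGa-G(x) is the corresponding pdf.
Let X be a random variable distributed as the UGa-G family of distributions with the cdf in (8), and set F_UGa-G(x) = u, where u is a value of a random variable U distributed uniformly on the interval (0, 1). By inverting the cdf in (8), we obtain the quantile function
$$Q_{\mathrm{UGa\text{-}G}}(u) = G^{-1}\left(\left[\frac{-\theta}{2+W_{-1}\left(-(\theta+2)\,u\,e^{-(\theta+2)}\right)}\right]^{1/\alpha};\xi\right),$$
where 0 < u < 1, W_{−1}(·) is the lower real branch of the Lambert W function, and G^{−1}(·; ξ) is the quantile function of the baseline distribution.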
The measures of skewness and kurtosis of the UGa-G family of distributions are derived from its quantile function Q(·), following the quantile-based approach for the T-X family of distributions (Alzaatreh et al., 2013). The measure of skewness S is defined by Galton (1883) and the measure of kurtosis K is defined by Moors (1988). They are expressed, respectively, as
$$S = \frac{Q(6/8) - 2Q(4/8) + Q(2/8)}{Q(6/8) - Q(2/8)} \quad\text{and}\quad K = \frac{Q(7/8) - Q(5/8) + Q(3/8) - Q(1/8)}{Q(6/8) - Q(2/8)}.$$
When the distribution is symmetric, S = 0, and when the distribution is right (or left) skewed, S > 0 (or S < 0). As K increases, the tail of the distribution becomes heavier.
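Because the UGa-G quantile function is explicit, the Galton skewness and Moors kurtosis only require seven quantile evaluations. The Python sketch below reuses a bisection-based W_{−1} (our own helper, valid for arguments in (−1/e, 0)) and takes the baseline quantile G^{−1} as a callable; the exponential baseline in the example is a hypothetical choice.

```python
import math
from typing import Callable


def lambert_w_m1(x: float) -> float:
    """W_{-1}(x) for -1/e < x < 0, by bisection on w + log(-w) = log(-x) with w < -1."""
    target = math.log(-x)
    lo, hi = -2.0, -1.0
    while lo + math.log(-lo) > target:   # widen the bracket to the left
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + math.log(-mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def uga_g_quantile(u: float, alpha: float, theta: float,
                   G_inv: Callable[[float], float]) -> float:
    """UGa-G quantile: G^{-1}(Q_UGa(u)^(1/alpha))."""
    c = theta + 2.0
    t = -theta / (2.0 + lambert_w_m1(-c * u * math.exp(-c)))   # UGa quantile
    return G_inv(t ** (1.0 / alpha))


def galton_skewness(Q: Callable[[float], float]) -> float:
    q2, q4, q6 = Q(2 / 8), Q(4 / 8), Q(6 / 8)
    return (q6 - 2.0 * q4 + q2) / (q6 - q2)


def moors_kurtosis(Q: Callable[[float], float]) -> float:
    q1, q2, q3, q5, q6, q7 = (Q(i / 8) for i in (1, 2, 3, 5, 6, 7))
    return (q7 - q5 + q3 - q1) / (q6 - q2)


if __name__ == "__main__":
    # Hypothetical example: exponential baseline with G^{-1}(p) = -log(1 - p).
    exp_inv = lambda p: -math.log(1.0 - p)
    Q = lambda u: uga_g_quantile(u, 1.5, 2.0, exp_inv)
    print(galton_skewness(Q), moors_kurtosis(Q))
```

For the uniform quantile Q(u) = u the sketch returns S = 0 and K = 1, matching the symmetric case described above.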
4. Sub-models of the UGa-G family of distributions

The UGa-beta distribution
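The beta distribution has been widely applied to model the behavior of random variables restricted to the interval (0, 1). Its pdf and cdf are, respectively,
$$g(x;a,b) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)} \quad\text{and}\quad G(x;a,b) = I_x(a,b), \qquad 0<x<1,$$
where a > 0 and b > 0 are shape parameters, B(a, b) is the beta function, and I_x(a, b) = B(a, b)^{−1}∫_0^x s^{a−1}(1 − s)^{b−1} ds is the regularized incomplete beta function. Substituting these into the UGa-G cdf and pdf of Theorem 3.1, the cdf and pdf of the UGa-beta (UGa-B) distribution are, respectively,
$$F_{\mathrm{UGa\text{-}B}}(x) = \frac{\theta+2I_x^{\alpha}(a,b)}{(\theta+2)\,I_x^{\alpha}(a,b)}\exp\left\{-\theta\left(\frac{1}{I_x^{\alpha}(a,b)}-1\right)\right\},$$
$$f_{\mathrm{UGa\text{-}B}}(x) = \frac{\alpha\theta\,x^{a-1}(1-x)^{b-1}\left[\theta+I_x^{\alpha}(a,b)\right]}{(\theta+2)\,B(a,b)\,I_x^{2\alpha+1}(a,b)}\exp\left\{-\theta\left(\frac{1}{I_x^{\alpha}(a,b)}-1\right)\right\}, \tag{16}$$
where 0 < x < 1, α > 0, θ > 0, a > 0, and b > 0. Consequently, its quantile function is
$$Q_{\mathrm{UGa\text{-}B}}(u) = I^{-1}\left(q_u^{1/\alpha};a,b\right), \qquad q_u = \frac{-\theta}{2+W_{-1}\left(-(\theta+2)\,u\,e^{-(\theta+2)}\right)},$$
where 0 < u < 1 and I^{−1}(·; a, b) is the inverse of the regularized incomplete beta function.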

The UGa-Weibull distribution
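Let X be a random variable distributed as the Weibull distribution with parameters k and λ. Its pdf and cdf are, respectively,
$$g(x;k,\lambda) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1}\exp\left\{-\left(\frac{x}{\lambda}\right)^{k}\right\} \quad\text{and}\quad G(x;k,\lambda) = 1-\exp\left\{-\left(\frac{x}{\lambda}\right)^{k}\right\}, \qquad x>0,$$
where k > 0 is a shape parameter and λ > 0 is a scale parameter. Substituting these into the UGa-G cdf and pdf of Theorem 3.1, and writing G = G(x; k, λ) for brevity, we obtain the cdf and pdf of the UGa-Weibull (UGa-W) distribution as
$$F_{\mathrm{UGa\text{-}W}}(x) = \frac{\theta+2G^{\alpha}}{(\theta+2)\,G^{\alpha}}\exp\left\{-\theta\left(G^{-\alpha}-1\right)\right\},$$
$$f_{\mathrm{UGa\text{-}W}}(x) = \frac{\alpha\theta k\,(x/\lambda)^{k-1}\exp\{-(x/\lambda)^{k}\}\left(\theta+G^{\alpha}\right)}{\lambda(\theta+2)\,G^{2\alpha+1}}\exp\left\{-\theta\left(G^{-\alpha}-1\right)\right\}, \tag{20}$$
where x > 0, α > 0, θ > 0, k > 0, and λ > 0.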
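A Python sketch of the UGa-W cdf and pdf, plus a quantile function obtained by combining the UGa-G quantile with the Weibull inverse cdf, x = λ[−ln(1 − q_u^{1/α})]^{1/k}. This quantile is our own derivation for illustration (the text does not state it for UGa-W), and all helper names are hypothetical.

```python
import math


def lambert_w_m1(x: float) -> float:
    """W_{-1}(x) for -1/e < x < 0, by bisection on w + log(-w) = log(-x) with w < -1."""
    target = math.log(-x)
    lo, hi = -2.0, -1.0
    while lo + math.log(-lo) > target:   # widen the bracket to the left
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + math.log(-mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def weibull_cdf(x: float, k: float, lam: float) -> float:
    return 1.0 - math.exp(-((x / lam) ** k))


def weibull_pdf(x: float, k: float, lam: float) -> float:
    return (k / lam) * (x / lam) ** (k - 1.0) * math.exp(-((x / lam) ** k))


def uga_w_cdf(x: float, alpha: float, theta: float, k: float, lam: float) -> float:
    ga = weibull_cdf(x, k, lam) ** alpha
    return (theta + 2.0 * ga) / ((theta + 2.0) * ga) * math.exp(-theta * (1.0 / ga - 1.0))


def uga_w_pdf(x: float, alpha: float, theta: float, k: float, lam: float) -> float:
    big_g = weibull_cdf(x, k, lam)
    ga = big_g ** alpha
    return (alpha * theta * weibull_pdf(x, k, lam) * (theta + ga)
            / ((theta + 2.0) * big_g ** (2.0 * alpha + 1.0))
            * math.exp(-theta * (1.0 / ga - 1.0)))


def uga_w_quantile(u: float, alpha: float, theta: float, k: float, lam: float) -> float:
    """Valid for 0 < u < 1."""
    c = theta + 2.0
    q_u = -theta / (2.0 + lambert_w_m1(-c * u * math.exp(-c)))   # UGa quantile
    return lam * (-math.log(1.0 - q_u ** (1.0 / alpha))) ** (1.0 / k)


if __name__ == "__main__":
    params = (1.5, 2.0, 2.0, 3.0)   # alpha, theta, k, lambda (illustrative values)
    x = uga_w_quantile(0.5, *params)
    print(x, uga_w_cdf(x, *params), uga_w_pdf(x, *params))
```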

The UGa-normal distribution
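Let X be a random variable distributed as the normal distribution with mean µ and standard deviation σ. Its pdf and cdf are, respectively,
$$g(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right\} \quad\text{and}\quad G(x;\mu,\sigma) = \frac{1}{2}\left[1+\operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right], \qquad -\infty<x<\infty,$$
where erf(x) = (2/√π)∫_0^x e^{−s²} ds is the error function. Inserting these functions into the UGa-G cdf and pdf of Theorem 3.1, and writing G = G(x; µ, σ) for brevity, we obtain the cdf and pdf of the UGa-normal (UGa-N) distribution as
$$F_{\mathrm{UGa\text{-}N}}(x) = \frac{\theta+2G^{\alpha}}{(\theta+2)\,G^{\alpha}}\exp\left\{-\theta\left(G^{-\alpha}-1\right)\right\},$$
$$f_{\mathrm{UGa\text{-}N}}(x) = \frac{\alpha\theta\exp\{-(x-\mu)^{2}/(2\sigma^{2})\}\left(\theta+G^{\alpha}\right)}{\sigma\sqrt{2\pi}\,(\theta+2)\,G^{2\alpha+1}}\exp\left\{-\theta\left(G^{-\alpha}-1\right)\right\}, \tag{24}$$
where α > 0, θ > 0, −∞ < µ < ∞, and σ > 0. Consequently, its quantile function is
$$Q_{\mathrm{UGa\text{-}N}}(u) = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}\left(2q_u^{1/\alpha}-1\right), \qquad q_u = \frac{-\theta}{2+W_{-1}\left(-(\theta+2)\,u\,e^{-(\theta+2)}\right)},$$
where 0 < u < 1 and erf^{−1}(·) is the inverse error function.
Plots of the pdf and cdf for selected parameter values of the UGa-B, UGa-W, and UGa-N distributions are shown in Figures 2-4, respectively.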
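A corresponding Python sketch for the UGa-N distribution. It evaluates G with `math.erf`, as in the text, and uses `statistics.NormalDist.inv_cdf` in place of µ + σ√2 erf^{−1}(2p − 1), since both give the normal inverse cdf at p. The helper names are hypothetical, and the cdf and pdf assume G(x) > 0 at the evaluation point.

```python
import math
from statistics import NormalDist


def lambert_w_m1(x: float) -> float:
    """W_{-1}(x) for -1/e < x < 0, by bisection on w + log(-w) = log(-x) with w < -1."""
    target = math.log(-x)
    lo, hi = -2.0, -1.0
    while lo + math.log(-lo) > target:   # widen the bracket to the left
        lo *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + math.log(-mid) > target:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """G(x; mu, sigma) = [1 + erf((x - mu) / (sigma * sqrt(2)))] / 2."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def uga_n_cdf(x: float, alpha: float, theta: float, mu: float, sigma: float) -> float:
    ga = normal_cdf(x, mu, sigma) ** alpha
    return (theta + 2.0 * ga) / ((theta + 2.0) * ga) * math.exp(-theta * (1.0 / ga - 1.0))


def uga_n_pdf(x: float, alpha: float, theta: float, mu: float, sigma: float) -> float:
    big_g = normal_cdf(x, mu, sigma)
    ga = big_g ** alpha
    return (alpha * theta * normal_pdf(x, mu, sigma) * (theta + ga)
            / ((theta + 2.0) * big_g ** (2.0 * alpha + 1.0))
            * math.exp(-theta * (1.0 / ga - 1.0)))


def uga_n_quantile(u: float, alpha: float, theta: float, mu: float, sigma: float) -> float:
    """Valid for 0 < u < 1."""
    c = theta + 2.0
    q_u = -theta / (2.0 + lambert_w_m1(-c * u * math.exp(-c)))   # UGa quantile
    # NormalDist.inv_cdf(p) equals mu + sigma * sqrt(2) * erfinv(2p - 1).
    return NormalDist(mu, sigma).inv_cdf(q_u ** (1.0 / alpha))


if __name__ == "__main__":
    print([round(uga_n_quantile(u, 2.0, 1.0, 0.0, 1.0), 4) for u in (0.1, 0.5, 0.9)])
```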

Parameter Estimation
In this section, we derive the maximum likelihood (ML) estimators of the unknown parameters of the UGa, UGa-B, UGa-W, and UGa-N distributions.

The ML estimators for the UGa distribution
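Let X_1, ..., X_n be a random sample from the UGa distribution with pdf (4), with observed values x_1, ..., x_n. Its log-likelihood function of θ can be written as
$$\ell(\theta) = n\ln\theta - n\ln(\theta+2) + \sum_{i=1}^{n}\ln(\theta+x_i) - 3\sum_{i=1}^{n}\ln x_i - \theta\sum_{i=1}^{n}\left(\frac{1}{x_i}-1\right).$$
The ML estimator θ̂ of θ is obtained by solving the following non-linear equation:
$$\frac{\partial \ell(\theta)}{\partial\theta} = \frac{n}{\theta} - \frac{n}{\theta+2} + \sum_{i=1}^{n}\frac{1}{\theta+x_i} - \sum_{i=1}^{n}\left(\frac{1}{x_i}-1\right) = 0.$$
This equation has no closed-form solution, so θ̂ is found numerically by maximizing ℓ(θ) (that is, minimizing −ℓ(θ)) with the nlm function in the stats package of R (R Core Team, 2022).
Figure 4: Plots of the pdf and cdf of X ∼ UGa-N(α, θ, µ, σ).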
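The score equation above can be solved reliably by bisection, because ∂²ℓ/∂θ² = −n/θ² + n/(θ + 2)² − Σ(θ + x_i)^{−2} < 0: the log-likelihood is strictly concave and the score is strictly decreasing in θ. The Python sketch below uses this instead of R's nlm, a substitution made only for illustration. To simulate test data it relies on a mixture representation that follows from (1) and that we note here as our own observation: the Garima pdf equals (1 + θ)/(θ + 2) times an exponential(θ) pdf plus 1/(θ + 2) times a gamma(shape 2, rate θ) pdf. All names are hypothetical.

```python
import math
import random


def uga_sample(n: int, theta: float, rng: random.Random) -> list:
    """Draw T = 1 / (1 + Y) with Y ~ Garima(theta), using the two-component mixture
    (1 + theta)/(theta + 2) * Exp(theta) + 1/(theta + 2) * Gamma(2, rate theta)."""
    p_exp = (1.0 + theta) / (theta + 2.0)
    out = []
    for _ in range(n):
        if rng.random() < p_exp:
            y = rng.expovariate(theta)
        else:
            y = rng.gammavariate(2.0, 1.0 / theta)   # shape 2, scale 1/theta
        out.append(1.0 / (1.0 + y))
    return out


def uga_loglik(theta: float, x: list) -> float:
    n = len(x)
    return (n * math.log(theta) - n * math.log(theta + 2.0)
            + sum(math.log(theta + xi) for xi in x)
            - 3.0 * sum(math.log(xi) for xi in x)
            - theta * sum(1.0 / xi - 1.0 for xi in x))


def uga_score(theta: float, x: list) -> float:
    n = len(x)
    return (n / theta - n / (theta + 2.0)
            + sum(1.0 / (theta + xi) for xi in x)
            - sum(1.0 / xi - 1.0 for xi in x))


def uga_mle(x: list, tol: float = 1e-10) -> float:
    """Root of the (strictly decreasing) score function, found by bisection."""
    if sum(1.0 / xi - 1.0 for xi in x) <= 0.0:
        raise ValueError("the MLE does not exist when every observation equals 1")
    lo, hi = 1e-8, 1.0
    while uga_score(hi, x) > 0.0:       # expand the bracket until the score is negative
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * (1.0 + lo):
        mid = 0.5 * (lo + hi)
        if uga_score(mid, x) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    data = uga_sample(200, 2.0, random.Random(2022))
    est = uga_mle(data)
    print(est, uga_loglik(est, data))
```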

The ML estimators of the UGa-B distribution
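Let X_i ∼ UGa-B(α, θ, a, b), i = 1, ..., n, with the pdf (16). Its log-likelihood function of α, θ, a, and b can be written as
$$\ell(\alpha,\theta,a,b) = n\ln\alpha + n\ln\theta - n\ln(\theta+2) - n\ln B(a,b) + (a-1)\sum_{i=1}^{n}\ln x_i + (b-1)\sum_{i=1}^{n}\ln(1-x_i) + \sum_{i=1}^{n}\ln\left[\theta+I_{x_i}^{\alpha}(a,b)\right] - (2\alpha+1)\sum_{i=1}^{n}\ln I_{x_i}(a,b) - \theta\sum_{i=1}^{n}\left[I_{x_i}^{-\alpha}(a,b)-1\right]. \tag{28}$$
Setting the first partial derivatives of (28) with respect to each unknown parameter to zero, the ML estimators of α, θ, a, and b are obtained numerically using the nlm function in the stats package of R (R Core Team, 2022).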

The ML estimators of the UGa-W distribution
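Let X_i ∼ UGa-W(α, θ, k, λ), i = 1, ..., n, with the pdf (20), and let G_i = 1 − exp{−(x_i/λ)^k}. Its log-likelihood function of α, θ, k, and λ can be written as
$$\ell(\alpha,\theta,k,\lambda) = n\ln\alpha + n\ln\theta - n\ln(\theta+2) + n\ln k - n\ln\lambda + (k-1)\sum_{i=1}^{n}\ln\frac{x_i}{\lambda} - \sum_{i=1}^{n}\left(\frac{x_i}{\lambda}\right)^{k} + \sum_{i=1}^{n}\ln\left(\theta+G_i^{\alpha}\right) - (2\alpha+1)\sum_{i=1}^{n}\ln G_i - \theta\sum_{i=1}^{n}\left(G_i^{-\alpha}-1\right). \tag{29}$$
Setting the first partial derivatives of (29) with respect to each unknown parameter to zero, the ML estimators of α, θ, k, and λ are obtained numerically using the nlm function in the stats package of R (R Core Team, 2022).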

The ML estimators of the UGa-N distribution
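Let X_i ∼ UGa-N(α, θ, µ, σ), i = 1, ..., n, with the pdf (24), and let G_i = ½[1 + erf((x_i − µ)/(σ√2))]. Its log-likelihood function of α, θ, µ, and σ can be written as
$$\ell(\alpha,\theta,\mu,\sigma) = n\ln\alpha + n\ln\theta - n\ln(\theta+2) - \frac{n}{2}\ln(2\pi) - n\ln\sigma - \sum_{i=1}^{n}\frac{(x_i-\mu)^{2}}{2\sigma^{2}} + \sum_{i=1}^{n}\ln\left(\theta+G_i^{\alpha}\right) - (2\alpha+1)\sum_{i=1}^{n}\ln G_i - \theta\sum_{i=1}^{n}\left(G_i^{-\alpha}-1\right). \tag{30}$$
Setting the first partial derivatives of (30) with respect to each unknown parameter to zero, the ML estimators of α, θ, µ, and σ are obtained numerically using the nlm function in the stats package of R (R Core Team, 2022).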
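The three four-parameter log-likelihoods (28)-(30) are special cases of one UGa-G log-likelihood, obtained by taking logarithms of the UGa-G pdf. The Python sketch below writes that general form with the baseline passed as callables, then builds the negative UGa-W log-likelihood (29) from it; the names are hypothetical. In practice the negated log-likelihood would be handed to a numerical optimizer, such as R's nlm as in the paper or, in Python, `scipy.optimize.minimize`; the optimizer call is not shown here.

```python
import math
from typing import Callable, Sequence

Func = Callable[[float], float]


def uga_g_loglik(alpha: float, theta: float, x: Sequence[float], G: Func, g: Func) -> float:
    """UGa-G log-likelihood; the baseline parameters are captured inside G and g."""
    n = len(x)
    big_g = [G(xi) for xi in x]
    return (n * (math.log(alpha) + math.log(theta) - math.log(theta + 2.0))
            + sum(math.log(g(xi)) for xi in x)
            + sum(math.log(theta + gi ** alpha) for gi in big_g)
            - (2.0 * alpha + 1.0) * sum(math.log(gi) for gi in big_g)
            - theta * sum(gi ** (-alpha) - 1.0 for gi in big_g))


def uga_w_negloglik(params: Sequence[float], x: Sequence[float]) -> float:
    """Negative UGa-W log-likelihood (29), the kind of objective passed to a minimiser."""
    alpha, theta, k, lam = params
    if min(params) <= 0.0:
        return math.inf                      # outside the parameter space
    G = lambda v: 1.0 - math.exp(-((v / lam) ** k))
    g = lambda v: (k / lam) * (v / lam) ** (k - 1.0) * math.exp(-((v / lam) ** k))
    return -uga_g_loglik(alpha, theta, x, G, g)


if __name__ == "__main__":
    sample = [0.4, 1.1, 2.3, 3.0, 4.7]       # illustrative values only
    print(uga_w_negloglik((1.5, 2.0, 1.8, 2.5), sample))
```

Returning infinity outside the parameter space is one simple way to keep an unconstrained optimizer away from invalid values; reparametrizing on the log scale is a common alternative.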

Simulation illustration
In this section, the performance of the ML estimators of the proposed distributions (the UGa, UGa-B, UGa-W, and UGa-N distributions) is examined, and the parameter settings for each distribution are shown in Table 1. Because a theoretical comparison is not possible, a Monte Carlo simulation study was designed using R (R Core Team, 2022). The study covers sample sizes n = 25, 50, 100, and 500, reflecting small to large samples. The 95% percentile confidence interval (PCI) is (2.5th percentile, 97.5th percentile), where the percentiles are taken from the distribution of the estimates over 1000 Monte Carlo replications. The results are provided in Table 1. The maximum likelihood estimates (MLEs) of each parameter of the proposed distributions are close to the true parameter values when the sample sizes are large, and the width of the PCI for each parameter decreases as the sample size increases.
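The PCI construction can be sketched for the UGa case in Python as follows. This is an illustrative re-implementation, not the authors' R code: it draws samples through the Garima mixture representation (exponential and gamma components), fits θ by bisection on the strictly decreasing score, and reads the 2.5th and 97.5th percentiles from the sorted estimates using linear interpolation, the default ("type 7") rule of R's quantile function. The number of replications is reduced from 1000 to keep the run short.

```python
import math
import random


def uga_sample(n, theta, rng):
    """T = 1/(1 + Y), with Y ~ Garima(theta) drawn as an exponential/gamma(2) mixture."""
    p_exp = (1.0 + theta) / (theta + 2.0)
    out = []
    for _ in range(n):
        if rng.random() < p_exp:
            y = rng.expovariate(theta)
        else:
            y = rng.gammavariate(2.0, 1.0 / theta)
        out.append(1.0 / (1.0 + y))
    return out


def uga_mle(x, tol=1e-10):
    """Bisection on the strictly decreasing UGa score function."""
    n = len(x)
    s_y = sum(1.0 / xi - 1.0 for xi in x)          # does not depend on theta
    if s_y <= 0.0:
        raise ValueError("the MLE does not exist when every observation equals 1")

    def score(theta):
        return n / theta - n / (theta + 2.0) + sum(1.0 / (theta + xi) for xi in x) - s_y

    lo, hi = 1e-8, 1.0
    while score(hi) > 0.0:
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol * (1.0 + lo):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def percentile(sorted_vals, p):
    """Linearly interpolated empirical percentile of an already sorted list."""
    h = (len(sorted_vals) - 1) * p
    i = math.floor(h)
    j = min(i + 1, len(sorted_vals) - 1)
    return sorted_vals[i] + (h - i) * (sorted_vals[j] - sorted_vals[i])


def simulate_pci(theta, n, reps, seed=2022):
    """Mean MLE and 95% percentile interval over `reps` Monte Carlo replications."""
    rng = random.Random(seed)
    est = sorted(uga_mle(uga_sample(n, theta, rng)) for _ in range(reps))
    return sum(est) / reps, percentile(est, 0.025), percentile(est, 0.975)


if __name__ == "__main__":
    for n in (25, 50, 100, 500):
        mean, lower, upper = simulate_pci(2.0, n, reps=200)
        print(f"n={n:3d}  mean={mean:.3f}  PCI=({lower:.3f}, {upper:.3f})  width={upper - lower:.3f}")
```

Running it should show the same qualitative pattern reported above: the mean MLE approaches the true θ and the PCI narrows as n grows.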

Empirical illustration
In this section, we analyze eight real data sets to illustrate the proposed distributions. Two data sets (Data I and II) are used to illustrate the UGa and UGa-B distributions, and six data sets (Data III to Data VIII) are used to illustrate the UGa-W and UGa-N distributions. Summary statistics of these data are provided in Table 2. For each distribution, the unknown parameters are estimated by the ML method. The distributions are compared using the Kolmogorov-Smirnov (KS) statistic and the corresponding p-value. From the results in Table 5, the UGa-N and normal distributions provide the smallest KS values and the highest p-values among the competing distributions (UGa-W, UGa-N, Weibull, and normal) for Data V and Data VI, respectively.
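The KS comparison can be sketched in Python as below. D_n is computed from the order statistics, and the p-value uses the asymptotic Kolmogorov distribution of √n·D_n, which is an assumption on our part: software such as R's ks.test may use exact small-sample p-values, so this sketch need not reproduce the values in Tables 3-6. The sample in the example is made up for illustration and is not one of the paper's data sets.

```python
import math


def ks_statistic(data, cdf):
    """D_n = sup_x |F_n(x) - F(x)|, evaluated at the order statistics."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, xi in enumerate(x, start=1):
        f = cdf(xi)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_pvalue_asymptotic(d, n, terms=100):
    """Asymptotic p-value P(K > sqrt(n) * d) from the Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam <= 0.0:
        return 1.0
    if lam < 1.0:
        # This series for P(K <= lam) converges quickly when lam is small.
        cdf = math.sqrt(2.0 * math.pi) / lam * sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * lam ** 2))
            for k in range(1, terms + 1))
        return min(1.0, max(0.0, 1.0 - cdf))
    tail = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, terms + 1))
    return min(1.0, max(0.0, tail))


if __name__ == "__main__":
    theta = 2.0
    uga_cdf = lambda t: (theta + 2.0 * t) / ((theta + 2.0) * t) * math.exp(-theta * (1.0 / t - 1.0))
    sample = [0.35, 0.52, 0.61, 0.68, 0.74, 0.80, 0.86, 0.91]   # made-up illustrative values
    d = ks_statistic(sample, uga_cdf)
    print(d, ks_pvalue_asymptotic(d, len(sample)))
```

In a real comparison, `cdf` would be the fitted cdf of each candidate model evaluated at its ML estimates, and the model with the smallest D_n (largest p-value) would be preferred, as in the tables.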

Left-skewed data sets
The seventh data set (Data VII) consists of 119 observations of the fracture toughness (X) of alumina (Al₂O₃), discussed by Nadarajah and Kotz (2007). The eighth data set (Data VIII) consists of left-skewed lifetime data discussed by Xu et al. (2003): the time (X) to failure, in units of 10³ h, of the turbocharger of one type of engine. From the results in Table 6, the UGa-W distribution provides the smallest KS values and the highest p-values as compared to the other distributions. This indicates that the proposed UGa-W model provides a better fit to these data than the other distributions.

Conclusion
The unit-Garima (UGa) distribution is proposed for analysing proportion data. Its properties are investigated, including the survival and hazard functions, order statistics, quantile function, and stress-strength reliability measure. A new family of continuous distributions, called the unit-Garima-generated (UGa-G) family of distributions, is also introduced; following the T-X approach, it uses the UGa distribution as the main generator. Sub-models such as the UGa-beta, UGa-Weibull, and UGa-normal distributions are introduced, and the parameters of each distribution are estimated by the maximum likelihood method. A Monte Carlo simulation provides the MLEs and PCIs for each parameter of the proposed distributions. The results show that the MLEs of each parameter are close to the true parameter values when the sample sizes are large, and that the width of the 95% PCI for each parameter decreases as the sample size increases. Applications to eight real data sets (examples of left-skewed, right-skewed, and symmetric observations) demonstrate the usefulness of the proposed distributions. The results are: (i) for right-skewed observations, the UGa-W distribution provides a better fit than the competing distributions (UGa-N, Weibull, and normal); (ii) for symmetric observations, the UGa-N and normal distributions provide the smallest KS values and the highest p-values among the competing distributions (UGa-W, UGa-N, Weibull, and normal); and (iii) for left-skewed observations, the UGa-W distribution provides a better fit than the other distributions.

Theorem 2.1.
Let Y be a random variable distributed as the Garima distribution with parameter θ > 0. Then the random variable T = 1/(1 + Y) is distributed as the unit-Garima (UGa) distribution with parameter θ, denoted as T ∼ UGa(θ).

Figure 1(a) and Figure 1(b) show the pdf and cdf of the UGa distribution, respectively, for some specified values of θ.

Table 1: The MLEs and 95% PCIs for the parameters of the proposed distributions.

Table 2: Summary statistics of the real data sets.
The first data set (Data I) consists of the first 58 observations of the failure times of Kevlar 49/epoxy strands tested at the 90% stress level, obtained from Andrews and Herzberg (2012). The second data set (Data II) refers to the shape (perimeter divided by the square root of area) measured on 48 rock samples collected from a petroleum reservoir, obtained from Cordeiro and dos Santos Brito (2012). From the results in Table 3, the UGa-B distribution provides the smallest KS values and the highest p-values as compared to the other distributions. This indicates that the proposed UGa-B model provides a better fit to these data than the competing distributions (including the beta distribution).

Table 3: The parameter estimates and some statistics of the model fitting to the real data on the interval (0, 1).
The third data set (Data III) represents COVID-19 mortality rate data from Mexico over 108 days (n = 108), recorded from 4 March to 20 July 2020 and discussed by Almongy et al. (2021). These data are the rough mortality rate (X) of the 108 observations. The fourth data set (Data IV) is the brake pad lifetime (X) of cars that still had their original pads, selected from a random sample of 98 vehicles (n = 98) sold over the preceding 12 months to a group of dealers (see Lawless, 2003; Arshad et al., 2021). From the results in Table 4, the UGa-W distribution provides the smallest KS values and the highest p-values as compared to the competing UGa-N, Weibull, and normal distributions. This indicates that the proposed UGa-W model provides a better fit to these data than the other distributions.

Table 4: The parameter estimates and some statistics of model fitting for examples of right-skewed data.
The fifth data set (Data V) consists of 101 observations (n = 101) of the fatigue life (X) of 6061-T6 aluminum coupons cut parallel to the direction of rolling and oscillated at 18 cycles per second. These data are presented by Birnbaum and Saunders (1969; see Arshad et al., 2021). The sixth data set (Data VI) represents the tensile strength (X), measured in GPa, of 69 carbon fibers tested under tension at gauge lengths of 20 mm, taken from Bader and Priest (1982; see Aderoju, 2021).

Table 5: The parameter estimates and some statistics of model fitting for examples of symmetric observations.

Table 6: The parameter estimates and some statistics of model fitting for examples of left-skewed data.