The Beta Power Muth Distribution: Regression Modeling, Properties and Data Analysis

In this paper, we propose a new flexible lifetime distribution. The proposed distribution will be referred to as beta power Muth distribution. It can be used to model increasing, decreasing, bathtub shaped or upside-down bathtub hazard rates. Some properties of the new model are obtained including moments, quantile function and moments of the order statistics. The unknown model parameters are estimated by the maximum likelihood method of estimation. A Monte Carlo simulation study is carried out to assess the performance of the maximum likelihood estimates. Two reliability data sets are applied to illustrate the usefulness and flexibility of the proposed model. In addition, we introduce a new location-scale regression model based on the logarithm of the proposed distribution and provide a real data application.


Introduction
In reliability and lifetime analysis, the hazard function, also called hazard rate, failure rate or instantaneous failure rate, has a crucial role to characterize real lifetime data. Several real-life data sets have a non-monotone hazard rates such as the bathtub shapes and upside-down bathtub (unimodal) hazard rates. The most popular traditional distributions do not provide a good fit for modeling this kind of the data sets. Hence, many parametric probability distributions have been introduced to analyze real data sets with non-monotone hazard rates.
One of the most recent models is that of Jodrá et al. (2017) who have proposed a new two-parameter lifetime distribution with bathtub-shaped and increasing failure rate called the power Muth (PM) distribution.distribution, where the Muth distribution was first proposed by Muth (1977) in the context of reliability theory. A random variable Y is said to follow the Muth distribution, with shape parameter ∈ ]0,1], if its probability density function (pdf) has the form: and the cumulative distribution function (cdf) of X is where > 0, is a scale parameter and is a shape parameter. Also, Jodrá et al. (2017) studied various statistical properties of this distribution and showed that it gives the best fit for two real data sets than many other distributions.
The PM distribution does not provide enough flexibility for analyzing different types of lifetime data. To increase the flexibility for modelling purposes, it will be useful to consider further alternatives to this distribution. Therefore, the goal of this paper is to introduce a new four-parameter generalization that can capture decreasing, increasing, unimodal (upside-down bathtub) and bathtub-shaped hazard rate functions. The new distribution will be called the beta PM (BPM) distribution. We study some properties of the new model, give maximum likelihood estimation of the parameters, derive the elements of the observed information matrix and apply this model to real-life data sets. Also, based on this distribution, we present a new regression model.
The rest of the paper is organized as follows. In Section 2, we define the BPM model. In Sections 3, 4 and 5 we explicit expressions for the quantile function, moments and the moments of the order statistics respectively. In Section 6, we discuss maximum likelihood estimation of the model parameters. In Section 7, we conduct a simulation study to check the performance of the maximum likelihood estimates. A new regression model and residual analysis are presented in Section 8. Three applications are given in Section 9. Conclusions are given in Section 10. Eugene et al. (2002) proposed the beta-generated family of distributions by using the beta random variable. For an arbitrary baseline cdf G(x), the cdf of the beta generalized family is defined by

The model definition
where > 0and > 0 are two shape parameters, ( , ) = is the incomplete beta function ratio, is the incomplete beta function, ( , ) = Γ( )Γ( )/Γ( + ) is the complete beta function and Γ(. ) is the gamma function. The pdf corresponding to (3) is where ( ) = ( )/ is the baseline pdf. By using Equation (3), several authors have defined and studied many new distributions. For example, Eugene et al. (2002) introduced the beta-normal distribution. Nadarajah and Kotz (2005) defined and studied the beta exponential.
The Beta Power Muth Distribution: Regression Modeling, Properties and Data Analysis 227 The pdf of the BPM distribution is given by It is clear that the PM distribution with parameters and is a special sub-model for a=b=1. For b=1, we obtain the exponentiated PM (EPM) distribution, which is proposed by Irshad et al. (2021). Hereafter, a random variable X with pdf (6) will be denoted by ∼ ( , , , ). Figure 1 shows some possible shapes of the pdf (6) of the BPM distribution for some parameter values of , ,a and b. Then, we observe that the density function (6) can take various forms depending on the parameter values. It is evident that the BPM distribution is much more flexible than the PM distribution. The hazard function of the BPM distribution is given by .
In Figure 2, we plot the hazard rate function of the BPM distribution for selected values of , ,a and b. We observe that the hazard rate function of the BPM distribution can be increasing, decreasing, and bathtub shaped or unimodal shaped depending on the values of the parameters. Therefore, the BPM distribution is quite flexible and can be used to fit various types of data sets in different domains.
The Beta Power Muth Distribution: Regression Modeling, Properties and Data Analysis 228 Figure 2: Plots of the hazard rate function of the BPM distribution for some parameter values.

Quantile function
where , ( ) is the uth quantile of a beta distribution with parameters a and b. Therefore, it is easy to simulate the BPM distribution. Let V be a beta random variable with parameters a >0 and b >0. Then, the random variable follows the BPM distribution. From Equation (8), we can generate a random variable X having the BPM distribution when the parameters are known. The median can be derived from (7) by setting u=1/2.

Moments
Here, we give the moments of the BPM distribution. We can determine the skewness, kurtosis and the expected life time of a device in lifetime data. The following theorem gives the rth moment of the BPM distribution in terms of the generalized integro-exponential function, which is defined by (see Milgram, 1985): where z ∈ ℝ, s ∈ ℝ and k > -1. Theorem 1. If ∼ ( , , , ), then the rth moment of X is given by Proof. The rth moment of X is From (6) we have By making use of the binomial series expansion, if η > 0 is real non-integer and |z| < 1, we obtain By setting = ( / ) in the above integral, we get ].
From the generalized integro-exponential function we have Milgram (1985), we have Therefore, we obtain the desired result.

Moments of order statistics
In this section, the pth moment of the ith order statistic for the BPM distribution is given which have some applications in reliability and lifetime analysis. Let 1, , … , be a simple random sample from BPM distribution and let 1: , … , : denote the order statistics obtained from this sample. The following theorem gives the pth moment of the ith order statistic : . Theorem 2. Then the pth moment of : is given by Proof. By the definition, the pth moment of : is where : ( ) is the pdf of : given by where F(x) and f(x) are given by (5) and (6), respectively. By using (9), the BPM cdf function (5) for b > 0 real noninteger can be rewritten as (2000), we have

From Gradshteyn and Ryzhik
By using (9), we get Similarly, as a proof of Theorem 1, we obtain ( : ).

Estimation
Here, the maximum likelihood estimates (MLEs) of the parameters of the BPM distribution are presented. Let 1 , … , be n random observations from the BPM distribution with unknown parameter vector = ( , , , ) . Then, the log-likelihood function, denoted by ℓ( ), for ,is given by Therefore, the log-likelihood equations are obtained, by taking the partial derivatives of ℓ( ) with respect to , , and , as follows , whose elements are given in Appendix B. So, the approximate confidence intervals for β, γ, a and b are given, respectively, by where var(⋅) is the diagonal element of −1 ( ) corresponding to each parameter and 2 is the quantile 100(1-ω/2)% of the standard normal distribution.

Monte Carlo simulation study
In this section, we conduct a simulation study to check the performance of the MLEs of the parameters of the BPM distribution. From Equation (8), we generate random samples of sizes n =20, 50,100,200 and 500 from the BPM distribution using the following sets of parameters: • Set I: = 0.2, = 0.5, = 2.5, = 5, • Set II: = 2.5, = 1, = 2, = 1, The simulation is performed via the statistical software R through the command mle. The number of Monte Carlo replications made was N=1000. The evaluation of the performance is based on the bias and the mean squared errors (MSE) defined as follows: , where = , , and b. The results of our simulation study are summarized in Table 1. We can see that the bias and MSE of the MLEs decrease when the sample size increases, as expected. This verifies the consistency properties of the MLEs, i.e., we can conclude that the maximum likelihood method performs well for estimating the parameters of the BPM distribution. In some practical applications, the lifetimes are affected by many explanatory variables and for this reason the regression models are widely used to estimate univariate survival functions for censored data. Among them, the location-scale regression model is distinguished since it is frequently used in clinical trials. In this section, we introduce a new location-scale regression model based on the logarithm of the BPM distribution. If where ∈ ℝ is the location parameter, > 0 is the scale parameter and a > 0 and b > 0 are the shape parameters. The corresponding survival function is Then, the pdf of the standardized random variable = − is given by The linear location-scale regression model linking the response variable and the explanatory variable vector = ( 1 , … , ) is given by where the random error has density function (12)  where r is the number of uncensored observations (failures) and = . We can obtain the MLE ̂ of τ by maximizing the loglikelihood function ℓ( ). To compute this estimate, we use the procedure optim in R software.

Residual Analysis
Residual analysis has an important role in judging the adequacy of the fitted model. To study departures from error assumptions and the presence of outliers, we consider here residual analysis based on the martingale and modified deviance residuals.

Martingale residual
The martingale residual is defined in counting processes, for more details see Fleming and Harrington (1994). The martingale residuals for LBPM model is where * = −̂ with ̂=̂.

Modified Deviance Residual
The main drawback of the martingale residual is that when the fitted model is correct, it is not symmetrically distributed about zero. To overcome this problem, modified deviance residual was proposed by Therneau et al. (1990). Th modified deviance residual for LBPM model is is the martingale residual.

Data Analysis
In order to show the flexibility of the BPM distribution we use two reliability real data sets with different shapes. In order to verify which distribution fits better to the data sets, we compute the values of the log-likelihood functions (-2l), Akaike information criterion (AIC), consistent Akaike information criteria (CAIC), Bayesian information criterion (BIC) and the Kolmogorov-Smirnov (K-S) statistic with corresponding p-value. The better model corresponds to smaller values of these measures and high p-value.

First data set: devices failure time data
The first data set is given by Aarset (1987) and represents the time to first failure of 50 devices (in weeks). This data set is: 0.  Figure 3(a), shows a convex shape followed by a concave shape. This corresponds to a bathtub shaped hazard rate function. So, the BPM distribution is appropriate for modeling the first data. Table 2 presents the values of the MLEs of the parameters for all fitted distributions. We see from Table 3 that the BPM distribution has the smallest values of -2l, AIC, BIC, CAIC and K-S and largest p-value. Hence, the BPM distribution gives an excellent fit than the others models for the first data set. In addition, we plot the histogram of this data set and the fitted pdfs in Figure 4(a). This Figure shows that the BPM pdf provides a closer fit to the histogram than other distribution. The plots of the estimated cdfs and empirical cumulative function are displayed in Figure 4(b). These plots reveal that the BPM cdf is the closest curve to the empirical cumulative function. So, we conclude that the BPM model is the best.
The variance-covariance matrix for the estimated parameters of the BPM distribution is given by
The estimated variance-covariance matrix of the BPM distribution for the this data set is

Third data set: HIV survival data
This data set is reported in Hosmer and Lemeshow (1999) and also it is available in Bolstad2 package of R software. The sample size is n=100 on HIV+ subjects belonging to an health maintenance organization, where the goal is to evaluate the survival time of these subjects. Alizadeh et al. (2017) adopted the log-odd power Cauchy-Weibull (LOPCW) regression model to analyse this data set. We use the same data set to prove the flexibility of LBPM regression model, where the aim of this study is to relate the survival time (y) with the history of drug use (v). The variables are: observed survival time (in months), : censoring indicator (0= alive at study end or lost to follow up, 1= death due to AIDS or AIDS related factors) and 1 (1= yes, 0= no) represents the history of the drug use. The regression model fitted to the data set is given by has the LBPM density (10) for i=1,…,100. We compare the LBPM regression model with the LPM, LEPM, LOPCW and log-Weibull (LW) regression models. Table 6 gives the MLEs, their approximate standard errors and p-values obtained from the fitted these models, -2l, AIC, BIC and CAIC statistics. These results indicate that the LBPM regression model has the lowest values of the -2l, AIC, BIC and CAIC. Therefore, it is clear that the LBPM model provides an adequate fit to HIV survival data. We can observe that the explanatory variable 1 is significant at the level of 1%. Figure 6(a) displays the modified deviance residuals (see Section 8) against the index of the observations. Figure  6(b) gives the normal probability plot with generated envelope. We conclude that none of observed values appear as possible outliers. Therefore, the fitted model is is very suitable for this data set.  Figure 6: (a) Index plot of the modified deviance residuals and (b) Normal probability plot for the modified deviance residuals with envelope from the fitted LBPM regression model.

Conclusions
We have introduced a new four-parameter model called the beta power Muth distribution. This distribution has as sub-models the power Muth and exponentiated power Muth distributions. We have derived explicit expressions for the moments, quantile function and moments of the order statistics associated with the proposed model. We have used the maximum likelihood method to estimate the model parameters. We have presented two examples involving reliability data sets. One of the data sets has a bathtub shaped failure rate and the other has an upside-down bathtub shaped failure rate function. For these data sets, our model provides the best fit than other competitive models. Further, we have defined the LBPM regression model and shown that this model gives an adequate fit to HIV survival data.