Comparison of Significant Approaches of Penalized Spline Regression ( P-splines )

Over the last two decades P-Splines have become a popular modeling tool in a wide class of statistical contexts. Fundamentally, semiparametric regression methods combine the leads of parametric and nonparametric approaches to regression analysis, while in precise, penalized spline regression uses the knowledge of nonparametric spline smoothing as a generalization of smoothing splines that let more suppleness in a choice of model with respect to the basis functions and the penalty. The present article compares two significant approaches of penalized spline regression models named as p-splines based on different basis functions with numerous knot selections and various types of penalties. These model fits have been applied on Wood Strength data to compare by calculating nonlinear least square method; also approaches are compared on several aspects: numerical immovability, quality of fit, derivative estimation and smoothing. This comparison will help us to fit best suitable model for conforming best suitable conditions and scenarios.


Introduction
Up until the mid-1990s most literature on spline-based nonparametric regression was concerned with smoothing splines and their multivariate extension, where the penalty takes a particular form and the number of basis functions roughly equals the sample size.P-splines are keen methods for modeling nonlinear smooth effects of covariates within the additive models framework.It's a great approach of smoothing in handling a wide range of nonparametric and semi-parametric modeling situations.These are the regressions which uses splines, or piecewise continuous polynomials, paired with a mathematical penalization to find the best fit to the data.P-splines are generally regression splines fit by least-squares with a roughness penalty.It is very flexible idea that can be seen as a generalization of spline smoothing with a more flexible choice of bases, penalties and knots.
Namely, one chooses a spline basis based on some sufficiently large set of knots and penalizes unnecessary structure.Different basis functions, form of the penalties, amount and location of knots all provide a wide spectrum of smoothers.In husk, penalized regression splines as simple smoothing model is fitted by usurping  0 +  1  +  2  2 + ∑    =1 ( −   ) + 2 = () as an average function for the knots  1 ,  2 , … ,   with large k.A combine recipe is to take high dimensional basis () by choosing open-handedly large number of knots.Then to minimize the criterion ( − ) ′ ( − ) + λ ′  with sufficiently chosen D as penalty matrix and data driven penalty parameter λ.However, in recent years, there has been a great deal of research on more general spline/penalty strategies.Besides for selecting the number of knots, Eilers & Marx (1996) suggest to use a generous numbers, while Rupert (2002) suggests min (40, n/4) knots and Wood (2006) proposes k = 10 *3(m-1) with m as dimension of covariates.Although practical significance has two selection flairs; one to choose adequate number of knots to fit the model and increase the numbers until the likelihood does not further increase and secondly choose a large k and decrease it slightly to check that the likelihood does not decrease.Whereas for Model selection Wager, Vaida & Kauermann (2007) and Kneib & Greven (2010) suggested automatic correction of degree of freedom.
In 1986 O'Sullivan introduced a class of penalized splines that is a direct generalization of smoothing splines based on B-spline basis functions provided by de Boor (1978).After that Eilers & Marx (1992;1996) again raised the issue.In 1996, they introduced a modified version of O'Sullivan's estimator and coined the term P-splines.This P-spline combines regression on B-spline basis, with an equidistant grid of knots and a discrete roughness penalty, i.e., higher-order differences in the penalty to smooth scatterplots.They proposed to use a relatively large number of knots and a difference penalty on the coefficients of adjacent B-splines.Eilers & Marx (1996) is easy to use, and allows great flexibility, that any order of penalty can be combined with any order of the B-spline basis.
Subsequently Ruppert & Carroll (1997) used the truncated power basis (TPB) to represent the components as penalized splines with smoothness controlled by a ridge type penalty over the coefficients of the parametrization.Rather than placing the knots on an equidistant grid, they chose to use quantiles of the observations for the variable X as knot locations.Later Wand (1999) derived an asymptotic approximation of P-splines mean squared error, when penalties are used in combination with fixed basis.Ruppert & Carroll (2000) and Ruppert (2002) provide a strong support to Penalized splines.
Afterwards, Ruppert, Wand & Carroll (2003) present an excellent overview of theory and applications of semi-parametric models based on penalized splines and named their approach as P-splines too.They proposed the use of P-splines as a low rank smoother that can be seen as a compromise between smoothing and regression splines.Ruppert, Wand & Carroll (2003) used truncated power functions in the basis with quantile based knots of the independent variable and a ridge penalty.P-splines are becoming more and more popular and achieved general recognition since Eilers & Marx (1996) and Ruppert, Wand & Carroll (2003) introduced it.

Methodology
This article is presented to compare and document strengths and weaknesses of the two approaches of P-splines, one of Eilers and Marx (1996) which uses B-spline basis, equally-spaced knots and difference penalties and second is Ruppert, Wand and Carroll (2003), which uses a truncated power basis, knots based on quantiles of the independent variable and a ridge penalty.Here is a great collaboration to make well-informed choices for what type of splines should be used, how knots should be spaced and what penalty should be chosen.This article is organized as follows: section 1 give details about spline models of both approaches and compares their components.Section 2 compares both approaches of P-splines by example using wood strength data.Here variable "Concentration" is used as explanatory variable while variable "Strength" as dependent variable.
And in Section 3 in order to gain more insight into the practicability and the limitations of approaches both approaches are discussed in some aspects, including: numerical immovability, quality of the fit & visualization, derivative estimation and smoothing.In present article, we will call Eilers and Marx (1996) approach as E&M P-Spline, while Ruppert, Wand and Carroll (2003) approach as RWC P-spline.The methodology of this article is implemented in R-language.

Model 1: (E&M P-Spline)
As discussed above E&M P-Splines combine regression based on B-splines basis with equally spaced grid of knots and used discrete roughness higher-order finite differences as penalty to smooth scatter plots.Mathematically, Eilers & Marx (1996) presented penalized spline model is: Whereas penalty proposal based on finite differences i.e., ∆   ̂ If suppose Dk =  k , then the matrix Dk is the k th differences of , and  > 0.Where In other words, If XB is the X-matrix corresponding to the t B-spline basis i.e., where Lp is a square invertible matrix.Then fitted values for a penalized spline can be writing as Substitution of eq (4.1) allow us to express a penalized spline fit of degree p in terms of the B-spline basis as An important component of P-splines is number and position of knots as it is crucial problem especially with regression splines.Small number of knots results in a not flexible function space to capture the enough variability of the data.The position of the knots may possibly have a strong influence on estimation and large number of knots may lead to serious over fitting.For this Eilers & Marx (1996) proposed a remedy based on roughness penalty.Also they chose a moderate number of equally spaced knots within the domain of x to ensure enough exibility.Sufficient smoothness of the fitted curve is achieved through a difference penalty on adjacent B-spline coefficients.

Model 2: (RWC P-Spline)
Another approach of p-spline models was offered by Ruppert et al., (2003).They used truncated power functions, F, as basis with quantile based knots and have a ridge penalty on the truncated power function (TPF), whatever their degree.For a given degree p, column j of F is given by The vector t contains the knots chosen as quantiles of the xs.The model for E(y) = μ is given by where X is a m by p+1 matrix with to in row i.So mathematically, Ruppert, Wand and Carroll (2003) presented p-spline model as: i.e. minimization function as in which we recognize a ridge penalty on b.Minimization of QF leads to the system of equations For the number of basis functions in B chosen "too large", means very small value of , the fitted curve over-fitted the data giving too many fluctuations.Increasing  increases the smoothness and in the limit a polynomial fit of degree p is obtained.Quantiles of x are chosen for the positions of the knots.By increasing λ the smoothness can be tuned.Also, we can express the eq (4. 2) in form of penalty used by B-Splines basis as That is a type of ridge regression.
While a ridge penalty on TPF is equivalent to a difference penalty of (fixed order) on Bsplines; B-splines allow a flexible choice of the order of the penalty, such as B-splines can be computed almost trivially from TPF.
As to apply p-splines, different spline bases will be chosen for a balance of numerical stability and ease of implementation.Thus in next section we look at basis functions used for both P-spline models and their mutual relationships.

Truncated Power Function basis vs. B-Spline basis
Truncated polynomials as basis are simple and useful for understanding spline regression but not numerically stable when the number of knots is large and the smoothing parameter λ close to zero.In this case the computation has to be organized carefully, involving QR or Demmler-Reinsch decomposition While B-Spline basis are easy to calculate and numerically superior alternatives.
We begin with TPF and equally-spaced knots since they are a little easier to explain, as TPFs Basis are and B-splines can be derived from them as differences of TPF.
Where B-splines based on a K+1 non decreasing set of knots, 0 ,  1 ,  2 , …   are defined by de Boor (1978), as the i th B-splines function of degree p is obtained by recurrence from first-order B-splines for all real numbers x, with  ,0, () = { 1,    ≤  <  +1, 0, ℎ Now by using example of wood strength data, we derive B-splines from differences of TPFs.Thus the matrix of TPF basis and matrix of B-spline basis derived from TPFs are given below.
From wood strength data presented in section 3, we have: The main advantage of the truncated power function basis is the simplicity of its construction and the ease of interpreting the parameters in a model that corresponds to these basis functions.However, there are two weaknesses when to use this basis for regression.These functions grow rapidly without bound as increases, resulting in numerical precision problems when the data span a wide range.Furthermore, many or even all of these basis functions can be nonzero when evaluated at some value, resulting in a design matrix with few zeros that precludes the use of sparse matrix technology to speed up computation.This weakness can be addressed by using a B-spline basis.
The simplest system of TPF uses p = 0; it consists of step functions with jumps of size 1 at the knots.The right branch of a TPF of degree p looks like ( −   )  ; the left branch is zero.Following figure shows quadratic TPF bases vs quadratic B-Spline basis, with equally-spaced knots.

Comparisons of Models by Example:
For Model 1:   For fitting Model1 at wood strength data we use 10 equally spaced knots, 3 rd order difference penalty, and calculated by using GCV criterion with GCV = 83.3827with 6 df.Plot of the model1 fit is shown below:  By using above code 'tbs' applied on wood strength data with 10 th degree and 10 quantile based knots, we construct the following truncated power basis matrix in table 4.9 (shown on next page).
For fitting Model 2 at wood strength data we use 10 th degree polynomial and 10 quantile knots.Here smoothing parameter is calculated by using GCV criterion with GCV = 10.83 with degree of freedom as 11 (It includes 1 df for intercept).So, plot of the above Fit of model 2 at original fit of given data is shown below:  ...
Table 4.9 Under the null hypothesis this statistic will have an approximate F-distribution with dfres, smallerdfres, larger and dfres, larger degrees of freedom.
For the given dataset the value of the corresponding F-statistic is 4.184 and the corresponding p-value is 0.165, which means that the difference between both P-spline fits is neither statistically nor practically significant.

Comparison on different aspects: a. Numerical Immovability
Technically TPF can be used directly as a basis for regression.But it is not to be recommended especially for the large-scale data or when the number of knots is large and p ≥1 their numerical condition can be poor.If smoothing is the only goal, TPF of degree 1 might be workable, but higher degrees, which are needed when one wishes to smoothly estimate (second) derivatives are essentially unusable.
In very large problems extra care is also needed for using B-spline basis.For example in an engineering application of long series (e.g.15000 observations) of echo sounding measurements had to be smoothed and reduced to a uniform grid, we can use bases with 2000 B-splines, but not possible for TPF basis.Another technique is available, i.e. to first fill a matrix of this size with a TPF basis and then compute differences to reduce it to a sparse matrix.But it is not a good idea.A nice property of B-splines with equi-spaced knots is that we get the same set of values, but shifted by k columns.Hence compute the basis in a matrix with p + 1 columns and transfer the results to the right columns of a sparse matrix.

b. Quality of Fit
The details of a P-spline model can be envisaged in an attractive way.This Presentation shows a significant role of Penalized spline using B-splines and difference penalties that it forces the skeleton, i.e., the coefficients, to follow a smooth pattern.Consequently the full curve that follows from them will also be smooth while penalized spline using TPF and ridge penalty do not lend such an insightful presentation.Also shows that the larger the values of the smoothing parameter λ, the more the fit shrinks towards a polynomial fit, while smaller values of λ result in a wiggly "overfitted" estimate.This is clearly visible in figures where different smoothing parameters are selected.

c. Smoothing
Like in the case of time series and spectra where the number of large observations in the discrete data series, equidistant sampled sometimes only a smoothed discrete series is needed.In such problems, B-spline basis of degree zero, with a knot at every observation, may be an attractive choice.The basis then is the identity matrix, the system of equations becomes (I + λ∆d∆d) α = y and the smooth series μ coincide.Because discrete smoother is very attractive for long data series in case, we have to access the sparse matrix software.So where the sparseness is vital, TPF will not work in such settings.Eilers  Similarly for Multidimensional smoothing applications like image smoothing and for many others of large real-world problem, Eilers and Marx (2010) has shown that tensor products of B-splines and difference penalties are a practical and effective tool.But by contrast, such problems with tensor products are impossible by Penalized Spline smoothing using TPF basis.And when smoothing periodic data on a linear axis, it may happen that both ends do not join smoothly.In such case again B-spline basis with difference penalty seems quite easy.

d. Derivative Estimation
Frequently one is not only interested in a fitted curve, but also in its derivatives.The derivative is also quite an intuitive concept.Basically derivatives are used to measure how a system changes with time.Like in a chemical reaction of say, oxygen + hydrogen goes to water, we can use the derivative of the rate equation to measure how long it will take for a certain percentage of the reactants to be turned into products.Also for example, in the business world, if we figure out the equation of a curve we can find the values for all of the minimums and maximums (that would be where the derivative in equal to zero) so that we can determine profit and loss points, etc.
Penalized splines allow easy calculation of derivatives.For penalized splines of Ruppert, Wand and Carroll (2003) using TPF basis only need is to differentiate the polynomial branches weighted by the estimated coefficients and sum them.But the situation of Eilers and Marx (1996) using B-spline basis is little more complicated.But in the case of equally-spaced knots, the derivative of a weighted sum of B-splines can be computed by:   ∑    (; )  = ∑( − 1)  (;  − 1)∆   ℎ ⁄  If the second derivative curve is required then both TPF and B-spline basis has to be cubic.But we know that cubic TPF have a very poor condition number, so required great attention with the accomplishment of computations for derivative estimation.While cubic B-splines do not comprise such problems.

Conclusion
By above comparison, we got that there are two main approaches of Penalized spline smoothing (both named as P-Spline), first approach as Model1with B-splines and difference penalty and 2 nd approach as Model2 with truncated power function (TPF) basis besides a ridge penalty.Both Models of P-splines (With B-Spline basis and TPF basis) have become popular in statistics and in applied fields, as can be judged from citation counts.
So by comparing these two main approaches, we found: ➢ Ridge Penalty of Model2 has great worth especially in small set of data.➢ Equally-spaced knots presented in Model1 are always to be preferred as they allow easy smooth interpolation with B-splines as well as TPF.

➢
As the basis of Model1 (B-spline basis) can be computed almost trivially from basis of Model2 (TPF basis).So Model1 allow a flexible choice of order of the penalty.

➢
Both Models allow mixed model approach.➢ Model1 have excellent numerical properties over Model2.➢ Basis of Model1 allow informative conception.➢ Model2 is not applied for large-scale sets, while Model1 is sparse and can lend for such problems.➢ Both basis and penalties of Model1 are easily adjusted in the smoothing agenda.

Figure 1 :
Figure 1: B-spline bases with equally spaced knots vs Truncated Power bases with equally spaced knots

4 :
We construct cubic b-spline basis at wood strength data as shown below

Figure 3 :
Figure 3: Model2 Fit (GCV = 10.82,df = 11) Figure below on left illustrates fit with a rich B-spline basis and a second-order penalty.The coefficients of the individual B-splines are shown, scaled by their coefficients at the positions of their maxima.These points are close to the fitted curve and present the skeleton of the fit.And Figure below on right illustrates fit with a Truncated Power function basis and a ridge penalty.Here the estimated curves (bold) are shown together with the used cubic truncated polynomial basis functions.

Table 4 .8: Calculated residuals of model 2 for above fit of P-spline are:
Here from table 4.8, SSe = 13.4542,SSt = 3416.885and R 2 is 0.9960624.To test the statistical significance of difference we use the approximate F-test, which is based on the following statistic and Marx (2010) expressed that a smaller-scale application in which the discrete approach may be appropriate is histogram smoothing.And to estimate a density one constructs a histogram with many, say 150, narrow bins.Such a plotted histogram will look unattractive and ambiguous, but generalized linear smoothing with penalized splines completely changes performance.Also Eilers and Marx presented histogram smoothing with Penalized B-splines in a Poisson regression setting.But did not found any possibility by Penalized splines with TPF basis especially with quantile based knots.