Nonparametric smoothing lets the data determine the amount of smoothing. Classical smoothing splines use a single global smoothing parameter to control the amount of smoothing over the whole function. When homogeneity of smoothness cannot reasonably be assumed across the whole domain of the function, a natural extension is to allow the smoothing parameter to vary over the domain as a penalty function of the independent variable, adapting to changes in roughness. Adaptive smoothing has long been of interest in statistics; it involves allowing the smoothing parameter, the bandwidth, or the placement of knots to vary across the domain, adapting to the change of roughness. In penalized regression splines, the penalty function has been modeled by linear interpolation on the logarithmic scale; it has also been modeled from a fully Bayesian approach using Markov chain Monte Carlo for computation, and a fast and simple algorithm has been developed for the Bayesian P-spline based on a Laplace approximation to the marginal likelihood. Modeling the smoothing parameter as a penalty function of the independent variable can also be used to achieve adaptiveness: the adaptive smoothing problem is formulated as a minimization with a new penalty functional, the resulting estimate has the same form as the smoothing spline, and methods developed for classical smoothing splines can be reused. Along these lines, reproducing kernels have been derived for a generic penalty function, which was then modeled by B-splines; the solution of the penalized least squares estimate in which the smoothing parameter varies across the domain has been studied under the reproducing kernel Hilbert space approach; and the penalty function has been modeled by a step function with data-driven segmentation, estimated by maximizing the generalized likelihood, with a complexity penalty added to the generalized likelihood when selecting the best step function from a collection of candidates.
The step-function approach is computationally expensive due to the large number of candidate models and the proposed search algorithm, and this is a serious limitation. In this research we aim at developing a hierarchical penalty model using P-splines, which results in more adaptive smoothing.
2. Modeling Approach
2.1. Penalized Splines
P-splines are low-order basis splines with a penalty that guards against under-smoothing. They are typically not spatially adaptive and hence have trouble with functions whose roughness varies rapidly. Regression splines approximate a function using a modest number of basis functions. Such splines can lack smoothness, and various strategies have been proposed to attain it; for example, penalized regression splines achieve smoothness by penalizing the sum of squares or the likelihood with a single penalty parameter. The penalty parameter and the fit are easy to compute using mixed model technology and are not sensitive to the choice of knots. A penalized spline can be seen as a compromise between a smoothing spline and a regression spline, combining the attractive attributes of both. It is basically a regression spline in which the penalty is applied directly to the coefficients of the piecewise polynomial. Hence one can retain a large number of knots and constrain their effect through a penalty that prevents overfitting. The number of knots defining the spline function is larger than the data strictly justify but smaller than the number of observations; penalized splines are therefore referred to as low-rank smoothers, which significantly reduces the numerical effort. The level of overfitting is controlled by a roughness penalty on the curve, most commonly based on the integral of a squared derivative of the spline. To avoid the drawbacks of regression splines and optimize the fit, one can choose a large number of knots, e.g. min(n/4, 40) as has been suggested, and prevent overfitting by penalizing the spline coefficients. That is, one finds
the minimizer of $\|Y - X\beta - Sd\|^{2}$ subject to $\|d\|^{2} \le a$ for a nonnegative constant $a$, where $Y$ is the response vector, $\beta$ and $d$ are the fixed and random effects vectors, and $X$ and $S$ are the design matrices associated with the fixed and random effects. Using a Lagrange multiplier $\omega$, this minimization can be written as

$$\min_{\theta}\; \|Y - C\theta\|^{2} + \omega\,\theta^{T} D \theta,$$

with $C = [X \; S]$ and $\theta = (\beta^{T}, d^{T})^{T}$, where $D$ is a block diagonal penalty matrix,

$$D = \operatorname{blockdiag}(0, I),$$

so that only the random-effects coefficients $d$ are penalized. The resulting estimate is given by

$$\hat{\theta} = (C^{T}C + \omega D)^{-1} C^{T} Y, \qquad \hat{Y} = C\hat{\theta}.$$
The smoothness of this estimate varies continuously as a function of the global smoothing parameter $\omega$. The larger the value of $\omega$, the more the fit shrinks towards a polynomial fit, while small values of $\omega$ result in an overfitted estimate. The penalized spline can be seen as a generalization of spline smoothing with a more flexible choice of bases, penalties and knots: one chooses a spline basis on a sufficiently large number of knots and penalizes unnecessary structure. This spline possesses a number of good properties. It shows no boundary effect, as many kernel smoothers do, i.e. the spreading of a fitted curve or density outside the domain of the data, generally accompanied by bending towards zero. It is a straightforward extension of (generalized) linear regression models. It conserves moments (means, variances) of the data: given a P-spline of degree $q + 1$ with a penalty of order $q + 1$ or higher,

$$\sum_{i} x_i^{j}\,\hat{y}_i = \sum_{i} x_i^{j}\, y_i, \qquad j = 0, \ldots, q,$$

for all values of the smoothing parameter $\omega$, where the $\hat{y}_i$ are the fitted values. This property is very useful in density smoothing, where the mean and variance of the estimated density are then the same as the mean and variance of the data for any amount of smoothing. The penalized spline also has a polynomial curve fit as its limit: for a penalty of order $q$ and large values of the smoothing parameter $\omega$, the fitted function approaches a polynomial of degree $q - 1$, if the degree of the P-spline is equal to or higher than $q$. Also the computations, including those for cross validation, are relatively cheap and can easily be incorporated into standard software.
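As a minimal numerical sketch of the penalized spline fit $\hat{\theta} = (C^{T}C + \omega D)^{-1}C^{T}Y$, the following uses a truncated-line basis; the simulated data, the number of knots, and the value of $\omega$ are illustrative choices, not taken from this paper. Because the intercept is unpenalized, the residuals sum to zero, which demonstrates the mean-conservation property numerically:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a smooth signal plus noise (illustrative only).
n = 200
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=n)

# Truncated-line basis: fixed part X = [1, x]; random part S holds
# hinge functions (x - kappa)_+ at r interior knots.
r = 20
knots = np.quantile(x, np.linspace(0, 1, r + 2)[1:-1])
X = np.column_stack([np.ones(n), x])
S = np.maximum(x[:, None] - knots[None, :], 0.0)

C = np.hstack([X, S])
# Penalty D = blockdiag(0, I): only the hinge coefficients are shrunk.
D = np.diag(np.r_[np.zeros(X.shape[1]), np.ones(r)])

omega = 1.0  # global smoothing parameter (illustrative value)
theta = np.linalg.solve(C.T @ C + omega * D, C.T @ y)
fitted = C @ theta

# Unpenalized intercept row of the normal equations forces 1'(y - fitted) = 0.
print(abs(fitted.sum() - y.sum()))
```

Increasing `omega` shrinks the hinge coefficients towards zero, so the fit approaches the straight-line (polynomial) limit described above.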
2.2. Mixed Models
Mixed models are regression models with both fixed effects and random effects. They correspond to a hierarchy of levels, with repeated, correlated measurements occurring among the lower-level units within each upper-level unit. The standard linear mixed model has the form

$$Y = X\beta + Sd + \varepsilon,$$
where $Y$ is a vector of observed responses, $\beta$ is an unknown vector of fixed effects, $d$ is an unknown vector of random (subject-specific) effects with mean zero and covariance matrix $W$, $X$ and $S$ are the design matrices associated with $\beta$ and $d$ respectively, and $\varepsilon$ is a vector of residual errors with zero mean and covariance matrix $P$. The dimensions of $X$ and $S$ must conform to the length of the observation vector $Y$ and the numbers of fixed and random effects respectively. It is generally assumed that the elements of $d$ are uncorrelated with the elements of $\varepsilon$, in which case the covariance matrix of the random effects and residual errors is block diagonal,

$$\operatorname{Cov}\begin{pmatrix} d \\ \varepsilon \end{pmatrix} = \begin{pmatrix} W & 0 \\ 0 & P \end{pmatrix}.$$

The matrices $S$ and $W$ will themselves be block diagonal if the data arise from a hierarchical structure, where a fixed number of random effects common to observations within a single higher-level unit are assumed to vary across the units at a given level of the hierarchy. Typically the residual errors are taken to be independent and identically distributed, so that $P = \sigma_{\varepsilon}^{2} I$, where $\sigma_{\varepsilon}^{2}$ is the residual variance. The covariance matrix $W$ of the random effects vector $d$ is often assumed to have a structure that depends on a series of unknown variance component parameters, which need to be estimated in addition to the residual variance and the vector of fixed effects $\beta$.
The usual estimators of the fixed and random effects are the best linear unbiased estimator (BLUE) of $\beta$ and the best linear unbiased predictor (BLUP) of $d$. These can be recovered as the solution of the mixed model equations,

$$\begin{pmatrix} X^{T}P^{-1}X & X^{T}P^{-1}S \\ S^{T}P^{-1}X & S^{T}P^{-1}S + W^{-1} \end{pmatrix}\begin{pmatrix} \hat{\beta} \\ \hat{d} \end{pmatrix} = \begin{pmatrix} X^{T}P^{-1}Y \\ S^{T}P^{-1}Y \end{pmatrix}.$$
The mixed model is of the form

$$Y = X\beta + Sd + \varepsilon.$$

Assuming that $d$ and $\varepsilon$ are multivariate normal,

$$d \sim N(0, W), \qquad \varepsilon \sim N(0, P),$$

and taking $V = \operatorname{Cov}(Y) = SWS^{T} + P$, we have $Y \sim N(X\beta, V)$, which results in the pdf

$$f(Y) = (2\pi)^{-n/2}\,|V|^{-1/2} \exp\!\left\{-\tfrac{1}{2}(Y - X\beta)^{T} V^{-1} (Y - X\beta)\right\}.$$

The likelihood function becomes

$$L(\beta, V) = (2\pi)^{-n/2}\,|V|^{-1/2} \exp\!\left\{-\tfrac{1}{2}(Y - X\beta)^{T} V^{-1} (Y - X\beta)\right\},$$

and the log likelihood becomes

$$\ell(\beta, V) = -\tfrac{1}{2}\left\{ n\log(2\pi) + \log|V| + (Y - X\beta)^{T} V^{-1} (Y - X\beta) \right\},$$

where $V = SWS^{T} + P$. Assuming that the parameters defining the covariance matrices $W$ and $P$ are known, the MLE of $\beta$ is

$$\hat{\beta} = (X^{T} V^{-1} X)^{-1} X^{T} V^{-1} Y,$$
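The Gaussian log likelihood is straightforward to evaluate numerically; the sketch below does so at made-up parameter values (dimensions, $\beta$, and $V$ are illustrative, not from the paper), using a log-determinant for numerical stability rather than forming $|V|$ directly:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative setup: n observations, design X, known beta and covariance V.
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, 1.5])
V = 0.3 * np.eye(n) + 0.1 * np.ones((n, n))   # a valid (positive definite) covariance
Y = rng.multivariate_normal(X @ beta, V)

# l(beta, V) = -0.5 { n log(2 pi) + log|V| + (Y - X beta)' V^{-1} (Y - X beta) }
resid = Y - X @ beta
sign, logdet = np.linalg.slogdet(V)
loglik = -0.5 * (n * np.log(2 * np.pi) + logdet
                 + resid @ np.linalg.solve(V, resid))
print(sign, loglik)
```

In practice one profiles out $\beta$ with the GLS formula above and maximizes this expression over the parameters defining $W$ and $P$.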
which, although not obvious algebraically, must also satisfy the mixed model equations given earlier, since one of the ways in which those equations can be derived is directly from the multivariate normality assumption. Typically $W$ and $P$ will not be known; they can be estimated by substituting the expression for $\hat{\beta}$ back into the log likelihood and maximizing the result over the parameters defining $W$ and $P$. Once estimates for $W$ and $P$ have been determined, we can return to the mixed model equations and determine the BLUP of the random effects vector $d$ as the vector that minimizes the expected mean squared error of prediction.
The BLUP of $d$ can be expressed as the posterior expectation of the random effects given the data, which can be solved explicitly under the normality assumption to yield

$$\hat{d} = E(d \mid Y) = W S^{T} V^{-1} (Y - X\hat{\beta}).$$
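The GLS estimator of $\beta$ and the BLUP of $d$ can be checked against the mixed model equations numerically. The sketch below uses a small simulated model (all dimensions and parameter values are illustrative, not from the paper) and verifies that both routes give the same answer:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small illustrative mixed model: Y = X beta + S d + eps,
# d ~ N(0, W), eps ~ N(0, P).
n, p, q = 50, 2, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
S = rng.normal(size=(n, q))
W = 0.5 * np.eye(q)
P = 0.2 * np.eye(n)
beta_true = np.array([1.0, -2.0])
d_true = rng.multivariate_normal(np.zeros(q), W)
Y = X @ beta_true + S @ d_true + rng.multivariate_normal(np.zeros(n), P)

# Route 1: GLS estimator of beta with V = S W S' + P, then the BLUP of d.
V = S @ W @ S.T + P
Vinv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ Y)
d_hat = W @ S.T @ Vinv @ (Y - X @ beta_hat)

# Route 2: solve the mixed model (Henderson) equations directly.
Pinv = np.linalg.inv(P)
Winv = np.linalg.inv(W)
lhs = np.block([[X.T @ Pinv @ X, X.T @ Pinv @ S],
                [S.T @ Pinv @ X, S.T @ Pinv @ S + Winv]])
rhs = np.r_[X.T @ Pinv @ Y, S.T @ Pinv @ Y]
sol = np.linalg.solve(lhs, rhs)

print(np.allclose(sol[:p], beta_hat), np.allclose(sol[p:], d_hat))
```

The agreement of the two routes is the algebraic identity referred to above: the mixed model equations reproduce the BLUE of $\beta$ and the BLUP of $d$.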
2.3. Hierarchical Penalized Mixed Model
Consider the model $y_i = f(x_i) + \varepsilon_i$, where $f$ is modelled as a truncated polynomial spline

$$f(x) = \beta_0 + \beta_1 x + \cdots + \beta_q x^{q} + \sum_{k=1}^{r} \rho_k (x - \kappa_k)_{+}^{q},$$

where $\kappa_1 < \cdots < \kappa_r$ are the knots covering the range of the $x$'s and $(x - \kappa)_{+}^{q} = (x - \kappa)^{q}$ if $x > \kappa$ and $0$ otherwise.
The knots are placed over the range of the $x$'s and the dimension $r$ is chosen generously. In penalized splines the approach is to put a penalty on the coefficients $\rho$ of the truncated polynomial terms. The standard approach is to minimize the sum of squares plus the quadratic penalty $\omega\,\rho^{T} D \rho$, where $\omega$ is the penalty parameter and $D$ is a square penalty matrix. For the truncated polynomial basis $D$ is the identity matrix and the penalty is $\omega \sum_{k} \rho_k^{2}$. For a B-spline basis the penalty is constructed from differences between neighboring spline coefficients. An important feature of the penalized spline is its link to the linear mixed model. Due to this link we assume

$$\rho \sim N(0, \sigma_{\rho}^{2} D^{-}),$$

where $\rho$ is the vector of spline coefficients and $D^{-}$ is a generalized inverse of $D$.
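The two penalty matrices just described are easy to construct explicitly. The sketch below (with an illustrative coefficient count `r = 6`) builds the identity penalty of the truncated polynomial basis and a second-order difference penalty of the kind used with B-spline bases, and checks that the difference penalty leaves a linear coefficient sequence unpenalized:

```python
import numpy as np

r = 6  # number of spline coefficients (illustrative)

# Truncated polynomial basis: D is the identity, penalty = omega * rho' rho.
D_trunc = np.eye(r)

# B-spline basis: penalize squared second-order differences of neighboring
# coefficients, D = Delta2' Delta2 with Delta2 the second-difference operator.
Delta2 = np.diff(np.eye(r), n=2, axis=0)   # shape (r - 2, r)
D_bspline = Delta2.T @ Delta2

rho = np.arange(r, dtype=float)            # a linear coefficient sequence
# Second differences of a linear sequence vanish, so this penalty is zero,
# which is why the difference penalty shrinks towards a linear fit.
print(float(rho @ D_bspline @ rho))
```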
In this approach a single parameter is used to shrink all the spline coefficients, and this can be a limitation, especially if the underlying function is locally varying: it fails to completely capture the features of functions that exhibit strong heterogeneity. One way to avoid this is to allow the coefficients to have individual prior variances, $\rho_k \sim N(0, \sigma^{2}(\kappa_k))$, and to assume that the shrinkage variance process is a smooth function modeled as a log-penalized spline,

$$\log \sigma^{2}(\kappa_k) = \gamma_0 + \gamma_1 \kappa_k + \sum_{j=1}^{l} b_j (\kappa_k - \tau_j)_{+},$$

where $\tau_1 < \cdots < \tau_l$ is a second layer of knots covering the range of the $\kappa_k$'s; in practice $l$ is much smaller than $r$. The hierarchical penalized smoothing model is completed by the shrinkage assumption $b_j \sim N(0, \sigma_b^{2})$, where $\sigma_b^{2}$ is constant.
Thus our hierarchical smoothing model can be written as

$$y_i = f(x_i) + \varepsilon_i, \qquad f(x) = \beta_0 + \beta_1 x + \cdots + \beta_q x^{q} + \sum_{k=1}^{r} \rho_k (x - \kappa_k)_{+}^{q},$$
$$\rho_k \sim N\!\left(0, \sigma^{2}(\kappa_k)\right), \qquad \log \sigma^{2}(\kappa_k) = \gamma_0 + \gamma_1 \kappa_k + \sum_{j=1}^{l} b_j (\kappa_k - \tau_j)_{+},$$
$$b_j \sim N(0, \sigma_b^{2}), \qquad \varepsilon_i \sim N(0, \sigma_{\varepsilon}^{2}).$$
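The two-layer structure can be sketched generatively: the first layer places $r$ knots $\kappa_k$ whose coefficients $\rho_k$ have knot-specific variances, and the second layer models the log-variance as a much smaller truncated-line spline on knots $\tau_j$. All numerical values below ($r$, $l$, $\gamma$, $\sigma_b$) are hypothetical, chosen only to illustrate the construction:

```python
import numpy as np

rng = np.random.default_rng(2)

r, l = 20, 4                               # l much smaller than r
kappa = np.linspace(0, 1, r)               # first-layer knots
tau = np.linspace(0, 1, l + 2)[1:-1]       # second-layer knots

# log sigma^2(kappa) = gamma0 + gamma1*kappa + sum_j b_j (kappa - tau_j)_+
gamma = np.array([-2.0, 1.0])              # assumed fixed part (illustrative)
b = rng.normal(scale=0.5)                  # shrunk second-layer coefficients
b = rng.normal(scale=0.5, size=l)          # b_j ~ N(0, sigma_b^2), sigma_b constant
Z2 = np.maximum(kappa[:, None] - tau[None, :], 0.0)
log_sigma2 = gamma[0] + gamma[1] * kappa + Z2 @ b

# Each first-layer coefficient gets its own prior variance sigma^2(kappa_k):
sigma2 = np.exp(log_sigma2)
rho = rng.normal(scale=np.sqrt(sigma2))    # rho_k ~ N(0, sigma^2(kappa_k))
print(sigma2.min() > 0, rho.shape)
```

Because the exponential keeps every $\sigma^{2}(\kappa_k)$ positive while the second-layer spline lets it vary smoothly over the domain, the amount of shrinkage adapts locally, which is the source of the spatial adaptivity discussed below.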
3. Results and Conclusion
Penalized splines are very common in nonparametric regression, but they have one major drawback: they are not spatially adaptive, because a single global smoothing parameter is used across the whole of a heterogeneous function. In this research we aimed at constructing a spatially adaptive penalized spline by introducing hierarchical splines. This was achieved by modeling the global smoothing parameter ω that is normally used in classical smoothing as another spline.