Bayesian Regularized Quantile Regression Analysis Based on Asymmetric Laplace Distribution

Show more

1. Introduction

Since the pioneering work by Koenker and Bassett in 1978, quantile regression (QR) has been deeply studied and widely applied to descript the elaborate relationship between the dependent variable and predictors [1]. Compared with the traditional mean regression, quantile regression has more robustness to data with outliers. In 1999, Koenker and Machado connected the asymmetric Laplace distribution (ALD) to QR model and defined a goodness-of-fit criterion for quantile regression, which is the natural analog of R2 statistic of least squares regression [2]. In 2001, Yu and Moyeed first proposed a Bayesian quantile regression model that the error follows an asymmetric Laplace ( AL) distribution, and proved the maximization of likelihood-based inference used independently distributed asymmetric Laplace densities was equivalent to the minimization of the loss function [3]. In 2010, Hewson and Yu suggested quantile regression model for binary data within the Bayesian framework [4]. In 2011, Reich et al. introduced Bayesian spatial quantile regression model [5]. In 2013, Sriram et al. showed that the misspecified likelihood in the ALD approach still leads to consistent results [6]. In 2009, Kozumi and Kobayashi built a more efficient Gibbs sampler for fitted the quantile regression model based on a location-scale mixture of the asymmetric Laplace distribution to draw samples from the posterior distribution [7]. In 2012, Khare and Hobert proved that this new sampling algorithm converges at a geometric rate [8]. In 2015, Sriram proposed a correction to the MCMC iterations to construct asymptotically valid intervals [9].

In 2004, Koenker added the Lasso regularization method to the mixed-effect quantile regression model for the first time, and the Lasso penalty made the random effect shrink to zero [10]. In 2007, Wang et al. considered the least absolute deviance (LAD) estimate with adaptive Lasso penalty (LAD-lasso) and proved its oracle property [11]. In 2008, Li and Zhu considered quantile regression with the Lasso penalty and developed its piecewise linear solution path [12]. In 2009, Wu and Liu studied the quantile regression with the SCAD method and the adaptive Lasso method [13]. In 2008, Park and Casella studied the Lasso penalty from the Bayesian angle, and proposed that the hierarchical model can be effectively solved by the Gibbs sampler, thereby introducing the regularization method [14]. In 2010, Li et al. studied the regularization method in quantile regression from the perspective of Bayesian and proposed to set the prior distribution of parameters to Laplace prior, and use Gibbs sampler to sampling Bayesian Lasso quantile regression [15]. In 2012, Alhamzawi et al. proposed Bayesian adaptive Lasso quantile regression, by setting different penalty parameters for different variables, and setting the penalty parameter to inverse gamma distribution, and the inverse gamma priori Parameters are treated as unknowns and estimated along with other parameters [16]. In 2018, Adlouni et al. showed that a regularized quantile regression model with B-Splines based on five penalties (Lasso, Ridge, SCAD0, SCAD1 and SCAD2) in Bayesian framework [17].

Based on the existing literature, the Bayesian quantile regression is realized by expressing the asymmetric Laplace distribution as scale mixtures of the standard normal distribution and the standard exponential distribution, and the Gibbs sampler is used to simulate the distributed parameters. The regularized quantile regression under the Bayesian framework is compared with the non-Bayesian regularized quantile regression method. Finally, the prostate cancer data sets are used to illustrate the advantages and disadvantages of these two approaches.

2. Methods

2.1. Quantile Regression

Given data $\left\{\left({x}_{i}\text{\hspace{0.05em}},{y}_{i}\right),i=1,\cdots ,n\right\}$, with covariate vector ${{x}^{\prime}}_{i}=\left({x}_{i1},{x}_{i2},\cdots ,{x}_{ik}\right)$ and ${y}^{\prime}=\left({y}_{1},\cdots ,{y}_{n}\right)$ is the response variable. The $\theta \text{th}$ quantile regression model for the response ${y}_{i}$ given ${x}_{i}$ takes the form of

${Q}_{{y}_{i}}\left(\theta |{x}_{i}\right)={{x}^{\prime}}_{i}\beta \left(\theta \right),$ (1)

where ${Q}_{{y}_{i}}\left(\theta |{x}_{i}\right)={\mathrm{F}}_{{y}_{i}}^{-1}\left(\theta |{x}_{i}\right)$ is the inverse cumulative distribution function and $\beta \left(\theta \right)$ is the unknown coefficients vector that is dependent on the quantile $\theta $, $\theta \in \left(0,1\right)$.

The regression parameter $\beta $ can be estimated by minimizating the following objective function

$\underset{\beta}{\mathrm{min}}{\displaystyle \underset{i=1}{\overset{n}{\sum}}{\rho}_{\theta}}\left({y}_{i}-{{x}^{\prime}}_{i}\beta \right),$ (2)

where ${\rho}_{\theta}\left(u\right)=u\left(\theta -\text{I}\left(u<0\right)\right)$ is the loss function and $I(\cdot )$ denotes the indicator function.

In 2001, Yu and Moyeed [3] argued that the minimization problem in equation (2) is equivalent to maximizing the likelihood function of ${y}_{i}$ by assuming ${y}_{i}$ ’s are random variables from a skewed Laplace distribution with $\mu ={{x}^{\prime}}_{i}\beta $ and $\sigma =1$. The density function of a skewed Laplace distribution is given by

$f\left(y|\mu ,\sigma ,\theta \right)=\frac{\theta \left(1-\theta \right)}{\sigma}\mathrm{exp}\left\{-\frac{{\rho}_{\theta}\left(y-\mu \right)}{\sigma}\right\},$ (3)

where, $\sigma $ is the scale parameter, $\mu $ is the location parameter and $\theta $ is the asymmetrc parameter.

Then the likelihood function of the sample $y={\left({y}_{1},{y}_{2},\cdots ,{y}_{n}\right)}^{\prime}$ can be expressed as

$\mathrm{L}\left(y|\mu ,\sigma ,\theta \right)=\frac{{\theta}^{n}{\left(1-\theta \right)}^{n}}{{\sigma}^{n}}\mathrm{exp}\left\{-\frac{1}{\sigma}{\displaystyle \underset{i=1}{\overset{n}{\sum}}{\rho}_{\theta}\left({y}_{i}-\mu \right)}\right\}.$ (4)

Tsionas [18], Kozumi and Kobayashi [7] have demonstrated that ALD can be viewed as a mixture of standard an exponential distribution, $\mathrm{exp}\left(1\right)$ and a standard normal distribution $\mathrm{N}\left(0,1\right)$. Assume that z and $\upsilon $ represent a standard exponential distribution and a standard normal distribution, respectively.

For $\omega =\frac{1-2\theta}{\theta \left(1-\theta \right)}$, $f=\sqrt{\frac{2}{\theta \left(1-\theta \right)}}$, if $\mu \sim ALD\left(0,\sigma ,\theta \right)$, random variable $\mu =\omega z+\varphi {\sigma}^{-\frac{1}{2}}\sqrt{z}\upsilon $, Therefore, it can be known that the independent variable ${y}_{i}$ of the quantile regression is equivalent to

${y}_{i}={{x}^{\prime}}_{i}\beta +\omega {z}_{i}+\varphi {\sigma}^{-\frac{1}{2}}\sqrt{{z}_{i}}{\upsilon}_{i}.$ (5)

2.2. Bayesian Quantile Regression with Lasso and Adaptive Lasso Penalty

The Bayesian quantile regression parameter estimation model with Lasso penalty (Li and Zhu) [12] is

$\underset{\beta}{\mathrm{min}}{\displaystyle \underset{i}{\overset{n}{\sum}}{\rho}_{\theta}}\left({y}_{i}-{{x}^{\prime}}_{i}\beta \right)+\lambda {\displaystyle \underset{j=1}{\overset{k}{\sum}}\left|{\beta}_{j}\right|}\text{\hspace{0.05em}}\text{\hspace{0.05em}},$ (6)

Li et al. [15] set the prior distribution of the parameter ${\beta}_{j}$ to a Laplace prior $p\left({\beta}_{j}|\sigma ,\lambda \right)={\left(\sigma \lambda /2\right)}^{k}\mathrm{exp}\left\{-\sigma \lambda {\displaystyle \underset{j=1}{\overset{k}{\sum}}\left|{\beta}_{j}\right|}\right\}$. It is also assumed that the error term ${\mu}_{i}$ obeys the asymmetric Laplace distribution (ALD).

Bayesian quantile regression with adaptive Lasso penalty (BQR-AL) is based on different penalty parameters are applied to different regression coefficients. Therefore, the parameter estimation model of BQR-AL is

$\underset{\beta}{\mathrm{min}}{\displaystyle \underset{i=1}{\overset{n}{\sum}}{\rho}_{\theta}}\left({y}_{i}-{{x}^{\prime}}_{i}\beta \right)+{\displaystyle \underset{j=1}{\overset{k}{\sum}}\left|{\lambda}_{j}{\beta}_{j}\right|}\text{\hspace{0.05em}}\text{\hspace{0.05em}},$ (7)

Alhamzawi and Yu [16] proposed different penalization parameters for different regression coefficients, and the prior of the penalty parameters is set to the inverse gamma distribution, and treat the hyperparameters of the inverse gamma prior as unknowns. Thus, the Laplace prior on ${\beta}_{j}$ by

$p\left({\beta}_{j}|\sigma ,{\lambda}_{j}\right)=\frac{{\sigma}^{\frac{1}{2}}}{2{\lambda}_{j}}\mathrm{exp}\left\{-\frac{{\sigma}^{\frac{1}{2}}\left|{\beta}_{j}\right|}{{\lambda}_{j}}\right\},$ (8)

Andrews and Mallows [19] mentioned that

$\frac{\xi}{2}\mathrm{exp}\left\{-\xi \left|t\right|\right\}={\displaystyle {\int}_{0}^{\infty}\frac{1}{\sqrt{2\pi s}}}\mathrm{exp}\left\{-\frac{{t}^{2}}{2s}\right\}\frac{{\xi}^{2}}{2}\mathrm{exp}\left\{-\frac{{\xi}^{2}}{2}s\right\}\text{d}s,\text{\hspace{0.05em}}\text{\hspace{0.05em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\xi >0.$ (9)

Let $\eta =\frac{{\sigma}^{\frac{1}{2}}}{{\lambda}_{j}}$, (8) can be written as

$p\left({\beta}_{j}|\sigma ,{\lambda}_{j}\right)=\frac{\eta}{2}\mathrm{exp}\left\{-\eta \left|{\beta}_{j}\right|\right\},$ (10)

also equivalent to

$p\left({\beta}_{j}|\sigma ,{\lambda}_{j}\right)={\displaystyle {\int}_{0}^{\infty}\frac{1}{\sqrt{2\pi {s}_{j}}}}\mathrm{exp}\left\{-{\beta}_{j}^{2}/2{s}_{j}\right\}\frac{{\eta}^{2}}{2}\mathrm{exp}\left\{-{\eta}^{2}{s}_{j}/2\right\}\text{d}{s}_{j},$ (11)

so there are

$p\left({\beta}_{j}|\sigma ,{\lambda}_{j}^{2}\right)={\displaystyle {\int}_{0}^{\infty}\frac{1}{\sqrt{2\pi {s}_{j}}}}\mathrm{exp}\left\{-{\beta}_{j}^{2}/2{s}_{j}\right\}\frac{\sigma}{2{\lambda}_{j}^{2}}\mathrm{exp}\left\{-\sigma {s}_{j}/2{\lambda}_{j}^{2}\right\}\text{d}{s}_{j}.$ (12)

The prior distribution of ${\lambda}_{j}^{2}$ is set to the inverse gamma prior, so the distribution density function of ${\lambda}_{j}^{2}$ is

$p\left({\lambda}_{j}^{2}|\delta ,\gamma \right)=\frac{{\gamma}^{\delta}}{\Gamma \left(\delta \right)}{\left({\lambda}_{j}^{2}\right)}^{-1-\delta}\mathrm{exp}\left\{-\frac{\gamma}{{\lambda}_{j}^{2}}\right\},$ (13)

where $\delta >0$ and $\gamma >0$ are two hyperparameters. Yi and Xu [20] pointed out that the values of these two hyperparameters determine the degree of compression of the variables in (13), because small $\gamma $ large $\delta $ will lead to greater compression, so in order to avoid the impact of special values on the estimation of regression coefficients, we consider $\delta $ and $\gamma $ as unknown parameters.

In summary, the Bayesian quantile regression hierarchical model with adaptive Lasso penalty is

$\begin{array}{l}{y}_{i}={{x}^{\prime}}_{i}\beta +\omega {z}_{i}+\varphi {\sigma}^{-\frac{1}{2}}\sqrt{{z}_{i}}{\upsilon}_{i},\\ p\left({\beta}_{0}\right)\propto 1,\\ p\left({\upsilon}_{i}\right)=\frac{1}{\sqrt{2\pi}}\mathrm{exp}\left\{-\frac{{\upsilon}_{i}^{2}}{2}\right\},\\ p\left({z}_{i}|\sigma \right)=\sigma \mathrm{exp}\left\{-\sigma {z}_{i}\right\},\end{array}$

$\begin{array}{l}p\left({\beta}_{j},{s}_{j}|\sigma ,{\lambda}_{j}^{2}\right)=\frac{1}{\sqrt{2\pi {s}_{j}}}\mathrm{exp}\left\{-{\beta}_{j}^{2}/2{s}_{j}\right\}\frac{\sigma}{2{\lambda}_{j}^{2}}\mathrm{exp}\left\{-\sigma {s}_{j}/2{\lambda}_{j}^{2}\right\},\\ p\left({\lambda}_{j}^{2}|\delta ,\gamma \right)=\frac{{\tau}^{\delta}}{\Gamma \left(\delta \right)}{\left({\lambda}_{j}^{2}\right)}^{-1-\delta}\mathrm{exp}\left\{-\frac{\gamma}{{\lambda}_{j}^{2}}\right\},\\ p\left(\sigma \right)={\sigma}^{a-1}\mathrm{exp}\left\{-b\sigma \right\},\\ p\left(\gamma ,\delta \right)={\gamma}^{-1}.\end{array}$ (14)

2.3. Gibbs Sampling

From the hierarchical model, the joint posterior density function of each parameter is

$\begin{array}{l}p\left(\beta ,z,s,\sigma ,{\lambda}_{1},\cdots ,{\lambda}_{k}|y,{\rm X}\right)\\ \propto p\left(y|\beta ,z,\sigma ,{\rm X}\right){\displaystyle \underset{i=1}{\overset{n}{\prod}}p\left({z}_{i}|\sigma \right)}{\displaystyle \underset{j=1}{\overset{k}{\prod}}p\left({\beta}_{j},{s}_{j}|\sigma ,{\lambda}_{j}^{2}\right)p}\left({\lambda}_{j}^{2}|\gamma ,\delta \right)p\left(\sigma \right)p\left(\gamma ,\delta \right)\\ \propto {\displaystyle \underset{i=1}{\overset{n}{\prod}}\frac{\sigma}{\sqrt{{\sigma}^{-1}{\varphi}^{2}{z}_{i}}}}\mathrm{exp}\left\{-\frac{\sigma {\left({y}_{i}-{{x}^{\prime}}_{i}\beta -\omega {z}_{i}\right)}^{2}}{2{\varphi}^{2}{z}_{i}}-\sigma {z}_{i}\right\}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\times {\displaystyle \underset{j=1}{\overset{k}{\prod}}\frac{1}{\sqrt{2\pi {s}_{j}}}\mathrm{exp}\left\{-{\beta}_{j}^{2}/2{s}_{j}\right\}\frac{\sigma}{2{\lambda}_{j}^{2}}}\mathrm{exp}\left\{-\sigma {s}_{j}/2{\lambda}_{j}^{2}\right\}\\ \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\times \frac{{\gamma}^{\delta}}{\Gamma \left(\delta \right)}{\left({\lambda}_{j}^{2}\right)}^{-1-\delta}\mathrm{exp}\left\{-\frac{\gamma}{{\lambda}_{j}^{2}}\right\}{\gamma}^{-1}{\sigma}^{a-1}\mathrm{exp}\left\{-b\sigma \right\},\end{array}$

where

$y=\left({y}_{1},{y}_{\text{2}},\cdots ,{y}_{n}\right),\text{\hspace{0.17em}}{\rm X}=\left({x}_{1},{x}_{2},\cdots ,{x}_{n}\right),\text{\hspace{0.17em}}z=\left({z}_{1},{z}_{2},\cdots ,{z}_{n}\right),\text{\hspace{0.17em}}s=\left({s}_{1},{s}_{2},\cdots ,{s}_{k}\right).$

The full condition posterior distribution of each parameter is

$\begin{array}{l}{\beta}_{0}|\cdot \sim N\left({\stackrel{\xaf}{\beta}}_{0},{s}_{{\beta}_{0}}^{2}\right),\\ {\beta}_{j}|\cdot \sim N\left({\stackrel{\xaf}{\beta}}_{j},{s}_{{\beta}_{j}}^{2}\right),\\ {z}_{i}|\cdot \sim \mathrm{GIG}{\left(\frac{1}{2},\alpha ,\mathcal{l}\right)}^{2},\\ {s}_{j}|\cdot \sim \mathrm{GIG}\left(\frac{1}{2},\frac{\sigma}{2{\lambda}_{j}^{2}},{\beta}_{j}^{2}\right),\end{array}$

$\begin{array}{l}\sigma |\cdot \sim \mathrm{G}{\left({a}_{1},{a}_{2}\right)}^{3},\\ {\lambda}_{j}^{2}|\cdot \sim \mathrm{GIG}\left(1+\delta ,\frac{\sigma {s}_{j}}{2}+\gamma \right),\\ \gamma |\cdot \sim \mathrm{G}\left(k\delta -1,{\displaystyle \underset{j=1}{\overset{k}{\sum}}{\lambda}_{j}^{-2}}\right).\end{array}$ (15)

Here

$\begin{array}{l}\alpha =\sigma {\omega}^{2}{\varphi}^{-2}+2\sigma ,\\ \mathcal{l}=\sigma {\varphi}^{-2}{\left({y}_{i}-{{x}^{\prime}}_{i}\beta \right)}^{2},\\ {a}_{1}=\frac{3}{2}n+k+a,\\ {a}_{2}={\displaystyle \underset{i=1}{\overset{n}{\sum}}\left[\frac{{\left({y}_{i}-{{x}^{\prime}}_{i}\beta -\omega {z}_{i}\right)}^{2}}{{\varphi}^{2}{z}_{i}}+{z}_{i}\right]}+{\displaystyle \underset{j=1}{\overset{k}{\sum}}\frac{{s}_{j}}{2{\lambda}_{j}^{2}}}+b.\end{array}$ (16)

Since the full condition posterior distribution $p\left(\delta |\cdot \right)\propto {\left(\Gamma \left(\delta \right)\right)}^{-k}{\gamma}^{k\delta}{\displaystyle \underset{j=1}{\overset{k}{\prod}}{\lambda}_{j}^{-2\delta}}$

of $\delta $ does not have a closed form, it is a logarithmic convex function. Gilks [21] proposed using the adaptive rejection sampling algorithm to sample this distribution.

3. Simulation Studies

Based on the MCMC algorithm of Gibbs sampling, Bayesian estimation is carried out on the model. The simulation studies used to compare the regularized quantile regression under the Bayesian and the non-Bayesian framework. These methods include Bayesian quantile regression with adaptive Lasso penalty (BQR-AL), Bayesian quantile regression with Lasso penalty (BQR-L), quantile regression with Lasso penalty (QR-L), quantile regression with SCAD penalty (QR-SCAD) and quantile regression (QR).

3.1. Independent and Identically Distributed Random Errors

Here, we follow the same simulation strategy introduced by Li, Xi and Lin [15] in the simulation studies 1, 2 and 3 with different parameter values for the error distributions.

We consider a linear model

${y}_{i}={{x}^{\prime}}_{i}\beta +{\epsilon}_{i},\text{}\text{}\text{\hspace{0.05em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,\cdots ,n$

where ${\epsilon}_{i}$ ’s have the $\theta \text{th}$ quantile equal to zero.

For i.i.d. random errors, this paper will consider the following four forms of simulation

Simulation 1: $\beta =\left(3,1.5,0,0,2,0,0,0\right)$,

Simulation 2: $\beta =\left(0.85,0.85,0.85,0.85,0.85,0.85,0.85,0.85\right)$,

Simulation 3: $\beta =\left(5,0,0,0,0,0,0,0\right)$,

Simulation 4: $\beta =\left(\underset{10}{\underset{\ufe38}{3,\cdots ,3,}}\underset{10}{\underset{\ufe38}{0,\cdots ,0}},\underset{10}{\underset{\ufe38}{3,\cdots ,3}}\right)$.

In the first three simulation studies, the rows of $x$ is generated in a multivariate normal distribution $\mathrm{N}\left(0,\Sigma \right)$ with ${\left(\Sigma \right)}_{ij}={0.5}^{\left|i-j\right|}$. In Simulation 4, we first generate ${{\rm Z}}_{1}$ and ${{\rm Z}}_{2}$ from $\mathrm{N}\left(0,1\right)$, then let ${x}_{j}={{\rm Z}}_{1}+{\nu}_{j}$, $j=1,\cdots ,10$, ${x}_{j}\sim N\left(0,1\right)$, $j=11,\cdots ,20$, ${x}_{j}={{\rm Z}}_{2}+{\nu}_{j}$, where $j=21,\cdots ,30$, where ${\nu}_{j}\sim N\left(0,0.01\right)$, $j=1,\cdots ,10,21,\cdots ,30$.

In each simulation, we consider the error distributions in our simulation follows

1) A normal distribution $N\left(\mu ,1\right)$ with the $\theta \text{th}$ quantile equal to zero.

2) A Laplace distribution $Laplace\left(\mu ,1\right)$ with the $\theta \text{th}$ quantile equal to zero.

3) A t distribution with three degrees of freedom, ${t}_{\left(3\right)}$.

4) A ${\chi}^{2}$ distribution with three degrees of freedom, ${\chi}_{\left(3\right)}^{2}$.

5) A asymmetric Laplace distribution $ALD\left(\mu ,0.5,1\right)$ with the $\theta \text{th}$ quantile equal to zero.

The number of observations in one simulated sample is n = 200. The simulation is repeated 50 times for each error distribution. The evaluation index is the median mean absolute deviation (MMAD), i.e.

$\text{MMAD}=\mathrm{median}\left(1/200{\displaystyle \underset{i=1}{\overset{200}{\sum}}\left|{{x}^{\prime}}_{i}\stackrel{^}{\beta}-{{x}^{\prime}}_{i}{\beta}^{true}\right|}\right)$

The quantile regression model for the quantile $\theta =\left(0.3,0.5,0.7\right)$ is estimated separately. The simulation results are shown in Figures 1-4.

Figure 1 shows that under the condition of simulation 1, that is, sparse coefficients, the MMD values of BQR-AL and BQR-L are lower than those of the frequency school method, in which the MMAD value satisfying the error term compliance distribution $ALD\left(\mu ,0.5,1\right)$ is smaller.

Figure 2 corresponds to the dense coefficient in simulation 2, and it can be seen that the MMAD values of BQR-AL and BQR-L are very small and close to each other.

Figure 3 corresponds to the case where the coefficient of the simulation 3 is sparse, and the same conclusion as the simulation 1 can still be obtained. And the effect of Bayesian regularized quantile regression is more obvious.

For simulation 4, since the number of variables is larger than the sample size, the design matrix is a singular matrix, so the QR and QR-SCAD methods cannot be run in this simulation, and the other methods can still operate normally. This also proves the advantages of the regularization method. The results are shown in Figure 4.

Figure 1. The panels represent the MMADs in simulation 1.

Figure 2. The panels represent the MMADs in simulation 2.

Figure 3. The panels represent the MMADs in simulation 3.

Figure 4. The panels represent the MMADs in simulation 4.

Figures 1-4 show that, in terms of the MMAD, BQR-AL and BQR-L method performs better than the other regularized quantile regression method. The results of the MMAD for simulation 1 - 4 are reported in Figures 1-4. From these simulation, we can learn the following results:

1) In the above simulation, the MMADs of BQR-AL and BQR-L tend to give lower MMAD compared with the other regularized quantile regression under non-Bayesian for all distributions under considerations. It is shown that the stability and repeatability of the Bayesian regularized quantile regression are better.

2) In the case of sparse and very sparse regression coefficient, the MMAD value of BQR-AL is the smallest. In the case of dense regression coefficient, the MMAD value of BQR-L is smaller. Moreover, the estimation effect of the two methods is similar.

3) The BQR-AL and BQR-L methods can achieve good results under all error term distributions. It is shown that the regularized Bayesian quantile regression method is robust to the assumption of the error term, and the two methods are satisfactory even if the error term deviates from ALD.

4) No matter what the distribution of the original data is, when the error distribution is ALD, the regularized quantile regression method under Bayesian framework has high accuracy, especially the BQR-AL method, and the estimated value of its parameters is the closest to the real value.

In addition to observing the MMADs of each method, this paper can also observe the estimation of its parameters. Due to the limited space, in this paper, the parameter estimation results of error obeying ALD distribution in simulation 1 are simulated:

It can be seen from the parameter estimates in Table 1, the QR class method generally gives less biased parameter estimates, but this does not guarantee a good quantile prediction, as implied by the MMADs in Figure 4.

3.2. Non-i.i.d. Random Errors

Consider the following model when the error term is subject to a non-i.i.d.

$y=2+{x}_{1}+{x}_{2}+{x}_{3}+\left(1+{x}_{3}\right)\epsilon ,$

where ${x}_{1}\sim \mathrm{N}\left(0,1\right)$, ${x}_{3}\sim U\left(0,1\right)$, ${x}_{2}\sim {x}_{1}+{x}_{3}+z$, where $z\sim N\left(0,1\right)$ and $\epsilon \sim N\left(0,1\right)$. The remaining five noise variables ${x}_{4},{x}_{5},{x}_{6},{x}_{7},{x}_{8}$ are generated from the independent standard normal distribution. The results are shown in Table 2 and are based on 50 repetitions, each with sample size n = 200.

It can be known from Table 2 that the BQR-L method have smaller MMAD values, indicating that the regularized quantile regression method under the Bayesian framework is also superior to the regularized quantile regression method under the non-Bayesian framework when the error obeys the non-i.i.d..

3.3. Prostate Cancer Data Set

This section mainly analyzes prostate cancer data in the “bayesQR” package [22]. The dataset was first proposed by Stamey et al. [23], including a medical

Table 1. Parameter estimates for simulation 1 in the case of i.i.d.

Table 2. MMADs of each method in the case of non-i.i.d.

record of 97 male patients undergoing radical prostatectomy, containing the level of prostate antigen y (lpsa) and eight influencing factors. These influencing factors were: log cancer volume (lcavol), log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), seminal vesicle invasion (svi), log of capsular penetration (lcp), Gleason score (gleason) and percentage of Gleason score 4 or 5 (pgg 45). As with the numerical simulation of the second part, still consider $\theta =\left(0.3,0.5,0.7\right)$ here.

In Table 3, we will compare three methods: QR, BQR-L, and BQR-AL. The QR method will use the “quantreg” package in R and the default rank method to get the confidence interval. Here, the 95% interval is considered and parameter estimation is performed. For the Bayesian method, the MCMC algorithm is used to perform 5000 simulations on the posterior distribution by default, and the

first 1000 samples are discarded. The results are shown in Table 3.

It can be seen from Table 3 that the parameter estimates of the BQR-L and BQR-AL methods are very close to the classical quantile regression and the credible intervals of the BQR-L and BQR-AL methods are narrower than the QR, in which the BQR-AL interval to be more accurate than BQR-L, from this point of view, the Bayesian quantile regression method with penalty is estimated to be more accurate than the non-Bayesian regularized quantile regression method.

Of course, we can also intuitively understand by drawing images. In this section, for a more intuitive observation, the estimated values of the various methods for $\theta =0.7$ are plotted, and similar results are obtained for other quantiles. In order to be intuitive, the estimated values of each method will be translated, and the image is drawn as shown in Figure 5.

Table 3. Parameter estimation and 95% interval of prostate cancer data.

Figure 5. Regression estimates under five different methods and 95% credible intervals for BQR-AL and BQR-L.

It can be clearly seen from Figure 5 that the quantile regression with Bayesian can provide a credible interval, while the non-Bayesian quantile regression is not always available and the credible interval of BQR-AL is narrower than BQR-L. The Bayesian quantile regression with penalty is estimated to be more accurate than the non-Bayesian regularized quantile regression. The effect of BQR-AL is better.

4. Conclusion

Bayesian quantile regression with adaptive Lasso penalty is an extension and improvement of the Lasso method. Adaptive Lasso penalty is based on different penalty parameters are applied to different regression coefficients. This method can effectively eliminate the influence of noise variables and obtain more accurate parameter estimation. Through the Gibbs sampling algorithm, this paper systematically compares the regularized quantile regression under the non-Bayesian and Bayesian framework, and finds that when the error term obeys the independent identically distributed or heteroscedasticity distribution, both BQR-AL and BQR-L have higher accuracy and are superior to non-Bayesian methods. When the error obeys ALD, the BQR-AL method has the highest accuracy for the MMAD under the same quantile, and its parameter estimate is also the closest to the true value in general. In the real data set, we can also find the same conclusion. Therefore, we can say that the Bayesian penalty regression method can get a good effect under the condition that the coefficient is sparse or dense, and it can be described in full aspect at different quantile points, and it will occupy a very important position in the future high-dimensional data analysis.

Funding

This work was supported by the National Natural Science Foundation of China [grant numbers 61763008, 71762008]; Guangxi Science and Technology Plan Project [grant numbers 2018GXNSFAA294131, 2018GXNSFAA050005, 2016GXNSFAA380194].

References

[1] Koenker, R. and Bassett, G. (1978) Regression Quantiles. Econometrica, 46, 33-50.

https://doi.org/10.2307/1913643

[2] Koenker, R. and Machado, J.A.F. (1999) Goodness of Fit and Related Inference Processes for Quantile Regression. Journal of the American Statistical Association, 94, 1296-1310.

https://doi.org/10.1080/01621459.1999.10473882

[3] Yu, K. and Moyeed, R.A. (2001) Bayesian Quantile Regression. Statistics & Probability Letters, 54, 437-447.

https://doi.org/10.1016/S0167-7152(01)00124-9

[4] Hewson, P. and Yu, K. (2010) Quantile Regression for Binary Performance Indicators. Applied Stochastic Models in Business & Industry, 24, 401-418.

https://doi.org/10.1002/asmb.732

[5] Reich, B.J., Fuentes, M. and Dunson, D.B. (2011) Bayesian Spatial Quantile Regression. Publications of the American Statistical Association, 106, 6-20.

https://doi.org/10.1198/jasa.2010.ap09237

[6] Sriram, K., Ramamoorthi, R.V. and Ghosh, P. (2013) Posterior Consistency of Bayesian Quantile Regression Based on the Misspecified Asymmetric Laplace Density. Bayesian Analysis, 8, 479-504.

https://doi.org/10.1214/13-BA817

[7] Kozumi, H. and Kobayashi, G. (2009) Gibbs Sampling Methods for Bayesian Quantile Regression. Journal of Statistical Computation and Simulation, 81, 1565-1578.

https://doi.org/10.1080/00949655.2010.496117

[8] Khare, K. and Hobert, J.P. (2012) Geometric Ergodicity of the Gibbs Sampler for Bayesian Quantile Regression. Journal of Multivariate Analysis, 112, 108-116.

https://doi.org/10.1016/j.jmva.2012.05.004

[9] Sriram, K. (2015) A Sandwich Likelihood Correction for Bayesian Quantile Regression Based on the Misspecified Asymmetric Laplace Density. Statistics & Probability Letters, 107, 18-26.

https://doi.org/10.1016/j.spl.2015.07.035

[10] Koenker, R. (2004) Quantile Regression for Longitudinal Data. Journal of Multivariate Analysis, 91, 74-89.

https://doi.org/10.1016/j.jmva.2004.05.006

[11] Wang, H., Li, G. and Jiang, G. (2007) Robust Regression Shrinkage and Consistent Variable Selection through the LAD-Lasso. Journal of Business & Economic Statistics, 25, 347-355.

https://doi.org/10.1198/073500106000000251

[12] Li, Y. and Zhu, J. (2008) L1-Norm Quantile Regression. Journal of Computational and Graphical Statistics, 17, 163-185.

https://doi.org/10.1198/106186008X289155

[13] Wu, Y. and Liu, Y. (2009) Variable Selection in Quantile Regression. Statistica Sinica, 19, 801-817.

[14] Park, T. and Casella, G. (2008) The Bayesian Lasso. Journal of the American Statistical Association, 103, 681-686.

https://doi.org/10.1198/016214508000000337

[15] Li, Q., Xi, R. and Lin, N. (2010) Bayesian Regularized Quantile Regression. Bayesian Analysis, 5, 533-556.

https://doi.org/10.1214/10-BA521

[16] Alhamzawi, R., Yu, K. and Benoit, D.F. (2012) Bayesian Adaptive Lasso Quantile Regression. Statistical Modelling, 12, 279-297.

https://doi.org/10.1177/1471082X1101200304

[17] Adlouni, S.E., Salaou, G. and Sthilaire, A. (2018) Regularized Bayesian Quantile Regression. Communication in Statistics-Simulation and Computation, 47, 277-293.

https://doi.org/10.1080/03610918.2017.1280830

[18] Tsionas, E. (2003) Bayesian Quantile Inference. Journal of Statistical Computation and Simulation, 73, 659-674.

https://doi.org/10.1080/0094965031000064463

[19] Andrews, D.F. and Mallows, C.L. (1974) Scale Mixtures of Normal Distributions. Journal of the Royal Statistical Society, 36, 99-102.

https://doi.org/10.1111/j.2517-6161.1974.tb00989.x

[20] Yi, N. and Xu, S. (2008) Bayesian LASSO for Quantitative Trait Loci Mapping. Genetics, 179, 1045-1055.

https://doi.org/10.1534/genetics.107.085589

[21] Gilks, W.R. and Wild, P. (1992) Adaptive Rejection Sampling for Gibbs Sampling. Journal of the Royal Statistical Society, 41, 337-348.

https://doi.org/10.2307/2347565

[22] Benoit, D.F., Al-Hamzawi, R., Yu, K.M. and Dirk, V.P. (2017) BayesQR: Bayesian Quantile Regression. R Package Version 2.3.

https://cran.r-project.org/package=bayesQR

https://doi.org/10.18637/jss.v076.i07

[23] Stamey, T.A. (1989) Prostate Specific Antigen in the Diagnosis and Treatment of Adenocarcinoma of the Prostate. II. Radical Prostatectomy Treated Patients. The Journal of Urology, 141, 1076-1083.

https://doi.org/10.1016/S0022-5347(17)41175-X