Orthogonal Series Estimation of Nonparametric Regression Measurement Error Models with Validation Data


1. Introduction

Let Y be a scalar response variable and X be an explanatory variable in regression. We consider the nonparametric regression model

$Y=g\left(X\right)+\epsilon $ (1)

where $g(\cdot )$ is an unknown regression function and $\epsilon $ is a noise variable; given X, the errors $\epsilon =Y-g\left(X\right)$ are assumed to be independent and identically distributed. We consider model (1) when the explanatory variable X is measured with error and Y is measured exactly. That is, instead of the true X, a surrogate variable W is observed. Throughout we assume

$E\left[\epsilon |W\right]=0\text{ with probability 1}$ (2)

which is always satisfied if, for example, W is a function of X and some independent noise (see e.g. [1] ).

Nonparametric regression model (1) with errors in the covariables has attracted considerable attention in the literature and is by now well understood; see Carroll et al. [2] for an excellent source of references on the various approaches. However, most of these works rely on a specified error structure relating the true variable X and the surrogate variable W (e.g. the classical or the Berkson error structure). In practice, the relationship between the surrogate and the true variables can be much more complicated than the classical or Berkson structural equations usually assumed, which makes valid statistical inference difficult. A common solution is to use validation data to recover the missing information about the relationship between W and X.

We consider settings where validation data relating X and W are available. Specifically, we assume that independent validation data $\left({W}_{j},{X}_{j}\right)$, $N+1\le j\le N+n$, are available in addition to the independent primary data ${\left\{\left({Y}_{i},{W}_{i}\right)\right\}}_{i=1}^{N}$. Several approaches to statistical inference based on surrogate data and a validation sample have been proposed (see, for example, [1], [3] - [12], among others). However, these approaches are not applicable to nonparametric regression measurement error models with a validation data set: the models considered by the above authors are parametric or semiparametric, whereas model (1) is nonparametric. With the help of validation data, [13], [14] and [15] developed estimation methods for the nonparametric regression model (1) with measurement error. However, [13] assumes that the response Y, rather than the covariable X, is measured with error; the method proposed by [14] cannot be extended to the case where the explanatory variable X is a vector; and the approach proposed by [15] is computationally complicated.

In this paper, without specifying any structural equations, an orthogonal series method is proposed to estimate g with the help of validation data. As explained in Section 2, we estimate g by solving the following Fredholm equation of the first kind,

$Tg=m$ (3)

Here, we propose an orthogonal series estimator of T based on the validation data. Using a similar approach, we estimate m from the primary data set. An estimator of g is then obtained by the Tikhonov regularization method.

This paper is organized as follows. In Section 2, we introduce the model and the orthogonal series estimator. In Section 3, we state the convergence rates of the proposed estimator. Simulation results are reported in Section 4, and a brief discussion is given in Section 5. Proofs of the theorems are presented in the Appendix.

2. Model and Series Estimation

2.1. Model

Recall model (1) and the assumptions below it. Assume that, in addition to the primary data set consisting of N independent and identically distributed observations ${\left\{\left({Y}_{i},{W}_{i}\right)\right\}}_{i=1}^{N}$ from model (1), a validation data set consisting of n independent and identically distributed observations ${\left\{\left({X}_{j},{W}_{j}\right)\right\}}_{j=N+1}^{N+n}$ is available. Furthermore, we suppose that X and W are both real-valued random variables. The extension to random vectors complicates the notation but does not affect the main ideas and results. Without loss of generality, let the supports of X and W both be contained in $\left[0,1\right]$ (otherwise, one can apply monotone transformations to X and W).

Let ${f}_{XW}$ and ${f}_{W}$ denote respectively the joint density of $\left(X\mathrm{,}W\right)$ and marginal density of W. Then, according to (2), we have

$E\left(Y|W=w\right)=E\left[g\left(X\right)|W=w\right]={\displaystyle \int}g\left(x\right)\frac{{f}_{XW}\left(x,w\right)}{{f}_{W}\left(w\right)}\text{d}x$ (4)

Let $m\left(w\right)=E\left(Y|W=w\right){f}_{W}\left(w\right)$ and

${L}_{2}\left(\left[0,1\right]\right)=\left\{\phi :\left[0,1\right]\to \mathcal{R},\text{\hspace{0.17em}}\text{s}\text{.t}\text{.}\text{\hspace{0.17em}}\Vert \phi \Vert ={\left({\displaystyle \int}{\left|\phi \left(x\right)\right|}^{2}\text{d}x\right)}^{1/2}<\infty \right\}$

Define the operator $T\mathrm{:}{L}_{2}\left(\left[\mathrm{0,1}\right]\right)\to {L}_{2}\left(\left[\mathrm{0,1}\right]\right)$ as

$\left(T\phi \right)\left(w\right)={\displaystyle \int}\phi \left(x\right){f}_{XW}\left(x\mathrm{,}w\right)\text{d}x$

Multiplying both sides of (4) by ${f}_{W}\left(w\right)$ shows that Equation (4) is equivalent to the operator equation

$m\left(w\right)=\left(Tg\right)\left(w\right)$ (5)
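As a quick numerical sanity check of (5), the Python sketch below evaluates $(Tg)(w)$ with a midpoint rule and compares it with $m(w)=E(Y|W=w)f_W(w)$ computed in closed form. The joint density $f_{XW}(x,w)=x+w$ on $[0,1]^2$ and the function $g(x)=x$ are hypothetical choices made only for illustration; for them $f_W(w)=1/2+w$, $E[g(X)|W=w]=(1/3+w/2)/(1/2+w)$, and hence $m(w)=1/3+w/2$.

```python
def f_xw(x, w):
    """Hypothetical joint density of (X, W) on [0, 1]^2 (it integrates to 1)."""
    return x + w

def g(x):
    """Hypothetical regression function."""
    return x

def T_apply(phi, w, n_nodes=2000):
    """Midpoint-rule approximation of (T phi)(w) = int_0^1 phi(x) f_XW(x, w) dx."""
    h = 1.0 / n_nodes
    return h * sum(phi((j + 0.5) * h) * f_xw((j + 0.5) * h, w) for j in range(n_nodes))

def m_closed_form(w):
    """m(w) = E(Y | W = w) f_W(w) for the hypothetical f_XW and g above."""
    f_w = 0.5 + w                            # marginal density of W
    cond_mean = (1.0 / 3.0 + w / 2.0) / f_w  # E[g(X) | W = w]
    return cond_mean * f_w

for w in (0.1, 0.5, 0.9):
    print(w, T_apply(g, w), m_closed_form(w))
```

Multiplying the conditional mean by $f_W$ is what turns the conditional expectation in (4) into the linear operator equation (5).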

According to Equation (5), the function g is the solution of a Fredholm integral equation of the first kind. This inverse problem is known to be ill-posed and requires regularization. A variety of regularization schemes are available in the literature (see e.g. [16]); in this paper we focus on the Tikhonov regularized solution:

${g}^{\alpha}=\mathrm{arg}\underset{g}{\mathrm{min}}\left[{\Vert Tg-m\Vert}^{2}+\alpha {\Vert g\Vert}^{2}\right]$ (6)

where $\alpha >0$ is the regularization parameter, which weights the penalty term.

The adjoint operator ${T}^{\ast}$ of $T$ is given by

$\left({T}^{\ast}\psi \right)\left(x\right)={\displaystyle \int}\psi \left(w\right){f}_{XW}\left(x\mathrm{,}w\right)\text{d}w$

where $\psi \in {L}_{2}\left(\left[0,1\right]\right)$. Since the first-order condition of (6) is $\left(\alpha I+{T}^{\ast}T\right)g={T}^{\ast}m$, the regularized solution (6) can be written equivalently as

${g}^{\alpha}={\left(\alpha I+{T}^{\ast}T\right)}^{-1}{T}^{\ast}m$ (7)
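To see why regularization is needed, the following sketch discretizes $T$ on a midpoint grid, so that $T$ becomes a matrix `A` and $T^*$ its transpose, and applies the discrete analogue of (7). The Gaussian-shaped density, the function $g(x)=\sin(\pi x)$ and the noise level are hypothetical choices, not part of the paper. With exact data the error shrinks as $\alpha$ decreases, whereas a small perturbation of $m$ is strongly amplified when $\alpha$ is tiny, which is the ill-posedness mentioned above.

```python
import numpy as np

def tikhonov_solve(A, m, alpha):
    """Discrete analogue of (7): (alpha I + A^T A)^{-1} A^T m."""
    return np.linalg.solve(alpha * np.eye(A.shape[1]) + A.T @ A, A.T @ m)

def l2_norm(v, h):
    """Midpoint-rule approximation of the L2([0, 1]) norm."""
    return float(np.sqrt(h * np.sum(v ** 2)))

n = 200
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h
W_mesh, X_mesh = np.meshgrid(grid, grid, indexing="ij")  # rows: w_i, columns: x_j
f = np.exp(-((X_mesh - W_mesh) ** 2) / (2 * 0.15 ** 2))  # hypothetical f_XW (unnormalized)
f /= f.sum() * h * h                                     # normalize on [0, 1]^2
A = f * h                                                # (T phi)(w_i) ~ sum_j A[i, j] phi(x_j)

g_true = np.sin(np.pi * grid)                            # hypothetical g
m_exact = A @ g_true                                     # m = T g, noise-free
rng = np.random.default_rng(0)
m_noisy = m_exact + 1e-3 * rng.standard_normal(n)       # slightly perturbed m

for alpha in (1e-2, 1e-4, 1e-6, 1e-10):
    err_exact = l2_norm(tikhonov_solve(A, m_exact, alpha) - g_true, h)
    err_noisy = l2_norm(tikhonov_solve(A, m_noisy, alpha) - g_true, h)
    print(f"alpha={alpha:.0e}  exact-data error={err_exact:.4f}  noisy-data error={err_noisy:.4f}")
```

The trade-off visible here, bias decreasing in $\alpha$ and noise amplification increasing as $\alpha\to 0$, is the same one that appears in the rate of Theorem 3.1.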

2.2. Orthogonal Series Estimation

In order to estimate the solution (7), we need to estimate $T$, ${T}^{\ast}$ and $m$. In this paper, we use the orthogonal series method. Under the regularity conditions of Section 3 (Assumption 3), the density function ${f}_{XW}\left(x,w\right)$ and $m\left(w\right)$ can be approximated to any desired accuracy by truncated orthogonal series,

${f}_{XW}^{K}\left(x,w\right)={\displaystyle \sum _{k=1}^{K}}{\displaystyle \sum _{l=1}^{K}}{d}_{kl}{\varphi}_{k}\left(x\right){\varphi}_{l}\left(w\right)\quad \text{and}\quad {m}^{K}\left(w\right)={\displaystyle \sum _{k=1}^{K}}{m}_{k}{\varphi}_{k}\left(w\right)$

where

${d}_{kl}={\displaystyle \int}{\displaystyle \int}{f}_{XW}\left(x,w\right){\varphi}_{k}\left(x\right){\varphi}_{l}\left(w\right)\text{d}x\text{d}w\quad \text{and}\quad {m}_{k}={\displaystyle \int}m\left(w\right){\varphi}_{k}\left(w\right)\text{d}w$

Here, $\left\{{\varphi}_{k}\right\}$ is an orthonormal basis of ${L}_{2}\left(\left[0,1\right]\right)$, which may be trigonometric, polynomial, spline, wavelet, and so on. A discussion of different bases and their properties can be found in the literature (see e.g. [17]). To be specific, here and in what follows we use the normalized Legendre polynomials on $\left[0,1\right]$, indexed so that ${\varphi}_{k}$ has degree $k-1$ (in particular ${\varphi}_{1}\equiv 1$), which can be obtained through Rodrigues' formula

${\varphi}_{k}\left(x\right)=\frac{\sqrt{2k-1}}{\left(k-1\right)!}\frac{{\text{d}}^{k-1}}{\text{d}{x}^{k-1}}\left[{\left({x}^{2}-x\right)}^{k-1}\right],\quad k=1,2,\dots $ (8)

The integer K is the truncation point and acts as the main smoothing parameter of the approximating series, and ${d}_{kl}$ and ${m}_{k}$ are the generalized Fourier coefficients of ${f}_{XW}$ and m, respectively.
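The normalized Legendre polynomials can be evaluated stably without differentiating $(x^2-x)^k$ symbolically. The sketch below, with the hypothetical helper name `legendre01`, returns the $K$ orthonormal shifted Legendre polynomials of degrees $0,\dots,K-1$, namely $\sqrt{2d+1}\,P_d(2x-1)$ for degree $d$, through Bonnet's three-term recurrence, and checks orthonormality on $[0,1]$ with Gauss-Legendre quadrature (`numpy.polynomial.legendre.leggauss`). Python column 0 holds the constant polynomial.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

def legendre01(x, K):
    """Orthonormal shifted Legendre polynomials of degrees 0, ..., K-1 on [0, 1].

    Column d holds sqrt(2d + 1) * P_d(2x - 1), computed with Bonnet's recurrence
    (d + 1) P_{d+1}(t) = (2d + 1) t P_d(t) - d P_{d-1}(t).
    """
    t = 2.0 * np.atleast_1d(np.asarray(x, dtype=float)) - 1.0
    P = np.empty((t.size, K))
    P[:, 0] = 1.0
    if K > 1:
        P[:, 1] = t
    for d in range(1, K - 1):
        P[:, d + 1] = ((2 * d + 1) * t * P[:, d] - d * P[:, d - 1]) / (d + 1)
    return P * np.sqrt(2.0 * np.arange(K) + 1.0)

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [0, 1].
nodes, weights = leggauss(40)
x01, w01 = (nodes + 1.0) / 2.0, weights / 2.0
Phi = legendre01(x01, 10)
gram = Phi.T @ (w01[:, None] * Phi)   # should be close to the 10 x 10 identity
print(np.max(np.abs(gram - np.eye(10))))
```

The recurrence avoids the cancellation that repeated symbolic differentiation can cause for larger $K$.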

Note that ${d}_{kl}=E\left[{\varphi}_{k}\left(X\right){\varphi}_{l}\left(W\right)\right]$ and, by the definition of m, ${m}_{k}=E\left[E\left(Y|W\right){\varphi}_{k}\left(W\right)\right]=E\left[Y{\varphi}_{k}\left(W\right)\right]$. Replacing these expectations by sample averages over the validation and primary data, respectively, we estimate ${d}_{kl}$, ${f}_{XW}\left(x,w\right)$, ${m}_{k}$ and $m\left(w\right)$ by

${\stackrel{^}{d}}_{kl}=\frac{1}{n}{\displaystyle \sum _{j=N+1}^{N+n}}{\varphi}_{k}\left({X}_{j}\right){\varphi}_{l}\left({W}_{j}\right),\quad {\stackrel{^}{f}}_{XW}\left(x,w\right)={\displaystyle \sum _{k=1}^{K}}{\displaystyle \sum _{l=1}^{K}}{\stackrel{^}{d}}_{kl}{\varphi}_{k}\left(x\right){\varphi}_{l}\left(w\right),$

${\stackrel{^}{m}}_{k}=\frac{1}{N}{\displaystyle \sum _{i=1}^{N}}{Y}_{i}{\varphi}_{k}\left({W}_{i}\right)\quad \text{and}\quad \stackrel{^}{m}\left(w\right)={\displaystyle \sum _{k=1}^{K}}{\stackrel{^}{m}}_{k}{\varphi}_{k}\left(w\right)$

respectively. The operators $T$ and ${T}^{\ast}$ can then be consistently estimated by

$\left(\stackrel{^}{T}\phi \right)\left(w\right)={\displaystyle \int}\phi \left(x\right){\stackrel{^}{f}}_{XW}\left(x,w\right)\text{d}x\quad \text{and}\quad \left({\stackrel{^}{T}}^{\ast}\psi \right)\left(x\right)={\displaystyle \int}\psi \left(w\right){\stackrel{^}{f}}_{XW}\left(x,w\right)\text{d}w$
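The empirical coefficients above are plain sample averages, so they reduce to matrix products of basis evaluations. The sketch below assumes the data have already been transformed to $[0,1]$ and uses an orthonormal shifted Legendre helper (`legendre01`, a hypothetical name, with column 0 the constant polynomial); `series_coefficients` returns $\hat d_{kl}$ as a $K\times K$ array and $\hat m_k$ as a vector, and `f_hat` evaluates $\hat f_{XW}$ on a grid. All names are illustrative.

```python
import numpy as np

def legendre01(x, K):
    """Orthonormal shifted Legendre polynomials of degrees 0..K-1 on [0, 1] (one column each)."""
    t = 2.0 * np.atleast_1d(np.asarray(x, dtype=float)) - 1.0
    P = np.empty((t.size, K))
    P[:, 0] = 1.0
    if K > 1:
        P[:, 1] = t
    for d in range(1, K - 1):
        P[:, d + 1] = ((2 * d + 1) * t * P[:, d] - d * P[:, d - 1]) / (d + 1)
    return P * np.sqrt(2.0 * np.arange(K) + 1.0)

def series_coefficients(Y, W_primary, X_val, W_val, K):
    """Return (d_hat, m_hat).

    d_hat[k, l] = mean over the validation sample of phi_k(X_j) phi_l(W_j);
    m_hat[k]    = mean over the primary sample of Y_i phi_k(W_i).
    """
    d_hat = legendre01(X_val, K).T @ legendre01(W_val, K) / len(X_val)
    m_hat = legendre01(W_primary, K).T @ np.asarray(Y, dtype=float) / len(Y)
    return d_hat, m_hat

def f_hat(x, w, d_hat):
    """Truncated series estimate of f_XW, with x values along rows and w values along columns."""
    K = d_hat.shape[0]
    return legendre01(x, K) @ d_hat @ legendre01(w, K).T
```

Since the first basis function is constant, $\hat d_{11}=1$ and $\hat m_1=\bar Y$ hold automatically, which is a convenient check of any implementation.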

Finally, the estimator of $g\left(x\right)$ is given by

${\stackrel{^}{g}}^{\alpha}={\left(\alpha I+{\stackrel{^}{T}}^{\ast}\stackrel{^}{T}\right)}^{-1}{\stackrel{^}{T}}^{\ast}\stackrel{^}{m}$ (9)
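Because $\hat f_{XW}$ is a finite double series, (9) reduces to a $K$-dimensional linear system. Writing $\hat D=(\hat d_{kl})$ and $\hat{\mathbf m}=(\hat m_1,\dots,\hat m_K)^\top$, orthonormality gives that $\hat T$ maps the coefficient vector $c$ of $\phi=\sum_k c_k\varphi_k$ to $\hat D^\top c$, that $\hat T^*$ maps $b$ to $\hat D b$, and that $\hat T$ vanishes on the orthogonal complement of $\mathrm{span}\{\varphi_1,\dots,\varphi_K\}$. Hence $\hat g^\alpha=\sum_k \hat c_k\varphi_k$ with $\hat c=(\alpha I+\hat D\hat D^\top)^{-1}\hat D\hat{\mathbf m}$. This coordinate form is our own derivation from the definitions above rather than a formula stated in the paper; a minimal sketch (function name hypothetical):

```python
import numpy as np

def tikhonov_coefficients(d_hat, m_hat, alpha):
    """Coefficients c_1..c_K of g_hat^alpha = sum_k c_k phi_k, i.e. (9) in coordinates.

    With D = (d_hat[k, l]), T_hat maps coefficients c to D^T c and T_hat* maps b to D b,
    so (alpha I + T_hat* T_hat) c = T_hat* m_hat becomes (alpha I + D D^T) c = D m_hat.
    """
    D = np.asarray(d_hat, dtype=float)
    m = np.asarray(m_hat, dtype=float)
    return np.linalg.solve(alpha * np.eye(D.shape[0]) + D @ D.T, D @ m)
```

Evaluating $\hat g^\alpha(x)$ then only requires the basis functions at $x$, and solving a $K\times K$ system is cheap even inside a cross-validation loop.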

3. Theoretical Properties

The main objective of this section is to derive the statistical properties of the estimator proposed in Section 2.2. For this purpose, we assume:

Assumption 1. 1) The support of $\left(X,W\right)$ is contained in ${\left[0,1\right]}^{2}$; 2) the joint density of $\left(X,W\right)$ is square integrable with respect to the Lebesgue measure on ${\left[0,1\right]}^{2}$.

These conditions are sufficient for T to be a Hilbert-Schmidt operator and therefore compact (see [18]). Because T is compact, it admits a singular value decomposition: let ${\lambda}_{k},k\ge 0$, be the sequence of nonzero singular values of T, and let ${\phi}_{k},k\ge 0$, and ${\psi}_{k},k\ge 0$, be two orthonormal sequences such that (see [16]):

$T{\phi}_{k}={\lambda}_{k}{\psi}_{k},\text{\hspace{0.17em}}{T}^{*}{\psi}_{k}={\lambda}_{k}{\phi}_{k};\mathrm{}{T}^{*}T{\phi}_{k}={\lambda}_{k}^{2}{\phi}_{k},\mathrm{}T{T}^{*}{\psi}_{k}={\lambda}_{k}^{2}{\psi}_{k},\mathrm{}\text{\hspace{0.05em}}\text{for}\text{\hspace{0.05em}}k\ge 0$
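For intuition about the singular value decomposition, the sketch below discretizes $T$ for a hypothetical Gaussian-shaped density on a midpoint grid, computes the SVD of the resulting matrix with `numpy.linalg.svd`, and checks the relations $T\phi_k=\lambda_k\psi_k$ and $T^*\psi_k=\lambda_k\phi_k$. Rescaling the singular vectors by $1/\sqrt{h}$ makes them orthonormal with respect to the midpoint approximation of the $L_2([0,1])$ inner product. The rapid decay of the singular values is what makes inverting $T$ ill-posed.

```python
import numpy as np

n = 300
h = 1.0 / n
grid = (np.arange(n) + 0.5) * h
W_mesh, X_mesh = np.meshgrid(grid, grid, indexing="ij")  # rows: w_i, columns: x_j
f = np.exp(-((X_mesh - W_mesh) ** 2) / (2 * 0.1 ** 2))   # hypothetical f_XW (unnormalized)
f /= f.sum() * h * h                                     # normalize on [0, 1]^2
A = f * h                                                # discretized T

U, s, Vt = np.linalg.svd(A)          # s[k] approximates lambda_k (sorted in decreasing order)
k = 3
phi_k = Vt[k] / np.sqrt(h)           # discretized phi_k with h * sum(phi_k**2) = 1
psi_k = U[:, k] / np.sqrt(h)         # discretized psi_k
print(s[:8] / s[0])                                      # relative decay of singular values
print(np.max(np.abs(A @ phi_k - s[k] * psi_k)))          # T phi_k = lambda_k psi_k
print(np.max(np.abs(A.T @ psi_k - s[k] * phi_k)))        # T* psi_k = lambda_k phi_k
```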

We define the $\beta $-regularity space ${\Phi}_{\beta}$ for $\beta >0$ as

${\Phi}_{\beta}=\left\{\phi \in {L}_{2}\left(\left[0,1\right]\right)\text{ such that }{\displaystyle \sum _{k\ge 0}}\frac{{\langle \phi ,{\phi}_{k}\rangle}^{2}}{{\lambda}_{k}^{2\beta}}<+\infty \right\}$

Here and below, we denote by $\langle \cdot ,\cdot \rangle $ the scalar product in ${L}_{2}\left(\left[0,1\right]\right)$.

Assumption 2. We have $g\in {\Phi}_{\beta}$ for some $\beta >0$ .

We then obtain the following result (see [18] ).

Proposition 3.1. Suppose Assumptions 1 and 2 hold, then we have ${\Vert g-{g}^{\alpha}\Vert}^{2}=O\left({\alpha}^{\beta \wedge 2}\right)$ , where $\beta \wedge 2=\mathrm{min}\left\{\beta ,2\right\}$ .
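Proposition 3.1 can be illustrated in a hypothetical spectral model in which $\lambda_k=1/k$ and $\langle g,\phi_k\rangle=k^{-\beta-0.6}$, so that $\sum_k\langle g,\phi_k\rangle^2/\lambda_k^{2\beta}=\sum_k k^{-1.2}<\infty$ and $g\in\Phi_\beta$. Since Tikhonov regularization multiplies each coefficient by $\lambda_k^2/(\alpha+\lambda_k^2)$, we have $\|g-g^\alpha\|^2=\sum_k\{\alpha/(\alpha+\lambda_k^2)\}^2\langle g,\phi_k\rangle^2$. The sketch below estimates the exponent of $\alpha$ from two values of $\alpha$: for $\beta=1$ it lies slightly above 1, consistent with the bound $\alpha^{\beta\wedge 2}$, while for $\beta=3$ it stalls at 2, the saturation reflected in $\beta\wedge 2$. The model and the truncation at $10^6$ terms are our assumptions.

```python
import numpy as np

def tikhonov_bias_sq(alpha, beta, k_max=10**6):
    """||g - g^alpha||^2 in the hypothetical model lambda_k = 1/k, <g, phi_k> = k^(-beta-0.6)."""
    k = np.arange(1, k_max + 1, dtype=float)
    lam2 = k ** -2.0                     # lambda_k^2
    g_k = k ** (-beta - 0.6)             # coefficients of g in the basis phi_k
    shrink = alpha / (alpha + lam2)      # fraction of each coefficient lost to regularization
    return float(np.sum(shrink ** 2 * g_k ** 2))

for beta in (1.0, 3.0):
    b_large = tikhonov_bias_sq(1e-6, beta)
    b_small = tikhonov_bias_sq(1e-8, beta)
    slope = np.log(b_large / b_small) / np.log(1e-6 / 1e-8)   # empirical exponent of alpha
    print(f"beta={beta}: empirical exponent {slope:.3f}, beta ^ 2 = {min(beta, 2.0)}")
```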

In order to obtain the rate of convergence for ${\Vert {\stackrel{^}{g}}^{\alpha}-g\Vert}^{2}$ , we impose the following additional conditions:

Assumption 3. 1) The joint density ${f}_{XW}$ is r-times continuously differentiable on ${\left[0,1\right]}^{2}$; 2) the function $m(\cdot )$ is s-times continuously differentiable on $\left[0,1\right]$.

Assumption 4. The function $E\left({Y}^{2}|W=w\right)$ is bounded uniformly on $\left[\mathrm{0,1}\right]$ .

Assumption 5. 1) $\mathrm{lim}n/N=\mu \in \left[0,\infty \right)$ ; 2) $\alpha \to 0$ , $K={K}_{\left(N,n\right)}\to \infty $ , $K/N\to 0$ , ${K}^{2}/n\to 0$ as $n\to \infty $ , $N\to \infty $ .

Theorem 3.1. Suppose Assumptions 1 - 5 hold. Let $\gamma =\mathrm{min}\left\{r,s\right\}$ , then we have

${\Vert {\stackrel{^}{g}}^{\alpha}-g\Vert}^{2}={O}_{P}\left[\frac{1}{\alpha}\times \left(\frac{K}{N}+\frac{1}{{K}^{2\gamma}}+\frac{{K}^{2}}{n}\right)+{\alpha}^{\beta \wedge 2}\right]$ (10)

In (10), the term ${K}^{-2\gamma}$ arises from the bias of ${\stackrel{^}{g}}^{\alpha}$ caused by truncating the series approximations of ${f}_{XW}$ and $m$; this truncation bias decreases as $\gamma $ increases. The terms ${N}^{-1}K$ and ${n}^{-1}{K}^{2}$ are induced, respectively, by the random sampling errors of the primary (surrogate) data and of the validation data in the estimates of the generalized Fourier coefficients ${m}_{k}$ and ${d}_{kl}$. The following corollary is an immediate consequence of Theorem 3.1.

Corollary 3.1. Suppose the assumptions of Theorem 3.1 are satisfied. Let $K=O\left({n}^{1/\left(2\gamma +2\right)}\right)$ and $\alpha =O\left({n}^{-\gamma /\left[\left(\gamma +1\right)\left(\beta \wedge 2+1\right)\right]}\right)$ , then we have

${\Vert {\stackrel{^}{g}}^{\alpha}-g\Vert}^{2}={O}_{P}\left({n}^{-\kappa \frac{\beta \wedge 2}{\beta \wedge 2+1}}\right)$

where $\kappa =\gamma /\left(\gamma +1\right)$ .
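The choices in Corollary 3.1 balance the terms of (10): $K\asymp n^{1/(2\gamma+2)}$ equates the truncation term $K^{-2\gamma}$ with the validation term $K^2/n$ (the term $K/N$ is of no larger order when $n/N\to\mu<\infty$), and $\alpha$ then equates $\alpha^{-1}n^{-\kappa}$ with $\alpha^{\beta\wedge 2}$. The sketch below, using exact rational arithmetic from Python's `fractions` module, computes the resulting exponents of $n$; the function name is hypothetical.

```python
from fractions import Fraction

def corollary_exponents(gamma, beta):
    """Exponents e such that K ~ n^e, alpha ~ n^e and ||g_hat - g||^2 = O_P(n^e) in Corollary 3.1."""
    gamma = Fraction(gamma)
    b = min(Fraction(beta), Fraction(2))   # beta ^ 2
    kappa = gamma / (gamma + 1)
    k_exp = 1 / (2 * gamma + 2)            # K^{-2 gamma} = K^2 / n at this choice
    alpha_exp = -kappa / (b + 1)           # alpha^{-1} n^{-kappa} = alpha^{b} at this choice
    rate_exp = -kappa * b / (b + 1)
    return k_exp, alpha_exp, rate_exp

print(corollary_exponents(2, 1))   # gamma = 2, beta = 1
print(corollary_exponents(2, 3))   # beta = 3 is capped at 2
```

For example, with $\gamma=2$ and $\beta=1$ this gives $K\asymp n^{1/6}$, $\alpha\asymp n^{-1/3}$ and a squared-error rate of $n^{-1/3}$.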

The proofs of all the results are reported in the Appendix.

4. Simulation Studies

In this section, we conduct simulation studies of the finite-sample performance of the proposed estimator. For comparison, we consider the standard Nadaraya-Watson estimator based on the primary data set ${\left\{\left({Y}_{i},{W}_{i}\right)\right\}}_{i=1}^{N}$ (denoted by ${\stackrel{^}{g}}_{N}$). Note that ${\stackrel{^}{g}}_{N}$ regresses Y directly on the surrogate W and ignores the measurement error, so it serves as a naive benchmark. The performance of an estimator ${g}^{est}$ is assessed by the square root of the average squared errors (RASE)

$\text{\hspace{0.05em}}\text{RASE}\text{\hspace{0.05em}}={\left\{\frac{1}{M}{\displaystyle \underset{s=1}{\overset{M}{\sum}}}{\left[{g}^{est}\left({u}_{s}\right)-g\left({u}_{s}\right)\right]}^{2}\right\}}^{1/2}$

where ${u}_{s},s=1,\cdots ,M$ , are grid points at which ${g}^{est}\left({u}_{s}\right)$ is evaluated.
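A direct transcription of the RASE criterion is sketched below. The equally spaced grid of $2^7=128$ points on $[0,1]$ is our assumption for illustration; the text does not specify the spacing of the $u_s$.

```python
import math

def rase(g_est, g_true, grid):
    """Square root of the average squared error of g_est over the grid points u_1, ..., u_M."""
    sq_errors = [(g_est(u) - g_true(u)) ** 2 for u in grid]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

grid = [s / 127 for s in range(128)]   # hypothetical: 2^7 equally spaced points on [0, 1]

# A constant offset of 0.1 yields a RASE of 0.1.
print(rase(lambda u: u + 0.1, lambda u: u, grid))
```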

We consider model (1) with the following regression functions:

(a) $g\left(x\right)={\varphi}_{0,1.5}\left(4x\right)+{\varphi}_{1,2}\left(4x\right)+{\varphi}_{2,5}\left(4x\right)$, $\epsilon \sim N\left(0,0.2\right)$;

(b) $g\left(x\right)=5\mathrm{sin}\left(2x\right)\mathrm{exp}\left(-16{x}^{2}/50\right)$, $\epsilon \sim N\left(0,0.2\right)$,

where ${\varphi}_{\mu ,\sigma}$ is the density of a $\text{Normal}\left(\mu ,{\sigma}^{2}\right)$ variable. To perform this simulation, we generate W from ${f}_{W}$ and $\delta $ from ${f}_{\delta}$, and set $X=W+\delta $. The densities ${f}_{W}$ and ${f}_{\delta}$, chosen from the beta family, are

${f}_{W}\left(w\right)=\frac{\left(1-{w}^{2}/4\right)}{2B\left(1/2\mathrm{,2}\right)}I\left(w\in \left[-\mathrm{2,2}\right]\right)$

${f}_{\delta}\left(u\right)=\frac{{\left(1-{u}^{2}\right)}^{{\rho}_{\delta}}}{B\left(1/2,{\rho}_{\delta}+1\right)}I\left(u\in \left[-1,1\right]\right)$

where we chose ${\rho}_{\delta}=1$, ${\rho}_{\delta}=3$ and ${\rho}_{\delta}=5$ (the greater the value of ${\rho}_{\delta}$, the smaller the variance of $\delta $). Simulations were run with validation and primary sample sizes $\left(n,N\right)$ ranging from $\left(20,60\right)$ to $\left(50,250\right)$, with ratios $\kappa =N/n=3$ and $\kappa =N/n=5$. For each sample size $\left(n,N\right)$, 500 simulated data sets were generated.
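The data-generating design can be sketched with the standard library alone. The sketch relies on two facts about the beta family: if $V\sim\text{Beta}(2,2)$ then $4V-2$ has density $f_W$, and if $V\sim\text{Beta}(\rho_\delta+1,\rho_\delta+1)$ then $2V-1$ has density $f_\delta$, whose variance is $1/(2\rho_\delta+3)$ (that is, 0.2, 1/9 and 1/13 for $\rho_\delta=1,3,5$). We read $N(0,0.2)$ as mean 0 and variance 0.2, following the $\text{Normal}(\mu,\sigma^2)$ convention above, and ${\varphi}_{\mu,\sigma}$ as a normal density with standard deviation $\sigma$. The function names and the seed handling are hypothetical, and the monotone transformation to $[0,1]$ mentioned in Section 2 is not included.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    """Density of Normal(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def g_a(x):
    """Regression function (a)."""
    return normal_pdf(4 * x, 0, 1.5) + normal_pdf(4 * x, 1, 2) + normal_pdf(4 * x, 2, 5)

def g_b(x):
    """Regression function (b)."""
    return 5 * math.sin(2 * x) * math.exp(-16 * x ** 2 / 50)

def draw_w(rng):
    # If V ~ Beta(2, 2), then 4V - 2 has density proportional to 1 - w^2/4 on [-2, 2].
    return 4 * rng.betavariate(2, 2) - 2

def draw_delta(rng, rho):
    # If V ~ Beta(rho+1, rho+1), then 2V - 1 has density proportional to (1 - u^2)^rho on [-1, 1].
    return 2 * rng.betavariate(rho + 1, rho + 1) - 1

def simulate(g, N, n, rho, noise_var=0.2, seed=0):
    """Primary sample [(Y_i, W_i)] of size N and validation sample [(X_j, W_j)] of size n."""
    rng = random.Random(seed)
    primary, validation = [], []
    for _ in range(N):
        w = draw_w(rng)
        x = w + draw_delta(rng, rho)   # X = W + delta; X is not retained in the primary data
        primary.append((g(x) + rng.gauss(0.0, math.sqrt(noise_var)), w))
    for _ in range(n):
        w = draw_w(rng)
        validation.append((w + draw_delta(rng, rho), w))
    return primary, validation

primary, validation = simulate(g_a, N=90, n=30, rho=3)
```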

To implement the estimator (9), the regularization parameter $\alpha $ and the truncation parameter K must be chosen. Here, we select $\alpha $ and K by minimizing the following two-dimensional cross-validation criterion

$\text{CV}\left(\alpha \mathrm{,}K\right)={\displaystyle \underset{i=1}{\overset{N}{\sum}}}{\left\{{Y}_{i}-{\displaystyle \underset{k=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}{\stackrel{^}{g}}_{k}^{\left(-i\right)}{\varphi}_{k}\left({W}_{i}\right)\right\}}^{2}$

where ${\stackrel{^}{g}}_{k}^{\left(-i\right)}$, $k=1,\dots ,K$, are the generalized Fourier coefficients of the solution (9) computed after deleting the i-th primary observation $\left({Y}_{i},{W}_{i}\right)$. For the naive estimator ${\stackrel{^}{g}}_{N}$, we used the standard normal kernel, with the bandwidth selected by leave-one-out cross-validation. In all graphs, to illustrate the performance of an estimator, we show the estimated curves corresponding to the first (Q1), second (Q2) and third (Q3) quartiles of the ordered RASEs. The target curve is always represented by a solid curve.
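A minimal sketch of the grid search over $(\alpha,K)$ follows. It assumes the data are already on $[0,1]$, uses the orthonormal shifted Legendre basis, and computes each leave-one-out fit through the $K$-dimensional form $\hat c=(\alpha I+\hat D\hat D^\top)^{-1}\hat D\hat{\mathbf m}$ of (9), where $\hat D=(\hat d_{kl})$ and $\hat{\mathbf m}=(\hat m_k)$; this coordinate form follows from the orthonormality of the basis. All function names and the synthetic data are hypothetical. Only the primary observation is deleted in each fold, as in the criterion above; the validation sample is kept fixed.

```python
import numpy as np

def legendre01(x, K):
    """Orthonormal shifted Legendre polynomials of degrees 0..K-1 on [0, 1] (one column each)."""
    t = 2.0 * np.atleast_1d(np.asarray(x, dtype=float)) - 1.0
    P = np.empty((t.size, K))
    P[:, 0] = 1.0
    if K > 1:
        P[:, 1] = t
    for d in range(1, K - 1):
        P[:, d + 1] = ((2 * d + 1) * t * P[:, d] - d * P[:, d - 1]) / (d + 1)
    return P * np.sqrt(2.0 * np.arange(K) + 1.0)

def fit_coefficients(Y, W, X_val, W_val, alpha, K):
    """Basis coefficients of g_hat^alpha: (alpha I + D D^T)^{-1} D m_hat."""
    D = legendre01(X_val, K).T @ legendre01(W_val, K) / len(X_val)   # d_hat[k, l]
    m_hat = legendre01(W, K).T @ Y / len(Y)                           # m_hat[k]
    return np.linalg.solve(alpha * np.eye(K) + D @ D.T, D @ m_hat)

def cv_score(Y, W, X_val, W_val, alpha, K):
    """CV(alpha, K): sum of squared leave-one-out prediction errors over the primary sample."""
    score = 0.0
    for i in range(len(Y)):
        keep = np.arange(len(Y)) != i                # drop the i-th primary pair (Y_i, W_i)
        c = fit_coefficients(Y[keep], W[keep], X_val, W_val, alpha, K)
        score += float((Y[i] - legendre01(W[i], K)[0] @ c) ** 2)
    return score

def select_alpha_K(Y, W, X_val, W_val, alphas, Ks):
    """Two-dimensional grid search; returns the minimizing (alpha, K) and all scores."""
    scores = {(a, K): cv_score(Y, W, X_val, W_val, a, K) for a in alphas for K in Ks}
    return min(scores, key=scores.get), scores

# Hypothetical usage with synthetic data already on [0, 1].
rng = np.random.default_rng(0)
W = rng.uniform(size=40)
X = np.clip(W + rng.uniform(-0.1, 0.1, size=40), 0.0, 1.0)
Y = np.sin(np.pi * X) + 0.1 * rng.standard_normal(40)
W_val = rng.uniform(size=20)
X_val = np.clip(W_val + rng.uniform(-0.1, 0.1, size=20), 0.0, 1.0)
best, scores = select_alpha_K(Y, W, X_val, W_val, alphas=(1e-3, 1e-2), Ks=(2, 3, 4))
print("selected (alpha, K):", best)
```

An exhaustive grid is the simplest choice here because each fit only requires a $K\times K$ solve; for larger grids the fold loop is the dominant cost.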

Figure 1 shows the regression function (a) together with the quartile curves of 500 estimates ${\stackrel{^}{g}}^{\alpha}\left(x\right)$ under different values of ${\rho}_{\delta}$, for sample size $\left(N,n\right)=\left(90,30\right)$. The proposed estimator ${\stackrel{^}{g}}^{\alpha}\left(x\right)$ performs well in this example. Regarding the measurement error level, as the variance of $\delta $ decreases, ${\stackrel{^}{g}}^{\alpha}\left(x\right)$ tends to have smaller bias at the peaks of the regression curve.

Figure 2 illustrates how the estimator improves as the sample size increases. We compare the results obtained when estimating curve (b) under different sample sizes $\left(N,n\right)$ for ${\rho}_{\delta}=3$. As the sample size increases, the quality of the estimator improves markedly.

Table 1 compares, for various sample sizes, the results obtained for estimating curves (a) and (b) when ${\rho}_{\delta}=1$, ${\rho}_{\delta}=3$ and ${\rho}_{\delta}=5$. It reports the estimated RASEs, evaluated at ${2}^{7}$ grid points of x. Our results show that the estimator ${\stackrel{^}{g}}^{\alpha}$ outperforms ${\stackrel{^}{g}}_{N}$. Also, the performance of ${\stackrel{^}{g}}^{\alpha}$ improves (i.e., the corresponding RASEs decrease) considerably as the sample sizes increase. For any nonparametric method in a measurement error regression problem, the quality of the estimator also depends on the magnitude of the measurement error, that is, on the variance of $\delta $. Here, we compare the results for different values of ${\rho}_{\delta}$; as expected, Table 1 shows a clear effect of this variance on the performance of the estimator.

Figure 1. Estimation of regression function (a) for samples of size $\left(N,n\right)=\left(90,30\right)$, when ${\rho}_{\delta}=1$ (left panel), ${\rho}_{\delta}=3$ (middle panel) and ${\rho}_{\delta}=5$ (right panel). The solid curve is the target curve.

Table 1. The RASE comparison for the estimators ${\stackrel{^}{g}}^{\alpha}\left(x\right)$ and ${\stackrel{^}{g}}_{N}\left(x\right)$. Let $\kappa =N/n$.

Figure 2. Estimation of regression function (b) for ${\rho}_{\delta}=3$, when $\left(N,n\right)=\left(60,20\right)$ (left panel), $\left(N,n\right)=\left(90,30\right)$ (middle panel) and $\left(N,n\right)=\left(150,50\right)$ (right panel). The solid curve is the target curve.

5. Discussion

In this paper, we have proposed a new method for estimating nonparametric regression models when the explanatory variable is measured with error, under the assumption that a suitable validation data set is available. The validation data allow us to estimate the joint density ${f}_{XW}$ of the true and surrogate variables by an orthogonal series method. In principle, the proposed method can be extended to multidimensional cases in which X is a p-variate explanatory variable. When the dimension of X, and hence of W, is large, the multivariate density estimation of ${f}_{XW}$ may suffer from the curse of dimensionality. In this case, the exponential series estimator proposed by [19], which guarantees a positive density estimate, may be used; after obtaining the exponential series estimator of ${f}_{XW}$, the estimation procedure of the previous sections can be applied in the same way. The asymptotic theory in this setting remains to be developed in future research.

Acknowledgements

This work was supported by GJJ160927 and Natural Science Foundation of Jiangxi Province of China under grant number 20142BAB211018.

Appendix

Proofs of Theorem 3.1 and Corollary 3.1:

We first present some lemmas that are needed to prove the main theorem.

Lemma 7.1. Suppose Assumptions 1 and 3(1) hold. Then:

1) ${\Vert \stackrel{^}{T}-T\Vert}_{HS}^{2}={O}_{P}\left({K}^{-2r}+{n}^{-1}{K}^{2}\right)$ ;

2) ${\Vert {\stackrel{^}{T}}^{\mathrm{*}}-{T}^{\mathrm{*}}\Vert}_{HS}^{2}={O}_{P}\left({K}^{-2r}+{n}^{-1}{K}^{2}\right)$ .

where ${\Vert \text{\hspace{0.05em}}\cdot \text{\hspace{0.05em}}\Vert}_{HS}$ denotes the Hilbert-Schmidt norm, i.e.:

${\Vert \stackrel{^}{T}-T\Vert}_{HS}^{2}={\displaystyle \int}{\displaystyle \int}{\left[{\stackrel{^}{f}}_{XW}\left(x\mathrm{,}w\right)-{f}_{XW}\left(x\mathrm{,}w\right)\right]}^{2}\text{d}x\text{d}w$

Proof of Lemma 7.1. According to Lemma A1 of Wu [19] , we have

${\Vert {f}_{XW}-{f}_{XW}^{K}\Vert}^{2}=O\left({K}^{-2r}\right)$

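Note that the Legendre polynomials ${\varphi}_{k}$ in (8) are orthonormal and complete in ${L}_{2}\left(\left[0,1\right]\right)$, so that the products ${\varphi}_{k}\left(x\right){\varphi}_{l}\left(w\right)$ form an orthonormal basis of ${L}_{2}\left({\left[0,1\right]}^{2}\right)$. Hence, by Parseval's identity,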

${\Vert {\stackrel{^}{f}}_{XW}-{f}_{XW}^{K}\Vert}^{2}={\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\displaystyle \underset{l=1}{\overset{K}{\sum}}}{\left({\stackrel{^}{d}}_{kl}-{d}_{kl}\right)}^{2}$

Since $E{\stackrel{^}{d}}_{kl}={d}_{kl}$, we have

$\begin{array}{c}E\left\{\left|{\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\displaystyle \underset{l=1}{\overset{K}{\sum}}}{\left({\stackrel{^}{d}}_{kl}-{d}_{kl}\right)}^{2}\right|\right\}={\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\displaystyle \underset{l=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}\text{\hspace{0.05em}}Var\left({\stackrel{^}{d}}_{kl}\right)\\ \le \frac{1}{n}{\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\displaystyle \underset{l=1}{\overset{K}{\sum}}}\text{\hspace{0.05em}}E{\left[{\varphi}_{k}\left(X\right){\varphi}_{l}\left(W\right)\right]}^{2}\\ =O\left[\frac{1}{n}{\displaystyle \underset{k=1}{\overset{K}{\sum}}}{\displaystyle \underset{l=1}{\overset{K}{\sum}}}{\Vert {\varphi}_{k}\left(x\right){\varphi}_{l}\left(w\right)\Vert}^{2}\right]\\ =O\left({K}^{2}/n\right)\end{array}$

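where we have used the fact that ${f}_{XW}$ is uniformly bounded on ${\left[0,1\right]}^{2}$ (it is continuous on this compact set by Assumption 3(1)), together with $\Vert {\varphi}_{k}\left(x\right){\varphi}_{l}\left(w\right)\Vert =1$.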

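By Markov's inequality, ${\Vert {\stackrel{^}{f}}_{XW}-{f}_{XW}^{K}\Vert}^{2}={O}_{P}\left({K}^{2}/n\right)$. Combining this with the bias bound above through ${\Vert {\stackrel{^}{f}}_{XW}-{f}_{XW}\Vert}^{2}\le 2\left({\Vert {\stackrel{^}{f}}_{XW}-{f}_{XW}^{K}\Vert}^{2}+{\Vert {f}_{XW}^{K}-{f}_{XW}\Vert}^{2}\right)$ gives part 1); part 2) follows because ${\Vert {\stackrel{^}{T}}^{*}-{T}^{*}\Vert}_{HS}={\Vert \stackrel{^}{T}-T\Vert}_{HS}$.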
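The $K^2/n$ rate in Lemma 7.1 can be checked by simulation in a hypothetical case where $X$ and $W$ are independent uniforms on $[0,1]$. Then $f_{XW}\equiv 1$, so $d_{kl}$ equals 1 for the pair of constant polynomials and 0 otherwise, and $\mathrm{Var}\{\varphi_k(X)\varphi_l(W)\}=1-d_{kl}^2$; hence $E\sum_{k,l}(\hat d_{kl}-d_{kl})^2=(K^2-1)/n$ exactly. The sketch below compares a Monte Carlo average with this value; the helper names are hypothetical.

```python
import numpy as np

def legendre01(x, K):
    """Orthonormal shifted Legendre polynomials of degrees 0..K-1 on [0, 1] (one column each)."""
    t = 2.0 * np.atleast_1d(np.asarray(x, dtype=float)) - 1.0
    P = np.empty((t.size, K))
    P[:, 0] = 1.0
    if K > 1:
        P[:, 1] = t
    for d in range(1, K - 1):
        P[:, d + 1] = ((2 * d + 1) * t * P[:, d] - d * P[:, d - 1]) / (d + 1)
    return P * np.sqrt(2.0 * np.arange(K) + 1.0)

def mc_coefficient_error(n, K, reps=400, seed=0):
    """Monte Carlo mean of sum_{k,l} (d_hat_kl - d_kl)^2 for independent U(0, 1) variables X and W."""
    rng = np.random.default_rng(seed)
    d_true = np.zeros((K, K))
    d_true[0, 0] = 1.0                 # only the constant-by-constant coefficient is nonzero
    total = 0.0
    for _ in range(reps):
        X = rng.uniform(size=n)
        W = rng.uniform(size=n)        # independent of X, so f_XW = 1 on [0, 1]^2
        d_hat = legendre01(X, K).T @ legendre01(W, K) / n
        total += float(np.sum((d_hat - d_true) ** 2))
    return total / reps

for n, K in ((200, 4), (400, 8)):
    print(n, K, mc_coefficient_error(n, K), (K * K - 1) / n)
```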

Lemma 7.2. Suppose Assumptions 1, 3 and 4 hold. Let $\gamma =\mathrm{min}\left\{r,s\right\}$. Then

${\Vert \stackrel{^}{m}-\stackrel{^}{T}g\Vert}^{2}={O}_{P}\left({N}^{-1}K+{K}^{-2\gamma}+{n}^{-1}{K}^{2}\right)$

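Proof of Lemma 7.2. Note that $Tg=m$, so that $\stackrel{^}{m}-\stackrel{^}{T}g=\left(\stackrel{^}{m}-m\right)+\left(T-\stackrel{^}{T}\right)g$. By the triangle inequality and the elementary inequality ${\left(a+b\right)}^{2}\le 2\left({a}^{2}+{b}^{2}\right)$, we have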

${\Vert \stackrel{^}{m}-\stackrel{^}{T}g\Vert}^{2}\le 2\left[{\Vert \stackrel{^}{m}-m\Vert}^{2}+{\Vert \left(T-\stackrel{^}{T}\right)g\Vert}^{2}\right]$

Since $g\in {L}_{2}\left(\left[0,1\right]\right)$ and the operator norm is bounded by the Hilbert-Schmidt norm, Lemma 7.1 gives ${\Vert \left(T-\stackrel{^}{T}\right)g\Vert}^{2}={O}_{P}\left({K}^{-2r}+{n}^{-1}{K}^{2}\right)$. Arguing as in the proof of Lemma 7.1, under Assumptions 3(2) and 4 we can show that ${\Vert \stackrel{^}{m}-m\Vert}^{2}={O}_{P}\left({K}^{-2s}+{N}^{-1}K\right)$. Since ${K}^{-2r}+{K}^{-2s}\le 2{K}^{-2\gamma}$, the result of Lemma 7.2 follows.

Proof of Theorem 3.1. Define ${\stackrel{^}{A}}_{\alpha}={\left(\alpha I+{\stackrel{^}{T}}^{*}\stackrel{^}{T}\right)}^{-1}$ and ${A}_{\alpha}={\left(\alpha I+{T}^{*}T\right)}^{-1}$. Since $m=Tg$, (7) gives ${g}^{\alpha}={A}_{\alpha}{T}^{*}Tg$, and therefore

${\stackrel{^}{g}}^{\alpha}-g={\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{m}-{\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}g\right)+\left({\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}g-{A}_{\alpha}{T}^{\mathrm{*}}Tg\right)+\left({g}^{\alpha}-g\right)$

The second right-hand side term can itself be decomposed into two components:

${\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}g-{A}_{\alpha}{T}^{\mathrm{*}}Tg={\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}-{T}^{\mathrm{*}}T\right)g+\left({\stackrel{^}{A}}_{\alpha}-{A}_{\alpha}\right){T}^{\mathrm{*}}Tg$

Since ${\stackrel{^}{A}}_{\alpha}={\left(\alpha I+{\stackrel{^}{T}}^{*}\stackrel{^}{T}\right)}^{-1}$ and ${A}_{\alpha}={\left(\alpha I+{T}^{*}T\right)}^{-1}$, the identity ${B}^{-1}-{C}^{-1}={B}^{-1}\left(C-B\right){C}^{-1}$ gives:

${\stackrel{^}{A}}_{\alpha}-{A}_{\alpha}=-{\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{*}\stackrel{^}{T}-{T}^{*}T\right){A}_{\alpha}$

Thus,

${\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{*}\stackrel{^}{T}g-{A}_{\alpha}{T}^{*}Tg={\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{*}\stackrel{^}{T}-{T}^{*}T\right)\left(g-{g}^{\alpha}\right)$
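The algebra leading to the last display can be checked numerically with finite matrices standing in for the operators; for real matrices the adjoint is the transpose. The random matrices below are hypothetical stand-ins, not estimates from data.

```python
import numpy as np

rng = np.random.default_rng(1)
p, alpha = 6, 0.3
T = rng.standard_normal((p, p))                  # stand-in for T
T_hat = T + 0.05 * rng.standard_normal((p, p))   # stand-in for T_hat
g = rng.standard_normal(p)

I = np.eye(p)
A = np.linalg.inv(alpha * I + T.T @ T)            # A_alpha
A_hat = np.linalg.inv(alpha * I + T_hat.T @ T_hat)  # A_hat_alpha
g_alpha = A @ T.T @ T @ g                         # g^alpha = A_alpha T* T g

# Resolvent identity: A_hat - A = -A_hat (T_hat* T_hat - T* T) A
resolvent_gap = (A_hat - A) + A_hat @ (T_hat.T @ T_hat - T.T @ T) @ A
# Final identity: A_hat T_hat* T_hat g - A T* T g = A_hat (T_hat* T_hat - T* T)(g - g^alpha)
lhs = A_hat @ T_hat.T @ T_hat @ g - A @ T.T @ T @ g
rhs = A_hat @ (T_hat.T @ T_hat - T.T @ T) @ (g - g_alpha)
print(np.max(np.abs(resolvent_gap)), np.max(np.abs(lhs - rhs)))
```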

By the triangle inequality and ${\left(a+b+c\right)}^{2}\le 3\left({a}^{2}+{b}^{2}+{c}^{2}\right)$, we have

${\Vert {\stackrel{^}{g}}^{\alpha}-g\Vert}^{2}\le 3\left[{\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\left(\stackrel{^}{m}-\stackrel{^}{T}g\right)\Vert}^{2}+{\Vert {\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}-{T}^{\mathrm{*}}T\right)\left(g-{g}^{\alpha}\right)\Vert}^{2}+{\Vert {g}^{\alpha}-g\Vert}^{2}\right]$

For the first term, we have

${\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\left(\stackrel{^}{m}-\stackrel{^}{T}g\right)\Vert}^{2}\le {\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\Vert}^{2}{\Vert \stackrel{^}{m}-\stackrel{^}{T}g\Vert}^{2}$

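The first factor ${\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{*}\Vert}^{2}={\Vert {\left(\alpha I+{\stackrel{^}{T}}^{*}\stackrel{^}{T}\right)}^{-1}{\stackrel{^}{T}}^{*}\Vert}^{2}$ is the square of the largest singular value of this operator. If ${\stackrel{^}{\lambda}}_{k}$ denote the singular values of $\stackrel{^}{T}$, the singular values of ${\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{*}$ are ${\stackrel{^}{\lambda}}_{k}/\left(\alpha +{\stackrel{^}{\lambda}}_{k}^{2}\right)\le 1/\left(2\sqrt{\alpha}\right)$, since $\alpha +{\stackrel{^}{\lambda}}_{k}^{2}\ge 2\sqrt{\alpha}{\stackrel{^}{\lambda}}_{k}$. Hence ${\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{*}\Vert}^{2}\le 1/\left(4\alpha \right)$, and it follows from Lemma 7.2 that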

${\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\left(\stackrel{^}{m}-\stackrel{^}{T}g\right)\Vert}^{2}={O}_{P}\left[{\alpha}^{-1}\left({N}^{-1}K+{K}^{-2\gamma}+{n}^{-1}{K}^{2}\right)\right]$ (11)
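The bound on $\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{*}\Vert$ rests on a spectral fact: if $B$ has singular values $s_k$, then $(\alpha I+B^*B)^{-1}B^*$ has singular values $s_k/(\alpha+s_k^2)\le 1/(2\sqrt{\alpha})$, because $\alpha+s_k^2\ge 2\sqrt{\alpha}\,s_k$. A quick check with random matrices (hypothetical stand-ins for $\hat T$) also confirms $\|(\alpha I+B^*B)^{-1}\|\le 1/\alpha$:

```python
import numpy as np

rng = np.random.default_rng(2)
results = []
for alpha in (1e-1, 1e-2, 1e-4):
    B = rng.standard_normal((50, 40)) / 10.0          # stand-in for T_hat
    R = np.linalg.inv(alpha * np.eye(40) + B.T @ B)   # (alpha I + B* B)^{-1}
    M = R @ B.T                                       # (alpha I + B* B)^{-1} B*
    # np.linalg.norm(., 2) of a matrix is its largest singular value (operator norm).
    results.append((alpha, np.linalg.norm(M, 2), np.linalg.norm(R, 2)))
    print(f"alpha={alpha:.0e}  ||M||={results[-1][1]:.3f}  bound={1 / (2 * np.sqrt(alpha)):.3f}")
```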

Next, we consider the term ${\Vert {\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}-{T}^{\mathrm{*}}T\right)\left(g-{g}^{\alpha}\right)\Vert}^{2}$ . Note that

${\stackrel{^}{T}}^{*}\stackrel{^}{T}-{T}^{*}T={\stackrel{^}{T}}^{*}\left(\stackrel{^}{T}-T\right)+\left({\stackrel{^}{T}}^{*}-{T}^{*}\right)T$

Then

$\begin{array}{l}{\Vert {\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}-{T}^{\mathrm{*}}T\right)\left(g-{g}^{\alpha}\right)\Vert}^{2}\\ \le 2\left[{\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{\mathrm{*}}\Vert}^{2}{\Vert \stackrel{^}{T}-T\Vert}^{2}{\Vert g-{g}^{\alpha}\Vert}^{2}+{\Vert {\stackrel{^}{A}}_{\alpha}\Vert}^{2}{\Vert {\stackrel{^}{T}}^{\mathrm{*}}-{T}^{\mathrm{*}}\Vert}^{2}{\Vert T\left(g-{g}^{\alpha}\right)\Vert}^{2}\right]\end{array}$

We have ${\Vert {\stackrel{^}{A}}_{\alpha}{\stackrel{^}{T}}^{*}\Vert}^{2}={O}_{P}\left(1/\alpha \right)$ and ${\Vert {\stackrel{^}{A}}_{\alpha}\Vert}^{2}\le 1/{\alpha}^{2}$ (see [20]). Since the operator norm is bounded by the Hilbert-Schmidt norm, Lemma 7.1 implies that both ${\Vert \stackrel{^}{T}-T\Vert}^{2}$ and ${\Vert {\stackrel{^}{T}}^{*}-{T}^{*}\Vert}^{2}$ are ${O}_{P}\left({K}^{-2r}+{n}^{-1}{K}^{2}\right)$.

By Proposition 3.1:

${\Vert {g}^{\alpha}-g\Vert}^{2}=O\left({\alpha}^{\beta \wedge 2}\right)$ (12)

The term $T\left(g-{g}^{\alpha}\right)=\alpha T{A}_{\alpha}g=\alpha {\left(\alpha I+T{T}^{*}\right)}^{-1}Tg$ is the regularization bias for $Tg$, and ${\Vert T\left(g-{g}^{\alpha}\right)\Vert}^{2}=O\left({\alpha}^{\left(\beta +1\right)\wedge 2}\right)$ (see [18]).

Therefore, we have

${\Vert {\stackrel{^}{A}}_{\alpha}\left({\stackrel{^}{T}}^{\mathrm{*}}\stackrel{^}{T}-{T}^{\mathrm{*}}T\right)\left(g-{g}^{\alpha}\right)\Vert}^{2}={O}_{P}\left[{\alpha}^{\left(\beta -1\right)\wedge 0}\left({K}^{-2r}+{n}^{-1}{K}^{2}\right)\right]$ (13)

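Combining (11), (12) and (13), and noting that ${\alpha}^{\left(\beta -1\right)\wedge 0}\le {\alpha}^{-1}$ for $\alpha \le 1$ and ${K}^{-2r}\le {K}^{-2\gamma}$, gives the desired result of Theorem 3.1.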

Proof of Corollary 3.1. By Theorem 3.1, the proof of Corollary 3.1 is straightforward and is omitted.

References

[1] Stute, W., Xue, L. and Zhu, L. (2007) Empirical Likelihood Inference in Nonlinear Errors-in-Covariables Models with Validation Data. Journal of the American Statistical Association, 102, 332-346.

https://doi.org/10.1198/016214506000000816

[2] Carroll, R.J., Ruppert, D., Stefanski, L.A. and Crainiceanu, C.M. (2006) Measurement Error in Nonlinear Models. 2nd Edition, Chapman and Hall CRC Press, Boca Raton.

https://doi.org/10.1201/9781420010138

[3] Carroll, R.J. and Stefanski, L.A. (1990) Approximate Quasi-Likelihood Estimation in Models with Surrogate Predictors. Journal of the American Statistical Association, 85, 652-663.

https://doi.org/10.1080/01621459.1990.10474925

[4] Carroll, R.J. and Wand, M.P. (1991) Semiparametric Estimation in Logistic Measurement Error Models. Journal of the Royal Statistical Society B, 53, 573-585.

[5] Sepanski, J.H. and Lee, L.F. (1995) Semiparametric Estimation of Nonlinear Errors-in-Variables Models with Validation Study. Journal of Nonparametric Statistics, 4, 365-394.

https://doi.org/10.1080/10485259508832627

[6] Cook, J.R. and Stefanski, L.A. (1994) Simulation-Extrapolation Estimation in Parametric Measurement Error Models. Journal of the American Statistical Association, 89, 1314-1328.

https://doi.org/10.1080/01621459.1994.10476871

[7] Carroll, R.J., Gail, M.H. and Lubin, J.H. (1993) Case-Control Studies with Errors in Covariables. Journal of the American Statistical Association, 88, 185-199.

[8] Stefanski, L.A. and Buzas, J.S. (1995) Instrumental Variable Estimation in Binary Regression Measurement Error Models. Journal of the American Statistical Association, 90, 541-550.

https://doi.org/10.1080/01621459.1995.10476546

[9] Lv, Y.-Z., Zhang, R.-Q. and Huang, Z.-S. (2013) Estimation of Semi-Varying Coefficient Model with Surrogate Data and Validation Sampling. Acta Mathematicae Applicatae Sinica English, 29, 645-660.

https://doi.org/10.1007/s10255-013-0241-3

[10] Xiao, Y. and Tian, Z. (2014) Dimension Reduction Estimation in Nonlinear Semiparametric Error-in-Response Models with Validation Data. Mathematica Applicata, 27, 730-737.

[11] Yu, S.H. and Wang, D.H. (2014) Empirical Likelihood for First-Order Autoregressive Error-in-Variable of Models with Validation Data. Communications in Statistics Theory Methods, 43, 1800-1823.

https://doi.org/10.1080/03610926.2012.679763

[12] Xu, W. and Zhu, L. (2015) Nonparametric Check for Partial Linear Errors-in-Covariables Models with Validation Data. Annals of the Institute of Statistical Mathematics, 67, 793-815.

https://doi.org/10.1007/s10463-014-0476-7

[13] Wang, Q. (2006) Nonparametric Regression Function Estimation with Surrogate Data and Validation Sampling. Journal of Multivariate Analysis, 97, 1142-1161.

https://doi.org/10.1016/j.jmva.2005.05.008

[14] Du, L., Zou, C. and Wang, Z. (2015) Nonparametric Regression Function Estimation for Error-in-Variable Models with Validation Data. Open Journal of Statistics, 5, 808-819.

https://doi.org/10.4236/ojs.2015.57080

[15] Yin, Z. and Liu, F. (2011) Orthogonal Series Estimation of Nonparametric Regression Measurement Error Models with Validation Data. Statistica Sinica, 21, 1093-1113.

https://doi.org/10.5705/ss.2009.047

[16] Darolles, S., Florens, J.P., Renault, E. and Kress, R. (1999) Linear Integral Equations. Springer, New York.

[17] Efromovich, S. (1999) Nonparametric Curve Estimation: Methods, Theory and Applications. Springer, New York.

[18] Carrasco, M., Florens, J.P. and Renault, E. (2007) Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization. Elsevier, North Holland, 5633-5751.

[19] Wu, X. (2010) Exponential Series Estimator of Multivariate Densities. Journal of Econometrics, 156, 354-366.

https://doi.org/10.1016/j.jeconom.2009.11.005

[20] Groetsch, C. (1984) The Theory of Tikhonov Regularization for Fredholm Equations of the First Kind. Pitman, London.