where $u_s$, $s = 1, \dots, M$, are grid points at which $g_{est}(u_s)$ is evaluated.

We considered model (1) with the regression function being

$$g(x, z) = \frac{1}{2\pi}\exp(-0.5x^2 - 0.5z^2),$$

and $\varepsilon$ distributed as $N(0, 0.1)$. The covariates are generated according to $(X, Z)^T \sim N(0, \Sigma)$ with $\mathrm{var}(X) = \mathrm{var}(Z) = 1$ and correlation coefficient $0.6$ between $X$ and $Z$, and the surrogate is $W = X + \eta v$ with $v \sim N(0, 1)$. Results for $\eta = 0.2$, $\eta = 0.4$ and $\eta = 0.6$ are reported. Simulations were run with validation and primary data sizes $(n, N)$ ranging from $(10, 30)$ to $(50, 250)$, with the ratio $\rho = N/n$ fixed at 3 and 5, respectively. We generated 500 datasets for each sample size $(n, N)$.
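For concreteness, this data-generating design can be sketched in Python (a minimal sketch; the function and variable names are ours, and we read $N(0, 0.1)$ as having variance $0.1$, which is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data(N, n, eta, rho_xz=0.6, noise_var=0.1):
    """Draw one primary sample (Y, W, Z) of size N and one validation
    sample (X, W, Z) of size n from the simulation design above."""
    m = N + n
    cov = np.array([[1.0, rho_xz], [rho_xz, 1.0]])
    XZ = rng.multivariate_normal(np.zeros(2), cov, size=m)
    X, Z = XZ[:, 0], XZ[:, 1]
    g = np.exp(-0.5 * X**2 - 0.5 * Z**2) / (2.0 * np.pi)  # true regression function
    Y = g + rng.normal(0.0, np.sqrt(noise_var), size=m)   # eps ~ N(0, 0.1)
    W = X + eta * rng.normal(size=m)                      # surrogate W = X + eta*v, v ~ N(0, 1)
    primary = {"Y": Y[:N], "W": W[:N], "Z": Z[:N]}        # X is unobserved in the primary data
    validation = {"X": X[N:], "W": W[N:], "Z": Z[N:]}     # X is observed in the validation data
    return primary, validation

primary, validation = generate_data(N=250, n=50, eta=0.4)
```

Note that $X$ appears only in the validation sample, while $Y$ appears only in the primary sample, matching the validation-data setting of the paper.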

To calculate $\tilde g_\alpha(x, z)$, we used the normalized Legendre polynomials as basis functions and the standard normal kernel (denoted $K_0(\cdot)$). For $\tilde g_N(x, z)$, we used a product kernel $K(x_1, x_2) = K_0(x_1)K_0(x_2)$, with the bandwidth selected by the generalized cross-validation (GCV) approach. For our estimator $\tilde g_\alpha(x, z)$, we used cross-validation to choose the four tuning parameters $h_N$, $h_n$, $K$ and $\alpha$. To this end, $h_N$, $h_n$ and $(K, \alpha)$ are selected separately as follows.
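As an illustration of these ingredients, the normalized (shifted) Legendre basis and the Gaussian product kernel can be evaluated as follows (a sketch; the function names and the indexing convention $k = 1, 2, \dots$ are our assumptions):

```python
import numpy as np
from numpy.polynomial.legendre import legval

def phi(k, x):
    """k-th normalized shifted Legendre polynomial (k = 1, 2, ...),
    orthonormal on L^2([0, 1]): the integral of phi_j * phi_k is delta_jk."""
    c = np.zeros(k)
    c[-1] = 1.0                                  # select P_{k-1}
    return np.sqrt(2.0 * k - 1.0) * legval(2.0 * np.asarray(x) - 1.0, c)

def K0(u):
    """Standard normal kernel K_0."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def K_prod(u1, u2):
    """Product kernel K(u1, u2) = K_0(u1) * K_0(u2)."""
    return K0(u1) * K0(u2)

# numerical orthonormality check on a midpoint grid over [0, 1]
x = (np.arange(200000) + 0.5) / 200000
norm_sq = np.mean(phi(3, x) ** 2)        # should be close to 1
cross = np.mean(phi(2, x) * phi(3, x))   # should be close to 0
```

The factor $\sqrt{2k-1}$ normalizes the classical Legendre polynomial $P_{k-1}$ after the change of variables from $[-1, 1]$ to $[0, 1]$.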

Define

$$\hat f_Z(z; h_n) = \frac{1}{n h_n}\sum_{j=N+1}^{N+n} K_{h_n}(z - Z_j)$$

and

$$\tilde f_Z(z; h_N) = \frac{1}{N h_N}\sum_{i=1}^{N} K_{h_N}(z - Z_i).$$

Here, we adopt the cross-validation (CV) approach to estimate h n by

$$\hat h_n = \arg\min_{h_n} \frac{1}{n}\sum_{j=N+1}^{N+n}\left\{ Z_j - \hat f_Z^{(j)}(Z_j; h_n) \right\}^2,$$

where the superscript $(j)$ denotes the estimator constructed without using the $j$th observation. Similarly, we obtain $\hat h_N$. Having obtained $\hat h_N$ and $\hat h_n$, we then select $(K, \alpha)$ by

$$(\hat\alpha, \hat K) = \arg\min_{(\alpha, K)} \frac{1}{N}\sum_{i=1}^{N}\left\{ Y_i - \sum_{k=1}^{K} \tilde g_{zk}^{(i)}\phi_k(W_i) \right\}^2,$$

where the superscript $(i)$ denotes the estimator constructed without using the $i$th observation $(Y_i, W_i, Z_i)$.
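To illustrate the bandwidth step, the sketch below selects a kernel density bandwidth over a grid by leave-one-out least-squares cross-validation, a standard CV criterion that we substitute here for illustration; it is not taken verbatim from the paper, and all names are ours:

```python
import numpy as np

def K0(u):
    """Standard normal kernel."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def lscv_score(h, Z):
    """Least-squares CV score for a Gaussian-kernel density estimate with
    bandwidth h: integral of f-hat^2 minus (2/n) * sum of leave-one-out fits.
    The integral term is closed-form because Gaussian convolutions are Gaussian."""
    n = len(Z)
    D = Z[:, None] - Z[None, :]
    # integral of f-hat(z)^2 dz: pairwise N(0, 2h^2) density evaluations
    term1 = np.sum(np.exp(-D**2 / (4.0 * h**2))) / (n**2 * h * 2.0 * np.sqrt(np.pi))
    # leave-one-out term: (2/n) * sum_j f-hat^{(-j)}(Z_j)
    Koff = K0(D / h)
    np.fill_diagonal(Koff, 0.0)
    term2 = 2.0 * np.sum(Koff) / (n * (n - 1) * h)
    return term1 - term2

def select_bandwidth(Z, grid):
    """Return the grid bandwidth minimizing the CV score."""
    scores = [lscv_score(h, Z) for h in grid]
    return grid[int(np.argmin(scores))]

rng = np.random.default_rng(1)
Z = rng.normal(size=200)
h_hat = select_bandwidth(Z, np.linspace(0.1, 1.0, 19))
```

The same grid-search pattern applies to $\hat h_N$ on the primary sample and, with the least-squares fit criterion displayed above, to the joint selection of $(K, \alpha)$.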

We compute the RASE at $200 \times 200$ grid points of $(x, z)$. Table 1 presents the RASE for estimating the curve $g(x, z)$ when $\eta = 0.2$, $\eta = 0.4$ and $\eta = 0.6$ for various sample sizes. Our proposed estimator $\tilde g_\alpha$ has a much smaller RASE than $\tilde g_N$: as expected, the proposed estimation method produces more accurate estimates than the Nadaraya-Watson estimator, and this improvement increases with $\rho$.

Table 1. The RASE ($\times 10^{-1}$) comparison for the estimators $\tilde g_\alpha(x, z)$ and $\tilde g_N(x, z)$.
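The RASE criterion used in this comparison can be computed as follows (a sketch; the evaluation range $[-2, 2]$ and all names are our assumptions):

```python
import numpy as np

def rase(g_hat, g_true, grid_x, grid_z):
    """Root average squared error over a grid of (x, z) points:
    RASE = sqrt( mean over grid of [g_hat(u_s) - g_true(u_s)]^2 )."""
    X, Z = np.meshgrid(grid_x, grid_z, indexing="ij")
    return np.sqrt(np.mean((g_hat(X, Z) - g_true(X, Z)) ** 2))

def g_true(x, z):
    """True regression function from the simulation design."""
    return np.exp(-0.5 * x**2 - 0.5 * z**2) / (2.0 * np.pi)

grid = np.linspace(-2.0, 2.0, 200)   # 200 x 200 evaluation points
perfect = rase(g_true, g_true, grid, grid)                           # exactly 0
biased = rase(lambda x, z: g_true(x, z) + 0.01, g_true, grid, grid)  # constant shift
```

In the Monte Carlo study, this quantity is averaged over the 500 replications for each $(n, N)$ configuration.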

Acknowledgments

This work was supported by grant GJJ160927 and by the Natural Science Foundation of Jiangxi Province of China under grant number 20142BAB211018.

Appendix

Proof of Theorem 3.1:

Lemma 6.1. Suppose Assumptions (A1), (A2)(i) and (A4) hold. For each $z \in [0, 1]$, we have

$$\|\hat f_{XWZ} - f_{XWZ}\|^2 = O_P\{K^2[(n h_n)^{-1} + h_n^{2r}] + K^{-2r}\}.$$

Proof of Lemma 6.1. For each $z \in [0, 1]$, by Assumptions (A2)(i) and (A4), we have

$$\begin{aligned}
E(\hat d_{zkl}) &= \frac{1}{h_n}\int\!\!\int\!\!\int \phi_k(x)\phi_l(w) K_{h_n}(z - u) f_{XWZ}(x, w, u)\,dx\,dw\,du \\
&= \int\!\!\int\!\!\int \phi_k(x)\phi_l(w) K(u) f_{XWZ}(x, w, z + u h_n)\,dx\,dw\,du \\
&= \int\!\!\int \phi_k(x)\phi_l(w) f_{XWZ}(x, w, z)\,dx\,dw + \left\{ h_n^r \int u^r K(u)\,du \int\!\!\int \frac{\partial^r f_{XWZ}(x, w, z)}{\partial z^r}\phi_k(x)\phi_l(w)\,dx\,dw \right\}(1 + o(1)) \\
&= d_{zkl} + h_n^r\, d_{zkl}^{(r)} \int u^r K(u)\,du\,(1 + o(1)),
\end{aligned}$$

where

$$d_{zkl}^{(r)} = \int\!\!\int \frac{\partial^r f_{XWZ}(x, w, z)}{\partial z^r}\phi_k(x)\phi_l(w)\,dx\,dw.$$

Note that the $\phi_k$ form a complete orthonormal basis of $L^2([0, 1])$. Under Assumption (A2)(i), for each $z \in [0, 1]$, we have $\partial^r f_{XWZ}(x, w, z)/\partial z^r \in L^2([0, 1]^2)$. Then, by the Cauchy-Schwarz inequality, $d_{zkl}^{(r)}$ is bounded in absolute value for each $z \in [0, 1]$. Hence, we obtain

$$E(\hat d_{zkl}) = d_{zkl} + O(h_n^r).$$

Moreover, for each $z \in [0, 1]$, we have

$$\begin{aligned}
\mathrm{Var}(\hat d_{zkl}) &\le \frac{1}{n h_n^2} E\{[\phi_k(X)\phi_l(W)]^2 K_{h_n}^2(z - Z)\} \\
&= \frac{1}{n h_n^2}\int\!\!\int\!\!\int [\phi_k(x)\phi_l(w)]^2 K_{h_n}^2(z - u) f_{XWZ}(x, w, u)\,dx\,dw\,du \\
&= \frac{\int K^2(u)\,du}{n h_n}\int\!\!\int [\phi_k(x)\phi_l(w)]^2 f_{XWZ}(x, w, z)\,dx\,dw\,(1 + o(1)) \\
&= O[1/(n h_n)],
\end{aligned}$$

where we have used the fact that $f_{XWZ}$ is uniformly bounded on $[0, 1]^2$.

We conclude that

$$\hat d_{zkl} = d_{zkl} + O(h_n^r) + O_P\big(1/\sqrt{n h_n}\big). \qquad (10)$$

By the triangle inequality and Jensen's inequality, we have

$$\|\hat f_{XWZ} - f_{XWZ}\|^2 \le 2\left[\|\hat f_{XWZ} - f_{XWZ}^K\|^2 + \|f_{XWZ}^K - f_{XWZ}\|^2\right].$$

Under Assumption (A2)(i), we can show that $\|f_{XWZ}^K - f_{XWZ}\|^2 = O(K^{-2r})$ (see Lemma A1 of [20]).

By construction of the estimator, we have

$$\|\hat f_{XWZ} - f_{XWZ}^K\|^2 = \sum_{k=1}^{K}\sum_{l=1}^{K} [\hat d_{zkl} - d_{zkl}]^2 = O_P\{K^2[(n h_n)^{-1} + h_n^{2r}]\},$$

where the last equality is due to (10). The desired result follows immediately.

Proof of Theorem 3.1. Define $\hat A_{\alpha z} = (\alpha I + \hat T_z^* \hat T_z)^{-1}$ and $A_{\alpha z} = (\alpha I + T_z^* T_z)^{-1}$. Notice that $g_{z\alpha} = A_{\alpha z} T_z^* T_z g_z$, where $g_z = g(\cdot, z)$. Then we have

$$\|\tilde g_\alpha - g_z\|^2 \le 4\Big[\|\hat A_{\alpha z}\hat T_z^*\|^2\,\|\hat m_z - \hat T_z g_z\|^2 + \|\hat A_{\alpha z}\hat T_z^*\|^2\,\|\hat T_z - T_z\|^2\,\|g_z - g_{z\alpha}\|^2 + \|\hat A_{\alpha z}\|^2\,\|\hat T_z^* - T_z^*\|^2\,\|T_z(g_z - g_{z\alpha})\|^2 + \|g_{z\alpha} - g_z\|^2\Big].$$

It follows from Lemma 6.1 that both $\|\hat T_z - T_z\|^2$ and $\|\hat T_z^* - T_z^*\|^2$ are $O_P\{K^2[(n h_n)^{-1} + h_n^{2r}] + K^{-2r}\}$. Under Assumption (A1), we have $\|g_{z\alpha} - g_z\|^2 = O(\alpha^{\beta \wedge 2})$ and $\|T_z(g_z - g_{z\alpha})\|^2 = O(\alpha^{(\beta + 1) \wedge 2})$. Moreover, $\|\hat A_{\alpha z}\hat T_z^*\|^2 = O_P(1/\alpha)$ and $\|\hat A_{\alpha z}\|^2 = O_P(1/\alpha^2)$. The main remaining task is to establish the order of the term $\|\hat m_z - \hat T_z g_z\|^2$. By the triangle inequality and Jensen's inequality, we have

$$\|\hat m_z - \hat T_z g_z\|^2 \le 2\left[\|\hat m_z - m_z\|^2 + \|(\hat T_z - T_z) g_z\|^2\right].$$

Similar to the proof of Lemma 6.1, under Assumptions (A2)(ii), (A3) and (A4), it is straightforward to show that

$$\|\hat m_z - m_z\|^2 = O_P\{K^{-2s} + K[(N h_N)^{-1} + h_N^{2\gamma}]\}.$$

Then, according to Lemma 6.1, we have

$$\|\hat m_z - \hat T_z g_z\|^2 = O_P\{K^2[h_n^{2r} + (n h_n)^{-1}] + K^{-2\gamma} + K[(N h_N)^{-1} + h_N^{2\gamma}]\}.$$

Let $h_N = O(N^{-1/(2\gamma + 1)})$, $h_n = O(n^{-1/(2r + 1)})$, $K = O(n^{r/[(2r + 1)(\gamma + 1)]})$ and $\alpha = O(N^{-\tau/(\beta \wedge 2 + 1)})$. If $r \le s$ or $s < r \le 2s(s + 1)$, then combining all of these results completes the proof.

Cite this paper
Liu, F. and Yin, Z. (2017) Estimation of Nonparametric Regression Models with Measurement Error Using Validation Data. Applied Mathematics, 8, 1454-1463. doi: 10.4236/am.2017.810106.
References

[1]   Stute, W., Xue, L. and Zhu, L. (2007) Empirical Likelihood Inference in Nonlinear Errors-in-Covariables Models with Validation Data. Journal of the American Statistical Association, 102, 332-346. https://doi.org/10.1198/016214506000000816

[2]   Carroll, R.J. and Stefanski, L.A. (1990) Approximate Quasi-Likelihood Estimation in Models with Surrogate Predictors. Journal of the American Statistical Association, 85, 652-663.
https://doi.org/10.1080/01621459.1990.10474925

[3]   Carroll, R.J. and Wand, M.P. (1991) Semiparametric Estimation in Logistic Measurement Error Models. Journal of the Royal Statistical Society: Series B, 53, 573-585.

[4]   Carroll, R.J., Gail, M.H. and Lubin, J.H. (1993) Case-Control Studies with Errors in Covariates. Journal of the American Statistical Association, 88, 185-199.

[5]   Cook, J.R. and Stefanski, L.A. (1994) Simulation-Extrapolation Estimation in Parametric Measurement Error Models. Journal of the American Statistical Association, 89, 1314-1328.
https://doi.org/10.1080/01621459.1994.10476871

[6]   Sepanski, J. and Lee, L.F. (1995) Estimation of Linear and Nonlinear Errors-in-Variables Models Using Validation Data. Journal of the American Statistical Association, 90, 130-140.
https://doi.org/10.1080/01621459.1995.10476495

[7]   Stefanski, L.A. and Buzas, J.S. (1995) Instrumental Variable Estimation in Binary Regression Measurement Error Models. Journal of the American Statistical Association, 90, 541-550.
https://doi.org/10.1080/01621459.1995.10476546

[8]   Wang, Q. and Rao, J.N.K. (2002) Empirical Likelihood-Based Inference in Linear Errors-in-Covariables Models with Validation Data. Biometrika, 89, 345-358.
https://doi.org/10.1093/biomet/89.2.345

[9]   Wang, Q. and Yu, K. (2007) Likelihood-Based Kernel Estimation in Semiparametric Errors-In-Covariables Models with Validation Data. Journal of Multivariate Analysis, 98, 455-480.

[10]   Lü, Y.-Z., Zhang, R.-Q. and Huang, Z.-S. (2013) Estimation of Semi-Varying Coefficient Model with Surrogate Data and Validation Sampling. Acta Mathematicae Applicatae Sinica, English Series, 29, 645-660. https://doi.org/10.1007/s10255-013-0241-3

[11]   Xiao, Y. and Tian, Z. (2014) Dimension Reduction Estimation in Nonlinear Semiparametric Error-in-Response Models with Validation Data. Mathematica Applicata, 27, 730-737.

[12]   Xu, W. and Zhu, L. (2015) Nonparametric Check for Partial Linear Errors-in-Cova- riables Models with Validation Data. Annals of the Institute of Statistical Mathematics, 67, 793-815.
https://doi.org/10.1007/s10463-014-0476-7

[13]   Zhang, Y. (2015) Estimation of Partially Linear Regression for Errors-in-Variables Models with Validation Data. Springer International Publishing, 322, 733-742. https://doi.org/10.1007/978-3-319-08991-1_76

[14]   Wang, Q. (2006) Nonparametric Regression Function Estimation with Surrogate Data and Validation Sampling. Journal of Multivariate Analysis, 97, 1142-1161.
https://doi.org/10.1016/j.jmva.2005.05.008

[15]   Du, L., Zou, C. and Wang, Z. (2011) Nonparametric Regression Function Estimation for Error-in-Variable Models with Validation Data. Statistica Sinica, 21, 1093-1113.
https://doi.org/10.5705/ss.2009.047

[16]   Kress, R. (1999) Linear Integral Equations. Springer, New York. https://doi.org/10.1007/978-1-4612-0559-3

[17]   Devroye, L. and Györfi, L. (1985) Nonparametric Density Estimation: The L1 View. John Wiley & Sons, New York.

[18]   Efromovich, S. (1999) Nonparametric Curve Estimation: Methods, Theory and Applications. Springer, New York.

[19]   Carrasco, M., Florens, J.P. and Renault, E. (2007) Linear Inverse Problems in Structural Econometrics: Estimation Based on Spectral Decomposition and Regularization. Elsevier, North Holland, 5633-5751.

[20]   Wu, X. (2010) Exponential Series Estimator of Multivariate Densities. Journal of Econometrics, 156, 354-366. https://doi.org/10.1016/j.jeconom.2009.11.005
