Received 21 December 2015; accepted 20 February 2016; published 23 February 2016
In nonparametric regression, it is often of interest to estimate functionals of a regression function, such as its derivatives. For example, in the study of growth curves, the first (speed) and second (spurt) derivatives of height as a function of age are important parameters (Müller  ). The need for derivative estimation also arises within nonparametric regression itself. For example, in constructing interval estimates for a regression function and in kernel bandwidth selection (Ruppert and Wand  ), estimators of higher-order derivatives are employed to estimate the leading bias terms. Suppose we observe n independent pairs $(X_1, Y_1), \ldots, (X_n, Y_n)$ with

$$Y_i = f(X_i) + \varepsilon_i, \qquad i = 1, \ldots, n,$$

where $X_i$ and $\varepsilon_i$ are independent random variables, and $\varepsilon_i$ is assumed to have a normal distribution with mean zero and variance $\sigma^2$ for simplicity. The $X_i$'s have a density g, which may be known or unknown but is assumed to be compactly supported on the interval $[0,1]$, as is f. We aim to estimate $f^{(d)}$, that is, the dth derivative of f, for any integer d.
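As a rough illustration of the random-design model described above, the following sketch simulates data with Gaussian noise. The Beta(2, 2) design density, the test function `f_example`, and the noise level are assumptions chosen only for illustration; none of them comes from the paper.

```python
import numpy as np

def simulate_regression(n, f, sigma=0.1, seed=0):
    """Draw (X_i, Y_i) with Y_i = f(X_i) + eps_i on [0, 1].

    Assumption: the design density g is Beta(2, 2), which is
    supported on [0, 1]; the paper does not fix a particular g.
    """
    rng = np.random.default_rng(seed)
    x = rng.beta(2.0, 2.0, size=n)          # random design points
    eps = rng.normal(0.0, sigma, size=n)    # Gaussian noise, mean zero
    y = f(x) + eps
    return x, y

def f_example(x):
    # Hypothetical smooth regression function used only for the demo.
    return np.sin(2.0 * np.pi * x)

x, y = simulate_regression(500, f_example, sigma=0.1)
residual_std = float(np.std(y - f_example(x)))
```

Here `residual_std` should be close to the chosen `sigma`, which gives a quick sanity check that the noise was generated as intended.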
Considerable research has been devoted to the subject of estimation, mainly kernel methods, see, e.g.,  -  , smoothing splines, and local polynomial methods, see, e.g.,  - . One may also be interested in more traditional approaches to nonparametric regression, mainly fixed-bandwidth kernel methods, orthogonal series methods and linear spline smoothers. These methods are not adaptive: the resulting estimators may achieve a substantially slower rate of convergence if the smoothness of the underlying regression function is misspecified. The recent development of wavelet bases based on multiresolution analyses suggests new techniques for nonparametric function estimation. Wavelet analysis plays an important role in both pure and applied mathematics, for example in signal processing, image compression, and numerical solutions. The application of wavelet theory to statistical function estimation was pioneered by Donoho and Johnstone. In a series of important papers (see, e.g.,  -  ), Donoho and Johnstone and coauthors present a coherent set of procedures that are spatially adaptive and near optimal over a range of function spaces of inhomogeneous smoothness. These procedures enjoy excellent mean squared error properties when they are used to estimate functions that are only piecewise smooth, and they have near optimal convergence rates over large function classes.
Recently, a quite different algorithm was developed by Kerkyacharian and Picard  . The procedure stays very close to the equispaced VisuShrink procedure of Donoho and Johnstone, and is thus very simple in its form and in its implementation. The only difference is that the projection is done on an unusual non-orthonormal basis, called the warped wavelet basis. Assuming that g is known, but with no boundedness assumptions on it, two new estimators have been introduced based on a warped wavelet basis. This basis is built from a standard wavelet basis and from the function G related to the model. Of course, the properties of this basis truly depend on the warping factor G. Such a technique has already been used with success in the framework of nonparametric regression with random design by Kerkyacharian and Picard  . Recent works on warped wavelet bases in nonparametric statistics can be found in  - . To the best of our knowledge, only Cai  and Petsa and Sapatinas  have proposed wavelet estimators for $f^{(d)}$, but defined with a deterministic equidistant design; that is, $X_i = i/n$.
Considering a random design with warped wavelets complicates the problem significantly, and no wavelet estimators for the derivatives of a regression function exist in this case. This motivates us to study the problem under different dependence structures: the strong mixing case and the ρ-mixing case. Asymptotic mean integrated squared error properties for derivatives of the regression function are explored. In each case, we prove that the warped wavelet estimator attains a fast rate of convergence. Another important advantage of the warped basis estimators is that they are near optimal in the minimax sense over a large class of function spaces for a wide variety of design densities, not necessarily bounded above and below as generally required by other wavelet estimators. Basically, the condition on the design refers to the theory of Muckenhoupt weights introduced in Muckenhoupt  .
The rest of the paper is organized as follows. Section 2 states the dependence assumptions. Section 3 describes the warped wavelet basis and the nonequispaced procedure. Optimality of the estimators, together with the proofs of the main results, is presented in Section 4.
We aim to estimate the derivative $f^{(d)}$ of the regression function from n random variables (or vectors) drawn from a strictly stationary stochastic process defined on a probability space.
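Condition 1. We define the m-th strong mixing coefficient of $(X_t, Y_t)_{t \in \mathbb{Z}}$ by

$$\alpha_m = \sup_{(A, B) \in \mathcal{F}_{-\infty}^{0} \times \mathcal{F}_{m}^{\infty}} \left| P(A \cap B) - P(A) P(B) \right|,$$

where $\mathcal{F}_{-\infty}^{0}$ is the σ-algebra generated by the random variables (or vectors) $\ldots, (X_{-1}, Y_{-1}), (X_0, Y_0)$ and $\mathcal{F}_{m}^{\infty}$ is the σ-algebra generated by the random variables (or vectors) $(X_m, Y_m), (X_{m+1}, Y_{m+1}), \ldots$. We say that $(X_t, Y_t)_{t \in \mathbb{Z}}$ is strong mixing if and only if $\lim_{m \to \infty} \alpha_m = 0$. Furthermore, there exist two constants $\gamma > 0$ and $c > 0$ such that, for any integer $m \ge 1$,

$$\alpha_m \le \gamma \exp(-c m).$$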
Applications of strong mixing can be found in  -  . Among the various mixing conditions used in the literature, α-mixing has many practical applications. Many stochastic processes and time series are known to be α-mixing. Under certain weak assumptions, autoregressive and, more generally, bilinear time series models are strongly mixing with exponential mixing coefficients. The α-mixing dependence is reasonably weak; it is satisfied by a wide variety of models including Markov chains, GARCH-type models and discretely observed diffusions.
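To make the autoregressive example above concrete, the following sketch simulates a Gaussian AR(1) series, a standard example of a strongly mixing process when the autoregressive coefficient is less than one in absolute value. The coefficient `a = 0.5`, the burn-in length, and the helper names are illustrative assumptions. The sample autocorrelation is used only as a rough proxy for the decay of dependence; it is not the mixing coefficient itself.

```python
import numpy as np

def simulate_ar1(n, a=0.5, burn_in=200, seed=0):
    """Simulate X_t = a * X_{t-1} + e_t with standard normal innovations.

    Assumption: |a| < 1, so the series is (approximately) stationary
    once the burn-in period is discarded.
    """
    rng = np.random.default_rng(seed)
    e = rng.normal(size=n + burn_in)
    x = np.zeros(n + burn_in)
    for t in range(1, n + burn_in):
        x[t] = a * x[t - 1] + e[t]
    return x[burn_in:]

def sample_autocorr(x, lag):
    """Sample autocorrelation at a positive integer lag (lag >= 1)."""
    xc = x - x.mean()
    return float(np.dot(xc[:-lag], xc[lag:]) / np.dot(xc, xc))

series = simulate_ar1(5000, a=0.5)
rho_1 = sample_autocorr(series, 1)    # theoretical value is a = 0.5
rho_10 = sample_autocorr(series, 10)  # theoretical value is a**10, near 0
```

The geometric decay of the autocorrelations with the lag mirrors, informally, the exponential decay assumed for the mixing coefficients.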
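Condition 2. Let $(X_t, Y_t)_{t \in \mathbb{Z}}$ be a strictly stationary random sequence. For any $m \ge 1$, we define the m-th maximal correlation coefficient of $(X_t, Y_t)_{t \in \mathbb{Z}}$ by

$$\rho_m = \sup_{(U, V)} \left| \mathrm{corr}(U, V) \right|,$$

where the supremum is taken over all square-integrable random variables U measurable with respect to the σ-algebra generated by $(X_t, Y_t)_{t \le 0}$ and V measurable with respect to the σ-algebra generated by $(X_t, Y_t)_{t \ge m}$. The sequence is said to be ρ-mixing if $\rho_m \to 0$ as $m \to \infty$.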
3. Warped Basis and Estimation Framework
Let N be a positive integer. We consider an orthonormal wavelet basis generated by dilations and translations of a father Daubechies-type wavelet $\phi$ and a mother Daubechies-type wavelet $\psi$ of the family db2N (see  ). Further details on wavelet theory can be found in Daubechies  and Meyer  . In particular, note that $\phi$ and $\psi$ have compact supports. For any $j \ge 0$, we set $\Lambda_j = \{0, \ldots, 2^j - 1\}$ and, for $k \in \Lambda_j$, we define the father and mother wavelets at level j by

$$\phi_{j,k}(x) = 2^{j/2} \phi(2^j x - k), \qquad \psi_{j,k}(x) = 2^{j/2} \psi(2^j x - k).$$
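The dilation and translation step can be sketched numerically. The example below uses the Haar wavelet instead of a db2N wavelet, purely because Haar has a closed form; this substitution, the grid size, and the helper names are assumptions made for illustration. It checks orthonormality of a few dilated and translated mother wavelets on $[0,1]$ with a midpoint rule.

```python
import numpy as np

def haar_phi(x):
    """Haar father wavelet: indicator of [0, 1)."""
    x = np.asarray(x, dtype=float)
    return ((x >= 0.0) & (x < 1.0)).astype(float)

def haar_psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    x = np.asarray(x, dtype=float)
    return haar_phi(2.0 * x) - haar_phi(2.0 * x - 1.0)

def dilate_translate(w, j, k):
    """Return x -> 2^{j/2} * w(2^j x - k)."""
    return lambda x: 2.0 ** (j / 2.0) * w(2.0 ** j * np.asarray(x, dtype=float) - k)

# Midpoint grid on [0, 1]; the mean over the grid approximates the integral.
m = 2 ** 12
grid = (np.arange(m) + 0.5) / m

def inner(u, v):
    return float(np.mean(u(grid) * v(grid)))

psi_2_1 = dilate_translate(haar_psi, 2, 1)
psi_2_3 = dilate_translate(haar_psi, 2, 3)
psi_3_2 = dilate_translate(haar_psi, 3, 2)

norm_sq = inner(psi_2_1, psi_2_1)        # unit norm expected
same_level = inner(psi_2_1, psi_2_3)     # disjoint supports
cross_level = inner(psi_2_1, psi_3_2)    # nested supports
```

Because Haar functions are piecewise constant on dyadic intervals, the midpoint rule on a dyadic grid reproduces these inner products up to rounding error.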
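With appropriate treatments at the boundaries, there exists an integer $\tau$ such that, for any integer $\ell \ge \tau$, the collection

$$\mathcal{B} = \{ \phi_{\ell,k},\ k \in \Lambda_\ell;\ \psi_{j,k},\ j \ge \ell,\ k \in \Lambda_j \}$$

forms an orthonormal basis of $L^2([0,1])$. For any integer $\ell \ge \tau$ and $h \in L^2([0,1])$, we have the following wavelet expansion:

$$h(x) = \sum_{k \in \Lambda_\ell} \alpha_{\ell,k} \phi_{\ell,k}(x) + \sum_{j = \ell}^{\infty} \sum_{k \in \Lambda_j} \beta_{j,k} \psi_{j,k}(x),$$

where $\alpha_{j,k} = \int_0^1 h(x) \phi_{j,k}(x)\,dx$ and $\beta_{j,k} = \int_0^1 h(x) \psi_{j,k}(x)\,dx$. Furthermore we consider the following wavelet sequential definition of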
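the Besov balls. We say $h \in B^{s}_{p,r}(M)$, with $s > 0$, $p \ge 1$, $r \ge 1$ and $M > 0$, if there exists a constant $C > 0$ such that

$$2^{\tau(1/2 - 1/p)} \Big( \sum_{k \in \Lambda_\tau} |\alpha_{\tau,k}|^p \Big)^{1/p} + \Big( \sum_{j = \tau}^{\infty} \Big( 2^{j(s + 1/2 - 1/p)} \Big( \sum_{k \in \Lambda_j} |\beta_{j,k}|^p \Big)^{1/p} \Big)^{r} \Big)^{1/r} \le C,$$

with the usual modifications if $p = \infty$ or $r = \infty$. Note that, for particular choices of s, p and r, $B^{s}_{p,r}(M)$ contains the classical Hölder and Sobolev balls. See, e.g., Meyer  and Härdle et al.  . Now consider the wavelet basis above and suppose that $\phi$ and $\psi$ have d derivatives. Then the generalized expansion of the derivative $f^{(d)}$ is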
where the coefficients are
We define the linear wavelet estimator by
where the resolution level is an integer to be chosen a posteriori. For more on the estimation of derivatives of a density function, see  and  . Kerkyacharian and Picard  propose a construction in which the unknown function is expanded on a warped basis instead of a regular wavelet basis. Proceeding in this way, the estimates of the coefficients become more natural. Let us briefly describe the construction of this procedure. Suppose

$$G(x) = \int_0^x g(t)\,dt$$

is a known function, continuous and strictly monotone from $[0,1]$ to $[0,1]$; then
Clearly, the above estimator is unbiased, and we form the following warped estimator:
In the case where g is unknown, we replace G wherever it appears in the construction by the empirical distribution function of the $X_i$'s:

$$\hat{G}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}_{\{X_i \le x\}}.$$
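The empirical distribution function that replaces G when g is unknown is straightforward to compute. The sketch below is one possible implementation using a sorted sample and binary search; the function name `empirical_cdf` is a hypothetical helper, not something defined in the paper.

```python
import numpy as np

def empirical_cdf(sample):
    """Return the empirical distribution function of a sample.

    The returned function maps x to the fraction of sample points
    that are less than or equal to x.
    """
    sorted_sample = np.sort(np.asarray(sample, dtype=float))
    n = sorted_sample.size

    def G_hat(x):
        # side="right" counts the sample points <= x.
        return np.searchsorted(sorted_sample, x, side="right") / n

    return G_hat

G_hat = empirical_cdf([0.2, 0.5, 0.5, 0.9])
values = [float(G_hat(t)) for t in (0.1, 0.2, 0.5, 1.0)]
```

With this small sample, `values` evaluates the step function just below the smallest point, at the smallest point, at the tied points, and at the right endpoint of the interval.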
Let us define the new empirical wavelet coefficients:
Consequently we have the estimator:
This approach was initially introduced by Rao  for the estimation of the derivatives of a density. Note that, for the standard case $d = 0$, this estimator has been considered and studied in Kerkyacharian and Picard  .
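For the standard case without differentiation, the idea of a warped empirical coefficient can be sketched as an average of $Y_i \psi_{j,k}(G(X_i))$, which targets the wavelet coefficient of $f \circ G^{-1}$ because $G(X_i)$ is uniform on $[0,1]$. The example below is only a sketch under stated assumptions: it uses the Haar wavelet, a design $X = U^2$ with U uniform (so that $G(x) = \sqrt{x}$ is known in closed form), and the regression function $f(x) = x$. None of these choices comes from the paper, and the formula is not claimed to be the paper's estimator for $d \ge 1$.

```python
import numpy as np

def haar_psi(u):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    u = np.asarray(u, dtype=float)
    first = ((u >= 0.0) & (u < 0.5)).astype(float)
    second = ((u >= 0.5) & (u < 1.0)).astype(float)
    return first - second

def warped_coefficient(x, y, G, j, k, wavelet=haar_psi):
    """Average of Y_i * 2^{j/2} * wavelet(2^j G(X_i) - k)."""
    u = G(np.asarray(x, dtype=float))  # warped design, uniform on [0, 1]
    return float(np.mean(np.asarray(y) * 2.0 ** (j / 2.0) * wavelet(2.0 ** j * u - k)))

rng = np.random.default_rng(0)
n = 20000
x = rng.uniform(size=n) ** 2            # assumed design, with G(x) = sqrt(x)
y = x + rng.normal(0.0, 0.1, size=n)    # assumed f(x) = x plus Gaussian noise

beta_00 = warped_coefficient(x, y, np.sqrt, j=0, k=0)
# Target: integral of u^2 * psi(u) over [0, 1] = 1/24 - 7/24 = -1/4.
```

In this setting $f(G^{-1}(u)) = u^2$, so the estimate can be compared with the exact value $-1/4$ obtained by direct integration.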
4. Optimality Results
The main results of the paper are upper bounds for the mean integrated squared error of the wavelet estimator, which is defined as usual by
Moreover, C denotes any constant that does not depend on l, k and n.
Proposition 4.1. Suppose that are independent. For any integer and is unbiased estimator of and there exists a constant such that
Proof of Proposition 4.1. We have
So is unbiased estimator of. Therefore
For upper bound of, we have
Using the same technique as  and a change of variables, we obtain
Considering almost the same integral as in, and the fact, we have
It follows from (4.2), (4.3) and (4.1), that
Proposition 4.2. Suppose that the assumptions of Condition 1 hold. Let be (3.4). Then there exists C > 0 such that
Proof of Proposition 4.2. Observe that
It follows from the fact that,
Using Proposition 6.1 in  , and the fact that, we have
Applying the Davydov inequality for strongly mixing processes (see  ), for any, we have
Now we have and by (4.3),
Hence by applying (4.7) and (4.8), we get
It follows from (4.5), (4.6) and (4.9) that
Now (4.10) together with Proposition 4.1 completes the proof.
Proposition 4.3. Suppose that the assumptions of Condition 2 hold. Let be (3.4). Then there exists a constant such that
Proof of Proposition 4.3. Using the same technique as in Proposition 4.2, we have
Applying the covariance inequality for ρ-mixing processes (see Doukhan  ), i.e.,
we obtain from (4.2),
So Proposition 4.3 follows from (4.12) and (4.13).
Based on the above propositions, we have the following main result:
Theorem 4.1. Suppose that the assumptions of Section 2 hold. Let be (3.4). Suppose that and Then there exists a constant such that
where The rate of convergence corresponds to the one obtained in the density derivative estimation framework; see, for example, Rao  and Chaubey et al.   .
Proof of Theorem 4.1. Since we work with an orthonormal set in $L^2([0,1])$, we have
As we define, there exists a constant, such that
First consider the i.i.d. case. Using (4.2) and (4.3) and the fact that, one can easily obtain
Second, suppose the assumptions of Section 2 hold. Using Proposition 4.2 with, we have
Remark 4.1. Theorem 4.1 shows that, under mild assumptions on the dependence of observations, attains a rate of convergence close to the one for the i.i.d. case i.e.,.
In this paper, we proposed a wavelet-based estimator for the derivatives of a regression function under random design. The proposed estimator is formulated on the warped basis, which is simple and easy to apply. The results show that, without imposing overly restrictive assumptions on the model, the wavelet-based estimator attains a sharp rate of convergence under strong mixing and ρ-mixing structures.
The author would like to express her gratitude to the referee and chief editor for their valuable suggestions which have improved the earlier version of the paper.