Received 20 June 2016; accepted 22 August 2016; published 25 August 2016
Consider the nonlinear regression model:
$$Y = f(X, \beta) + \varepsilon, \qquad (1)$$
where $Y$ is a scalar response variate, $X$ is a $d \times 1$ vector of covariates, $\beta$ is a $p \times 1$ vector of unknown regression parameters, $f(\cdot, \cdot)$ is a known function that is nonlinear with respect to $\beta$, and $\varepsilon$ is a random statistical error with $E(\varepsilon \mid X) = 0$. In general, d is different from p. The model has been studied by many authors, such as Jennrich, Wu, Crainiceanu and Ruppert and so on.
Missing data are frequently encountered in statistical studies, and ignoring them can lead to biased estimation and misleading conclusions. Inverse probability weighting (Horvitz and Thompson) and imputation are the two main methods for dealing with missing data. Since Scharfstein et al. noted that the augmented inverse probability weighted (AIPW) estimator of Robins et al. is doubly robust, many estimators with the double-robustness property have been proposed; see Tan, Kang and Schafer, and Cao et al. An estimator is doubly robust in the sense that it is consistent if either the outcome regression model or the propensity score model is correctly specified. AIPW estimators have been advocated for routine use (Bang and Robins). For model (1), in the absence of missing data, the weighted least squares estimator of the regression parameter $\beta$ can be obtained by minimizing the corresponding objective function. In the presence of missing data, this method cannot be used directly, so we apply the AIPW method to model (1).
Throughout this paper, we assume that the X's are completely observed and that Y is missing at random (Rubin). Thus, the data actually observed are $\{(X_i, Y_i, \delta_i), i = 1, \ldots, n\}$, which are independent and identically distributed, where $\delta_i = 1$ indicates that $Y_i$ is observed and $\delta_i = 0$ indicates that $Y_i$ is missing. The missing at random (MAR) assumption implies that $\delta$ and Y are conditionally independent given X, that is, $P(\delta = 1 \mid Y, X) = P(\delta = 1 \mid X)$. This probability is called the propensity score (Rosenbaum and Rubin).
If $f(X, \beta) = X^{\top}\beta$, model (1) is just the classical linear model. Linear models with missing data have been studied in existing papers, such as Wang and Rao, Xue, Qin and Lei and so on. The inverse probability weighted imputation methods of Xue and other papers are based on nonparametric estimators of the propensity score. However, such nonparametric estimators are difficult to obtain because of the “curse of dimensionality”, and, as noted by Kang and Schafer, AIPW estimators can be severely biased when both models are misspecified. In addition, little work has been done on model (1) with missing responses.
In this paper, we construct estimators for the regression parameter and the response mean of model (1), based on the covariate balancing propensity score (CBPS) method proposed by Imai and Ratkovic. As mentioned in Imai and Ratkovic, the weights based on CBPS are robust in the sense that they improve covariate balance even when the propensity score model is misspecified. Our estimator has the following merits: 1) it avoids the “curse of dimensionality”; 2) it avoids selecting an optimal bandwidth; 3) it improves the performance of the AIPW estimators in terms of bias, standard deviation (SD) and mean-squared error (MSE), even when both the outcome regression model and the propensity score model are misspecified.
The rest of this paper is organized as follows. In Section 2, based on the CBPS and the AIPW methods, the estimators for the regression parameter and the population mean are proposed, and the asymptotic properties of the estimators are investigated. In Section 3, simulation studies are carried out to assess the performance of the proposed method. In Section 4, concluding remarks are made. In the Appendix, the proofs of the main results are given.
2. Construction of Estimators
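The most popular choice for the propensity score $\pi(X) = P(\delta = 1 \mid X)$ is a logistic regression function (Qin and Zhang). We make the same choice and posit a logistic regression model for $\pi(X)$:
$$\pi(X, \alpha) = \frac{\exp(X^{\top}\alpha)}{1 + \exp(X^{\top}\alpha)}, \qquad (2)$$
where $\alpha$ is a d-dimensional unknown column vector parameter.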
2.1. CBPS-Based Estimator for the Propensity Score
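Based on this model, one can obtain an estimator of $\alpha$ by maximizing the log-likelihood function of the response indicators $\delta_i$ under the propensity score $\pi(X_i, \alpha)$:
$$\ell(\alpha) = \sum_{i=1}^{n} \left[ \delta_i \log \pi(X_i, \alpha) + (1 - \delta_i) \log\{1 - \pi(X_i, \alpha)\} \right]. \qquad (3)$$
Assuming that $\pi(X, \alpha)$ is twice continuously differentiable with respect to $\alpha$, maximizing (3) implies the first-order condition
$$\frac{1}{n} \sum_{i=1}^{n} \left\{ \frac{\delta_i\, \pi'(X_i, \alpha)}{\pi(X_i, \alpha)} - \frac{(1 - \delta_i)\, \pi'(X_i, \alpha)}{1 - \pi(X_i, \alpha)} \right\} = 0, \qquad (4)$$
where $\pi'(X_i, \alpha) = \partial \pi(X_i, \alpha) / \partial \alpha$. However, the main drawback of this standard method is that the propensity score model may be misspecified, yielding biased estimators for the parameters of interest, such as the regression parameter and the mean of Y. To overcome this drawback, we borrow the following ideas of Imai and Ratkovic. Similar to the arguments presented by Imai and Ratkovic, we operationalize the covariate balancing property by using inverse propensity score weighting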
Equation (5) ensures that the first moment of each covariate is balanced, and the weights based on CBPS are robust even when the propensity score model is misspecified. The key idea behind the CBPS is that the propensity score model determines both the missing mechanism and the covariate balancing weights; see Imai and Ratkovic. The sample analogue of the covariate balancing moment condition given in Equation (5) is
According to Imai and Ratkovic  , the CBPS is said to be just identified when the number of moment conditions equals that of parameters. If we use the covariate balancing conditions given in Equation (6) alone, the CBPS is just-identified. If we combine Equation (6) with the score condition given in Equation (4), then the CBPS is overidentified because the number of moment conditions exceeds that of parameters.
Combining Equation (6) with the score condition given in Equation (4), we obtain the following equation:
Let $\hat{\alpha}$ be the solution to Equation (7). For the overidentified CBPS, the GMM (Hansen) estimator can be obtained by minimizing the following objective with respect to $\alpha$ for some positive-semidefinite symmetric weight matrix W:
It is easy to show that, under some regularity conditions, $\hat{\alpha}$ is a consistent estimator of $\alpha_0$, the true value of $\alpha$. For the just-identified CBPS, we borrow the ideas of Imai and Ratkovic and still minimize Equation (8), without the score condition, to find $\hat{\alpha}$.
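As a rough illustration of the just-identified case, the sketch below solves a sample balancing condition for a logistic propensity score with a damped Newton iteration, which drives the GMM objective to zero. It assumes that the balancing condition takes the form used by Imai and Ratkovic for a binary indicator, namely that the sample average of $\{\delta_i/\pi_i - (1-\delta_i)/(1-\pi_i)\}X_i$ equals zero; the function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def logistic(eta):
    """Logistic propensity score evaluated at the linear predictor eta."""
    return 1.0 / (1.0 + np.exp(-eta))

def balance_moments(alpha, X, delta):
    """Sample balancing moments (assumed Imai-Ratkovic form)."""
    pi = logistic(X @ alpha)
    w = delta / pi - (1.0 - delta) / (1.0 - pi)
    return X.T @ w / X.shape[0]

def cbps_just_identified(X, delta, tol=1e-10, max_iter=100):
    """Solve balance_moments(alpha) = 0 by Newton's method with step halving."""
    n, d = X.shape
    alpha = np.zeros(d)
    g = balance_moments(alpha, X, delta)
    for _ in range(max_iter):
        if np.max(np.abs(g)) < tol:
            break
        pi = logistic(X @ alpha)
        # Derivative of the balancing weight with respect to the linear predictor.
        dw = -(delta * (1.0 - pi) / pi + (1.0 - delta) * pi / (1.0 - pi))
        J = (X * dw[:, None]).T @ X / n
        step = np.linalg.solve(J, -g)
        t = 1.0
        while t > 1e-8:
            g_new = balance_moments(alpha + t * step, X, delta)
            # Accept the step once the norm of the moments decreases.
            if np.linalg.norm(g_new) < np.linalg.norm(g):
                break
            t /= 2.0
        alpha = alpha + t * step
        g = g_new
    return alpha
```

Calling `cbps_just_identified(X, delta)` with an `n × d` covariate matrix `X` and a 0/1 array `delta` returns a parameter value at which the weighted covariate means of the observed and missing groups agree. An overidentified version would stack the score condition on top of these moments and minimize the quadratic form with a numerical optimizer.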
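Theorem 1. Suppose that the observations are independent and identically distributed random vectors and that Assumptions (A1)-(A3) in the Appendix hold. Then $\hat{\alpha}$ converges in probability to $\alpha_0$ and $\sqrt{n}(\hat{\alpha} - \alpha_0)$ is asymptotically normal, where $\hat{\alpha}$ minimizes Equation (8).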
2.2. Estimator for the Regression Parameter
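To make use of the AIPW method, we borrow the idea of Seber and Wild and define the least squares estimator of $\beta$ based on complete-case data (the observations with $\delta_i = 1$) as the solution of the following estimating equation:
$$\sum_{i=1}^{n} \delta_i f^{(1)}(X_i, \beta)\{Y_i - f(X_i, \beta)\} = 0, \qquad (9)$$
where $f^{(1)}(X, \beta) = \partial f(X, \beta)/\partial \beta$. There is no closed form for the solution, but it can be computed by an iterative equation, Equation (10), in which the quantities depending on $\beta$ are evaluated at the current iterate $\beta^{(k)}$. If $\|\beta^{(k+1)} - \beta^{(k)}\| < c$, where c is a prespecified tolerance and $\|\cdot\|$ denotes the norm, then we stop the iterative algorithm and obtain the complete-case least squares estimator of $\beta$.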
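The iteration and its stopping rule can be sketched as follows, assuming the update is of Gauss-Newton type. The mean function $f(x, \beta) = \beta_0 \exp(\beta_1 x)$ and all function names are hypothetical choices made only for illustration.

```python
import numpy as np

def f(x, beta):
    """Hypothetical mean function f(x, beta) = beta0 * exp(beta1 * x)."""
    return beta[0] * np.exp(beta[1] * x)

def f_grad(x, beta):
    """Gradient of f with respect to beta, one row per observation."""
    e = np.exp(beta[1] * x)
    return np.column_stack([e, beta[0] * x * e])

def gauss_newton_cc(x, y, delta, beta0, c=1e-8, max_iter=100):
    """Complete-case nonlinear least squares by Gauss-Newton iteration."""
    obs = delta == 1                     # keep only cases with observed responses
    xo, yo = x[obs], y[obs]
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        G = f_grad(xo, beta)             # gradient evaluated at the current iterate
        r = yo - f(xo, beta)             # residuals evaluated at the current iterate
        new = beta + np.linalg.solve(G.T @ G, G.T @ r)
        if np.linalg.norm(new - beta) < c:   # stop when successive iterates are close
            return new
        beta = new
    return beta
```

Plain Gauss-Newton has no step-size control, so a reasonable starting value `beta0` matters; a damped variant would be a natural safeguard in practice.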
Although the complete-case method is simple to implement, simply excluding the missing data may lead to misleading conclusions. In this section, we introduce an AIPW method based on CBPS to address the shortcomings of the complete-case method.
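Denote $Y_i^{*} = \dfrac{\delta_i}{\pi(X_i, \alpha)} Y_i + \left\{1 - \dfrac{\delta_i}{\pi(X_i, \alpha)}\right\} f(X_i, \beta)$. From Equation (1), we have $E(Y_i^{*} \mid X_i) = f(X_i, \beta)$ under the MAR condition. Hence
$$Y_i^{*} = f(X_i, \beta) + e_i, \qquad (11)$$
where the $e_i$'s satisfy $E(e_i \mid X_i) = 0$. Formula (11) is a full-data model without missing data. So, similar to Equation (10), we can obtain an estimator of $\beta$ by an iterative equation, in which the estimator of the propensity score parameter is obtained by the CBPS method.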
The following theorem gives the asymptotic normality of the resulting estimator $\hat{\beta}$.
Theorem 2. Suppose that Assumptions (A1)-(A4) in the Appendix hold. Then we have
where, and with.
To apply Theorem 2 to construct a confidence region for $\beta$, we use a plug-in estimator to consistently estimate B, where its components are defined by
Therefore, we have
We can construct a confidence interval for $\beta$ using (12) and (13).
2.3. Estimator for the Response Mean
It is of interest to estimate the mean of Y when there are missing data in the responses. Here we use the method of Xue to construct an estimator of this mean. Let
Under the MAR condition, we have if is the true parameter. Then the proposed estimator is
In the following theorem, we state the asymptotic properties of the proposed estimator.
Theorem 3. Under the assumptions (A1)-(A4) in the Appendix, we have
Borrowing the method of Xue  , we can obtain the following consistent estimator of V:
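By Theorem 3, a normal-approximation-based confidence interval for the mean of Y at a given confidence level is obtained by adding to and subtracting from the estimator the corresponding standard normal quantile times the estimated standard error $\sqrt{\hat{V}/n}$.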
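A minimal sketch of an AIPW estimate of the response mean with a normal-approximation interval is given below. It assumes the standard augmented form $\delta_i Y_i/\hat{\pi}_i + (1 - \delta_i/\hat{\pi}_i)\hat{m}_i$, where $\hat{m}_i$ is the fitted outcome regression at $X_i$, and it uses the plain sample variance of these terms as the variance estimate, which is a simplifying assumption and not necessarily the estimator of V used in the paper. The function and argument names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def aipw_mean(y, delta, pi_hat, m_hat, level=0.95):
    """AIPW estimate of E(Y) with a normal-approximation confidence interval.

    y      : responses (entries with delta == 0 may be NaN)
    delta  : 1 if the response is observed, 0 if it is missing
    pi_hat : estimated propensity scores
    m_hat  : fitted outcome regression values f(X_i, beta_hat)
    """
    y_filled = np.where(delta == 1, y, 0.0)   # missing responses never enter the sum
    terms = delta * y_filled / pi_hat + (1.0 - delta / pi_hat) * m_hat
    n = terms.shape[0]
    theta = terms.mean()
    # Assumption: sample variance of the augmented terms as the variance estimate.
    se = terms.std(ddof=1) / np.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return theta, (theta - z * se, theta + z * se)
```

With no missing responses and all propensity scores equal to one, the estimate reduces to the ordinary sample mean of Y.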
3. Simulation Examples
We conducted simulation studies to examine the performance of the proposed estimation methods. The simulated data are generated from the model with. The components of are generated from the uniform distribution respectively and is generated from the standard normal distribution, is generated from Bernoulli with true propensity score model.
When one or both models are misspecified, we follow Kang and Schafer to examine whether our method can improve the empirical performance of doubly robust estimators. Similar to Kang and Schafer, only transformed versions of the covariates are observed. If the outcome model or the propensity score model is expressed in terms of these transformed covariates, the model is misspecified. As in the original study, we conduct simulations for the population mean under four scenarios:
1) both outcome and propensity score models are correctly specified;
2) only the propensity score model is correct;
3) only the outcome model is correct;
4) both outcome and propensity score models are misspecified.
Because the regression parameter appears in the outcome regression model, we conduct simulations for it only under scenarios 1) and 3), in which the outcome model is correct. For each scenario, we conduct 1000 simulations and calculate the bias, standard deviation (SD) and mean-squared error (MSE) of the estimators of the regression parameter and of the population mean. The results of our simulations are presented in Tables 1-3. For a given scenario, we examine the performance of the estimators on the basis of four different propensity score methods:
a) the usual GLM method;
b) the just-identified CBPS estimation with the covariate balancing moment conditions and without the score condition (CBPS1);
c) the overidentified CBPS estimation with both the covariate balancing and score conditions (CBPS2);
d) the true propensity score model, which does not need to be estimated (TRUE).
Table 1. Relative performance of the estimators for the regression parameter based on different propensity score estimation methods when both models are correct.
Table 2. Relative performance of the estimators for the regression parameter based on different propensity score estimation methods when only the outcome model is correct.
Table 3. Relative performance of the doubly robust estimators based on different propensity score estimation methods for the mean under the four different scenarios.
Remark: 1) Both models are correct; 2) Only the propensity score model is correct; 3) Only the outcome model is correct; 4) Both models are incorrect.
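The three summary measures reported for each method (bias, SD and MSE) can be computed from the replicated estimates as in the sketch below. The function name is hypothetical, and the choice of the sample standard deviation with divisor n − 1 is an assumption about how SD is defined.

```python
import numpy as np

def mc_summary(estimates, truth):
    """Bias, standard deviation and mean-squared error over Monte Carlo replications."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - truth              # average estimation error
    sd = est.std(ddof=1)                   # spread of the estimates across replications
    mse = np.mean((est - truth) ** 2)      # average squared error around the true value
    return bias, sd, mse
```

For a scalar parameter, `mc_summary` would be applied to the 1000 replicated estimates produced by each propensity score method under each scenario.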
From Table 1 and Table 2, we can see that the SD and MSE of our estimators for the regression parameter decrease as n increases. Whether or not the propensity score model is correctly specified, the proposed estimators based on CBPS have smaller SD and MSE than the usual GLM estimators in most cases. The CBPS, with or without the score condition, can substantially improve the performance of the usual estimator. Compared with the estimators based on the true propensity score model, our proposed estimators perform comparably in terms of SD and MSE. Table 3 shows that, under the four scenarios, the SD and MSE of our proposed estimators remain lower than those of the usual GLM estimators. Similar to Imai and Ratkovic, the final scenario illustrates the most important point made by Kang and Schafer: doubly robust estimators can deteriorate when both the outcome and the propensity score models are misspecified. Under this scenario, the doubly robust estimators based on the usual GLM have a substantial amount of bias and variance. However, the CBPS can improve the performance of doubly robust estimators. In summary, we reach the same conclusion as Imai and Ratkovic: the CBPS can yield robust estimators of the population mean, even when both the outcome and propensity score models are misspecified.
4. Concluding Remarks
We have proposed an improved estimation method for the parameters of interest in the nonlinear regression model with missing responses. The estimators based on the CBPS and AIPW methods have the following merits: 1) they avoid the “curse of dimensionality” and avoid selecting an optimal bandwidth; 2) when either the outcome regression model or the propensity score model is correctly specified, the proposed estimators perform as well as the estimators based on the true propensity score model in terms of SD and MSE; 3) when both the outcome regression and propensity score models are misspecified, the usual AIPW estimator can be severely biased, as mentioned in Section 1, but our method improves its performance and yields an improved estimator of the population mean. The simulation shows that the proposed method is feasible. Furthermore, with appropriate modification, the proposed method can be extended to other models with missing responses. The details will be presented in our future work.
We thank the Editor and the referee for their helpful comments, which greatly improved the presentation of the paper.
Appendix: Proofs of the Main Results
Throughout, let $\alpha_0$ be the true value of $\alpha$, and let $\|A\|$ denote the Euclidean norm of a matrix $A$. First, we make the following assumptions.
(A1) For all X, $\pi(X, \alpha)$ is a known, differentiable function taking values in (0,1) for all $\alpha$ in a neighborhood of $\alpha_0$.
(A2) and exist and the matrix is of full rank.
(A3) 1) W is positive semi-definite and only if. 2), which is compact.
3) and, where.
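To complete the proofs of Theorems 1-3, the following lemma is needed.
Lemma 1. If there is a function such that 1) it is uniquely minimized at the true parameter value; 2) the parameter space is compact; 3) the function is continuous; 4) the sample objective function converges uniformly in probability to it, then the minimizer of the sample objective function over the parameter space converges in probability to the true parameter value.
Lemma 1 is the fundamental consistency result for extremum estimators. Its proof can be found in Newey and McFadden, and we omit it here.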
Proof of Theorem 1. Similar to Theorem 2.6 in Newey and McFadden, the proof of consistency proceeds by verifying the conditions of Lemma 1. Under Assumptions (A2) and (A3) and by Lemma 2.3 in Newey and McFadden (1994), conditions 1) and 2) of Lemma 1 hold. Under Assumption (A1) and by Lemma 2.4 in Newey and McFadden (1994), the sample moment function converges uniformly in probability to its expectation, and this expectation is continuous. Thus, condition 3) of Lemma 1 holds by continuity. Because the parameter space is compact, the limiting moment function is bounded on it, and by the Cauchy-Schwarz inequality the sample objective function converges uniformly in probability to its limit, so condition 4) of Lemma 1 holds. According to Theorem 3.2 in Newey and McFadden, we obtain the asymptotic normality of the estimator.
Proof of Theorem 2. Denote, where. By the definition of,
To prove Theorem 2, we will verify the asymptotic normality of. By direct calculation,
Under MAR assumption, we have. This combines with Theorem 1
From Theorem 5 in Wu (1981), we know that. This together with (16) and (18) proves that
By Theorem 1,
According to the assumptions given in model (1), we have
Then, it follows from the central limit theorem that
Therefore, by using (19) and Slutsky's theorem, the proof of Theorem 2 is completed.
Proof of Theorem 3. By direct calculation, we have
By the central limit theorem, we have. To prove the asymptotic normality of the estimator, we need to prove that. For,
Similar to arguments of Qin and Lei  , we have.
For, Under MAR assumption, we have. Therefore,
. This combines with Theorem 1 and yields
Then Theorem 3 is proved.
 Jennrich, R.I. (1969) Asymptotic Properties of Nonlinear Least Squares Estimators. The Annals of Mathematical Statistics, 40, 633-643.
 Wu, C.F. (1981) Asymptotic Theory of Nonlinear Least Squares Estimation. The Annals of Statistics, 9, 501-513.
 Crainiceanu, C.M. and Ruppert, D. (2004) Likelihood Ratio Tests for Goodness-of-Fit of a Nonlinear Regression Model. Journal of Multivariate Analysis, 91, 35-52.
 Horvitz, D.G. and Thompson, D.J. (1952) A Generalization of Sampling without Replacement from a Finite Universe. Journal of the American Statistical Association, 47, 663-685.
 Scharfstein, D.O., Rotnitzky, A. and Robins, J.M. (1999) Adjusting for Nonignorable Drop-Out Using Semiparametric Nonresponse Models. Journal of the American Statistical Association, 94, 1096-1120.
 Robins, J.M., Rotnitzky, A. and Zhao, L.P. (1994) Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. Journal of the American Statistical Association, 89, 846-866.
 Tan, Z. (2006) A Distributional Approach for Causal Inference Using Propensity Scores. Journal of the American Statistical Association, 101, 1619-1637.
 Kang, J.D.Y. and Schafer, J.L. (2007) Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data. Statistical Science, 22, 523-539.
 Cao, W., Tsiatis, A. and Davidian, M. (2009) Improving Efficiency and Robustness of the Doubly Robust Estimator for a Population Mean with Incomplete Data. Biometrika, 96, 723-734.
 Bang, H. and Robins, J.M. (2005) Doubly Robust Estimation in Missing Data and Causal Inference Models. Biometrics, 61, 962-972.
 Rubin, D.B. (1976) Inference and Missing Data. Biometrika, 63, 581-592.
 Rosenbaum, P.R. and Rubin, D.B. (1983) The Central Role of the Propensity Score in Observational Studies for Causal Effects. Biometrika, 70, 41-55.
 Wang, Q.H. and Rao, J.N.K. (2001) Empirical Likelihood for Linear Regression Models under Imputation for Missing Responses. The Canadian Journal of Statistics, 29, 597-608.
 Wang, Q.H. and Rao, J.N.K. (2002) Empirical Likelihood-Based Inference in Linear Models with Missing Data. The Scandinavian Journal of Statistics, 29, 563-576.
 Xue, L.G. (2009) Empirical Likelihood for Linear Models with Missing Responses. Journal of Multivariate Analysis, 100, 1353-1366.
 Qin, Y. and Lei, Q. (2010) On Empirical Likelihood for Linear Models with Missing Responses. Journal of Statistical Planning and Inference, 140, 3399-3408.
 Imai, K. and Ratkovic, M. (2014) Covariate Balancing Propensity Score. Journal of the Royal Statistical Society, Series B, 76, 243-263.
 Imai, K. and Ratkovic, M. (2015) Robust Estimation of Inverse Probability Weights for Marginal Structural Models. Journal of the American Statistical Association, 110, 1013-1022.
 Qin, J. and Zhang, B. (2007) Empirical-Likelihood-Based Inference in Missing Response Problems and Its Application in Observational Studies. Journal of the Royal Statistical Society, Series B, 69, 101-122.
 Hansen, L.P. (1982) Large Sample Properties of Generalized Method of Moments Estimators. Econometrica, 50, 1029-1054.