o> , , x i p 2 ) .

To explain why the vector $z_i$ differs in size between the IM test and the IMDIAG test, consider a simple example. Suppose the symmetric matrix $x_i x_i^T$ has dimension 3 × 3:

$$\begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{bmatrix},$$

where $x_{rs} = x_{ri} x_{si}$. In the case of the IM test, the vector $z_i^T$ then has dimension 1 × 6, with elements:

$$z_i^T = [x_{11}, x_{12}, x_{13}, x_{22}, x_{23}, x_{33}],$$

whereas in the case of the IMDIAG test, $z_i^T$ is the 1 × 3 vector:

$$z_i^T = [x_{11}, x_{22}, x_{33}].$$
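The difference between the two vectors can be sketched in a few lines of Python. This is only an illustration of the stacking rule described above: the function names and the numerical covariate vector are hypothetical and do not come from the original study.

```python
def outer(x):
    """Return the p x p matrix x x^T as a list of rows."""
    return [[a * b for b in x] for a in x]


def z_im(x):
    """IM test: stack the upper triangle (diagonal included) of x x^T, row by row."""
    m = outer(x)
    p = len(x)
    return [m[r][s] for r in range(p) for s in range(r, p)]


def z_imdiag(x):
    """IMDIAG test: keep only the diagonal elements of x x^T."""
    m = outer(x)
    return [m[r][r] for r in range(len(x))]


# Hypothetical covariate vector with p = 3 entries (illustration only)
x_i = [1.0, 2.0, 3.0]
print(z_im(x_i))      # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
print(z_imdiag(x_i))  # [1.0, 4.0, 9.0]
```

For a covariate vector of length p, the IM vector has p(p + 1)/2 elements and the IMDIAG vector has p, which gives 6 and 3 in the 3 × 3 example.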

4. Simulation Study

Our work focuses on the behaviour of goodness-of-fit tests under alternative hypotheses, specifically the missing-covariate model and the wrong model, because these are the cases in which we could not reproduce Kuss's results. We concentrate on four goodness-of-fit tests ($\hat{C}_g$, $RSS$, $IM$, $IMDIAG$). We therefore examine the behaviour of the tests in more depth and seek more information about the asymptotic distribution of the MLE in the case of the wrong model,

$$\pi_i = \operatorname{expit}(0.405\, x_i^2),$$

or in the case of the missing covariate,

$$\pi_i = \operatorname{expit}(0.405\, x_i + 0.223\, u_i),$$

where $X, U \sim U(-6, 6)$, with $X$ and $U$ independent.

The simulation study was designed to follow Kuss's work:

・ The sample sizes are n = 100 and n = 500.

・ Only the case of extreme sparseness, $m_i = 1$, is considered.

・ The number of simulations is 1000.

・ The predictor variables X and U are independent and distributed as $U(-6, 6)$, chosen to conform with Kuss's work.

・ Four goodness-of-fit tests are applied under three scenarios:

(a) Correct model (true covariate).

(b) Missing covariate.

(c) Wrong functional form of the covariate.

・ The fitted model in all cases is a standard logistic model with an intercept and one covariate.

・ All tests of the null hypothesis are carried out at level α = 0.05.
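The design above can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes covariates drawn from U(−6, 6), uses the three data-generating models given in the text, and fits the working model by a plain Newton-Raphson iteration. The function names, the seed and the fixed iteration count are all hypothetical choices.

```python
import math
import random


def expit(t):
    """Numerically stable inverse logit."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def true_prob(x, u, scenario):
    """Success probability under the three data-generating models in the text."""
    if scenario == "correct":
        return expit(0.693 * x)
    if scenario == "missing":
        return expit(0.405 * x + 0.223 * u)
    if scenario == "wrong":
        return expit(0.405 * x ** 2)
    raise ValueError("unknown scenario: " + scenario)


def simulate(n, scenario, rng):
    """Draw one data set of size n with m_i = 1 (binary responses)."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.uniform(-6.0, 6.0)
        u = rng.uniform(-6.0, 6.0)  # only enters the missing-covariate model
        xs.append(x)
        ys.append(1 if rng.random() < true_prob(x, u, scenario) else 0)
    return xs, ys


def fit_logistic(xs, ys, iters=25):
    """Fit expit(a + b*x) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(a + b * x)
            w = p * (1.0 - p)
            g0 += y - p              # score for the intercept
            g1 += (y - p) * x        # score for the slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


rng = random.Random(2017)  # arbitrary seed, for reproducibility only
for scenario in ("correct", "missing", "wrong"):
    xs, ys = simulate(500, scenario, rng)
    a_hat, b_hat = fit_logistic(xs, ys)
    print(scenario, round(a_hat, 3), round(b_hat, 3))
```

In a full replication this loop would be repeated 1000 times for each sample size, with the four test statistics computed from each fitted model.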

Results and Discussion of Tests under Correct Model

Table 1 reports the mean, variance and empirical power of the four goodness-of-fit tests from the simulation study under the correct model, namely

$$\pi_i = \operatorname{expit}(0.693\, x_i).$$

The statistics used as goodness-of-fit tests in the simulation are Hosmer-Lemeshow ($\hat{C}_g$), information matrix ($IM$), information matrix diagonal ($IMDIAG$) and residual sum of squares ($RSS$). The asymptotic distribution of each statistic is $\chi^2_{df}$, whose mean and variance equal $df$ and $2\,df$ respectively. For the $\hat{C}_g$ statistic we chose $g = 10$ groups, so the degrees of freedom are $df = g - 2 = 8$. As shown in Table 1, the mean and variance of all statistics are close to $df$ and $2\,df$. The simulation gives reasonable results when the model is fitted with sample size n = 500; however, the variance of $\hat{C}_g$ is slightly large for sample size n = 100. Overall, the empirical power and type I error look good.
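As an illustration of how the grouped statistic is formed, the sketch below computes a Hosmer-Lemeshow type statistic with g = 10 groups of near-equal size and refers it to a $\chi^2$ distribution with g − 2 degrees of freedom. It is a simplified version written for this example, not the implementation used in the study: the grouping rule, the function names and the input data are assumptions, and the tail probability uses a closed form that holds only for even degrees of freedom.

```python
import math
import random


def chi2_sf_even(x, df):
    """Upper-tail probability of a chi-square variable (closed form, even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(ys, probs, g=10):
    """Sort by fitted probability, split into g groups, sum the Pearson-type terms."""
    order = sorted(range(len(ys)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for k in range(g):
        idx = order[k * n // g:(k + 1) * n // g]
        n_k = len(idx)
        observed = sum(ys[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        pbar = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * pbar * (1.0 - pbar))
    df = g - 2
    return stat, df, chi2_sf_even(stat, df)


# Hypothetical input: these probabilities stand in for fitted values from a model
rng = random.Random(1)
probs = [(i + 0.5) / 200 for i in range(200)]
ys = [1 if rng.random() < p else 0 for p in probs]
stat, df, p_value = hosmer_lemeshow(ys, probs)
print(df, round(stat, 3), round(p_value, 3))
```

With g = 10 the reference distribution has 8 degrees of freedom, so under the null hypothesis the statistic should have mean near 8 and variance near 16.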

In the second case, we report the mean, variance and power to detect a misspecified model for the same goodness-of-fit tests under the missing-covariate model, where the true model is

$$\pi_i = \operatorname{expit}(0.405\, x_i + 0.223\, u_i),$$

and the fitted model is a standard logistic regression on $x_i$ alone.

Table 2 shows the results from the simulation study under the missing-covariate alternative. The mean and variance of all statistics are close to $df$ and $2\,df$, although the variance of $\hat{C}_g$ is slightly smaller. However, power is low for the IM statistic when n = 500, for the IMDIAG statistic and RSS when n = 100, and for the $\hat{C}_g$ statistic at both sample sizes.
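The quantities compared in these tables (mean against df, variance against 2df, rejection rate against α) are simple summaries of the N simulated statistics. A minimal sketch is given below. The function name is hypothetical, and the input is stand-in data drawn directly from a $\chi^2_8$ distribution, not the paper's results.

```python
import random


def summarise(stats, df, crit):
    """Mean, sample variance and rejection rate of simulated test statistics."""
    n = len(stats)
    mean = sum(stats) / n
    var = sum((s - mean) ** 2 for s in stats) / (n - 1)
    rate = sum(1 for s in stats if s > crit) / n
    return {"mean": mean, "variance": var, "df": df, "2df": 2 * df,
            "rejection_rate": rate}


# Stand-in data: 1000 draws from a chi-square with 8 df,
# each built as a sum of 8 squared standard normal variables
rng = random.Random(7)
draws = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(8)) for _ in range(1000)]

CRIT_8 = 15.507  # 0.95 quantile of the chi-square distribution with 8 df
summary = summarise(draws, df=8, crit=CRIT_8)
print(summary)
```

When the data come from the null distribution, the rejection rate estimates the type I error; when they come from a misspecified model, the same proportion estimates the power.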

In the final case, we show the power to detect a misspecified model for the four goodness-of-fit tests under the wrong functional form of the covariate, where the true model is

$$\pi_i = \operatorname{expit}(0.405\, x_i^2),$$

and the fitted model is the same as in the previous cases.

Table 1. Results of N = 1000 simulations with sample sizes n = 100 and n = 500 under the correct model.

Table 2. Results of N = 1000 simulations with sample sizes n = 100 and n = 500 under the missing-covariate model.

Table 3 reports the results for the goodness-of-fit tests from the simulation study under the wrong model. For both sample sizes, the mean and variance of all statistics are much larger than the values implied by the degrees of freedom. However, all goodness-of-fit tests show high power at both sample sizes, meaning that these tests rejected the null hypothesis. By contrast, Kuss's results show low power for sample size n = 100 compared with ours.

In Figure 1, we plot $\pi$ against $x$ and show the true model (continuous line). If we fit $\pi = \operatorname{expit}(\alpha + \beta x)$, the putative approximations are shown for $\beta < 0$, $\beta > 0$ and $\beta = 0$ (dot and dash, dash and dot) lines respectively.
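A short argument, offered here as a sketch and not part of the original derivation, suggests why the case $\beta = 0$ is the relevant one in large samples. Assume $X \sim U(-6, 6)$ and the true model $\pi(x) = \operatorname{expit}(0.405\, x^2)$, and write $(\alpha^*, \beta^*)$ for the limiting values of the MLE under the working model $\operatorname{expit}(\alpha + \beta x)$ (notation introduced only for this remark). These values solve the expected score equations

$$E\left[\pi(X) - \operatorname{expit}(\alpha^* + \beta^* X)\right] = 0, \qquad E\left[\left(\pi(X) - \operatorname{expit}(\alpha^* + \beta^* X)\right) X\right] = 0.$$

Since $\pi(x)$ is an even function and $X$ is symmetric about zero, $E[\pi(X)\, X] = 0$ and $E[X] = 0$, so the pair

$$\beta^* = 0, \qquad \operatorname{expit}(\alpha^*) = E[\pi(X)]$$

satisfies both equations. Because the expected log-likelihood of the logistic working model is strictly concave in $(\alpha, \beta)$, this solution is unique, and the fitted curve tends to a horizontal line at the average success probability.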

Table 3. Results of N = 1000 simulations with sample sizes n = 100 and n = 500 under the wrong model.

Figure 1. Plots of the different logistic models for $\pi_i$ given $X \sim U(-6, 6)$.

5. Conclusion and Further Work

The work considered in this paper centred on the asymptotic distribution of goodness-of-fit tests in the logistic regression model. We also compared several global goodness-of-fit tests with Kuss's results. The simulations covered two types of goodness-of-fit tests: those that group the observations and those that do not. Our results confirm Kuss's findings on the power of the RSS, Hosmer-Lemeshow, IM and IMDIAG tests under the correct and missing-covariate models. However, our results on the asymptotic distribution of the goodness-of-fit tests show varied behaviour in the mean and variance of the statistics, whose asymptotic distribution is chi-square, $\chi^2_{df}$. Under the correct model, all methods show reasonable power; the variance is slightly larger for the Hosmer-Lemeshow test, and smaller under the missing-covariate model. The goodness-of-fit statistics are asymptotically distributed as central $\chi^2$ under $H_0$, when the model is correctly specified, and as non-central $\chi^2$ under $H_1$, when the model is misspecified. Under the wrong model, however, the results show strange behaviour: none of the means and variances satisfy the assumption of an asymptotic $\chi^2_{df}$ distribution with mean $df$ and variance $2\,df$, and the tests also show high power. This means that in some circumstances the properties of the distribution of the test statistics (e.g. mean and variance) are far from those of the $\chi^2$ distribution. An interesting point is that some of the goodness-of-fit tests seem to be affected by the assumption on the covariance matrix. Many issues concerning the mean and variance of the asymptotic distribution of goodness-of-fit statistics should therefore be examined further.
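As a hedged aside on the non-central case mentioned above: if a statistic $T$ followed a non-central $\chi^2_{df}(\lambda)$ distribution exactly, where $\lambda$ denotes the non-centrality parameter (a symbol not used elsewhere in this paper), its first two moments would be

$$E[T] = df + \lambda, \qquad \operatorname{Var}[T] = 2\,(df + 2\lambda),$$

which reduce to $df$ and $2\,df$ when $\lambda = 0$. Eliminating $\lambda$ gives $\operatorname{Var}[T] = 4\,E[T] - 2\,df$, so a simulated mean well above $df$ is compatible with a non-central $\chi^2$ limit only if the simulated variance is close to this value. This could serve as a rough check when examining the behaviour observed under the wrong model.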

Cite this paper
Badi, N. (2017) Asymptomatic Distribution of Goodness-of-Fit Tests in Logistic Regression Model. Open Journal of Statistics, 7, 434-445. doi: 10.4236/ojs.2017.73031.
[1]   Nelder, J.A. and Wedderburn, R.W.M. (1972) Generalized Linear Models. Journal of the Royal Statistical Society, Series A, 135, 370-384.

[2]   Dobson, A. (1990) An Introduction to Generalized Linear Models. Chapman and Hall, London.

[3]   Kleinbaum, D.G. (1994) Logistic Regression A Self-Learning Text. Springer-Verlag, New York.

[4]   Hosmer, D.W. and Lemeshow, S. (2000) Applied Logistic Regression. Wiley, Chichester.

[5]   Hosmer, D., Lemeshow, S. and Sturdivant, R.X. (2013) Applied Logistic Regression. 3rd Edition, Wiley, Chichester.

[6]   Hilbe, J.M. (2009) Logistic Regression Model. Chapman and Hall, New York.

[7]   Dobson, A.J. and Barnett, A.G. (2008) An Introduction to Generalized Linear Models. 3rd Edition, Chapman and Hall, New York.

[8]   Kuss, O. (2002) Global Goodness-of-Fit Tests in Logistic Regression with Sparse Data. Statistics in Medicine, 21, 3789-3801.

[9]   Hosmer, D.W., Hosmer, T. and Lemeshow, S. (1980) A Goodness-of-Fit Tests for the Multiple Logistic Regression Model. Communications in Statistics, 10, 1043-1069.

[10]   Lemeshow, S. and Hosmer, D.W. (1982) A Review of Goodness of Fit Statistics for Use in the Development of Logistic Regression Models. American Journal of Epidemiology, 115, 92-106.

[11]   Hosmer, D.W., Hosmer, T., Le Cessie, S. and Lemeshow, S. (1997) A Comparison of Goodness-of-Fit Tests for the Logistic Regression Model. Statistics in Medicine, 16, 965-980.

[12]   Brown, C.C. (1982) On A Goodness of Fit Test for the Logistic Model Based on Score Statistics. Communications in Statistics Theory and Methods, 10, 1097-1105.

[13]   McCullagh, P. and Nelder, J.A. (1989) Generalized Linear Models. 2nd Edition, Chapman and Hall, London.

[14]   Copas, J.B. (1989) Testing for Neglected Heterogeneity. Econometrica, 52, 865-872.

[15]   Cox, D.R. and Snell, E.J. (1989) Analysis of Binary Data. 2nd Edition, Chapman and Hall/CRC, London.

[16]   Nagelkerke, N.D. (1991) A Note on a General Definition of the Coefficient of Determination. Biometrika, 3, 691-692.

[17]   White, H. (1982) Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50, 1-25.

[18]   Lancaster, T. (1984) Covariance Matrix of the Information Matrix Test. Econometrica, 4, 1051-1053.

[19]   Newey, W.K. (1984) Maximum Likelihood Specification Testing and Conditional Moment Test. Econometrica, 53, 1047-1070.

[20]   Davidson, R. and Mackinnon, J.G. (1984) Convenient Specification Tests for Logit and Probit Models. Journal of Econometrics, 25, 241-262.

[21]   Orme, C. (1988) The Calculation of the Information Matrix Test for Binary Data Models. EconPapers, 56, 370-376.

[22]   Chesher, A. (1984) Unweighted Sum of Squares Test for Proportions. Econometrica, 38, 71-80.