OALibJ, Vol. 8, No. 2, February 2021
The Behaviour of the Dispersion Matrix of the Information Matrix Test under the Wrong Logistic Regression Model
Abstract: The Information Matrix Test (IMT) is one of the important global goodness-of-fit tests. The IMT provides a unified framework for specification and goodness-of-fit testing for a wide variety of distributions, multivariate or univariate, discrete or continuous. Many researchers have discussed the IMT in cases where the outcome is a continuous variable and reported that it behaves reasonably. This article considers the IMT as a goodness-of-fit test for the logistic regression model and investigates the behaviour of the statistic under the wrong model. Moreover, we examine the behaviour of the dispersion matrix under the wrong logistic model, derive an alternative formula for the variance and an empirical variance of the IMT, and examine both by simulation.

Estimation of Parameters

1. Introduction

The IMT is a test for general misspecification introduced by [1], who pointed out that the properties of the maximum likelihood estimator and the information matrix can be exploited to yield a family of useful tests for model misspecification. The idea of the IMT is to compare two different estimators of the information matrix to assess model fit. The IMT is based on the information matrix equality, which holds when the model is correctly specified. This equality implies the asymptotic equivalence of the Hessian and the outer-product (score) forms of Fisher's information matrix [2]. As [1] points out, the IMT is designed to detect the failure of this equality, and such a failure implies model misspecification. [3] discussed the information matrix test and showed that it is useful with binary data models. Many researchers, including [4], [5] and [6], have studied the asymptotic distribution of the IMT statistic and its dispersion matrix. The idea of the information matrix test is to compare

$$ -E\left(\frac{\partial^2 l}{\partial\theta\,\partial\theta^T}\right) \quad\text{and}\quad E\left(\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta^T}\right), $$

as these differ when the model is misspecified but not when the model is correct. [7] pointed out that the covariance matrix of the IMT of [1] can be estimated without computing analytic third derivatives of the density function. [4] discussed that the IMT is sensitive to non-normality and proposed a simple computational procedure that uses the Outer Product of the Gradient (OPG) covariance matrix estimator of the IMT statistic. However, [5] argue that such a procedure may give unreliable inferences, because the covariance matrix estimator is stochastic and uses high-order sample moments to estimate high-order population moments. [6] proposed a simple procedure for computing the test statistic for general binary data models that uses the ML covariance matrix estimator instead of the OPG estimator. Moreover, [8] computed and examined the IMT and found that it has good power for the logistic model.

Basic Idea of the IMT

Let us consider the density function $f(x_i, \theta)$ for an individual observation, with the data independent and identically distributed, so that

$$ \int f(x \mid \theta)\,dx = 1. $$

Let $l(\theta) = \log f(x, \theta)$ be the logarithm of a density function of $x$ that depends on $p$ parameters $\theta$; the log-likelihood function is then

$$ l_n(\theta) = \sum_{i=1}^{n} \log f(x_i, \theta). $$

As described above, the IMT compares two different matrices built from the expected first and second partial derivatives of $l_n(\theta)$. Differentiating $\int f(x \mid \theta)\,dx = 1$ with respect to $\theta$ gives

$$ \int \frac{\partial f(x \mid \theta)}{\partial \theta}\,dx = \int \frac{\partial \log f(x \mid \theta)}{\partial \theta}\, f(x \mid \theta)\,dx = E\left(\frac{\partial \log f(x \mid \theta)}{\partial \theta}\right) = 0. \qquad (1) $$

So the score has zero expectation:

$$ E\left(\frac{\partial l}{\partial\theta}\right) = 0. $$

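Differentiating (1) again, we get

$$ 0 = \int \frac{\partial^2 \log f(x \mid \theta)}{\partial\theta\,\partial\theta^T} f(x \mid \theta)\,dx + \int \frac{\partial \log f(x \mid \theta)}{\partial\theta}\frac{\partial \log f(x \mid \theta)}{\partial\theta^T} f(x \mid \theta)\,dx \qquad (2) $$

so

$$ E\left(\frac{\partial^2 l}{\partial\theta\,\partial\theta^T}\right) + E\left(\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta^T}\right) = 0. \qquad (3) $$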

When the model is misspecified, the left-hand side of (3) is not necessarily zero.
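As a numerical illustration of (3), the following Python sketch simulates a correctly specified logistic model with one covariate and averages the Hessian and outer-product terms at the true parameter value; under the correct model their sum should be close to the zero matrix. The parameter values (`alpha = -0.5`, `beta = 1.0`), the standard normal covariate and the function names are hypothetical choices made only for this illustration.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def information_equality_gap(alpha, beta, n, seed=1):
    """Monte Carlo estimate of E(Hessian) + E(score score^T) for the logistic
    model logit P(Y=1|x) = alpha + beta*x, evaluated at the true (alpha, beta).
    Returns a 2x2 nested list; under a correct model it should be close to 0."""
    rng = random.Random(seed)
    gap = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)              # hypothetical covariate distribution
        p = expit(alpha + beta * x)
        y = 1 if rng.random() < p else 0     # Y generated from the fitted model itself
        z = (1.0, x)
        for r in range(2):
            for c in range(2):
                hessian = -p * (1.0 - p) * z[r] * z[c]   # second-derivative term
                outer = (y - p) ** 2 * z[r] * z[c]       # outer product of the score
                gap[r][c] += (hessian + outer) / n
    return gap


gap = information_equality_gap(alpha=-0.5, beta=1.0, n=100_000)
print([[round(v, 3) for v in row] for row in gap])
```

Each entry of `gap` is a sample mean of terms with expectation zero, so its size shrinks roughly like $1/\sqrt{n}$.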

Asymptotic Distribution of $\hat\theta$

The asymptotic distribution of the estimated parameters and the behaviour of the MLE under the wrong model were discussed by [9] and investigated further by [10]. [11] discussed the estimation of the parameters of a given regression model. In the limit, for each value of the parameter vector $\theta$,

$$ n^{-1} l_n(\theta) \to \int g(Y)\log f(Y \mid \theta)\,dY = E_g\left(\log f(Y \mid \theta)\right), $$

where $g(Y)$ denotes the true model and $f(Y \mid \theta)$ is the fitted model. Consider also the Kullback-Leibler (KL) divergence from the true model to the approximating model, conditional on $X$, under the wrong model. In this case $\hat\theta \to \theta^*$, where $\theta^*$ is the least false (LF) value. The least false value $\theta^*$ minimizes the KL divergence, so it solves the first-order condition

$$ E_g\left(\frac{\partial \log f(Y, \theta)}{\partial\theta}\right) = \int g(Y)\frac{\partial \log f(Y, \theta)}{\partial\theta}\,dY = 0 \quad\text{at } \theta = \theta^*. $$

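We also define

$$ J = -E\left(\frac{\partial^2 l}{\partial\theta\,\partial\theta^T}\right) $$

and

$$ K = \operatorname{var}\left(\frac{\partial \log f(Y, \theta)}{\partial\theta}\right) = E\left(\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta^T}\right), $$

both evaluated at $\theta^*$. These matrices are identical when $g(Y) = f(Y \mid \theta^*)$ for all $Y$, that is, when the fitted model is correct. As explained in [11], the central limit theorem gives the convergence in distribution

$$ \sqrt{n}\,\bar U_n \to_d U \sim N_p(0, K), $$

where $\bar U_n = n^{-1}\sum_{i=1}^n u(Y_i, \theta^*)$ and $u(Y, \theta) = \partial \log f(Y, \theta)/\partial\theta$ is the score, which leads to

$$ \sqrt{n}(\hat\theta - \theta^*) \to_d J^{-1}U \sim N_p(0, J^{-1}KJ^{-1}). $$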

So the asymptotic distribution of the MLE under the null hypothesis $H_0$ (correct model) is

$$ \sqrt{n}(\hat\theta - \theta_0) \to_d N_p(0, J^{-1}), $$

where $\theta_0$ is the true value, and the asymptotic distribution of $\hat\theta$ under the alternative hypothesis $H_1$ (wrong model) is

$$ \sqrt{n}(\hat\theta - \theta^*) \to_d N_p(0, J^{-1}KJ^{-1}). $$

In particular, $J = K$ when the correct model is fitted (i.e. under $H_0$), and the sandwich form $J^{-1}KJ^{-1}$ then reduces to $J^{-1}$.
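For the logistic model, $J$ and $K$ can be estimated at the MLE by sample averages, $\hat J = n^{-1}\sum \pi_i(1-\pi_i)z_iz_i^T$ and $\hat K = n^{-1}\sum (y_i-\pi_i)^2 z_iz_i^T$ with $z_i = (1, x_i)^T$, giving the sandwich estimate $\hat J^{-1}\hat K\hat J^{-1}$. The Python sketch below is a minimal illustration under assumed settings: a Newton-Raphson fit written for this example, hypothetical parameter values and a standard normal covariate. It is not the code used for the results reported here.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def inv2(m):
    """Inverse of a 2x2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE of (alpha, beta1) in logit P(Y=1|x) = alpha + beta1*x."""
    theta = [0.0, 0.0]
    for _ in range(max_iter):
        score = [0.0, 0.0]
        info = [[0.0, 0.0], [0.0, 0.0]]
        for x, y in zip(xs, ys):
            p = expit(theta[0] + theta[1] * x)
            z = (1.0, x)
            for r in range(2):
                score[r] += (y - p) * z[r]
                for c in range(2):
                    info[r][c] += p * (1.0 - p) * z[r] * z[c]
        step = matmul(inv2(info), [[score[0]], [score[1]]])
        theta = [theta[0] + step[0][0], theta[1] + step[1][0]]
        if max(abs(step[0][0]), abs(step[1][0])) < tol:
            break
    return theta


def sandwich(xs, ys, theta):
    """Return (J_hat, K_hat, J_hat^{-1} K_hat J_hat^{-1}) evaluated at theta."""
    n = len(xs)
    J = [[0.0, 0.0], [0.0, 0.0]]
    K = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = expit(theta[0] + theta[1] * x)
        z = (1.0, x)
        for r in range(2):
            for c in range(2):
                J[r][c] += p * (1.0 - p) * z[r] * z[c] / n
                K[r][c] += (y - p) ** 2 * z[r] * z[c] / n
    J_inv = inv2(J)
    return J, K, matmul(matmul(J_inv, K), J_inv)


# Correctly specified example (hypothetical values): J_hat and K_hat should be close.
rng = random.Random(2)
xs = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
ys = [1 if rng.random() < expit(-0.5 + 1.0 * x) else 0 for x in xs]
theta_hat = fit_logistic(xs, ys)
J_hat, K_hat, V_sand = sandwich(xs, ys, theta_hat)
print(theta_hat)
print(J_hat, K_hat, sep="\n")
```

Because the Hessian of the logistic log-likelihood does not involve $y$, $\hat J$ depends on the data only through the fitted values, whereas $\hat K$ uses the squared residuals.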

2. The IMT under Missing Covariates for Logistic Regression Model

In this part, we apply the IMT procedure to a logistic regression model with missing covariates. Suppose $X_i$ is a $p$-dimensional vector of covariates drawn from a normal distribution and $Y_i$ is binary with

$$ P(Y_i = 1 \mid X_i) = \operatorname{expit}(\alpha + \beta^T X_i). \qquad (4) $$

In the following we treat the simple case where the fitted model is

$$ P(Y_i = 1 \mid X_i) = \operatorname{expit}(\alpha + \beta_1 X_{1i}) \qquad (5) $$

for a scalar $X_1$, and the true model is

$$ P(Y_i = 1 \mid X_i) = \operatorname{expit}(\alpha + \beta_1 X_{1i} + \beta_2 X_{2i}), \qquad (6) $$

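where $X_2$ is also a scalar. The log-likelihood contribution of the $i$th observation $(Y_i, X_i)$ is

$$ l(Y_i, X_i) = Y_i(\alpha + \beta^T X_i) - \log\left(1 + \exp(\alpha + \beta^T X_i)\right), \qquad (7) $$

and so, for the fitted model (5) with $\pi_i = \operatorname{expit}(\alpha + \beta_1 X_{1i})$,

$$ \frac{\partial l_i}{\partial\alpha} = Y_i - \pi_i, \qquad \frac{\partial l_i}{\partial\beta_1} = (Y_i - \pi_i)X_{1i}. $$

Note that we only consider fitting the model with $X_1$, even if the true model also includes $X_2$ (i.e. $\beta_2 \neq 0$). Writing $X_i$ for $X_{1i}$ and $\theta = (\alpha, \beta_1)^T$, we get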

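$$ \frac{\partial^2 l_i}{\partial\theta\,\partial\theta^T} = -\pi_i(1-\pi_i)\begin{bmatrix} 1 & X_i \\ X_i & X_i^2 \end{bmatrix}. $$

Also,

$$ \frac{\partial l_i}{\partial\theta}\frac{\partial l_i}{\partial\theta^T} = (Y_i-\pi_i)^2\begin{bmatrix} 1 & X_i \\ X_i & X_i^2 \end{bmatrix}. $$

Using

$$ (Y_i-\pi_i)^2 - \pi_i(1-\pi_i) = (Y_i-\pi_i)(1-2\pi_i), $$

which holds because $Y_i^2 = Y_i$, and collecting the distinct elements of the sum of these two matrices, we get

$$ d_g(Y_i, \theta) = (Y_i-\pi_i)(1-2\pi_i)\begin{bmatrix} 1 \\ X_i \\ X_i^2 \end{bmatrix}. \qquad (8) $$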
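Equation (8) can be checked directly: for any $y \in \{0, 1\}$ and any $x$, the three distinct elements of the sum of the Hessian and the outer product of the score, taken in the order $(1,1), (1,2), (2,2)$, should equal $(y-\pi)(1-2\pi)(1, x, x^2)$. The following Python sketch performs this check at a few hypothetical parameter and covariate values; the function names are illustrative only.

```python
import math


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def d_indicator(y, x, alpha, beta1):
    """IMT indicator vector (8) for one observation of the fitted model (5)."""
    p = expit(alpha + beta1 * x)
    w = (y - p) * (1.0 - 2.0 * p)
    return [w, w * x, w * x * x]


def d_from_definition(y, x, alpha, beta1):
    """Same quantity built from Hessian + outer product of the score,
    keeping the distinct elements (1,1), (1,2), (2,2)."""
    p = expit(alpha + beta1 * x)
    z = (1.0, x)
    m = [[(-p * (1.0 - p) + (y - p) ** 2) * z[r] * z[c] for c in range(2)]
         for r in range(2)]
    return [m[0][0], m[0][1], m[1][1]]


for y in (0, 1):
    for x in (-1.5, 0.0, 2.0):
        a = d_indicator(y, x, alpha=0.3, beta1=-0.8)
        b = d_from_definition(y, x, alpha=0.3, beta1=-0.8)
        print(y, x, [round(v, 6) for v in a], [round(v, 6) for v in b])
```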

3. An Alternative Formula for the Variance

In this part we derive a formula for the variance of the $d_g$ statistic that remains valid when the model is misspecified. To perform the IMT we need the mean and variance of

$$ T = \frac{1}{\sqrt{n}}\sum_{i=1}^n d_{gi}. $$

Under $H_0$, $E(d_{gi}) = 0$, so the IMT statistic can be written as

$$ T^T \operatorname{var}(T)^{-1} T, $$

which has an asymptotic $\chi^2$-distribution on $\operatorname{rank}(\operatorname{var}(T))$ degrees of freedom, since $T$ is asymptotically Normal. However, the test statistic has to be evaluated at the MLE $\hat\theta$, and this introduces a complication. The MLE $\hat\theta$ is the solution to

$$ S = \frac{1}{\sqrt{n}}\frac{\partial l}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{\partial l_i}{\partial\theta} = \frac{1}{\sqrt{n}}\sum_{i=1}^n (y_i - \pi_i)\begin{bmatrix} 1 \\ X_i \end{bmatrix} = 0. $$

The expression for $T$ is

$$ T = \frac{1}{\sqrt{n}}\sum_{i=1}^n (y_i-\pi_i)(1-2\pi_i)\begin{bmatrix} 1 \\ x_i \\ x_i^2 \end{bmatrix}, $$

and this is clearly going to be highly correlated with $S$. Therefore, the appropriate variance for the IMT is $\operatorname{var}(T \mid S = 0)$. As $T$ and $S$ are sums of independent elements, the central limit theorem implies that $(T^T, S^T)^T$ is asymptotically Normal, and so we can use

$$ \operatorname{var}(T \mid S = 0) = \operatorname{var}(T) - \operatorname{cov}(T, S)\operatorname{var}(S)^{-1}\operatorname{cov}(T, S)^T. \qquad (9) $$

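To work out $\operatorname{var}(T \mid S = 0)$, note that the $d_{gi}$ are independent and identically distributed, so

$$ \operatorname{var}(T) = \operatorname{var}\left(\frac{d_{g1} + d_{g2} + \cdots + d_{gn}}{\sqrt{n}}\right) = \operatorname{var}(d_{g1}), $$

and similarly

$$ \operatorname{var}(S) = \operatorname{var}\left(\frac{\partial l_1}{\partial\theta}\right), \qquad \operatorname{cov}(T, S) = \operatorname{cov}\left(d_{g1}, \frac{\partial l_1}{\partial\theta}\right). $$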
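Formula (9) is the usual conditional covariance of a jointly Normal vector. A minimal Python sketch of it for the $3 \times 3$ and $2 \times 2$ case used here, with matrices stored as nested lists, is given below; the function name and the toy input matrices are hypothetical.

```python
def conditional_variance(var_T, cov_TS, var_S):
    """var(T | S = 0) = var(T) - cov(T,S) var(S)^{-1} cov(T,S)^T, as in (9).

    var_T: 3x3, cov_TS: 3x2, var_S: 2x2 (nested lists of floats)."""
    (a, b), (c, d) = var_S
    det = a * d - b * c
    inv_S = [[d / det, -b / det], [-c / det, a / det]]
    # left = cov(T,S) var(S)^{-1}, a 3x2 matrix
    left = [[sum(cov_TS[i][k] * inv_S[k][j] for k in range(2)) for j in range(2)]
            for i in range(3)]
    # correction = left cov(T,S)^T, a 3x3 matrix
    correction = [[sum(left[i][k] * cov_TS[j][k] for k in range(2)) for j in range(3)]
                  for i in range(3)]
    return [[var_T[i][j] - correction[i][j] for j in range(3)] for i in range(3)]


V = conditional_variance(
    var_T=[[3.0, 0.0, 0.0], [0.0, 3.0, 0.0], [0.0, 0.0, 3.0]],
    cov_TS=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    var_S=[[2.0, 0.0], [0.0, 1.0]],
)
print(V)
```

With these toy inputs the result is $\operatorname{diag}(2.5, 2, 3)$: the two components of $T$ that are correlated with $S$ lose variance once we condition on $S = 0$, while the third component, uncorrelated with $S$, keeps its variance.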

3.1. The Variance of IMT under Missing Covariates for Logistic Regression Model

We now need to find expressions for $\operatorname{var}(d_{g1})$, $\operatorname{var}(\partial l_1/\partial\theta)$ and $\operatorname{cov}(d_{g1}, \partial l_1/\partial\theta)$.

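We already have

$$ d_g = (y_i-\pi_i)(1-2\pi_i)\begin{bmatrix} 1 \\ x_i \\ x_i^2 \end{bmatrix} $$

and

$$ \frac{\partial l_i}{\partial\theta} = (y_i-\pi_i)\begin{bmatrix} 1 \\ x_i \end{bmatrix}, $$

so the variance is

$$ \operatorname{var}(d_g) = E(d_g d_g^T) - E(d_g)E(d_g^T), \qquad (10) $$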

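and we have

$$ d_g d_g^T = (y-\pi)^2(1-2\pi)^2\begin{bmatrix} 1 & x_i & x_i^2 \\ x_i & x_i^2 & x_i^3 \\ x_i^2 & x_i^3 & x_i^4 \end{bmatrix}. \qquad (11) $$

Taking expectations, first with respect to $Y$ given $X$ and then over $X$, we obtain

$$ E(d_{g1}) = E_X\left[(\pi_t - \pi)(1-2\pi)\begin{bmatrix} 1 \\ X \\ X^2 \end{bmatrix}\right] \qquad (12) $$

and

$$ E(d_{g1}d_{g1}^T) = E_X\left[\left(\pi_t(1-2\pi) + \pi^2\right)(1-2\pi)^2\begin{bmatrix} 1 & X & X^2 \\ X & X^2 & X^3 \\ X^2 & X^3 & X^4 \end{bmatrix}\right]. \qquad (13) $$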

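Now we need to compute $\operatorname{cov}(d_g, \partial l/\partial\theta)$. In fact $E(\partial l/\partial\theta) = 0$ not only when the model is correct but also when it is evaluated at the least false value $\theta^*$ under the wrong model, so in this case

$$ \operatorname{cov}\left(d_{g1}, \frac{\partial l_1}{\partial\theta}\right) = E\left(d_{g1}\frac{\partial l_1}{\partial\theta^T}\right), $$

and we have

$$ d_{g1}\frac{\partial l_1}{\partial\theta^T} = (y-\pi)(1-2\pi)\begin{bmatrix} 1 \\ x_i \\ x_i^2 \end{bmatrix}(y-\pi)\begin{bmatrix} 1 & x_i \end{bmatrix} = (y-\pi)^2(1-2\pi)\begin{bmatrix} 1 & x_i \\ x_i & x_i^2 \\ x_i^2 & x_i^3 \end{bmatrix}, $$

then

$$ E\left(d_{g1}\frac{\partial l_1}{\partial\theta^T}\right) = E_X\left[\left(\pi_t(1-2\pi) + \pi^2\right)(1-2\pi)\begin{bmatrix} 1 & X \\ X & X^2 \\ X^2 & X^3 \end{bmatrix}\right]. \qquad (14) $$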

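Now we work out $\operatorname{var}(\partial l/\partial\theta)$. As before, since $E(\partial l/\partial\theta) = 0$,

$$ \operatorname{var}\left(\frac{\partial l_1}{\partial\theta}\right) = E\left(\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta^T}\right) = E_X E_{Y \mid X}\left[(Y-\pi)^2\begin{bmatrix} 1 & X \\ X & X^2 \end{bmatrix}\right], $$

and note that, because $Y^2 = Y$,

$$ E_{Y \mid X}(Y-\pi)^2 = E_{Y \mid X}\left(Y(1-2\pi) + \pi^2\right) = \pi_t(1-2\pi) + \pi^2, $$

where $\pi_t$ is $E(Y \mid X)$ under the true model. So

$$ E\left(\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta^T}\right) = E_X\left[\left(\pi_t(1-2\pi) + \pi^2\right)\begin{bmatrix} 1 & X \\ X & X^2 \end{bmatrix}\right]. \qquad (15) $$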

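Hence the required variance (9) is

$$ V = E(d_g d_g^T) - E(d_g)E(d_g^T) - E\left(d_g\frac{\partial l}{\partial\theta^T}\right)E\left(\frac{\partial l}{\partial\theta}\frac{\partial l}{\partial\theta^T}\right)^{-1}E\left(\frac{\partial l}{\partial\theta}d_g^T\right), \qquad (16) $$

and each component is given by (12), (13), (14) and (15). These components must be evaluated by simulation.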
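The components (12) to (15) are all averages over $X$ of simple functions of $X$, $\pi_t$ and $\pi$, so (16) can be approximated by replacing each $E_X$ with a sample mean over simulated covariate values. The Python sketch below does this, assuming the caller supplies the draws of $X$ together with $\pi_t(X)$ and $\pi(X)$ (the latter at the least false value); the function name and interface are hypothetical.

```python
def theoretical_V(xs, pi_t, pi):
    """Approximate (16) by averaging (12)-(15) over draws of X.

    xs, pi_t, pi: equal-length lists of covariate draws, true probabilities
    pi_t(X) and fitted probabilities pi(X). Returns a 3x3 nested list."""
    n = len(xs)
    Ed = [0.0] * 3                          # (12)
    Edd = [[0.0] * 3 for _ in range(3)]     # (13)
    Edl = [[0.0] * 2 for _ in range(3)]     # (14)
    Ell = [[0.0] * 2 for _ in range(2)]     # (15)
    for x, pt, p in zip(xs, pi_t, pi):
        m2 = pt * (1 - 2 * p) + p * p       # E_{Y|X}(Y - pi)^2
        b = 1 - 2 * p
        v = (1.0, x, x * x)                 # [1, X, X^2]
        z = (1.0, x)                        # [1, X]
        for i in range(3):
            Ed[i] += (pt - p) * b * v[i] / n
            for j in range(3):
                Edd[i][j] += m2 * b * b * v[i] * v[j] / n
            for j in range(2):
                Edl[i][j] += m2 * b * v[i] * z[j] / n
        for i in range(2):
            for j in range(2):
                Ell[i][j] += m2 * z[i] * z[j] / n
    (a, c), (c2, d) = Ell
    det = a * d - c * c2
    inv = [[d / det, -c / det], [-c2 / det, a / det]]
    left = [[sum(Edl[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(3)]
    return [[Edd[i][j] - Ed[i] * Ed[j] - sum(left[i][k] * Edl[j][k] for k in range(2))
             for j in range(3)] for i in range(3)]


V_two_point = theoretical_V(xs=[-1.0, 1.0], pi_t=[0.25, 0.25], pi=[0.25, 0.25])
print(V_two_point)
```

In this deliberately degenerate two-point example $X^2 = 1$ for every draw, so the third element of $d$ duplicates the first and $d$ is an exact linear function of the score; the conditional variance is then the zero matrix, an extreme form of the singularity discussed in Section 3.2.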

3.2. The Dispersion Matrix under Wrong Model

Some elements of the IMT indicator vector may be linear combinations of others, which leads to singularity of the estimated covariance matrix; this point was discussed by [1] and [12]. We are interested in computing $\operatorname{var}(T \mid S = 0)$ even when the wrong model has been fitted, and we compute each component of this variance separately. From Section 3.1, we need to evaluate, e.g.

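$$ E(d) = E_X\left((\pi_t - \pi)(1-2\pi)\begin{bmatrix} 1 \\ X \\ X^2 \end{bmatrix}\right) $$

and also

$$ E(dd^T) = E_X\left(\left[\pi_t(1-2\pi) + \pi^2\right](1-2\pi)^2\begin{bmatrix} 1 & X & X^2 \\ X & X^2 & X^3 \\ X^2 & X^3 & X^4 \end{bmatrix}\right). $$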

This cannot be done analytically, so we simulate 5000 values of $X$ and replace $E(d)$ by the mean of the 5000 resulting values. In evaluating $\pi_t$ we use the true parameter values $\alpha_t$, $\beta_{t1}$ and $\beta_{t2}$. What do we use for $\pi$? We need to evaluate $\pi(\alpha, \beta_1)$ at the least false values $\alpha^*$ and $\beta_1^*$ of $\alpha$ and $\beta_1$. So, for example, the first element of $E(d)$ is found by simulation from

$$ E_X\left[\left(\operatorname{expit}(\alpha_t + \beta_{t1}X_1 + \beta_{t2}X_2) - \operatorname{expit}(\alpha^* + \beta_1^*X_1)\right)\left(1 - 2\operatorname{expit}(\alpha^* + \beta_1^*X_1)\right)\right], $$

where

$$ \alpha^* = \frac{\alpha_t + \beta_{t2}(\mu_2 - \rho\mu_1)}{\sqrt{1 + k^2\beta_{t2}^2\sigma^2(1-\rho^2)}}, \qquad (17) $$

$$ \beta_1^* = \frac{\beta_{t1} + \rho\beta_{t2}}{\sqrt{1 + k^2\beta_{t2}^2\sigma^2(1-\rho^2)}}, \qquad (18) $$

and $X$ is drawn from a bivariate normal distribution with mean $\mu = (\mu_1, \mu_2)$, correlation $\rho$ and common variance $\sigma^2 = \sigma_1^2 = \sigma_2^2$. The formulae for the least false values $\alpha^*$ and $\beta_1^*$ were discussed and calculated in [10].
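A minimal Python sketch of the simulation step described in Section 3.2 is given below: it evaluates the least false values (17) and (18) and then estimates the first element of $E(d)$ from 5000 simulated covariate pairs. The constant $k$ is not specified in the text; the sketch assumes $k = 16\sqrt{3}/(15\pi) \approx 0.588$, the constant commonly used in the probit approximation to the logistic function, and exposes it as a parameter. The parameter values in the example and all function names are hypothetical.

```python
import math
import random

# Assumed value of k (not stated in the text): probit-approximation constant.
K_ASSUMED = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def least_false(alpha_t, beta_t1, beta_t2, mu1, mu2, rho, sigma2, k=K_ASSUMED):
    """Approximate least false values (alpha*, beta1*) from (17)-(18)."""
    scale = math.sqrt(1.0 + k ** 2 * beta_t2 ** 2 * sigma2 * (1.0 - rho ** 2))
    alpha_star = (alpha_t + beta_t2 * (mu2 - rho * mu1)) / scale
    beta1_star = (beta_t1 + rho * beta_t2) / scale
    return alpha_star, beta1_star


def first_element_Ed(alpha_t, beta_t1, beta_t2, mu1=0.0, mu2=0.0, rho=0.1,
                     sigma2=2.0, m=5000, seed=1):
    """Monte Carlo estimate of the first element of E(d), using m draws of X."""
    a_s, b_s = least_false(alpha_t, beta_t1, beta_t2, mu1, mu2, rho, sigma2)
    rng = random.Random(seed)
    sd = math.sqrt(sigma2)
    total = 0.0
    for _ in range(m):
        # Bivariate normal draw with common variance sigma2 and correlation rho.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = mu1 + sd * z1
        x2 = mu2 + sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pi_t = expit(alpha_t + beta_t1 * x1 + beta_t2 * x2)
        pi = expit(a_s + b_s * x1)
        total += (pi_t - pi) * (1.0 - 2.0 * pi)
    return total / m


print(least_false(0.0, 1.0, 1.0, mu1=0.0, mu2=0.0, rho=0.1, sigma2=2.0))
print(first_element_Ed(0.0, 1.0, 1.0))
```

When $\beta_{t2} = 0$ the formulae return $\alpha^* = \alpha_t$ and $\beta_1^* = \beta_{t1}$, so $\pi_t = \pi$ for every draw and the estimate is exactly zero; the other elements of $E(d)$ and $E(dd^T)$ follow the same pattern with the extra factors of $X$.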

4. Empirical Variance of IMT

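The expression in (16) is the variance $V$ of $d$ at $\hat\theta$, but in practice we need an estimate $\hat V$. Given a sample $\{(y_i, x_{i1}) \mid i = 1, \ldots, n\}$, how can we estimate $V$ consistently? One candidate is to compute

$$ d_i = (y_i - \hat\pi_i)(1 - 2\hat\pi_i)\begin{bmatrix} 1 \\ x_i \\ x_i^2 \end{bmatrix}, \quad i = 1, \ldots, n $$

and

$$ \frac{\partial l_i}{\partial\theta} = (y_i - \hat\pi_i)\begin{bmatrix} 1 \\ x_i \end{bmatrix}, \quad i = 1, \ldots, n, $$

where $\hat\pi_i$ is the fitted value from the model with just $x_1$. Now compute

$$ \hat W_n = \frac{1}{n}\sum_{i=1}^n d_i d_i^T - \left(\frac{1}{n}\sum_{i=1}^n d_i\right)\left(\frac{1}{n}\sum_{i=1}^n d_i^T\right), $$

$$ \hat B_n = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\pi_i)^2\begin{bmatrix} 1 & x_i \\ x_i & x_i^2 \end{bmatrix}, $$

$$ \hat C_n = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\pi_i)^2(1 - 2\hat\pi_i)\begin{bmatrix} 1 & x_i \\ x_i & x_i^2 \\ x_i^2 & x_i^3 \end{bmatrix}. $$

Then use

$$ \hat V = \hat W_n - \hat C_n\hat B_n^{-1}\hat C_n^T \qquad (19) $$

as an estimate of $V$; we assess this estimate by simulation.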
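The estimator (19) needs only the fitted values from the model with $x_1$ alone. The following Python sketch computes $\hat W_n$, $\hat B_n$, $\hat C_n$ and $\hat V$ from one simulated sample; it assumes a plain Newton-Raphson logistic fit written for this example, and the data-generating values ($\alpha_t = 0$, $\beta_{t1} = \beta_{t2} = 1$, $\rho = 0.1$, $\sigma_1^2 = \sigma_2^2 = 2$, $n = 500$) and function names are hypothetical.

```python
import math
import random


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE of (alpha, beta1) for the fitted model with x1 only."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(a + b * x)
            w = p * (1.0 - p)
            s0, s1 = s0 + (y - p), s1 + (y - p) * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * s0 - h01 * s1) / det     # Newton step: info^{-1} score
        db = (h00 * s1 - h01 * s0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


def empirical_V(xs, ys):
    """Estimate V by (19): V_hat = W_n - C_n B_n^{-1} C_n^T."""
    a, b = fit_logistic(xs, ys)
    n = len(xs)
    dbar = [0.0] * 3
    W = [[0.0] * 3 for _ in range(3)]
    B = [[0.0] * 2 for _ in range(2)]
    C = [[0.0] * 2 for _ in range(3)]
    for x, y in zip(xs, ys):
        p = expit(a + b * x)
        r2 = (y - p) ** 2
        v = (1.0, x, x * x)
        z = (1.0, x)
        d = [(y - p) * (1.0 - 2.0 * p) * vi for vi in v]
        for i in range(3):
            dbar[i] += d[i] / n
            for j in range(3):
                W[i][j] += d[i] * d[j] / n
            for j in range(2):
                C[i][j] += r2 * (1.0 - 2.0 * p) * v[i] * z[j] / n
        for i in range(2):
            for j in range(2):
                B[i][j] += r2 * z[i] * z[j] / n
    W = [[W[i][j] - dbar[i] * dbar[j] for j in range(3)] for i in range(3)]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    B_inv = [[B[1][1] / det, -B[0][1] / det], [-B[1][0] / det, B[0][0] / det]]
    CB = [[sum(C[i][k] * B_inv[k][j] for k in range(2)) for j in range(2)]
          for i in range(3)]
    return [[W[i][j] - sum(CB[i][k] * C[j][k] for k in range(2)) for j in range(3)]
            for i in range(3)]


# Hypothetical misspecified example: the true model also contains x2.
rng = random.Random(3)
rho, sd = 0.1, math.sqrt(2.0)
xs, ys = [], []
for _ in range(500):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    x1 = sd * z1
    x2 = sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    xs.append(x1)
    ys.append(1 if rng.random() < expit(0.0 + 1.0 * x1 + 1.0 * x2) else 0)
V_hat = empirical_V(xs, ys)
print([round(V_hat[i][i], 5) for i in range(3)])
```

Since the fitted scores average to zero at the MLE, $\hat V$ is, up to rounding, a Schur complement of the sample covariance of $(d_i, \partial l_i/\partial\theta)$, so its diagonal elements are non-negative.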

5. Simulation Study

This simulation examines the correctness of the dispersion matrix $V$ in the forms (16) and (19). To this end, we consider a logistic regression model with two covariates drawn from a bivariate normal distribution with mean zero and covariance matrix $\Sigma$; the true model is

$$ \pi_t = \operatorname{expit}(\alpha_t + \beta_{t1}x_1 + \beta_{t2}x_2) $$

and the fitted model is

$$ \pi = \operatorname{expit}(\alpha + \beta_1 x_1). $$

・ We consider two cases of the logistic model:

・ The fitted model is the true logistic model (i.e. $\beta_{t2} = 0$).

・ The fitted model is misspecified (i.e. $\beta_{t2} \neq 0$).

・ We use variances $\sigma_1^2 = \sigma_2^2 = 2$ and correlation $\rho = 0.1$.

・ We choose several different combinations of the parameters $\alpha_t$, $\beta_{t1}$ and $\beta_{t2}$ to calculate $\pi_t$.

・ We compute the least false values $\alpha^*$ and $\beta_1^*$ from formulae (17) and (18) to calculate $\pi$.

・ We compute the true variance by simulating $d_i$ and taking $V_{tr} = \operatorname{var}(\sqrt{n}\,\bar d)$.

・ We compute the theoretical variance $V_T = \operatorname{var}(d)$ at the least false value, calculating $E(d_1)$ and $E(d_1 d_1^T)$ as described in Section 3.2.

・ For each simulation we compute the empirical variance $V_E$ and take its mean over the simulations.

・ We compare the diagonal elements of the dispersion matrices $V_E$ and $V_T$ with those of $V_{tr}$.

・ We use sample sizes $n = 500$ and $1000$, with $N = 5000$ simulations.
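The true variance $V_{tr}$ in the list above is the Monte Carlo variance of $\sqrt{n}\,\bar d$ across replications. A reduced-size Python sketch of that step is shown below. It assumes a Newton-Raphson fit as a stand-in for whatever fitting routine was actually used, and it uses hypothetical parameter values and far fewer replications ($N = 100$, $n = 200$) than the study so that it runs quickly; all function names are illustrative.

```python
import math
import random
import statistics


def expit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, max_iter=50, tol=1e-10):
    """Newton-Raphson MLE of (alpha, beta1) for the fitted model with x1 only."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = expit(a + b * x)
            w = p * (1.0 - p)
            s0, s1 = s0 + (y - p), s1 + (y - p) * x
            h00, h01, h11 = h00 + w, h01 + w * x, h11 + w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * s0 - h01 * s1) / det
        db = (h00 * s1 - h01 * s0) / det
        a, b = a + da, b + db
        if max(abs(da), abs(db)) < tol:
            break
    return a, b


def simulate_dbar(n, alpha_t, beta_t1, beta_t2, rho, sigma2, rng):
    """One replication: simulate (x1, x2, y), fit the x1-only model, return d-bar."""
    sd = math.sqrt(sigma2)
    xs, ys = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x1 = sd * z1
        x2 = sd * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        p_true = expit(alpha_t + beta_t1 * x1 + beta_t2 * x2)
        xs.append(x1)
        ys.append(1 if rng.random() < p_true else 0)
    a, b = fit_logistic(xs, ys)
    dbar = [0.0, 0.0, 0.0]
    for x, y in zip(xs, ys):
        p = expit(a + b * x)
        w = (y - p) * (1.0 - 2.0 * p)
        for i, vi in enumerate((1.0, x, x * x)):
            dbar[i] += w * vi / n
    return dbar


def true_variance_diag(n, N, alpha_t, beta_t1, beta_t2, rho=0.1, sigma2=2.0, seed=4):
    """Diagonal of V_tr = var(sqrt(n) * d-bar) estimated from N replications."""
    rng = random.Random(seed)
    draws = [simulate_dbar(n, alpha_t, beta_t1, beta_t2, rho, sigma2, rng)
             for _ in range(N)]
    return [statistics.variance([math.sqrt(n) * d[i] for d in draws]) for i in range(3)]


V_tr_diag = true_variance_diag(n=200, N=100, alpha_t=0.0, beta_t1=1.0, beta_t2=1.0)
print([round(v, 5) for v in V_tr_diag])
```

Dividing the diagonal of the mean of $\hat V$ over replications (or of $V_T$) by `V_tr_diag` gives ratios of the kind $R_E$ and $R_T$ reported in the tables.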

6. Results and Discussion

The results are reported in tables showing the diagonal elements of the variance matrix: $V_E$ denotes the empirical variance, $V_T$ the theoretical variance and $V_{tr}$ the true variance. The true parameters appear as $\alpha_t$, $\beta_{t1}$ and $\beta_{t2}$; $Rn_E$ and $Rn_T$ denote the rank of the empirical and theoretical covariance matrices, respectively. The ratios $R_E$ and $R_T$ are $V_E/V_{tr}$ and $V_T/V_{tr}$, respectively. $S.D(\pi_t)$ denotes the standard deviation of $\pi_t$ over a sample, where $\pi_t$ is the success probability under the true model. Because the fitted model contains an intercept and the single covariate $x_1$, $d$ has three elements and its dispersion matrix is $3 \times 3$.

First, we consider the results under the true logistic model. Table 1 shows the simulated diagonal elements of $V$, in both the empirical and theoretical forms, compared with the true variance, using $\rho = 0.1$, $\sigma_1^2 = \sigma_2^2 = 2$ and sample size $n = 500$. Table 2 reports the results for sample size $n = 1000$ with the same variances $\sigma_1^2 = \sigma_2^2 = 2$. All diagonal elements are small for both sample sizes, and the first element is much closer to zero than the rest. In most cases the ratios are reasonable, meaning that the theoretical and empirical variances are close to the true value. Some ratios look slightly anomalous, mostly for $n = 500$; this may be caused by a small standard deviation of $\pi_t$, $S.D(\pi_t)$, and otherwise the ratio is close to one. For $n = 1000$ the results show almost the same pattern, with ratios close to one, which indicates that the variance formula works well. In a few cases a small value of $S.D(\pi_t)$ affected the ratio, and the first two elements were the most sensitive. Overall, the results are reasonable enough to conclude that the alternative variance formula works well, while the first two elements, which tend to zero, remain the most sensitive.

Second, we consider the results when the logistic model with a missing covariate has been fitted, that is, when the variance of the IMT is computed under $H_1$ using the least false values. Table 3 shows the results for sample size $n = 500$ and Table 4 those for $n = 1000$. In general, the ratios behave as in the case $\beta_{t2} = 0$: for both sample sizes the ratios are reasonable and close to one. A few cases show a low ratio, for the reason discussed above, namely a small value of $S.D(\pi_t)$.

Table 1. Simulation results of the variance ($V_{tr}$), ($V_E$) and ($V_T$) for the fitted true model, with sample size $n = 500$ and $\sigma_1^2 = \sigma_2^2 = 2$.

Table 2. Simulation results of the variance ($V_{tr}$), ($V_E$) and ($V_T$) for the fitted true model, with sample size $n = 1000$ and $\sigma_1^2 = \sigma_2^2 = 2$.

Table 3. Simulation results of the variance ($V_{tr}$), ($V_E$) and ($V_T$) for the fitted model with a missing covariate, with sample size $n = 500$ and $\sigma_1^2 = \sigma_2^2 = 2$.

Table 4. Simulation results of the variance ($V_{tr}$), ($V_E$) and ($V_T$) for the fitted model with a missing covariate, with sample size $n = 1000$ and $\sigma_1^2 = \sigma_2^2 = 2$.

7. Conclusion

This paper investigated the behaviour of the IMT and computed its covariance matrix under the wrong logistic regression model. The results show that the alternative variance formula performs reasonably under both the true and the missing-covariate models. The final form of the variance of the IMT clearly depends on $E(d)$. The first two elements of $E(d)$ may be quite close to zero, both under the true model and when the least false value is used, because the score equations give $E(\pi_t - \pi) = E((\pi_t - \pi)X) = 0$. These elements can therefore lead to singularity of the estimated covariance matrix and affect the behaviour of the dispersion matrix of the IMT.

Acknowledgements

I am very grateful to Professor J. N. S. Matthews, School of Mathematics and Statistics, Newcastle University, for academic support, and to Dr. Hamza M. A. Boauod (FCOPHTH) (SA), Consultant Ophthalmologist, Eye Department, Klerksdorp Hospital, South Africa, for his financial support. I also thank the referees, associate editor and joint editor for their helpful comments and additional references.

Cite this paper: Badi, N.H.S. (2021) The Behaviour of the Dispersion Matrix of the Information Matrix Test under the Wrong Logistic Regression Model. Open Access Library Journal, 8, 1-12. doi: 10.4236/oalib.1107183.
References

[1] White, H. (1982) Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50, 1-25. https://doi.org/10.2307/1912526

[2] Hausman, J.A. (1978) Specification Tests in Econometrics. Econometrica, 46, 1251-1271. https://doi.org/10.2307/1913827

[3] Chesher, A. (1984) Testing for Neglected Heterogeneity. Econometrica, 52, 865-872. https://doi.org/10.2307/1911188

[4] Newey, W.K. (1985) Maximum Likelihood Specification Testing and Conditional Moment Tests. Econometrica, 53, 1047-1070. https://doi.org/10.2307/1911011

[5] Davidson, R. and MacKinnon, J.G. (1984) Convenient Specification Tests for Logit and Probit Models. Journal of Econometrics, 25, 241-262. https://doi.org/10.1016/0304-4076(84)90001-0

[6] Orme, C. (1988) The Calculation of the Information Matrix Test for Binary Data Models. The Manchester School, 56, 370-376. https://doi.org/10.1111/j.1467-9957.1988.tb01339.x

[7] Lancaster, T. (1984) Covariance Matrix of the Information Matrix Test. Econometrica, 52, 1051-1053. https://doi.org/10.2307/1911198

[8] Kuss, O. (2002) Global Goodness-of-Fit Tests in Logistic Regression with Sparse Data. Statistics in Medicine, 21, 3789-3801. https://doi.org/10.1002/sim.1421

[9] Matthews, J.N.S. and Badi, N.H. (2015) Inconsistent Treatment Estimates from Mis-Specified Logistic Regression Analyses of Randomized Trials. Statistics in Medicine, 34, 2681-2694. https://doi.org/10.1002/sim.6508

[10] Badi, N.H.S. (2017) Properties of the Maximum Likelihood Estimates and Bias Reduction for Logistic Regression Model. Open Access Library Journal, 4, e3625.

[11] Claeskens, G. and Hjort, N.L. (2008) Model Selection and Model Averaging. Cambridge University Press, Cambridge.

[12] Lin, D.Y. and Wei, L.J. (1991) Goodness-of-Fit Tests for the General Cox Regression Model. Statistica Sinica, 1, 1-17.

 
 