Logistic regression methods are often used in the statistical analysis of dichotomous outcome variables; logistic regression is the most commonly applied procedure for describing the relationship between a binary outcome variable and a set of covariates. The standard method of estimating the logistic regression parameters is maximum likelihood (ML). In a very general sense, the ML method yields values of the unknown parameters that maximize the probability of the observed data. A common problem with the ML method is failure of convergence, which occurs when the maximum likelihood estimates (MLE) do not exist. Assessing the behaviour of the MLE for the logistic regression model is important, as the logistic model is widely used in medical statistics. Much of the literature on the logistic regression model addresses the convergence problem or bias reduction. Many assumptions and further details concerning the distribution of the coefficients estimated by the ML approach and by bias reduction techniques, as well as applications and the effects of sample size, have been considered elsewhere. However, the behaviour and properties of bias correction methods are less well investigated. A recent paper applies a bias correction technique to guarantee the existence of the MLE. The present paper centres on evaluating the behaviour and properties of the bias reduction method using simulated data with different sample sizes and parameter values. The next section presents the form of the logistic regression model and how it is fitted. Section 3 discusses the ML convergence problem. The application of the modified score function to the logistic regression model is discussed in Section 4, which also illustrates a special case of the modified function, giving the two equations used to estimate the parameters. Section 5 investigates the asymptotic properties of the logistic regression model, comparing the parameters estimated by the ML method and by the bias reduction technique using simulated data.
The discussion, conclusion and some general remarks about the results are in Section 6.
2. The Logistic Regression Model
The goal of a logistic regression analysis is to find the best fitting model to describe the relationship between an outcome and covariates where the outcome is dichotomous. The logistic regression model has been considered as a member of the class of generalized linear models; for more details of the logistic model see the cited literature.
Suppose now that $y_i$, $i = 1, \ldots, n$, are independent binary response variables with $y_i \in \{0, 1\}$. Suppose that the success probabilities $\pi_i = P(y_i = 1)$ are related to a collection of covariates according to the equation

$$g(\pi_i) = \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta} = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}.$$
We consider the special case $y_i \sim \mathrm{Bernoulli}(\pi_i)$, so $E(y_i) = \pi_i$, where $\pi_i$ is the probability of success for each $i$. We also define $g(\pi) = \operatorname{logit}(\pi)$ so that

$$\log\!\left(\frac{\pi_i}{1-\pi_i}\right) = \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta}, \qquad \pi_i = \frac{\exp(\boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta})}{1+\exp(\boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta})}.$$
Here $g(\pi) = \log\{\pi/(1-\pi)\}$ is called the logit link function and $\eta_i = \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta}$ is the linear predictor.
There are other link functions which can be used instead of the logit link function, such as the probit link function

$$g(\pi) = \Phi^{-1}(\pi),$$

where $\Phi$ is the standard normal distribution function,
and the complementary log-log link function

$$g(\pi) = \log\{-\log(1-\pi)\}.$$
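As a quick illustration (not part of the paper), the three link functions can be evaluated with the Python standard library; `statistics.NormalDist` supplies the standard normal quantile function for the probit:

```python
import math
from statistics import NormalDist

def logit(p):
    """Logit link: g(pi) = log(pi / (1 - pi))."""
    return math.log(p / (1 - p))

def probit(p):
    """Probit link: g(pi) = Phi^{-1}(pi), the standard normal quantile."""
    return NormalDist().inv_cdf(p)

def cloglog(p):
    """Complementary log-log link: g(pi) = log(-log(1 - pi))."""
    return math.log(-math.log(1 - p))

# all three links map the interval (0, 1) onto the whole real line,
# and the logit and probit links are symmetric about p = 0.5
print(logit(0.5), probit(0.5))  # 0.0 0.0
```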
Fitting the Model
The logistic model with $g(\pi) = \operatorname{logit}(\pi)$ can be fitted using the method of maximum likelihood to estimate the parameters. The first step is to construct the likelihood function, which is a function of the unknown parameters; we then choose those values of the parameters that maximize this function. The probability function of the model is

$$P(y_i) = \pi_i^{y_i}(1-\pi_i)^{1-y_i}, \qquad y_i = 0, 1,$$
where $\pi_i$ depends on $\boldsymbol{\beta}$ through the logit link. Since the observations are independent, the likelihood function is as follows:

$$L(\boldsymbol{\beta}) = \prod_{i=1}^{n} \pi_i^{y_i}(1-\pi_i)^{1-y_i}.$$
The maximum likelihood estimate of $\boldsymbol{\beta}$ is the value which maximizes the likelihood function. In general the log-likelihood function is easier to work with mathematically and is:

$$\ell(\boldsymbol{\beta}) = \log L(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left[y_i\log\pi_i + (1-y_i)\log(1-\pi_i)\right] = \sum_{i=1}^{n}\left[y_i\boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta} - \log\{1+\exp(\boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{\beta})\}\right].$$
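To make the log-likelihood concrete, the following minimal Python sketch (illustrative, with hypothetical toy data) evaluates it for a model with an intercept and one covariate, using the numerically convenient second form:

```python
import math

def log_likelihood(beta0, beta1, x, y):
    """Log-likelihood of the two-parameter logistic model:
    l = sum_i [ y_i * eta_i - log(1 + exp(eta_i)) ], eta_i = beta0 + beta1 * x_i."""
    ll = 0.0
    for xi, yi in zip(x, y):
        eta = beta0 + beta1 * xi
        ll += yi * eta - math.log1p(math.exp(eta))
    return ll

# hypothetical toy data
x = [0.0, 1.0, 2.0, 3.0]
y = [0, 0, 1, 1]
print(log_likelihood(0.0, 0.0, x, y))  # 4 * (-log 2) = -2.7725887...
```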
2.1. Special Case of the Logistic Model with Two Covariates
In this case we consider the logistic regression model with two parameters, so $\boldsymbol{\beta} = (\beta_0, \beta_1)^{\mathrm{T}}$, with $\beta_0$ the general mean. So we have $\boldsymbol{x}_i = (1, x_i)^{\mathrm{T}}$ and $\eta_i = \beta_0 + \beta_1 x_i$, such that

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 x_i)}{1+\exp(\beta_0 + \beta_1 x_i)},$$
where $x_i$ is now a scalar covariate and

$$1 - \pi_i = \frac{1}{1+\exp(\beta_0 + \beta_1 x_i)}.$$
Therefore we can write the log-likelihood function as:

$$\ell(\beta_0, \beta_1) = \sum_{i=1}^{n}\left[y_i(\beta_0 + \beta_1 x_i) - \log\{1+\exp(\beta_0 + \beta_1 x_i)\}\right].$$
To estimate the values of $\beta_0$ and $\beta_1$ we differentiate $\ell$ with respect to $\beta_0$ and $\beta_1$ respectively:

$$\frac{\partial \ell}{\partial \beta_0} = \sum_{i=1}^{n}\left(y_i - \pi_i\right), \qquad \frac{\partial \ell}{\partial \beta_1} = \sum_{i=1}^{n}x_i\left(y_i - \pi_i\right).$$
Now we set $\partial\ell/\partial\beta_0 = 0$ and $\partial\ell/\partial\beta_1 = 0$, so the maximum likelihood estimates of $\beta_0$ and $\beta_1$ are the solution of the following equations

$$\sum_{i=1}^{n}\pi_i = \sum_{i=1}^{n}y_i, \qquad \sum_{i=1}^{n}x_i\pi_i = \sum_{i=1}^{n}x_i y_i,$$
and will be denoted $\hat\beta_0$ and $\hat\beta_1$. We know that for logistic regression the last two equations are non-linear in $\beta_0$ and $\beta_1$, and we need to use a numerical method for their solution, such as the Newton-Raphson method.
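The Newton-Raphson iteration for these two score equations can be sketched in Python as follows (an illustrative implementation, not the paper's code, applied to hypothetical toy data). Each step updates $(\beta_0, \beta_1)$ by the inverse of the $2\times 2$ information matrix $X^{\mathrm{T}}WX$ times the score vector:

```python
import math

def fit_logistic_nr(x, y, tol=1e-10, max_iter=50):
    """Newton-Raphson fit of the two-parameter logistic model."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # fitted probabilities pi_i
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # score vector: sum(y - pi), sum(x * (y - pi))
        u0 = sum(yi - p for yi, p in zip(y, pi))
        u1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
        # information matrix X'WX with w_i = pi_i * (1 - pi_i)
        w = [p * (1.0 - p) for p in pi]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# hypothetical data: overlapping groups, so the MLE exists and is finite;
# group success rates are 1/3 (x=0) and 2/3 (x=1), hence
# b0 = logit(1/3) = -log 2 and b1 = logit(2/3) - logit(1/3) = 2 log 2
x = [0, 0, 0, 1, 1, 1]
y = [0, 1, 0, 1, 0, 1]
b0, b1 = fit_logistic_nr(x, y)
```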
2.2. The Asymptotic Distribution of the MLE
The estimated parameters $\hat{\boldsymbol{\beta}}$ have an asymptotic distribution which is given by $\hat{\boldsymbol{\beta}} \sim N\!\left(\boldsymbol{\beta}, I(\boldsymbol{\beta})^{-1}\right)$, where $I(\boldsymbol{\beta})$ is Fisher's information matrix, defined as

$$I(\boldsymbol{\beta}) = -E\!\left[\frac{\partial^2 \ell(\boldsymbol{\beta})}{\partial\boldsymbol{\beta}\,\partial\boldsymbol{\beta}^{\mathrm{T}}}\right],$$
where the matrix is evaluated at the MLE. For logistic regression the estimated Fisher information matrix can be written as

$$\hat I(\hat{\boldsymbol{\beta}}) = X^{\mathrm{T}} \hat W X,$$
where $\hat W = \operatorname{diag}\{\hat\pi_i(1-\hat\pi_i)\}$ and $X$ is the design matrix. The variance of $\hat{\boldsymbol{\beta}}$ is approximated by $\widehat{\operatorname{var}}(\hat{\boldsymbol{\beta}}) = \hat I(\hat{\boldsymbol{\beta}})^{-1}$.
3. Maximum Likelihood Convergence Problems
A problem occurs in estimating logistic regression models when the maximum likelihood estimates do not exist and one or more components of $\hat{\boldsymbol{\beta}}$ are infinite. One case in which this problem occurs is when all of the observations have the same response. For example, suppose that the covariate values $x_i$ are positive and that all of the response variables equal zero, i.e. $y_i = 0$ for all $i$. In this case the log-likelihood function is

$$\ell(\beta_0, \beta_1) = -\sum_{i=1}^{n}\log\{1+\exp(\beta_0 + \beta_1 x_i)\}.$$
Now, differentiating with respect to $\beta_0$ and $\beta_1$ respectively and setting the derivatives equal to zero gives

$$\sum_{i=1}^{n}\pi_i = 0, \qquad \sum_{i=1}^{n}x_i\pi_i = 0.$$
The first equation has no solution because it is a sum of positive quantities and so cannot equal zero. To make this sum approach zero we need $\beta_0$ to become large and negative, i.e. to tend to $-\infty$. However, if precisely one of the response variables equals 1, the resulting maximum likelihood equations become

$$\sum_{i=1}^{n}\pi_i = 1, \qquad \sum_{i=1}^{n}x_i\pi_i = x_1,$$
where we have assumed the observations are numbered such that $y_1 = 1$. Here the maximum likelihood estimates exist and convergence of the MLE is achieved, because the two equations set sums of positive quantities equal to positive values. Considering the first equation: if $\beta_0$ is large and positive then the sum is larger than one, while if it is large and negative then the sum is smaller than one; neither satisfies the equation, so finite estimates of the parameters which do satisfy the equations can be found.
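The non-existence of the MLE in the all-zero case can be seen numerically. In the sketch below (illustrative, with hypothetical positive covariate values), the log-likelihood keeps increasing toward its supremum of 0 as $\beta_0 \to -\infty$, so no finite maximizer exists:

```python
import math

def loglik_all_zero(b0, b1, x):
    """Log-likelihood when every y_i = 0:
    l = -sum_i log(1 + exp(b0 + b1 * x_i))."""
    return -sum(math.log1p(math.exp(b0 + b1 * xi)) for xi in x)

x = [0.2, 0.5, 1.0, 1.5]  # hypothetical positive covariate values

# the log-likelihood increases monotonically as b0 decreases,
# approaching (but never reaching) its supremum of 0
lls = [loglik_all_zero(b0, 0.0, x) for b0 in (-1, -5, -10, -20)]
print(lls)
```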
4. Modified Score Function
Firth proposed a method to reduce the bias of the MLE; with the modified score function the maximum likelihood convergence problem does not arise. The idea extends two standard approaches that have been extensively studied in the literature. The first is the computationally intensive jackknife method. The second simply substitutes the MLE $\hat{\boldsymbol{\beta}}$ for the unknown $\boldsymbol{\beta}$ in the first-order bias term $b(\boldsymbol{\beta})$, giving the corrected estimate $\hat{\boldsymbol{\beta}} - b(\hat{\boldsymbol{\beta}})$. The point is that for small data sets it is not uncommon for $\hat{\boldsymbol{\beta}}$ to be infinite in some samples of logistic regression models. We know that the maximum likelihood approach depends on the derivative of the log-likelihood function, the MLE being a solution of the score equation

$$U(\boldsymbol{\beta}) = \frac{\partial \ell(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} = 0.$$
Firth proposed that, instead, we solve $U^*(\boldsymbol{\beta}) = 0$, where the appropriate modification to $U(\boldsymbol{\beta})$ is:

$$U^*(\boldsymbol{\beta}) = U(\boldsymbol{\beta}) - I(\boldsymbol{\beta})\, b(\boldsymbol{\beta}),$$
and the expected value of the resulting estimator $\hat{\boldsymbol{\beta}}^*$ is given by:

$$E(\hat{\boldsymbol{\beta}}^*) = \boldsymbol{\beta} + O(n^{-2}),$$

so that the first-order bias term is removed.
The variance of $\hat{\boldsymbol{\beta}}^*$ is approximated by $\widehat{\operatorname{var}}(\hat{\boldsymbol{\beta}}^*) = I(\hat{\boldsymbol{\beta}}^*)^{-1}$.
4.1. Modified Function with Logistic Regression Model
In this part we will apply the modified score function to the simple logistic regression model. We know that the bias vector is given in the form

$$b(\boldsymbol{\beta}) = I(\boldsymbol{\beta})^{-1} X^{\mathrm{T}} W \boldsymbol{\xi}.$$

Here $\boldsymbol{\xi}$ has $i$th element

$$\xi_i = \frac{h_i\left(\pi_i - \tfrac{1}{2}\right)}{\pi_i(1-\pi_i)},$$

and $h_i$ is the $i$th diagonal element of the hat matrix

$$H = W^{1/2} X \left(X^{\mathrm{T}} W X\right)^{-1} X^{\mathrm{T}} W^{1/2},$$

where $W = \operatorname{diag}\{\pi_i(1-\pi_i)\}$ and $X$ is the design matrix. Then the modified score function is written as

$$U^*(\boldsymbol{\beta}) = \sum_{i=1}^{n}\left\{y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right)\right\}\boldsymbol{x}_i.$$
In this case, the modified score function gives two equations,

$$\sum_{i=1}^{n}\left\{y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right)\right\} = 0, \qquad \sum_{i=1}^{n}x_i\left\{y_i - \pi_i + h_i\left(\tfrac{1}{2} - \pi_i\right)\right\} = 0,$$

which are used to estimate the parameters.
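A minimal sketch of solving these two modified score equations by a Newton-type iteration is given below (an illustrative implementation, not the paper's code; it recomputes the hat values $h_i$ at each step and uses the unmodified information matrix $X^{\mathrm{T}}WX$ for the update, as in Fisher scoring):

```python
import math

def fit_firth(x, y, tol=1e-10, max_iter=100):
    """Bias-reduced (Firth-type) fit of the two-parameter logistic model.
    Solves sum{y - pi + h*(1/2 - pi)} = 0 and sum{x*(y - pi + h*(1/2 - pi))} = 0."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        pi = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [p * (1.0 - p) for p in pi]
        # 2x2 information matrix X'WX and its inverse
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        inv = (i11 / det, -i01 / det, i00 / det)  # entries (11, 12, 22)
        # hat values h_i = w_i * x_i' (X'WX)^{-1} x_i with x_i = (1, x_i)
        h = [wi * (inv[0] + 2.0 * inv[1] * xi + inv[2] * xi * xi)
             for wi, xi in zip(w, x)]
        # modified score U* with adjusted residuals y - pi + h*(1/2 - pi)
        r = [yi - p + hi * (0.5 - p) for yi, p, hi in zip(y, pi, h)]
        u0 = sum(r)
        u1 = sum(xi * ri for xi, ri in zip(x, r))
        # scoring step: beta <- beta + (X'WX)^{-1} U*
        d0 = inv[0] * u0 + inv[1] * u1
        d1 = inv[1] * u0 + inv[2] * u1
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# all responses zero with a binary covariate (n0 = 3, n1 = 2):
# plain ML diverges here, but the modified score fit is finite
b0, b1 = fit_firth([0, 0, 0, 1, 1], [0, 0, 0, 0, 0])
```

Unlike the plain Newton-Raphson iteration, this sketch returns finite estimates even under separation, e.g. when every $y_i = 0$.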
4.2. Special Case of Modified Function
For further evaluation, we will discuss the behaviour of the adjusted score function when all the observations have the same response, i.e. $y_i = 0$ for all $i$. As a special case, suppose we have one explanatory variable $x_i$ taking values 0 or 1. Before we calculate the adjusted score function, we first calculate the form of $h_i$, which we obtain from the hat matrix $H = W^{1/2}X(X^{\mathrm{T}}WX)^{-1}X^{\mathrm{T}}W^{1/2}$. Here $h_i$ is the $i$th diagonal element of this matrix and is

$$h_i = w_i\, \boldsymbol{x}_i^{\mathrm{T}}\left(X^{\mathrm{T}}WX\right)^{-1}\boldsymbol{x}_i,$$

where $w_i = \pi_i(1-\pi_i)$, with $\pi_i = \pi_0 = \exp(\beta_0)/\{1+\exp(\beta_0)\}$ when $x_i = 0$ and $\pi_i = \pi_1 = \exp(\beta_0+\beta_1)/\{1+\exp(\beta_0+\beta_1)\}$ when $x_i = 1$, and where $n_0$ and $n_1$ are the numbers of observations with $x$ equal to 0 and 1 respectively. Hence

$$h_i = \begin{cases} 1/n_0, & x_i = 0,\\[2pt] 1/n_1, & x_i = 1.\end{cases}$$
Therefore, when we set $y_i = 0$ for all $i$, the adjusted score equations $U^*(\boldsymbol{\beta}) = 0$ become

$$n_0\left\{-\pi_0 + \frac{1}{n_0}\left(\tfrac{1}{2}-\pi_0\right)\right\} + n_1\left\{-\pi_1 + \frac{1}{n_1}\left(\tfrac{1}{2}-\pi_1\right)\right\} = 0$$

and

$$n_1\left\{-\pi_1 + \frac{1}{n_1}\left(\tfrac{1}{2}-\pi_1\right)\right\} = 0.$$

Before calculating $\hat\beta_0$ and $\hat\beta_1$ we can consider the following way to calculate $\pi_0$ and $\pi_1$. The second equation gives $(n_1+1)\pi_1 = \tfrac{1}{2}$, and subtracting it from the first gives $(n_0+1)\pi_0 = \tfrac{1}{2}$, so we can write

$$\pi_0 = \frac{1}{2(n_0+1)}, \qquad \pi_1 = \frac{1}{2(n_1+1)}.$$

Therefore, $\hat\beta_0$ and $\hat\beta_1$ can be written as

$$\hat\beta_0 = \log\frac{\pi_0}{1-\pi_0}, \qquad \hat\beta_1 = \log\frac{\pi_1}{1-\pi_1} - \hat\beta_0.$$

Then, we obtain

$$\hat\beta_0 = -\log(2n_0+1), \qquad \hat\beta_1 = \log\frac{2n_0+1}{2n_1+1}.$$
As a result of this example, when $y_i = 0$ for all $i$ the estimates of the parameters are finite. The modified function works well and the problem of convergence does not arise.
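The closed form above can be checked numerically (a quick illustrative check with arbitrarily chosen group sizes): the probabilities $\pi_0 = 1/\{2(n_0+1)\}$ and $\pi_1 = 1/\{2(n_1+1)\}$ satisfy both adjusted score equations when every $y_i = 0$ and $h_i$ equals $1/n_0$ or $1/n_1$:

```python
n0, n1 = 7, 4  # hypothetical group sizes for x = 0 and x = 1

pi0 = 1.0 / (2 * (n0 + 1))
pi1 = 1.0 / (2 * (n1 + 1))

# adjusted score contributions with y_i = 0 and h_i = 1/n0 (x=0) or 1/n1 (x=1)
eq1 = (n0 * (-pi0 + (1.0 / n0) * (0.5 - pi0))
       + n1 * (-pi1 + (1.0 / n1) * (0.5 - pi1)))
eq2 = n1 * (-pi1 + (1.0 / n1) * (0.5 - pi1))

print(eq1, eq2)  # both are (numerically) zero
```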
5. Simulation Study
The following discussion presents the simulation plan and the designs used in generating the data, to identify the effect of sample size and the proportion of events (the percentage of $y = 1$) on the estimation of the parameters. We examine the precision of the estimation by calculating the variance of the parameters obtained by simulation for the two approaches, MLE and Firth, and compare these with $I(\boldsymbol{\beta})^{-1}$ evaluated at the known values of $\boldsymbol{\beta}$. The simulation study is designed as follows:
1) Three sample sizes have been used: $n = 500$, $n = 120$ and $n = 40$.
2) For each sample size we choose the covariate values $x_1, \ldots, x_n$ as a draw from the chosen covariate distribution. The $x$ variables are fixed at these values throughout the simulation.
3) We choose $\beta_0$ and $\beta_1$ to give three cases: we choose $\beta_1$ and adjust $\beta_0$ so that the mean of $\pi_i$ over the covariates is approximately (a) 0.5, (b) 0.1, (c) 0.05.
4) For each sample size and set of parameter values we perform 100,000 simulations.
5) Two approaches are used to estimate the parameters: MLE and the bias-reduced (Firth) estimator.
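The plan in steps 1)-5) can be sketched in miniature as follows (illustrative only: far fewer replications than the paper's 100,000, a hypothetical standard normal covariate distribution and hypothetical parameter values, and only the ML arm shown; non-convergent fits are counted and excluded as in the paper):

```python
import math
import random

def nr_fit(x, y, max_iter=25, tol=1e-8, bound=30.0):
    """Plain Newton-Raphson MLE for the two-parameter logistic model.
    Returns (b0, b1, converged); runaway estimates are flagged as failures."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        eta = [max(-35.0, min(35.0, b0 + b1 * xi)) for xi in x]
        pi = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        u0 = sum(yi - p for yi, p in zip(y, pi))
        u1 = sum(xi * (yi - p) for xi, yi, p in zip(x, y, pi))
        w = [p * (1.0 - p) for p in pi]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        if det < 1e-12:          # near-singular information: treat as failure
            return b0, b1, False
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            # flag runaway estimates (separation) as non-convergence
            return b0, b1, abs(b0) < bound and abs(b1) < bound
    return b0, b1, False

random.seed(1)
n, beta0, beta1 = 40, -1.0, 1.0                 # hypothetical true values
x = [random.gauss(0.0, 1.0) for _ in range(n)]  # covariates fixed throughout
n_rep = 500                                     # miniature; the paper uses 100,000
fits = []
for _ in range(n_rep):
    y = [1 if random.random() < 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi))) else 0
         for xi in x]
    b0, b1, ok = nr_fit(x, y)
    if ok:
        fits.append((b0, b1))

conv_rate = len(fits) / n_rep                   # fraction achieving convergence
mean_b1 = sum(b for _, b in fits) / len(fits)
var_b1 = sum((b - mean_b1) ** 2 for _, b in fits) / (len(fits) - 1)
```

Comparing `var_b1` with the corresponding diagonal entry of $I(\boldsymbol{\beta})^{-1}$, and repeating the loop with a bias-reduced fitter, reproduces the kind of comparison reported in the tables below.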
5.1. Results and Discussion of Sample Size n = 500
The simulation reports the accuracy of the estimation of $\boldsymbol{\beta}$ using the information matrix. We calculate $\operatorname{var}(\hat\beta_0)$ and $\operatorname{var}(\hat\beta_1)$ from the simulated values of $\hat{\boldsymbol{\beta}}$, and also by evaluating $I(\boldsymbol{\beta})^{-1}$ at the known values of $\boldsymbol{\beta}$. The results in Table 1, which shows the three cases of the proportion of $y = 1$, all achieved convergence of the likelihood maximization algorithm.
As can be seen in Table 1, Sim L and Sim F are the variances of the parameters estimated by the MLE and by Firth's method respectively, and Ratio L and Ratio F denote the corresponding ratios for the MLE and Firth's method. The results show that the variance of the parameters calculated from the simulation and the variance calculated by evaluating the information matrix at the known values of $\boldsymbol{\beta}$ are almost the same. We note that the ratio in the first case, when the proportion of $y = 1$ is 0.5, is very close to one, but in the second and third cases the ratio is slightly larger than in the first case.
The variances of the parameters calculated by Firth's method were smaller than those calculated by MLE, and the ratio in general was close to 1. Moreover, the bias was smaller.
5.2. Results and Discussion of Sample Size n = 120
In this part we use the same approach as in the previous case of $n = 500$. The
Table 1. Results of 100,000 simulations with sample size n = 500 and proportion of y = 1 equal to (0.5, 0.1, 0.05).
results of the simulation are shown in Table 2. Maximum likelihood convergence problems occurred in the case with the smallest proportion of $y = 1$. Note that there are many situations in which the likelihood function has no maximum, in which case we say that the maximum likelihood estimate does not exist. In the simulation, which generates the data set 100,000 times, in some cases the coefficients tend to infinity in the final iterations, so there are no results for the estimates $\hat\beta_0$ and $\hat\beta_1$ and the algorithm has not converged. In our simulation we record the cases in which the algorithm did not converge.
Here, for only 99,806 (99%) of the data sets was it possible to obtain finite estimates and achieve convergence. Moreover, the variances of the parameters $\hat\beta_0$ and $\hat\beta_1$ are large. This is because, even though convergence is achieved, there are some very large negative values of the estimates. In the other two cases of the proportion of $y = 1$ we achieved ML convergence in every simulation. We note that the ratio is nearly one, but a bit higher compared with the case $n = 500$. Firth's approach showed reasonable results: all cases achieved convergence. Moreover, the ratio was better than with the MLE approach, as was the bias.
5.3. Results and Discussion of Sample Size n = 40
We used the same analysis as in the previous cases with $n = 40$. As can be seen in Table 3, the results show that the MLE approach had convergence problems:
Table 2. Results of 100,000 simulations with sample size n = 120 and proportion of y = 1 equal to (0.5, 0.1, 0.05).
Table 3. Results of 100,000 simulations with sample size n = 40 and proportion of y = 1 equal to (0.5, 0.1, 0.05).
98,273 (98%) and 85,967 (86%) of the data sets achieved ML convergence when the proportion of $y = 1$ was 0.1 and 0.05, respectively. Convergence was achieved in every simulation only in the case of a proportion of 0.5, where the ratio was close to one, though a bit higher than in the previous cases. Moreover, we found the same problem as discussed in the case of $n = 120$: the variances of the parameters $\hat\beta_0$ and $\hat\beta_1$ are large. However, when we used Firth's approach, all data sets achieved convergence. Moreover, the ratio was better than with the MLE approach, and the bias was smaller.
6. Discussion and Conclusion

Attention has been directed in this work to determining the behaviour of the asymptotic estimation of the parameters by two methods, the MLE and the bias reduction technique, compared with the results from the information matrix. Even in settings prone to convergence problems, the modified score function behaved appropriately, indicating that the first-order bias can be removed from the MLE by the bias reduction term. The asymptotic variance of the MLE can behave strangely: the results showed that the variance of the parameters was large in some cases even though convergence was achieved, because of some very large negative values of the estimates, as shown in the results section. We can report that small sample size and the proportion of $y = 1$ affect the behaviour of the parameter estimates when using the MLE; clearly, we found convergence problems for some combinations of sample size and proportion. Firth's approach gave reasonable results, in that the data sets in all cases of sample size and proportion achieved convergence. Overall, we can consider that the bias reduction technique worked well and behaved reasonably in almost all of the cases investigated. Moreover, the convergence problem is not the only issue affecting the behaviour of the MLE: even when convergence is achieved, the variance of the parameter estimates can be large.
 McCullagh, P. (1986) The Conditional Distribution of Goodness-of-Fit Statistics for Discrete Data. Journal of the American Statistical Association, 81, 104-107.
 Clogg, C.C., Rubin, D.B., Schenker, N., Schultz, B. and Weidman, L. (1991) Multiple Imputation of Industry and Occupation Codes in Census Public-Use Samples Using Bayesian Logistic Regression. Journal of the American Statistical Association, 86, 68-78.