The Poisson distribution is commonly used to model count data. However, a restriction of this distribution is that the response variable must have a mean equal to the variance. This restriction does not often hold true for many biological and epidemiological data. In many applications the variance can be much larger than the mean, a phenomenon known as “over dispersion”. This over dispersion may occur due to population heterogeneity, or presence of outliers in the data . An analysis of data with overly dispersed counts can lead to the underestimation of parameter standard error, if overdispersion is ignored. A review of the issue of overdispersion in both binary and count data was reviewed by Hinde and Demetrio , and in a more recent review by Hayat and Higgins . Diagnosing and accounting for overdispersion is not a simple issue and should be appropriately dealt with to avoid bias in interpreting the results.
The Negative-Binomial (NB) distribution has been used as an alternative to the Poisson distribution for modeling data that exhibit overdispersion. The NB has two parameters and a variance that is a quadratic function of the mean. NB model has been the model of choice for the analysis of overly dispersed count data. The NB regression was reviewed by Hinde and Demetrio . Joe and Zhu  drew a comparison between the NB and a mixture-based generalization of the Poisson distribution.
In this paper we discuss several inferential statistical issues related to a modified form of the Generalized Poisson Distribution (GPD). The GPD distribution was introduced to the statistical literature by Consul and Jain  and a detailed account of its properties was given by Consul . The distribution has two parameters, and a variance that is cubic function of the mean. The distribution has been used to analyze data in the fields of genetics  as a queuing model    and genomics . The modified form of the GPD, which we shall call “Modified Poisson Distribution” (MGPD) was first discussed in . The modification was a double parametric transformation on the original parameters of the GPD. The main purpose of the transformation was to achieve parameters orthogonality  and make the MGPD a member of the class of “Generalized Linear Models” . Recently Shoukri and Al-Eid investigated several inference procedures in the two samples situation .
This paper has three-fold objectives. In Section 2, we present the model. We then, assume that we have k independent samples and we demonstrate how to construct statistical testing procedures on the dispersion parameters. Specifically, we first validate the hypothesis of homogeneity of dispersion parameters, thereafter we test the significance of the common dispersion parameter. In Section 3 we test the hypothesis of equality of k-means in the presence of overdispersion. When covariates are measured, testing the equality of group means is therefore equivalent to the Analysis of covariance (ANCOVA) in the presence of overdispersion. In Section 4 we use the COVID-19 mortality data to draw a comparison between the MGPD, and the Generalized Additive Models (GAM). We demonstrate the differences between the two analytic strategies and highlight the superiority of the MGPD in the analysis of count data exhibiting overdispersion in Section 5. General discussion is presented in Section 6.
2. The Model and Its Parameters Estimation
2.1. Modified Generalized Poisson Distribution
The GPD was introduced by Consul and Jain 
The GPD whose probability function is given in (2.1) reduces to the well-known Poisson distribution when . Therefore the parameter with the above restriction on its range, is considered the dispersion parameter. Shoukri and Mian  employed the parametric transformations:
Using the transformations in (2.2) we therefore have:
For fixed , the function in (2.3) is the natural parameter transformation which renders the GPD a member of the linear family of exponential class (see;  ), with a general structure:
We call the transformed GPD, the “Modified Generalized Poisson Distribution” or MGPD.
Shoukri and Mian  showed that a recurrence relation among the non-central moments is such that:
From (2.5) we can show that:
That is the variance is a cubic function of the population mean. We shall deal with the situation when .
2.2. Point Estimators
Our approach for parametric estimation in this section will be for a single random sample. If is a random sample from the GPD (2.3), Consul and Shoukri  showed that the unique maximum likelihood estimates of the parameters exist if and only if the sample variance is larger than the sample mean. Here we shall use the sample moments to obtain estimators for the model parameters .
Equating the first two sample moments to their corresponding population moments
and solving for the parameters we get:
The variance of the moment estimators of the mean and the dispersion parameter are respectively given by Shoukri and Al-Eid  as:
2.2.1. Homogeneity of Dispersion Parameters
Suppose that we have independent random samples from (2.3), which we denote with observations from the population .
Wedenote the variance of the estimator of given in Equation (2.8) by and, let , is given in (2.8b).
Cochran  developed a general statistic Q which may be used to test the homogeneity of several population parameters. The Q statistics has asymptotically, chi-square distribution with k − 1 degrees of freedoms. It is defined as:
The hypothesis of homogeneity of dispersion parameters is rejected whenever the statistic exceeds , the upper 5% quantile of a chi-square random variable with k − 1 degrees of freedom.
2.2.2. Testing the Significance of the Common Dispersion Parameter:
Here we develop a test statistic on the null hypothesis of absence of overdispersion. For the case when ’s are unknown, a uniformly most powerful test for (Poisson) versus (GPD) cannot be obtained, however the locally powerful Neyman’s test can be constructed . The log-likelihood function is given by
The locally asymptotically most powerful test is to reject for large values of . From (2.11):
Therefore, the locally asymptotically most powerful test is to reject for large values of , where
The statistic (2.13) is obtained from (2.12) by replacing each with root consistent estimator, . The simplest is the maximum likelihood estimator . Moran  pointed out that the test statistic T is asymptotically normal. It can be easily shown that:
Under , (2.14) and (2.15) reduce respectively to and
The hypothesis is rejected whenever:
exceeds , the upper 5% quantile of a chi-square random variable with one degree of freedom.
3. Testing Equality of Means
Based on the one-way layout data considered in the previous section, we would like to test the null hypothesis against : at least two of the are different, for all . The log likelihood under the hypothesis : is given by (2.11), and will be denoted by , the log likelihood under will be denoted by and is obtained by replacing in (2.11). Under , the maximum likelihood estimator of is
And the maximum likelihood estimator , of is the non-negative root of
Under the maximum likelihood estimator of the common mean is , where, .
The maximum likelihood estimator of and under is the positive root of
Detailed discussion on the necessary and sufficient conditions that (3.1) and (3.2) to have a unique root is given in Consul and Shoukri .
Denoting the maximized log likelihood under by , and that under by , the likelihood ratio test, which has an asymptotic distribution of chi-squared with degree of freedom is:
As an alternative to the likelihood ratio test (3.3), we present the Neyman’s statistic which has local optimal properties. Suppose that can be written as with . Then testing the null hypothesis is equivalent to testing , where and are nuisance parameters. We reparametrize (11.2), and denote the resulting function by .
Let be any root-n consistent estimator of under the null hypothesis. Moran  showed that the test is based on , where and are the partial regression coefficients of on and respectively. Define the following matrices:
Here, we replace by its estimator in F, P, Q, and R, the test statistic is given by
The asymptotic distribution of the test statistic given in (3.5) will be that of a chi-square with k − 1 degrees of freedom.
Now, there are two possible root-n consistent estimators of , under :
The first is the maximum likelihood estimator , which on substitution we get , and hence . Accordingly, (3.5) reduces to
The hypothesis of equality of population means is thus rejected whenever exceeds , the upper 5% quantile of a chi-square random variable with k − 1 degrees of freedom. For more details we refer the reader to .
4. ANCOVA: The Generalized Poisson Regression
It is well-known that ANOVA and regression are related techniques that are concerned with testing the differences in group means after adjusting for the confounding effects of potential risk factors and covariates. Since the MGPD is a member of the linear exponential family (for fixed ) Shoukri and Mian  expressed the expectation of as:
In Equation (4.1), is a set of measured covariates, and a subset of theses covariates defines a set of indicators (dummy) variables to identify categorical effects. The transformation is a monotone, differentiable function named “the link function”. To estimate , and we construct the log-link so that:
The logarithm of the likelihood function will be proportional to
The first and second partial derivatives are given by:
Taking the expected value of the negative of the second partial derivatives we get the Fishers’ information matrix I, whose elements are:
From Consul and Shoukri  we get:
The asymptotic distributions of the regression estimators can be established using the results in .
Our approach to the data analysis when the main interest is comparing group means in the presence of potential risk factors and confounders is summarized in three steps. In the first step we use the MLE to estimate the regression parameters using Equation (4.2), without including the groups as independent variable. In the second step, we extract the residuals (E) of the generalized Poisson regression model, defines as:
In the final step we test the normality and variance homogeneity of E. Thereafter, we use nonparametric ANOVA with the residuals being the dependent variable, and the groups being the independent variables to complete the ANCOVA testing.
5. Data Analyses
Al-Gahtani et al.  analyzed COVID-19 case fatality data collected retrospectively from the start of the of the epidemic to December 2-2020, the day the Pfizer vaccine was approved by the American Center for Disease Control (CDC). The data were collected from 120 countries grouped into 15 regions  as shown in Table 1. We will reanalyze the data such that:
The response variable is the aggregate number of COVID-19 deaths which we denote by “y”. We shall use different set of covariates, and these are:
· Region: The factor variable which is the main effect.
· The other covariates are:
1) X1 =log (percentage of obese personsin a country reported in 2018)  ;
2) X2 = log (population density) ;
3) X3 = log (number of people with colorectal cancer in a country reported in 2017) ;
4) X4 = log (Chronic Kidney Disease—case fatality in a country as reported in 2017) .
In Figure 1 we show the histogram of (y), the aggregate of COVID-19 deaths during the study period. The distribution is positively skewed with variance much larger than the mean.
Direct calculations from the summary statistics given in Table 2 give:
, and the corresponding p-value = 0.999. Therefore, the hypothesis of homogeneity of dispersion parameters is supported by the data. Moreover, is quite large and the corresponding p-value = 0.00001.
Table 1. Adaptedtable: Countries and the corresponding Regional classification as given in https://doi.org/10.1016/s0140-6736(20)30045-3 . In the first column we have the countries, in the second column we have Region or group name followed by the number of countries within the group. In the last column we have Region code.
CSSA = Central Sub-Saharan-Africa; MENA = Middle East and North Africa; HIAP = High Income Asian Pacific; WSSA = Western Sub-Saharan-Africa; SSSA = Southern Sub-Saharan Africa.
Figure 1. Histogram COVID-19 deaths (y). It is skewed to the right showing clear over dispersion in the data.
Table 2. Summary measures of COVID-19 deaths: group sizes (n), means (m), standard deviation (s) and the estimates of the dispersion parameter (eps) per group.
Therefore, the hypotheses that the common dispersion is not significantly different from zero is not supported by the data. The C2-statistic is quite large as well, and the corresponding p-value is near zero, therefore the hypothesis of equality of mean counts in all regions (aggregate COVID-19 deaths) is also not supported by the data.
We shall write a function using the R-program for the estimation of the regression parameters. The iteration process requires staring points. We obtain the staring points by first fitting the classical Poisson regression, which is done using the following code:
out1 = GLM (y~x1 + x2 + x3 + x4, data = data2, family = Poisson).
Having obtained the parameter estimates from the Poisson regression, we use them to start the iteration process and obtain final estimates as shown in the Appendix.
The MGPD regression results are summarized in Table 3.
The correlation between the observed and predicted COVID-19 death counts is (0.758).
Figure 2 gives the Q-Q plot of the quantiles of model residuals exhibiting close agreement among with the quantiles of the standard normal distribution.
To complete the ANCOVA testing we use theKruskal-Wallis test whereby residuals of the MGPD regression model are used as dependent variables and the “Regions”, or groups as independent variables. The results are summarized as follows:
Kruskal-Wallis chi-squared = 14.936, p-value = 0.1344.
Therefore, after adjusting for the covariates, there is not sufficient evidence to
Table 3. Maximum likelihood estimation of the MGPD regression using R.
Figure 2. Plot of the quantiles of the residuals of the MGPD regression model against the quantiles of the standard normal distribution.
reject the hypothesis of equality of mean counts in COVID-19 deaths among the “Regions”.
6. Nonparametric Regression Modeling: Generalized Additive Models
The Generalized Additive Models (GAM) are recent developments that are becoming popular as modeling techniques. It is nonparametric in nature and, even though less powerful, it is quite robust against departure from the assumptions required by classical GLM regression models. The GAM allow us to include non-linear smoothers into the modeling strategy. In mathematical terms GAM solve the following equation:
The are smooth functions to be estimated. Equation (6.1) seems complex, but it is very simple to understand. The first thing to notice is that with GAM we are not necessarily estimating the response directly, i.e. we are not modelling y. In fact, as with GLM we have the possibility to use link functions to model non-normal response variables (and thus perform Poisson or logistic regression) . Therefore, the term is simply the transformation of y needed to “linearize” the model. When we are dealing with a normally distributed response this term is simply replaced by y. The second part of the equation, where we have two terms: the parametric and the non-parametric part. In GAM we can include all the parametric terms we can include in Linear Model or GLM, for example linear or polynomial terms. The second part is the non-parametric smoother that will be automatically fitted, and it is the key point of GAMs. A complete and lucid account of the GAM theory can be found in   .
We fitted the GAM to the data using the R-package “GAM”, and the next two lines are the needed code:
Call: gam(formula = y ~ code + x1 + x2 + x3 + x4, data = data2).
The following results are obtained from the GAM fitting to the data:
1) Null Deviance: 135512150629 on 112 degrees of freedom;
2) Residual Deviance: 74451674616 on 96 degrees of freedom.
From which: The correlation between observed counts and predicted counts is: (1-74451674616/135512150629)1/2 = 0.671.
The GAM results are shown in Table 4, and the Q-Q plot of the model residuals is given in Figure 3, showing that the model residuals are not as close to normality as the residuals of the MGPD regression model.
In this paper we demonstrated the use of the MGPD as a model for the ANCOVA. We used a two-steps approach. In the first step we used the regression models to
Table 4. The results of fitting the GAM: ANOVA for Parametric Effects.
***significant at level of significance less than 0.00001.
Figure 3. Plot of the quantiles of the residuals of the GAM nonparametric regression model against the quantiles of the standard normal distribution.
assess the influence of possible confounders and covariates on the outcome of interest. Thereafter we extracted the model residuals and used these residuals as a dependent variable of a nonparametric ANOVA, with the groups being the independent predictors. We note that while there was a significant difference among the group means in the univariate analysis, such difference was not significant in the second step of the ANCOVA. We note that the MGPD regression model showed high correlation (0.758) between the observed counts and the model based predicted counts, indicative of a good fit by the model to the given data. On using the Q-Q plot, model residuals are shown to have close agreement to the empirical quantiles of the standard normal distribution. This shows that the model is quite reliable as a predictive tool, and that the distribution of the estimated regression parameters is that of a multivariate normal.
For the sake of comparison, we fitted the data using the GAM, a nonparametric regression approach. This approach deals with the covariates as factors. The GAM model showed that after adjusting for the covariates within the same model, there are no significant differences among regions. These findings are in agreement with those based on the MGPD regression. The GAM did not produce estimate for the dispersion parameter 𝟄. The measure of goodness of fit of the GAM was (0.671), which is much lower than that of the MGPD. The MGPD model has several advantages when compared to the GAM. First, The GAM cannot be used as a predictive tool, while the MGPD model can be used to predict the mean of the response variable. Second, the residuals of the MGPD regression model have a distribution that is almost normal. This emphasizes the reliability of the likelihood based statistical estimation of the model parameters. Finally, our two-steps approach to data fitting makes helps avoiding both overfitting and possible multicollinearity.
R-Code fitting the Generalized Poisson using the method of maximum likelihood.
###Notations: bi are the regression parameters, mu is the mean function, “n” is the number of observations “ll” denotes the ###log-likelihood function, eta is the linear predictor ###
#CALCULATING THE STANDARD ERRORS OF MLE
#### FINAL ESTIMATES -0.30007804 0.65884589 0.48926214 0.42536091 -0.38469265
### SHAPITO WILK TEST OF NORMALITY####
####ANOVA ON THE RESIDUALS WITH REGION BEING THE INDEPENDENT VARIABLEUSING KRUSKAL_WALLIS####
boxplot(data2_error~data2$Region,xlab=“Region”,ylab=“GPD Residuals”,main=“CODID-19 Deaths”)
###END OF CODE####.
 Joe, H. and Zhu, R. (2005) Generalized Poisson Distribution: The Property of Mixture of Poisson and Comparison with the Negative Binomial Distribution. Biometrical Journal, 47, 219-229.
 Consul, P.C. and Shoukri, M.M. (1988) Some Chance Mechanisms Related to a Generalized Poisson Probability Model. American Journal of Mathematical and Management Sciences, 8, 181-202.
 Shoukri, M.M. and Mian, I.U.H. (1991) Some Aspects of Statistical Inference on the Lagrange (Generalized) Poisson Distribution. Communication in Statistics: Computations and Simulations, 20, 1115-1137.
 Cox, D.R. and Reid, N. (1987) Parameter Orthogonality and Approximate Conditional Inference. Journal of the Royal Statistical Society. Series B (Methodological), 49, 1-39.
 Shoukri, M.M. and Al-Eid, M. (2020) Inference Procedures on the Ratio of Modified Generalized Poisson Distribution Means: Applications to RNA_SEQ Data. International Journal of Statistics in Medical Research, 9, 41-49.
 Consul, P.C. and Shoukri, M.M. (1984) Maximum Likelihood Estimation of the Generalized Poisson Distribution. Communications in Statistics, Theory and Methods, 13, 1533-1547.
 Al-Gahtani, S., Shoukri, M. and Al-Eid, M. (2021) Predictors of the Aggregate of COVID-19 Cases and Its Case-Fatality: A Global Investigation Involving 120 Countries. Open Journal of Statistics, 11, 259-277.
 Cockwell, P. and Fisher, L.-A. (2017) Global, Regional, and National Burden of Chronic Kidney Disease, 1990-2017: A Systematic Analysis for the Global Burden of Disease Study. The Lancet, 395, 709-733.
 Rottoli, M., Bernante, P., Garelli, S. and Gianella, M. (2020) How Important Is Obesity as a Risk Factor for Respiratory Failure, Intensive Care Admission and Death in Hospitalized COVID-19 Patients? Results from a Single Italian Center. European Journal of Endocrinology, 183, 389-397.
 Rashed, E.A., Kodera, S., Gomez-Tames, J. and Hirata, A. (2020) Correlation between COVID-19 Morbidity and Mortality Rates in Japan and Local Population Density, Temperature, and Absolute Humidity. International Journal of Environmental Research and Public Health, 17, 5447.
 GBD Colorectal Cancer Collaborators (2019) The Global, Regional, and National Burden of Colorectal Cancer and Its Attributable Risk Factors in 195 Countries and Territories, 1990-2017: A Systematic Review for the Global Burden of Disease Study 2017. The Lancet Gastroenterology and Hepatology, 4, 913-933.