Coronavirus disease 2019 (COVID-19) is now considered a pandemic by the World Health Organization. The main objective of this study is to report on the association between the regional case fatality of COVID-19 and kidney disease mortality, diabetes, population density, and the Gross Domestic Product (GDP). This cross-sectional historical data set was constructed by combining data from several sources.
The above-cited data sources were accessed on December 2, 2020. We included in the study the cumulative number of COVID-19 cases and the associated death counts by country as of December 2, 2020. We excluded countries whose cumulative count was below 10,000 cases. The database has 120 countries, and we divided them into regions according to the classification given in data source number 2, resulting in 15 regions. This classification is shown in Table 1. The first column of this table lists the countries; the second column gives the name of the region they belong to and, in brackets, the number of countries in that region; and the third column gives a code assigned to each region.
We have two outcome variables of interest: 1) the aggregate number of cases per country over the period ending December 2, 2020 (AC), and 2) the COVID-19 case fatality (CF). These are calculated as:
AC = Count of COVID-19 cases, analyzed at the regional level.
CF = (Count of deaths attributed to COVID-19/Count of COVID-19 cases) × 100,000.
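As a minimal sketch of the CF definition and the 10,000-case inclusion rule, the following Python fragment may help. The country records and the names `case_fatality`, `MIN_CASES`, and `cf_by_country` are hypothetical and are used only for illustration; they are not the study data.

```python
def case_fatality(deaths, cases, per=100_000):
    """Case fatality as defined in the text: deaths / cases, scaled by `per`."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases * per


# Hypothetical country records (illustrative numbers, not study data).
countries = [
    {"name": "A", "cases": 250_000, "deaths": 5_000},
    {"name": "B", "cases": 8_000, "deaths": 120},  # below the inclusion threshold
    {"name": "C", "cases": 40_000, "deaths": 1_000},
]

MIN_CASES = 10_000  # countries with fewer cumulative cases are excluded

included = [c for c in countries if c["cases"] >= MIN_CASES]
cf_by_country = {
    c["name"]: case_fatality(c["deaths"], c["cases"]) for c in included
}
```

With this scaling, a country in which 2% of reported cases died has CF = 2,000.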
This paper has three objectives. First, we quantify the degree of between-region variation in both AC and CF. Second, we identify the important factors associated with AC and CF. Third, we use a machine learning algorithm to build regression models, with the entire data set divided into a learning set and a validation set, to quantify the predictive accuracy of the constructed models.
2. Selected Risk Factors
2.1. Chronic Kidney Diseases
Chronic Kidney Disease (CKD) is an important contributor to morbidity and mortality from noncommunicable diseases, and it should be actively addressed to meet the UN’s Sustainable Development Goal target of reducing premature mortality from noncommunicable diseases by a third by 2030. We extracted the CKD prevalence and the associated mortality for the countries listed in Table 1. The rationale was that there are many published articles highlighting the importance of CKD as a possible risk factor for COVID-19 mortality.
Table 1. Countries and the corresponding regional classification as given in https://doi.org/10.1016/s0140-6736(20)30045-3.
CSSA = Central Sub-Saharan Africa, MENA = Middle East and North Africa, HIAP = High Income Asian Pacific, WSSA = Western Sub-Saharan Africa, SSSA = Southern Sub-Saharan Africa.
A recent meta-analysis outlined several reasons urging investigators to emphasize the importance of CKD during COVID-19 infection. It was also noted that CKD has not attracted enough awareness because of its inconspicuous course, especially in the early stage. Diabetes and hypertension are the leading causes of CKD in all developed countries and many developing countries, and long-term or advanced CKD usually increases the risk of cardiovascular diseases. Notably, these conditions accompanying CKD are all risk factors that worsen outcomes in COVID-19 patients. The present study highlights the importance of CKD as a risk factor for COVID-19 mortality. Some reports either did not include information on CKD or failed to state the definition of CKD used in the study. By contrast, the study by Williamson et al. includes data for three subgroups with CKD. These data also demonstrate that patients with severe forms of CKD have a very high risk of COVID-19 mortality, which is even higher than that of other known high-risk groups, including patients with hypertension, obesity, chronic heart disease or lung disease. The CKD data indicate that these patients deserve special attention with regard to COVID-19.
2.2. Population Density
Two studies that utilized data from Japan suggested that population density, which is somewhat indicative of social distancing, was a significant factor associated with COVID-19 infection. The effect of population density on the morbidity rate was also discussed in a case study of Iran. These studies suggested that several cofactors introduce uncertainty. When discussing the effects of policies, a multi-city analysis representing different countries may be imperative. In multi-country analyses, the number of tests conducted may add uncertainty because this number depends on the medical and economic resources of each country.
2.3. Diabetes
A recent population-based study from Italy documented that the presence of comorbidities, including diabetes, was associated with a more severe course of COVID-19 and a higher fatality rate. Other studies from the most affected countries, including China, the United States and Italy, seem to indicate that the prevalence of diabetes among patients affected by COVID-19 is not higher than that observed in the general population, thus suggesting that diabetes is not a risk factor for SARS-CoV-2 infection. However, a large body of evidence demonstrates that diabetes is a risk factor for disease progression towards critical illness, development of acute respiratory distress syndrome, need for mechanical ventilation or admission to an intensive care unit, and ultimately death. The mechanisms underlying the relationship between COVID-19 and diabetes remain to be elucidated. In particular, it is still unresolved whether it is diabetes per se, especially if poorly controlled, or rather the various comorbidities/complications associated with it that predispose patients with COVID-19 to a worse prognosis. In fact, conditions that cluster with diabetes in the context of the metabolic syndrome, such as obesity and hypertension, or that complicate chronic hyperglycemia, such as cardiovascular disease and chronic kidney disease, have also been associated with poor prognosis in these individuals, and the available studies have not consistently shown that diabetes predicts disease severity independently of them.
The estimated global prevalence of diabetes was 9.3% in 2019, with an upward trend. In the USA alone, more than 34 million adults had known or undiagnosed diabetes in 2018. In 2017, diabetes was listed as the underlying or contributing cause of death on 270,702 death certificates, which corresponds to a crude rate of 83.1 per 100,000 persons. Infectious diseases are more frequent and can be associated with worse outcomes in patients with diabetes. Therefore, it is not surprising that diabetes has been considered a possible risk factor or a predictor of worse outcomes in patients with COVID-19. COVID-19 rapidly reached the level of a pandemic and caused more than 850,000 deaths worldwide within a few months despite unprecedented mitigation measures. The strength of the association between diabetes and COVID-19 has been investigated in observational cohorts around the world, and a systematic review and meta-analysis of the available observational studies has examined the effect of diabetes on mortality among hospitalized patients with COVID-19.
We obtained the data on the global prevalence of diabetes by country. The rationale behind including diabetes in the database of our study is to explore the strength of the association between country-level diabetes prevalence and the CF outcome variable.
2.4. Gross Domestic Product (GDP) and the Percentage of Spending on Health Care (PHSC)
A recent article analyzed data from 88 countries and included the percentage of government spending on healthcare as a covariate. This factor is indicative of the level of preparedness of the healthcare system to face an urgent crisis such as the one caused by the COVID-19 pandemic. The authors showed that there is a significant relationship between an increase in healthcare spending and a reduction in COVID-19 case fatality. The finding that countries with strong healthcare capacity had fewer deaths per confirmed case was unsurprising. In this sense, that study confirms previous research on the association between overall mortality (all causes) and healthcare funding.
3. Data Analysis
We divided the data analysis into two stages. In the first stage, univariate and descriptive statistics are produced with graphics. We also explore the extent of regional variability for the parameters of interest. This is measured by comparing the between-region variation to the within-region variation. A widely used statistic in this regard is the “Intraclass Correlation Coefficient” (ICCC). In the second stage, we first identify the risk factors associated with the outcome of interest. Factors that are significantly correlated with the outcome of interest are used in a multivariate analysis model. The model that we used treats regions as random effects. To evaluate the joint effect of the selected covariates on the outcome of interest, we use a predictive-analytic machine learning regression approach in which the data are split into a training set (70% of the entire data) and a validation or test set (the remaining 30% of the data).
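The 70%/30% split described above can be sketched as follows. This is a simple random split under stated assumptions: the text does not say whether the split was stratified by region or which seed was used, and `train_test_split` here is a hypothetical helper, not a library function.

```python
import random


def train_test_split(rows, train_fraction=0.70, seed=0):
    """Shuffle row indices reproducibly and split into training and test sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_train = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:n_train]]
    test = [rows[i] for i in indices[n_train:]]
    return train, test


rows = list(range(120))  # stand-ins for the 120 country records
train, test = train_test_split(rows)
```

With 120 countries, this yields 84 training records and 36 test records.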
3.1. Descriptive Measures (Table 2)
The lowest average population density is in region 4, HIAP (47267.5 ± 51256.458).
The highest average population density is in region 7, West Europe (2628158.14 ± 6383767.55).
The histogram of population density is shown in Figure 1.
In Table 3 we present the summary statistics for the aggregate number of COVID-19 cases.
Figure 1. Histogram of log-population density.
Table 2. Summary measures of population density.
Table 3. Summary measures of the aggregate number of COVID-19 cases by region.
From Figure 2 we can see that the distribution of the number of cases is highly skewed to the right, and the data in Table 3 show that the variance is much larger than the mean, a phenomenon known as “overdispersion”.
In Table 4 we present the means and standard deviations of CF as defined above. The largest mean CF is in Andean Latin America (5626.36 ± 1685), while the smallest mean CF is in Central Asia (1199.95 ± 402.64).
The large variation in CF is better depicted by the boxplot given in Figure 3.
It appears from the above boxplot (Figure 3) that there is considerable variation in the distribution of CF. One way to stabilize this variation is to employ a specific transformation. We selected the logarithmic transformation, so that the dependent variable of interest is Y = Log (CF). The histogram of this new outcome variable Y is given in Figure 4. The Q-Q plot of Y is given in Figure 5 and it shows the closeness of the distribution of Y to that of the normal distribution.
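The effect of the logarithmic transformation on a right-skewed variable can be illustrated with a small sketch. The CF values below are hypothetical, the natural logarithm is assumed (the text does not state the base), and sample skewness is used here only as a rough numerical counterpart to the histogram and Q-Q plot.

```python
import math


def skewness(values):
    """Sample skewness: third central moment over the 1.5 power of the second."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical, right-skewed case-fatality values (not study data).
cf = [1200.0, 1500.0, 1800.0, 2100.0, 2600.0, 3400.0, 5600.0, 9800.0]
y = [math.log(v) for v in cf]  # Y = log(CF)

skew_cf = skewness(cf)
skew_y = skewness(y)
```

For these illustrative values the skewness of Y is markedly smaller than that of CF, which is the behaviour the transformation is meant to achieve.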
The primary predictor of interest is CKD case fatality (Table 5).
As can be seen from Figure 6, there is a great amount of variation among the 15 regions with respect to CKD case fatality. The next covariate we examine is the percentage of GDP spent on healthcare. The summary measures are shown in Table 6, with the histogram given in Figure 7.
The above histogram (Figure 7) is quite symmetric apart from an extreme outlier. We shall not employ any transformation on this variable.
In Table 7 we present the summary measures of the diabetes prevalence for each of the 15 regions in the data.
Figure 2. Histogram of the number of COVID-19 cases.
Figure 3. Boxplot of COVID-19 case fatality by region.
Figure 4. Histogram of Y = Log COVID-19 case fatality.
Figure 5. Q-Q plot of the logarithm of COVID-19 case fatality.
Figure 6. Boxplot for the CKD case fatality.
Figure 7. Histogram of the distribution of percentage of GDP on healthcare.
Table 4. COVID_CASE_FATALITY = (COVID death count/COVID Cases) × 100,000.
Table 5. CKD_CASE_FATALITY (2017).
Table 6. Percent of GDP spending on healthcare (2018).
Table 7. Diabetes prevalence worldwide by region (2019).
The lowest diabetes prevalence is in WSSA (2.95), while the highest is in the MENA region (11.965).
3.2. Detection of Regional Clustering
In this section we shall quantify the degree of clustering for any continuous variable.
We assume that we have k regions and that the individual units within each region are the countries as indicated in Table 1. To articulate this concept, we first assume that the quantity of interest measured in the jth country within the ith region, $y_{ij}$, is modelled as

$$y_{ij} = \mu + a_i + e_{ij}, \qquad i = 1, \ldots, k; \; j = 1, \ldots, n_i,$$

where
$a_i$ = random regional effect, which represents the collective effects of all regional level unmeasured covariates,
$e_{ij}$ = the country within region deviation from the overall mean of the ith region, and
$\mu$ is the grand mean of all measurements in the population. It is assumed that the region effects $a_i$ are normally and identically distributed with mean 0 and variance $\sigma_a^2$, the errors $e_{ij}$ are normally and identically distributed with mean 0 and variance $\sigma_e^2$, and the $a_i$ and $e_{ij}$ are independent. For this model the ICCC, which may be interpreted as the correlation ρ between any two countries belonging to the same region, is defined as:

$$\rho = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_e^2}.$$

By definition, the ICCC is non-negative in this model, a plausible assumption for the application of interest here. We also note that the variance components $\sigma_a^2$ and $\sigma_e^2$ can be estimated from the one-way ANOVA mean squares (see Shoukri, page 47), given in expectation by

$$E(MSB) = \sigma_e^2 + n_0 \sigma_a^2, \qquad E(MSW) = \sigma_e^2.$$

The ANOVA estimator of ρ is then given by

$$\hat{\rho} = \frac{MSB - MSW}{MSB + (n_0 - 1)\, MSW},$$

where $MSB = SSB/(k - 1)$ and $MSW = SSW/(N - k)$ are obtained from the usual ANOVA table, with corresponding sums of squares

$$SSB = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2, \quad \text{and} \quad SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2,$$

with $N = \sum_{i=1}^{k} n_i$ and $n_0 = \left(N - \sum_{i=1}^{k} n_i^2 / N\right)/(k - 1)$.
The variance of $\hat{\rho}$ is obtained using the delta method, to the first order of approximation.
In our data, k = 15, N = 120, and n0 = 7.47.
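A minimal sketch of the one-way ANOVA estimator of the intraclass correlation for unequal region sizes is given below. It assumes the standard textbook forms MSB = SSB/(k − 1), MSW = SSW/(N − k), n0 = (N − Σ n_i²/N)/(k − 1), and ρ̂ = (MSB − MSW)/(MSB + (n0 − 1) MSW); the function name `anova_icc` and the toy data are hypothetical.

```python
def anova_icc(groups):
    """One-way ANOVA estimator of the intraclass correlation.

    `groups` is a list of lists, one inner list of country values per region.
    Returns (rho_hat, n0, msb, msw).
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # Between-region and within-region sums of squares.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msb = ssb / (k - 1)
    msw = ssw / (N - k)

    # Effective region size for unbalanced data.
    n0 = (N - sum(n * n for n in sizes) / N) / (k - 1)
    rho_hat = (msb - msw) / (msb + (n0 - 1) * msw)
    return rho_hat, n0, msb, msw


# Hypothetical toy data: three "regions" with unequal numbers of "countries".
toy = [[1.0, 2.0, 3.0], [6.0, 7.0], [10.0, 11.0, 12.0, 13.0]]
rho_hat, n0, msb, msw = anova_icc(toy)
```

The estimator can be negative when MSB < MSW; since the model defines the ICCC as non-negative, such values are usually read as indicating no regional clustering.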
We now screen for potential risk factors using bivariate correlation analysis. Table 8 provides the Pearson correlations and the associated p-values. Significant factors (p-value < 0.05) are candidates for entry into the multivariate model. Table 9 accounts for the regional clustering effect. The clustering parameter (Intraclass Correlation Coefficient) affects the standard errors through a quantity known as the “Design Effect” or DEFF. The effect of the clustering is that the standard errors must be multiplied by the corresponding DEFF when we wish to construct confidence limits on the mean of the variable of interest.
It should be noted that the general definition of the “Design Effect” or DEFF is $\mathrm{DEFF} = [1 + (n_0 - 1)\rho]^{1/2}$; that is, the standard errors of the means reported in Tables 3-6 must be multiplied by the corresponding DEFF. We note that the between-region variation is quite significant for all the parameters in Table 9, except for the COVID-19 cases, for which the ICCC is quite low. This means that there is a large amount of variability between countries within regions with respect to the COVID-19 cases.
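The design-effect adjustment can be sketched as a short calculation. The sketch assumes the form stated in the caption of Table 9, DEFF = (1 + 6.47 × ICCC)^(1/2) with n0 = 7.47; the function names and the numerical inputs are hypothetical.

```python
import math


def design_effect(icc, n0=7.47):
    """DEFF as used in Table 9: square root of 1 + (n0 - 1) * ICC."""
    return math.sqrt(1.0 + (n0 - 1.0) * icc)


def adjusted_standard_error(se, icc, n0=7.47):
    """Inflate a naive standard error of a mean by the design effect."""
    return se * design_effect(icc, n0)


# Hypothetical inputs: an ICC of 0.5 and a naive standard error of 2.0.
deff = design_effect(0.5)               # sqrt(1 + 6.47 * 0.5)
se_adj = adjusted_standard_error(2.0, 0.5)
```

When the ICC is zero the DEFF equals 1 and the standard error is unchanged, which matches the "no regional clustering" case.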
Table 8. Bivariate correlations between variables in the data set.
Table 9. Regional clustering of variables of interest. DEFF = (1 + 6.47 × ICCC)^{1/2}.
*No regional clustering.
In the next section we develop multiple regression models relating the target outcome variables, COVID-19 cases and COVID-19 case fatality, to the significant predictors.
4. Multivariate Regression Analyses
4.1. Negative Binomial Regression Model (NBRM)
From Table 9, the outcome variable, the number of COVID-19 cases, does not exhibit regional clustering; that is, one can safely ignore the region as a predictor. However, this variable (an integer count) exhibits a large amount of overdispersion (the variance is much larger than the mean). Regression models for counts in the presence of overdispersion are commonly constructed using the Negative Binomial Regression Model (NBRM). For this model we selected the number of COVID-19 cases as the dependent variable, and as covariates the three risk factors that were significantly correlated with it in the univariate screening step, as shown in Table 8. It is interesting to see that in the NBRM multiple regression model the three covariates are jointly significantly associated with the number of COVID-19 cases. The results are shown in Table 10. For this type of model, the “Scaled Deviance” is taken as a measure of goodness of fit of the NBRM. An optimal model should have a scaled deviance very close to unity. The scaled deviance for our model is 1.417. In our opinion this value is not substantially higher than unity, and the results based on the NBRM are quite useful. As the statistician George Box once said: “All models are wrong, but some are useful.”
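A quick numerical check for overdispersion can be sketched as follows. It compares the sample variance of a count variable with its mean and, assuming the common NB2 variance function Var(Y) = μ + αμ², derives a moment-based estimate of the dispersion parameter α. This is a diagnostic only, not the maximum likelihood fit performed by NBRM software, and the counts are hypothetical.

```python
def dispersion_summary(counts):
    """Return the mean, sample variance, and a moment estimate of NB2 dispersion."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    # Under NB2, Var(Y) = mu + alpha * mu**2, so alpha = (var - mu) / mu**2.
    alpha = max(0.0, (var - mean) / mean ** 2)
    return mean, var, alpha


# Hypothetical case counts (not study data).
counts = [12_000, 15_000, 40_000, 95_000, 310_000, 1_200_000]
mean, var, alpha = dispersion_summary(counts)
```

A variance far above the mean (equivalently, α well above zero) is the pattern that motivates a negative binomial rather than a Poisson model.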
4.2. Linear Mixed Effects Modeling of COVID-19 Case Fatality
Before we proceed with the data analysis of COVID-19 Case fatality we should emphasize the hierarchical structure of the data. Basically, the data have two levels; the higher level consists of regions and the lower level consists of countries nested within regions. The results of the fitted model are summarized in Table 11 and Table 12. In Table 11 we have the results of the F-statistic to test the significance of all effects. In Table 12, we have the estimated regression coefficients.
Table 10. Results of the negative binomial regression of the aggregate number of COVID-19 cases.
Dependent variable: COVID cases. Model: (Intercept), CKD_COUNT_2017, Percent-expenditure on healthcare.
Table 11. Dependent variable: LOG_COVID_FATALITY. The type III sums of squares.
Table 12. Dependent variable: LOG_COVID_FATALITY.
All risk factors we included in the model are deemed significant except diabetes, which was significant neither in the univariate screening stage nor in the multiple regression model. Note that in Table 12, the region WSSA was selected as the reference category when the computer program created dummy variables for the 15 regions.
For the random effects model, the correlation between the observed outcome and the model-based prediction is R = 0.765, which is quite high. Moreover, we found no association between the model-based predictions and the residuals, indicating that no model assumptions have been violated.
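The two checks described here, the correlation between observed and predicted outcomes and the lack of association between predictions and residuals, can be sketched as below. The observed and predicted values are hypothetical and `pearson_r` is an illustrative helper rather than the software routine used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


observed = [7.1, 7.4, 7.9, 8.2, 8.6, 9.0]   # hypothetical values of Y = log(CF)
predicted = [7.3, 7.3, 8.0, 8.1, 8.4, 9.1]  # hypothetical model predictions
residuals = [o - p for o, p in zip(observed, predicted)]

r_fit = pearson_r(observed, predicted)     # agreement of model and data
r_resid = pearson_r(predicted, residuals)  # should be near zero
```

A residual correlation near zero is more informative on held-out data, since some fitting procedures force it toward zero on the training data.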
The data that we analyzed are two-level, or hierarchical. We used R software and the commercial software SPSS version 25 for the data analyses. The univariate analyses showed that there is strong regional clustering of CKD, population density and the percentage of GDP spent on healthcare. However, the variability between regions in the COVID-19 cases was significantly lower than the variability within regions.
The analyses in this paper targeted two important outcome variables. The first is the aggregate number of COVID-19 cases. The NBRM found that CKD count, population density, and the percentage of GDP spent on healthcare were the most significant risk factors associated with this outcome. The second outcome variable is the COVID-19 case fatality. The predictive analytic procedure showed that regional effect, log-population density, per-capita GDP, and CKD case fatality are the most important predictors of this outcome. The analysis showed that neither diabetes nor the percentage of GDP spent on healthcare is a significant predictor of COVID-19 case fatality.
Since the analyses in this paper are ecological, the conclusions should not be applied at the level of countries within regions. By contrast, there may be several regional area-level environmental factors that we were not able to explore and that may explain the correlation we see between the regional-level prevalence of health-related risk factors and the case fatality of COVID-19.
Since these data did not include individual-level information on the health status of case patients within countries, conclusions cannot be drawn about individual risk factors within countries. However, this analysis suggests that there are important regional-level variations in COVID-19 infections that are correlated with variations in other chronic conditions, suggesting that the factors that influence health disparities may also be operating on the distribution of COVID-19.
Furthermore, there are presently limitations with overall infection count data. The analyses were conducted with data from only the first wave of the COVID-19 pandemic worldwide. This fact may explain the associations observed with both CKD and population density, rather than indicators of who is being infected, and these data continue to reflect those countries with the most severe outcomes after infection. Finally, the implementation of restrictions on travel and in-person activities may also have affected overall rates of COVID-19 during this first wave. It should also be noted that several new strains of the virus emerged near the end of 2020. Therefore, new and more comprehensive data on the new variants of the virus through the winter of 2021 may clarify the association with other comorbid conditions.
In summary, these analyses found that, for countries within the fifteen regions, estimated CKD case fatality and population density were significant ecologic predictors of COVID-19 case fatality.
 GBD Chronic Kidney Disease Collaboration (2020) Global, Regional, and National Burden of Chronic Kidney Disease, 1990-2017: A Systematic Analysis for the Global Burden of Disease Study 2017. The Lancet, 395, 709-733.
 Zhou, Y.Z., Ren, Q.D., Chen, G., Jin, Q., Cui, Q.X., Luo, H.T., Zheng, K., Qin, Y. and Li, X.M. (2020) Chronic Kidney Diseases and Acute Kidney Injury in Patients with COVID-19: Evidence from a Meta-Analysis. Frontiers in Medicine, 7, Article ID: 588301.
 Guan, W.J., et al. (2020) Comorbidity and Its Impact on 1590 Patients with COVID-19 in China: A Nationwide Analysis. European Respiratory Journal, 55, Article ID: 2000547.
 Li, X., et al. (2020) Impact of Cardiovascular Disease and Cardiac Injury on In-Hospital Mortality in Patients with COVID-19: A Systematic Review and Meta-Analysis. Heart, 106, 1142-1147.
 Diao, Y.L., Kodera, S., Anzai, D., Gomez-Tames, J. and Rashed, E.A. (2021) Influence of Population Density, Temperature, and Absolute Humidity on Spread and Decay Durations of COVID-19: A Comparative Study of Scenarios in China, England, Germany, and Japan. One Health, 12, Article ID: 100203.
 Rashed, E.A., Kodera, S., Gomez-Tames, J. and Hirata, A. (2020) Correlation between COVID-19 Morbidity and Mortality Rates in Japan and Local Population Density, Temperature, and Absolute Humidity. International Journal of Environmental Research and Public Health, 17, 5447.
 Ahmadi, M., Sharifi, A., Dorosti, S., Jafarzadeh Ghoushchi, S. and Ghanbari, N. (2020) Investigation of Effective Climatology Parameters on COVID-19 Outbreak in Iran. Science of the Total Environment, 729, Article ID: 138705.
 Pugliese, G., Vitale, M., Resi, V. and Orsi, E. (2020) Is Diabetes Mellitus a Risk Factor for Corona Virus Disease 19 (COVID-19)? Acta Diabetologica, 57, 1275-1285.
 Palaiodimos, L., Chamorro-Pareja, N., Karamanis, D., Li, W.J., Zavras, P.D., Chang, K.M., Mathias, P. and Kokkinidis, D.G. (2020) Diabetes Is Associated with Increased Risk for In-Hospital Mortality in Patients with COVID-19: A Systematic Review and Meta-Analysis Comprising 18,506 Patients. Hormones.
 Zhou, B., Lu, Y., Hajifathalian, K., Bentham, J., Di Cesare, M., Danaei, G., Bixby, H., Cowan, M.J., Ali, M.K., Taddei, C. and Lo, W.C. (2016) Worldwide Trends in Diabetes since 1980: A Pooled Analysis of 751 Population-Based Studies with 4.4 Million Participants. The Lancet, 387, 1513-1530.
 Saeedi, P., Petersohn, I., Salpea, P., Malanda, B., Karuranga, S., Unwin, N., Colagiuri, S., Guariguata, L., Motala, A.A., Ogurtsova, K. and Shaw, J.E. (2019) Global and Regional Diabetes Prevalence Estimates for 2019 and Projections for 2030 and 2045: Results from the International Diabetes Federation Diabetes Atlas. Diabetes Research and Clinical Practice, 157, Article ID: 107843.
 Casqueiro, J., Casqueiro, J. and Alves, C. (2012) Infections in Patients with Diabetes Mellitus: A Review of Pathogenesis. Indian Journal of Endocrinology and Metabolism, 16, S27.
 Hussain, A., Bhowmik, B. and do Vale Moreira, N.C. (2020) COVID-19 and Diabetes: Knowledge in Progress. Diabetes Research and Clinical Practice, 162, Article ID: 108142.
 Maddaloni, E. and Buzzetti, R. (2020) COVID-19 and Diabetes Mellitus: Unveiling the Interaction of Two Pandemics. Diabetes/Metabolism Research and Reviews, 36, e33213321.
 Angelidi, A.M., Belanger, M.J. and Mantzoros, C.S. (2020) COVID-19 and Diabetes Mellitus: What We Know, How Our Patients Should Be Treated Now, and What Should Happen Next. Metabolism, 107, Article ID: 154245.
 Khan, J.R., Awan, N., Islam, M.M. and Muurlink, O. (2020) Healthcare Capacity, Health Expenditure, and Civil Society as Predictors of COVID-19 Case Fatalities: A Global Analysis. Frontiers in Public Health, 8, 347.
 Korda, R.J. and Butler, J.R.G. (2006) Effect of Healthcare on Mortality: Trends in Avoidable Mortality in Australia and Comparisons with Western Europe. Public Health, 120, 95-105.