The ability to gauge hospital performance using patient outcome data depends on many factors. In principle, the outcome needs to reflect features that are directly affected by the quality of hospital care, such as mortality, readmission rates, and patient and employee satisfaction, to name but a few. Beyond this, however, there are a number of important data and statistical considerations:
1) Data must be available and used to adjust for differences in patient health at admission across hospitals (case-mix differences). These adjustments are required to ensure that variations in reported performance reflect hospitals’ contributions to their patients’ outcomes rather than the intrinsic difficulty of the patients they treat. Needless to say, how well the adjustments perform depends on the type and quality of the available data.
2) In distinct contrast to the previous point, reported performance should not adjust away differences related to the quality of the hospital. For example, if “presence of a special dialysis care unit” is systematically associated with better survival following organ failure, a hospital’s reported performance should capture the benefit provided by that unit and, as a consequence, such a hospital-level characteristic should not influence the risk adjustment.
3) The reported performance measure should be little affected by the variability associated with rates based on small numbers of cases.
In this report we address technical statistical issues associated with the KFSHRC hospital mortality data. The salient point is that there is no consensus to guide our choice of an appropriate statistical model. We shall therefore rely on well-established statistical models to analyze our data. To enhance the traditional modeling techniques we include: the use of more flexible models incorporating Diagnosis Related Groups (DRG) adjustment; the use of statistical distributions that do not belong to the well-known Gaussian family used in hierarchical, random effects models; evaluation of the effectiveness of current outlier detection methods; and consideration of producing an ensemble of hospital-specific Standardized Mortality Ratios (HSMR) that accurately estimates the true, underlying distribution of ratios. Discussions with clinicians and other quality experts concluded that risk adjustments should not reflect hospital characteristics, except where these are used to reduce confounding of the case-mix/risk relation. Statistical models are available for each of these operations. Developing and implementing such models has become feasible since the adoption of the SAS software and the acquisition of its relevant components.
In Section 2 we define what is meant by DRG, and in Section 3 we describe the data that were made available to us, with mortality as the primary outcome at the King Faisal Specialist Hospital and Research Center (KFSHRC). In Section 4 we compare the models, and in Section 5 we discuss the quantitative merits of these models, followed by recommendations.
2. The Importance of Incorporating DRGs within the Proposed Models
The Diagnosis Related Groups (DRGs) were first developed at Yale University in 1975. The main objective was to group patients with similar treatments and conditions for comparative studies. DRGs were designed to be homogeneous units of hospital activity to which binding prices could be attached. A central theme in the advocacy of DRGs was that this reimbursement system would, by constraining the hospitals, oblige their administrators to alter the behavior of the physicians and surgeons comprising their medical staffs. Hospitals were forced to leave the nearly risk-free world of cost reimbursement, and face the uncertain financial consequences associated with the provision of health care. DRGs were designed to provide practice pattern information that administrators could use to influence individual physician behavior.
In 2007, author Rick Mayes described DRGs as:
...the single most influential postwar innovation in medical financing: Medicare’s prospective payment system (PPS). Inexorably rising medical inflation and deep economic deterioration forced policymakers in the late 1970s to pursue radical reform of Medicare to keep the program from insolvency.
In the USA, the most significant change in health policy since the passage of Medicare and Medicaid in 1965 went virtually unnoticed by the general public. Nevertheless, the change was nothing short of revolutionary. For the first time, the federal government gained the upper hand in its financial relationship with the hospital industry. Medicare’s new prospective payment system with DRGs triggered a shift in the balance of political and economic power between the providers of medical care (hospitals and physicians) and those who paid for it, power that providers had successfully accumulated for more than half a century. From a statistical viewpoint, DRGs are considered artificial clusters of subjects.
Krumholz et al. discussed several factors that should be considered when assessing hospital quality. These relate to differences in the chronic and clinical acuity of patients at hospital presentation, the numbers of patients treated at a hospital, the frequency of the outcome studied, the extent to which the outcome reflects a hospital quality signal, and the form of the performance metric used to assess hospital quality. However, issues related to DRG were not considered among these factors. Since the outcome of interest is hospital mortality, any attempt to derive risk-adjusted mortality that does not take into account the relative importance of DRG will produce biased estimates. The performance measure is reported as:
$$\text{SMR} = \frac{\text{Observed number of deaths}}{\text{Expected (model-based) number of deaths}} \qquad (1)$$
The denominator of Equation (1) results from applying a model that adjusts/standardizes for an ensemble of patient-level, pre-admission risk factors, rather than only demographic factors such as age and gender as is typical in epidemiological applications. The statistical issues arising in the estimation of the standardized death rate and the SMR are identical because the latter is simply the hospital-specific value divided by the expected number of deaths computed from the postulated risk model.
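As a minimal sketch of Equation (1), the ratio can be computed by summing the observed deaths and the model-based death probabilities over the patients of one hospital. The function name and the five-patient data set below are hypothetical and serve only as an illustration; the predicted risks would in practice come from whichever risk-adjustment model is adopted.

```python
def standardized_mortality_ratio(died, expected_risk):
    """Return observed deaths / expected deaths for one hospital.

    died          -- list of 0/1 discharge outcomes (1 = died), one per patient
    expected_risk -- model-based death probabilities for the same patients
    """
    if len(died) != len(expected_risk):
        raise ValueError("one predicted risk is needed per patient")
    observed = sum(died)
    expected = sum(expected_risk)  # denominator of Equation (1)
    if expected <= 0:
        raise ValueError("expected number of deaths must be positive")
    return observed / expected


# Hypothetical data: five patients, one death, risks from some fitted model.
died = [0, 0, 1, 0, 0]
expected_risk = [0.10, 0.05, 0.60, 0.20, 0.30]
smr = standardized_mortality_ratio(died, expected_risk)
print(round(smr, 3))
```

A value below one indicates fewer deaths than the risk model predicts for that case mix, and a value above one indicates more.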
3.1. Study Design
Hospital discharge status records from 2014 through 2016 were extracted from the hospital medical records. For each subject, the age at admission, length of stay and DRG membership were included in this cross-sectional retrospective design. The study was reviewed and approved by the Institutional Review Board at the King Faisal Specialist Hospital and Research Center (KFSHRC).
3.2. Study Variables
3.2.1. Dependent Variable
Discharge status (dead/alive) is the dependent variable. Because the outcome follows a Bernoulli distribution, the log-odds of death were modeled in the analytical cohort.
3.2.2. Independent Variables
Regression models included parameters for age at admission, length of stay, gender, and DRG. Because DRG is a categorical variable with an excessive number of levels, our modeling strategy used DRG as a clustering variable and as a random effect. The fundamental aim was to adjust the standard errors of the estimated model parameters for the possible within-DRG correlation. A further reason was to preserve the stochastic process and hierarchical structure of the data and so develop an effective risk adjustment. Because patient-specific outcomes are binary (death indicator), a Bernoulli model operating at the patient level is appropriate. Risk adjustment and stabilization should adopt this model, and logistic regression is therefore a suitable approach for including the effects of patient-level characteristics. With flexible modeling of covariate influences, the model would produce a valid risk adjustment, and there is no reason to replace the logistic link by another function.
The evaluation process must be based on an effective risk adjustment. Though one might wish to have additional information on patient attributes and clinical severity, even with currently available data we should evaluate whether a more flexible risk adjustment model will improve performance. Patient characteristics (clinical and demographic) are of three types: measured and accounted for; measurable but not accounted for; and difficult or impossible to measure. Prudence dictates that risk adjustments should include pre-admission medical conditions, but whether or not to include demographic attributes is a policy decision.
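The model described in this paragraph can be written out explicitly. The following is a sketch under assumed notation, none of which appears in the original text: $Y_{ij}$ is the death indicator for patient $j$ in DRG $i$, AAA and LOS denote age at admission and length of stay, $u_i$ is the DRG-level random intercept, and $\sigma_u^2$ is the between-DRG variance component.

```latex
\begin{aligned}
Y_{ij} \mid u_i &\sim \mathrm{Bernoulli}(p_{ij}), \\
\log\frac{p_{ij}}{1-p_{ij}} &= \beta_0 + \beta_1\,\mathrm{AAA}_{ij}
   + \beta_2\,\mathrm{LOS}_{ij} + \beta_3\,\mathrm{Gender}_{ij} + u_i, \\
u_i &\sim N(0, \sigma_u^2), \qquad i = 1,\dots,k,\quad j = 1,\dots,n_i .
\end{aligned}
```

Setting $u_i = 0$ for all $i$ recovers ordinary logistic regression, which ignores the clustering of patients within DRGs.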
3.3. Statistical Analysis
Univariate and descriptive statistics were used to profile the study covariates, including the frequency distribution of the top twelve DRGs, as shown in Table 1. Because of the binary nature of the outcome of interest (patient’s status at discharge), we fitted logistic regression models to estimate the change in level (intercept) and trend (slope) of the log-odds of death associated with age at admission and length of stay. Each model was adjusted to account for the clustering effect of DRG.
Three statistical estimation procedures in SAS (GLM, GEE, GLIMMIX) were used to account for the correlation between responses within a DRG and the heterogeneity across individuals in the study. The intra-class correlation was calculated from a one-way ANOVA using the GLM procedure in SAS. Data management and analyses were accomplished via PC-SAS (v9.4), with an a priori Type I error rate set at 0.01.
Table 1. The ICD9 diagnoses for the most frequent DRGs in the hospital database.
The analyses produced point estimates and 95% confidence intervals of the odds ratios, whether the two covariates were entered into the models as continuous or as categorical variables.
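Assume that the number of DRGs in the database is $k$, and that the size of the $i$th DRG is $n_i$. The estimated intra-cluster correlation obtained from the one-way ANOVA using the GLM procedure in SAS (Shoukri) is given in Equation (2):
$$\hat{\rho} = \frac{MSBD - MSWD}{MSBD + (n_0 - 1)\,MSWD} \qquad (2)$$
where MSBD and MSWD are respectively the between-DRG and the within-DRG mean squares. Moreover:
$$n_0 = \frac{1}{k-1}\left(N - \frac{\sum_{i=1}^{k} n_i^2}{N}\right), \qquad N = \sum_{i=1}^{k} n_i .$$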
Summary statistics for age at admission and length of stay are presented in Table 2. Note that the sample variance underlying the reported standard deviation uses the number of observations minus one as its divisor, which makes it an unbiased estimator of the population variance (Shoukri).
The main purpose of using the GLM procedure, which assumes independent responses, is to produce a point estimator of the within-cluster correlation (Shoukri).
From the GLM procedure we have, , and . From Equation (2) the intra-cluster correlation coefficient .
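The calculation can be sketched in a few lines, assuming the usual one-way ANOVA estimator (MSBD − MSWD) / (MSBD + (n0 − 1) MSWD), where n0 is the average cluster size adjusted for unequal cluster sizes. The function name and the toy clusters are hypothetical; in the study each cluster would hold the 0/1 discharge outcomes of one DRG.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intra-cluster correlation.

    clusters -- list of lists; each inner list holds the responses
                (e.g. 0/1 death indicators) of one cluster such as a DRG.
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two clusters and some replication")

    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]

    # Between- and within-cluster sums of squares.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)

    msb = ssb / (k - 1)        # between-cluster mean square (MSBD)
    msw = ssw / (n_total - k)  # within-cluster mean square (MSWD)

    # Adjusted average cluster size for unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    return (msb - msw) / (msb + (n0 - 1) * msw)


# Hypothetical outcomes for three small clusters.
toy_clusters = [[0, 0, 0, 1], [1, 1, 0, 1], [0, 0, 0, 0, 0, 1]]
icc = anova_icc(toy_clusters)
print(icc)
```

The estimator can be negative when the within-cluster mean square exceeds the between-cluster mean square, so a result slightly below zero is not a coding error.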
The Effect of Dichotomization
Measurements of continuous variables are made in all branches of epidemiology, aiding in the diagnosis and treatment of patients. In clinical practice it is helpful to label individuals as having or not having an attribute, such as being “old” or “young”, or having a “long stay” depending on the number of days.
Table 2. Summary statistics for age at admission (AAA) and length of stay (LOS) presented by discharge status (Alive, Dead). (a) Status = Alive; (b) Status = Dead.
Dichotomization of continuous variables is also common in clinical research, but it has serious statistical drawbacks, chiefly a reduction in the precision of the estimated effect sizes. Though grouping may help data presentation, notably in tables, categorization is unnecessary for statistical analysis. Here we consider the impact of converting continuous data to two groups (dichotomizing), as this is the most common approach in clinical research.
Within each model we estimated, for each effect, the log-odds as an effect size using continuous and categorized covariates. We found that the GLIMMIX model has an advantage over the logistic regression and GEE models. We calculated the optimal split for AAA and LOS using the Receiver Operating Characteristic (ROC) curve. Figure 1 and Table 3 show that the optimal cut-off for LOS is 160 days, with a corresponding area under the curve of 73%. This means that the risk of death is significantly higher among patients who are hospitalized for 160 days or more relative to those who stay less than 160 days (corrected P-value = 0.0001). Additionally, Figure 2 and Table 4 show that the optimal split for AAA is 53 years. The area under the ROC curve corresponding to the dichotomized covariate AAA is 65%.
Figure 1. The ROC curve for the LOS.
Table 3. Area under the curve for the LOS optimal cut-off point.
Corrected P-value = 0.0001; The optimal cut-off point is LOS ≥ 160 days.
Figure 2. The ROC curve for AAA.
Table 4. Area under the curve, test result variable(s) for AAA.
Corrected P-value = 0.0001; the optimal cut-off point associated with death is age ≥ 53 years.
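The text does not state which criterion defined the "optimal" split. One common choice, assumed here purely for illustration, is the threshold that maximizes Youden's J (sensitivity + specificity − 1). The sketch below also computes the area under the ROC curve as the proportion of death/survivor pairs in which the patient who died has the larger covariate value. All function names and the ten-patient data set are hypothetical.

```python
def roc_points(values, died):
    """Return (threshold, sensitivity, specificity) for the rule value >= threshold."""
    positives = sum(died)
    negatives = len(died) - positives
    points = []
    for t in sorted(set(values)):
        tp = sum(1 for v, d in zip(values, died) if d == 1 and v >= t)
        tn = sum(1 for v, d in zip(values, died) if d == 0 and v < t)
        points.append((t, tp / positives, tn / negatives))
    return points


def youden_cutoff(values, died):
    """Threshold maximizing J = sensitivity + specificity - 1."""
    return max(roc_points(values, died), key=lambda p: p[1] + p[2] - 1)[0]


def auc(values, died):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    dead = [v for v, d in zip(values, died) if d == 1]
    alive = [v for v, d in zip(values, died) if d == 0]
    wins = sum((x > y) + 0.5 * (x == y) for x in dead for y in alive)
    return wins / (len(dead) * len(alive))


# Hypothetical lengths of stay (days) and discharge status (1 = died).
los = [3, 10, 25, 40, 90, 150, 170, 200, 220, 300]
died = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]
cutoff = youden_cutoff(los, died)
area = auc(los, died)
print(cutoff, area)
```

Because the cut-off is chosen to separate the outcomes in the same data used for estimation, the resulting test statistics tend to be optimistic, which is presumably why a corrected P-value is reported.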
Dichotomizing leads to several problems. First, much information is lost, as can be seen from the increase in the estimated standard errors of the odds ratios. Moreover, the point estimates of the odds ratios are inflated as well and are therefore potentially biased. For example, in Table 5 the estimated odds ratio for the dichotomized LOS is 14.53, while its value is 1.024 when LOS is measured on the continuous scale under the same model. The same remark holds when the fitted model is the GEE, as shown in Table 6. The estimates are somewhat more stable under the GLIMMIX, as shown in Table 7. One may conclude that dichotomization may increase the risk of a positive result being a false positive.
As an alternative to dichotomization, we categorized age in a meaningful way such that:
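To make the quantities in this comparison concrete, the following sketch computes an odds ratio and its Wald confidence interval from a 2×2 table for a dichotomized covariate, using the standard error of the log odds ratio. This is the unadjusted textbook calculation, not the model-based estimates reported in the tables, and the cell counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a -- deaths among exposed (e.g. long stay)
    b -- survivors among exposed
    c -- deaths among unexposed (e.g. short stay)
    d -- survivors among unexposed
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of the log odds ratio
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts, not taken from the hospital data.
or_hat, lower, upper = odds_ratio_ci(a=30, b=70, c=50, d=850)
print(or_hat, lower, upper)
```

The standard error is dominated by the smallest cell, so a cut-off that leaves few patients in one group produces a wide interval.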
Group 1: Age is less than 14 years
Group 2: Age between 15 and 30
Group 3: Age between 31 and 59
Group 4: Age above 60.
When we plotted the mortality rate, with 95% confidence limits, against the 4 age categories, as shown in Figure 3, there was an increasing trend in mortality as age groups moved up. The one-degree-of-freedom Cochran-Armitage test for trend was highly significant, with p-value < 0.001.
Figure 3. Death rate versus the 4 age categories.
Table 5. Estimating the odds ratios using the logistic regression models.
Table 6. Estimating the odds ratios using the GEE models.
Table 7. Estimating the odds ratios using the GLIMMIX models.
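The Cochran-Armitage trend test mentioned above can be sketched as follows, assuming equally spaced scores 1 to 4 for the four age groups (the scores actually used are not stated). The death counts and group sizes are hypothetical.

```python
import math


def cochran_armitage(deaths, totals, scores=None):
    """Cochran-Armitage test for a linear trend in proportions.

    Returns (z, two_sided_p); z * z is the one-degree-of-freedom chi-square.
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))  # equally spaced scores
    n = sum(totals)
    p = sum(deaths) / n  # pooled death rate under the null hypothesis
    # Score-weighted sum of observed minus expected deaths.
    t = sum(s * (r - m * p) for s, r, m in zip(scores, deaths, totals))
    var = p * (1 - p) * (
        sum(m * s * s for s, m in zip(scores, totals))
        - sum(m * s for s, m in zip(scores, totals)) ** 2 / n
    )
    z = t / math.sqrt(var)
    return z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical deaths and admissions in the four age groups.
z, p_value = cochran_armitage(deaths=[10, 20, 40, 80],
                              totals=[1000, 1000, 1000, 1000])
print(z, p_value)
```

A positive z indicates that mortality rises with the group score; the test has power against monotone trends specifically and may miss non-monotone patterns.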
Scaled deviance = 0.61, and −2 Res log pseudo-likelihood = 477030.
The GLIMMIX estimates when age is categorized into 4 groups are given in Table 8. The odds ratio estimate for LOS, which is highly correlated with age, has improved and is in fact close to the estimated odds ratio obtained when age was taken as a continuous covariate. There is almost no change in the between-DRG variance component estimate, which is consistent with the hypothesis that the measured covariates are not correlated with the random component in the model. The scaled deviance is less than one, indicating that the model has captured the effect of the measured and the unmeasured covariates. The value of −2 Res log pseudo-likelihood (the quantity on which the AIC is based) indicates that model goodness of fit is also acceptable.
Under the GLIMMIX model, whose results are summarized in Table 8, the CIHI index of hospital performance when mortality is the outcome of interest is:
giving CIHI = NUM/DEN less than unity. This indicates that the hospital risk adjusted mortality meets the CIHI criteria for quality.
Although the fitted models produced odds ratio estimates whose changes and trends were in the same direction and of the same significance, the magnitude of the point estimates and the width of the confidence intervals varied. The logistic regression clearly produced narrower confidence intervals. This should be expected, since this model ignores the correlation structure among responses within each DRG, and hence the standard errors are underestimated. The GEE, introduced by Liang and Zeger, is suitable for the analysis of clustered data. The GEE estimation produced point estimates of similar magnitude but relatively less precise confidence intervals. The GEE is supposed to produce consistent estimates even if the within-DRG correlation parameter is misspecified. In our case we assigned an exchangeable correlation as the working correlation to represent the average within-DRG heterogeneity. Finally, the GLIMMIX produced an entirely different set of estimates, and an estimate of an additional parameter representing the variance component, which quantifies the between-DRG variation.
Table 8. GLIMMIX: Age is categorized into 4 groups with group 4 being the reference.
When applied to a dichotomous response variable, the GEE results are to be interpreted at the population-average (PA) level and do not account for heterogeneity across the DRGs. The GEE indicates that the risk of death depends on AAA and LOS, which are measured at the individual level, and not on any random effect across the clusters of DRGs. This model uses a working correlation as an instrument to account for the within-DRG variation.
However, the fundamental feature of the GLIMMIX is the assumption of heterogeneity across DRGs in our study population. The GLIMMIX interprets the estimated model parameter (the odds ratio) as conditional on DRG-specific intercepts. These intercepts reflect a natural heterogeneity due to unmeasured covariates. For example, the presence or absence of co-morbid conditions, or re-admission (yes/no), may produce different trajectories for the risk of death. Therefore there is, as expected, a sizable difference in the magnitude of uncertainties between the GEE and the GLIMMIX.
In conclusion, health services and quality of care research might use any of the above models depending on the scientific question posed. The GEE will produce estimates that will be of most interest to health services quality managers and policy makers who evaluate hospital performance at an average level. On the other hand, the GLIMMIX will have a DRG-level interpretation.
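The difference between the two interpretations can be illustrated numerically. The sketch below assumes a random-intercept logistic model with a normally distributed intercept, averages the conditional probabilities over that distribution by the trapezoid rule, and compares the resulting population-averaged log odds ratio with the conditional one. It also prints the well-known closed-form approximation β / sqrt(1 + c²σ²) with c = 16√3 / (15π). The parameter values and function names are hypothetical and not estimates from the hospital data.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_probability(linear_predictor, sigma, half_width=8.0, steps=2000):
    """Average expit(linear_predictor + sigma * z) over z ~ N(0, 1) (trapezoid rule)."""
    h = 2 * half_width / steps
    total = weight_sum = 0.0
    for i in range(steps + 1):
        z = -half_width + i * h
        # Normal density up to a constant; end points get half weight.
        w = math.exp(-0.5 * z * z) * (0.5 if i in (0, steps) else 1.0)
        total += w * expit(linear_predictor + sigma * z)
        weight_sum += w
    return total / weight_sum


def population_averaged_log_or(beta0, beta, sigma):
    """Marginal log odds ratio for a one-unit change in a covariate."""
    p0 = marginal_probability(beta0, sigma)
    p1 = marginal_probability(beta0 + beta, sigma)
    return logit(p1) - logit(p0)


# Hypothetical conditional (DRG-specific) parameters.
beta0, beta, sigma = -3.0, 1.0, 1.0
pa = population_averaged_log_or(beta0, beta, sigma)
c2 = (16 * math.sqrt(3) / (15 * math.pi)) ** 2  # about 0.346
approx = beta / math.sqrt(1 + c2 * sigma ** 2)
print(pa, approx)
```

The population-averaged coefficient is pulled toward zero relative to the conditional one, and the two coincide when the between-cluster variance is zero, which is one reason GEE and GLIMMIX estimates need not agree even when both models are correctly specified.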