n Table 4. All effects of the three treatment factors and their interactions were highly significant (P < 0.0001). This suggests that, with a large m (m = 99) and a sufficient number of replicates (30), random errors were well controlled. More importantly, it shows that γ values can be strongly affected by all three treatment factors (Table 4).
Judging by the magnitude of the F values, Anal_V was by far the most important factor affecting γ, with an F value of 23,158. The Anal_V × Imp_V interaction had the second-largest F value, and Imp_V the third. Because the primary purpose of this study was to investigate the γ-δ relationship, one might expect δ to have a major effect on γ. However, the F value of δ was only about one tenth that of Anal_V and one third that of Imp_V. Nevertheless, an F value of 2223.9 for δ is still very large, indicating that δ can greatly affect γ (Table 4).
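Each F value in Table 4 is the ratio of a factor's mean square to the error mean square, which is why the magnitudes can be compared across factors. The sketch below, using only the Python standard library, computes main-effect F values for a balanced full-factorial layout like the δ × Anal_V × Imp_V design with 30 replicates per cell. The data are synthetic and the factor labels (`A` to `D`, `S5`, `S20`, `S100`) are hypothetical; this is not the software used to produce Table 4, and interaction F values are omitted for brevity.

```python
import random
import statistics


def main_effect_f(cells, factor):
    """F value for one main effect in a balanced full-factorial design.

    `cells` maps a tuple of factor levels (one entry per factor) to the list of
    replicate responses in that cell; every cell must hold the same number of
    replicates. The error mean square is the pooled within-cell variance, i.e.
    the residual of the full model that includes all interactions.
    """
    all_values = [y for ys in cells.values() for y in ys]
    grand_mean = statistics.fmean(all_values)

    # Error (within-cell) sum of squares and its degrees of freedom
    sse = 0.0
    for ys in cells.values():
        cell_mean = statistics.fmean(ys)
        sse += sum((y - cell_mean) ** 2 for y in ys)
    mse = sse / (len(all_values) - len(cells))

    # Pool the responses by the level of the chosen factor
    by_level = {}
    for levels, ys in cells.items():
        by_level.setdefault(levels[factor], []).extend(ys)
    ss_factor = sum(len(ys) * (statistics.fmean(ys) - grand_mean) ** 2
                    for ys in by_level.values())
    ms_factor = ss_factor / (len(by_level) - 1)
    return ms_factor / mse


# Hypothetical data: 4 delta levels x 4 analysis groups x 3 imputation settings,
# 30 replicates per cell; the analysis group is given a deliberately large effect.
rng = random.Random(42)
deltas = [0.04, 0.10, 0.20, 0.29]
group_scale = {"A": 1e-5, "B": 1e-4, "C": 1e-3, "D": 3e-3}
imp_settings = ["S5", "S20", "S100"]

cells = {}
for d in deltas:
    for g, scale in group_scale.items():
        for s in imp_settings:
            cells[(d, g, s)] = [scale * (1 + 2 * d) + rng.gauss(0.0, 1e-5)
                                for _ in range(30)]

f_delta = main_effect_f(cells, 0)
f_group = main_effect_f(cells, 1)
f_imp = main_effect_f(cells, 2)
print(f"F(delta) = {f_delta:.1f}, F(group) = {f_group:.1f}, F(imp) = {f_imp:.2f}")
```

In this toy layout the analysis-group factor, which shifts γ by orders of magnitude, produces a far larger F value than δ, while the imputation setting, which has no built-in effect, stays near 1.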
The dominant effect of Anal_V on γ revealed by ANOVA can also be seen by comparing the actual changes in γ attributable to δ, Anal_V and Imp_V. One way to do this is to compare the range of the γ means across the levels of each factor. Table 5 lists, for the three factors, the maximum γ mean, the minimum γ mean and the range (maximum minus minimum) expressed as a percentage of the minimum. The γ means at different δ varied by up to 105%, which is substantial. But this variation due to δ is easily overshadowed by the effect of Anal_V, which caused as much as a 13439% difference in γ means. The minimum and maximum γ means for δ were within the same order of magnitude (0.001), whereas those for Anal_V spanned three orders of magnitude (0.00001, 0.0001 and 0.001) (Table 5). The effect of Imp_V, as reflected by the variation in γ means, was 135%, slightly greater than that of δ but much smaller than that of Anal_V (Table 5). Given the large effects of the other factors on γ, it is impossible to predict γ using δ.
Table 3. Mean γ values for different Anal_V and Imp_V combinations at δ = 10% and m = 99.
Table 4. Results of ANOVA, showing the effects of δ, Anal_V, Imp_V, and their interactions on γ.
Table 5. Ranges of γ means for δ, Anal_V and Imp_V.
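The range statistic of Table 5 can be sketched as follows, assuming that "range as a percentage of the minimum" means (max − min) / min × 100; that formula is our reading of the text. As an illustration it uses the CONTROL γ means across δ quoted in Section 4.3, which are not the δ means listed in Table 5.

```python
def range_percent(gamma_means):
    """Range of a set of gamma means, (max - min), as a percentage of the minimum."""
    lo, hi = min(gamma_means), max(gamma_means)
    return (hi - lo) / lo * 100.0


# CONTROL gamma means at delta = 4%, 10%, 20% and 29% (Section 4.3)
control_means = [0.000013, 0.000035, 0.000056, 0.000067]
print(f"{range_percent(control_means):.1f}%")  # 415.4%
```

Under this reading, a value such as 13439% means the largest γ mean was roughly 135 times the smallest.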
3.3. Results of Regression Analysis
The linear model γ = a + bδ was fitted to the data of every Anal_V × Imp_V combination. The values of the regression coefficient b and the results of the t test are presented in Table 6. The values of b were very small because the γ values were small. The t values, however, were large: generally two-digit and highly significant (P < 0.0001). The largest t value, a remarkable 61.64, occurred for REGION with SIZE100. Large positive t values indicate that γ increases linearly with δ. There were exceptions, however. The t value for DERIVED with SIZE100 was negative (−4.22) and highly significant (P < 0.0001), indicating a strong tendency for γ to decrease as δ increases in this particular case. The b value and the t value for DERIVED with SIZE20 were also negative (t = −1.81, P = 0.07) (Table 6).
Table 6. The regression coefficient (b) and the corresponding t value for the linear model γ = a + bδ for different Anal_V × Imp_V combinations.
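The t value in Table 6 tests whether the slope b differs from zero. A minimal standard-library sketch of that test for γ = a + bδ is shown below. As a toy input it uses only the four CONTROL γ means quoted in Section 4.3 rather than the replicate-level data behind Table 6, so its t value (on just 2 degrees of freedom) is not comparable to the tabulated ones.

```python
import math


def slope_t(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, t) with t = b / SE(b)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residual variance on n - 2 degrees of freedom gives the standard error of b
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(sse / (n - 2) / sxx)
    return a, b, b / se_b


delta = [0.04, 0.10, 0.20, 0.29]
gamma = [0.000013, 0.000035, 0.000056, 0.000067]  # CONTROL gamma means (Section 4.3)
a, b, t = slope_t(delta, gamma)
print(f"a = {a:.3e}, b = {b:.3e}, t = {t:.2f}")
```

The slope is tiny in absolute terms, as in Table 6, yet its t value is clearly positive because the points lie close to a straight line.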
3.4. Effects of Anal_V on γ-δ Relationship
Figure 1 shows how γ changed as δ increased from 4% to 29%. The four Anal_V treatments are presented in the same graph so that readers can see the main effect of Anal_V on the γ-δ relationship. CONTROL and REGION had much smaller γ values than PRIMEMP and DERIVED, so they are also graphed at a different scale in panels a2 and b2 to make their γ-δ curves visible. The patterns of change in γ with increasing δ differed drastically among the four analytic treatments, indicating that how the data were analysed had a major effect on the γ-δ relationship (Figure 1(a1) and Figure 1(b1)). For CONTROL and REGION, γ increased linearly with δ (Figure 1(a2) and Figure 1(b2)). For PRIMEMP and DERIVED, however, the γ-δ curve might increase, zigzag, remain flat, or decrease significantly (P < 0.01) as δ increased (Figure 1(a1) and Figure 1(b1)). The effects of PRIMEMP and DERIVED on the γ-δ relationship were in turn affected by Imp_V. For example, as δ increased from 4% to 29%, the line for DERIVED zigzagged for SIZE100 but never went downward for SIZE5 (Figure 1(a1) and Figure 1(b1)).
γ values were also calculated separately for different categories within an Anal_V treatment, for instance Northeast vs. West in REGION. Would different categories of the same Anal_V treatment have different γ values and γ-δ relationships? Data for the first four categories in the value list of PRIMEMP (see Table 2) are presented for SIZE100 and SIZE5 in Figure 2. The four PRIMEMP categories had different γ values and γ-δ relationships, as the line graphs of Figure 2 show. As δ increased, γ usually increased but could also decrease or remain unchanged (Figure 2). For the same PRIMEMP category, the line heights (γ values) and shapes (γ-δ relationship) were similar between SIZE100 (Figure 2(a)) and SIZE5 (Figure 2(b)).
3.5. Effects of Imp_V on γ-δ Relationship
Figure 3 shows the effect of Imp_V on the γ-δ relationship. The differences among SIZE5, SIZE20 and SIZE100 followed different patterns under different Anal_V treatments. When Anal_V = CONTROL, the γ-δ lines were similar among SIZE5, SIZE20 and SIZE100 (Figure 3(a)). When Anal_V = REGION, the γ-δ lines were almost identical for SIZE5 and SIZE20, but the γ values were lower and the slope of the γ-δ line was smaller for SIZE100 (Figure 3(b)). When Anal_V = PRIMEMP, the general trends of the γ-δ lines were similar among SIZE5, SIZE20 and SIZE100 (Figure 3(c)). Large differences in the γ-δ relationship existed among SIZE5, SIZE20 and SIZE100 when Anal_V = DERIVED (Figure 3(d)). In summary, Imp_V may or may not have a substantial impact on the γ-δ relationship, depending on how the data are analysed.
Figure 1. Effects of Anal_V on the γ-δ relationship. Because CONTROL and REGION had much smaller γ values, they are graphed at a different scale in panels a2 and b2 for better visualization.
Figure 2. Effects of different categories of PRIMEMP on the γ-δ relationship. Data for the first four categories of PRIMEMP (Table 2) are presented.
Figure 3. Effects of Imp_V on the γ-δ relationship.
4. Additional Discussions
4.1. Why Were γ Values So Small?
From Equations (2) to (7), we see that when m → ∞, γ becomes B/T, where T = B + U, as indicated by Equation (8). The value of m chosen for the MI trials in this study, m = 99, is very large compared with what Rubin recommended  . Thus m = 99 may be regarded as an approximation of m → ∞. PWS12 had a sample size of n = 2567. The “quantity of interest”, i.e. Q in Equation (6), is the sample mean. From Equations (5) and (6), we can see that B is the variance of the sample means and U is the mean of the sample variances. In classical statistics, the sample variance s² and the variance of the sample mean have the following relationship  :
$$s_{\bar{x}}^{2} = \frac{s^{2}}{n}$$
If all 2567 values were randomly redrawn during the MI process, then the variance of the sample mean could be approximated by B and s² by U, and we would have the following relationships:
$$B \approx \frac{U}{n}, \qquad \gamma = \frac{B}{T} = \frac{B}{B + U}$$
Since U ≤ T, γ ≤ B/U ≈ 1/n, i.e. approximately γ ≤ 1/2567 ≈ 0.00039. The assumption that all values are regenerated by the imputation process would hold only if δ = 100%. The largest δ in this study was 29%, meaning that 71% of the values were identical among the 99 completed datasets generated by MI. B and γ would therefore be smaller than if all the values were generated by MI. This may help explain why the γ values were so small.
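The bound γ ≈ B/(B + U) ≤ 1/n, and the further shrinkage when only a fraction δ of values is redrawn, can be checked with a toy simulation. The sketch below is not the study's procedure: the data are hypothetical draws from N(50, 10²), and each missing value is simply replaced by an independent fresh draw from the same distribution. Following the text's description of Equations (5) and (6), U is taken as the mean of the sample variances (not divided by n). It also evaluates the commonly cited finite-m expression for the fraction of missing information (Rubin, 1987), assumed rather than confirmed to match the paper's Equations (2) to (7), to show that at m = 99 it differs from B/T only by roughly the factor 1 + 1/m.

```python
import math
import random


def mean(xs):
    return math.fsum(xs) / len(xs)


def sample_var(xs):
    mu = mean(xs)
    return math.fsum((x - mu) ** 2 for x in xs) / (len(xs) - 1)


def gamma_components(datasets):
    """Return B, U, T and gamma = B / T for the sample mean over m datasets.

    Mirrors the text: B is the variance of the m sample means, U is the mean
    of the m sample variances, and T = B + U (the large-m form of Equation (8)).
    """
    b = sample_var([mean(d) for d in datasets])
    u = mean([sample_var(d) for d in datasets])
    t = b + u
    return b, u, t, b / t


def gamma_rubin_finite_m(b, u, m):
    """Commonly cited finite-m fraction of missing information (Rubin, 1987).

    Assumed, not confirmed, to match the paper's Equations (2) to (7).
    """
    r = (1 + 1 / m) * b / u               # relative increase in variance
    nu = (m - 1) * (1 + 1 / r) ** 2       # degrees of freedom
    return (r + 2 / (nu + 3)) / (r + 1)


def simulate(n, m, delta, seed=1):
    """Toy MI in which the first round(delta * n) values are redrawn m times.

    Hypothetical set-up: data come from N(50, 10^2) and each missing value is
    replaced by an independent fresh draw from the same distribution (this is
    not the imputation procedure used in the study).
    """
    rng = random.Random(seed)
    observed = [rng.gauss(50.0, 10.0) for _ in range(n)]
    k = round(delta * n)
    datasets = [[rng.gauss(50.0, 10.0) for _ in range(k)] + observed[k:]
                for _ in range(m)]
    return gamma_components(datasets)


n, m = 2567, 99
b_all, u_all, _, g_all = simulate(n, m, delta=1.0)
_, _, _, g_29 = simulate(n, m, delta=0.29)
print(f"all values redrawn: gamma = {g_all:.6f} (1/n = {1 / n:.6f})")
print(f"delta = 29%:        gamma = {g_29:.6f}")
print(f"finite-m gamma (all redrawn): {gamma_rubin_finite_m(b_all, u_all, m):.6f}")
```

With every value redrawn, γ lands near 1/n; with δ = 29%, only the redrawn values contribute to B, so γ falls to roughly δ/n in this toy set-up.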
4.2. The Implications of Small γ
Rubin’s conclusion that m ≤ 5 is sufficient assumes γ ≤ 0.5. For a long time, researchers who adopted a small m in their MI programs may have worried that their γ might exceed 0.5, in which case the condition under which Rubin’s small m is sufficient would be violated. The results of this study suggest that γ may never be greater than 0.5. Now that we need not worry about γ being too large, should we use just a few imputations in MI without any other concerns? The answer would be “yes” only if we accept that Rubin’s γ-based RE can legitimately be used to determine the sufficient m for MI under all circumstances.
Should the smallness of γ be a concern? The γ for PWS12 as determined in this study was <0.01. With such a small γ, Rubin’s γ-based RE is essentially 1 (above 0.99) even at m = 1, meaning that a single imputation would be sufficient. With such a small γ, we might conclude that MI is meaningless. Alternatively, we might conclude that the γ-based RE is inappropriate for determining the sufficient m for MI. Nowadays MI is a popular approach to dealing with missing data  . Rejecting MI entirely just because γ is small does not seem reasonable, for there may be other ways to define the RE of MI and other ways to determine the sufficient m for MI  .
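Rubin’s γ-based relative efficiency is commonly written RE = (1 + γ/m)⁻¹ (Rubin, 1987). Assuming that is the form meant here, the short sketch below shows why γ < 0.01 makes even m = 1 look sufficient, while γ = 0.5 needs several imputations. The helper `smallest_m` and the target RE values are illustrative choices, not part of the original analysis.

```python
import math


def relative_efficiency(gamma, m):
    """Rubin's relative efficiency of m imputations vs. infinitely many: 1 / (1 + gamma/m)."""
    return 1.0 / (1.0 + gamma / m)


def smallest_m(gamma, target_re):
    """Smallest integer m >= 1 with relative_efficiency(gamma, m) >= target_re (< 1)."""
    # RE >= target  <=>  m >= gamma / (1/target - 1)
    return max(1, math.ceil(gamma / (1.0 / target_re - 1.0)))


for g in (0.5, 0.01):
    print(g, [round(relative_efficiency(g, m), 4) for m in (1, 3, 5)])
print(smallest_m(0.5, 0.95), smallest_m(0.01, 0.99))
```

For γ = 0.5 the RE at m = 5 is about 0.909, whereas for γ = 0.01 it already exceeds 0.99 at m = 1, which is the situation described above.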
4.3. Are γ and δ Comparable?
Rubin made the following statement in his 1987 book  : “The quantity of γ0 is equal to the expected fraction of observations missing in the simple case of scalar Yi with no covariates, and commonly is less than the fraction of observations missing when there are covariates that predict Yi,” where γ0 is the same as the γ in Equation (1), Y is the imputed variable and Yi is the value of Y for the ith unit. This statement about the γ-δ relationship might lead readers to feel that γ and δ are linked, comparable, and similar in magnitude. In this study, however, γ was one, two, or sometimes three orders of magnitude smaller than δ. This enormous difference between γ and δ cannot possibly be explained by the presence or absence of covariates, as suggested by Rubin  .
As indicated by Equation (8), γ is essentially a ratio of variances, whereas δ is simply a ratio of sampling unit counts. Once the survey is completed, δ is a fixed value, whereas γ remains undetermined before MI and even after MI is completed. As a ratio of variances, γ may be regarded as a second-order statistic, which involves squaring first-order statistics such as the sample mean, the standard deviation, and the sample size. On the other hand, as a ratio of sampling unit counts, δ belongs to the first order. Taking the square root of γ would put it on the same order as δ. For CONTROL, when δ = 4%, 10%, 20%, and 29%, the corresponding γ means were 0.000013, 0.000035, 0.000056, and 0.000067, respectively, and the corresponding √γ values were 0.003673, 0.005944, 0.007493, and 0.008181, respectively. Therefore, even after taking the square root, γ is still very different from δ. The results of this study suggest that γ and δ have totally different meanings and units and are not comparable.
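The square-root comparison can be reproduced from the rounded CONTROL γ means listed above. Because those means are rounded, the computed √γ values differ slightly from the reported ones (for example 0.003606 vs. 0.003673), which were presumably computed from unrounded means; the conclusion does not change.

```python
import math

# CONTROL gamma means for each delta, as listed in the text (rounded)
control = {0.04: 0.000013, 0.10: 0.000035, 0.20: 0.000056, 0.29: 0.000067}

for delta, gamma in control.items():
    root = math.sqrt(gamma)
    print(f"delta = {delta:.2f}  sqrt(gamma) = {root:.6f}  delta / sqrt(gamma) = {delta / root:.1f}")
```

Even on the square-root scale, δ is roughly 11 to 35 times larger than √γ across the four δ levels.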
As Equations (2) to (6) show, δ is not a factor in defining γ. Data from this study suggest that even though δ has a significant effect on γ, it is far from being the dominant factor. It may be nearly impossible to relate δ to γ mathematically. The conditions under which γ = δ have not been established mathematically, either in the published literature or in this study. Given that the effect of δ on γ can easily be overshadowed by other factors, it is impossible to predict γ using δ, even if a linear relationship between γ and δ may exist under certain circumstances.
5. Concluding Summary
Using real survey data from PWS12, MI was performed at m = 99 and δ = 4%, 10%, 20%, and 29% for Imp_V = SIZE5, SIZE20, and SIZE100. γ values were determined for Anal_V = CONTROL, REGION, PRIMEMP and DERIVED. The following conclusions may be drawn from the results and discussion of this study:
1. γ and δ have different meanings and units and are not comparable. γ is essentially a ratio of variances, whereas δ is a ratio of counts of sampling units. δ is a fixed value once the survey is complete, whereas γ is unknown before MI and remains undetermined even after MI.
2. Anal_V had the dominant effect on γ compared with the other two factors tested, Imp_V and δ. Judging by the range of γ means, the variation of γ due to Anal_V was more than 100 times greater than that due to δ. The effect of Imp_V was much smaller than that of Anal_V but greater than that of δ.
3. A linear increase of γ with increasing δ was observed, but decreases of γ with increasing δ were also observed. Even though the linear regression coefficients were highly significant (P < 0.0001) in the majority of cases, it may not be possible to predict γ using δ, because the effect of δ can easily be overshadowed by the effects of Anal_V and Imp_V.
4. Rubin stated that γ would equal the expected δ in the simple case of no covariates, and would commonly be less than δ when there are covariates  . The magnitude of γ in this study ranged from 0.01 to 0.000001, while δ ranged from 4% to 29%. The enormous difference between γ and δ cannot possibly be explained by the presence or absence of covariates. The supposition that γ = E[δ] does not seem to be tenable.
5. The magnitude of γ obtained in this study was very small (<0.01). On the one hand, one need not worry about γ being too large (>0.5) when applying Rubin’s small-m recommendation. On the other hand, the smallness of γ calls into question the rationale for using the γ-based RE to determine the sufficient m. For γ < 0.01, a single imputation would be sufficient, and MI would become meaningless, if the γ-based RE were used to determine the sufficient m.
The authors sincerely thank Dr. Alan H. Dorfman, Office of Research and Methodology (ORM), NCHS, CDC, USA, for his valuable suggestions on the research and critical text editing of the paper.
*The views expressed in this paper do not necessarily reflect the views of the National Center for Health Statistics (NCHS) or the Centers for Disease Control and Prevention (CDC) of the United States government.
 Schafer, J.L. (1997) Analysis of Incomplete Multivariate Data. Chapman and Hall/CRC, Washington DC, 89-145.
 Pan, Q., Wei, R., Shimizu, I. and Jamoom, E. (2014) Determining Sufficient Number of Imputations Using Variance of Imputation Variances: Data from 2012 NAMCS Physician Workflow Mail Survey. Applied Mathematics, 5, 3421-3430.
 Pan, Q., Wei, R., Shimizu, I. and Jamoom, E. (2014) Variances of Imputation Variances as Determiner of Sufficient Number of Imputations Using Data from 2012 NAMCS Physician Workflow Mail Survey. 2014 JSM Proceedings, Statistical Computing Section, American Statistical Association. Alexandria, 3276-3283.
 Graham, J.W., Olchowski, A.E. and Gilreath, T.D. (2007) How Many Imputations Are Really Needed? Some Practical Clarifications of Multiple Imputation Theory. Prevention Science, 8, 206-213.
 Allison, P. (2012) Why You Probably Need More Imputations Than You Think.
 Schafer, J.L. and Graham, J.W. (2002) Missing Data: Our View of the State of the Art. Psychological Methods, 7, 147-177. http://dx.doi.org/10.1037/1082-989X.7.2.147
Hershberger, S.L. and Fisher, D.G. (2003) A Note on Determining the Number of Imputations for Missing Data. Structural Equation Modeling, 10, 648-650.
 Bodner, T.E. (2008) What Improves with Increased Missing Data Imputations? Structural Equation Modeling: A Multidisciplinary Journal, 15, 651-675.
 Asendorpf, J.B., et al. (2014) Reducing Bias Due to Systematic Attrition in Longitudinal Studies: The Benefits of Multiple Imputation. International Journal of Behavioral Development, 38, 453-460.
 Bartlett, J.W., et al. (2015) Multiple Imputation of Covariates by Fully Conditional Specification: Accommodating the Substantive Model. Statistical Methods in Medical Research, 24, 462-487.
 Basagana, X., et al. (2013) A Framework for Multiple Imputation in Cluster Analysis. American Journal of Epidemiology, 177, 718-725.
 Biering, K., et al. (2015) Using Multiple Imputation to Deal with Missing Data and Attrition in Longitudinal Studies with Repeated Measures of Patient-Reported Outcomes. Clinical Epidemiology, 7, 91-106.
 Pan, Q. and Wei, R. (2015) Relationship between Missing Information and Missing Data in 2012 NAMCS Physician Workflow Mail Survey. 2015 JSM Proceedings, Statistical Computing Section, American Statistical Association, Alexandria, 2630-2637.
Jamoom, E., Beatty, P., Bercovitz, A., et al. (2012) Physician Adoption of Electronic Health Record Systems: United States, 2011. NCHS Data Brief, No. 98, National Center for Health Statistics, Hyattsville.
 Andridge, R.R. and Little, R.J.A. (2010) A Review of Hot Deck Imputation for Survey Non-Response. International Statistical Review, 78, 40-64.