Effect Modeling of Count Data Using Logistic Regression with Qualitative Predictors

Author(s)
Haeil Ahn

Affiliation(s)

Department of Industrial Engineering, Seokyeong University, Seoul, Republic of Korea.


Abstract

We developed a statistical method for modeling binary count data with categorical predictors using logistic regression. ANOVA-type analyses of such data often performed unsatisfactorily, even when various transformations were applied. Applying the logistic transformation directly to the fraction data could be an alternative, but it is not desirable in the statistical sense. We concluded that such methods are not appropriate, especially when the fractions are close to 0 or 1. The major purpose of this paper is to demonstrate that logistic regression with an ANOVA-model-like parameterization aids our understanding and provides a somewhat different, but sound, statistical background. We examined a simple real-world example to show that we can efficiently test the significance of regression parameters, look for interactions, estimate the related confidence intervals, and calculate the difference between the mean values of the referent and experimental subgroups. This paper demonstrates that precise confidence interval estimates can be obtained using the proposed ANOVA-model-like approach. The method discussed here can be extended to any type of experimental fraction data analysis, particularly in experimental design.
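The kind of analysis described above can be sketched for a 2 x 2 factorial with binomial counts. The counts below are hypothetical and are not the paper's data, and the names `counts`, `effects`, and `wald` are illustrative. The sketch assumes a saturated model with reference-cell coding, in which case the maximum likelihood estimates are contrasts of the observed cell log-odds and their large-sample variances are sums of 1/y + 1/(n - y) over the cells involved. It is a minimal illustration under those assumptions, not the author's implementation.

```python
import math

# Hypothetical 2x2 factorial: (level of A, level of B) -> (events y, trials n).
# These counts are invented for illustration; they are not the paper's data.
counts = {
    (0, 0): (12, 200),   # referent subgroup
    (1, 0): (30, 200),
    (0, 1): (20, 200),
    (1, 1): (26, 200),
}


def logit(y, n):
    """Log-odds of the observed fraction y/n (requires 0 < y < n)."""
    return math.log(y / (n - y))


def logit_var(y, n):
    """Large-sample (delta-method) variance of the observed log-odds."""
    return 1.0 / y + 1.0 / (n - y)


def wald(contrast, z=1.96):
    """Estimate, standard error, and approximate 95% Wald interval
    for a linear contrast of the cell log-odds."""
    est = sum(c * logit(*counts[cell]) for cell, c in contrast.items())
    var = sum(c * c * logit_var(*counts[cell]) for cell, c in contrast.items())
    se = math.sqrt(var)
    return est, se, (est - z * se, est + z * se)


# Reference-cell (ANOVA-like) parameterization with (0, 0) as the referent.
effects = {
    "intercept": {(0, 0): 1},
    "A": {(1, 0): 1, (0, 0): -1},
    "B": {(0, 1): 1, (0, 0): -1},
    "A:B": {(1, 1): 1, (1, 0): -1, (0, 1): -1, (0, 0): 1},
}

results = {name: wald(contrast) for name, contrast in effects.items()}

for name, (est, se, (lo, hi)) in results.items():
    print(f"{name:9s} est={est:+.3f}  se={se:.3f}  z={est / se:+.2f}  "
          f"CI=({lo:+.3f}, {hi:+.3f})")
```

Exponentiating an effect estimate gives the corresponding odds ratio relative to the referent subgroup. The `logit` helper is undefined when a cell has y = 0 or y = n, which illustrates why transforming the observed fractions directly breaks down when fractions are close to 0 or 1; for unsaturated models the estimates would come from iterative fitting rather than these closed forms.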


Cite this paper

Ahn, H. (2014) Effect Modeling of Count Data Using Logistic Regression with Qualitative Predictors. *Engineering*, **6**, 758-772. doi: 10.4236/eng.2014.612074.

