Health, Vol. 6, No. 21, December 2014
Simulation Program to Determine Sample Size and Power for a Multiple Logistic Regression Model with Unspecified Covariate Distributions
Abstract: Binary logistic regression models are commonly used to assess the association between outcomes and covariates. Many covariates are inherently continuous and follow a variety of distributions, including those that are heavily skewed to the left or right. Existing theoretical formulas, criteria, and simulation programs cannot accurately estimate sample size and power when covariates have non-standard distributions. We have therefore developed a simulation program that uses Monte Carlo methods to estimate the exact power of a binary logistic regression model. This power calculation can be used for distributions of any shape and covariates of any type (continuous, ordinal, and nominal), and it can account for nonlinear relationships between covariates and outcomes. For illustrative purposes, the simulation program is applied to real data from a study on the influence of smoking on 90-day outcomes after acute atherothrombotic stroke. Our program is applicable to all effect sizes and, with some modifications, can be extended to other statistical methods, including logistic regression and related simulations such as Bayesian inference.
Cite this paper: Kumagai, N., Akazawa, K., Kataoka, H., Hatakeyama, Y. and Okuhara, Y. (2014) Simulation Program to Determine Sample Size and Power for a Multiple Logistic Regression Model with Unspecified Covariate Distributions. Health, 6, 2973-2998. doi: 10.4236/health.2014.621336.
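
The Monte Carlo idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes a single right-skewed covariate drawn from a lognormal distribution, a two-sided Wald test at the 5% level, and illustrative parameter values. The names `fit_logistic`, `simulate_power`, and `draw_covariate` are hypothetical and chosen only for this example. Power is estimated as the fraction of simulated datasets in which the covariate effect is declared significant.

```python
import math
import random


def _sigmoid(eta):
    # Clip the linear predictor so math.exp cannot overflow.
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(x, y, max_iter=50, tol=1e-8):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson.

    Returns (b1, standard error of b1), or None if the fit fails
    (for example under complete separation).
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = _sigmoid(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # information matrix entries
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-10:
            return None
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            # Var(b1) is the (slope, slope) entry of the inverse information.
            return b1, math.sqrt(h00 / det)
    return None


def simulate_power(n, beta0, beta1, draw_covariate,
                   n_sims=500, z_crit=1.959964, seed=1):
    """Estimate power as the share of simulations with |Wald z| > z_crit."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [draw_covariate(rng) for _ in range(n)]
        y = [1 if rng.random() < _sigmoid(beta0 + beta1 * xi) else 0
             for xi in x]
        fit = fit_logistic(x, y)
        if fit is None:
            continue  # a failed fit is counted as a non-rejection
        b1, se = fit
        if abs(b1 / se) > z_crit:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    # Assumed right-skewed covariate; any sampler taking an rng can be used.
    def skewed(rng):
        return rng.lognormvariate(0.0, 0.5)

    for n in (50, 100, 200):
        print(n, simulate_power(n, beta0=-1.0, beta1=0.8,
                                draw_covariate=skewed))
```

Because the covariate sampler is passed in as a function, the same loop could draw from an empirical distribution instead of a parametric one, which is the situation the abstract describes. Increasing `n` until the estimated power reaches the target gives a simulation-based sample size.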
