OJS Vol. 4 No. 9, October 2014
Confirmatory Methods, or Huge Samples, Are Required to Obtain Power for the Evaluation of Theories
Abstract: Experimental studies are usually designed with specific expectations about the results in mind. However, most researchers apply some form of omnibus test to detect any differences, with follow-up tests such as pairwise comparisons or simple-effects analyses for further investigation of the effects. The power to find full support for the theory with such an exploratory approach, which is usually based on multiple testing, is, however, rather disappointing. With the simulations in this paper we showed that many of the common choices in hypothesis testing lead to a severely underpowered form of theory evaluation. Furthermore, we presented some less commonly used approaches and compared the results in terms of the power to find support for the theory. We concluded that confirmatory methods are required in the context of theory evaluation and that the scientific literature would benefit from a clearer distinction between confirmatory and exploratory findings. We also emphasize the importance of reporting all tests, significant or not, including the appropriate sample statistics such as means and standard deviations. A further recommendation concerns the observation that researchers, when discussing the conclusions of their own study, seem to underestimate the role of sampling variability. The execution of more replication studies, combined with proper reporting of all results, provides insight into between-study variability and the proportion of chance findings.
Cite this paper: Klugkist, I., Post, L., Haarhuis, F. and Wesel, F. (2014) Confirmatory Methods, or Huge Samples, Are Required to Obtain Power for the Evaluation of Theories. Open Journal of Statistics, 4, 710-725. doi: 10.4236/ojs.2014.49066.
