JAMP Vol. 7 No. 7, July 2019
Why Quantitative Variables Should Not Be Recoded as Categorical
Abstract: Transforming quantitative variables into categories is a common practice in both experimental and observational studies. The typical procedure is to create groups by splitting the distribution of the original variable at some cut point on the scale of measurement (e.g., the mean, median, or mode). Allegedly, dichotomization improves causal inference by simplifying statistical analyses. In this article, we address some of the adverse consequences of recoding quantitative variables into categories. In particular, we provide evidence that categorization usually leads to inefficient and biased estimates. We believe that considerable progress in data analysis can occur if scholars follow the recommendations presented in this article. Recoding quantitative variables as categorical is a poor methodological strategy, and scientists should avoid it.
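The cost of a median split can be illustrated with a small simulation. The sketch below is not taken from the article: the sample size, the true correlation of 0.5, the seed, and the function names `pearson` and `simulate` are all assumptions chosen for illustration. It draws a standard normal predictor, builds an outcome correlated with it, and compares the correlation obtained from the continuous predictor with the one obtained after dichotomizing at the sample median. For a normally distributed predictor split at the median, a classical result is that the correlation shrinks by a factor of about sqrt(2/pi), roughly 0.80.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, rho=0.5, seed=42):
    """Return (r_continuous, r_median_split) for simulated data.

    n, rho and seed are illustrative assumptions, not values from the article.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    # Build y so that its population correlation with x equals rho.
    ys = [rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for x in xs]
    # Dichotomize x at (approximately) the sample median.
    cut = sorted(xs)[n // 2]
    x_split = [1.0 if x > cut else 0.0 for x in xs]
    return pearson(xs, ys), pearson(x_split, ys)


r_cont, r_split = simulate()
print(f"continuous x:   r = {r_cont:.3f}")
print(f"median-split x: r = {r_split:.3f}")
print(f"ratio:          {r_split / r_cont:.3f}")
print(f"sqrt(2/pi):     {math.sqrt(2 / math.pi):.3f}")
```

Under these assumptions, the dichotomized predictor should recover only about four fifths of the original correlation, which is one concrete way to see the information loss the abstract describes. Other cut points, or predictors that are not normally distributed, would give different attenuation factors.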
Cite this paper: Fernandes, A., Malaquias, C., Figueiredo, D., da Rocha, E. and Lins, R. (2019) Why Quantitative Variables Should Not Be Recoded as Categorical. Journal of Applied Mathematics and Physics, 7, 1519-1530. doi: 10.4236/jamp.2019.77103.

 
 