Linear Maximum Likelihood Regression Analysis for Untransformed Log-Normally Distributed Data

ABSTRACT

Medical research data are often skewed and heteroscedastic. It has therefore become common practice to log-transform the data before regression analysis in order to stabilize the variance. Regression on log-transformed data estimates the relative effect of a predictor, whereas it is often the absolute effect that is of interest. We propose a maximum likelihood (ML)-based approach for estimating a linear regression model on untransformed log-normal, heteroscedastic data. The new method was evaluated in a large simulation study. Log-normal observations were generated according to the simulation models, and parameters were estimated using the new ML method, ordinary least-squares regression (LS) and weighted least-squares regression (WLS). All three methods produced unbiased estimates of the parameters and the expected response, and ML and WLS yielded smaller standard errors than LS. The approximate normality of the Wald statistic, used for tests of the ML estimates, produced the correct type I error risk in most situations. Only ML and WLS produced correct confidence intervals for the estimated expected value. ML had the highest power for tests regarding β_{1}.
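
As an illustration of the kind of model the abstract describes, the following minimal sketch writes down a log-likelihood for log-normal observations whose expected value is linear in a predictor. The abstract does not state the parametrisation, so the one used here is an assumption: ln(Y) is taken to be normal with a constant standard deviation `sigma`, and its location is chosen so that E[Y] = `beta0 + beta1 * x`. The function names, the simulated data and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random


def log_location(mu, sigma):
    """Location of ln(Y) for which a log-normal Y has mean mu (assumed form)."""
    return math.log(mu) - 0.5 * sigma ** 2


def log_likelihood(beta0, beta1, sigma, xs, ys):
    """Log-likelihood of log-normal ys with E[Y | x] = beta0 + beta1 * x."""
    if sigma <= 0:
        return -math.inf
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta0 + beta1 * x
        if mu <= 0:
            return -math.inf  # a log-normal mean must be positive
        z = (math.log(y) - log_location(mu, sigma)) / sigma
        total += (-math.log(y) - math.log(sigma)
                  - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z)
    return total


def simulate(beta0, beta1, sigma, xs, rng):
    """Draw one log-normal observation per x with mean beta0 + beta1 * x."""
    return [math.exp(rng.gauss(log_location(beta0 + beta1 * x, sigma), sigma))
            for x in xs]


# Illustrative values only; none of them come from the paper.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(2000)]
ys = simulate(1.0, 2.0, 0.5, xs, rng)

ll_true = log_likelihood(1.0, 2.0, 0.5, xs, ys)   # generating parameters
ll_wrong = log_likelihood(1.0, 0.5, 0.5, xs, ys)  # deliberately wrong slope
print(ll_true > ll_wrong)
```

Under this assumed parametrisation the variance of Y grows with the square of its mean, which is one way the heteroscedasticity mentioned in the abstract can arise. An ML fit would maximise `log_likelihood` over `beta0`, `beta1` and `sigma` with a numerical optimiser; that step is omitted here.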

Cite this paper

S. Gustavsson, S. Johannesson, G. Sallsten and E. Andersson, "Linear Maximum Likelihood Regression Analysis for Untransformed Log-Normally Distributed Data," *Open Journal of Statistics*, Vol. 2, No. 4, 2012, pp. 389-400. doi:10.4236/ojs.2012.24047
