The Box-Cox transformation model (BC model) is widely used in empirical studies. For details on this model, see Hossain and Sakia. The maximum likelihood estimator (MLE), which maximizes the likelihood function under the normality assumption (BC MLE), can be asymptotically efficient if the “small $\sigma$” condition described by Bickel and Doksum is satisfied. On the other hand, the model with heteroscedastic disturbances, in which variances differ among groups, is also widely used in the analysis of various datasets such as panel data. It is sometimes necessary to consider a model combining these two models. Nawata and Kawabuchi analyzed length of stay (LOS) in Japanese hospitals using the BC model. They found that the variances often differed greatly among hospitals even after the transformation. Their studies are examples of such cases.
It is well known that the MLE is usually an asymptotically efficient estimator when the number of parameters is finite. However, this may not be true when the number of parameters goes to infinity. It is often necessary to consider cases in which the number of groups goes to infinity. For example, the new medical payment system known as the Diagnostic Procedure Combination/Per Diem Payment System (DPC/PDPS) was introduced in 2003 in Japan, and as of April 2014, 1863 hospitals had either already joined or were preparing to join; this number has been increasing. The hospitals joining the DPC/PDPS are required to computerize their medical information. This means that it has become possible to analyze a large-scale dataset that contains information from many hospitals. In other words, it is necessary to consider the asymptotic properties of estimators when the number of groups (hospitals) goes to infinity.
This paper considers the estimation of the Box-Cox transformation model with heteroscedastic disturbances when the number of groups increases to infinity. In such cases, the conventional maximum likelihood method yields only an estimator whose rate of convergence is slower than the ordinary order of $\sqrt{n}$, where $n$ is the total number of observations, even if the “small $\sigma$” condition is satisfied in all groups. A new estimation method that can handle this problem is then proposed.
2. BC Model with Heteroscedastic Disturbances
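Suppose that $y_{ij}$ is the dependent variable of observation $j$ in group $i$ (for example, the LOS of patient $j$ in hospital $i$ in Nawata and Kawabuchi). I consider the BC model:

$$z_{ij} = x_{ij}'\beta + u_{ij}, \qquad z_{ij} = \begin{cases} (y_{ij}^{\lambda} - 1)/\lambda, & \lambda \neq 0, \\ \log y_{ij}, & \lambda = 0, \end{cases} \qquad y_{ij} > 0, \quad i = 1, \dots, k, \quad j = 1, \dots, n_i, \tag{1}$$

with heteroscedastic disturbances and variances given by

$$V(u_{ij}) = \sigma_i^2, \tag{2}$$

where $\lambda$ is the transformation parameter, $x_{ij}$ and $\beta$ are the vectors of the explanatory variables and coefficients, $k$ is the number of groups, and $n_i$ is the number of people in group $i$. We assume that the “small $\sigma$” condition described by Bickel and Doksum is satisfied for all $i$ and $j$ (roughly, $\sigma_i$ is small enough relative to $1 + \lambda x_{ij}'\beta$ that the normality assumption is a good approximation in practice).1 Under this condition, we can assume that $u_{ij}$ follows the normal distribution with mean 0 and variance $\sigma_i^2$. The likelihood function is given by

$$L = \prod_{i=1}^{k} \prod_{j=1}^{n_i} \frac{1}{\sigma_i}\, \phi\!\left( \frac{z_{ij} - x_{ij}'\beta}{\sigma_i} \right) y_{ij}^{\lambda - 1}, \tag{3}$$

where $\phi$ is the density function of the standard normal distribution. Although the likelihood function is a function of $\lambda$, $\beta$, and $\sigma_1^2, \dots, \sigma_k^2$, we simply write it as (3). Note that Showalter reports a large bias of the BC MLE when heteroscedasticity is ignored.

1 If the “small $\sigma$” condition is not satisfied, we can use the estimator proposed by Nawata instead of the BC MLE. Even in this case, however, we reach the same conclusion as that presented here; that is, with heteroscedasticity and a number of parameters that goes to infinity, the estimator is consistent only at a rate slower than the ordinary order, and a consistent estimator of the ordinary order can be obtained by a modification of the homoscedastic case.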
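As a rough illustration of the likelihood described in this section, the following minimal sketch evaluates the log-likelihood of a BC model with one variance per group. The function names (`box_cox`, `group_loglik`), the argument layout, and the use of integer group labels are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform; lam == 0 is treated as the log limit."""
    y = np.asarray(y, dtype=float)
    if lam == 0.0:
        return np.log(y)
    return (y**lam - 1.0) / lam

def group_loglik(lam, beta, sigma2, y, X, group):
    """Log-likelihood of the BC model with a separate variance for each group.

    Hypothetical layout: group[t] is the integer label (0..k-1) of
    observation t, and sigma2[g] is the variance assumed for group g.
    """
    y = np.asarray(y, dtype=float)
    resid = box_cox(y, lam) - X @ beta            # transformed y minus x'beta
    s2 = np.asarray(sigma2, dtype=float)[group]   # variance attached to each observation
    normal_part = -0.5 * np.log(2.0 * np.pi * s2) - 0.5 * resid**2 / s2
    jacobian = (lam - 1.0) * np.log(y)            # log Jacobian of the transformation
    return float(np.sum(normal_part + jacobian))
```

Every additional group adds one more variance to `sigma2`, which is why the number of parameters grows with the number of groups in this setting.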
3. Estimation of the Model When the Number of Groups Goes to Infinity
Let the numbers of observations be. We assume that converges to a nonsingular matrix in probability, and. Note that is order of m and satisfies and for all i. Since, we get. k is assumed to increase at a slower rate than; that means that. When k is finite, we can estimate by maximizing (3). Let be the true parameter values of and let be the MLE of. We do not assume any specific forms of the variances, and simply assume for all i. However, since is, can only be a consistent estimator of order by any estimation method (Baltagi and Griffin  considered different variance estimators). When we substitute, the conditions that the estimators obtained by maximizing (3) become order; i.e. and, are given by
As before, although the values of derivatives are at, we write them simply as (4). Then (5) becomes
Therefore, if, (5) is satisfied. This means that we can use the standard method of dealing with heteroscedasticity if and only if in the standard model.
However, for the transformation parameter, since
we get  
under the “small” condition, where is the value of when.
and (4) is not satisfied. This means that the MLE becomes a consistent estimator only of order; that is, the rate of convergence is slower than ordinal order when. This means that the estimator of cannot be a consistent estimator of order. Here,
Therefore, cannot be an estimator of order either.
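The difficulty discussed in this section stems from the fact that each group variance is estimated from that group's own observations only. The following small simulation is a sketch of that point under simplifying assumptions that are not in the paper: normal disturbances with a known zero mean, equal group sizes `m`, and a common true variance. The function name `variance_rmse` and all of its parameters are hypothetical.

```python
import numpy as np

def variance_rmse(m, k, sigma2=1.0, reps=200, seed=0):
    """Root mean squared error of per-group variance estimates.

    Draws reps independent datasets, each with k groups of m normal
    disturbances, and estimates every group's variance by the average
    of its own squared disturbances.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, np.sqrt(sigma2), size=(reps, k, m))
    s2_hat = np.mean(u**2, axis=2)   # one variance estimate per group and replication
    return float(np.sqrt(np.mean((s2_hat - sigma2) ** 2)))
```

For normal disturbances the variance of each estimate is $2\sigma^4/m$, so the returned value should be close to $\sigma^2\sqrt{2/m}$ whatever the value of `k`. Comparing, for example, `variance_rmse(50, 20)` with `variance_rmse(50, 200)` shows that adding groups does not make the individual variance estimates more precise, whereas increasing `m` does.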
4. A Consistent Estimator of Order $\sqrt{n}$
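Here, an alternative estimator is proposed through an essential modification of the likelihood function. Suppose that the disturbances are homoscedastic and that $\sigma_i^2 = \sigma^2$ for all $i$. Then, writing $z_{ij}$ for the Box-Cox transform of $y_{ij}$, the likelihood becomes

$$L = \prod_{i=1}^{k} \prod_{j=1}^{n_i} \frac{1}{\sigma}\, \phi\!\left( \frac{z_{ij} - x_{ij}'\beta}{\sigma} \right) y_{ij}^{\lambda - 1}. \tag{15}$$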
Instead of maximizing (15), we considered the roots of the equations,
For the standard maximum likelihood method, the variance is estimated by the simple average. However, in this case, the variance is estimated by the weighted average of least squares residuals.
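For comparison, the standard homoscedastic BC MLE mentioned above, in which the variance is the simple average of squared residuals, can be sketched as a concentrated likelihood evaluated on a grid of transformation parameters. This is only the textbook procedure, not the modified estimator proposed in this section; the function names (`profile_loglik`, `grid_mle`) and the grid search are assumptions made for illustration.

```python
import numpy as np

def profile_loglik(lam, y, X):
    """Concentrated log-likelihood of the homoscedastic BC model at lam."""
    y = np.asarray(y, dtype=float)
    z = np.log(y) if lam == 0.0 else (y**lam - 1.0) / lam
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)   # least squares given lam
    resid = z - X @ beta
    sigma2 = np.mean(resid**2)                     # simple average of squared residuals
    n = y.size
    ll = (-0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
          + (lam - 1.0) * np.sum(np.log(y)))       # normal part plus log Jacobian
    return float(ll), beta, float(sigma2)

def grid_mle(y, X, grid):
    """Return the grid value of lam with the largest concentrated log-likelihood."""
    values = [profile_loglik(lam, y, X)[0] for lam in grid]
    return float(grid[int(np.argmax(values))])
```

A call such as `grid_mle(y, X, np.linspace(0.0, 1.0, 21))` returns the best grid point; a finer grid or a numerical optimizer could refine it. The weighted average of residuals used by the proposed estimator would replace the line that computes `sigma2`, with weights defined by the estimating equations of this section.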
From (21), we get
where. Based on the same argument presented by Nawata   , there exist consistent estimators of and among the roots of (16)-(18). Let be the consistent root and let. The asymptotic distribution of this estimator is given by
where, , and .
is estimated by where. This means that and are consistent estimators of order and are asymptotically more efficient than the BC MLE.
5. Conclusion
This paper considers the estimation of the BC model with heteroscedastic disturbances; that is, the variances differ by group. The BC MLE is a consistent and asymptotically efficient estimator if the “small $\sigma$” condition described by Bickel and Doksum is satisfied and the number of parameters is finite. However, when the heteroscedasticity of disturbances is considered and the number of groups goes to infinity, its rate of convergence is slower than the ordinary order of $\sqrt{n}$ (with $n$ the total number of observations), and the BC MLE cannot be efficient. An alternative consistent estimator based on a modification of the likelihood function is considered. It is a consistent estimator of order $\sqrt{n}$. One important result of this study is that the MLE might not be a good estimator, and estimation methods should be chosen carefully when the model contains many parameters in empirical studies.
The author would like to thank two anonymous referees for their helpful comments and suggestions.