A Chi-Square Approximation for the F Distribution


1. Introduction

The F distribution is one of the most frequently used distributions in statistics and arises in many practical situations. For example, the test statistic for testing equality of variances of two independent normal distributions follows an F distribution. Another example is the test statistic for testing equality of means of k independent normal distributions with homogeneous variance, which also follows an F distribution.

Johnson and Kotz [1] give a comprehensive review of the approximations to the cumulative distribution function (cdf) of the F distribution. Li and Martin [2] propose a shrinking factor approximation method and approximate the cdf of the F distribution by the cdf of the ${\chi}^{2}$ distribution. On the other hand, considering the test of equality of variances of two independent normal distributions, Wong [3] derives the modified signed log-likelihood ratio statistic. As a result, a normal approximation for the cdf of the F distribution is obtained. The approximation by Wong [3] has a theoretical order of convergence $O\left({n}^{-3/2}\right)$.

In this paper, we consider the problem of testing equality of means of k independent normal distributions with homogeneous variance. Rather than the standard one-way ANOVA approach, we derive an adjusted log-likelihood ratio statistic that is asymptotically distributed as a ${\chi}^{2}$ distribution and whose mean is exactly the mean of that ${\chi}^{2}$ distribution. As a result, a very accurate new ${\chi}^{2}$ approximation for the cdf of the F distribution is obtained.

2. Bartlett Corrected Log-Likelihood Ratio Statistic

Let $\left({Y}_{1}\mathrm{,}\cdots \mathrm{,}{Y}_{n}\right)$ be independent and identically distributed random variables with joint log-likelihood function $\mathcal{l}\left(\theta \right)$, where $\theta $ is a p-dimensional vector parameter. A frequently used asymptotic method for testing the hypothesis

${H}_{0}\mathrm{:}\psi \left(\theta \right)={\psi}_{0}\text{\hspace{1em}}\text{vs}\text{\hspace{1em}}{H}_{a}\mathrm{:}\psi \left(\theta \right)\ne {\psi}_{0}\mathrm{,}$ (1)

is based on the asymptotic distribution of the log-likelihood ratio statistic. In particular, the log-likelihood ratio statistic is defined as

$W=2\left\{\mathcal{l}\left(\hat{\theta}\right)-\mathcal{l}\left(\tilde{\theta}\right)\right\},$

where $\hat{\theta}$ is the unconstrained maximum likelihood estimator of $\theta $, obtained by maximizing the log-likelihood function with respect to $\theta $, and $\tilde{\theta}$ is the constrained maximum likelihood estimator of $\theta $, obtained by maximizing the log-likelihood function with respect to $\theta $ subject to the constraint $\psi \left(\theta \right)={\psi}_{0}$. In general, the constrained maximum likelihood estimator can be obtained by the Lagrange multiplier method. Under the regularity conditions stated in Cox and Hinkley [4], it is well known that W is asymptotically distributed as a ${\chi}_{r}^{2}$ distribution, where the degrees of freedom r is the difference between the number of parameters estimated without the constraint and the number of parameters estimated under the constraint. Hence, the observed level of significance for testing the hypothesis in (1) is $P\left({\chi}_{r}^{2}>w\right)$, where w is the observed value of the log-likelihood ratio statistic W. Note that Cox and Hinkley [4] show that this method of obtaining the observed level of significance has order of convergence of only $O\left({n}^{-1/2}\right)$.

There exist many different ways of improving the accuracy of the convergence of the log-likelihood ratio statistic. Barndorff-Nielsen and Cox [5] and Brazzale et al. [6] give detailed reviews of some higher order asymptotic methods and their applications. Recently, Davison et al. [7] derived a directional test for a vector parameter of interest for the linear exponential families. The method is quite complicated, both in theory and in computation.

In this paper, we propose a statistic that is very similar to the Bartlett corrected log-likelihood ratio statistic. Bartlett [8] [9] shows that the expected value of W can be expressed as

$E\left(W\right)=r\left(1+\frac{b}{n}+O\left({n}^{-2}\right)\right),$

where b is known as the Bartlett factor. Since $E\left(W\right)$ does not equal the mean of the ${\chi}_{r}^{2}$ distribution, Bartlett [8] [9] proposes to adjust the log-likelihood ratio statistic by

${W}^{*}=\frac{W}{1+\frac{b}{n}}$

such that $E\left({W}^{*}\right)=r$ with rate of convergence $O\left({n}^{-2}\right)$. Lawley [10] shows that in fact all cumulants of ${W}^{\mathrm{*}}$ agree with those of a ${\chi}_{r}^{2}$ distribution to the same order. Lawley’s proof is very complicated. Barndorff-Nielsen and Cox [11] discuss a much simpler derivation based on the saddlepoint approximation. However, the Bartlett factor b is, in general, very difficult to obtain. This limits the use of the Bartlett corrected log-likelihood ratio statistic in applied statistics.

In this paper, we propose to adjust the log-likelihood ratio statistic W such that the adjusted log-likelihood ratio statistic has exactly the same mean as the ${\chi}_{r}^{2}$ distribution. In other words, let

${W}^{\dagger}=\frac{W}{E\left(W\right)/r}.$ (2)

${W}^{\dagger}$ is asymptotically distributed as a ${\chi}_{r}^{2}$ distribution. Thus, the observed level of significance for testing the hypothesis in (1) is $P\left({\chi}_{r}^{2}>{w}^{\dagger}\right)$, where ${w}^{\dagger}$ is the observed value of ${W}^{\dagger}$. Note that this adjusted log-likelihood ratio statistic is just a modified version of the Bartlett corrected log-likelihood ratio statistic.
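That the adjusted statistic has exactly the mean of the ${\chi}_{r}^{2}$ distribution can be checked in one line. Because $E\left(W\right)$ is a constant (assumed here to be finite and nonzero), linearity of expectation gives

$E\left({W}^{\dagger}\right)=\frac{E\left(W\right)}{E\left(W\right)/r}=r.$

No $O\left({n}^{-2}\right)$ remainder appears, in contrast to the Bartlett corrected statistic ${W}^{*}$, whose mean equals r only up to that order.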

In the next section, the proposed adjusted log-likelihood ratio statistic for testing the equality of means of k homoscedastic normally distributed populations is derived. By comparing to the standard F-test in the one-way ANOVA approach, an approximation of the cdf of the F distribution is obtained.

3. Main Result

Let ${X}_{ij}$ be independent normally distributed random variables with mean ${\mu}_{i}$ and a common variance ${\sigma}^{2}$ , where $i=1,\cdots ,k$ and $j=1,\cdots ,{n}_{i}$ . Our aim is to test

${H}_{0}:{\mu}_{1}=\cdots ={\mu}_{k}=\mu \text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{vs}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{H}_{a}\mathrm{:}\text{the}\text{\hspace{0.17em}}\text{means}\text{\hspace{0.17em}}\text{are}\text{\hspace{0.17em}}\text{not}\text{\hspace{0.17em}}\text{all}\text{\hspace{0.17em}}\text{the}\text{\hspace{0.17em}}\text{same}\mathrm{.}$ (3)

From the one-way ANOVA approach, we have the following sum of squares:

$\begin{array}{l}SST=SSTr+SSE\\ \iff {\displaystyle \underset{i=1}{\overset{k}{\sum}}}{\displaystyle \underset{j=1}{\overset{{n}_{i}}{\sum}}}{\left({X}_{ij}-\bar{X}\right)}^{2}={\displaystyle \underset{i=1}{\overset{k}{\sum}}}\text{\hspace{0.05em}}{n}_{i}{\left({\bar{X}}_{i}-\bar{X}\right)}^{2}+{\displaystyle \underset{i=1}{\overset{k}{\sum}}}{\displaystyle \underset{j=1}{\overset{{n}_{i}}{\sum}}}{\left({X}_{ij}-{\bar{X}}_{i}\right)}^{2},\end{array}$

and the degrees of freedom are

$dfTr=k-1,\text{\hspace{0.17em}}\text{\hspace{0.17em}}dfE={\displaystyle \underset{i=1}{\overset{k}{\sum}}}\text{\hspace{0.05em}}{n}_{i}-k.$

For testing the hypothesis in (3), the F-test is used. Denote the test statistic as

${F}^{*}=\frac{SSTr/dfTr}{SSE/dfE}.$ (4)

It is well-known that ${F}^{\mathrm{*}}$ is distributed as the F distribution with degrees of freedom $\left(dfTr\mathrm{,}dfE\right)$ . Hence, the observed level of significance for testing the hypothesis in (3) is $P\left({F}_{dfTr,dfE}>{f}^{*}\right)$ with ${f}^{\mathrm{*}}$ being the observed value of ${F}^{\mathrm{*}}$ .

From the likelihood analysis point of view, let $\theta ={\left({\mu}_{1},\cdots ,{\mu}_{k},{\sigma}^{2}\right)}^{\prime}$ , and the log-likelihood function can be written as

$\mathcal{l}\left(\theta \right)=\mathcal{l}\left({\mu}_{1},\cdots ,{\mu}_{k},{\sigma}^{2}\right)={\displaystyle \underset{i=1}{\overset{k}{\sum}}}\left[-\frac{{n}_{i}}{2}\mathrm{log}{\sigma}^{2}-\frac{1}{2{\sigma}^{2}}{\displaystyle \underset{j=1}{\overset{{n}_{i}}{\sum}}}{\left({X}_{ij}-{\mu}_{i}\right)}^{2}\right].$

It can be shown that the unconstrained maximum likelihood estimator is $\stackrel{^}{\theta}={\left({\stackrel{^}{\mu}}_{1},\cdots ,{\stackrel{^}{\mu}}_{k},{\stackrel{^}{\sigma}}^{2}\right)}^{\prime}$ , where

${\hat{\mu}}_{1}={\bar{X}}_{1},\cdots ,{\hat{\mu}}_{k}={\bar{X}}_{k},\text{\hspace{0.17em}}{\hat{\sigma}}^{2}=\frac{SSE}{{n}_{1}+\cdots +{n}_{k}}.$

Therefore

$\mathcal{l}\left(\stackrel{^}{\theta}\right)=-\frac{{n}_{1}+\cdots +{n}_{k}}{2}\mathrm{log}{\stackrel{^}{\sigma}}^{2}-\frac{{n}_{1}+\cdots +{n}_{k}}{2}.$

When the null hypothesis in (3) is true, the log-likelihood function can be written as

$\mathcal{l}\left(\mu ,\cdots ,\mu ,{\sigma}^{2}\right)={\displaystyle \underset{i=1}{\overset{k}{\sum}}}\left[-\frac{{n}_{i}}{2}\mathrm{log}{\sigma}^{2}-\frac{1}{2{\sigma}^{2}}{\displaystyle \underset{j=1}{\overset{{n}_{i}}{\sum}}}{\left({X}_{ij}-\mu \right)}^{2}\right],$

and the constrained maximum likelihood estimator is $\tilde{\theta}={\left(\tilde{\mu},\cdots ,\tilde{\mu},{\tilde{\sigma}}^{2}\right)}^{\prime}$, where

$\tilde{\mu}=\bar{X},\text{\hspace{0.17em}}{\tilde{\sigma}}^{2}=\frac{SST}{{n}_{1}+\cdots +{n}_{k}}.$

Thus, we have

$\mathcal{l}\left(\tilde{\theta}\right)=\mathcal{l}\left(\tilde{\mu},\cdots ,\tilde{\mu},{\tilde{\sigma}}^{2}\right)=-\frac{{n}_{1}+\cdots +{n}_{k}}{2}\mathrm{log}{\tilde{\sigma}}^{2}-\frac{{n}_{1}+\cdots +{n}_{k}}{2}.$

Therefore, the log-likelihood ratio statistic is

$W=\left({n}_{1}+\cdots +{n}_{k}\right)\mathrm{log}\frac{SST}{SSE}=\left({n}_{1}+\cdots +{n}_{k}\right)\mathrm{log}\left(1+\frac{SSTr}{SSE}\right)=\left(dfTr+dfE+1\right)\mathrm{log}\left(1+\frac{dfTr}{dfE}{F}^{*}\right),$

and W is asymptotically distributed as ${\chi}^{2}$ distribution with $dfTr$ degrees of freedom.
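The relationship between W and ${F}^{*}$ can be checked numerically. The following is a minimal sketch in plain Python, using made-up data and hypothetical names (`anova_lr`, `groups`): it evaluates the log-likelihood at the unconstrained and constrained maximum likelihood estimators, with the constrained variance estimate taken as the total sum of squares divided by the total sample size, and compares the result with the expression in terms of ${F}^{*}$.

```python
import math


def anova_lr(groups):
    """Return (F*, W from the log-likelihoods, W from the F* formula)."""
    n_i = [len(g) for g in groups]
    big_n = sum(n_i)                      # n_1 + ... + n_k
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / big_n
    group_means = [sum(g) / len(g) for g in groups]

    # One-way ANOVA sums of squares and degrees of freedom.
    sstr = sum(n * (m - grand_mean) ** 2 for n, m in zip(n_i, group_means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_tr, df_e = k - 1, big_n - k
    f_star = (sstr / df_tr) / (sse / df_e)

    def loglik(mus, sigma2):
        # Log-likelihood up to the additive constant dropped in the text.
        return sum(
            -len(g) / 2 * math.log(sigma2)
            - sum((x - mu) ** 2 for x in g) / (2 * sigma2)
            for g, mu in zip(groups, mus)
        )

    sigma2_hat = sse / big_n              # unconstrained variance estimate
    sigma2_tilde = (sstr + sse) / big_n   # constrained estimate: SST / N
    w_loglik = 2 * (
        loglik(group_means, sigma2_hat)
        - loglik([grand_mean] * k, sigma2_tilde)
    )
    w_formula = (df_tr + df_e + 1) * math.log(1 + df_tr / df_e * f_star)
    return f_star, w_loglik, w_formula


# Illustrative data only; these numbers do not come from the paper.
groups = [[4.1, 5.0, 5.6, 4.7], [6.2, 5.9, 7.1], [5.0, 4.4, 5.3, 4.9, 5.8]]
f_star, w_loglik, w_formula = anova_lr(groups)
print(f_star, w_loglik, w_formula)
```

The two values of W should agree up to floating-point rounding, since $SST/SSE=1+SSTr/SSE$ and $SSTr/SSE=\left(dfTr/dfE\right){F}^{*}$.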

Our proposed method requires $E\left(W\right)$. Since ${F}^{\mathrm{*}}$ is distributed as the F distribution with $\left(dfTr\mathrm{,}dfE\right)$ degrees of freedom,

$E\left(W\right)={\displaystyle {\int}_{0}^{\infty}}\left(dfTr+dfE+1\right)\mathrm{log}\left(1+\frac{dfTr}{dfE}y\right)g\left(y;dfTr,dfE\right)\text{d}y$ (5)

where $g\left(y\mathrm{;}dfTr\mathrm{,}dfE\right)$ is the probability density function of the F distribution with degrees of freedom $\left(dfTr\mathrm{,}dfE\right)$. Therefore, the observed level of significance for testing the hypothesis in (3) based on the proposed adjusted log-likelihood ratio statistic is

$P\left({\chi}_{dfTr}^{2}>\frac{\left(dfTr+dfE+1\right)\mathrm{log}\left(1+\frac{dfTr}{dfE}{f}^{*}\right)}{E\left(W\right)/dfTr}\right)$

where $E\left(W\right)$ is defined in (5) and ${f}^{\mathrm{*}}$ is the observed value of the test statistic given in (4).

By re-indexing the above approximation, let X be distributed as the ${F}_{u\mathrm{,}v}$ distribution, where $\left(u\mathrm{,}v\right)$ are the corresponding degrees of freedom. Then the cdf of X is $P\left({F}_{u\mathrm{,}v}\le x\right)$ for $x>0$ . Hence, the log-likelihood ratio statistic is

$W=\left(u+v+1\right)\mathrm{log}\left(1+\frac{u}{v}X\right).$

Since W is asymptotically distributed as ${\chi}_{u}^{2}$ distribution, we have

$P\left({F}_{u,v}\le x\right)\approx P\left({\chi}_{u}^{2}\le \left(u+v+1\right)\mathrm{log}\left(1+\frac{u}{v}x\right)\right).$

However, this approximation has order of convergence $O\left({n}^{-1/2}\right)$ only.

The proposed approach gives

${W}^{\dagger}=\frac{W}{E\left(W\right)/u}=\frac{W}{b\left(u,v\right)}$

where

$\begin{array}{c}b\left(u,v\right)=\frac{E\left[\left(u+v+1\right)\mathrm{log}\left(1+\frac{u}{v}X\right)\right]}{u}\\ =\frac{{\displaystyle {\int}_{0}^{\infty}}\left(u+v+1\right)\mathrm{log}\left(1+\frac{u}{v}x\right)g\left(x;u,v\right)\text{d}x}{u}.\end{array}$

As a result,

$P\left({F}_{u,v}\le x\right)\approx P\left({\chi}_{u}^{2}\le \frac{\left(u+v+1\right)\mathrm{log}\left(1+\frac{u}{v}x\right)}{b\left(u,v\right)}\right).$

Note that $b\left(u,v\right)$ does not have a closed form solution but it can be obtained numerically by software like R, Maple and Matlab. Table 1 records some values of $b\left(u,v\right)$ for $u\le v$ . Moreover,

$\underset{v\to \infty}{\mathrm{lim}}b\left(u,v\right)=1\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{\hspace{0.17em}}\underset{u\to \infty}{\mathrm{lim}}b\left(u,v\right)=\infty .$
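As an illustration of the numerical evaluation, here is a minimal sketch in Python. It assumes SciPy is available (the text mentions R, Maple and Matlab; SciPy is a substitute chosen for this example), and the function name `b` simply mirrors the notation above. It integrates the defining expectation directly and prints a few values to show the behaviour described by the two limits.

```python
import math

from scipy import integrate, stats


def b(u, v):
    """Evaluate b(u, v) = (u + v + 1) * E[log(1 + (u / v) X)] / u, X ~ F(u, v)."""

    def integrand(x):
        return math.log1p(u / v * x) * stats.f.pdf(x, u, v)

    # Split the range at 1 so that each piece is easier for the adaptive rule.
    lower, _ = integrate.quad(integrand, 0.0, 1.0)
    upper, _ = integrate.quad(integrand, 1.0, math.inf)
    return (u + v + 1) * (lower + upper) / u


# Fixed u, growing v: the values should drift toward 1.
for v in (10, 100, 1000):
    print(f"b(2, {v}) = {b(2, v):.4f}")

# Fixed v, growing u: the values should keep increasing.
for u in (2, 10, 50):
    print(f"b({u}, 10) = {b(u, 10):.4f}")
```

For very small degrees of freedom the density is unbounded at the origin or heavy in the tail, so the quadrature may need tighter tolerances than the defaults used in this sketch.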

Hence, the proposed approximation will be problematic when u is large. Nevertheless, the ${F}_{u\mathrm{,}v}$ distribution has the inverse property

$P\left({F}_{u\mathrm{,}v}\le x\right)=1-P\left({F}_{v\mathrm{,}u}\le 1/x\right),$

which can be applied to obviate this problem.

Table 1. b(u,v).

Thus, the proposed approximation is:

$P\left({F}_{u,v}\le x\right)\approx \left\{\begin{array}{ll}P\left({\chi}_{u}^{2}\le \frac{\left(u+v+1\right)\mathrm{log}\left(1+\frac{u}{v}x\right)}{b\left(u,v\right)}\right)& \text{if}\text{\hspace{0.17em}}u\le v\\ 1-P\left({\chi}_{v}^{2}\le \frac{\left(v+u+1\right)\mathrm{log}\left(1+\frac{v}{u}\frac{1}{x}\right)}{b\left(v,u\right)}\right)& \text{if}\text{\hspace{0.17em}}u>v\end{array}\right.$ (6)
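The two-branch approximation can be sketched in Python as follows, assuming SciPy is available. The names `b` and `f_cdf_chi2_approx` are hypothetical. In the branch $u>v$ the sketch applies the inverse property and therefore swaps the roles of the degrees of freedom everywhere, including in the scaling factor, which becomes $b\left(v,u\right)$; this is an assumption about the intended reading of (6). The exact cdf from `scipy.stats.f` is used only for comparison.

```python
import math

from scipy import integrate, stats


def b(u, v):
    """Numerically evaluate (u + v + 1) * E[log(1 + (u / v) X)] / u, X ~ F(u, v)."""

    def integrand(x):
        return math.log1p(u / v * x) * stats.f.pdf(x, u, v)

    lower, _ = integrate.quad(integrand, 0.0, 1.0)
    upper, _ = integrate.quad(integrand, 1.0, math.inf)
    return (u + v + 1) * (lower + upper) / u


def f_cdf_chi2_approx(x, u, v):
    """Chi-square approximation to P(F_{u,v} <= x) for x > 0."""
    if u <= v:
        w = (u + v + 1) * math.log1p(u / v * x)
        return float(stats.chi2.cdf(w / b(u, v), u))
    # u > v: use P(F_{u,v} <= x) = 1 - P(F_{v,u} <= 1/x) with roles swapped.
    w = (u + v + 1) * math.log1p(v / (u * x))
    return 1.0 - float(stats.chi2.cdf(w / b(v, u), v))


# Compare with the exact cdf and report the relative error.
for x, u, v in [(1.0, 4, 6), (2.5, 3, 12), (0.8, 12, 3)]:
    approx = f_cdf_chi2_approx(x, u, v)
    exact = float(stats.f.cdf(x, u, v))
    print(f"x={x}, (u,v)=({u},{v}): approx={approx:.5f}, "
          f"exact={exact:.5f}, rel.err={(approx - exact) / exact:+.2e}")
```

Recomputing `b` on every call keeps the sketch short; in repeated use one would evaluate it once per pair of degrees of freedom and reuse the value.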

4. Numerical Comparisons

Wong [3] gives a simple and accurate normal approximation to the cdf of the ${F}_{u\mathrm{,}v}$ distribution, which has order of convergence $O\left({n}^{-3/2}\right)$ . It takes the form

$P\left({F}_{u,v}\le x\right)\approx \Phi \left(r-\frac{1}{r}\mathrm{log}\frac{r}{q}\right)$

where $\Phi (\cdot )$ is the cdf of the standard normal distribution,

$r=\mathrm{sgn}\left(x-1\right){\left\{\left(u+v\right)\mathrm{log}\frac{ux+v}{u+v}-u\mathrm{log}x\right\}}^{1/2}$

$q=\frac{x-1}{ux+v}{\left\{\frac{uv\left(u+v\right)}{2}\right\}}^{1/2}$
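The formulas for r and q translate directly into code. The sketch below uses only the Python standard library; the function names are hypothetical, and the explicit check at $x=1$ is an addition for this example, since r and q both vanish there and the expression is undefined at that point.

```python
import math


def std_normal_cdf(z):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def wong_f_cdf(x, u, v):
    """Normal approximation to P(F_{u,v} <= x) using r and q as defined above."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    if x == 1.0:
        raise ValueError("r = q = 0 at x = 1; the formula is undefined there")
    r_squared = (u + v) * math.log((u * x + v) / (u + v)) - u * math.log(x)
    r = math.copysign(math.sqrt(r_squared), x - 1.0)
    q = (x - 1.0) / (u * x + v) * math.sqrt(u * v * (u + v) / 2.0)
    # r and q share the sign of x - 1, so r / q is positive.
    return std_normal_cdf(r - math.log(r / q) / r)


# For (u, v) = (2, 2) the exact cdf is x / (1 + x), which gives a quick check.
for x in (0.25, 4.0):
    print(x, wong_f_cdf(x, 2, 2), x / (1.0 + x))
```

Very close to $x=1$ both r and q are near zero, so the ratio inside the logarithm is numerically delicate; this sketch does not attempt to handle that region.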

It is of interest to compare the proposed method with the approximation by Wong [3].

Figures 1(a)-8(a) plot the cumulative distribution functions of the ${F}_{u\mathrm{,}v}$ distribution for various u and v obtained by the exact method, the approximation by Wong [3], and the proposed method. The differences between the two approximated cumulative distribution functions and the exact cumulative distribution function are barely noticeable. To explore the accuracy of the two approximations, we examine the relative error, which is defined as

$\text{relative}\text{\hspace{0.17em}}\text{error}=\frac{\text{approximation}-\text{exact}}{\text{exact}}.$

Figure 1. (a) cdf with (u,v) = (1,1); (b) Relative error.

Figure 2. (a) cdf with (u,v) = (1,2); (b) Relative error.

Figure 3. (a) cdf with (u,v) = (1,10); (b) Relative error.

Figure 4. (a) cdf with (u,v) = (2,1); (b) Relative error.

Figure 5. (a) cdf with (u,v) = (2,2); (b) Relative error.

Figure 6. (a) cdf with (u,v) = (2,10); (b) Relative error.

Figure 7. (a) cdf with (u,v) = (10,2); (b) Relative error.

Figure 8. (a) cdf with (u,v) = (15,2); (b) Relative error.

Figures 1(b)-8(b) plot the corresponding relative errors. The proposed method outperforms the approximation by Wong [3] in all eight cases considered.

5. Conclusion

In this paper, a simple chi-square approximation to the cumulative distribution function of the F distribution is obtained via an adjusted log-likelihood ratio statistic. The numerical comparisons illustrate that the new approximation outperforms the higher-order asymptotic method of Wong [3], regardless of how small the degrees of freedom are.

References

[1] Johnson, N. and Kotz, S. (1994) Continuous Univariate Distributions. Volume 2, John Wiley & Sons, New York.

[2] Li, B. and Martin, E.B. (2002) An Approximation to the F Distribution Using the Chi-Square Distribution. Computational Statistics and Data Analysis, 40, 21-26.

https://doi.org/10.1016/S0167-9473(01)00097-4

[3] Wong, A. (2008) Approximating the F Distribution via a General Version of the Modified Signed Log-Likelihood Ratio Statistic. Computational Statistics and Data Analysis, 52, 3902-3912.

https://doi.org/10.1016/j.csda.2008.01.007

[4] Cox, D.R. and Hinkley, D.V. (1997) Theoretical Statistics. Cambridge University Press, Cambridge.

[5] Barndorff-Nielsen, O.E. and Cox, D.R. (1994) Inference and Asymptotics. Chapman and Hall, New York.

https://doi.org/10.1007/978-1-4899-3210-5

[6] Brazzale, A.R., Davison, A.C. and Reid, N. (2007) Applied Asymptotics. Cambridge University Press, Cambridge.

https://doi.org/10.1017/CBO9780511611131

[7] Davison, A.C., Fraser, D.A.S., Reid, N. and Sartori, N. (2014) Accurate Directional Inference for Vector Parameters in Linear Exponential Families. Journal of the American Statistical Association, 109, 302-314.

https://doi.org/10.1080/01621459.2013.839451

[8] Bartlett, M.S. (1937) Properties of Sufficiency and Statistical Tests. Proceedings of the Royal Society A, 160, 268-282.

https://doi.org/10.1098/rspa.1937.0109

[9] Bartlett, M.S. (1953) Approximate Confidence Intervals. Biometrika, 40, 12-19.

https://doi.org/10.1093/biomet/40.1-2.12

[10] Lawley, D.N. (1956) A General Method for Approximating to the Distribution of the Likelihood Ratio Criteria. Biometrika, 43, 295-303.

https://doi.org/10.1093/biomet/43.3-4.295

[11] Barndorff-Nielsen, O.E. and Cox, D.R. (1979) Edgeworth and Saddlepoint Approximation with Statistical Applications (with Discussion). Journal of the Royal Statistical Society, Series B, 41, 279-312.