Pseudodistance Methods Using Simultaneously Sample Observations and Nearest Neighbour Distance Observations for Continuous Multivariate Models

1. Introduction

For statistical inference methods for continuous multivariate models, we often assume that we have a random sample of size n of multivariate observations ${x}_{1},\cdots ,{x}_{n}$ which are independent and identically distributed as the d-dimensional random vector $x$ with a d-dimensional density function $g\left(x\right)$.

For the parametric set-up, $g\left(x\right)\in \left\{{f}_{\theta}\right\},\theta ={\left({\theta}_{1},\cdots ,{\theta}_{m}\right)}^{\prime}$, and the vector ${\theta}_{0}$ denotes the true vector of parameters. We would like to have statistical methods for estimating the vector ${\theta}_{0}$ if the parametric model $\left\{{f}_{\theta}\right\}$ can be assumed, together with inference methods to validate the assumption of the model $\left\{{f}_{\theta}\right\}$ by means of various goodness-of-fit statistics. This leads to a composite null hypothesis, and ideally we would like to use goodness-of-fit test statistics which follow a unique asymptotic distribution for all $\theta \in \Omega $, with $\Omega $ assumed to be compact.

Multidimensional model testing often poses difficulties. The goodness-of-fit test statistics used often have very complicated distributions, as is the case for statistics which make use of multivariate empirical characteristic functions, see Csörgö [1] . For the classical chi-square tests, the asymptotic distributions for simple and composite hypotheses are simple, but observations must be grouped into cells and there is some arbitrariness in choosing the cells, see Moore [2] and Klugman et al. [3] (pages 208-209) on extending chi-square tests to continuous multidimensional models. Goodness-of-fit test statistics based on the multivariate sample distribution function often have a very complicated null distribution, see Babu and Rao [4] , and extensive simulations are needed to obtain the p-values of the tests. For applications in various fields, there appears to be a need for goodness-of-fit test statistics which are relatively simple to implement and which lead to consistent tests.

Multivariate modelling is used in many fields, including actuarial science and finance. For financial applications, Moore [2] used chi-square tests for testing whether the joint weekly returns of two assets follow a bivariate normal distribution but, as mentioned earlier, chi-square tests require partitioning the sample space into cells and the tests are not consistent, even though the asymptotic null distributions of the statistics are simple.

In this paper, we introduce a class of pseudodistances ${D}_{h}\left(g,f\right)$, constructed from a class of convex functions $h\left(x\right)$, which measure the discrepancy between two density functions g and f; see details in Section 2.3. Goodness-of-fit test statistics for model testing based on ${D}_{h}$ preserve the property of having a simple asymptotic null distribution, comparable to chi-square tests, but unlike chi-square tests, the tests based on ${D}_{h}$ are consistent for model testing.

It is also interesting to note that within this class ${D}_{h}$, the statistic based on ${D}_{h}$ with $h\left(x\right)=-\text{log}\left(x\right)$ can also accommodate parameters estimated by the maximum likelihood (ML) method for the composite hypothesis. On the estimation side, estimators based on ${D}_{h}$ have the potential for good efficiency and robustness properties. Furthermore, estimation and model testing can be handled in a unified way.

The proposed inference methods extend previous methods for univariate models to multivariate continuous models. This paper can be considered as a follow-up to previous papers by Luong [5] , Luong [6] . The nearest neighbour distance (NND) notion is used in this paper to replace the similar notion of distance which was used when considering spacings and order statistics, see Ranneby et al. [7] . Order statistics are used for defining spacings for univariate models.

The class of pseudodistances ${D}_{h}\left(g,f\right)$ is constructed using the following class of strictly convex functions $h\left(x\right)$ with second derivative ${h}^{\prime \prime}\left(x\right)>0$. For each chosen $h\left(x\right)$ we have a corresponding pseudodistance ${D}_{h}\left(g,f\right)$, which is a discrepancy measure between the density g and the density f.

Explicitly, the function $h\left(x\right)$ takes the form

$h\left(x\right)=-{x}^{\alpha}$, (1)

where $\alpha $ is a known constant with $0<\alpha <1$ (in practice we choose $\alpha $ near 0); $h\left(x\right)$ can also have the form

$h\left(x\right)=-\text{log}\left(x\right)$. (2)

Note that $-\frac{{x}^{\alpha}-1}{\alpha}\to -\text{log}\left(x\right)$ as $\alpha $ decreases to 0. Moreover, $h\left(x\right)$ only needs to be defined up to an additive constant and a positive multiplicative constant: provided that these constants are known, inference procedures based on ${D}_{h}$ are not affected by them.

Furthermore, if $h\left(x\right)=-\text{log}\left(x\right)$ is used to construct the pseudodistance ${D}_{h}$, estimation using this pseudodistance gives the maximum likelihood estimators. This ${D}_{h}$ is, up to a few terms which do not depend on $\theta $, the Kullback-Leibler (KL) distance used to generate ML estimators. These terms do not affect estimation, but they are very significant for the construction of goodness-of-fit test statistics: statistics constructed using ${D}_{h}$ have an asymptotic normal distribution for model testing, whereas goodness-of-fit test statistics using the KL pseudodistance do not have a simple asymptotic distribution, especially for the composite null hypothesis case where parameters must be estimated by ML. More discussion is given in Section 2.2.

The paper is organized as follows. In Section 2, we introduce the auxiliary observations obtained from the NND observations. The class of pseudodistances ${D}_{h}$ is also introduced in this section. Asymptotic properties of estimators based on ${D}_{h}$ are considered in Section 3. Estimators obtained using ${D}_{h}$ with $h\left(x\right)=-\text{log}\left(x\right)$ are identical to ML estimators which are fully efficient. If other $h\left(x\right)$ is used for ${D}_{h}$, the corresponding estimators have the potential of good efficiencies and some robustness properties. These properties allow flexibility for balancing efficiency and robustness. In Section 4, goodness-of-fit statistics based on the class ${D}_{h}$ are shown to have an asymptotic normal distribution and in Section 5 an example is provided for illustration of the proposed techniques.

2. Pseudo Distances

2.1. Nearest Neighbour Distance (NND) Observations

For each vector of observations ${x}_{i},i=1,\cdots ,n$ in the random sample, we define ${r}_{i}$ as the nearest neighbour distance (NND) of ${x}_{i}$, with

${r}_{i}={\mathrm{min}}_{{x}_{j}\ne {x}_{i}}\Vert {x}_{j}-{x}_{i}\Vert $,

where $\Vert \cdot \Vert $ is the usual Euclidean norm, and ${r}_{i}$ can clearly be obtained from the sample of d-variate observations.

In the literature, these ${r}_{i}$ ’s have been used to construct goodness-of-fit statistics, see Bickel and Breiman [8] , Zhou and Jammalamadaka [9] , but often these statistics for multivariate models do not have a simple asymptotic distribution, which might create difficulties for applications. Now, we can define ${y}_{i}$ as given by Proposition 2 of Ranneby et al. [7] (page 433), with ${y}_{i}=n{c}_{d}{r}_{i}^{d}$ or equivalently $\frac{{y}_{i}}{n}={c}_{d}{r}_{i}^{d},d\ge 2$, where ${c}_{d}=\frac{{\pi}^{\frac{d}{2}}}{\Gamma \left(\frac{d}{2}+1\right)}$ is the volume of the unit ball in ${R}^{d}$, $\pi $ is the usual constant pi and $\Gamma (.)$ is the gamma function.
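The construction of the auxiliary observations can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's code: the function names `unit_ball_volume` and `nnd_auxiliary` are hypothetical, and the sketch assumes that ${y}_{i}$ is n times the volume ${c}_{d}{r}_{i}^{d}$ of the nearest neighbour ball. The minimum is taken over all other sample points, so a tied observation gives ${r}_{i}=0$.

```python
import math


def unit_ball_volume(d):
    """Volume c_d of the unit ball in R^d: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def nnd_auxiliary(points):
    """Return (r, y) for a list of d-dimensional points.

    r[i] is the Euclidean distance from points[i] to its nearest
    neighbour among the other sample points (0 if there is a tie).
    y[i] = n * c_d * r[i]**d is the assumed auxiliary observation.
    """
    n = len(points)
    d = len(points[0])
    c_d = unit_ball_volume(d)
    r = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    y = [n * c_d * ri ** d for ri in r]
    return r, y
```

The brute-force search costs on the order of $n^{2}$ distance evaluations, which is acceptable for moderate n; a spatial index could replace it for large samples.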

Note that ${y}_{1},\cdots ,{y}_{n}$ are n univariate auxiliary observations obtained from the NND observations. Therefore, from the original observations of the sample ${x}_{1},\cdots ,{x}_{n}$ and the n auxiliary observations, we can form the following n observations of dimension $d+1$,

$\left({x}_{1},{y}_{1}\right),\cdots ,\left({x}_{n},{y}_{n}\right)$.

As $n\to \infty $, these n observations are asymptotically independent and have a common density function given by the density of $\left(X,Y\right)$ below,

${p}_{0}\left(x,y\right)={g}^{2}\left(x\right){\text{e}}^{-yg\left(x\right)}$, (3)

see the end of Section 2 of Kuljus and Ranneby [10] (p1094). The situation is similar to the univariate case where spacings were used, see Luong [5] (pages 619-620).

Now we can consider the random criterion function

${Q}_{n}^{h}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i}{f}_{\theta}\left({x}_{i}\right)\right)}$ (4)

for the class of functions h defined by expressions (1) and (2). We shall see subsequently that inference methods based on the objective function (4) are pseudodistance methods based on a class of pseudodistances ${D}_{h}\left(g,f\right)$, where g and f are density functions.
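As a rough sketch of how the criterion function (4) might be evaluated, the following Python fragment takes the auxiliary observations and the model density values ${f}_{\theta}\left({x}_{i}\right)$ as plain lists. The names `h_log`, `make_h_power` and `criterion` are hypothetical and chosen only for illustration.

```python
import math


def h_log(u):
    """h(x) = -log(x), expression (2)."""
    return -math.log(u)


def make_h_power(alpha):
    """Return h(x) = -x**alpha, expression (1), for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return lambda u: -(u ** alpha)


def criterion(h, y, density_values):
    """Q_n^h(theta) = (1/n) * sum of h(y_i * f_theta(x_i)).

    density_values[i] is assumed to hold f_theta(x_i) for the
    candidate parameter vector theta.
    """
    n = len(y)
    return sum(h(yi * fi) for yi, fi in zip(y, density_values)) / n
```

In practice this function would be passed to a numerical minimizer over $\theta $, recomputing `density_values` at each candidate value.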

Minimizing ${Q}_{n}^{h}\left(\theta \right)$ with $h\left(x\right)=-\mathrm{log}\left(x\right)$, we obtain the pseudodistance ${D}_{h}$ estimators, which are identical to the maximum likelihood (ML) estimators. ML estimators can be viewed as pseudodistance estimators based on the Kullback-Leibler (KL) distance, but we shall see that goodness-of-fit test statistics based on the KL distance are complicated, unlike the ones which are based on ${Q}_{n}^{h}\left(\theta \right)$ and consequently on ${D}_{h}\left(g,{f}_{\theta}\right)$. The KL pseudodistance used to derive ML estimators is discussed in Section 2.2 and the class of pseudodistances ${D}_{h}$ is introduced in Section 2.3.

Furthermore, if we use $h\left(x\right)=-{x}^{\alpha}$ to construct ${D}_{h}$, then $\alpha $ should be set near 0 within the range $0<\alpha <1$ for robust estimation. This does not rely on an explicit multivariate density estimate, which is needed for the minimum Hellinger method proposed by Tamura and Boos [11] . The class of pseudodistance methods being considered therefore appears useful for applications and relatively simple to implement, so that practitioners might want to use them for applied work.

2.2. Kullback-Leibler (KL) Pseudo-Distance

The negative of the log likelihood function can be expressed as

${Q}_{n}^{ML}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}-\mathrm{log}{f}_{\theta}\left({x}_{i}\right)}$

and ML estimators can be viewed as the values obtained by minimizing the observed version, also called the sample version, of the Kullback-Leibler (KL) pseudo-distance ( ${D}_{KL}$ ), i.e., ${D}_{KL}^{o}\left(g,{f}_{\theta}\right)$ defined as

${D}_{KL}^{o}\left(g,{f}_{\theta}\right)={Q}_{n}^{ML}\left(\theta \right)+\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\text{log}\left(g\left({x}_{i}\right)\right)}\stackrel{p}{\to}{D}_{KL}\left(g,{f}_{\theta}\right)$, (5)

where $\stackrel{p}{\to}$ denotes convergence in probability and ${D}_{KL}\left(g,{f}_{\theta}\right)=-{E}_{g}\left(\mathrm{log}\left(\frac{{f}_{\theta}}{g}\right)\right)$ is the KL pseudo-distance. Minimizing ${D}_{KL}^{o}\left(g,{f}_{\theta}\right)$ is equivalent to maximizing the log of the likelihood function.

However, consider testing the validity of the model with the composite null hypothesis

${H}_{0}:g\left(x\right)\in \left\{{f}_{\theta}\right\},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\theta \in \Omega $.

Since $g\left(x\right)$ appears in the LHS of expression (5), it must be estimated, and replacing $g\left(x\right)$ by, say, a multivariate density estimate $\stackrel{^}{g}\left(x\right)$ makes the distribution of the LHS of expression (5) complicated, even though we can replace $\theta $ by ${\stackrel{^}{\theta}}_{ML}$. This might explain the limited use of the KL pseudo-distance for the construction of statistics for model testing with the use of ${\stackrel{^}{\theta}}_{ML}$.

2.3. The Class of Pseudo-Distances ${D}_{h}$

We shall focus on pseudo-distance methods based on ${D}_{h}\left(g,{f}_{\theta}\right)$ for parametric models, with emphasis on continuous multivariate models; some of the previous univariate results, which are scattered, can also be unified by viewing them as pseudo-distance methods.

In general for pseudodistances we require the following property:

${D}_{h}\left(g,f\right)>0$ if $g\ne f$, ${D}_{h}\left(g,f\right)=0$ if and only if $g=f$, (6)

where g and f are density functions. The property given by (6) is needed for establishing consistency of estimators and consistency of goodness-of-fit tests, see Broniatowski et al. [12] for more notions and properties of pseudodistances.

Since ${D}_{h}\left(g,f\right)$ in general is not observable if g is unknown, we shall see at the end of this section that we can define an observed version ${D}_{h}^{0}\left(g,f\right)$ with the property ${D}_{h}^{0}\left(g,f\right)\stackrel{p}{\to}{D}_{h}\left(g,f\right)$. ${D}_{h}^{0}\left(g,f\right)$ will satisfy the property given by expression (6) in probability, similarly to ${D}_{KL}^{o}\left(g,{f}_{\theta}\right)$ with the property ${D}_{KL}^{o}\left(g,{f}_{\theta}\right)\stackrel{p}{\to}{D}_{KL}\left(g,{f}_{\theta}\right)$ for the KL distance.

Now we work with the following pairs of observations to develop ${D}_{h}$ methods,

${\left({X}_{1},{Y}_{1}\right)}^{\prime},\cdots ,{\left({X}_{n},{Y}_{n}\right)}^{\prime}$.

For this sample, the observations are asymptotically independent by Proposition 3 of Ranneby et al. [7] (p413) and, as $n\to \infty $, the distribution of $\left({X}_{i},{Y}_{i}\right)$ tends to a common distribution, namely the distribution of the random vector $\left(X,Y\right)$ with joint density function given by

$p\left(x,y\right)={g}^{2}\left(x\right){\text{e}}^{-yg\left(x\right)}$,

see Kuljus and Ranneby [10] (p1101).

Therefore, the results are very similar to the univariate case, with $g\left(x\right)$ interpreted here as a multivariate density instead of a univariate density. The results given by Luong [5] (page 624) continue to hold and we also have:

1) $Z=Yg\left(X\right)$ follows a standard exponential distribution with density $f\left(z\right)={\text{e}}^{-z},z>0$.

2) Z and X are independent.

Since $yf\left(x\right)=z\frac{f\left(x\right)}{g\left(x\right)}$ with $z=yg\left(x\right)$ and h is strictly convex, Jensen’s inequality gives

$E\left(h\left(yf\left(x\right)\right)\right)={E}_{Z}\left(E\left(h\left(z\frac{f\left(x\right)}{g\left(x\right)}\right)|Z\right)\right)>{E}_{Z}\left(h\left(zE\left(\frac{f\left(x\right)}{g\left(x\right)}|Z\right)\right)\right)={E}_{z}\left(h\left(z\right)\right)$

for $f\ne g$, since $E\left(\frac{f\left(x\right)}{g\left(x\right)}|Z\right)=E\left(\frac{f\left(x\right)}{g\left(x\right)}\right)=1$

and $E\left(h\left(yf\left(x\right)\right)\right)={E}_{z}\left(h\left(z\right)\right)$ for $f=g$.

Now, we can define

${D}_{h}\left(g,f\right)=E\left(h\left(yf\left(x\right)\right)\right)-{E}_{z}(h(z))$

and ${E}_{z}\left(h\left(z\right)\right)$ is a known constant which does not depend on the parameters given by the vector $\theta $.
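As a worked check of this definition, offered here only as an illustrative derivation, take $h\left(x\right)=-\mathrm{log}\left(x\right)$ and write $yf\left(x\right)=z\frac{f\left(x\right)}{g\left(x\right)}$ with $z=yg\left(x\right)$. Then

$E\left(h\left(yf\left(x\right)\right)\right)=-{E}_{z}\left(\mathrm{log}\left(z\right)\right)-{E}_{g}\left(\mathrm{log}\left(\frac{f\left(x\right)}{g\left(x\right)}\right)\right)$,

so that

${D}_{h}\left(g,f\right)=-{E}_{g}\left(\mathrm{log}\left(\frac{f\left(x\right)}{g\left(x\right)}\right)\right)={D}_{KL}\left(g,f\right)$.

The terms free of $\theta $ enter when the observed versions are compared, since for this choice of h

${D}_{h}^{o}\left(g,{f}_{\theta}\right)={D}_{KL}^{o}\left(g,{f}_{\theta}\right)-\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}\left({y}_{i}g\left({x}_{i}\right)\right)}+{E}_{z}\left(\mathrm{log}\left(z\right)\right)$.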

Under the parametric model, we would like to minimize ${D}_{h}\left(g,{f}_{\theta}\right)$, but g is unknown. This leads us to consider the observed objective function ${D}_{h}^{o}\left(g,{f}_{\theta}\right)$, based on ${D}_{h}\left(g,{f}_{\theta}\right)$ and defined by

${D}_{h}^{o}\left(g,{f}_{\theta}\right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left(h\left({y}_{i}{f}_{\theta}\left({x}_{i}\right)\right)\right)}-{E}_{z}\left(h\left(z\right)\right)$,

note that we have

${D}_{h}^{o}\left(g,{f}_{\theta}\right)\stackrel{p}{\to}{D}_{h}\left(g,{f}_{\theta}\right)$.

The pseudodistance ${D}_{h}$ estimators, given by the vector $\stackrel{^}{\theta}$ and based on ${D}_{h}\left(g,{f}_{\theta}\right)$, are obtained by minimizing ${D}_{h}^{o}\left(g,{f}_{\theta}\right)$. Equivalently, they are obtained by minimizing

${Q}_{n}^{h}\left(\theta \right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}h\left({y}_{i}{f}_{\theta}\left({x}_{i}\right)\right)}$

which is expression (4).

3. Asymptotic Properties of ${D}_{h}$ Estimators

3.1. Consistency

It is not difficult to see that the ${D}_{h}$ estimators given by the vector $\stackrel{^}{\theta}$ which minimizes expression (4) are consistent, by defining $h\left({z}_{i}\left(\theta ,n\right)\right)=h\left({y}_{i}{f}_{\theta}\left({x}_{i}\right)\right)$ and by using the assumptions and results of Section 3.1 of Luong [5] (pages 622-624).

Limit laws such as the uniform weak law of large numbers (UWLLN) and the central limit theorem (CLT) are applicable because $\left\{{z}_{i}\left(\theta ,n\right)\right\}$ is a mixing sequence, which follows from $\left({X}_{i},{Y}_{i}\right),i=1,\cdots ,n$ being asymptotically independent with a common distribution as $n\to \infty $. Therefore, $\stackrel{^}{\theta}\stackrel{p}{\to}{\theta}_{0}$.

3.2. Asymptotic Normality

Using the CLT and the results given by Section 2 in Luong [5] (pages 626-631), we can conclude that

$\sqrt{n}\left(\stackrel{^}{\theta}-{\theta}_{0}\right)\stackrel{L}{\to}N\left(0,{\sigma}_{h}^{2}{\left[I\left({\theta}_{0}\right)\right]}^{-1}\right)$,

where $\stackrel{L}{\to}$ denotes convergence in law and $I\left({\theta}_{0}\right)=-{E}_{{\theta}_{0}}\left(\frac{{\partial}^{2}\mathrm{ln}f\left(x;{\theta}_{0}\right)}{\partial {\theta}^{\prime}\partial \theta}\right)$ is the commonly used information matrix. We have ${\sigma}_{h}^{2}=1$ if the function $h\left(x\right)=-\text{log}\left(x\right)$ is used to define ${D}_{h}$ and ${D}_{h}^{0}$, and ${\sigma}_{h}^{2}=\frac{{E}_{Z}\left[{\left({h}^{\prime}\left(Z\right)\right)}^{2}{Z}^{2}\right]}{{\left[{E}_{Z}\left({Z}^{2}{h}^{\prime \prime}\left(Z\right)\right)\right]}^{2}}$ if the function $h\left(x\right)=-{x}^{\alpha},0<\alpha <1$ is used to define ${D}_{h}$ and ${D}_{h}^{0}$, with the first and second derivatives of h denoted respectively by ${h}^{\prime}$ and ${h}^{\prime \prime}$. The random variable Z follows a standard exponential distribution, as given by expression 25 in Luong [5] (page 631). From the standard exponential distribution, we also have ${E}_{Z}\left({Z}^{k}\right)=\Gamma \left(1+k\right),k>-1$, and note that ${\sigma}_{h}^{2}\to 1$ as $\alpha \to 0$.
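To see how the variance factor behaves, the moments can be evaluated in closed form for $h\left(x\right)=-{x}^{\alpha}$. The sketch below is a derivation made here rather than a formula quoted from the source: it substitutes ${h}^{\prime}\left(x\right)=-\alpha {x}^{\alpha -1}$ and ${h}^{\prime \prime}\left(x\right)=\alpha \left(1-\alpha \right){x}^{\alpha -2}$ into the ratio above and uses ${E}_{Z}\left({Z}^{k}\right)=\Gamma \left(1+k\right)$. The function name `sigma2_power` is hypothetical.

```python
import math


def sigma2_power(alpha):
    """Variance factor sigma_h^2 for h(x) = -x**alpha, 0 < alpha < 1.

    Numerator:   E[(h'(Z))^2 Z^2] = alpha^2 * Gamma(1 + 2*alpha)
    Denominator: [E(Z^2 h''(Z))]^2 = [alpha*(1 - alpha)*Gamma(1 + alpha)]^2
    with Z standard exponential.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    numerator = alpha ** 2 * math.gamma(1 + 2 * alpha)
    denominator = (alpha * (1 - alpha) * math.gamma(1 + alpha)) ** 2
    return numerator / denominator
```

Under this derivation the factor equals $16/\pi $ at $\alpha =0.5$ and approaches 1 as $\alpha $ decreases to 0, which agrees with the limit stated above and suggests that the efficiency loss relative to ML is small for $\alpha $ near 0.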

The ${D}_{h}$ estimators using $h\left(x\right)=-{x}^{\alpha},0<\alpha <1$, might have some robustness properties by M-estimation theory, see Luong [5] (page 632), and might be preferred over the ML estimators.

The proposed ${D}_{h}$ methods are density based but do not require an explicit density estimate to implement. Hence they appear to be simpler for practitioners and can be used as an alternative to other robust methods such as the Hellinger methods proposed by Tamura and Boos [11] . Besides, the observed pseudodistances ${D}_{h}^{o}$ based on ${D}_{h}$ can also be used for the construction of goodness-of-fit statistics and lead to statistics which are relatively simple to implement.

4. Goodness-of-Fit Test Statistics Using ${D}_{h}^{0}$

For model selection and model testing we are primarily interested in testing the composite null hypothesis

${H}_{0}:g\left(x\right)\in \left\{{f}_{\theta}\right\},\text{\hspace{0.17em}}\text{\hspace{0.17em}}\theta \in \Omega $.

However, it might be easier to follow the construction of the test statistics by first considering the test based on ${D}_{h}^{0}$, which is also implicitly based on ${D}_{h}$, for the simple hypothesis where there is no unknown parameter.

4.1. Simple Null Hypothesis

For the simple ${H}_{0}:g=f$, a natural statistic can be based on ${D}_{h}^{0}\left(g,f\right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left(h\left({y}_{i}f\left({x}_{i}\right)\right)-{E}_{z}\left(h\left(z\right)\right)\right)}$. Since $\left\{h\left({y}_{i}f\left({x}_{i}\right)\right),i=1,2,\cdots \right\}$ forms a mixing sequence, the CLT can be applied, with the distribution of each ${y}_{i}f\left({x}_{i}\right)$ tending to a standard exponential distribution under ${H}_{0}$, and Slutsky’s Theorem can also be used if needed. Therefore, the test statistic $\sqrt{n}{D}_{h}^{0}\left(g,f\right)$ can be used, and $\sqrt{n}{D}_{h}^{0}\left(g,f\right)\stackrel{L}{\to}N\left(0,{v}_{Z}^{h}\right)$ where ${v}_{Z}^{h}$ is the variance of $h\left(Z\right)$ and Z follows a standard exponential distribution.

For an $\alpha $ level test, we can reject ${H}_{0}$ if

$\frac{\sqrt{n}{D}_{h}^{0}\left(g,f\right)}{\sqrt{{v}_{Z}^{h}}}>{z}_{1-\alpha}$

where ${z}_{1-\alpha}$ is the $\left(1-\alpha \right)$th quantile of the standard normal distribution.

Equivalently, we can reject ${H}_{0}$ if

$-\frac{\sqrt{n}{D}_{h}^{0}\left(g,f\right)}{\sqrt{{v}_{Z}^{h}}}<{z}_{\alpha}$. (7)

Note that $-{D}_{h}^{0}\left(g,f\right)=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left(-h\left({y}_{i}f\left({x}_{i}\right)\right)-{E}_{z}\left(-h\left(z\right)\right)\right)}$ and ${v}_{Z}^{h}$ is also the variance of $-h\left(Z\right).$

Now if we use ${D}_{h}^{0}$ with $h\left(x\right)=-\text{log}\left(x\right)$, then ${E}_{z}\left(-h\left(z\right)\right)={E}_{z}\left(\mathrm{log}\left(Z\right)\right)$. The moment generating function of $\mathrm{log}Z$ is given by ${M}_{\mathrm{log}Z}\left(t\right)={E}_{Z}\left({\text{e}}^{t\mathrm{log}z}\right)=\Gamma \left(1+t\right)$, with $\Gamma (.)$ being the gamma function, so that the cumulant generating function is $\mathrm{log}{M}_{\mathrm{log}Z}\left(t\right)=\mathrm{log}\Gamma \left(1+t\right)$. By differentiating it, we obtain the first two cumulants, ${E}_{z}\left(\mathrm{log}\left(Z\right)\right)=\psi \left(1\right)$ and ${v}_{Z}^{h}={\psi}^{\prime}\left(1\right)$, where $\psi (.)$ and ${\psi}^{\prime}(.)$ are respectively the digamma function and the trigamma function, which are available in most statistical packages.

The test statistic given by expression (7) can be expressed explicitly as

$\frac{\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}{y}_{i}}+\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}f\left({x}_{i}\right)}-\sqrt{n}\psi \left(1\right)}{\sqrt{{\psi}^{\prime}\left(1\right)}}$ (8)

and we reject the simple ${H}_{0}$ if

$\frac{\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}{y}_{i}}+\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}f\left({x}_{i}\right)}-\sqrt{n}\psi \left(1\right)}{\sqrt{{\psi}^{\prime}\left(1\right)}}<{z}_{\alpha}$ for an $\alpha $ level test, $0<\alpha <1$.
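A minimal sketch of this rejection rule follows, assuming that the auxiliary observations and the hypothesized density values $f\left({x}_{i}\right)$ are already available as lists. It uses $-h\left({y}_{i}f\left({x}_{i}\right)\right)=\mathrm{log}{y}_{i}+\mathrm{log}f\left({x}_{i}\right)$ together with the known values $\psi \left(1\right)=-\gamma $ (minus the Euler-Mascheroni constant) and ${\psi}^{\prime}\left(1\right)={\pi}^{2}/6$. The function names are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
PSI_1 = -EULER_GAMMA               # digamma function at 1
PSI_PRIME_1 = math.pi ** 2 / 6     # trigamma function at 1


def log_statistic(y, density_values):
    """Standardized statistic for h(x) = -log(x) under a simple H0.

    Requires every y_i > 0, i.e. no tied observations.
    """
    n = len(y)
    total = sum(math.log(yi) + math.log(fi)
                for yi, fi in zip(y, density_values))
    numerator = total / math.sqrt(n) - math.sqrt(n) * PSI_1
    return numerator / math.sqrt(PSI_PRIME_1)


def reject_simple_null(y, density_values, level=0.05):
    """Reject H0 when the statistic falls below the lower normal quantile."""
    z_level = NormalDist().inv_cdf(level)
    return log_statistic(y, density_values) < z_level
```

The test is one-sided: small values of the statistic correspond to large values of the observed pseudodistance.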

Note that the test is consistent, as $\sqrt{n}{D}_{h}^{0}\left(g,f\right)\to \infty $ as $n\to \infty $ if $g\ne f$, so we will reject ${H}_{0}$ with probability tending to 1 should $g\ne f$; this property is not shared by chi-square tests. There is also the difficulty of arbitrariness in grouping observations into cells for chi-square tests, see Bickel and Breiman [8] for more discussion.

Furthermore, if we use ${D}_{h}^{0}$ with $h\left(x\right)=-{x}^{\alpha},0<\alpha <1$, ${E}_{Z}\left({Z}^{\alpha}\right)=\Gamma \left(1+\alpha \right)$, ${E}_{Z}\left({Z}^{2\alpha}\right)=\Gamma \left(1+2\alpha \right)$ and ${v}_{Z}^{h}=\Gamma \left(1+2\alpha \right)-{\left[\Gamma \left(1+\alpha \right)\right]}^{2}$.

The corresponding test statistic given by expression (7) can be expressed explicitly as

$\frac{\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}{\left({y}_{i}f\left({x}_{i}\right)\right)}^{\alpha}}-\sqrt{n}\Gamma \left(1+\alpha \right)}{\sqrt{\Gamma \left(1+2\alpha \right)-{\left[\Gamma \left(1+\alpha \right)\right]}^{2}}}$ (9)

and reject the simple ${H}_{0}$ if

$\frac{\frac{1}{\sqrt{n}}{{\displaystyle \sum}}_{i=1}^{n}{\left({y}_{i}f\left({x}_{i}\right)\right)}^{\alpha}-\sqrt{n}\Gamma \left(1+\alpha \right)}{\sqrt{\Gamma \left(1+2\alpha \right)-{\left[\Gamma \left(1+\alpha \right)\right]}^{2}}}<{z}_{\alpha}.$
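The power-function version of the statistic can be sketched in the same way. The exponent is called `a` in this hypothetical helper to keep it apart from the level of the test, and the density values $f\left({x}_{i}\right)$ are again assumed to be supplied as a list.

```python
import math
from statistics import NormalDist


def power_statistic(y, density_values, a):
    """Standardized statistic (9) for h(x) = -x**a with 0 < a < 1."""
    if not 0 < a < 1:
        raise ValueError("a must lie strictly between 0 and 1")
    n = len(y)
    mean_term = math.gamma(1 + a)                       # E(Z^a)
    var_term = math.gamma(1 + 2 * a) - mean_term ** 2   # Var(Z^a)
    total = sum((yi * fi) ** a for yi, fi in zip(y, density_values))
    numerator = total / math.sqrt(n) - math.sqrt(n) * mean_term
    return numerator / math.sqrt(var_term)


def reject_simple_null_power(y, density_values, a, level=0.05):
    """Reject H0 when the statistic falls below the lower normal quantile."""
    return power_statistic(y, density_values, a) < NormalDist().inv_cdf(level)
```

Unlike the logarithmic version, this statistic stays finite when some ${y}_{i}=0$, since a zero raised to a positive power is zero.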

4.2. Composite Null Hypothesis

For model testing, we consider the composite ${H}_{0}:g\left(x\right)\in \left\{{f}_{\theta}\right\},\theta \in \Omega $. Since ${\theta}_{0}$ is unknown, we first estimate ${\theta}_{0}$ by $\stackrel{^}{\theta}$ which minimizes ${D}_{h}^{0}\left(g,{f}_{\theta}\right)$, and then form the statistic $\sqrt{n}{D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)$. We shall show that $\sqrt{n}{D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)\stackrel{L}{\to}N\left(0,{v}_{Z}^{h}\right)$, which is the same limit as for the simple ${H}_{0}$. Unlike other statistics, for which estimating parameters leads to a complicated null distribution, these statistics behave like the ones used for the simple ${H}_{0}$ in Section 4.1, and the rejection rules are similar to the ones given by expression (8) and expression (9), depending on the choice of $h\left(x\right)$ used for ${D}_{h}^{0},{D}_{h}$.

In fact, these expressions remain valid for the composite ${H}_{0}$ provided that we replace $f\left({x}_{i}\right)$ by ${f}_{\stackrel{^}{\theta}}\left({x}_{i}\right),i=1,\cdots ,n$ when they appear in these expressions.

As we have seen, by using a version of the CLT for mixing sequences if needed, $\sqrt{n}{D}_{h}^{0}\left(g,{f}_{{\theta}_{0}}\right)\stackrel{L}{\to}N\left(0,{v}_{Z}^{h}\right)$. Now if we can establish

$\sqrt{n}{D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)=\sqrt{n}{D}_{h}^{0}\left(g,{f}_{{\theta}_{0}}\right)+{o}_{p}\left(1\right)$ (10)

with ${o}_{p}\left(1\right)$ being a term which converges to 0 in probability, then by Slutsky’s Theorem

$\sqrt{n}{D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)\stackrel{L}{\to}N\left(0,{v}_{Z}^{h}\right).$

Now, we will proceed to establish the property given by expression (10). Using the Mean Value Theorem and an expansion around $\stackrel{^}{\theta}$, we have

$\sqrt{n}{D}_{h}^{0}\left(g,{f}_{{\theta}_{0}}\right)=\sqrt{n}{D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)+\frac{1}{2}\sqrt{n}{\left(\stackrel{^}{\theta}-{\theta}_{0}\right)}^{\prime}\frac{{\partial}^{2}{D}_{h}^{0}\left(g,{f}_{\bar{\theta}}\right)}{\partial {\theta}^{\prime}\partial \theta}\left(\stackrel{^}{\theta}-{\theta}_{0}\right)$

with $\bar{\theta}$ lying on the line segment joining $\stackrel{^}{\theta}$ and ${\theta}_{0}$.

The first-order term of the expansion vanishes since $\frac{\partial {D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)}{\partial \theta}=0$, and $\frac{{\partial}^{2}{D}_{h}^{0}\left(g,{f}_{\bar{\theta}}\right)}{\partial {\theta}^{\prime}\partial \theta}\stackrel{p}{\to}{E}_{Z}\left({Z}^{2}{h}^{\prime \prime}\left(Z\right)\right)I\left({\theta}_{0}\right)$,

see Luong [5] (page 630). Also, since $I\left({\theta}_{0}\right)$ is positive definite, ${E}_{Z}\left({Z}^{2}{h}^{\prime \prime}\left(Z\right)\right)>0$, $\sqrt{n}\left(\stackrel{^}{\theta}-{\theta}_{0}\right)$ is bounded in probability by the asymptotic normality of Section 3.2 and $\stackrel{^}{\theta}\stackrel{p}{\to}{\theta}_{0}$, the quadratic term is ${o}_{p}\left(1\right)$ and we then have the relation given by expression (10).

Furthermore, if we use $h\left(x\right)=-\text{log}\left(x\right)$ for ${D}_{h}$, $\stackrel{^}{\theta}={\stackrel{^}{\theta}}_{ML}$,

$\sqrt{n}{D}_{h}^{0}\left(g,{f}_{{\stackrel{^}{\theta}}_{ML}}\right)=\sqrt{n}{D}_{h}^{0}\left(g,{f}_{\stackrel{^}{\theta}}\right)\stackrel{L}{\to}N\left(0,{v}_{Z}^{h}\right)$.

The use of the ML estimators ${\stackrel{^}{\theta}}_{ML}$ for chi-square distance type statistics often creates complications when it comes to deriving the asymptotic distributions of these statistics, see Chernoff and Lehmann [13] (p580), Luong and Thompson [14] (p249-251).

For applications, it has been recognized that the maximum value attained by the log of the likelihood function can provide information on goodness-of-fit for the model being used. The test given by expression (8) with $f\left({x}_{i}\right)$ replaced by ${f}_{\stackrel{^}{\theta}}\left({x}_{i}\right)$ formalizes the informal procedures which use the maximum value of the log likelihood function for assessing goodness-of-fit of the model, see Klugman et al. [15] . Note, however, that the condition of no tied observations is needed for this test; otherwise some values of ${y}_{i}$ are equal to 0 and the logs of these values are undefined. Meanwhile, the test based on ${D}_{h}^{o}$ with $h\left(x\right)=-{x}^{\alpha},0<\alpha <1$ is well defined even in the presence of tied observations, and in general we should fix a value for $\alpha $ near 0 for balancing efficiency and robustness of the estimation procedures.

5. Illustration

For illustration of the proposed methods, we use the d-dimensional multivariate normal model; its density function is often parameterized using the mean vector $\mu $ and the covariance matrix $\Sigma $ and is given by

$f\left(x;\mu ,\Sigma \right)={\left(2\pi \right)}^{-\frac{1}{2}d}{\left|\Sigma \right|}^{-\frac{1}{2}}{\text{e}}^{-\frac{1}{2}{\left(x-\mu \right)}^{\prime}{\Sigma}^{-1}\left(x-\mu \right)},\text{\hspace{0.17em}}\text{\hspace{0.17em}}x\in {R}^{d}.$ (11)

$\left|\Sigma \right|$ is the determinant of the matrix $\Sigma $, see Anderson [16] (page 20).

There is redundancy when using all elements of the matrix $\Sigma $ as parameters since $\Sigma $, being a covariance matrix, is symmetric.

We can eliminate the redundancy by defining the vector of parameters as $\theta $ with

$\theta =\left(\begin{array}{c}\mu \\ Vech\Sigma \end{array}\right)$.

The Vech operator when applied to $\Sigma $ extracts the lower triangular elements of $\Sigma $ and stacks them in a vector. Equivalently, we can use the vector of parameters $\theta $ instead of $\mu $ and $\Sigma $ and express the multivariate normal density as $f\left(x;\theta \right)$ to avoid redundancy of the previous parameterization. We assume that we have a random sample of size n which allows us to obtain the auxiliary univariate observations ${y}_{1},\cdots ,{y}_{n}$ from NND observations and there is no tied observation so that

${y}_{i}>0,i=1,\cdots ,n$

For illustration, say we use ${D}_{h}^{0}$ with $h\left(x\right)=-\mathrm{log}x$. The vector of estimators in this case coincides with the maximum likelihood (ML) estimators, i.e., $\stackrel{^}{\theta}={\stackrel{^}{\theta}}_{ML}$, and for the multivariate normal model it is well known that ${\stackrel{^}{\theta}}_{ML}$ can be obtained explicitly, see Anderson [16] (page 112).

Explicitly,

$\stackrel{^}{\theta}={\stackrel{^}{\theta}}_{ML}$, ${\stackrel{^}{\theta}}_{ML}=\left(\begin{array}{c}\bar{x}\\ VechS\end{array}\right)$, $\bar{x}=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}{x}_{i}}$,

$\bar{x}$ is the sample mean and $S$ is the sample covariance matrix which can also be expressed as

$S=\frac{1}{n}{\displaystyle {\sum}_{i=1}^{n}\left({x}_{i}-\bar{x}\right){\left({x}_{i}-\bar{x}\right)}^{\prime}}$.

For model testing, we can then use the test statistic

$\frac{\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}{y}_{i}}+\frac{1}{\sqrt{n}}{\displaystyle {\sum}_{i=1}^{n}\mathrm{log}{f}_{{\stackrel{^}{\theta}}_{ML}}\left({x}_{i}\right)}-\sqrt{n}\psi \left(1\right)}{\sqrt{{\psi}^{\prime}(1)}}$

and reject the model if the statistic gives a value smaller than ${z}_{\alpha}$ for an $\alpha $ level test.
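The whole illustration can be sketched end to end for the bivariate case $d=2$, where the inverse and determinant of the covariance matrix have simple closed forms. Everything below is an assumption-laden sketch rather than the author's implementation: the data are simulated with a fixed seed, the function names are hypothetical, and ${y}_{i}$ is taken as $n\pi {r}_{i}^{2}$ since ${c}_{2}=\pi $.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # psi(1) = -EULER_GAMMA


def fit_bivariate_normal(xs):
    """ML estimates for d = 2: sample mean and covariance with divisor n."""
    n = len(xs)
    m1 = sum(p[0] for p in xs) / n
    m2 = sum(p[1] for p in xs) / n
    s11 = sum((p[0] - m1) ** 2 for p in xs) / n
    s22 = sum((p[1] - m2) ** 2 for p in xs) / n
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in xs) / n
    return (m1, m2), (s11, s12, s22)


def log_density(p, mean, cov):
    """Log of the normal density (11) with d = 2."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 ** 2
    u, v = p[0] - mean[0], p[1] - mean[1]
    quad = (s22 * u * u - 2 * s12 * u * v + s11 * v * v) / det
    return -math.log(2 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def composite_log_statistic(xs):
    """Statistic (8) with f replaced by the fitted bivariate normal density."""
    n = len(xs)
    mean, cov = fit_bivariate_normal(xs)
    total = 0.0
    for i, p in enumerate(xs):
        r = min(math.dist(p, q) for j, q in enumerate(xs) if j != i)
        y = n * math.pi * r ** 2  # assumed auxiliary observation, c_2 = pi
        total += math.log(y) + log_density(p, mean, cov)
    numerator = total / math.sqrt(n) + math.sqrt(n) * EULER_GAMMA
    return numerator / math.sqrt(math.pi ** 2 / 6)


# Simulated correlated bivariate normal sample (illustrative data only).
rng = random.Random(0)
sample = []
for _ in range(200):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    sample.append((1.0 + z1, -1.0 + 0.6 * z1 + 0.8 * z2))

stat = composite_log_statistic(sample)
reject = stat < NormalDist().inv_cdf(0.05)
```

The variable `reject` then records the decision of a 5% level test for this simulated sample; how closely the finite-sample distribution of `stat` follows the normal limit is not assessed here.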

As Tamura and Boos [11] have pointed out, ${\stackrel{^}{\theta}}_{ML}$ might not be robust; they hence proposed multivariate Hellinger distance estimators, but a multivariate density estimate is needed for their procedures. For robust estimation, or in the case of tied observations, we might want to use ${D}_{h}^{0}$ with $h\left(x\right)=-{x}^{\alpha}$ with $\alpha $ being a positive number near 0.

In this paper, we focus on the presentation of the methodology of ${D}_{h}$. Simulation studies for assessing the power of the tests, the use of distributions other than the normal distribution for the null distribution of the goodness-of-fit test statistics, and the assessment of efficiency in finite samples are left for subsequent work. Practitioners might be encouraged to use these ${D}_{h}$ methods.

Acknowledgements

The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and the support from the editorial staff of the Open Journal of Statistics in processing the paper are all gratefully acknowledged.

References

[1] Csorgo, S.L. (1986) Testing for Normality in Arbitrary Dimension. The Annals of Statistics, 14, 708-723.

https://doi.org/10.1214/aos/1176349948

[2] Moore, D.S. and Stubblebine, J.B. (1981) Chi-Square Tests for Multivariate Normality with Applications to Common Stocks Prices. Communications in Statistics, Theory and Methods, 10, 713-738.

https://doi.org/10.1080/03610928108828070

[3] Klugman, S.A., Panjer, H.H. and Willmott, G.E. (2013) Loss Models: Further Topics. Wiley, New York.

https://doi.org/10.1002/9781118787106

[4] Babu, G.J. and Rao, C.R. (2004) Goodness-of-Fit Tests When Parameters Are Estimated. Sankhya, 66, 63-74.

[5] Luong, A. (2018) Unified Asymptotic Results for Maximum Spacing and Generalized Spacing Methods for Continuous Models. Open Journal of Statistics, 8, 614-639.

https://doi.org/10.4236/ojs.2018.83040

[6] Luong, A. (2018) Asymptotic Results for Goodness-of-Fit Tests Using a Class of Generalized Spacing Methods. Open Journal of Statistics, 8, 731-746.

https://doi.org/10.4236/ojs.2018.84048

[7] Ranneby, B., Jammalamadaka, S.R. and Tetterukovskiy, A. (2005) The Maximum Spacing Estimation for Multivariate Observations. Journal of Statistical Planning and Inference, 129, 427-446.

https://doi.org/10.1016/j.jspi.2004.06.059

[8] Bickel, P.J. and Breiman, L. (1983) Sums of Functions of Nearest Neighbor Distances, Moment Bounds, Limit Theorems and a Goodness of Fit Test. The Annals of Probability, 11, 185-214.

https://doi.org/10.1214/aop/1176993668

[9] Zhou, S. and Jammalamadaka, S.R. (1993) Goodness of Fit in Multi-Dimensions Based on Nearest Neighbour Distances. Nonparametric Statistics, 2, 271-284.

https://doi.org/10.1080/10485259308832558

[10] Kuljus, K. and Ranneby, B. (2015) Generalized Maximum Spacing Estimation for Multivariate Observations. Scandinavian Journal of Statistics, 42, 1092-1108.

https://doi.org/10.1111/sjos.12153

[11] Tamura, R.N. and Boos, D.D. (1986) Minimum Hellinger Distance Estimation for Multivariate Location and Covariance. Journal of the American Statistical Association, 81, 223-229.

https://doi.org/10.1080/01621459.1986.10478264

[12] Broniatowski, M., Toma, A. and Vajda, I. (2012) Decomposable Pseudodistances and Applications in Statistical Estimation. Journal of Statistical Planning and Inference, 142, 2574-2585.

https://doi.org/10.1016/j.jspi.2012.03.019

[13] Chernoff, H. and Lehmann, E.L. (1954) The Use of Maximum Likelihood Estimates in Chi-Square Tests for Goodness of Fit. Annals of Mathematical Statistics, 25, 579-586.

https://doi.org/10.1214/aoms/1177728726

[14] Luong, A. and Thompson, M.E. (1987) Minimum Distance Methods Based on Quadratic Distances for Transforms. Canadian Journal of Statistics, 15, 239-251.

https://doi.org/10.2307/3314914

[15] Klugman, S.A., Panjer, H.H. and Willmott, G.E. (2012) Loss Models: From Data to Decision. Wiley, New York.

https://doi.org/10.1002/9781118787106

[16] Anderson, T.W. (2003) An Introduction to Multivariate Statistical Analysis. 3rd Edition, Wiley, New York.