is a known constant with and in practice we choose near 0, and can also have the form
Note that as decreases to 0 and only needs
to be defined up to an additive and a positive multiplicative constant and provided that these constants are known inference procedures based on
Furthermore, if is used to construct the pseudodistance , estimation using this pseudodistance gives the maximum likelihood (ML) estimators. This , as a pseudodistance, is, up to a few terms which do not depend on , the Kullback-Leibler (KL) distance used to generate ML estimators. These few terms, which do not involve , do not affect estimation, but they are very significant for the construction of goodness-of-fit test statistics: statistics constructed using have an asymptotic normal distribution for model testing, whereas goodness-of-fit test statistics using the KL pseudodistance do not have a simple asymptotic distribution, especially in the composite null hypothesis case where parameters must be estimated using the ML estimators. More discussion is given in Section 2.2.
The paper is organized as follows. In Section 2, we introduce the auxiliary observations obtained from the NND observations. The class of pseudodistances is also introduced in this section. Asymptotic properties of estimators based on are considered in Section 3. Estimators obtained using with are identical to ML estimators, which are fully efficient. If another is used for , the corresponding estimators can combine good efficiency with some robustness properties, which gives flexibility in balancing efficiency and robustness. In Section 4, goodness-of-fit statistics based on the class are shown to have an asymptotic normal distribution, and in Section 5 an example is provided to illustrate the proposed techniques.
2. Pseudo Distances
2.1. Nearest Neighbour Distance (NND) Observations
For each observation vector in the random sample, we define the nearest neighbour distance (NND) to with
is the commonly used Euclidean distance
and clearly can be obtained using the sample of d-variate observations.
In the literature, these distances have been used to construct goodness-of-fit statistics, see Bickel and Breiman and Zhou and Jammalamadaka, but for multivariate models these statistics often do not have a simple asymptotic distribution, which might create difficulties for applications. Now, we can define as given
by Proposition 2 of Ranneby et al. (page 433) with or equivalently , , where is the usual constant pi used in
formulas for volume or area and is the commonly used gamma function.
Note that we have which are n univariate auxiliary observations obtained from NND observations. Therefore, from the original observations of the sample and using the n auxiliary observations, we can form the following multivariate observations
These n observations for are asymptotically independent and have a common density function given by the density of below,
see the end of Section 2 of Kuljus and Ranneby (p. 1094). In fact, the situation is similar to the univariate case where spacings were used, see Luong (pages 619-620).
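As a minimal sketch of how the auxiliary observations might be computed in practice, the Python code below finds the Euclidean nearest neighbour distance of each observation and rescales it by the sample size and the volume of the d-dimensional unit ball, pi^(d/2) / Gamma(d/2 + 1). The exact scaling used here (n times the ball volume times the distance raised to the power d) and all function names are assumptions made for illustration; the definition in Ranneby et al. should be taken as authoritative.

```python
import math


def unit_ball_volume(d):
    """Volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def nearest_neighbour_distances(sample):
    """Euclidean distance from each point to its nearest other point (O(n^2))."""
    distances = []
    for i, x in enumerate(sample):
        distances.append(
            min(math.dist(x, y) for j, y in enumerate(sample) if j != i)
        )
    return distances


def auxiliary_observations(sample):
    """Assumed form of the auxiliary observations: n * c_d * R_i^d,
    where R_i is the NND of point i and c_d is the unit ball volume."""
    n = len(sample)
    d = len(sample[0])
    c_d = unit_ball_volume(d)
    return [n * c_d * r ** d for r in nearest_neighbour_distances(sample)]
```

The brute-force search is quadratic in the sample size, which is adequate for a sketch; a tree-based neighbour search would be the natural replacement for large samples.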
Now we can consider the random criterion function
for the class of functions h defined by expressions (1) and (2). We shall see subsequently that inference methods based on the objective function (4) are pseudodistance methods based on a class of pseudodistances , where g and f are density functions.
Minimizing with , we obtain the pseudodistance estimators, which are identical to the maximum likelihood (ML) estimators. ML estimators can be viewed as pseudodistance estimators based on the Kullback-Leibler (KL) distance, but we shall see that goodness-of-fit test statistics are complicated when the KL distance is used, unlike the ones which are based on and consequently based on . The KL pseudodistance used to derive ML estimators will be discussed in Section 2.2 and the class of pseudodistances will be introduced in Section 2.3.
Furthermore, if we use to construct then should be set near 0 but within the range for robust estimation without relying on an explicit multivariate density estimate, which is needed for the minimum Hellinger method proposed by Tamura and Boos. Therefore, the class of pseudodistance methods being considered appears to be very useful for applications, and the methods are relatively simple to implement, so practitioners might want to use them in applied work.
2.2. Kullback-Leibler (KL) Pseudo-Distance
The negative of the log likelihood function can be expressed as
and ML estimators can be viewed as the values obtained by minimizing the observed version, also called the sample version, of the Kullback-Leibler (KL) pseudo-distance ( ), i.e., defined as
denotes convergence in probability,
and minimizing is equivalent to maximizing the log of the likelihood function.
The KL pseudo-distance is defined as , is the KL pseudo-distance.
However, for testing the validity of the model with the null composite hypothesis given by
and since appears in the LHS of expression (5), it must be estimated, and replacing it by, say, a multivariate density estimate makes the distribution of the LHS of expression (5) complicated even though we can replace by . This might explain the limited use of the KL pseudo-distance for the construction of statistics for model testing with the use of .
2.3. The Class of Pseudo-Distances Dh
We shall focus on pseudo-distance methods based on for parametric models, with emphasis on continuous multivariate models, but some of the previous univariate results, which are scattered, can also be unified by viewing them as pseudo-distance methods.
In general for pseudodistances we require the following property:
if , if and only if , (6)
where g and f are density functions. The property given by (6) is needed for establishing consistency of estimation and consistency of goodness-of-fit tests, see Broniatowski et al. for more notions and properties of pseudodistances.
Since in general is not observable if g is unknown, we shall see at the end of this section that we can define an observed version with the property . will satisfy the property given by expression (6) in probability, which is similar to with the property for the KL distance.
Now we work with the following pairs of observations to develop methods,
For this sample, the observations are asymptotically independent by Proposition 3 of Ranneby et al. (p. 413) and, as , the distribution of tends to a common distribution, i.e., the common distribution is the distribution of the random vector with joint density function given by
see Kuljus and Ranneby (p. 1101).
Therefore, the results are very similar to the univariate case, with the interpretation that is a multivariate density here instead of a univariate density. The results given by Luong (page 624) continue to hold and we also have:
1) follows a standard exponential distribution with density .
2) Z and X are independent.
If we use Jensen’s inequality it follows that
for , since
and for .
Now, we can define
and is a known constant which does not depend on the parameters given by the vector .
Under the parametric model, we would like to minimize , but g is unknown; this leads us to consider the observed objective function defined below, which is based on with
note that we have
The pseudodistance estimators given by the vector based on are obtained by minimizing . Equivalently, they are obtained by minimizing
which is expression (4).
3. Asymptotic Properties of Dh estimators
3.1. Consistency
It is not difficult to see that the estimators given by the vector which minimizes expression (4) are consistent, by defining and by using the assumptions and results of Section 3.1 of Luong (pages 622-624).
Limit laws such as the uniform weak law of large numbers (UWLLN) and the Central Limit Theorem (CLT) are applicable because forms a mixing sequence, which is due to being asymptotically independent with a common distribution as . Therefore, .
3.2. Asymptotic Normality
Using the CLT and the results given in Section 2 of Luong (pages 626-631), we can conclude that
, denotes convergence in law, is the commonly used information matrix with , if the function is used to define and and if the function is used to define and with the first and second derivatives of h denoted respectively by and . The random variable Z follows a standard exponential distribution as given by expression 25 in Luong  (page 631) and from the standard exponential distribution, we also have and note that as .
The estimators using might have some robustness properties by M-estimation theory, see Luong (page 632), and might be preferred over the ML estimators.
Because the proposed methods are density based but do not require an explicit density estimate, they appear to be simpler for practitioners and can be used as an alternative to other robust methods such as the Hellinger methods proposed by Tamura and Boos. Besides, the observed pseudodistances based on can also be used for the construction of goodness-of-fit statistics, and they lead to statistics which are relatively simple to implement.
4. Goodness-of-Fit Test Statistics Using
For model selection and model testing, we are primarily interested in testing the null composite hypothesis
However, it might be easier to follow the procedure for constructing test statistics by first considering the test based on , which is also implicitly based on , for the simple hypothesis, where there is no unknown parameter.
4.1. Simple Null Hypothesis
For simple ,
A natural statistic to use can be based on and, since forms a mixing sequence, the CLT can be applied, with the distribution of each tending to that of a standard exponential random variable; Slutsky’s Theorem can also be used if needed. Therefore, the following test statistic can be used and , where is the variance of and Z follows a standard exponential distribution.
For an level test, we can reject if
where is the percentile of the standard normal distribution.
Equivalently, we can reject if
Note that and is also the variance of
Now if we use with , we can use the moment generating function of , which is given by with being the gamma function, so that the cumulant generating function is and, by differentiating it, we obtain the first two cumulants, which are given by , , where and are respectively the digamma function and the trigamma function; both are available in most statistical packages.
The test statistic given by expression (7) can be expressed explicitly as
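The cumulant calculation can be checked numerically. The sketch below assumes that the quantity of interest is log Z with Z standard exponential, so that the moment generating function is Gamma(1 + t) and the cumulant generating function is log Gamma(1 + t). It differentiates the latter at 0 by central finite differences using only the standard library; the function name and the step size are illustrative choices.

```python
import math


def cumulants_log_exponential(step=1e-3):
    """First two cumulants of log(Z), Z ~ Exp(1), obtained by numerically
    differentiating the cumulant generating function K(t) = log Gamma(1 + t)
    at t = 0 with central finite differences."""
    def cgf(t):
        return math.lgamma(1.0 + t)

    first = (cgf(step) - cgf(-step)) / (2.0 * step)
    second = (cgf(step) - 2.0 * cgf(0.0) + cgf(-step)) / step ** 2
    return first, second


kappa1, kappa2 = cumulants_log_exponential()
# kappa1 approximates digamma(1) = -(Euler's constant), about -0.5772
# kappa2 approximates trigamma(1) = pi^2 / 6, about 1.6449
```

In a statistical package, the same two values would be obtained directly from the digamma and trigamma functions evaluated at 1.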
and reject the simple
if for an level test, .
Note that the test is consistent as as if , so we will reject with probability 1 should , but this property is not shared by chi-square tests. There is also the difficulty of arbitrariness in grouping observations into cells for chi-square tests, see Bickel and Breiman for more discussion.
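To make the simple-null procedure concrete, here is a hedged Python sketch of a one-sided normal test. It assumes the logarithmic choice h(z) = -log z, assumes that the inputs are already the transformed values (hypothesised density at each observation times the corresponding auxiliary observation), and uses the fact that -log Z has mean equal to Euler's constant and variance pi^2/6 when Z is standard exponential. The function names, the sign convention, and the direction of rejection are illustrative assumptions and should be checked against expressions (7) and (8).

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # E[-log Z] for Z ~ Exp(1)
VAR_LOG_EXP = math.pi ** 2 / 6    # Var[log Z] for Z ~ Exp(1)


def log_type_statistic(z_values):
    """sqrt(n) * (mean of -log z_i minus its null mean) / null std. deviation.
    Requires strictly positive z_i, i.e. no tied observations."""
    n = len(z_values)
    mean_h = sum(-math.log(z) for z in z_values) / n
    return math.sqrt(n) * (mean_h - EULER_GAMMA) / math.sqrt(VAR_LOG_EXP)


def reject_simple_null(z_values, level=0.05):
    """Reject when the standardised statistic exceeds the upper normal quantile."""
    return log_type_statistic(z_values) > NormalDist().inv_cdf(1.0 - level)
```

Working with log z instead of -log z flips the sign of the statistic, so the same rule would then reject for small values.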
Furthermore, if we use with , , and .
The corresponding test statistic given by expression (7) can be expressed explicitly as
and reject the simple if
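For a power-type choice of h, the centring and scaling constants come from the moments of a power of a standard exponential variable. The sketch below assumes the transformed quantity is Z raised to a power a, for which E[Z^a] = Gamma(1 + a) and Var[Z^a] = Gamma(1 + 2a) - Gamma(1 + a)^2 whenever a > -1/2. The function name and the symbol a are illustrative, and any sign or scaling that the paper attaches to h is not reproduced here.

```python
import math


def power_moments(a):
    """Mean and variance of Z**a for Z ~ Exp(1); finite when a > -1/2."""
    if a <= -0.5:
        raise ValueError("variance of Z**a is finite only for a > -1/2")
    mean = math.gamma(1.0 + a)
    variance = math.gamma(1.0 + 2.0 * a) - mean ** 2
    return mean, variance


# Example with a small positive power, as suggested for robustness.
mean_small, var_small = power_moments(0.1)
```

These two numbers would play the same role as the digamma and trigamma values in the logarithmic case when standardising the test statistic.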
4.2. Composite Null Hypothesis
For model testing, we consider the composite . Since is unknown, we first estimate by , which minimizes ; then we can form the following statistic, and we shall show that , which is similar to the statistic for the simple . Unlike other statistics, which lead to complicated null distributions when parameters are estimated, this statistic behaves like the one used for the simple in Section 4.1, and the equivalent rejection rules are similar to the ones given by expression (8) and expression (9), depending on the choice of used for .
In fact, these expressions remain valid for the composite provided that we replace by when they appear in these expressions.
As we have seen by using a version of CLT for mixing sequences if needed, , now if we can establish
being a term which converges to
Now, we will proceed to establish the property given by expression (10). Using the Mean Value Theorem and the following expansion around , we have
where lies on the line segment joining and .
Since and using ,
see Luong  (page 630). Also, since is positive definite, and , we then have the relation as given by expression (10).
Furthermore, if we use for , ,
The use of the ML estimators for chi-square distance type statistics often creates complications when it comes to deriving the asymptotic distributions of these statistics, see Chernoff and Lehmann (p. 580) and Luong and Thompson (pp. 249-251).
For applications, it has been recognized that the maximum value attained by the log of the likelihood function can provide information on goodness-of-fit for the model being used. The test given by expression (8) with replaced by formalizes the informal procedures that use the maximum value of the log likelihood function for assessing goodness-of-fit of the model, see Klugman et al. Note, however, that the condition of no tied observations is needed for the test based on the log of the likelihood function as given by expression (8) with replaced by ; otherwise there are some values of and the logs of these values are undefined. In contrast, the test based on with is well defined even in the presence of tied observations, and in general we should fix a value for near 0 to balance efficiency and robustness of the estimation procedures.
5. An Example
For illustration of the proposed methods, we use the d-dimensional multivariate normal model; its density function is often parameterized using the mean and the covariance matrix and it is given by
is the determinant of the matrix , see Anderson  (page 20).
There is redundancy when using the elements of the matrix as parameters because, being a covariance matrix, it is symmetric.
We can eliminate the redundancy by defining the vector of parameters as with
The vech operator, when applied to , extracts the lower triangular elements of and stacks them in a vector. Equivalently, we can use the vector of parameters instead of and and express the multivariate normal density as to avoid the redundancy of the previous parameterization. We assume that we have a random sample of size n, which allows us to obtain the auxiliary univariate observations from the NND observations, and that there are no tied observations so that
For illustration, say we use with ; the vector of estimators in this case coincides with the maximum likelihood (ML) estimators, i.e., , and for the multivariate normal model it is well known that can be obtained explicitly, see Anderson (page 112).
, , ,
is the sample mean and is the sample covariance matrix which can also be expressed as
For model testing then we can use the test statistic
and reject the model if the statistic gives a value smaller than for an level test.
As Tamura and Boos have pointed out, might not be robust; they hence proposed multivariate minimum Hellinger distance estimators, but a multivariate density estimate is needed for their procedures. For robust estimation, or in the case of tied observations, we might want to use with , with being a positive number near 0.
In this paper, we focus on presenting the methodology of , leaving for subsequent work simulation studies assessing the power of the tests, the use of distributions other than the normal distribution for the null distribution of the goodness-of-fit test statistics, and the assessment of efficiency in finite samples. Practitioners might be encouraged to use these methods.
The helpful and constructive comments of a referee, which led to an improvement of the presentation of the paper, and the support of the editorial staff of Open Journal of Statistics in processing the paper are gratefully acknowledged.
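For the multivariate normal illustration, the explicit ML estimators and the vech parameterization can be sketched in a few lines of plain Python. The code assumes the usual ML form of the covariance estimator (divisor n, not n - 1) and a column-wise stacking order for vech; the function names are illustrative and not taken from the paper.

```python
def sample_mean(sample):
    """Componentwise mean of a list of d-dimensional observations."""
    n = len(sample)
    d = len(sample[0])
    return [sum(x[k] for x in sample) / n for k in range(d)]


def ml_covariance(sample):
    """ML estimator of the covariance matrix (sums of cross-products over n)."""
    n = len(sample)
    d = len(sample[0])
    m = sample_mean(sample)
    return [
        [sum((x[j] - m[j]) * (x[k] - m[k]) for x in sample) / n for k in range(d)]
        for j in range(d)
    ]


def vech(matrix):
    """Stack the lower triangular elements of a square matrix column by column."""
    d = len(matrix)
    return [matrix[i][j] for j in range(d) for i in range(j, d)]


# Parameter vector: mean followed by the non-redundant covariance elements.
data = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
theta_hat = sample_mean(data) + vech(ml_covariance(data))
```

For a d-dimensional model this parameter vector has d + d(d + 1)/2 components, which is the count left after removing the symmetric duplicates of the covariance matrix.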