Received 27 May 2016; accepted 20 August 2016; published 23 August 2016
Time series are used to model various phenomena measured over time. Successive observations are often correlated, since they may depend on common external factors that remain unknown to the analyst. In such cases, autoregressive models are useful for modelling this dependence.
In some situations, we might be interested in the number of events which occur during a certain period of time. Such observations will necessarily be non-negative and integer-valued. Models which have been used for sequences of dependent discrete random variables include the Poisson autoregressive process of order 1 introduced by Al-Osh and Alzaid and the generalized Poisson autoregressive process of order 1, denoted GPAR(1) (see Alzaid and Al-Osh). The former, a stationary process with Poisson marginal distributions, is a special case of the GPAR(1) process.
The paper is organized as follows. In Section 2, for completeness, we review some properties of the generalized Poisson autoregressive process of order 1. In Section 3, we derive the expressions for the moments estimators, the quasi-likelihood estimators and the maximum likelihood estimators of the three parameters of the GPAR(1) process. These methods have appeared in the literature (see Al-Nachawati, Alwasel and Alzaid for the quasi-likelihood and moments methods and Brännäs for likelihood methods). However, asymptotic properties of these methods, such as efficiency, are not discussed in those papers. In Sections 4 and 5, we study properties of these estimators such as bias and asymptotic efficiency. The last section reanalyzes a real-data example which can be modelled with a GPAR(1) process, and discusses testing.
We hope that with this study, practitioners will have more information to select one estimation method versus another one and to perform tests concerning values of the parameters.
2. GPAR(1) Process
To define the GPAR(1) process, we first need to review the generalized Poisson and the quasi-binomial distributions.
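A random variable X has a generalized Poisson distribution with parameters $\lambda$ and $\theta$, denoted $GP(\lambda, \theta)$, if its probability mass function (pmf) is defined by
$$P(X = x) = \begin{cases} \dfrac{\lambda (\lambda + x\theta)^{x-1} e^{-(\lambda + x\theta)}}{x!}, & x = 0, 1, 2, \ldots, \\ 0, & x > m \text{ when } \theta < 0, \end{cases}$$
where $\lambda > 0$, $\max(-1, -\lambda/m) \le \theta < 1$, and $m$ is the greatest positive integer for which $\lambda + m\theta > 0$ when $\theta$ is negative. Note that, for $\theta = 0$, the random variable X has a Poisson($\lambda$) distribution. In this paper, we will restrict ourselves to the case where $\theta \ge 0$.
Consul has shown that the expected value and variance of X are given, when $0 \le \theta < 1$, by
$$E(X) = \frac{\lambda}{1-\theta}, \qquad Var(X) = \frac{\lambda}{(1-\theta)^3},$$
so that, for positive values of $\theta$, we have overdispersion (i.e. $Var(X) > E(X)$).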
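As a numerical illustration of the mean and variance formulas, the following minimal Python sketch evaluates the generalized Poisson pmf in log space and sums it directly. It assumes the standard parameterization of Consul with $\lambda > 0$ and $0 \le \theta < 1$; the function names, the truncation point and the parameter values are illustrative choices and are not taken from the paper.

```python
import math

def gp_logpmf(x, lam, theta):
    """Log pmf of GP(lam, theta) at an integer x >= 0 (assumes lam > 0, 0 <= theta < 1)."""
    return (math.log(lam) + (x - 1) * math.log(lam + x * theta)
            - (lam + x * theta) - math.lgamma(x + 1))

def gp_pmf(x, lam, theta):
    return math.exp(gp_logpmf(x, lam, theta))

def gp_moments(lam, theta, x_max=2000):
    """Total mass, mean and variance obtained by truncated summation of the pmf."""
    probs = [gp_pmf(x, lam, theta) for x in range(x_max + 1)]
    total = sum(probs)
    mean = sum(x * pr for x, pr in enumerate(probs))
    var = sum((x - mean) ** 2 * pr for x, pr in enumerate(probs))
    return total, mean, var

lam, theta = 2.0, 0.3          # hypothetical values, for illustration only
total, mean, var = gp_moments(lam, theta)
print(total)                               # should be close to 1
print(mean, lam / (1 - theta))             # numerical vs closed-form mean
print(var, lam / (1 - theta) ** 3)         # numerical vs closed-form variance
```

Working in log space avoids overflow in the factorial and power terms for large x.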
The sum of two independent random variables X and Y with $GP(\lambda_1, \theta)$ and $GP(\lambda_2, \theta)$ distributions also has a GP distribution, with parameters $(\lambda_1 + \lambda_2, \theta)$. Ambagaspitiya and Balakrishnan have derived the recurrence formula for the probability function of the compound generalized Poisson distribution, used in risk theory.
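A non-negative integer-valued random variable X has a quasi-binomial distribution, denoted $QB(p, \theta, n)$, if its pmf is given by
$$P(X = x) = \binom{n}{x} \frac{p\,q\,(p + x\theta)^{x-1}\,(q + (n-x)\theta)^{n-x-1}}{(1 + n\theta)^{n-1}}, \quad x = 0, 1, \ldots, n,$$
where $q = 1 - p$ and $\theta$ is such that $n\theta < \min(p, q)$. Its mean, equal to $np$, is independent of the parameter $\theta$.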
The following proposition, proved in Alzaid and Al-Osh  , shows the relation between the QB and GP distributions.
Proposition 1: If $X$ and $S(\cdot)$ are two independent random variables with $GP(\lambda, \theta)$ and $QB(p, \theta/\lambda, \cdot)$ distributions, then $S(X)$ follows a $GP(p\lambda, \theta)$ distribution.
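The relation in Proposition 1 can be checked numerically. The sketch below assumes the parameterization of Alzaid and Al-Osh, in which a $GP(\lambda, \theta)$ count thinned by a quasi-binomial operator with parameters $(p, \theta/\lambda)$ yields a $GP(p\lambda, \theta)$ variable; this parameterization, the function names and the numerical values are assumptions made for illustration.

```python
import math

def gp_pmf(x, lam, theta):
    """pmf of GP(lam, theta), assuming lam > 0 and 0 <= theta < 1."""
    return math.exp(math.log(lam) + (x - 1) * math.log(lam + x * theta)
                    - (lam + x * theta) - math.lgamma(x + 1))

def qb_pmf(x, p, phi, n):
    """pmf of QB(p, phi, n) at x = 0..n, assuming 0 < p < 1 and phi >= 0."""
    if x < 0 or x > n:
        return 0.0
    q = 1.0 - p
    log_binom = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_binom + math.log(p) + math.log(q)
                    + (x - 1) * math.log(p + x * phi)
                    + (n - x - 1) * math.log(q + (n - x) * phi)
                    - (n - 1) * math.log(1.0 + n * phi))

def thinned_pmf(y, p, lam, theta, n_max=400):
    """P(S(X) = y): mix QB(p, theta/lam, n) over n distributed as GP(lam, theta)."""
    return sum(gp_pmf(n, lam, theta) * qb_pmf(y, p, theta / lam, n)
               for n in range(y, n_max + 1))

p, lam, theta = 0.4, 2.0, 0.3   # hypothetical values
max_gap = max(abs(thinned_pmf(y, p, lam, theta) - gp_pmf(y, p * lam, theta))
              for y in range(15))
qb_mean = sum(x * qb_pmf(x, p, theta / lam, 10) for x in range(11))
print(max_gap)    # should be close to 0
print(qb_mean)    # should be close to n * p = 4
```

The second check illustrates that the quasi-binomial mean does not depend on its second parameter.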
The GPAR(1) process generalizes the Poisson autoregressive process of order 1 introduced by Al-Osh and Alzaid. That model, which corresponds to $\theta = 0$, has been used to model time series in various fields, for example in insurance for short-term workers' compensation because of work-related injuries (Freeland and McCabe) and in medicine for the incidence of infectious diseases (Cardinal, Roy and Lambert).
In practice, many integer-valued series exhibit overdispersion (i.e. the variance is greater than the mean). The Poisson autoregressive model would therefore not be appropriate for those time series. In cases where the extra variation can be explained in a deterministic way, adding regressors would be adequate (see Freeland and McCabe), but where the extra variation is of a stochastic nature, the GPAR(1) model could be used for modelling overdispersed time series.
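The GPAR(1) model, introduced by Alzaid and Al-Osh, is defined as
$$X_t = S_t(X_{t-1}) + \varepsilon_t, \quad t = 1, 2, \ldots, \qquad (1)$$
where $q = 1 - p$ and
1) $\{\varepsilon_t\}$ is a sequence of iid random variables with a $GP(q\lambda, \theta)$ distribution.
2) $\{S_t(\cdot)\}$ is a sequence of iid random variables with a $QB(p, \theta/\lambda, \cdot)$ distribution.
3) These two sequences are independent of each other.
4) $X_0$ has a $GP(\lambda, \theta)$ distribution independent of $\{\varepsilon_t\}$ and $\{S_t(\cdot)\}$.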
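The construction above can be sketched as a small simulator. The code assumes the recursion $X_t = S_t(X_{t-1}) + \varepsilon_t$ with $\varepsilon_t \sim GP(q\lambda, \theta)$, $S_t(\cdot) \sim QB(p, \theta/\lambda, \cdot)$ and $X_0 \sim GP(\lambda, \theta)$, following Alzaid and Al-Osh; this parameterization is an assumption, and the inversion samplers, function names and parameter values are illustrative rather than the implementation used in the paper.

```python
import math
import random

def gp_pmf(x, lam, theta):
    return math.exp(math.log(lam) + (x - 1) * math.log(lam + x * theta)
                    - (lam + x * theta) - math.lgamma(x + 1))

def qb_pmf(x, p, phi, n):
    q = 1.0 - p
    log_binom = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_binom + math.log(p) + math.log(q)
                    + (x - 1) * math.log(p + x * phi)
                    + (n - x - 1) * math.log(q + (n - x) * phi)
                    - (n - 1) * math.log(1.0 + n * phi))

def sample_gp(rng, lam, theta, x_cap=10_000):
    """Draw from GP(lam, theta) by inversion of the cdf (x_cap guards against rounding)."""
    u, x = rng.random(), 0
    cum = gp_pmf(0, lam, theta)
    while cum < u and x < x_cap:
        x += 1
        cum += gp_pmf(x, lam, theta)
    return x

def sample_qb(rng, p, phi, n):
    """Draw from QB(p, phi, n) by inversion of the cdf on 0..n."""
    u, x = rng.random(), 0
    cum = qb_pmf(0, p, phi, n)
    while cum < u and x < n:
        x += 1
        cum += qb_pmf(x, p, phi, n)
    return x

def simulate_gpar1(n_obs, p, lam, theta, seed=0):
    """Simulate X_1..X_n from the assumed GPAR(1) recursion, starting from X_0 ~ GP(lam, theta)."""
    rng = random.Random(seed)
    x = sample_gp(rng, lam, theta)
    series = []
    for _ in range(n_obs):
        survivors = sample_qb(rng, p, theta / lam, x)        # S_t(X_{t-1})
        innovation = sample_gp(rng, (1.0 - p) * lam, theta)  # epsilon_t
        x = survivors + innovation
        series.append(x)
    return series

series = simulate_gpar1(5000, p=0.5, lam=2.0, theta=0.3, seed=1)  # hypothetical values
print(sum(series) / len(series))   # compare with lam / (1 - theta)
```

Inversion is simple and adequate for small means; for large counts a faster sampler would be preferable.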
Proposition 2: The GPAR(1) process has a GP marginal distribution.
Proof: See Alzaid and Al-Osh. The GPAR(1) process is obtained from the and distributions, and not from the and distributions, as stated in Al-Nachawati et al.
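The autocorrelation function (acf) of the GPAR(1) process is equal to
$$\rho(k) = p^{k}, \quad k = 0, 1, 2, \ldots$$
The acf of this process is the same as that of an AR(1) process except that it is always non-negative, since $0 < p < 1$. The partial autocorrelation function (pacf) of the process is equal to
$$\phi_{kk} = \begin{cases} p, & k = 1, \\ 0, & k \ge 2. \end{cases}$$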
The sample acf and pacf will be useful to identify the model from an observed time series.
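For identification purposes, the sample acf and the pacf derived from it can be computed as in the following sketch. It uses the Durbin-Levinson recursion, a standard technique that is not described in the paper, and checks it on a geometric acf $\rho(k) = p^k$ of the AR(1) type; the function names and the value of p are illustrative.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag (divisor n in both numerator and denominator)."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / (n * c0)
            for k in range(1, max_lag + 1)]

def pacf_from_acf(rho):
    """Partial autocorrelations from rho = [rho(1), rho(2), ...] via Durbin-Levinson."""
    pacf, phi_prev = [], []
    for k in range(1, len(rho) + 1):
        if k == 1:
            phi_kk = rho[0]
            phi = [phi_kk]
        else:
            num = rho[k - 1] - sum(phi_prev[j] * rho[k - 2 - j] for j in range(k - 1))
            den = 1.0 - sum(phi_prev[j] * rho[j] for j in range(k - 1))
            phi_kk = num / den
            phi = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)]
            phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

p = 0.5                                            # hypothetical value
theoretical_acf = [p ** k for k in range(1, 5)]    # geometric acf
theoretical_pacf = pacf_from_acf(theoretical_acf)
print(theoretical_pacf)    # first entry near p, later entries near 0
```

On observed data, `pacf_from_acf(sample_acf(x, max_lag))` gives the sample pacf; a single large value at lag 1 followed by values near 0 is the pattern expected under a first-order model.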
3. Estimation of the Parameters
Estimating the parameters of a GPAR(1) process presents some challenges, since the conditional distribution of $X_t$, given $X_{t-1}$, is the convolution of a quasi-binomial and a generalized Poisson distribution.
In this section, we will review three estimation methods for the parameter vector $(p, \lambda, \theta)$ of the GPAR(1) process: the method of moments, quasi-likelihood and conditional maximum likelihood. These methods have been proposed in the literature; see, for example, Al-Nachawati et al. or Brännäs. However, less emphasis has been placed on their asymptotic properties, such as efficiency. In Section 4, we study the bias of these estimators, and in Section 5 their efficiency.
3.1. Method of Moments or Yule-Walker
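The first autocovariance of the GPAR(1) process is equal to
$$\gamma(1) = Cov(X_t, X_{t-1}) = p\,Var(X_{t-1}) = \frac{p\lambda}{(1-\theta)^3}. \qquad (2)$$
By taking the expected value of both sides of the equation given in (1), we find $E(X_t) = p\,E(X_{t-1}) + E(\varepsilon_t)$. Since the process is stationary and the innovation term $\varepsilon_t$ of (1) has mean $q\lambda/(1-\theta)$,
we obtain
$$E(X_t) = \frac{\lambda}{1-\theta}. \qquad (3)$$
We also know that
$$Var(X_t) = \frac{\lambda}{(1-\theta)^3}. \qquad (4)$$
From the observations, we estimate the mean $E(X_t)$, the variance $Var(X_t)$ and the autocovariance $\gamma(1)$ by their sample analogs $\bar{X}$, $S^2$ and $\hat{\gamma}(1)$.
Solving the system of Equations (2), (3), (4) with $E(X_t)$, $Var(X_t)$ and $\gamma(1)$ replaced by their sample values, we obtain the moments estimators of the parameter vector,
$$\hat{p} = \frac{\hat{\gamma}(1)}{S^2}, \qquad \hat{\lambda} = \bar{X}\sqrt{\frac{\bar{X}}{S^2}}, \qquad \hat{\theta} = 1 - \sqrt{\frac{\bar{X}}{S^2}}.$$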
We have corrected here misprints in the formulas for the moment estimators of the parameters and given by Al-Nachawati et al.  .
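A minimal sketch of the moment calculation follows. It assumes the generalized Poisson moments $E(X_t) = \lambda/(1-\theta)$ and $Var(X_t) = \lambda/(1-\theta)^3$ and a first autocovariance equal to $p\,Var(X_t)$, and solves these three equations for $(p, \lambda, \theta)$. The function names and the divisor $n-1$ in the sample moments are assumptions. The mean and variance in the example are those reported in Section 6 for the computer-breakdown series; the lag-1 autocorrelation of 0.5 is a hypothetical value used only to make the example complete.

```python
import math

def moments_from_summary(mean, var, autocov1):
    """Solve mean = lam/(1-theta), var = lam/(1-theta)^3, autocov1 = p*var for (p, lam, theta)."""
    ratio = math.sqrt(mean / var)     # estimates 1 - theta; needs var >= mean for theta >= 0
    p_hat = autocov1 / var
    theta_hat = 1.0 - ratio
    lam_hat = mean * ratio
    return p_hat, lam_hat, theta_hat

def moments_estimators(x):
    """Moments estimators computed from an observed series x."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    autocov1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n)) / (n - 1)
    return moments_from_summary(mean, var, autocov1)

# Mean and variance from Section 6; the autocovariance is a hypothetical placeholder.
mean, var = 4.016, 14.504
p_hat, lam_hat, theta_hat = moments_from_summary(mean, var, 0.5 * var)
print(p_hat, lam_hat, theta_hat)
```

With these summary statistics the moment estimate of $\theta$ is about 0.474, which is close to the conditional MLE of 0.471 reported in Section 6.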
3.2. Quasi-Likelihood Method
This method, proposed initially by Whittle  , replaces the true likelihood by the one which assumes that the observations come from a normal distribution with the same conditional mean and variance. Al-Nachawati et al.  obtained the quasi-likelihood estimators by maximizing
where and are given by
We have used the expression in Shenton for the variance of a quasi-binomial distribution, which differs slightly from the one given in Al-Nachawati et al. Since the GPAR(1) process is restricted to non-negative integers, and its distribution is therefore not symmetrical, one might suspect that the quasi-likelihood estimators are less efficient than the maximum likelihood estimators, which is indeed the case (see Section 5 for numerical results).
3.3. Conditional Maximum Likelihood Method
To obtain the conditional maximum likelihood estimators (MLE’s), we need the conditional distribution of $X_t$ given $X_{t-1}$, which is the convolution of a quasi-binomial distribution and a generalized Poisson distribution. Given the observations, we have to maximize the function
We will work with the log-likelihood function, equal to the logarithm of this function, which has to be maximized numerically to obtain the MLE.
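The convolution structure of the conditional distribution can be sketched as follows. The code assumes that, given $X_{t-1} = n$, the observation $X_t$ is the sum of a $QB(p, \theta/\lambda, n)$ variable and an independent $GP(q\lambda, \theta)$ innovation, as in the parameterization of Alzaid and Al-Osh; the function names, the toy series and the parameter values are hypothetical.

```python
import math

def gp_pmf(x, lam, theta):
    return math.exp(math.log(lam) + (x - 1) * math.log(lam + x * theta)
                    - (lam + x * theta) - math.lgamma(x + 1))

def qb_pmf(x, p, phi, n):
    q = 1.0 - p
    log_binom = math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
    return math.exp(log_binom + math.log(p) + math.log(q)
                    + (x - 1) * math.log(p + x * phi)
                    + (n - x - 1) * math.log(q + (n - x) * phi)
                    - (n - 1) * math.log(1.0 + n * phi))

def transition_pmf(x, x_prev, p, lam, theta):
    """P(X_t = x | X_{t-1} = x_prev) as a convolution of the QB and GP pmfs."""
    q = 1.0 - p
    return sum(qb_pmf(k, p, theta / lam, x_prev) * gp_pmf(x - k, q * lam, theta)
               for k in range(min(x, x_prev) + 1))

def conditional_loglik(params, series):
    """Conditional log-likelihood given the first observation; -inf outside the parameter space."""
    p, lam, theta = params
    if not (0.0 < p < 1.0 and lam > 0.0 and 0.0 <= theta < 1.0):
        return -math.inf
    return sum(math.log(transition_pmf(series[t], series[t - 1], p, lam, theta))
               for t in range(1, len(series)))

toy_series = [2, 3, 1, 0, 4, 2]      # hypothetical data
params = (0.5, 2.0, 0.3)             # hypothetical (p, lam, theta)
loglik = conditional_loglik(params, toy_series)
row_sum = sum(transition_pmf(x, 3, *params) for x in range(300))
print(loglik, row_sum)               # row_sum should be close to 1
```

The negative of `conditional_loglik` can be passed to a derivative-free optimizer, such as the Downhill Simplex method mentioned in Section 4, with the moments estimators as starting values.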
Under the usual regularity conditions, using likelihood theory (see Gouriéroux and Monfort or Hamilton), the vector of conditional MLE’s has an asymptotic multivariate normal distribution, i.e.
where denotes convergence in law, 0 is the vector of zeros of dimension 3, and
is Fisher’s expected information matrix, of dimension $3 \times 3$.
4. Bias of Estimators
With simulations, we will study the bias of the moments estimators and the MLE’s. Setting the values of the 3 parameters to those in Table 1, two series of 50 and 200 observations were generated from model (1) in C++. This experiment was repeated 200 times.
For each series, the moments estimators were calculated, as well as their average, and the bias. The conditional MLE's were calculated using the iterative Downhill Simplex method (see Press, Teukolsky, Vetterling and Flannery  ), which does not require the calculation of the derivatives of the function to be maximized. As initial values, we used the moments estimators. The results of the simulations appear in Figures 1-3.
From Figures 1-3, we see that the bias of the MLE’s is smaller than that of the moments estimators, and that it decreases when the size of the series increases. Figure 1 shows that the bias of the MLE of p is much smaller than that of the moments estimator, except when and where they are almost equal to 0. The bias of the two estimators is negative. In Figure 2, we see that the bias of the two estimators of $\lambda$ is close to 0 when; as increases, both are more biased. In all cases, the bias of the estimator of $\lambda$ is positive. The bias of the estimator of $\theta$ behaves like that of p (Figure 3); for the two estimation methods, it is similar for or 10.
Table 1. Values of parameters.
Figure 1. Bias of the estimators of p (Moment: ----- MLE: - - -).
Figure 2. Bias of the estimators of $\lambda$ (Moment: ----- MLE: - - -).
Figure 3. Bias of the estimators of $\theta$ (Moment: ----- MLE: - - -).
Since the moments estimators and the conditional MLE’s are almost unbiased for large n, we study their asymptotic efficiency in the next section.
5. Asymptotic Efficiency of Estimators
We will first discuss the techniques by which we can obtain the asymptotic variance-covariance matrix of the estimators under the three estimation methods. To study efficiencies, we calculate, in subsection 5.4, the ratios of the variances of the estimators and the ratio of the determinants of their variance-covariance matrices, using observations simulated from a GPAR(1) process for various values of the parameters. The results are summarized in Table 2 and Table 3 of this section.
Table 2. Efficiency of moments estimators.
5.1. Method of Moments
By using an asymptotically equivalent factor of instead of in Equation (3), moments estimators are given as solutions of the system of equations
Table 3. Efficiency of quasi-likelihood estimators.
Let us define the functions
and the vector
The expected values, and are asymptotically equal to 0. Using a Taylor series expansion around, the true parameter value, we obtain
where, with denoting convergence in probability.
Since is a solution of, Equation (5) can be rewritten as
Using Slutsky’s theorem, we find that
where, with probability 1,
Matrix A evaluated at can be estimated by
If and are unknown, they can be replaced by appropriate estimates. The variance-covariance matrix of Y is equal to
Let us consider the first element of this matrix:
since asymptotically (because as). In practice, we truncate these expressions, since, as. If we limit ourselves to a difference of, the last equality becomes
Using the law of large numbers, we can estimate this last term by
The other elements of the matrix can be estimated in the same way.
5.2. Quasi-Likelihood Method
To determine the quasi-likelihood estimator, we have to maximize
Let us define the quasi-score vector
From Hamilton  , using quasi-likelihood theory, we conclude that
where with probability 1, D and S are limits in probability matrices. They are defined as
evaluated at, the true parameter. We can obtain estimates for and, where matrix is defined as
and is the finite version of S evaluated at; the elements of are evaluated numerically using expression (7). Packages such as MATHEMATICA can handle these derivatives calculations numerically. Consequently, the variance-covariance matrix of can be estimated by.
5.3. Conditional Maximum Likelihood
Using the true log-likelihood function from Section 3.3, we define the score vector
From Hamilton  , using likelihood theory, we find that
where matrix S is defined analogously as in the previous section, but with a different loglikelihood function.
5.4. Numerical Comparisons
Table 2 and Table 3 give the estimates of the asymptotic efficiency of the moments and the quasi-likelihood estimators compared to the MLE, calculated from 20,000 observations (10 series of 2000 observations) generated from a GPAR(1) process with various parameter values.
Comparing Table 2 and Table 3, the quasi-likelihood estimator for p has a smaller variance than the moments estimator; for, it depends on the values of the parameters. The moments estimator of has a smaller variance than the quasi-likelihood estimator, except when, where is better than.
The estimated determinant of the variance-covariance matrix of the MLE, computed as the average of the determinants, is always smaller than that of the moments and quasi-likelihood estimators (last column of Table 2 and Table 3). In general, the MLE is more efficient than the moments or the quasi-likelihood estimator, and the moments estimator is more efficient than the quasi-likelihood estimator.
6. Applications: Number of Computer Breakdowns
In this section, we perform some tests on a real time series presented by Al-Nachawati et al. on the number of weekly computer breakdowns for 128 consecutive weeks. This series is overdispersed, since its mean and variance are equal to 4.016 and 14.504, respectively. In Figure 4, the acf is seen to decrease with the lag, while the pacf is high for lag 1 and low thereafter; an autoregressive model of order 1 could therefore be appropriate for this series. We use the GPAR(1) model in the analysis.
Since the MLE was shown to be the most efficient estimator asymptotically in the previous section, the parameters were estimated with this method; the estimates appear in Table 4, with the estimated variance-covariance matrix.
With the estimated variance-covariance matrices based on expressions (6), (8) and (9) of Section 5, Wald tests can be performed quite easily depending on which estimator has been chosen.
For example, to test using, the quasi-likelihood estimator, the test can be based on the statistic, where is an estimate of the variance of, which can be obtained from the corresponding diagonal element of. Since is asymptotically, we reject at level if is greater than
To test, the test statistic can be based on
which follows a distribution asymptotically. It is expected that the more efficient the estimator is, the more powerful the test will be.
Table 4. MLE’s of the parameters.
Figure 4. Acf and pacf.
With the estimated parameters, we can test the GPAR(1) model versus the simpler Poisson autoregressive model. Since the conditional MLE of $\theta$ equals 0.471, with an estimated variance of 0.0026, performing the test of $H_0: \theta = 0$ vs $H_1: \theta > 0$ gives $Z = 0.471/\sqrt{0.0026} \approx 9.24$. This leads us to reject $H_0$ and to conclude that the GPAR(1) model is more appropriate: there is overdispersion in the observations.
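The arithmetic of this Wald test can be reproduced with a few lines of Python. The sketch assumes a one-sided test of $\theta = 0$ against $\theta > 0$ at the 5% level, based on the asymptotic normality of the MLE; the estimate and its variance are the values quoted above, while the choice of level and the variable names are illustrative assumptions.

```python
import math

theta_hat = 0.471        # conditional MLE of theta reported in the text
var_theta_hat = 0.0026   # its estimated variance reported in the text

# Wald statistic for H0: theta = 0
z = (theta_hat - 0.0) / math.sqrt(var_theta_hat)

# One-sided p-value under the standard normal approximation
p_value = 0.5 * math.erfc(z / math.sqrt(2.0))

critical_value = 1.6449  # 95th percentile of the standard normal distribution
reject_h0 = z > critical_value
print(round(z, 2), p_value, reject_h0)
```

The statistic is far above the critical value, so the conclusion does not depend on the particular level chosen.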
The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada and of the Fonds pour la Contribution à la Recherche du Québec.