The Coordinate-Free Prediction in Finite Populations with Correlated Observations
Abstract: In this paper, we obtain the best linear unbiased predictor of any linear function of the elements of a finite population under coordinate-free models. The optimal predictor of these quantities was obtained in an earlier work for models with a known diagonal covariance matrix. We extend this result to any known positive definite covariance matrix. We show that, in the particular case of coordinatized models, this general predictor coincides with the optimal predictor of the population total under a regression superpopulation model with correlated observations.

1. Introduction

A coordinate-free approach in finite populations was introduced by  as an alternative to the Gauss-Markov setup, with the purpose of predicting linear functions. The Gauss-Markov approach is characterized by a dependence on a particular basis matrix, but in the coordinate-free language we need only describe a parametric subspace of ${\mathbb{R}}^{N}$ , where $N$ is the size of the finite population. Coordinate-free models in the linear models context are discussed by  and  .

In a finite population $P=\left\{1,2,\cdots ,N\right\}$ , where $N$ is the known population size, let ${y}_{i},\text{}i=1,2,\cdots ,N$ be the value of a random variable $y$ associated to each population unit. Under the superpopulation approach, we will assume that $Y$ is a random vector such that $Y\in Q$ , where $Q$ is an $N$ -dimensional real vector space with the usual inner product.

The superpopulation model is expressed by

$\begin{array}{l}E\left(Y\right)=\mu \in \Omega \\ \text{Var}\text{ }\left(Y\right)={\sigma }^{2}V,\end{array}$ (1.1)

where $\Omega$ is a $p$ -dimensional subspace of $Q$ , ${\sigma }^{2}$ is an unknown positive parameter and $V$ is a known positive definite matrix.

The considered model is coordinate free, in the sense that no basis is defined for $\Omega$ , the parametric space of $\mu$ .

Our main objective is to predict ${\mathcal{l}}^{\prime }Y$ , a linear combination of the elements of $Y$ . For this purpose, a sample of $n$ observations is drawn from the population and the values of ${y}_{i}$ in $Y$ become known for the sample elements. Let $s$ and $r$ be the sets of sample and nonsample elements, respectively, such that $P=s\cup r$ .

We will assume, without loss of generality, that $Y$ and $V$ are reordered as

$Y=\left[\begin{array}{c}{Y}_{s}\\ {Y}_{r}\end{array}\right]\text{\hspace{0.17em}}\text{ }\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{ }\text{\hspace{0.17em}}V=\left[\begin{array}{cc}{V}_{s}& {V}_{sr}\\ {V}_{rs}& {V}_{r}\end{array}\right],$

with ${Y}_{s}$ containing the $n$ observed sample elements and ${Y}_{r}$ containing the unobserved elements. Up to the factor ${\sigma }^{2}$ , ${V}_{s}=\text{Var}\text{ }\left({Y}_{s}\right)$ , ${V}_{r}=\text{Var}\text{ }\left({Y}_{r}\right)$ and ${V}_{sr}=\text{Cov}\text{ }\left({Y}_{s},{Y}_{r}\right)$ are the corresponding covariance matrices, and ${V}_{rs}={{V}^{\prime }}_{sr}$ .

Under a less general model, with $\text{Var}\text{ }\left(Y\right)={\sigma }^{2}D$ , $D$ a known diagonal matrix,  presented the optimal linear predictor of ${\mathcal{l}}^{\prime }Y$ . In the next section, we extend this result, obtaining the best linear unbiased predictor of ${\mathcal{l}}^{\prime }Y$ under model (1.1); this is the main contribution of the paper. In Section 3, we show that under the coordinatized model this predictor coincides with that given by Royall (1976). Finally, we conclude the paper with some examples in Section 4.

2. Best Linear Unbiased Predictor of Linear Functions

The linear function $\theta ={\mathcal{l}}^{\prime }Y$ to be predicted may be written as

$\theta ={\mathcal{l}}^{\prime }Y={\mathcal{l}}^{\prime }{I}_{s}Y+{\mathcal{l}}^{\prime }\left(I-{I}_{s}\right)Y,$

where ${I}_{s}=\text{diag}\text{ }\left({i}_{1},{i}_{2},\cdots ,{i}_{N}\right)$ is a diagonal matrix with its $k$ -th diagonal element ${i}_{k}$ , where ${i}_{k}=1$ if $k\in s$ and ${i}_{k}=0$ if $k\in r$ , $s=\left\{1,2,\cdots ,n\right\}$ , $r=\left\{n+1,n+2,\cdots ,N\right\}$ .

We note that with this notation, ${\mathcal{l}}^{\prime }{I}_{s}Y$ corresponds to the linear combination of the components of $Y$ in the sample and ${\mathcal{l}}^{\prime }\left(I-{I}_{s}\right)Y$ is the combination of the unobserved elements.

Before stating the prediction results, it is necessary to introduce some definitions and preliminary results.

Let

${\Omega }_{s}=\left\{{\mu }_{s}|{\mu }_{s}={I}_{s}\mu ,\mu \in \Omega \right\}$

${\Omega }_{r}=\left\{{\mu }_{r}|{\mu }_{r}=\left(I-{I}_{s}\right)\mu ,\mu \in \Omega \right\}\text{ },\text{\hspace{0.17em}}$ and

${Y}_{s}={I}_{s}Y,\text{ }{Y}_{r}=\left(I-{I}_{s}\right)Y,\text{ }N×1\text{ }\text{\hspace{0.17em}}\text{matrices}\text{ }.$

Since ${I}_{s}Y$ will be known after the sample is observed, we restrict our attention to linear predictors of ${\mathcal{l}}^{\prime }Y$ of the form

$\stackrel{^}{\theta }={\mathcal{l}}^{\prime }{I}_{s}Y+{b}^{\prime }{I}_{s}Y,$

where $b$ is an $N$ -dimensional vector.

Definition. A linear predictor $\stackrel{^}{\theta }$ of $\theta$ is unbiased if and only if

${E}_{\mu }\left(\stackrel{^}{\theta }-\theta \right)=0,$

for every $\mu \in \Omega$ .

The class of all linear unbiased predictors of ${\mathcal{l}}^{\prime }Y$ will be denoted by ${U}_{\mathcal{l}}$ .

Finally, the next definition states the concept of optimality of a linear predictor of $\theta$ .

Definition. The linear predictor ${\stackrel{^}{\theta }}_{0}$ is the best linear unbiased predictor of $\theta$ or the optimal linear predictor of $\theta$ if ${\stackrel{^}{\theta }}_{0}\in {U}_{\mathcal{l}}$ and

${E}_{\mu }{\left({\stackrel{^}{\theta }}_{0}-\theta \right)}^{2}\le {E}_{\mu }{\left(\stackrel{^}{\theta }-\theta \right)}^{2},$

for every $\mu \in \Omega$ and every $\stackrel{^}{\theta }\in {U}_{\mathcal{l}}$ .

The value of ${E}_{\mu }{\left({\stackrel{^}{\theta }}_{0}-\theta \right)}^{2}$ corresponds to the mean-squared error of the predictor ${\stackrel{^}{\theta }}_{0}$ .

The optimal linear predictor of $\theta$ under the model

$\begin{array}{l}E\left(Y\right)=\mu \in \Omega \\ \text{Var}\text{ }\left(Y\right)={\sigma }^{2}D,\end{array}$

where $D$ is a known diagonal matrix and ${\sigma }^{2}$ is unknown was obtained by  . It was shown that if $\mathrm{dim}\text{ }\left(\Omega \right)=\mathrm{dim}\text{ }\left({\Omega }_{s}\right)$ , where $\mathrm{dim}\text{ }\left(\Omega \right)$ is the dimension of the linear space $\Omega$ , then the best linear unbiased predictor of $\theta ={\mathcal{l}}^{\prime }Y$ is given by

${\stackrel{^}{\theta }}_{*}={\mathcal{l}}^{\prime }{I}_{s}Y+{\mathcal{l}}^{\prime }{\stackrel{^}{\mu }}_{*}$

where ${\stackrel{^}{\mu }}_{*}=\left[\begin{array}{c}0\\ {\stackrel{^}{\mu }}_{r}\end{array}\right]$ , 0 is a null vector of dimension $n$ , ${\stackrel{^}{\mu }}_{*}$ is such that

${\stackrel{^}{\mu }}_{*}=\left(I-{I}_{s}\right){P}_{\Omega }\left({Y}_{s}+{\stackrel{^}{\mu }}_{*}\right)$ (1.2)

and ${P}_{\Omega }$ is the orthogonal projector onto $\Omega$ .

Returning to the model (1.1), with a nondiagonal covariance matrix $V$ , let us consider the decomposition $V=P{P}^{\prime }$ , with $P$ a lower triangular matrix. As shown by  (Theorem 7.2.1), there is a unique lower triangular matrix $P$ such that $V=P{P}^{\prime }$ (the Cholesky factor, taken with positive diagonal elements). In addition, $P$ is nonsingular. We then define the random vector $Z={P}^{-1}Y$ and, since $\text{Var}\text{ }\left(AY\right)=A\text{Var}\text{ }\left(Y\right){A}^{\prime }$ for any fixed matrix $A$ ,

$\text{Var}\text{ }\left(Z\right)={P}^{-1}V{P}^{-{1}^{\prime }}{\sigma }^{2}={P}^{-1}P{P}^{\prime }{P}^{-{1}^{\prime }}{\sigma }^{2}={P}^{\prime }{{P}^{\prime }}^{-1}{\sigma }^{2}={\sigma }^{2}I.$
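As a numerical illustration of this whitening step, the following Python sketch computes the lower triangular factor $P$ of a small covariance structure and checks that ${P}^{-1}V{P}^{-{1}^{\prime }}$ is the identity matrix. The matrix `V` and the helper names `cholesky_lower`, `lower_inverse`, `matmul` and `transpose` are assumptions introduced only for this example; they are not part of the original derivation.

```python
import math

def cholesky_lower(V):
    """Return the lower triangular P with V = P P' (V symmetric positive definite)."""
    n = len(V)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(P[i][k] * P[j][k] for k in range(j))
            if i == j:
                P[i][j] = math.sqrt(V[i][i] - s)
            else:
                P[i][j] = (V[i][j] - s) / P[j][j]
    return P

def lower_inverse(P):
    """Invert a nonsingular lower triangular matrix by forward substitution."""
    n = len(P)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(col, n):
            rhs = 1.0 if i == col else 0.0
            s = sum(P[i][k] * inv[k][col] for k in range(col, i))
            inv[i][col] = (rhs - s) / P[i][i]
    return inv

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Hypothetical 3 x 3 positive definite matrix, chosen only for illustration.
V = [[4.0, 2.0, 1.0],
     [2.0, 3.0, 0.5],
     [1.0, 0.5, 2.0]]

P = cholesky_lower(V)
P_inv = lower_inverse(P)
whitened = matmul(matmul(P_inv, V), transpose(P_inv))  # P^{-1} V P^{-1}'

# Largest absolute deviation of the whitened matrix from the identity.
n = len(V)
max_dev = max(abs(whitened[i][j] - (1.0 if i == j else 0.0))
              for i in range(n) for j in range(n))
print(max_dev)
```

The printed deviation should be at the level of floating-point rounding error, which is the numerical counterpart of $\text{Var}\text{ }\left(Z\right)={\sigma }^{2}I$ .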

The next theorem presents the best linear unbiased predictor of ${\mathcal{l}}^{\prime }Y$ under model (1.1).

Theorem 1. In the model (1.1),

$\begin{array}{l}E\left(Y\right)=\mu \in \Omega \\ \text{Var}\text{ }\left(Y\right)={\sigma }^{2}V,\end{array}$

with $V$ a known positive definite matrix, the optimal linear predictor of any linear function ${h}^{\prime }Y$ of $Y$ is

${h}^{\prime }{I}_{s}Y+{h}^{\prime }{\stackrel{^}{\mu }}_{Y}$ (2.1)

where ${\stackrel{^}{\mu }}_{Y}=\left[\begin{array}{c}0\\ {\stackrel{^}{Y}}_{r}\end{array}\right]$ , 0 is the null vector of dimension $n$ and ${\stackrel{^}{Y}}_{r}$ is the solution in ${Y}_{r}$ of the system of linear equations

$\left(I-{I}_{s}\right){P}^{-1}Y=\left(I-{I}_{s}\right){P}^{-1}{P}_{\Omega }Y,$

and ${P}_{\Omega }$ is the orthogonal projection matrix onto $\Omega$ .

Proof. Let $Z={P}^{-1}Y=\left[\begin{array}{c}{Z}_{s}\\ {Z}_{r}\end{array}\right]$ with $P$ the lower triangular matrix such that $V=P{P}^{\prime }$ , and let

${\Omega }^{*}=\left\{{\mu }^{*}|{\mu }^{*}={P}^{-1}\mu ,\mu \in \Omega \right\},$

$\Gamma ={h}^{\prime }P{I}_{s}Z+{h}^{\prime }P{\stackrel{^}{\mu }}_{Z},$

where ${\stackrel{^}{\mu }}_{Z}=\left[\begin{array}{c}0\\ {\stackrel{^}{Z}}_{r}\end{array}\right]$ , 0 is the null vector of dimension $n$ and ${\stackrel{^}{Z}}_{r}$ is the solution in ${Z}_{r}$ of the system of linear equations

${\stackrel{^}{\mu }}_{Z}=\left(I-{I}_{s}\right){P}_{{\Omega }^{*}}\left({I}_{s}Z+{\stackrel{^}{\mu }}_{Z}\right).$

We note that $\Gamma$ does not depend on unknown quantities because, as shown in the Appendix, ${h}^{\prime }P{I}_{s}Z$ and ${h}^{\prime }P{\stackrel{^}{\mu }}_{Z}$ do not depend on unknown quantities.

Since

$E\left(Z\right)={P}^{-1}E\left(Y\right)={P}^{-1}\mu ={\mu }^{*}\in {\Omega }^{*}$

and

$\text{Var}\text{ }\left(Z\right)={\sigma }^{2}I,$

by  results, the optimal linear predictor of ${\mathcal{l}}^{\prime }Z$ is

${\mathcal{l}}^{\prime }{I}_{s}Z+{\mathcal{l}}^{\prime }{\stackrel{^}{\mu }}_{Z}$

with ${\stackrel{^}{\mu }}_{Z}=\left[\begin{array}{c}0\\ {\stackrel{^}{Z}}_{r}\end{array}\right]$ , where 0 is the null vector of dimension $n$ and ${\stackrel{^}{Z}}_{r}$ , obtained by (1.2), is the solution of the system of linear equations

${\stackrel{^}{\mu }}_{Z}=\left(I-{I}_{s}\right){P}_{{\Omega }^{*}}\left({I}_{s}Z+{\stackrel{^}{\mu }}_{Z}\right).$

Taking ${\mathcal{l}}^{\prime }={h}^{\prime }P$ , this predictor reduces to $\Gamma$ and ${\mathcal{l}}^{\prime }Z={h}^{\prime }P{P}^{-1}Y={h}^{\prime }Y$ . So, by (1.2), we have just proved that $\Gamma$ is the optimal linear predictor of ${h}^{\prime }Y$ .

To finish the proof, it is enough to show that $\Gamma ={h}^{\prime }{I}_{s}Y+{h}^{\prime }{\stackrel{^}{\mu }}_{Y}$ . For this purpose, we write some of the matrices already defined in partitioned form as

$\begin{array}{l}P=\left[\begin{array}{cc}{P}_{1}& 0\\ {P}_{3}& {P}_{4}\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}{P}^{-1}=\left[\begin{array}{cc}C& 0\\ {B}_{1}& {B}_{2}\end{array}\right],\\ {P}_{{\Omega }^{*}}=\left[\begin{array}{cc}{H}_{1}& {H}_{2}\\ {A}_{1}& {A}_{2}\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}{I}_{s}=\left[\begin{array}{cc}{I}_{n}& 0\\ 0& 0\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}I-{I}_{s}=\left[\begin{array}{cc}0& 0\\ 0& {I}_{N-n}\end{array}\right],\end{array}$

where the submatrices have dimensions $n\times n$ , $n\times \left(N-n\right)$ , $\left(N-n\right)\times n$ and $\left(N-n\right)\times \left(N-n\right)$ , respectively, and 0 denotes a null matrix.

Since $Z=\left[\begin{array}{c}{Z}_{s}\\ {Z}_{r}\end{array}\right]$ ,

${\stackrel{^}{\mu }}_{Z}=\left(I-{I}_{s}\right){P}_{{\Omega }^{*}}\left({I}_{s}Z+{\stackrel{^}{\mu }}_{Z}\right)$

implies that

${\stackrel{^}{Z}}_{r}={A}_{1}{Z}_{s}+{A}_{2}{\stackrel{^}{Z}}_{r}\text{\hspace{0.17em}}\text{ }\text{\hspace{0.17em}}\text{and}\text{\hspace{0.17em}}\text{ }\text{\hspace{0.17em}}{\stackrel{^}{Z}}_{r}={\left(I-{A}_{2}\right)}^{-1}{A}_{1}{Z}_{s}.$

Further, ${P}_{\Omega }=P{P}_{{\Omega }^{*}}{P}^{-1}$  , so that ${P}_{{\Omega }^{*}}={P}^{-1}{P}_{\Omega }P$ and, after some calculations, we have

$\left(I-{I}_{s}\right){P}^{-1}Y=\left[\begin{array}{c}0\\ {B}_{1}{Y}_{s}+{B}_{2}{Y}_{r}\end{array}\right]$

and

$\begin{array}{l}\left(I-{I}_{s}\right){P}^{-1}{P}_{\Omega }Y=\left(I-{I}_{s}\right){P}^{-1}{P}_{\Omega }P{P}^{-1}Y\\ =\left(I-{I}_{s}\right){P}_{{\Omega }^{*}}{P}^{-1}Y=\left[\begin{array}{c}0\\ \left({A}_{1}C+{A}_{2}{B}_{1}\right){Y}_{s}+{A}_{2}{B}_{2}{Y}_{r}\end{array}\right].\end{array}$

Thus, if ${\stackrel{^}{Y}}_{r}$ is the solution in ${Y}_{r}$ of

$\left(I-{I}_{s}\right){P}^{-1}Y=\left(I-{I}_{s}\right){P}^{-1}{P}_{\Omega }Y,$

it follows that

${\stackrel{^}{Y}}_{r}={\left({B}_{2}-{A}_{2}{B}_{2}\right)}^{-1}\left({A}_{1}C+{A}_{2}{B}_{1}-{B}_{1}\right){Y}_{s}.$

Now, with this notation,

$Z={P}^{-1}Y=\left[\begin{array}{c}C{Y}_{s}\\ {B}_{1}{Y}_{s}+{B}_{2}{Y}_{r}\end{array}\right]$

which implies that

${Z}_{s}=C{Y}_{s}.$

So,

$\begin{array}{c}{B}_{1}{Y}_{s}+{B}_{2}{\stackrel{^}{Y}}_{r}={B}_{1}{Y}_{s}+{B}_{2}{\left({B}_{2}-{A}_{2}{B}_{2}\right)}^{-1}\left({A}_{1}C+{A}_{2}{B}_{1}-{B}_{1}\right){Y}_{s}\\ =\left\{{B}_{1}+{B}_{2}{B}_{2}^{-1}{\left(I-{A}_{2}\right)}^{-1}\left({A}_{1}C+{A}_{2}{B}_{1}-{B}_{1}\right)\right\}{Y}_{s}\\ =\left\{{B}_{1}+{\left(I-{A}_{2}\right)}^{-1}{A}_{1}C+{\left(I-{A}_{2}\right)}^{-1}\left({A}_{2}-I\right){B}_{1}\right\}{Y}_{s}\\ ={\left(I-{A}_{2}\right)}^{-1}{A}_{1}C{Y}_{s}={\stackrel{^}{Z}}_{r}.\end{array}$

Hence,

$\begin{array}{c}\Gamma ={h}^{\prime }P{I}_{s}Z+{h}^{\prime }P{\stackrel{^}{\mu }}_{Z}={h}^{\prime }P\left[\begin{array}{c}{Z}_{s}\\ 0\end{array}\right]+{h}^{\prime }P\left[\begin{array}{c}0\\ {\stackrel{^}{Z}}_{r}\end{array}\right]\\ ={h}^{\prime }\left[\begin{array}{cc}{P}_{1}& 0\\ {P}_{3}& {P}_{4}\end{array}\right]\left[\begin{array}{c}C{Y}_{s}\\ {B}_{1}{Y}_{s}+{B}_{2}{\stackrel{^}{Y}}_{r}\end{array}\right]\\ ={h}^{\prime }\left[\begin{array}{c}{P}_{1}C{Y}_{s}\\ {P}_{3}C{Y}_{s}+{P}_{4}{B}_{1}{Y}_{s}+{P}_{4}{B}_{2}{\stackrel{^}{Y}}_{r}\end{array}\right]\end{array}$

and because

$P\cdot {P}^{-1}=\left[\begin{array}{cc}{P}_{1}C& 0\\ {P}_{3}C+{P}_{4}{B}_{1}& {P}_{4}{B}_{2}\end{array}\right]=\left[\begin{array}{cc}{I}_{n}& 0\\ 0& {I}_{N-n}\end{array}\right],$

then

$\Gamma ={h}^{\prime }\left[\begin{array}{c}{Y}_{s}\\ {\stackrel{^}{Y}}_{r}\end{array}\right]={h}^{\prime }\left[\begin{array}{c}{Y}_{s}\\ 0\end{array}\right]+{h}^{\prime }\left[\begin{array}{c}0\\ {\stackrel{^}{Y}}_{r}\end{array}\right]={h}^{\prime }{I}_{s}Y+{h}^{\prime }{\stackrel{^}{\mu }}_{Y}.$

It is important to observe that ${P}_{\Omega }$ has $\frac{N\left(N+1\right)}{2}$ unknown elements and may be difficult to calculate from the above definition. However, it can be obtained as ${P}_{\Omega }=A{\left({A}^{\prime }{V}^{-1}A\right)}^{-1}{A}^{\prime }{V}^{-1}$ , where $A$ is a basis matrix for $\Omega$ ; this is the projector onto $\Omega$ that is orthogonal with respect to the inner product induced by ${V}^{-1}$ .
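The formula for ${P}_{\Omega }$ can be checked numerically. The following Python sketch, written with exact rational arithmetic, builds ${P}_{\Omega }$ from a basis matrix and verifies three properties one expects of such a projector: it is idempotent, it leaves the columns of $A$ unchanged, and ${V}^{-1}{P}_{\Omega }$ is symmetric. The particular `V` and `A` below, as well as the helper names `matmul`, `transpose` and `inverse`, are hypothetical choices made only for this illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical inputs (N = 4, p = 2), chosen only for illustration.
V = [[F(v) for v in row] for row in
     [[2, 1, 0, 0], [1, 2, 1, 0], [0, 1, 2, 1], [0, 0, 1, 2]]]
A = [[F(1), F(x)] for x in (1, 2, 3, 4)]

V_inv = inverse(V)
AtVi = matmul(transpose(A), V_inv)                           # A' V^{-1}
P_Omega = matmul(matmul(A, inverse(matmul(AtVi, A))), AtVi)  # A (A'V^{-1}A)^{-1} A'V^{-1}

is_idempotent = matmul(P_Omega, P_Omega) == P_Omega          # P_Omega^2 = P_Omega
fixes_basis = matmul(P_Omega, A) == A                        # P_Omega A = A
W = matmul(V_inv, P_Omega)
is_self_adjoint = W == transpose(W)                          # V^{-1} P_Omega is symmetric
print(is_idempotent, fixes_basis, is_self_adjoint)
```

Because the arithmetic is exact, the three comparisons are equalities rather than tolerance checks; for larger $N$ a floating-point linear algebra library would be the natural choice.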

Some applications of the result in Theorem 1 will be presented in the examples.

3. Best Linear Unbiased Predictor in the Coordinatized Model

We now consider a coordinatized version of the model (1.1), given by

$\begin{array}{l}E\left(Y\right)=X\beta ,\text{\hspace{0.17em}}\text{\hspace{0.17em}}\beta \in {\mathbb{R}}^{p}\\ \text{Var}\text{ }\left(Y\right)={\sigma }^{2}V,\end{array}$ (3.1)

where ${\sigma }^{2}>0$ , $V$ is a known positive definite matrix and $X$ is a basis matrix of $\Omega$ .

Under this formulation, $X$ is an $N\times p$ matrix of full rank $p$ and there exists a unique $\beta \in {\mathbb{R}}^{p}$ such that $\mu =X\beta$ . Regression models are included in the class of models defined in (3.1).

Royall (1976) derived the best linear unbiased predictor of the population total $T=\underset{i=1}{\overset{N}{\sum }}{y}_{i}$ .

This predictor, adapted to the notation introduced here and extended to predict any linear combination ${h}^{\prime }Y$ , is given by

$\stackrel{^}{T}={h}^{\prime }{Y}_{s}+{h}^{\prime }\left(I-{I}_{s}\right)\left[\begin{array}{c}0\\ {X}_{r}\stackrel{^}{\beta }+{V}_{rs}{V}_{s}^{-1}\left({Y}_{s}-{X}_{s}\stackrel{^}{\beta }\right)\end{array}\right],$ (3.2)

where $\stackrel{^}{\beta }={\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}{{X}^{\prime }}_{s}{V}_{s}^{-1}{Y}_{s}$ and $X=\left[\begin{array}{c}{X}_{s}\\ {X}_{r}\end{array}\right]$ .

The next theorem shows that, in the coordinatized model (3.1), the optimal linear predictor obtained in Theorem 1 reduces to Royall's predictor defined in (3.2).

Theorem 2. Under model (3.1), the optimal linear predictor ${h}^{\prime }{I}_{s}Y+{h}^{\prime }{\stackrel{^}{\mu }}_{Y}$ given in (2.1) is equal to $\stackrel{^}{T}$ .

Proof. We must show that ${\stackrel{^}{Y}}_{r}$ in (2.1) is equal to ${X}_{r}\stackrel{^}{\beta }+{V}_{rs}{V}_{s}^{-1}\left({Y}_{s}-{X}_{s}\stackrel{^}{\beta }\right)$ .

As proved in Theorem 1

${\stackrel{^}{Y}}_{r}={\left({B}_{2}-{A}_{2}{B}_{2}\right)}^{-1}\left({A}_{1}C+{A}_{2}{B}_{1}-{B}_{1}\right){Y}_{s}$

which is equivalent to

${\stackrel{^}{Y}}_{r}={B}_{2}^{-1}\left[{\left(I-{A}_{2}\right)}^{-1}{A}_{1}C-{B}_{1}\right]{Y}_{s}.$

Applying (A.3), (A.1) and (A.2) of the appendix, it follows that

$\begin{array}{c}{\stackrel{^}{Y}}_{r}={B}_{2}^{-1}\left[{\left(I-{A}_{2}\right)}^{-1}\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{{X}^{\prime }}_{s}{C}^{\prime }C-{B}_{1}\right]{Y}_{s}\\ ={B}_{2}^{-1}\left[{\left(I-{A}_{2}\right)}^{-1}\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{{X}^{\prime }}_{s}{V}_{s}^{-1}-{B}_{1}\right]{Y}_{s}\\ ={V}_{rs}{V}_{s}^{-1}{Y}_{s}+{B}_{2}^{-1}{\left(I-{A}_{2}\right)}^{-1}\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{{X}^{\prime }}_{s}{V}_{s}^{-1}{Y}_{s}.\end{array}$

Now, it is enough to show that

${B}_{2}^{-1}{\left(I-{A}_{2}\right)}^{-1}\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}=\left[{X}_{r}-{V}_{rs}{V}_{s}^{-1}{X}_{s}\right]{\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}.$

By (A.6),

$\begin{array}{l}{B}_{2}^{-1}{\left(I-{A}_{2}\right)}^{-1}\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\\ ={B}_{2}^{-1}\left[I+\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\right]\\ \text{}×\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\end{array}$

and, employing (A.2), the last expression reduces to

$\begin{array}{l}\left[{B}_{2}^{-1}-{V}_{rs}{V}_{s}^{-1}{X}_{s}{\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\\ \text{}+{X}_{r}{\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\right]\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\\ =\left\{-{V}_{rs}{V}_{s}^{-1}{X}_{s}+{X}_{r}+\left[{X}_{r}-{V}_{rs}{V}_{s}^{-1}{X}_{s}\right]{\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\\ \text{}×\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right)\right\}{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}.\end{array}$

Finally, using (A.5), we get

$\begin{array}{l}{B}_{2}^{-1}{\left(I-{A}_{2}\right)}^{-1}\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\\ =\left({X}_{r}-{V}_{rs}{V}_{s}^{-1}{X}_{s}\right)\left[I+{\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}\left({X}^{\prime }{V}^{-1}X-{{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)\right]{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\\ =\left({X}_{r}-{V}_{rs}{V}_{s}^{-1}{X}_{s}\right){\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}.\end{array}$
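The equality established in Theorem 2 can also be checked on a small numerical case. The Python sketch below, in exact rational arithmetic, solves the system of Theorem 1 for ${\stackrel{^}{Y}}_{r}$ and compares the result with the expression ${X}_{r}\stackrel{^}{\beta }+{V}_{rs}{V}_{s}^{-1}\left({Y}_{s}-{X}_{s}\stackrel{^}{\beta }\right)$ . The values of `P`, `X` and `Y_s`, the sizes $N=5$ and $n=3$ , and all helper names are assumptions made only for this illustration; $V$ is built as $P{P}^{\prime }$ from a rational lower triangular `P` so that every quantity stays rational.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def combine(A, B, sign):
    """Return A + sign * B elementwise."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def frac(M):
    return [[F(v) for v in row] for row in M]

def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical setup: the first n units form the sample.
n, N = 3, 5
P = frac([[2, 0, 0, 0, 0],
          [1, 1, 0, 0, 0],
          [0, 1, 2, 0, 0],
          [1, 0, 1, 1, 0],
          [0, 1, 0, 1, 3]])
V = matmul(P, transpose(P))
X = frac([[1, 1], [1, 2], [1, 4], [1, 3], [1, 5]])
Y_s = frac([[3], [5], [8]])

# Theorem 1: solve (I - I_s) P^{-1} (I - P_Omega) Y = 0 for Y_r.
V_inv, P_inv = inverse(V), inverse(P)
XtVi = matmul(transpose(X), V_inv)
P_Omega = matmul(matmul(X, inverse(matmul(XtVi, X))), XtVi)
I_N = frac([[int(i == j) for j in range(N)] for i in range(N)])
G = matmul(P_inv, combine(I_N, P_Omega, -1))[n:]   # rows of the nonsample units
G_s = [row[:n] for row in G]
G_r = [row[n:] for row in G]
minus_Gs_Ys = [[-v for v in row] for row in matmul(G_s, Y_s)]
Y_r_theorem1 = matmul(inverse(G_r), minus_Gs_Ys)

# Expression (3.2): X_r beta_hat + V_rs V_s^{-1} (Y_s - X_s beta_hat).
X_s, X_r = X[:n], X[n:]
V_s_inv = inverse([row[:n] for row in V[:n]])
V_rs = [row[:n] for row in V[n:]]
XsVi = matmul(transpose(X_s), V_s_inv)
beta_hat = matmul(inverse(matmul(XsVi, X_s)), matmul(XsVi, Y_s))
resid = combine(Y_s, matmul(X_s, beta_hat), -1)
Y_r_royall = combine(matmul(X_r, beta_hat),
                     matmul(matmul(V_rs, V_s_inv), resid), +1)

print(Y_r_theorem1 == Y_r_royall)
```

A single agreeing case is of course not a proof; the sketch is only meant to make the content of the theorem concrete.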

4. Examples

In this section, we present two examples to illustrate the optimal predictors that are obtained in the theorems.

In the first one, we consider a coordinate-free model and the predictor is derived by applying Theorem 1. The second example shows an application of Theorem 2 in a particular coordinatized model.

Example 1. Our objective is to predict the population total $T=\underset{i=1}{\overset{N}{\sum }}{y}_{i}$ in the model

$E\left(Y\right)=\mu$ and

$\text{Var}\text{ }\left(Y\right)=\frac{{\sigma }^{2}}{1-{\rho }^{2}}\left[\begin{array}{ccccc}1& \rho & {\rho }^{2}& \cdots & {\rho }^{N-1}\\ & 1& \rho & \cdots & {\rho }^{N-2}\\ & & 1& \cdots & {\rho }^{N-3}\\ & & & & ⋮\\ & & & & 1\end{array}\right],$

with $\rho$ a known parameter and ${\sigma }^{2}>0$ .

To keep the calculations manageable, we restrict attention to the case $N=4$ , $n=3$ , so that

$\begin{array}{l}Y=\left[\begin{array}{c}{y}_{1}\\ {y}_{2}\\ {y}_{3}\\ {y}_{4}\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{Y}_{s}=\left[\begin{array}{c}{y}_{1}\\ {y}_{2}\\ {y}_{3}\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}\text{\hspace{0.17em}}{Y}_{r}=\left[{y}_{4}\right]\text{\hspace{0.17em}}\text{and}\\ V=\frac{1}{1-{\rho }^{2}}\left[\begin{array}{cccc}1& \rho & {\rho }^{2}& {\rho }^{3}\\ \rho & 1& \rho & {\rho }^{2}\\ {\rho }^{2}& \rho & 1& \rho \\ {\rho }^{3}& {\rho }^{2}& \rho & 1\end{array}\right],\text{\hspace{0.17em}}\text{\hspace{0.17em}}|\rho |<1,\text{\hspace{0.17em}}\rho \text{ }\text{\hspace{0.17em}}\text{known}\text{ }.\end{array}$

In this case,

${V}^{-1}=\left[\begin{array}{cccc}1& -\rho & 0& 0\\ -\rho & 1+{\rho }^{2}& -\rho & 0\\ 0& -\rho & 1+{\rho }^{2}& -\rho \\ 0& 0& -\rho & 1\end{array}\right]$ and ${P}^{-1}=\left[\begin{array}{cccc}\sqrt{1-{\rho }^{2}}& 0& 0& 0\\ -\rho & 1& 0& 0\\ 0& -\rho & 1& 0\\ 0& 0& -\rho & 1\end{array}\right].$

Since

$\Omega =\left\{v\in {\mathbb{R}}^{4}|v={\left(\mu ,\mu ,\mu ,\mu \right)}^{\prime },\text{\hspace{0.17em}}\mu \in \mathbb{R}\right\},$

a basis matrix for $\Omega$ is given by $A={\left[\begin{array}{cccc}1& 1& 1& 1\end{array}\right]}^{\prime }$ .

Then, it is easy to see that

$\begin{array}{c}{P}_{\Omega }=A{\left({A}^{\prime }{V}^{-1}A\right)}^{-1}{A}^{\prime }{V}^{-1}\\ =\frac{1}{4-6\rho +2{\rho }^{2}}\left[\begin{array}{cccc}1-\rho & 1+{\rho }^{2}-2\rho & 1+{\rho }^{2}-2\rho & 1-\rho \\ 1-\rho & 1+{\rho }^{2}-2\rho & 1+{\rho }^{2}-2\rho & 1-\rho \\ 1-\rho & 1+{\rho }^{2}-2\rho & 1+{\rho }^{2}-2\rho & 1-\rho \\ 1-\rho & 1+{\rho }^{2}-2\rho & 1+{\rho }^{2}-2\rho & 1-\rho \end{array}\right].\end{array}$

Also

$\left(I-{I}_{s}\right){P}^{-1}Y=\left[\begin{array}{c}0\\ 0\\ 0\\ -\rho {y}_{3}+{y}_{4}\end{array}\right]$

and

$\left(I-{I}_{s}\right){P}^{-1}{P}_{\Omega }Y=\frac{1}{4-6\rho +2{\rho }^{2}}\left[\begin{array}{c}0\\ 0\\ 0\\ \left(1-2\rho +{\rho }^{2}\right)\left({y}_{1}+{y}_{4}\right)+\left(1-3\rho +3{\rho }^{2}-{\rho }^{3}\right)\left({y}_{2}+{y}_{3}\right)\end{array}\right].$

By Theorem 1, the optimal linear predictor of $T$ is $\stackrel{^}{T}=\underset{i=1}{\overset{3}{\sum }}{y}_{i}+{\stackrel{^}{y}}_{4}$ , where ${\stackrel{^}{y}}_{4}$ is the solution in ${y}_{4}$ of the equation

$\left(I-{I}_{s}\right){P}^{-1}Y=\left(I-{I}_{s}\right){P}^{-1}{P}_{\Omega }Y.$

After calculations, we get

${\stackrel{^}{y}}_{4}=\frac{\left(1-2\rho +{\rho }^{2}\right){y}_{1}+\left(1-3\rho +3{\rho }^{2}-{\rho }^{3}\right){y}_{2}+\left(1+\rho -3{\rho }^{2}+{\rho }^{3}\right){y}_{3}}{3-4\rho +{\rho }^{2}}$

and

$\stackrel{^}{T}={a}_{1}{y}_{1}+{a}_{2}{y}_{2}+{a}_{3}{y}_{3},$

where ${a}_{1}=\frac{4-6\rho +2{\rho }^{2}}{3-4\rho +{\rho }^{2}}$ , ${a}_{2}=\frac{4-7\rho +4{\rho }^{2}-{\rho }^{3}}{3-4\rho +{\rho }^{2}}$ and ${a}_{3}=\frac{4-3\rho -2{\rho }^{2}+{\rho }^{3}}{3-4\rho +{\rho }^{2}}$ .

It is interesting to note that, if $\rho =0$ , such that $V=I$ and ${y}_{i}$ and ${y}_{j}$ are uncorrelated, $i\ne j$ , then $\stackrel{^}{T}=4{\stackrel{¯}{y}}_{s}$ , where ${\stackrel{¯}{y}}_{s}$ is the sample mean. In this case, $\stackrel{^}{T}$ is the expansion predictor which was found by  under the model $E\left(Y\right)=\mu$ and $\text{Var}\text{ }\left(Y\right)={\sigma }^{2}I$ .
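The coefficients of Example 1 can be reproduced with a few lines of exact arithmetic. The Python sketch below solves the equation of Theorem 1 for ${\stackrel{^}{y}}_{4}$ , using the tridiagonal form of ${V}^{-1}$ for $N=4$ (with all diagonal corner entries equal to 1) and the last row $\left(0,0,-\rho ,1\right)$ of ${P}^{-1}$ , and compares the result with the closed-form expressions. The function names `y4_coefficients` and `closed_form` and the test value $\rho =1/2$ are assumptions introduced only for this illustration.

```python
from fractions import Fraction as F

def y4_coefficients(rho):
    """Coefficients (c1, c2, c3) with y4_hat = c1*y1 + c2*y2 + c3*y3, via Theorem 1."""
    rho = F(rho)
    # Inverse of V for N = 4 (tridiagonal form of the AR(1)-type structure).
    V_inv = [[1, -rho, 0, 0],
             [-rho, 1 + rho**2, -rho, 0],
             [0, -rho, 1 + rho**2, -rho],
             [0, 0, -rho, 1]]
    w = [sum(col) for col in zip(*V_inv)]   # A' V^{-1} with A = (1, 1, 1, 1)'
    denom = sum(w)                          # A' V^{-1} A = 4 - 6 rho + 2 rho^2
    # Every row of P_Omega equals w / denom and the last row of P^{-1} is
    # (0, 0, -rho, 1), so the last entry of P^{-1} P_Omega Y is
    # (1 - rho) * (w . Y) / denom.
    rhs = [(1 - rho) * wj / denom for wj in w]
    lhs = [0, 0, -rho, 1]                   # last entry of P^{-1} Y
    g = [l - r for l, r in zip(lhs, rhs)]   # the equation reads g . Y = 0
    return [-gj / g[3] for gj in g[:3]]

def closed_form(rho):
    """Closed-form coefficients of y4_hat stated in Example 1."""
    rho = F(rho)
    d = 3 - 4 * rho + rho**2
    return [(1 - 2 * rho + rho**2) / d,
            (1 - 3 * rho + 3 * rho**2 - rho**3) / d,
            (1 + rho - 3 * rho**2 + rho**3) / d]

rho = F(1, 2)                 # illustrative value with |rho| < 1
coef = y4_coefficients(rho)   # coefficients of y1, y2, y3 in y4_hat
a = [1 + c for c in coef]     # coefficients a1, a2, a3 of T_hat
print(coef == closed_form(rho), coef, a)
```

For $\rho =1/2$ this gives ${\stackrel{^}{y}}_{4}=\left(2{y}_{1}+{y}_{2}+7{y}_{3}\right)/10$ , and for $\rho =0$ each coefficient equals $1/3$ , in agreement with the expansion predictor $\stackrel{^}{T}=4{\stackrel{¯}{y}}_{s}$ .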

Example 2. Let us consider the superpopulation model

${y}_{i}=\beta {x}_{i}+{ϵ}_{i},\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,2,\cdots ,N,$

with $E\left({ϵ}_{i}\right)=0$ , $\text{Var}\text{ }\left({ϵ}_{i}\right)=1$ , $\text{Cov}\text{ }\left({ϵ}_{i},{ϵ}_{j}\right)=\rho$ for $i\ne j$ , $i,j=1,2,\cdots ,N$ , and $\rho$ a known parameter, $|\rho |\ne 1$ .

Our objective is to calculate the best linear unbiased predictor of the population total $T=\underset{i=1}{\overset{N}{\sum }}{y}_{i}$ .

In this situation, the model is coordinatized, and by Theorem 2, it is enough to obtain the value

${\stackrel{^}{Y}}_{r}={X}_{r}\stackrel{^}{\beta }+{V}_{rs}{V}_{s}^{-1}\left({Y}_{s}-{X}_{s}\stackrel{^}{\beta }\right).$

Let ${V}_{s}$ and ${V}_{rs}$ be written as

$\begin{array}{l}{V}_{s}=\left(1-\rho \right){I}_{n}+\rho {J}_{n}\\ {V}_{rs}=\rho {J}_{N-n,n}\text{\hspace{0.17em}},\end{array}$

where ${J}_{n}$ and ${J}_{N-n,n}$ are, respectively, the $n\times n$ and $\left(N-n\right)\times n$ matrices of ones.

Thus, it is easy to see that

$\stackrel{^}{\beta }=\frac{\underset{i=1}{\overset{n}{\sum }}{x}_{i}{y}_{i}-\frac{\rho }{1+\left(n-1\right)\rho }\underset{i=1}{\overset{n}{\sum }}{x}_{i}\underset{i=1}{\overset{n}{\sum }}{y}_{i}}{\underset{i=1}{\overset{n}{\sum }}{x}_{i}^{2}-\frac{\rho }{1+\left(n-1\right)\rho }{\left(\underset{i=1}{\overset{n}{\sum }}{x}_{i}\right)}^{2}}$

and

${\stackrel{^}{Y}}_{r}=\left[\begin{array}{c}{a}_{n+1}\stackrel{^}{\beta }+b\\ {a}_{n+2}\stackrel{^}{\beta }+b\\ ⋮\\ {a}_{N}\stackrel{^}{\beta }+b\end{array}\right],$

where

${a}_{n+j}={x}_{n+j}-\frac{\rho }{1+\left(n-1\right)\rho }\underset{i=1}{\overset{n}{\sum }}{x}_{i},\text{\hspace{0.17em}}\text{\hspace{0.17em}}j=1,2,\cdots ,N-n,$

and

$b=\frac{\rho \underset{i=1}{\overset{n}{\sum }}{y}_{i}}{1+\left(n-1\right)\rho }.$
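The closed-form expressions of Example 2 can be compared with a direct evaluation of ${X}_{r}\stackrel{^}{\beta }+{V}_{rs}{V}_{s}^{-1}\left({Y}_{s}-{X}_{s}\stackrel{^}{\beta }\right)$ . The Python sketch below does this in exact rational arithmetic, with each nonsample entry written as $\left({x}_{n+j}-k\sum {x}_{i}\right)\stackrel{^}{\beta }+b$ , where $k=\rho /\left(1+\left(n-1\right)\rho \right)$ . The data `x` and `y_s`, the value $\rho =1/4$ , and the helper `solve` are hypothetical and serve only to illustrate the formulas.

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Solve M u = rhs exactly by Gauss-Jordan elimination."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(t)] for row, t in zip(M, rhs)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

# Hypothetical data: N = 5 units, of which the first n = 3 are sampled.
rho = F(1, 4)
x = [F(v) for v in (1, 2, 3, 4, 5)]
y_s = [F(v) for v in (2, 3, 7)]
n = len(y_s)
x_s, x_r = x[:n], x[n:]

# Closed-form expressions of Example 2.
k = rho / (1 + (n - 1) * rho)
sx, sy = sum(x_s), sum(y_s)
beta_closed = (dot(x_s, y_s) - k * sx * sy) / (dot(x_s, x_s) - k * sx**2)
b_const = k * sy
Y_r_closed = [(xj - k * sx) * beta_closed + b_const for xj in x_r]

# Direct evaluation with V_s = (1 - rho) I_n + rho J_n and V_rs = rho J.
V_s = [[F(1) if i == j else rho for j in range(n)] for i in range(n)]
u_x = solve(V_s, x_s)                              # V_s^{-1} X_s
beta_direct = dot(u_x, y_s) / dot(u_x, x_s)
resid = [yi - xi * beta_direct for xi, yi in zip(x_s, y_s)]
shift = rho * sum(solve(V_s, resid))               # one row of V_rs V_s^{-1} resid
Y_r_direct = [xj * beta_direct + shift for xj in x_r]

T_hat = sum(y_s) + sum(Y_r_direct)                 # predictor of the total
print(beta_closed == beta_direct, Y_r_closed == Y_r_direct, T_hat)
```

With these illustrative data both routes give $\stackrel{^}{\beta }=17/8$ and ${\stackrel{^}{Y}}_{r}={\left(67/8,\text{ }21/2\right)}^{\prime }$ .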

Appendix

First, we show that $\Gamma ={h}^{\prime }P{I}_{s}Z+{h}^{\prime }P{\stackrel{^}{\mu }}_{Z}$ defined in the proof of Theorem 1 does not depend on unknown quantities.

Since $P$ is a lower triangular matrix, ${P}^{-1}$ is also lower triangular. Writing ${\delta }_{ij}$ for the elements of ${P}^{-1}$ , we have

${P}^{-1}Y=\left[\begin{array}{cccc}{\delta }_{11}& 0& \cdots & 0\\ {\delta }_{21}& {\delta }_{22}& \cdots & 0\\ ⋮& & & \\ {\delta }_{N1}& {\delta }_{N2}& \cdots & {\delta }_{NN}\end{array}\right]Y=\left[\begin{array}{c}{\delta }_{11}{y}_{1}\\ {\delta }_{21}{y}_{1}+{\delta }_{22}{y}_{2}\\ ⋮\\ {\delta }_{n1}{y}_{1}+{\delta }_{n2}{y}_{2}+\cdots +{\delta }_{nn}{y}_{n}\\ {\delta }_{n+11}{y}_{1}+{\delta }_{n+12}{y}_{2}+\cdots +{\delta }_{n+1\text{ }n+1}{y}_{n+1}\\ ⋮\\ {\delta }_{N1}{y}_{1}+{\delta }_{N2}{y}_{2}+\cdots +{\delta }_{NN}{y}_{N}\end{array}\right]$

and

${I}_{s}Z=\left[\begin{array}{cc}{I}_{n}& 0\\ 0& 0\end{array}\right]{P}^{-1}Y=\left[\begin{array}{c}{\delta }_{11}{y}_{1}\\ {\delta }_{21}{y}_{1}+{\delta }_{22}{y}_{2}\\ ⋮\\ {\delta }_{n1}{y}_{1}+{\delta }_{n2}{y}_{2}+\cdots +{\delta }_{nn}{y}_{n}\\ 0\\ ⋮\\ 0\end{array}\right].$

So, it is shown that ${h}^{\prime }P{I}_{s}Z$ does not depend on unknown quantities. By the proof of Theorem 1, we can see that ${\stackrel{^}{Z}}_{r}={\left(I-{A}_{2}\right)}^{-1}{A}_{1}C{Y}_{s}$ and thus, ${h}^{\prime }P{\stackrel{^}{\mu }}_{Z}$ also does not depend on unknown quantities. Then $\Gamma$ is a predictor of ${h}^{\prime }Y$ .

Now we derive the results (A.1) through (A.6) which are necessary to prove Theorem 2.

Let ${P}^{-1}$ be partitioned as in the proof of Theorem 1, ${P}^{-1}=\left[\begin{array}{cc}C& 0\\ {B}_{1}& {B}_{2}\end{array}\right]$ , which implies that

$P=\left[\begin{array}{cc}{C}^{-1}& 0\\ -{B}_{2}^{-1}{B}_{1}{C}^{-1}& {B}_{2}^{-1}\end{array}\right].$

Then using the equality $V=P{P}^{\prime }$ and after some algebraic manipulations, it follows that

${C}^{-1}{C}^{-{1}^{\prime }}={V}_{s},$

and so,

${C}^{\prime }C={V}_{s}^{-1}.$ (A.1)

Furthermore,

$-{C}^{-1}{C}^{-{1}^{\prime }}{{B}^{\prime }}_{1}{B}_{2}^{-{1}^{\prime }}=-{V}_{s}{{B}^{\prime }}_{1}{B}_{2}^{-{1}^{\prime }}={V}_{sr}$

and hence

$-{B}_{2}^{-1}{B}_{1}={V}_{rs}{V}_{s}^{-1}.$ (A.2)
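Identities (A.1) and (A.2) are easy to confirm on a small case. The Python sketch below, in exact rational arithmetic, partitions ${P}^{-1}$ for a lower triangular `P` and checks that ${C}^{\prime }C={V}_{s}^{-1}$ and $-{B}_{2}^{-1}{B}_{1}={V}_{rs}{V}_{s}^{-1}$ . The matrix `P`, the sizes $N=4$ and $n=2$ , and the helper names are hypothetical choices made only for this illustration.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Exact Gauss-Jordan inverse of a nonsingular square matrix."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

# Hypothetical lower triangular factor (N = 4, n = 2); V = P P'.
n = 2
P = [[F(v) for v in row] for row in
     [[2, 0, 0, 0], [1, 3, 0, 0], [1, 1, 1, 0], [0, 2, 1, 2]]]
V = matmul(P, transpose(P))

# Blocks of P^{-1} = [[C, 0], [B1, B2]] and of V.
P_inv = inverse(P)
C = [row[:n] for row in P_inv[:n]]
B1 = [row[:n] for row in P_inv[n:]]
B2 = [row[n:] for row in P_inv[n:]]
V_s_inv = inverse([row[:n] for row in V[:n]])
V_rs = [row[:n] for row in V[n:]]

a1_holds = matmul(transpose(C), C) == V_s_inv                  # (A.1)
minus_B2inv_B1 = [[-v for v in row] for row in matmul(inverse(B2), B1)]
a2_holds = minus_B2inv_B1 == matmul(V_rs, V_s_inv)             # (A.2)
print(a1_holds, a2_holds)
```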

In the coordinatized model with $\mu =X\beta$ and covariance matrix $V$ , it is well known  , that

${P}_{\Omega }=X{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{X}^{\prime }{V}^{-1}$

and thus

${P}_{{\Omega }^{*}}={P}^{-1}{P}_{\Omega }P={P}^{-1}X{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{X}^{\prime }{P}^{-{1}^{\prime }}.$

In the partitioned form, this matrix can be written as

$\begin{array}{c}{P}_{{\Omega }^{*}}=\left[\begin{array}{cc}{H}_{1}& {H}_{2}\\ {A}_{1}& {A}_{2}\end{array}\right]=\left[\begin{array}{cc}C& 0\\ {B}_{1}& {B}_{2}\end{array}\right]\left[\begin{array}{c}{X}_{s}\\ {X}_{r}\end{array}\right]{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\left[{{X}^{\prime }}_{s}\text{\hspace{0.17em}}{{X}^{\prime }}_{r}\right]\left[\begin{array}{cc}{C}^{\prime }& {{B}^{\prime }}_{1}\\ 0& {{B}^{\prime }}_{2}\end{array}\right]\\ =\left[\begin{array}{c}C{X}_{s}{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\\ \left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\end{array}\right]\left[{{X}^{\prime }}_{s}{C}^{\prime }\text{\hspace{0.17em}}\text{\hspace{0.17em}}{{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right]\\ =\left[\begin{array}{cc}C{X}_{s}{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{{X}^{\prime }}_{s}{C}^{\prime }& C{X}_{s}{\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\\ \left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{{X}^{\prime }}_{s}{C}^{\prime }& \left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\end{array}\right],\end{array}$

then

${A}_{1}=\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}{{X}^{\prime }}_{s}{C}^{\prime },$ (A.3)

and

${A}_{2}=\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right).$ (A.4)

Using the fact that ${V}^{-1}={P}^{-{1}^{\prime }}{P}^{-1}$ , it follows that

${V}^{-1}=\left[\begin{array}{cc}{C}^{\prime }& {{B}^{\prime }}_{1}\\ 0& {{B}^{\prime }}_{2}\end{array}\right]\left[\begin{array}{cc}C& 0\\ {B}_{1}& {B}_{2}\end{array}\right]=\left[\begin{array}{cc}{C}^{\prime }C+{{B}^{\prime }}_{1}{B}_{1}& {{B}^{\prime }}_{1}{B}_{2}\\ {{B}^{\prime }}_{2}{B}_{1}& {{B}^{\prime }}_{2}{B}_{2}\end{array}\right]$

and

${X}^{\prime }{V}^{-1}X={{X}^{\prime }}_{s}{C}^{\prime }C{X}_{s}+{{X}^{\prime }}_{s}{{B}^{\prime }}_{1}{B}_{1}{X}_{s}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}{B}_{1}{X}_{s}+{{X}^{\prime }}_{s}{{B}^{\prime }}_{1}{B}_{2}{X}_{r}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}{B}_{2}{X}_{r}.$

Applying (A.1),

${X}^{\prime }{V}^{-1}X={{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}+{{X}^{\prime }}_{s}{{B}^{\prime }}_{1}{B}_{1}{X}_{s}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}{B}_{1}{X}_{s}+{{X}^{\prime }}_{s}{{B}^{\prime }}_{1}{B}_{2}{X}_{r}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}{B}_{2}{X}_{r}.$ (A.5)

Applying the matrix inversion identity ${\left(I-U{M}^{-1}{U}^{\prime }\right)}^{-1}=I-U{\left({U}^{\prime }U-M\right)}^{-1}{U}^{\prime }$ , with $U={B}_{1}{X}_{s}+{B}_{2}{X}_{r}$ and $M={X}^{\prime }{V}^{-1}X$ , in conjunction with (A.4) and (A.5) yields

$\begin{array}{c}{\left(I-{A}_{2}\right)}^{-1}={\left[I-\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({X}^{\prime }{V}^{-1}X\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\right]}^{-1}\\ =I-\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left[\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right)-{X}^{\prime }{V}^{-1}X\right]}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right)\\ =I+\left({B}_{1}{X}_{s}+{B}_{2}{X}_{r}\right){\left({{X}^{\prime }}_{s}{V}_{s}^{-1}{X}_{s}\right)}^{-1}\left({{X}^{\prime }}_{s}{{B}^{\prime }}_{1}+{{X}^{\prime }}_{r}{{B}^{\prime }}_{2}\right).\end{array}$ (A.6)

Cite this paper: Elian, S. (2017) The Coordinate-Free Prediction in Finite Populations with Correlated Observations. Open Journal of Statistics, 7, 182-193. doi: 10.4236/ojs.2017.72014.
References

   Rodrigues, J. (1989) The Coordinate-Free Prediction in Finite Populations. Pakistan Journal of Statistics, 5, 119-129.

   Arnold, S.F. (1980) The Theory of Linear Models and Multivariate Analysis. John Wiley, New York.

   Drygas, H. (1970) The Coordinate-Free Approach to Gauss-Markov Estimation. Springer-Verlag, Berlin.
https://doi.org/10.1007/978-3-642-65148-9

   Royall, R.M. (1976) The Linear Least-Squares Prediction Approach to Two-Stage Sampling. Journal of the American Statistical Association, 71, 657-664.
https://doi.org/10.1080/01621459.1976.10481542

   Graybill, F.A. (1976) Theory and Application of the Linear Model. Duxbury Press, North Scituate, MA.

   Rao, C.R. (1973) Linear Statistical Inference and Its Applications. 2nd Edition, Wiley, New York.
https://doi.org/10.1002/9780470316436
