Robust Element-Wise Empirical Likelihood Estimation Method for Longitudinal Data


1. Introduction

Longitudinal data are obtained by measuring each individual repeatedly over a period of time. Such data combine cross-sectional and time-series information and can be viewed as a collection of many short time series: at a fixed time point, the observations on different individuals resemble cross-sectional data, while for a fixed individual, the observations at different time points resemble a time series. Longitudinal data therefore make full use of the information within each individual while still distinguishing between individuals. In fields such as medicine and finance, longitudinal data appear more and more frequently, so their study is of great significance.

Longitudinal data have been a hot topic in statistical research in recent years, and significant theoretical progress has been made. Liang and Zeger (1986) [1] extended generalized linear models to longitudinal data and proposed the generalized estimating equation (GEE) method, which introduces a working correlation matrix into the estimating equations and yields estimates of the regression parameters and their variances. It has been proved that the GEE method gives consistent estimates of the regression coefficients even when the working correlation matrix is misspecified (see Diggle et al. (2002) [2] for more details). However, the GEE method, developed from generalized linear models, is similar in principle to weighted least squares and is therefore sensitive to outliers. Because longitudinal data involve repeated measurements, an abnormal value in one measurement of an individual can produce a series of abnormal values in the sample. To reduce the interference of outliers, Fan et al. (2012) [3] introduced a generalized estimating equation method based on the bounded Huber score function to achieve robustness (for the definition of the Huber function, see Huber (1964) [4]). Wang et al. (2013) [5] and Lv et al. (2015) [6] applied a bounded exponential score function to the generalized estimating equations. Many other articles (Qin et al. (2005) [7]; Wang et al. (2005) [8]; Qin et al. (2009) [9]; Zheng et al. (2013) [10]) study generalized estimating equations and robustness.

In the field of longitudinal data research, empirical likelihood methods are also frequently used. The empirical likelihood (EL) method was originally applied by Owen (1988) [11] to the estimation of the population mean of independent and identically distributed data. The method produces asymmetric, transformation-preserving confidence intervals with good coverage probability. Azzalini (2017) [12] comprehensively introduced the application of likelihood-based methods in statistical inference. Qin and Lawless (1994) [13] first linked the empirical likelihood method with estimating equations and proved that empirical likelihood estimation is efficient when the moment conditions in the estimating equation are correctly specified. Bondell and Stefanski (2013) [14] proposed a robust estimator in linear regression, obtained by generalized EL methods, with relatively high efficiency compared to other robust estimators. Bai et al. (2010) [15] introduced the EL method into longitudinal data research and proposed a weighted empirical likelihood (WEL) inference method for generalized linear models with longitudinal data. Wang et al. (2010) [16] established two methods based on generalized empirical likelihood, element-wise empirical likelihood and subject-wise empirical likelihood, where the element-wise method gives slightly better coverage probabilities for small or medium samples.

Building on existing research on longitudinal data, this paper combines the element-wise empirical likelihood method with a robust estimating equation based on the bounded Huber score function and a covariate-dependent weight function, and proposes an efficient robust estimation method. The efficiency of the proposed method comes from the advantages of the empirical likelihood method, while its robustness comes from the robust estimating equations. Our simulations show that, whether or not the working correlation matrix coincides with the true correlation matrix, the proposed method reduces the impact of outliers on the estimates and improves estimation efficiency.

The rest of the paper is organized as follows. Section 2 presents the linear regression model for longitudinal data and the proposed estimation method. Section 3 introduces the iterative algorithm. Section 4 reports the simulation study, and Section 5 gives a summary and outlook.

2. Proposed Method

2.1. Models

Linear models are often used in longitudinal data research: their structure is simple to analyze and forms the basis of many other models. We consider the following longitudinal regression model with a continuous response

${y}_{ij}={x}_{ij}^{\text{T}}\beta +{\epsilon}_{ij},\text{\hspace{0.17em}}\text{\hspace{0.17em}}i=1,\cdots ,n;\text{\hspace{0.17em}}\text{\hspace{0.17em}}j=1,\cdots ,{m}_{i}$ (2.1)

where ${y}_{ij}$ is the jth observation on the ith subject, ${x}_{ij}$ is a p-vector of covariate values and $\beta $ is a p-vector of unknown regression coefficients, with ${x}_{ij}^{\text{T}}=\left({x}_{ij}^{\left(1\right)},{x}_{ij}^{\left(2\right)},\cdots ,{x}_{ij}^{\left(p\right)}\right)$ and ${\beta}^{\text{T}}=\left({\beta}^{\left(1\right)},{\beta}^{\left(2\right)},\cdots ,{\beta}^{\left(p\right)}\right)$ . Here n is the number of subjects in the study, ${m}_{i}$ is the number of repeated measurements on the ith subject, and ${m}_{1}\mathrm{,}\cdots \mathrm{,}{m}_{n}$ are bounded positive integers. Denote the total sample size by $N={m}_{1}+\cdots +{m}_{n}$ . The random error term ${\epsilon}_{ij}$ satisfies $E\left({\epsilon}_{ij}\right)=0$ and $Var\left({\epsilon}_{ij}\right)={\sigma}^{2}$ . Setting ${Y}_{i}^{\text{T}}=\left({y}_{i1},{y}_{i2},\cdots ,{y}_{i{m}_{i}}\right)$ , ${X}_{i}=\left({x}_{i1},{x}_{i2},\cdots ,{x}_{i{m}_{i}}\right)$ and ${\epsilon}_{i}^{\text{T}}=\left({\epsilon}_{i1},{\epsilon}_{i2},\cdots ,{\epsilon}_{i{m}_{i}}\right)$ , model (2.1) can be rewritten as

${Y}_{i}={X}_{i}^{\text{T}}\beta +{\epsilon}_{i}$ (2.2)

For the longitudinal data model, it is usually assumed that observations from different individuals are independent of each other, while the repeated measurements on the same individual are correlated. The covariance matrix of the random vector ${\epsilon}_{i}$ is ${V}_{i}={\sigma}^{2}R\left(\rho \right)={A}_{i}^{\frac{1}{2}}R\left(\rho \right){A}_{i}^{\frac{1}{2}}$ , where ${V}_{i}$ is an ${m}_{i}\times {m}_{i}$ invertible matrix, $R\left(\rho \right)$ is a correlation matrix, $\rho $ is a correlation coefficient vector, and ${A}_{i}^{\frac{1}{2}}=\sigma {I}_{{m}_{i}}$ . The exchangeable structure (Exch), the working-independence structure (Ind) and the first-order autoregressive structure (AR(1)) are commonly used correlation structures in practice.

${R}_{Exch}=\left(\begin{array}{cccc}1& \rho & \cdots & \rho \\ \rho & 1& \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1\end{array}\right)$ , ${R}_{Ind}=\left(\begin{array}{cccc}1& 0& \cdots & 0\\ 0& 1& \cdots & 0\\ \vdots & \vdots & \ddots & \vdots \\ 0& 0& \cdots & 1\end{array}\right)$ , ${R}_{AR\left(1\right)}=\left(\begin{array}{cccc}1& \rho & \cdots & {\rho}^{{m}_{i}-1}\\ \rho & 1& \cdots & {\rho}^{{m}_{i}-2}\\ \vdots & \vdots & \ddots & \vdots \\ {\rho}^{{m}_{i}-1}& {\rho}^{{m}_{i}-2}& \cdots & 1\end{array}\right)$ .
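As an illustration, the three working correlation structures can be generated programmatically; the following sketch (the helper name `corr_matrix` is ours) builds $R\left(\rho \right)$ for a given dimension m:

```python
import numpy as np

def corr_matrix(m, rho, structure="exch"):
    """Build an m x m working correlation matrix R(rho)."""
    if structure == "ind":      # working independence: identity matrix
        return np.eye(m)
    if structure == "exch":     # exchangeable: all off-diagonals equal rho
        return np.full((m, m), rho) + (1.0 - rho) * np.eye(m)
    if structure == "ar1":      # AR(1): entry (j, k) equals rho^|j-k|
        idx = np.arange(m)
        return rho ** np.abs(idx[:, None] - idx[None, :])
    raise ValueError("unknown structure")
```

For example, `corr_matrix(3, 0.3, "ar1")` gives off-diagonal entries 0.3 and 0.09, matching the AR(1) pattern above.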

Let

${Z}_{i}\left(\beta \right)={\left\{{z}_{i1}\left(\beta \right),{z}_{i2}\left(\beta \right),\cdots ,{z}_{i{m}_{i}}\left(\beta \right)\right\}}^{\text{T}}={V}_{i}^{-1}\left({Y}_{i}-{X}_{i}^{\text{T}}\beta \right)$ (2.3)

2.2. Proposed Estimator

More generally, we can define an estimating equation

${\sum}_{i=1}^{n}{X}_{i}{Z}_{i}\left(\beta \right)={\sum}_{i=1}^{n}{\sum}_{j=1}^{{m}_{i}}{x}_{ij}{z}_{ij}\left(\beta \right)=0$ (2.4)

Such an estimating equation is susceptible to the influence of outliers. We therefore introduce the bounded Huber score function and a covariate-dependent weight function into (2.3):

${Z}_{i}^{R}\left(\beta \right)={\left\{{z}_{i1}^{R}\left(\beta \right),{z}_{i2}^{R}\left(\beta \right),\cdots ,{z}_{i{m}_{i}}^{R}\left(\beta \right)\right\}}^{\text{T}}={\left({A}_{i}^{\frac{1}{2}}{R}_{i}\right)}^{-1}{W}_{i}{\psi}_{c}\left({A}_{i}^{-\frac{1}{2}}\left({Y}_{i}-{X}_{i}^{\text{T}}\beta \right)\right)$ (2.5)

Consequently, a robust estimation equation is obtained

${\sum}_{i=1}^{n}{X}_{i}{Z}_{i}^{R}\left(\beta \right)={\sum}_{i=1}^{n}{\sum}_{j=1}^{{m}_{i}}{x}_{ij}{z}_{ij}^{R}\left(\beta \right)=0$ (2.6)

where ${\psi}_{c}\left(x\right)=\mathrm{min}\left(c\mathrm{,}\mathrm{max}\left(-c\mathrm{,}x\right)\right)$ is the bounded score function, used to limit the influence of outliers in the response. Because it is applied to the standardized residuals, the value of c is generally taken between 1 and 2. ${W}_{i}=diag\left({w}_{i1},{w}_{i2},\cdots ,{w}_{i{m}_{i}}\right)$ is the weight matrix. There are many ways to select the weights; similar to reference [14], we use a function of the Mahalanobis distance

${w}_{ij}=\mathrm{min}\left\{1,{\left(\frac{{b}_{0}}{{\left({x}_{ij}-{m}_{x}\right)}^{\text{T}}{S}_{x}^{-1}\left({x}_{ij}-{m}_{x}\right)}\right)}^{\frac{r}{2}}\right\}$ (2.7)

where $r\ge 1$ , ${b}_{0}$ is the 0.95 quantile of the chi-square distribution with degrees of freedom equal to the dimension of ${x}_{ij}$ , and ${m}_{x}$ and ${S}_{x}$ are robust estimates of the location and scatter of the ${x}_{ij}$ . If ${x}_{ij}$ deviates from the bulk of the data, ${\left({x}_{ij}-{m}_{x}\right)}^{\text{T}}{S}_{x}^{-1}\left({x}_{ij}-{m}_{x}\right)$ will be larger and

${\left(\frac{{b}_{0}}{{\left({x}_{ij}-{m}_{x}\right)}^{\text{T}}{S}_{x}^{-1}\left({x}_{ij}-{m}_{x}\right)}\right)}^{\frac{r}{2}}$

will be smaller, so the corresponding weight ${w}_{ij}$ is less than 1; otherwise ${w}_{ij}$ equals 1. Therefore, the influence of outlying covariates on the estimation is controlled through the Mahalanobis distance. For data without outliers, we can set ${\psi}_{c}\left(x\right)=x$ and ${w}_{ij}=1$ in the robust estimating Equation (2.6), which recovers the classical nonrobust estimating equation.
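To make the two robustifying ingredients concrete, here is a minimal sketch of the bounded score ${\psi}_{c}$ and the weight ${w}_{ij}$ of Equation (2.7). The function names are illustrative, and $m_x$, ${S}_{x}^{-1}$ and ${b}_{0}$ are assumed to be supplied by the caller (e.g. ${b}_{0}\approx 7.81$, the 0.95 chi-square quantile with 3 degrees of freedom):

```python
import numpy as np

def psi_c(x, c=1.5):
    """Huber bounded score psi_c(x) = min(c, max(-c, x)), element-wise."""
    return np.clip(x, -c, c)

def mahalanobis_weight(x_ij, m_x, S_x_inv, b0, r=1.0):
    """Covariate weight w_ij of Eq. (2.7): equals 1 for typical x_ij
    and shrinks below 1 once the squared Mahalanobis distance exceeds b0."""
    d2 = float((x_ij - m_x) @ S_x_inv @ (x_ij - m_x))
    return min(1.0, (b0 / d2) ** (r / 2.0))
```

Applied to a standardized residual, `psi_c` truncates values outside [-c, c]; applied to an outlying covariate vector, `mahalanobis_weight` returns a weight strictly below 1.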

The empirical likelihood method is a non-parametric statistical method with many good properties. Empirical likelihood and estimating equations were first linked by Qin and Lawless (1994) [13]. Wang et al. (2010) [16] proposed two ways of combining empirical likelihood with estimating equations: subject-wise empirical likelihood assigns a probability ${p}_{i}$ to each subject ${y}_{i}$ , while element-wise empirical likelihood assigns a probability mass ${p}_{ij}$ to each observation ${y}_{ij}$ . Li et al. (2018) [17] introduced a robust estimating equation based on an exponential score function and proved that better estimates can be obtained in the presence of outliers. This paper combines the element-wise empirical likelihood method with the robust estimating Equation (2.6), which leads to the following empirical likelihood ratio function

${L}_{1}\left(\beta \right)=\mathrm{sup}\left\{{\displaystyle \underset{i=1}{\overset{n}{\prod}}}{\displaystyle \underset{j=1}{\overset{{m}_{i}}{\prod}}}N{p}_{ij}|{p}_{ij}\ge 0,{\displaystyle \underset{i=1}{\overset{n}{\sum}}}{\displaystyle \underset{j=1}{\overset{{m}_{i}}{\sum}}}\text{\hspace{0.05em}}{p}_{ij}=1,{\displaystyle \underset{i=1}{\overset{n}{\sum}}}{\displaystyle \underset{j=1}{\overset{{m}_{i}}{\sum}}}\text{\hspace{0.05em}}{p}_{ij}{x}_{ij}{z}_{ij}^{R}\left(\beta \right)=0\right\}$ (2.8)

As in Owen's (1988) [11] classical empirical likelihood, the Lagrange multiplier method shows that ${L}_{1}\left(\beta \right)$ is maximized at

${p}_{ij}=\frac{1}{N}{\left(1+{\lambda}^{\prime}{x}_{ij}{z}_{ij}^{R}\left(\beta \right)\right)}^{-1},i=1,\cdots ,n;j=1,\cdots ,{m}_{i}$ (2.9)

where the vector $\lambda ={\left({\lambda}_{1},{\lambda}_{2},\cdots ,{\lambda}_{p}\right)}^{\prime}$ satisfies $\frac{1}{N}{\displaystyle {\sum}_{i=1}^{n}}{\displaystyle {\sum}_{j=1}^{{m}_{i}}}\frac{{x}_{ij}{z}_{ij}^{R}\left(\beta \right)}{1+{\lambda}^{\prime}{x}_{ij}{z}_{ij}^{R}\left(\beta \right)}=0$ . From this we can get

$-2\mathrm{log}{L}_{1}\left(\beta \right)=2{\displaystyle \underset{i=1}{\overset{n}{\sum}}}{\displaystyle \underset{j=1}{\overset{{m}_{i}}{\sum}}}\mathrm{log}\left(1+{\lambda}^{\prime}{x}_{ij}{z}_{ij}^{R}\left(\beta \right)\right)$ (2.10)
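In practice $\lambda $ has no closed form and is found numerically. The following sketch (our own, with no step-halving safeguard, so it assumes the scores are not too extreme) solves the equation below (2.9) for $\lambda $ by Newton's method and then evaluates Equation (2.10), with the scores ${x}_{ij}{z}_{ij}^{R}\left(\beta \right)$ stacked into an N × p matrix G:

```python
import numpy as np

def solve_lambda(G, iters=50):
    """Newton solve for lambda in (1/N) sum_k g_k / (1 + lambda' g_k) = 0,
    where G is the (N, p) matrix whose rows are the scores g_k."""
    N, p = G.shape
    lam = np.zeros(p)
    for _ in range(iters):
        denom = 1.0 + G @ lam                       # N-vector 1 + lambda' g_k
        grad = (G / denom[:, None]).sum(axis=0)     # sum of g_k / (1 + lambda' g_k)
        if np.linalg.norm(grad) < 1e-10:
            break
        hess = -(G.T / denom**2) @ G                # Jacobian of grad w.r.t. lambda
        lam = lam - np.linalg.solve(hess, grad)
    return lam

def neg2_log_el(G):
    """-2 log L1(beta) of Eq. (2.10) at the current scores."""
    lam = solve_lambda(G)
    return 2.0 * np.log1p(G @ lam).sum()
```

At the solution, the probabilities ${p}_{ij}$ of Equation (2.9) automatically sum to 1, since $\sum 1/\left(1+{\lambda}^{\prime}{g}_{k}\right)=N-{\lambda}^{\prime}\sum {g}_{k}/\left(1+{\lambda}^{\prime}{g}_{k}\right)=N$.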

The estimator proposed in this paper is the maximum point of Equation (2.8), or equivalently the minimum point of Equation (2.10). We write

${\stackrel{^}{\beta}}_{\text{RELGEE}}=\mathrm{arg}\mathrm{max}{L}_{1}\left(\beta \right)=\mathrm{arg}\mathrm{min}\left(-2\mathrm{log}{L}_{1}\left(\beta \right)\right)$ (2.11)

${\stackrel{^}{\beta}}_{\text{RELGEE}}$ is the RELGEE estimate of $\beta $ . Under some regularity conditions, the estimator can be shown to be consistent and asymptotically normal, and $-2\mathrm{log}{L}_{1}\left(\beta \right)$ converges to a linear combination of independent chi-square distributions. The proof is similar to that in Wang et al. (2010) [16].

The parameter of interest in this paper is $\beta $ , but the parameters $\sigma $ and $\rho $ are unknown. Following He et al. (2005) [18], we can use the median absolute deviation to obtain a robust estimate of $\sigma $ :

$\stackrel{^}{\sigma}=1.483\text{median}\left\{\left|\left({y}_{ij}-{x}_{ij}^{\text{T}}{\beta}^{*}\right)-\text{median}\left({y}_{ij}-{x}_{ij}^{\text{T}}{\beta}^{*}\right)\right|\right\}$ (2.12)

where ${\beta}^{\mathrm{*}}$ is the current estimate of $\beta $ .
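Equation (2.12) is straightforward to compute; a minimal sketch (the function name is ours):

```python
import numpy as np

def sigma_mad(y, X, beta_star):
    """Robust scale estimate (2.12): 1.483 times the median absolute
    deviation of the residuals at the current estimate beta_star."""
    resid = y - X @ beta_star
    return 1.483 * np.median(np.abs(resid - np.median(resid)))
```

The factor 1.483 makes the estimate consistent for $\sigma $ under normal errors, and the median-based construction is insensitive to a moderate fraction of outliers.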

According to the above description of the correlation matrix, the robust estimation of the parameter $\rho $ depends on the chosen correlation structure. For simplicity, assume ${m}_{i}=m$ . For the exchangeable structure (Exch),

$\stackrel{^}{\rho}=\frac{{\displaystyle {\sum}_{i=1}^{n}{\displaystyle {\sum}_{j>{j}^{\prime}}{\stackrel{^}{r}}_{ij}^{*}{\stackrel{^}{r}}_{i{j}^{\prime}}^{*}}}}{0.5nm\left(m-1\right)-p}$ (2.13)

where ${\stackrel{^}{r}}_{ij}^{*}={\psi}_{c}\left(\frac{{y}_{ij}-{x}_{ij}^{\text{T}}{\beta}^{*}}{\stackrel{^}{\sigma}}\right)$ . For the first-order autoregressive structure (AR(1)), we use the robust moment estimator

$\stackrel{^}{\rho}=\frac{a}{b}-\sqrt{\frac{a}{b}-1}$ (2.14)

where $a={\sum}_{i=1}^{n}{r}_{i}^{*\prime}{T}_{1}{r}_{i}^{*}$ , $b={\sum}_{i=1}^{n}{r}_{i}^{*\prime}{T}_{2}{r}_{i}^{*}$ , ${r}_{i}^{\mathrm{*}}={\left({r}_{i1}^{\mathrm{*}}\mathrm{,}{r}_{i2}^{\mathrm{*}}\mathrm{,}\cdots \mathrm{,}{r}_{im}^{\mathrm{*}}\right)}^{\text{T}}$ , ${T}_{1}$ is the m-order diagonal matrix $diag\left(1,1,\cdots ,1\right)$ and ${T}_{2}$ is the m-order diagonal matrix $diag\left(0,1,\cdots ,1,0\right)$ .
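For instance, the exchangeable-structure estimator (2.13) can be computed as follows (a sketch; the pairwise sum over $j>{j}^{\prime}$ is rewritten with the identity $2{\sum}_{j>{j}^{\prime}}{r}_{j}{r}_{{j}^{\prime}}={\left({\sum}_{j}{r}_{j}\right)}^{2}-{\sum}_{j}{r}_{j}^{2}$):

```python
import numpy as np

def rho_exch(r_star, p):
    """Moment estimator (2.13) of rho under the exchangeable structure.
    r_star is the (n, m) matrix of psi_c-standardized residuals and
    p is the dimension of beta."""
    n, m = r_star.shape
    # sum over j > j' of r_ij * r_ij', per subject, via the pairwise identity
    num = 0.5 * ((r_star.sum(axis=1)) ** 2 - (r_star ** 2).sum(axis=1)).sum()
    return num / (0.5 * n * m * (m - 1) - p)
```

With standardized residuals whose within-subject correlation is truly exchangeable, the estimator recovers $\rho $ up to sampling error.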

3. Algorithm

Since maximizing the empirical likelihood involves numerical difficulties, to solve for ${\stackrel{^}{\beta}}_{\text{RELGEE}}$ we adapt the Newton-type Lagrange multiplier algorithm for constrained optimization proposed by Özdemir and Arslan (2018) [19]. To simplify the computation, and without loss of generality, we stack ${p}_{ij}$ , ${y}_{ij}$ , ${x}_{ij}$ and ${z}_{ij}^{R}\left(\beta \right)$ into column vectors; for example, $\left({p}_{1},{p}_{2},\cdots ,{p}_{N}\right)=\left({p}_{11},\cdots ,{p}_{1{m}_{1}},\cdots ,{p}_{n1},\cdots ,{p}_{n{m}_{n}}\right)$ , and $\left({y}_{1}\mathrm{,}{y}_{2}\mathrm{,}\cdots \mathrm{,}{y}_{N}\right)$ , $\left({z}_{1}^{R}\left(\beta \right)\mathrm{,}{z}_{2}^{R}\left(\beta \right)\mathrm{,}\cdots \mathrm{,}{z}_{N}^{R}\left(\beta \right)\right)$ and $\left({x}_{1}\mathrm{,}{x}_{2}\mathrm{,}\cdots \mathrm{,}{x}_{N}\right)$ are defined analogously. We obtain

${L}_{2}\left(\beta \right)=\mathrm{sup}\left\{{\prod}_{i=1}^{N}{p}_{i}|{\sum}_{i=1}^{N}{p}_{i}=1,{p}_{i}>0,{\sum}_{i=1}^{N}{p}_{i}{x}_{i}{z}_{i}^{R}\left(\beta \right)=0\right\}$ (3.1)

This problem can also be defined as follows

$J\left(p,\beta \right)=\underset{{p}_{i}\in \left(0,1\right)}{\mathrm{min}}\left[-{\displaystyle \underset{i=1}{\overset{N}{\sum}}}\mathrm{ln}{p}_{i}\right]$ (3.2)

under the constraints ${\sum}_{i=1}^{N}{p}_{i}=1$ and ${\sum}_{i=1}^{N}{p}_{i}{x}_{i}{z}_{i}^{R}\left(\beta \right)=0$ .

The Lagrange multiplier method is commonly used to find the extrema of constrained functions. We obtain

$J\left(p,\beta ,{\lambda}_{0},{\lambda}_{1}^{\text{T}}\right)=-{\displaystyle \underset{i=1}{\overset{N}{\sum}}}\mathrm{ln}{p}_{i}+{\lambda}_{0}\left({\displaystyle \underset{i=1}{\overset{N}{\sum}}}\text{\hspace{0.05em}}{p}_{i}-1\right)-{\lambda}_{1}^{\text{T}}{\displaystyle \underset{i=1}{\overset{N}{\sum}}}\text{\hspace{0.05em}}{p}_{i}{x}_{i}{z}_{i}^{R}\left(\beta \right)$ (3.3)

where ${\lambda}_{0}\in {R}^{1}$ and ${\lambda}_{1}\in {R}^{p}$ are Lagrange multipliers. Let

$A=\left[\begin{array}{c}p\\ \beta \end{array}\right],\text{\hspace{1em}}\Lambda =\left[\begin{array}{l}{\lambda}_{0}\hfill \\ {\lambda}_{1}^{\text{T}}\hfill \end{array}\right]$ (3.4)

The first order gradient of Equation (3.3) is

$\nabla J\left(A,\Lambda \right)=\left[\begin{array}{c}\nabla J\left(A\right)+{\left(DG\left(A\right)\right)}^{\text{T}}\Lambda \\ G{\left(A\right)}^{\text{T}}\end{array}\right]$ (3.5)

where

$\nabla J\left(A\right)={\left[-\frac{1}{{p}_{1}}-\frac{1}{{p}_{2}}\cdots -\frac{1}{{p}_{N}}\text{\hspace{1em}}{0}_{1\times p}\right]}^{\text{T}}$ , $G\left(A\right)=\left[{\displaystyle {\sum}_{i=1}^{N}}\text{\hspace{0.05em}}{p}_{i}-1,{\displaystyle {\sum}_{i=1}^{N}}\text{\hspace{0.05em}}{p}_{i}{x}_{i}{z}_{i}^{R}\left(\beta \right)\right]$

The Jacobian matrix of $G\left(A\right)$ with respect to A can be written as

$DG\left(A\right)=\left[\begin{array}{ccccc}1& 1& \cdots & 1& {0}_{1\times p}\\ {x}_{1}{z}_{1}^{R}\left(\beta \right)& {x}_{2}{z}_{2}^{R}\left(\beta \right)& \cdots & {x}_{N}{z}_{N}^{R}\left(\beta \right)& -{\left[{\displaystyle {\sum}_{i=1}^{N}{p}_{i}{x}_{i}{\varphi}_{i}}\right]}^{\text{T}}\end{array}\right]$ (3.6)

where ${\varphi}_{i}=\frac{\partial {z}_{i}^{R}\left(\beta \right)}{\partial \beta}$ .

So the Hessian matrix of Equation (3.3) is

$HL\left(A,\Lambda \right)=\left[\begin{array}{cc}B\left(A,\Lambda \right)& DG{\left(A\right)}^{\text{T}}\\ DG\left(A\right)& {0}_{p\times p}\end{array}\right]$ (3.7)

where $B\left(A,\Lambda \right)=\left[\begin{array}{ccccc}\frac{1}{{p}_{1}^{2}}& 0& \cdots & 0& -{\lambda}_{1}^{\text{T}}{x}_{1}{\varphi}_{1}\\ 0& \frac{1}{{p}_{2}^{2}}& \cdots & 0& -{\lambda}_{1}^{\text{T}}{x}_{2}{\varphi}_{2}\\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0& 0& \cdots & \frac{1}{{p}_{N}^{2}}& -{\lambda}_{1}^{\text{T}}{x}_{N}{\varphi}_{N}\\ -{\left[{\lambda}_{1}^{\text{T}}{x}_{1}{\varphi}_{1}\right]}^{\text{T}}& -{\left[{\lambda}_{1}^{\text{T}}{x}_{2}{\varphi}_{2}\right]}^{\text{T}}& \cdots & -{\left[{\lambda}_{1}^{\text{T}}{x}_{N}{\varphi}_{N}\right]}^{\text{T}}& {0}_{p\times p}\end{array}\right]$ , which is an $\left(N+p\right)\times \left(N+p\right)$ matrix.

By Newton iteration, we can get

$HL\left({A}_{t},{\Lambda}_{t}\right)\left[\begin{array}{l}{A}_{t+1}-{A}_{t}\hfill \\ {\Lambda}_{t+1}-{\Lambda}_{t}\hfill \end{array}\right]+\nabla J\left({A}_{t},{\Lambda}_{t}\right)=0$ (3.8)

where ${A}_{t},{\Lambda}_{t}$ denote the values of $A,\Lambda $ at the tth iteration, and ${A}_{t+1},{\Lambda}_{t+1}$ are computed from ${A}_{t},{\Lambda}_{t}$ .

This gives the iterative expression for ${A}_{t},{\Lambda}_{t}$

$\left[\begin{array}{l}{A}_{t+1}\hfill \\ {\Lambda}_{t+1}\hfill \end{array}\right]=\left[\begin{array}{l}{A}_{t}\hfill \\ {\Lambda}_{t}\hfill \end{array}\right]-HL{\left({A}_{t},{\Lambda}_{t}\right)}^{-1}\nabla J\left({A}_{t},{\Lambda}_{t}\right)$ (3.9)

Summarize the algorithm for estimating the parameter ${\stackrel{^}{\beta}}_{\text{RELGEE}}$ as follows:

Step 1. Set the initial values of $p\mathrm{,}\beta \mathrm{,}{\lambda}_{0}\mathrm{,}{\lambda}_{1}$ and the threshold $\xi $ , where the initial value of $\beta $ can be set to the least squares estimate to speed up convergence.

Step 2. Calculate $HL\left({A}_{t},{\Lambda}_{t}\right)$ and $\nabla J\left({A}_{t},{\Lambda}_{t}\right)$ at the tth iteration.

Step 3. Calculate ${A}_{t+1},{\Lambda}_{t+1}$ according to Iterative Formula (3.9).

Step 4. Let $t\leftarrow t+1$ and repeat Steps 2 and 3 until $\Vert {\beta}_{t+1}-{\beta}_{t}\Vert <\xi $ , which yields the convergent estimate ${\stackrel{^}{\beta}}_{\text{RELGEE}}$ . This completes the solution of the robust empirical likelihood estimator.
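To illustrate the mechanics of Steps 1-4, here is a toy sketch for a scalar $\beta $ with the plain score ${z}_{i}\left(\beta \right)={y}_{i}-{x}_{i}\beta $ standing in for ${z}_{i}^{R}$ , and with the Hessian $HL$ approximated by finite differences rather than the analytic form (3.7); all names are ours. In this exactly identified case the empirical likelihood solution coincides with least squares and ${p}_{i}=1/N$:

```python
import numpy as np

def kkt_grad(v, x, y):
    """Gradient (3.5) of the Lagrangian (3.3) for the toy scalar-beta model.
    v stacks (p_1, ..., p_N, beta, lambda0, lambda1)."""
    N = len(x)
    p, beta, lam0, lam1 = v[:N], v[N], v[N + 1], v[N + 2]
    z = y - x * beta
    g = np.empty(N + 3)
    g[:N] = -1.0 / p + lam0 - lam1 * x * z    # d/dp_i
    g[N] = lam1 * (p * x * x).sum()           # d/dbeta, since dz_i/dbeta = -x_i
    g[N + 1] = p.sum() - 1.0                  # constraint sum p_i = 1
    g[N + 2] = -(p * x * z).sum()             # constraint sum p_i x_i z_i = 0
    return g

def newton_el(x, y, beta0, iters=30, h=1e-6):
    """Newton iteration (3.9) on the stacked KKT system, with HL
    approximated by central finite differences of the gradient."""
    N = len(x)
    v = np.concatenate([np.full(N, 1.0 / N), [beta0, float(N), 0.0]])
    for _ in range(iters):
        g = kkt_grad(v, x, y)
        H = np.empty((N + 3, N + 3))
        for k in range(N + 3):
            e = np.zeros(N + 3)
            e[k] = h
            H[:, k] = (kkt_grad(v + e, x, y) - kkt_grad(v - e, x, y)) / (2 * h)
        v = v - np.linalg.solve(H, g)         # update (3.9)
    return v

rng = np.random.default_rng(1)
x = rng.normal(size=15)
y = 2.0 * x + rng.normal(size=15)
beta_ols = (x @ y) / (x @ x)
v = newton_el(x, y, beta_ols + 0.05)          # start near the LS estimate
beta_hat = v[15]
```

Starting from ${p}_{i}=1/N$ , ${\lambda}_{0}=N$ , ${\lambda}_{1}=0$ keeps the initial gradient small, as suggested in Step 1; a full implementation would use the analytic $HL$ of (3.7) and the stopping rule of Step 4.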

4. Simulation Study

In this section we present a simulation study comparing the estimators obtained by the proposed RELGEE method with those obtained by the ordinary element-wise empirical likelihood method (ELGEE), and explore the finite sample properties of the estimators. The main questions are:

1) The relative efficiency of the estimators when there is no contamination in the data;

2) The robustness of the estimators when the data are contaminated;

3) The effect on estimation efficiency when the working correlation matrix is correctly or incorrectly specified.

The model is set to

${y}_{ij}={x}_{ij}^{\text{T}}\beta +{\epsilon}_{ij},i=1,2,\cdots ,n;j=1,2,3$ (4.1)

where the number of repeated observations is 3, ${x}_{ij}^{\text{T}}=\left({x}_{ij}^{\left(1\right)},{x}_{ij}^{\left(2\right)},{x}_{ij}^{\left(3\right)}\right)$ , ${x}_{ij}\sim N\left({\left(0,0,0\right)}^{\text{T}},{I}_{3}\right)$ , ${\beta}^{\text{T}}=\left(7,1.7,-1.5\right)$ and

${\left({\epsilon}_{i1},{\epsilon}_{i2},{\epsilon}_{i3}\right)}^{\text{T}}\sim N\left({\left(0,0,0\right)}^{\text{T}},R\left(\rho \right)\right)$ .

We consider sample sizes $n=50$ and $n=100$ , and let $R\left(\rho \right)$ take the exchangeable structure (Exch) and the first-order autoregressive structure (AR(1)), with the parameter $\rho $ equal to 0.3 and 0.7 respectively. For each combination of $\rho $ and of true and working correlation matrices, we repeat the simulation 1000 times and compute MSE (×100), i.e. 100 times the mean square error of the estimates. Since there are three parameters, we report the average of the mean square errors of the three parameters.
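The data-generating step can be sketched as follows (the function name is ours; the within-subject correlation is induced through a Cholesky factor of ${\sigma}^{2}R\left(\rho \right)$):

```python
import numpy as np

def make_data(n, beta, R, sigma=1.0, rng=None):
    """Generate one sample from model (4.1): m repeated visits per subject,
    x_ij ~ N(0, I_p) and errors N(0, sigma^2 R(rho)) within each subject."""
    rng = np.random.default_rng() if rng is None else rng
    m, p = R.shape[0], len(beta)
    X = rng.normal(size=(n, m, p))              # covariates
    L = np.linalg.cholesky(sigma ** 2 * R)
    eps = rng.normal(size=(n, m)) @ L.T         # rows have covariance sigma^2 R
    Y = X @ beta + eps
    return X, Y
```

With $\beta ={\left(7,1.7,-1.5\right)}^{\text{T}}$ and an AR(1) matrix $R\left(0.3\right)$ , the empirical correlations of the residuals $Y-X\beta $ reproduce 0.3 and 0.09 up to sampling error.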

To study question (1), we compare the mean squared errors of the proposed method (RELGEE) and the ordinary element-wise empirical likelihood method (ELGEE) when there is no contamination. The simulation results are shown in Table 1.

When there is no contamination, robust methods lose some information through their robust treatment of the data, so the efficiency of a robust estimator is usually lower than that of a nonrobust estimator. Table 1 shows that the mean square error of the RELGEE estimator is only slightly larger than that of the ELGEE estimator, so the proposed method remains efficient even without contamination.

To explore questions (2) and (3), we designed three contamination schemes:

Table 1. Comparison of Two Estimators under Non-pollution Conditions.

WR: Working R, TR: True R.

(C1). Contamination of the response variable Y: randomly turn S% of the ${y}_{ij}$ into ${y}_{ij}+b$ , $b\sim N\left(2,1\right)$ .

(C2). Contamination of the covariate X only: randomly turn S% of the ${x}_{ij}$ into ${x}_{ij}+a$ , $a\sim N\left({\left(1,1,1\right)}^{\text{T}},{I}_{3}\right)$ .

(C3). Simultaneous contamination of X and Y: randomly turn S%/3 of the ${y}_{ij}$ into ${y}_{ij}+b$ , $b\sim N\left(2,1\right)$ , and S%/3 of the ${x}_{ij}$ into ${x}_{ij}+a$ , $a\sim N\left({\left(1,1,1\right)}^{\text{T}},{I}_{3}\right)$ .

where S% is the contamination rate; in this paper, 0.06 and 0.1 are used. Some simulation results are shown in Tables 2-4.
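Scheme C1, for example, can be implemented as below (a sketch with an illustrative function name; each observation is contaminated independently with probability S%, and C2/C3 are analogous):

```python
import numpy as np

def contaminate_y(Y, s, rng):
    """Contamination scheme C1: each y_ij is replaced by y_ij + b,
    with b ~ N(2, 1), independently with probability s (the rate S%)."""
    Yc = Y.copy()
    hit = rng.random(Y.shape) < s               # Bernoulli(s) mask per cell
    Yc[hit] += rng.normal(loc=2.0, scale=1.0, size=Y.shape)[hit]
    return Yc
```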

Table 2 reports the simulation results under C1. By comparison, in most cases, the

Table 2. Comparison of Two Estimators under C1.

YP: Contamination probability of the response variable Y.

Table 3. Comparison of Two Estimators under C2.

XP: Contamination probability of covariate X.

Table 4. Comparison of Two Estimators under C3.

YXP: Contamination probability of the response variable Y when the contamination probability of covariate X is 0.06.

mean square error of the proposed estimator is smaller than that of the ELGEE estimator. Comparing different contamination intensities, we see that the greater the contamination, the more obvious the efficiency advantage of the robust estimator, which shows that the proposed method is strongly robust. It can also be seen that when the working matrix is specified incorrectly, the difference between the estimators is relatively small; when the working correlation matrix equals the true matrix, the estimation efficiency is highest; and when the working matrix is the independence structure (Ind), which ignores the correlation in the data, the estimation efficiency is lowest.

Table 3 and Table 4 give part of the results under contamination schemes C2 and C3. Similar to Table 2, in most cases the mean square error of the proposed estimator is smaller than that of the ELGEE estimator. Compared with Table 2, when the contamination intensity increases markedly, the mean square error of the RELGEE estimator is only about one-tenth of that of the ELGEE estimator, so the proposed method can significantly reduce the impact of outliers on the estimator. Again, the estimation efficiency is highest when the working correlation matrix equals the true matrix; when there is within-subject correlation in the model, the efficiency is lowest if this correlation is ignored, which reflects the necessity of modeling longitudinal data appropriately.

It is worth adding that, because the proposed method is non-parametric, the random errors need not follow a normal distribution. We therefore repeat the simulation with random errors following a t(3) distribution. The results are shown in Table 5, where Bias (×100) denotes 100 times the bias.

In Table 5, in all cases the mean square error of the proposed estimator is smaller than that of the ELGEE estimator, which shows that the proposed method can also estimate data from heavy-tailed distributions effectively and robustly.

5. Summary

We introduced the generalized estimating equations commonly used for longitudinal data and derived robust estimating functions. We then combined the robust estimating equations with the element-wise empirical likelihood method to obtain the empirical likelihood ratio function of the parameters. We also presented a

Table 5. The error distribution of data satisfies t(3).

relatively efficient algorithm that reduces the computational cost, and conducted a systematic simulation study. The simulation results show that our method maintains high estimation efficiency when the data are not contaminated, while under contamination the proposed estimator is clearly better than the nonrobust estimator; as the contamination intensity increases, the robustness of our method becomes more pronounced, showing significant resistance to outliers. When the working matrix is misspecified, the difference between the estimators is relatively small; the estimation efficiency is highest when the working correlation matrix equals the true matrix, and lowest when the working matrix is the independence structure (Ind), which ignores the correlation in the data. Moreover, since the proposed estimator is based on the empirical likelihood method, it is suitable for longitudinal data with heavy-tailed distributions. Many problems remain for further study, such as applying the estimation method to partially linear models and variable selection based on robust estimation.

References

[1] Liang, K.Y. and Zeger, S.L. (1986) Longitudinal Data Analysis Using Generalized Linear Models. Biometrika, 73, 13-22.

https://doi.org/10.1093/biomet/73.1.13

[2] Diggle, P., Diggle, P.J., Heagerty, P., et al. (2002) Analysis of Longitudinal Data. Oxford University Press, Oxford.

[3] Fan, Y.L., Qin, G.Y. and Zhu, Z.Y. (2012) Variable Selection in Robust Regression Models for Longitudinal Data. Journal of Multivariate Analysis, 109, 156-167.

https://doi.org/10.1016/j.jmva.2012.03.007

[4] Huber, P.J. (1964) Robust Estimation of a Location Parameter. The Annals of Mathematical Statistics, 35, 73-101.

https://doi.org/10.1214/aoms/1177703732

[5] Wang, X., Jiang, Y., Huang, M. and Zhang, H.P. (2013) Robust Variable Selection with Exponential Squared Loss. Journal of the American Statistical Association, 108, 632-643.

https://doi.org/10.1080/01621459.2013.766613

[6] Lv, J., Yang, H. and Guo, C. (2015) An Efficient and Robust Variable Selection Method for Longitudinal Generalized Linear Models. Computational Statistics & Data Analysis, 82, 74-88.

https://doi.org/10.1016/j.csda.2014.08.006

[7] Qin, G.Y., Zhu, Z.Y. and Fung, W.K. (2009) Robust Estimation of Covariance Parameters in Partial Linear Model for Longitudinal Data. Journal of Statistical Planning and Inference, 139, 558-570.

https://doi.org/10.1016/j.jspi.2008.03.042

[8] Wang, Y.G., Lin, X. and Zhu, M. (2005) Robust Estimating Functions and Bias Correction for Longitudinal Data Analysis. Biometrics, 61, 684-691.

https://doi.org/10.1111/j.1541-0420.2005.00354.x

[9] Qin, G.Y. and Zhu, Z.Y. (2007) Robust Estimation in Generalized Semiparametric Mixed Models for Longitudinal Data. Journal of Multivariate Analysis, 98, 1658-1683.

https://doi.org/10.1016/j.jmva.2007.01.006

[10] Zheng, X., Fung, W.K. and Zhu, Z. (2013) Robust Estimation in Joint Mean-Covariance Regression Model for Longitudinal Data. Annals of the Institute of Statistical Mathematics, 65, 617-638.

https://doi.org/10.1007/s10463-012-0383-8

[11] Owen, A.B. (1988) Empirical Likelihood Ratio Confidence Intervals for a Single Functional. Biometrika, 75, 237-249.

https://doi.org/10.1093/biomet/75.2.237

[12] Azzalini, A. (2017) Statistical Inference Based on the Likelihood. Routledge, Abingdon-on-Thames.

[13] Qin, J. and Lawless, J. (1994) Empirical Likelihood and General Estimating Equations. The Annals of Statistics, 22, 300-325.

https://doi.org/10.1214/aos/1176325370

[14] Bondell, H.D. and Stefanski, L.A. (2013) Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood. Journal of the American Statistical Association, 108, 644-655.

https://doi.org/10.1080/01621459.2013.779847

[15] Bai, Y., Fung, W.K. and Zhu, Z. (2010) Weighted Empirical Likelihood for Generalized Linear Models with Longitudinal Data. Journal of Statistical Planning and Inference, 140, 3446-3456.

https://doi.org/10.1016/j.jspi.2010.05.007

[16] Wang, S., Qian, L. and Carroll, R.J. (2010) Generalized Empirical Likelihood Methods for Analyzing Longitudinal Data. Biometrika, 97, 79-93.

https://doi.org/10.1093/biomet/asp073

[17] Li, S.M., Ren, Y.Y. and Lu, G. (2018) A Robust Estimation and Empirical Likelihood Inference for Longitudinal Data Models. Journal of Applied Statistics and Management, 4, 631-638.

[18] He, X., Fung, W.K. and Zhu, Z. (2005) Robust Estimation in Generalized Partial Linear Models for Clustered Data. Journal of the American Statistical Association, 100, 1176-1184.

https://doi.org/10.1198/016214505000000277

[19] Özdemir, Ş. and Arslan, O. (2018) An Alternative Algorithm of the Empirical Likelihood Estimation for the Parameter of a Linear Regression Model. Communications in Statistics - Simulation and Computation, 1-9.

https://doi.org/10.1080/03610918.2018.1435801