 OJS  Vol.10 No.6 , December 2020
A New Regression Type Estimator and Its Application in Survey Sampling
Abstract: A large number of modified estimators have been proposed in recent years to improve efficiency. In this study, we suggest an alternative regression type estimator for estimating a finite population mean when the study variable and the auxiliary variable are either positively or negatively correlated. We obtain the bias and mean square error of the proposed estimator to the first order of approximation and derive the theoretical conditions under which the proposed estimator is more efficient than the simple random sampling mean estimator, the product estimator and the ratio estimator. These conditions are supported by a numerical example, in which the proposed estimator performs better than the usual simple random sampling mean estimator, the ratio estimator and the product estimator.

1. Introduction

The use of auxiliary information to improve the precision of estimates has become widespread in survey sampling, and many authors have worked on it. The use of a transformed auxiliary variable in estimating a finite population mean is presented in [1]. The use of two auxiliary variables in estimation was introduced by [2]. A modified ratio estimator for the finite population mean when the median of the auxiliary variable is available is given in [3]; this modified estimator shows the significant role of auxiliary information. The cum-dual ratio estimator was shown in [4] to be more efficient than other existing estimators. A new modified product estimator performs better than other existing estimators when the median of the auxiliary variable is known [5].

The regression estimator is important in estimation theory because of its improved precision, and it has been widely used in estimating population parameters. It is useful in many practical situations, including business, economics, time series forecasting and agriculture [6]. Before the regression estimator can be applied, the regression parameter needs to be estimated. In this study, we propose a new regression type estimator in which the correlation coefficient is used instead of the regression parameter. In many practical situations, population correlation coefficients are known [7].

The structure of the paper is as follows. Section 2 introduces the proposed estimator and derives its bias and mean square error. Section 3 compares the efficiency of the proposed estimator with that of the ratio, product and simple random sampling mean estimators. Section 4 presents a simulation study of the proposed estimator. Section 5 applies the proposed estimator to a real-life example. Finally, Section 6 gives the conclusion of our study.

2. Proposed Estimator

In this section we propose our new estimator and derive its bias and mean square error.

Consider a finite population $U = (u_1, u_2, \ldots, u_N)$ of $N$ units. Let $y$ and $x$ denote the study variable and the auxiliary variable, respectively. To estimate the population mean $\bar{Y}$, a sample of size $n$ ($n < N$) is drawn from the population $U$ using simple random sampling without replacement (SRSWOR).

Here the population mean is $\bar{Y} = \frac{1}{N}\sum_{i=1}^{N} y_i$.

Hansen, Hurwitz and Madow (1953) consider the linear regression estimator of the population mean $\bar{Y}$,

$$\bar{y}_{LR} = \bar{y} + \hat{\beta}\,(\bar{X} - \bar{x}),$$

where $\hat{\beta} = \dfrac{s_{xy}}{s_x^2}$ is an estimate of the regression coefficient $\beta = \dfrac{S_{xy}}{S_x^2}$.

The new regression type estimator for estimating the population mean $\bar{Y}$ is defined as

$$\bar{y}_z = \bar{y} + r\,(\bar{X} - \bar{x}) \qquad (1)$$

where $r = \dfrac{s_{xy}}{s_x s_y}$ is an estimate of the correlation coefficient $\rho = \dfrac{S_{xy}}{S_x S_y}$.
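The two estimators above differ only in the multiplier applied to $(\bar{X} - \bar{x})$. The following is a minimal Python sketch of both, assuming the sample is given as two equal-length lists and the population mean of $x$ is known; the function names and the small data set are illustrative and do not come from the paper.

```python
from statistics import mean, stdev


def sample_covariance(x, y):
    """Sample covariance s_xy with divisor n - 1."""
    n = len(x)
    xb, yb = mean(x), mean(y)
    return sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y)) / (n - 1)


def regression_estimator(x, y, pop_mean_x):
    """Classical linear regression estimator: ybar + beta_hat * (Xbar - xbar)."""
    beta_hat = sample_covariance(x, y) / stdev(x) ** 2
    return mean(y) + beta_hat * (pop_mean_x - mean(x))


def proposed_estimator(x, y, pop_mean_x):
    """Regression type estimator of Equation (1): ybar + r * (Xbar - xbar)."""
    r = sample_covariance(x, y) / (stdev(x) * stdev(y))
    return mean(y) + r * (pop_mean_x - mean(x))


# Made-up sample for illustration only (not data from the paper).
x = [2.0, 4.0, 6.0, 8.0]
y = [1.0, 3.0, 2.0, 5.0]
print(regression_estimator(x, y, pop_mean_x=6.0))
print(proposed_estimator(x, y, pop_mean_x=6.0))
```

When the sample mean of $x$ happens to equal $\bar{X}$, both functions return the plain sample mean of $y$, since the correction term vanishes.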

To calculate the bias and mean square error of the new regression type estimator, we work to the first order of approximation.

Let us define [6]

$$e_0 = \left(\frac{\bar{y}}{\bar{Y}} - 1\right),\quad e_1 = \left(\frac{\bar{x}}{\bar{X}} - 1\right),\quad e_2 = \left(\frac{s_y^2}{S_y^2} - 1\right),\quad e_3 = \left(\frac{s_x^2}{S_x^2} - 1\right),\quad e_4 = \left(\frac{s_{xy}}{S_{xy}} - 1\right).$$

This implies that

$$\bar{y} = \bar{Y}(1 + e_0),\quad \bar{x} = \bar{X}(1 + e_1),\quad s_y = S_y(1 + e_2)^{1/2},\quad s_x = S_x(1 + e_3)^{1/2},\quad s_{xy} = S_{xy}(1 + e_4).$$

$$E(e_0) = E(e_1) = E(e_2) = E(e_4) = 0.$$

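$$E(e_0^2) = \left(\frac{1-f}{n}\right) C_y^2,\quad E(e_1^2) = \left(\frac{1-f}{n}\right) C_x^2,\quad E(e_0 e_1) = \left(\frac{1-f}{n}\right) C_x C_y \rho_{xy}.$$

$$E(e_1 e_2) = \left(\frac{1-f}{n}\right) C_x \lambda_{21},\quad E(e_1 e_3) = \left(\frac{1-f}{n}\right) C_x \lambda_{03},\quad E(e_1 e_4) = \left(\frac{1-f}{n}\right) C_x \frac{\lambda_{12}}{\rho_{xy}}.$$

$$f = \frac{n}{N},\quad C_y^2 = \frac{S_y^2}{\bar{Y}^2},\quad C_x^2 = \frac{S_x^2}{\bar{X}^2},\quad \rho = \frac{S_{xy}}{S_x S_y}.$$

$$S_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{X})^2,\quad S_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})^2,\quad S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{Y})(x_i - \bar{X}).$$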

Theorem 1. The bias of the proposed estimator $\bar{y}_z$ of the population mean $\bar{Y}$ is given by

$$\mathrm{Bias}(\bar{y}_z) = \frac{1}{2}\left(\frac{1-f}{n}\right) S_x \left(\rho\lambda_{21} + \rho\lambda_{03} - 2\lambda_{12}\right).$$

Proof:

From Equation (1) we can write

$$\bar{y}_z = \bar{y} + r\,(\bar{X} - \bar{x})$$

$$\bar{y}_z = \bar{Y}(1 + e_0) + r\,\{\bar{X} - \bar{X}(1 + e_1)\}$$

$$\bar{y}_z = \bar{Y} + \bar{Y}e_0 + r\,(\bar{X} - \bar{X} - \bar{X}e_1)$$

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - r\bar{X}e_1$$

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \frac{s_{xy}}{s_x s_y}\,\bar{X}e_1$$

In terms of $e_0$, $e_1$, $e_2$, $e_3$ and $e_4$, the proposed estimator $\bar{y}_z$ can be written as

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \frac{S_{xy}(1 + e_4)}{S_x(1 + e_3)^{1/2}\,S_y(1 + e_2)^{1/2}}\,\bar{X}e_1$$

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \frac{S_{xy}}{S_x S_y}\,\bar{X}e_1\,(1 + e_4)(1 + e_3)^{-1/2}(1 + e_2)^{-1/2} \qquad (2)$$

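Expanding the negative powers by the binomial series and neglecting terms of degree higher than two in the $e_i$ (the first order of approximation), Equation (2) can be written as

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \rho\bar{X}e_1\,(1 + e_4)\left(1 - \frac{e_2}{2} + \frac{3}{8}e_2^2\right)\left(1 - \frac{e_3}{2} + \frac{3}{8}e_3^2\right)$$

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \rho\bar{X}\,(e_1 + e_1 e_4)\left(1 - \frac{e_2}{2} + \frac{3}{8}e_2^2\right)\left(1 - \frac{e_3}{2} + \frac{3}{8}e_3^2\right)$$

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \rho\bar{X}\left(e_1 - \frac{e_1 e_2}{2} - \frac{e_1 e_3}{2} + e_1 e_4\right) \qquad (3)$$

Expanding the bracket, Equation (3) can be expressed as

$$\bar{y}_z - \bar{Y} = \bar{Y}e_0 - \rho\bar{X}e_1 + \rho\bar{X}\frac{e_1 e_2}{2} + \rho\bar{X}\frac{e_1 e_3}{2} - \rho\bar{X}e_1 e_4 \qquad (4)$$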

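Taking the expectation of both sides of Equation (4), we get

$$E(\bar{y}_z - \bar{Y}) = \bar{Y}E(e_0) - \rho\bar{X}E(e_1) + \frac{1}{2}\rho\bar{X}E(e_1 e_2) + \frac{1}{2}\rho\bar{X}E(e_1 e_3) - \rho\bar{X}E(e_1 e_4)$$

$$E(\bar{y}_z - \bar{Y}) = 0 - 0 + \frac{1}{2}\rho\bar{X}\left(\frac{1-f}{n}\right)C_x\lambda_{21} + \frac{1}{2}\rho\bar{X}\left(\frac{1-f}{n}\right)C_x\lambda_{03} - \rho\bar{X}\left(\frac{1-f}{n}\right)C_x\frac{\lambda_{12}}{\rho}$$

$$E(\bar{y}_z - \bar{Y}) = \left(\frac{1-f}{n}\right)\left(\frac{1}{2}\rho\bar{X}C_x\lambda_{21} + \frac{1}{2}\rho\bar{X}C_x\lambda_{03} - \bar{X}C_x\lambda_{12}\right)$$

$$\mathrm{Bias}(\bar{y}_z) = \left(\frac{1-f}{n}\right)\bar{X}C_x\left(\frac{1}{2}\rho\lambda_{21} + \frac{1}{2}\rho\lambda_{03} - \lambda_{12}\right).$$

Since $\bar{X}C_x = S_x$, this simplifies to the stated bias of the estimator $\bar{y}_z$, which proves the theorem.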

Theorem 2. The mean squared error of the proposed estimator $\bar{y}_z$, to the first order of approximation, is

$$\mathrm{MSE}(\bar{y}_z) = \left(\frac{1-f}{n}\right)\left(\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y\right).$$

Proof:

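Squaring both sides of Equation (3) gives

$$(\bar{y}_z - \bar{Y})^2 = \left\{\bar{Y}e_0 - \rho\bar{X}\left(e_1 - \frac{e_1 e_2}{2} - \frac{e_1 e_3}{2} + e_1 e_4\right)\right\}^2$$

$$(\bar{y}_z - \bar{Y})^2 = (\bar{Y}e_0)^2 + \rho^2\bar{X}^2\left(e_1 - \frac{e_1 e_2}{2} - \frac{e_1 e_3}{2} + e_1 e_4\right)^2 - 2\bar{Y}e_0\,\rho\bar{X}\left(e_1 - \frac{e_1 e_2}{2} - \frac{e_1 e_3}{2} + e_1 e_4\right)$$

Neglecting terms of degree higher than two in the $e_i$ and then taking expectations,

$$(\bar{y}_z - \bar{Y})^2 = \bar{Y}^2 e_0^2 + \rho^2\bar{X}^2 e_1^2 - 2\bar{Y}e_0\,\rho\bar{X}e_1$$

$$E(\bar{y}_z - \bar{Y})^2 = \bar{Y}^2 E(e_0^2) + \rho^2\bar{X}^2 E(e_1^2) - 2\rho\bar{Y}\bar{X}E(e_0 e_1)$$

By the definition of mean squared error (MSE), we have

$$\mathrm{MSE}(\bar{y}_z) = E(\bar{y}_z - \bar{Y})^2 = \bar{Y}^2 E(e_0^2) + \rho^2\bar{X}^2 E(e_1^2) - 2\rho\bar{Y}\bar{X}E(e_0 e_1)$$

$$\mathrm{MSE}(\bar{y}_z) = \bar{Y}^2\left(\frac{1-f}{n}\right)C_y^2 + \rho^2\bar{X}^2\left(\frac{1-f}{n}\right)C_x^2 - 2\rho\bar{X}\bar{Y}\left(\frac{1-f}{n}\right)\rho C_x C_y$$

$$\mathrm{MSE}(\bar{y}_z) = \left(\frac{1-f}{n}\right)\left(\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y\right).$$

This proves the theorem.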
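The MSE expression of Theorem 2 can be evaluated directly once the population quantities are known. The sketch below is a hedged illustration: the function names and the numerical inputs are hypothetical, and the second function only restates the same formula using $S_y = \bar{Y}C_y$ and $S_x = \bar{X}C_x$ (which assumes positive population means).

```python
import math


def mse_proposed(n, N, Y_bar, X_bar, C_y, C_x, rho):
    """First-order MSE of the proposed estimator, as written in Theorem 2."""
    theta = (1 - n / N) / n  # (1 - f) / n with f = n / N
    return theta * (
        Y_bar**2 * C_y**2
        + rho**2 * X_bar**2 * C_x**2
        - 2 * rho**2 * X_bar * Y_bar * C_x * C_y
    )


def mse_proposed_sd(n, N, S_y, S_x, rho):
    """Same expression rewritten with S_y = Y_bar * C_y and S_x = X_bar * C_x."""
    theta = (1 - n / N) / n
    return theta * (S_y**2 + rho**2 * S_x**2 - 2 * rho**2 * S_x * S_y)


# Hypothetical inputs, chosen only so that both forms can be compared.
a = mse_proposed(n=20, N=100, Y_bar=50.0, X_bar=40.0, C_y=0.2, C_x=0.25, rho=0.6)
b = mse_proposed_sd(n=20, N=100, S_y=10.0, S_x=10.0, rho=0.6)
print(a, b, math.isclose(a, b))
```

Writing the formula in terms of $S_x$ and $S_y$ makes it easier to see how the MSE depends on the two standard deviations and on $\rho^2$ alone.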

3. Efficiency Comparison

This section gives the conditions under which the mean square error of the proposed regression type estimator is smaller than that of the simple random sampling mean estimator, the product estimator and the ratio estimator.

Theorem 3. The proposed estimator $\bar{y}_z$ is more efficient than the product estimator $\bar{y}_p$ if

$$\rho < 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} - 2\bar{Y}^2 C_x C_y\right) > 0,$$

or

$$\rho > 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} - 2\bar{Y}^2 C_x C_y\right) < 0.$$

Proof:

The proposed estimator $\bar{y}_z$ is more efficient than the product estimator $\bar{y}_p$ if

$$\mathrm{MSE}(\bar{y}_z) < \mathrm{MSE}(\bar{y}_p)$$

$$\left(\frac{1-f}{n}\right)\left(\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y\right) < \left(\frac{1-f}{n}\right)\bar{Y}^2\left(C_y^2 + C_x^2 + 2\rho C_x C_y\right)$$

$$\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - \bar{Y}^2 C_y^2 - \bar{Y}^2 C_x^2 - 2\bar{Y}^2\rho C_x C_y < 0$$

$$\rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - \bar{Y}^2 C_x^2 - 2\bar{Y}^2\rho C_x C_y < 0$$

$$\rho\left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} - 2\bar{Y}^2 C_x C_y\right) < 0$$

Now there are two cases:

Case 1. The inequality will be satisfied if

$$\rho < 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} - 2\bar{Y}^2 C_x C_y\right) > 0.$$

This condition holds in practice.

Case 2. The inequality will be satisfied if

$$\rho > 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} - 2\bar{Y}^2 C_x C_y\right) < 0.$$

This condition holds in practice.

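Theorem 4. The estimator $\bar{y}_z$ is more efficient than the ratio estimator $\bar{y}_R$ if

$$\rho < 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} + 2\bar{Y}^2 C_x C_y\right) > 0,$$

or

$$\rho > 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} + 2\bar{Y}^2 C_x C_y\right) < 0.$$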

Proof:

The proposed estimator $\bar{y}_z$ is more efficient than the ratio estimator $\bar{y}_R$ if

$$\mathrm{MSE}(\bar{y}_z) < \mathrm{MSE}(\bar{y}_R)$$

$$\left(\frac{1-f}{n}\right)\left(\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y\right) < \left(\frac{1-f}{n}\right)\bar{Y}^2\left(C_y^2 + C_x^2 - 2\rho C_x C_y\right)$$

$$\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - \bar{Y}^2 C_y^2 - \bar{Y}^2 C_x^2 + 2\rho\bar{Y}^2 C_x C_y < 0$$

$$\rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - \bar{Y}^2 C_x^2 + 2\rho\bar{Y}^2 C_x C_y < 0$$

$$\rho\left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} + 2\bar{Y}^2 C_x C_y\right) < 0$$

Now there are two cases:

Case 1. The inequality will be satisfied if

$$\rho < 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} + 2\bar{Y}^2 C_x C_y\right) > 0.$$

This condition holds in practice.

Case 2. The inequality will be satisfied if

$$\rho > 0 \ \text{and}\ \left(\rho\bar{X}^2 C_x^2 - 2\rho\bar{X}\bar{Y}C_x C_y - \frac{\bar{Y}^2 C_x^2}{\rho} + 2\bar{Y}^2 C_x C_y\right) < 0.$$

This condition holds in practice.

Theorem 5. The estimator $\bar{y}_z$ is more efficient than the simple random sampling mean estimator $\bar{y}$ if

$$\rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y < 0.$$

Proof:

The proposed estimator $\bar{y}_z$ is more efficient than the simple random sampling mean estimator $\bar{y}$ if

$$\mathrm{MSE}(\bar{y}_z) - \mathrm{MSE}(\bar{y}) < 0$$

$$\left(\frac{1-f}{n}\right)\left(\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y\right) - \left(\frac{1-f}{n}\right)\bar{Y}^2 C_y^2 < 0$$

$$\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - \bar{Y}^2 C_y^2 < 0$$

$$\rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y < 0.$$

This condition holds in practice.
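As a worked simplification (added here for illustration, not stated in the original text), the condition of Theorem 5 can be rewritten using $\bar{X}C_x = S_x$ and $\bar{Y}C_y = S_y$, which follow from the definitions of $C_x$ and $C_y$ when both population means are positive:

```latex
\rho^{2}\bar{X}^{2}C_x^{2} - 2\rho^{2}\bar{X}\bar{Y}C_xC_y
  = \rho^{2}S_x^{2} - 2\rho^{2}S_xS_y
  = \rho^{2}S_x\,(S_x - 2S_y) < 0
  \iff \rho \neq 0 \ \text{and}\ S_x < 2S_y .
```

Under that assumption, the comparison with the sample mean depends only on whether the standard deviation of the auxiliary variable is less than twice that of the study variable, provided the two variables are correlated.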

Theorem 6. The estimator $\bar{y}_z$ is more efficient than the simple linear regression estimator $\bar{y}_{LR}$ if

$$\bar{X}^2 C_x^2 - 2\bar{X}\bar{Y}C_x C_y + S_y^2 < 0.$$

Proof:

The proposed estimator $\bar{y}_z$ is more efficient than the simple linear regression estimator $\bar{y}_{LR}$ if

$$\mathrm{MSE}(\bar{y}_z) - \mathrm{MSE}(\bar{y}_{LR}) < 0$$

$$\left(\frac{1-f}{n}\right)\left(\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y\right) - \left(\frac{1-f}{n}\right)S_y^2(1 - \rho^2) < 0$$

$$\bar{Y}^2 C_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - S_y^2(1 - \rho^2) < 0$$

$$\bar{Y}^2\frac{S_y^2}{\bar{Y}^2} + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - S_y^2 + \rho^2 S_y^2 < 0$$

$$S_y^2 + \rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y - S_y^2 + \rho^2 S_y^2 < 0$$

$$\rho^2\bar{X}^2 C_x^2 - 2\rho^2\bar{X}\bar{Y}C_x C_y + \rho^2 S_y^2 < 0$$

$$\bar{X}^2 C_x^2 - 2\bar{X}\bar{Y}C_x C_y + S_y^2 < 0.$$

This condition does not hold in practice.
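One way to see why the condition of Theorem 6 fails, offered as an illustrative step under the assumption that both population means are positive (so that $\bar{X}C_x = S_x$ and $\bar{Y}C_y = S_y$), is to note that its left-hand side is a perfect square:

```latex
\bar{X}^{2}C_x^{2} - 2\bar{X}\bar{Y}C_xC_y + S_y^{2}
  = S_x^{2} - 2S_xS_y + S_y^{2}
  = (S_x - S_y)^{2} \;\ge\; 0 .
```

A square cannot be strictly negative, so to the first order of approximation the proposed estimator is never strictly more efficient than the linear regression estimator; the two MSEs coincide only when $S_x = S_y$ or $\rho = 0$.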

4. Simulation Study

In this section, we discuss a simulation study for the proposed estimator. For this purpose, we generate data from the standard normal distribution. The steps for generating data from the normal distribution and estimating the relative efficiency of the different estimators with respect to the simple random sampling estimator are given below:

Step 1: Choose a random sample of size n.

Step 2: Generate normal random variable with specified mean and variance.

Step 3: Calculate mean square error and relative efficiency of considered estimator.

Step 4: Repeat Steps 1 to 3 until the desired result is obtained.
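The steps above can be sketched in Python as follows. This is only one possible reading of the procedure, not the authors' code: the population size, means, standard deviations, correlation, number of replications and the function name `simulate` are all assumptions, and the ratio and product estimators are taken in their standard forms $\bar{y}\bar{X}/\bar{x}$ and $\bar{y}\bar{x}/\bar{X}$.

```python
import random
from statistics import mean, stdev


def simulate(N=500, n=30, rho=0.6, reps=2000, seed=1):
    """Empirical MSE and relative efficiency (%) of four estimators of Y_bar."""
    rng = random.Random(seed)
    # Assumed population: correlated normal pairs with arbitrary means and SDs.
    xs, ys = [], []
    for _ in range(N):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(50 + 10 * z1)
        ys.append(60 + 10 * (rho * z1 + (1 - rho**2) ** 0.5 * z2))
    X_bar, Y_bar = mean(xs), mean(ys)

    sq_err = {"mean": 0.0, "ratio": 0.0, "product": 0.0, "proposed": 0.0}
    for _ in range(reps):
        idx = rng.sample(range(N), n)  # simple random sample without replacement
        x = [xs[i] for i in idx]
        y = [ys[i] for i in idx]
        xb, yb = mean(x), mean(y)
        s_xy = sum((a - xb) * (b - yb) for a, b in zip(x, y)) / (n - 1)
        r = s_xy / (stdev(x) * stdev(y))
        estimates = {
            "mean": yb,
            "ratio": yb * X_bar / xb,
            "product": yb * xb / X_bar,
            "proposed": yb + r * (X_bar - xb),  # Equation (1)
        }
        for name, value in estimates.items():
            sq_err[name] += (value - Y_bar) ** 2

    mse = {name: total / reps for name, total in sq_err.items()}
    rel_eff = {name: 100 * mse["mean"] / value for name, value in mse.items()}
    return mse, rel_eff


mse, rel_eff = simulate()
for name in mse:
    print(f"{name:9s} MSE = {mse[name]:.4f}  RE = {rel_eff[name]:.1f}%")
```

Relative efficiency is computed with respect to the sample mean, so the sample mean itself always scores 100%. Results from this sketch depend on the assumed population and need not match Table 1.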

Table 1 provides the estimated values of the mean square error and relative efficiency of the proposed estimator and the other estimators for the simulated data.

These results indicate that the proposed estimator is more efficient than the other estimators for the simulated data.

5. Real Life Example

To illustrate the applicability of the proposed estimator relative to the other estimators on a real data set, we used data from [6]: the amounts of real estate and non-real estate farm loans in different states during 1997. The description and summary statistics of the population are as follows:

y = Amount (in $000) of real estate farm loans in different states during 1997

x = Amount (in $000) of non-real estate farm loans in different states during 1997

$N = 50$, $\bar{x} = 964.70$, $\bar{y} = 582.58$, $n = 30$, $s_x^2 = 1145801$, $s_y^2 = 432636.7$, $c_x^2 = 1.2311$, $c_y^2 = 1.2747$, $\rho = 0.1222$, $f = 0.6$.
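The first-order MSE formulas used in Section 3 can be evaluated on these summary statistics. The sketch below treats the reported means, variances and correlation as if they were the population quantities in those formulas, which is an assumption made here for illustration; the variable names are hypothetical and the output is not claimed to reproduce Table 2.

```python
import math

# Summary statistics as reported in the text, treated as population values.
N, n = 50, 30
X_bar, Y_bar = 964.70, 582.58
S2_x, S2_y = 1145801.0, 432636.7
rho = 0.1222

theta = (1 - n / N) / n  # (1 - f) / n with f = n / N = 0.6
S_x, S_y = math.sqrt(S2_x), math.sqrt(S2_y)
C_x, C_y = S_x / X_bar, S_y / Y_bar

# First-order MSE expressions used in the efficiency comparison.
mse = {
    "mean": theta * S2_y,
    "ratio": theta * Y_bar**2 * (C_y**2 + C_x**2 - 2 * rho * C_x * C_y),
    "product": theta * Y_bar**2 * (C_y**2 + C_x**2 + 2 * rho * C_x * C_y),
    "proposed": theta * (S2_y + rho**2 * S2_x - 2 * rho**2 * S_x * S_y),
}

for name, value in mse.items():
    rel_eff = 100 * mse["mean"] / value  # relative efficiency w.r.t. the mean
    print(f"{name:9s} MSE = {value:10.2f}  RE = {rel_eff:6.2f}%")
```

The bias is left out of this sketch because it requires the moment ratios $\lambda_{21}$, $\lambda_{03}$ and $\lambda_{12}$, which are not listed among the summary statistics above.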

Table 2 shows the estimated values of the bias, mean square error and relative efficiency of the proposed estimator and the other estimators for the real data.

Figure 1 shows that the bias and mean square error of the proposed estimator are smaller than those of the simple random sampling mean, product and ratio estimators. To examine the efficiency of the proposed estimator $\bar{y}_z$ over the estimators $\bar{y}$, $\bar{y}_R$ and $\bar{y}_p$, we calculated the percentage relative efficiency of each estimator of $\bar{Y}$ with respect to the usual estimator $\bar{y}$; the results are given in Table 1, Table 2 and Figure 2. The proposed estimator $\bar{y}_z$ performed better than the other estimators.

Figure 1. Bias and mean square error of proposed and existing estimators.

Figure 2. Relative efficiency of proposed and existing estimators.

Table 1. Mean square error and relative efficiency of the proposed estimator and the other estimators of $\bar{Y}$ with respect to the usual estimator $\bar{y}$ for simulated data.

Table 2. Bias, mean square error and relative efficiency of the proposed estimator and the other estimators of $\bar{Y}$ with respect to the usual estimator $\bar{y}$.

6. Conclusion

In this study, we developed a new regression type estimator based on the correlation coefficient (Equation (1)) and showed theoretically that, under certain conditions, the proposed estimator is more efficient than the usual ratio estimator, the product estimator and the simple random sampling mean estimator. These theoretical conditions are also satisfied by the results of a numerical example.

Acknowledgements

The authors gratefully acknowledge the excellent research facility provided by Bangabandhu Sheikh Mujibur Rahman Science and Technology University. M. Zahid Hasan would like to specifically thank the Ministry of Science and Technology, Government of the People’s Republic of Bangladesh, for research support through R & D.

Cite this paper: Hasan, M. , Sultana, M. , Fatema, K. , Hossain, M. and Hossain, M. (2020) A New Regression Type Estimator and Its Application in Survey Sampling. Open Journal of Statistics, 10, 1010-1019. doi: 10.4236/ojs.2020.106057.
References

[1]   Upadhyaya, L.N., Singh, H.P., Chatterjee, S. and Yadav, R. (2011) Improved Ratio and Product Exponential Type Estimators. Journal of Statistical Theory and Practice, 5, 285-302.
https://doi.org/10.1080/15598608.2011.10412029

[2]   Kadilar, C. and Cingi, H. (2005) A New Estimator Using Two Auxiliary Variables. Applied Mathematics and Computation, 162, 901-908.
https://doi.org/10.1016/j.amc.2003.12.130

[3]   Subramani, J. and Kumarapandiyan, G. (2013) New Modified Ratio Estimator for Estimation of Population Mean When Median of the Auxiliary Variable Is Known. Pakistan Journal of Statistics and Operation Research, 9, 137-145.
https://doi.org/10.18187/pjsor.v9i2.486

[4]   Adebola, F.B., Adegoke, N.A. and Sanusi, R.A. (2015) A Class of Regression Estimator with Cum-Dual Ratio Estimator as Intercept. International Journal of Statistics and Probability, 4, 42-50.

[5]   Hasan, M.Z., Hossian, M.A., Sultana, M., Fatema, K. and Hossain, M.M. (2019) A New Modified Product Estimator for Estimation of Population Mean When Median of the Auxiliary Variable Is Known. International Journal of Scientific Research in Mathematical and Statistical Sciences, 6, 108-113.

[6]   Singh, S. (2003) Advanced Sampling Theory with Applications. Springer Science & Business Media, Dordrecht.

[7]   Cochran, W.G. (1977) Sampling Techniques. 3rd Edition, John Wiley & Sons, Hoboken.

 
 