Improved Estimation of Rare Sensitive Attribute in a Stratified Sampling Using Poisson Distribution

Received 21 December 2015; accepted 20 February 2016; published 23 February 2016

1. Introduction

Collecting data through direct questioning on rare sensitive issues, such as extramarital affairs, family disturbances, or declaring religious affiliation under extremist conditions, is a far-reaching problem. Warner [1] introduced the randomized response procedure to procure trustworthy data for estimating the proportion of respondents in the population belonging to the sensitive group. Greenberg et al. [2] suggested an unrelated question randomized response model in which each individual selected in the sample is asked to reply “yes” or “no” to one of two statements: (a) Do you belong to Group A? (b) Do you belong to Group Y? with respective probabilities P and 1 − P. The second question has no bearing on the first. Greenberg et al. [2] took π_{A} and π_{Y} to be the proportions of persons possessing the sensitive and the unrelated characteristic, respectively, and discussed both the cases when π_{Y} is known and unknown. The probability of a yes response in their model is θ = Pπ_{A} + (1 − P)π_{Y}. Mangat and Singh [3] proposed a two-stage randomized response procedure that requires two randomization devices. The first device consists of two statements, (a) I belong to the sensitive group, and (b) Go to the second random device, with probabilities T and 1 − T, respectively. The second device uses the two statements (a) I belong to the sensitive group, and (b) I do not belong to the sensitive group, with known probabilities P and 1 − P, respectively. The probability of a yes response is then θ = Tπ_{A} + (1 − T)[Pπ_{A} + (1 − P)(1 − π_{A})].

Since then, various modifications have been proposed to improve the methodology for collecting such information, among them Lee et al. [4], Chaudhuri and Mukerjee [5], Mahmood et al. [6], Land et al. [7], and Bhargava and Singh [8].

Land et al. [7] proposed estimators for the mean number of persons possessing a rare sensitive attribute using the unrelated question randomized response model and a Poisson distribution. Recently, Lee et al. [4] extended Land et al.’s [7] study to stratified sampling and proposed estimators for the cases where the parameter of the rare unrelated attribute is known and unknown.

In this study, we propose improved estimators for the mean number of persons possessing a rare sensitive attribute, and for its variance, based on stratified sampling and the Poisson distribution. The estimators are proposed for the cases where the parameter of the rare unrelated attribute is known and unknown. The proposed estimators are evaluated through their relative efficiency, comparing their variances with those of the estimators reported in Lee et al. [4].

2. Improved Estimation of a Rare Sensitive Attribute in Stratified Sampling-Known Rare Unrelated Attributes

Consider a population of N individuals divided into L subpopulations (strata) of sizes N_{1}, N_{2}, ..., N_{L}. The strata are disjoint and together comprise the whole population. In stratum h (h = 1, 2, ..., L), n_{h} respondents are selected by simple random sampling with replacement (SRSWR) and asked to use the pair of randomization devices R_{h1} and R_{h2}, each consisting of two statements. The randomization device R_{h1} is constructed as:

(i) “I possess rare sensitive attribute A”

(ii) “Go to randomization device R_{h2}”

with respective probabilities T_{h} and 1 − T_{h}.

The randomization device R_{h2} consists of two statements:

(i) “I possess rare sensitive attribute A”

(ii) “I possess rare unrelated attribute Y”

with probabilities P_{h} and 1 − P_{h}, respectively.

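With this pair of randomization devices, the probability of a yes response in stratum h is given by

θ_{h0} = T_{h} π_{h1} + (1 − T_{h}) [P_{h} π_{h1} + (1 − P_{h}) π_{h2}], (1)

where T_{h} and P_{h} are the probabilities of selecting statement (i) in devices R_{h1} and R_{h2}, and π_{h1} and π_{h2} are the population proportions of individuals possessing the rare sensitive and rare unrelated attributes in stratum h, respectively. Here π_{h2} is assumed to be known. Since A and Y are very rare attributes, n_{h} θ_{h0} = λ_{h0} is finite, assuming n_{h} → ∞ and θ_{h0} → 0.

Let y_{h1}, y_{h2}, ..., y_{hn_{h}} be a random sample of n_{h} observations in stratum h from a Poisson distribution with parameter λ_{h0}. Then the maximum likelihood estimator of λ_{h1} = n_{h} π_{h1}, the mean number of persons who have the rare sensitive attribute in stratum h, is given by

\hat{λ}_{h1} = [(1/n_{h}) Σ_{i=1}^{n_{h}} y_{hi} − (1 − T_{h})(1 − P_{h}) λ_{h2}] / [T_{h} + (1 − T_{h}) P_{h}], (2)

where λ_{h2} = n_{h} π_{h2} is the (known) mean number of persons who have the rare unrelated attribute in stratum h. The parameter λ_{1} = Σ_{h=1}^{L} W_{h} λ_{h1} is the mean number of persons possessing rare sensitive attribute A in a population of size N, and its estimator is given by

\hat{λ}_{1} = Σ_{h=1}^{L} W_{h} \hat{λ}_{h1}, (3)

where W_{h} = N_{h}/N.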
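As a minimal sketch of Equations (1) to (3), the Python functions below compute the yes-response probability of the two-stage device, the stratum-level estimator, and the weighted stratified estimator. All function and argument names are illustrative assumptions rather than notation from the paper: `t` and `p` stand for the probabilities of drawing statement (i) in the first and second device, and the stratum estimator is obtained by solving "Poisson mean of a count = a × (sensitive mean) + b × (unrelated mean)" for the sensitive mean, with the sample mean standing in for the Poisson mean. The example counts are made up.

```python
from statistics import fmean


def yes_probability(t, p, pi_sensitive, pi_unrelated):
    """Two-stage device: statement A with probability t, otherwise the
    second device (A with probability p, unrelated Y with 1 - p)."""
    return t * pi_sensitive + (1 - t) * (p * pi_sensitive + (1 - p) * pi_unrelated)


def stratum_estimate(counts, t, p, lam_unrelated):
    """Estimate the mean count for the sensitive attribute in one stratum.

    counts: observed Poisson counts (assumed data layout).
    lam_unrelated: known Poisson mean for the unrelated attribute.
    """
    a = t + (1 - t) * p      # total weight on the sensitive statement
    b = (1 - t) * (1 - p)    # weight on the unrelated statement
    return (fmean(counts) - b * lam_unrelated) / a


def stratified_estimate(stratum_estimates, weights):
    """Weighted sum over strata; weights W_h = N_h / N should sum to 1."""
    return sum(w * est for w, est in zip(weights, stratum_estimates))


# Hypothetical example with two strata.
est1 = stratum_estimate([2, 1, 3, 2], t=0.3, p=0.6, lam_unrelated=1.5)
est2 = stratum_estimate([1, 2, 1, 2], t=0.3, p=0.7, lam_unrelated=1.0)
overall = stratified_estimate([est1, est2], weights=[0.4, 0.6])
```

Setting `t=0` reduces the sketch to a single-stage unrelated-question device, which is a convenient sanity check.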

The variance of the estimator in each stratum is given by

, (4)

where

.

Thus, the variance expression of the estimator may be derived as

. (5)
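The variance calculation can be sketched in Python under the Poisson assumption stated above. The displayed forms of (4) and (5) are not reproduced here; the sketch instead derives the variance directly: if each count has Poisson mean a × (sensitive mean) + b × (unrelated mean), the sample mean of n counts has variance equal to that Poisson mean divided by n, and dividing the sample mean by a scales the variance by 1/a². Function and argument names are hypothetical.

```python
def stratum_variance(lam_sensitive, lam_unrelated, t, p, n):
    """Variance of the stratum estimator under the Poisson model (sketch)."""
    a = t + (1 - t) * p
    b = (1 - t) * (1 - p)
    lam_yes = a * lam_sensitive + b * lam_unrelated  # Poisson mean of one count
    return lam_yes / (n * a * a)


def stratified_variance(strata, weights):
    """Sum of W_h**2 times the stratum variances (independent strata).

    strata: list of dicts with the keyword arguments of stratum_variance.
    """
    return sum(w * w * stratum_variance(**s) for w, s in zip(weights, strata))


# Hypothetical parameter values for two strata.
strata = [
    dict(lam_sensitive=1.0, lam_unrelated=0.5, t=0.3, p=0.6, n=400),
    dict(lam_sensitive=1.5, lam_unrelated=0.8, t=0.3, p=0.7, n=600),
]
total_var = stratified_variance(strata, weights=[0.4, 0.6])
```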

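THEOREM 1. The stratified estimator in (3) is an unbiased estimator of the mean number of persons possessing the rare sensitive attribute A.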

Proof. From (3), we have

THEOREM 2. An unbiased estimator of the variance in (5) is given by

. (6)

Proof.

Now, we consider the proportional and optimal allocations of the total sample size n among the strata. Proportional allocation sets the sample size in each stratum in proportion to the stratum size. Since the sample size in each stratum is then n_{h} = nW_{h}, where W_{h} is the proportion of the population in stratum h, the variance of the estimator under proportional allocation is given by

. (7)

Optimal allocation, in contrast, chooses the sample sizes to minimize the variance for a given cost or to minimize the cost for a specified variance. The stratum sample size is proportional to the standard deviation of the variable in that stratum. In stratified sampling, let the cost function be defined as C = c_{0} + Σ_{h=1}^{L} c_{h} n_{h}, where c_{0} is the fixed cost and c_{h} is the cost per sampled individual in stratum h. Within each stratum the cost is proportional to the sample size, but the cost may vary from stratum to stratum. For fixed cost, using the Cauchy-Schwarz inequality, the sample size that minimizes the variance is given by

(8)

So the minimum variance of the estimator for the specified cost C under the optimum allocation of sample size is given by

. (9)
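The two allocation rules can be illustrated with a short Python sketch. It uses the textbook results for a linear cost function C = c_0 + Σ c_h n_h and a variance of the form Σ W_h² S_h² / n_h, where S_h is the per-observation standard deviation in stratum h; that these match the exact expressions in (7) to (9) is an assumption, and all names and numbers are hypothetical. Sample sizes are left as real numbers; in practice they would be rounded.

```python
from math import sqrt


def proportional_allocation(n, weights):
    """n_h = n * W_h."""
    return [n * w for w in weights]


def optimal_allocation(total_cost, fixed_cost, weights, sds, unit_costs):
    """n_h proportional to W_h * S_h / sqrt(c_h), scaled to spend the budget."""
    budget = total_cost - fixed_cost
    denom = sum(w * s * sqrt(c) for w, s, c in zip(weights, sds, unit_costs))
    return [budget * (w * s / sqrt(c)) / denom
            for w, s, c in zip(weights, sds, unit_costs)]


def variance_for_allocation(weights, sds, sizes):
    """Sum of W_h**2 * S_h**2 / n_h for a given allocation."""
    return sum((w * s) ** 2 / n for w, s, n in zip(weights, sds, sizes))


# Hypothetical two-stratum example.
weights, sds, unit_costs = [0.4, 0.6], [1.0, 2.0], [1.0, 4.0]
sizes = optimal_allocation(110.0, 10.0, weights, sds, unit_costs)
min_var = variance_for_allocation(weights, sds, sizes)
```

For this cost function the minimized variance equals (Σ W_h S_h √c_h)² / (C − c_0), which gives a quick way to check an implementation.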

3. Improved Estimation of a Rare Sensitive Attribute in Stratified Sampling-Unknown Rare Unrelated Attributes

In this section, estimators for the mean number of persons possessing the rare sensitive attribute are proposed under the assumption that the stratum sizes are known but the mean of the rare unrelated attribute in each stratum is unknown. In this case each selected respondent from stratum h is asked to use the pair of randomization devices twice in sequence. That is, in the h^{th} stratum, the selected respondents are asked to use the randomization devices R_{h1} and R_{h2}, each consisting of two statements. The device R_{h1} consists of two statements:

(i) “I possess a sensitive attribute A”

(ii) “Go to randomization device R_{h2}”

The statements occur with respective probabilities T_{h1} and 1 − T_{h1}.

The two statements of the randomization device R_{h2} are:

(i) “I possess a sensitive attribute A”

(ii) “I possess unrelated attribute Y”

presented with respective probabilities P_{h1} and 1 − P_{h1}. After using the first pair of randomization devices, the respondent is asked to use the same pair of devices R_{h1} and R_{h2} again, but with probabilities T_{h2}, 1 − T_{h2} and P_{h2}, 1 − P_{h2}, respectively.

The probabilities of the yes responses for the first and second use of pair of randomization devices are respectively given by

(10)

and

, (11)

where and are the respective population proportions of rare sensitive and rare unrelated attribute in the stratum h. As is large and, therefore. Now, obviously,. Let and () be the pair of responses from the ith respondent selected in h^{th} stratum. We have

(12)

(13)

(14)

Following the expressions given in Equations (12) and (13), we have the sample means for both sets of responses as

(15)

and

. (16)

By solving (15) and (16), we get estimators of and as

(17)

(18)

where

and.

(19)

Substituting (12), (13) and (14) into (19), we get

, (20)

where

The stratified estimators of and are defined as

, and. (21)
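The step of solving the two sample-mean equations for the two unknown means can be sketched in Python. The sketch assumes that each use of the device pair follows the same two-stage structure, so that the mean of the j-th set of responses is a_j × (sensitive mean) + b_j × (unrelated mean) with a_j = T_j + (1 − T_j)P_j and b_j = (1 − T_j)(1 − P_j); whether this matches the exact form of (15) to (18) is an assumption, and the names are hypothetical. The system has a unique solution only when a_1 differs from a_2, which is why the second use needs different probabilities.

```python
def device_weights(t, p):
    """Weights on the sensitive and unrelated means for one use of the devices."""
    a = t + (1 - t) * p
    return a, 1 - a  # (1 - t) * (1 - p) equals 1 - a


def solve_stratum(mean_first, mean_second, t1, p1, t2, p2):
    """Solve the 2x2 linear system for (sensitive mean, unrelated mean)."""
    a1, b1 = device_weights(t1, p1)
    a2, b2 = device_weights(t2, p2)
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("device probabilities give a singular system")
    lam_sensitive = (b2 * mean_first - b1 * mean_second) / det
    lam_unrelated = (a1 * mean_second - a2 * mean_first) / det
    return lam_sensitive, lam_unrelated


def stratified(values, weights):
    """Weighted combination of stratum-level estimates."""
    return sum(w * v for w, v in zip(weights, values))


# Hypothetical sample means from the first and second use in one stratum.
lam1_hat, lam2_hat = solve_stratum(2.5, 3.2, t1=0.5, p1=0.5, t2=0.2, p2=0.25)
```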

THEOREM 3. is an unbiased estimator for.

Proof.

(22)

Putting the values of and in Equation (22), we get the result.

THEOREM 4. The variance of is given by

, (23)

where

Proof. Since, we have

(24)

On putting (20) in (24) we have the theorem.

Corollary 1: An unbiased estimator for the variance of rare sensitive attribute is given by

(25)

It can be proved easily.

THEOREM 5. is an unbiased estimator of.

Proof. From (18), we have

Corollary 2: An unbiased estimator for is given by

(26)

where

Now under proportional allocation of sample size, the variance of is given by

.

However, in optimum allocation, the sample size in stratum h is

and the variance of is given by

.

4. Relative Efficiency

Lee et al. [4] gave the variances of their estimator of the rare sensitive attribute, based on the Poisson distribution, when the rare unrelated attribute is known and unknown, respectively, as

, (27)

(28)

where

For comparison of the proposed estimator with, the relative efficiency is given by

.

Large samples are required to estimate the means of rare sensitive attribute. So we consider a large hypothetical population, in order to study the relative efficiency, setting with two strata having and. We choose values of the parameters, as and, and we let the value range from 0.3 to 0.7, and let that of range from 0.6 to 0.9 when the weights (and ) and (and) which is proportional allocation. Also, let () and ().

4.1. Relative Efficiency When Rare Unrelated Attribute Is Known

Let be the variance of the proposed estimator for the rare sensitive attribute when the parameter of rare unrelated attribute is known. The relative efficiency of proposed estimator with respect to estimator is defined as

. (29)

From Equation (29) it is evident that the relative efficiency of the proposed estimator is free from the sample size n. We set the design probabilities as and. In Table 1, the relative efficiencies are given with parameter values, as and, varies from 0.3 to 0.7, and from 0.6 to 0.9 having weights . It is evident that the proposed estimator has efficiency greater than 1 in all cases, and is always better than the estimator. A study of Figure 1 confirms this.
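A rough Python sketch of the known-case comparison follows. It rests on two assumptions that are not taken from the displayed equations: the comparison estimator of Lee et al. [4] is treated as the single-stage unrelated-question device with selection probability P_h (the special case T_h = 0 of the two-stage device), and both variances are derived directly from the Poisson model. Under proportional allocation n_h = nW_h, the factor 1/n is common to both variances and cancels in the ratio. Names and parameter values are hypothetical.

```python
def unit_variance(lam_sensitive, lam_unrelated, a):
    """n_h times the variance of a stratum estimator whose counts have
    Poisson mean a * lam_sensitive + (1 - a) * lam_unrelated."""
    return lam_sensitive / a + (1 - a) * lam_unrelated / (a * a)


def relative_efficiency_known(weights, lam_sensitive, lam_unrelated, t, p):
    """Ratio of the single-stage variance to the two-stage variance."""
    var_single = 0.0
    var_two_stage = 0.0
    for w, l1, l2, t_h, p_h in zip(weights, lam_sensitive, lam_unrelated, t, p):
        a_h = t_h + (1 - t_h) * p_h
        # With n_h = n * W_h, W_h**2 / n_h = W_h / n; the 1/n cancels.
        var_single += w * unit_variance(l1, l2, p_h)
        var_two_stage += w * unit_variance(l1, l2, a_h)
    return var_single / var_two_stage


# Hypothetical two-stratum setting.
re = relative_efficiency_known(
    weights=[0.4, 0.6],
    lam_sensitive=[1.0, 1.5],
    lam_unrelated=[0.5, 0.8],
    t=[0.3, 0.3],
    p=[0.5, 0.7],
)
```

Because T_h + (1 − T_h)P_h is at least P_h, both terms of the unit variance shrink when the first stage is added, so the ratio in this sketch cannot fall below 1.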

4.2. Relative Efficiency When Rare Unrelated Attribute Is Unknown

Let be the variance of the proposed estimator for the rare sensitive attribute when the parameter of rare unrelated attribute is unknown. The relative efficiency of the proposed estimator with respect to estimator is defined as

. (30)

Figure 1. Relative Efficiency (RE) of the proposed model with respect to Lee et al. [4] for W_{1} = 0.4 and P_{12} = 0.3 to 0.8.

The relative efficiency of the proposed estimator is free from the sample size n. For the analysis, the design probabilities are fixed as, , ,. Setting, with parameter values of, as and, , T_{12} = 0.2, 0.3, 0.4, 0.5 and . The relative efficiencies given in Table 2 show that the proposed estimator outperforms the estimator, with efficiency greater than 1, if we set the probabilities as

. However, the relative efficiency starts decreasing as we take. A study of Figure 2 confirms this. Also, when increases, the relative efficiency of the proposed estimator increases.

Table 1. Relative efficiency of the proposed estimator with Lee et al. (2013).

Figure 2. Relative Efficiency (RE) of the proposed model with respect to Lee et al. [4] for indicated values.

Table 2. Relative efficiency of the proposed estimator with Lee et al. (2013), W_{1} = 0.4, and W_{1} = 0.5.

5. Conclusion

In this study, a two-stage randomized response model is proposed, with improved estimators for the mean number of persons possessing a rare sensitive attribute, and for its variance, based on stratified sampling and the Poisson distribution. The proposed method is more efficient than the existing randomized response model when the parameter of the rare unrelated attribute is known; in the unknown case, the gain depends on the probability of selecting a question. In future work, more sensitive information could be obtained from respondents by using stratified double sampling with the proposed model.

References

[1] Warner, S.L. (1965) Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. Journal of the American Statistical Association, 60, 63-69.

http://dx.doi.org/10.1080/01621459.1965.10480775

[2] Greenberg, B.G., Abul-Ela, A.L.A., Simmons, W.R. and Horvitz, D.G. (1969) The Unrelated Question Randomized Response Model: Theoretical Framework. Journal of the American Statistical Association, 64, 520-539.

http://dx.doi.org/10.1080/01621459.1969.10500991

[3] Mangat, N.S. and Singh, R. (1990) An Alternative Randomized Response Procedure. Biometrika, 77, 439-442.

[4] Lee, G.S., Uhm, D. and Kim, J.M. (2013) Estimation of a Rare Sensitive Attribute in a Stratified Sample Using Poisson Distribution. Statistics, 47, 685-709.

http://dx.doi.org/10.1080/02331888.2011.625503

[5] Chaudhuri, A. and Mukerjee, R. (1988) Randomized Response: Theory and Techniques. Marcel Dekker, New York.

[6] Mahmood, M., Singh, S. and Horn, S. (1998) On the Confidentiality Guaranteed under Randomized Response Sampling: A Comparison with Several New Techniques. Biometrical Journal, 40, 237-242.

http://dx.doi.org/10.1002/(SICI)1521-4036(199806)40:2<237::AID-BIMJ237>3.0.CO;2-N

[7] Land, M., Singh, S. and Sedory, S.A. (2012) Estimation of a Rare Sensitive Attribute Using Poisson Distribution. Statistics, 46, 351-360.

http://dx.doi.org/10.1080/02331888.2010.524300

[8] Bhargava, M. and Singh, R. (2000) A Modified Randomization Device for Warner’s Model. Statistica, 60, 315-321.