On the Use of Randomization Device for Estimating the Proportion and Truthful Reporting of a Qualitative Sensitive Attribute

In this paper, a simple procedure is presented for estimating the population proportion $\pi$ possessing a sensitive attribute under simple random sampling with replacement (SRSWR), together with T, the probability that a respondent truthfully states that he or she bears the sensitive character when questioned in a direct response survey. An efficiency comparison is carried out to investigate the performance of the proposed method. It is found that the proposed strategy is more efficient than Warner's (1965) as well as Huang's (2004) randomized response techniques under some realistic conditions. Numerical illustrations and graphical representations are also given in support of the present study.


Introduction
A major source of bias in surveys of human populations results from the refusal of participants to cooperate and provide truthful responses, especially where a question of a sensitive nature is involved. To eliminate this source of bias in estimating the proportion of a population possessing a characteristic of a sensitive nature, Warner (1965) introduced a technique termed "randomized response". Other randomized response techniques have since been introduced by various authors. These techniques either improve upon Warner's procedure, provide alternative procedures, or consider more complicated situations, for example by allowing unequal probabilities of selection. One can mention the work of Fox and Tracy (1986), Mangat and Singh (1990), Mangat (1994), Mahmood et al. (1998), Chua and Tsui (2000), Singh et al. (2000), Chang and Huang (2001), Huang (2004), Chang et al. (2004a, 2004b), Chaudhary (2011) and Singh and Tarray (2012).
In this paper we develop an alternative to Huang's (2004) randomized response model. A brief discussion of Warner's (1965) model, the direct response (DR) procedure and Huang's (2004) model is given in Section 2. Properties of the proposed procedure are given in Section 3. An efficiency comparison is worked out in Section 4 to investigate the performance of the suggested procedure. Numerical studies and graphical representations are provided to demonstrate the superiority of the suggested model.

A brief review of randomized response models
In this section we present a review of Warner's (1965) model, the direct response (DR) procedure and Huang's (2004) model.

Warner's (1965) Model
The randomized response technique is a procedure for collecting information on sensitive characteristics without exposing the identity of the respondent. It was first introduced by Warner (1965) as an alternative survey technique for questions on socially undesirable or incriminating behavior, covering such topics as drunk driving, tax evasion, illicit drug use, induced abortion, shoplifting, child abuse, family disturbances, cheating in exams, HIV/AIDS and sexual behavior. Instead of a DR procedure, a randomization device is used to gather sample information. The device consists of two statements: (i) 'I am a member of group A' and (ii) 'I am not a member of group A', presented with probabilities P and (1-P) respectively. Following this device, each of the n respondents in a random sample selects a statement unobserved by the interviewer and simply answers 'Yes' or 'No'. The probability of a 'Yes' answer is therefore $\theta = P\pi + (1-P)(1-\pi)$. By the method of moments, Warner obtained an unbiased estimator of the population proportion $\pi$ possessing the sensitive attribute A. He considered the maximum likelihood estimator of $\pi$,
$$\hat{\pi}_W = \frac{\hat{\theta} - (1-P)}{2P-1}, \qquad P \neq 0.5,$$
where P is the proportion of the sensitive character represented in the randomized response device and $\hat{\theta} = m/n$ is the proportion of "Yes" answers obtained from the n respondents selected by simple random sampling with replacement.
The estimator $\hat{\pi}_W$ is unbiased with variance
$$V(\hat{\pi}_W) = \frac{\pi(1-\pi)}{n} + \frac{P(1-P)}{n(2P-1)^2}.$$
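Warner's device can be illustrated with a short simulation. The following is a minimal Python sketch, assuming the standard form of Warner's estimator, $(\hat{\theta} - (1-P))/(2P-1)$, and of its variance, $\pi(1-\pi)/n + P(1-P)/\{n(2P-1)^2\}$. The function names, the seed and the parameter values are hypothetical choices made for illustration only and are not taken from the original study.

```python
import random


def warner_estimate(yes_count, n, p):
    """Warner's estimator of pi from the number of 'Yes' answers."""
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    theta_hat = yes_count / n  # observed proportion of 'Yes' answers
    return (theta_hat - (1 - p)) / (2 * p - 1)


def warner_variance(pi, n, p):
    """Variance of Warner's estimator under SRSWR."""
    return pi * (1 - pi) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


def simulate_warner(pi, n, p, rng):
    """Count 'Yes' answers when every respondent answers truthfully."""
    yes = 0
    for _ in range(n):
        in_a = rng.random() < pi          # respondent belongs to group A
        statement_i = rng.random() < p    # device shows 'I am a member of A'
        # 'Yes' if the drawn statement is true for this respondent
        yes += (in_a == statement_i)
    return yes


if __name__ == "__main__":
    rng = random.Random(2024)             # hypothetical seed
    n, pi, p = 5000, 0.2, 0.7             # hypothetical values
    yes = simulate_warner(pi, n, p, rng)
    print("estimate:", warner_estimate(yes, n, p))
    print("theoretical variance:", warner_variance(pi, n, p))
```

The simulated estimate should fall close to the assumed $\pi$, with a spread governed by the variance formula; values of P near 0.5 inflate the second variance term sharply.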

Direct Response (DR) Procedure
Social stigma and fear of reprisals often lead respondents to give biased, misleading or even erroneous responses when approached with a direct response (DR) survey method.
Even out of mere unwillingness to reveal secrets to strangers, many individuals attempt to avoid certain questions put to them by interviewers. Consider a dichotomous population in which every person belongs either to a sensitive group "A" or to the non-sensitive complement "$A^c$". The problem of interest is to estimate the population proportion $\pi$ of individuals who are members of "A". Let T be the probability that the respondents belonging to "A" report the truth. The respondents belonging to the non-sensitive group "$A^c$" have no reason to tell a lie. In a DR survey of size n, each interviewee is asked whether he or she is a member of "A". Then we have the direct estimator $\hat{\pi}_D = \frac{1}{n}\sum_{i=1}^{n} X_i$, where $X_i = 1\,(0)$ if the $i$th interviewee responds "Yes" ("No") and $P(X_i = 1) = \pi T$; see Chang and Huang (2001).
An interesting method for the estimation of $\pi$ and T is given by Huang (2004), which improves on an earlier proposal by Chang and Huang (2001). In this procedure each respondent is initially required to declare whether he or she is in group "A" or in group "$A^c$". If the respondent claims to belong to group "$A^c$", Warner's (1965) procedure is carried out. Huang's (2004) suggestion thus consists of a two-stage method which couples the direct question procedure with Warner's (1965) procedure. The description of the Huang (2004) model is given below.

Huang (2004) Model
In this procedure, a simple random sample of size n is drawn with replacement from a finite population. Each sampled respondent is required to reply to a direct query whether he or she bears "A" or not. When answering "No", the respondent is provided with a randomization device consisting of two statements, (a) "I am a member of A" and (b) "I am not a member of A", with probabilities P and (1-P) respectively. It is assumed that the respondents belonging to "A" give totally honest responses under the randomized response procedure, but tell the truth only with probability T under the usual direct response procedure.
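The probability of a 'Yes' response in the direct response procedure is given by
$$\lambda_1 = \pi T,$$
and in the randomized response procedure by
$$\lambda_2 = P\pi(1-T) + (1-P)(1-\pi).$$
Huang (2004) suggested the following estimators of $\pi$ and T respectively:
$$\hat{\pi}_H = \frac{\hat{\lambda}_2 + P\hat{\lambda}_1 - (1-P)}{2P-1}, \quad P \neq 0.5, \qquad \hat{T}_H = \frac{\hat{\lambda}_1}{\hat{\pi}_H},$$
where $\hat{\lambda}_j$ is the observed proportion of "Yes" answers and $n\hat{\lambda}_j$ is a binomial random variable with parameters n and $\lambda_j$, j = 1, 2. Huang (2004) obtained the variance of $\hat{\pi}_H$ as
$$V(\hat{\pi}_H) = \frac{\pi(1-\pi)}{n} + \frac{P(1-P)(1-\pi T)}{n(2P-1)^2}$$
and the mean square error of the estimator $\hat{T}_H$, up to terms of order $n^{-1}$, as
$$MSE(\hat{T}_H) = \frac{T(1-T)}{n\pi} + \frac{T^2 P(1-P)(1-\pi T)}{n\pi^2(2P-1)^2}.$$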
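The two-stage estimators can be sketched in a few lines of Python. The sketch below assumes the "Yes" probabilities implied by the model description, namely $\lambda_1 = \pi T$ for the direct stage and $\lambda_2 = P\pi(1-T) + (1-P)(1-\pi)$ for the randomized stage, and solves them for $\pi$ and T. The function names and the counts used in the example are hypothetical and serve only as an illustration.

```python
def huang_estimates(yes_direct, yes_rr, n, p):
    """Two-stage estimates of (pi, T) from the 'Yes' counts of both stages.

    yes_direct: number of 'Yes' answers to the direct question
    yes_rr:     number of 'Yes' answers given through Warner's device
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    lam1 = yes_direct / n
    lam2 = yes_rr / n
    pi_hat = (lam2 + p * lam1 - (1 - p)) / (2 * p - 1)
    # T is estimated by a ratio, which is undefined for a non-positive pi_hat
    t_hat = lam1 / pi_hat if pi_hat > 0 else float("nan")
    return pi_hat, t_hat


def huang_variance(pi, t, n, p):
    """Variance of the two-stage estimator of pi under the assumed model."""
    return (pi * (1 - pi) / n
            + p * (1 - p) * (1 - pi * t) / (n * (2 * p - 1) ** 2))


if __name__ == "__main__":
    # Hypothetical counts: 200 direct 'Yes' and 320 device 'Yes' out of 1000
    print(huang_estimates(200, 320, 1000, 0.7))
    print(huang_variance(0.4, 0.5, 1000, 0.7))
```

Setting T = 0 in `huang_variance` recovers the variance of Warner's estimator, which is one quick consistency check on the assumed expressions.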

The suggested Procedure
Let a simple random sample of size n be drawn with replacement from a finite population. Each sampled respondent is required to reply to a direct query whether he or she belongs to the sensitive group "A" or not. When answering "No", the respondent is provided with a randomization device consisting of three statements: (i) "I belong to the stigmatizing group", (ii) "Yes" and (iii) "No", with known probabilities P, (1-P)w and (1-P)(1-w) respectively, where $w \in [0, 1]$; see Singh et al. (1995). A respondent who draws statement (i) answers it truthfully, while a respondent who draws statement (ii) or (iii) simply reports the answer shown. Since the respondents belonging to "$A^c$" have no reason to tell a lie, it may reasonably be expected that they will be completely truthful in their answers, no matter whether a direct response or a randomized response procedure is adopted. It is assumed that the respondents belonging to the sensitive group "A" give completely honest responses under the randomized response procedure, but tell the truth only with probability T under the conventional direct response procedure.
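Under the suggested procedure, the probability of a "Yes" response in the direct response procedure is given by
$$\lambda_1 = \pi T$$
and the probability of a "Yes" answer using the randomization device by
$$\lambda_w = P\pi(1-T) + (1-P)w(1-\pi T).$$
The estimators of $\pi$ and T are respectively given by
$$\hat{\pi}_w = \frac{\hat{\lambda}_w + \{P+(1-P)w\}\hat{\lambda}_1 - (1-P)w}{P}, \qquad \hat{T}_w = \frac{\hat{\lambda}_1}{\hat{\pi}_w},$$
where $\hat{\lambda}_1$ and $\hat{\lambda}_w$ are the observed proportions of "Yes" answers, and $n\hat{\lambda}_1$ and $n\hat{\lambda}_w$ are binomial random variables with parameters $(n, \lambda_1)$ and $(n, \lambda_w)$ respectively.
Proof of Theorem 1. The unbiasedness follows from $E(\hat{\lambda}_1) = \lambda_1$ and $E(\hat{\lambda}_w) = \lambda_w$, which give
$$E(\hat{\pi}_w) = \frac{\lambda_w + \{P+(1-P)w\}\lambda_1 - (1-P)w}{P} = \pi.$$
The variance of the estimator $\hat{\pi}_w$ is given by
$$V(\hat{\pi}_w) = \frac{1}{P^2}\left[V(\hat{\lambda}_w) + \{P+(1-P)w\}^2 V(\hat{\lambda}_1) + 2\{P+(1-P)w\}\,Cov(\hat{\lambda}_w, \hat{\lambda}_1)\right],$$
which, on using $V(\hat{\lambda}_1) = \lambda_1(1-\lambda_1)/n$, $V(\hat{\lambda}_w) = \lambda_w(1-\lambda_w)/n$ and $Cov(\hat{\lambda}_w, \hat{\lambda}_1) = -\lambda_1\lambda_w/n$, reduces to
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{(1-P)\left[P\pi(1-2w) + w\{1-(1-P)w\} - \{P+(1-P)w\}(1-w)\pi T\right]}{nP^2}.$$
Hence the theorem.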
Theorem 2. An unbiased estimator of the variance $V(\hat{\pi}_w)$ can be obtained. The proof is simple and so omitted.
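The suggested procedure can be sketched as a small simulation. The Python code below assumes that a respondent who says "No" to the direct question draws statement (i) with probability P, a forced "Yes" with probability (1-P)w and a forced "No" with probability (1-P)(1-w), so that the device "Yes" probability is $\lambda_w = P\pi(1-T) + (1-P)w(1-\pi T)$. The estimators and the variance expression are obtained by solving these relations and are a sketch under that assumption. The function `variance_estimate` is one unbiased variance estimator built from the sample variance of the per-respondent scores; it is not necessarily the form stated in Theorem 2. All names, seeds and parameter values are hypothetical.

```python
import random


def proposed_estimates(yes_direct, yes_rr, n, p, w):
    """Estimates of (pi, T) from direct and device 'Yes' counts."""
    lam1 = yes_direct / n
    lam_w = yes_rr / n
    c = p + (1 - p) * w                      # P(device 'Yes' | member of A)
    pi_hat = (lam_w + c * lam1 - (1 - p) * w) / p
    t_hat = lam1 / pi_hat if pi_hat > 0 else float("nan")
    return pi_hat, t_hat


def proposed_variance(pi, t, n, p, w):
    """Variance of pi_hat under the assumed model."""
    c = p + (1 - p) * w
    mu = p * pi + (1 - p) * w                # mean of the per-respondent score
    return (mu * (1 - mu) - c * (1 - c) * pi * t) / (n * p ** 2)


def variance_estimate(yes_direct, yes_rr, n, p, w):
    """An unbiased estimate of V(pi_hat) (illustrative form, see lead-in)."""
    lam1, lam_w = yes_direct / n, yes_rr / n
    c = p + (1 - p) * w
    return (lam_w + c ** 2 * lam1 - (lam_w + c * lam1) ** 2) / ((n - 1) * p ** 2)


def simulate_proposed(pi, t, n, p, w, rng):
    """Return (direct 'Yes' count, device 'Yes' count) for one sample."""
    yes_direct = yes_rr = 0
    for _ in range(n):
        in_a = rng.random() < pi
        if in_a and rng.random() < t:        # truthful direct 'Yes'
            yes_direct += 1
            continue
        u = rng.random()                     # respondent said 'No': use device
        if u < p:
            yes_rr += in_a                   # statement (i), answered truthfully
        elif u < p + (1 - p) * w:
            yes_rr += 1                      # forced 'Yes'
        # otherwise forced 'No'
    return yes_direct, yes_rr


if __name__ == "__main__":
    rng = random.Random(11)                  # hypothetical seed
    n, pi, t, p, w = 5000, 0.3, 0.5, 0.7, 0.3
    yd, yr = simulate_proposed(pi, t, n, p, w, rng)
    print(proposed_estimates(yd, yr, n, p, w))
    print(proposed_variance(pi, t, n, p, w), variance_estimate(yd, yr, n, p, w))
```

Because the divisor is P rather than 2P-1, the estimator stays well defined at P = 0.5, where Warner-type devices break down.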
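To derive the MSE of $\hat{T}_w$ we write $\hat{\lambda}_1 = \lambda_1(1+e_1)$ and $\hat{\pi}_w = \pi(1+e_2)$, so that $E(e_1) = E(e_2) = 0$. Further, we define the following quantities:
$$E(e_1^2) = \frac{V(\hat{\lambda}_1)}{\lambda_1^2} = \frac{1-\lambda_1}{n\lambda_1}, \qquad E(e_2^2) = \frac{V(\hat{\pi}_w)}{\pi^2}, \qquad E(e_1 e_2) = \frac{Cov(\hat{\lambda}_1, \hat{\pi}_w)}{\lambda_1 \pi},$$
assuming that $|e_2| < 1$ so that the function $(1+e_2)^{-1}$ can be validly expanded as a power series. It can be easily checked that $Cov(\hat{\lambda}_1, \hat{\pi}_w) = \lambda_1(1-\pi)/n$. The estimation error of the estimator $\hat{T}_w$ can be expressed as
$$\hat{T}_w - T = T\left[(1+e_1)(1+e_2)^{-1} - 1\right] = T\left(e_1 - e_2 + e_2^2 - e_1 e_2 + \dots\right).$$
Theorem 3. The mean square error of the estimator $\hat{T}_w$, up to terms of order $n^{-1}$, is given by
$$MSE(\hat{T}_w) = \frac{T(1-T)}{n\pi} + \frac{T^2(1-P)\left[P\pi(1-2w) + w\{1-(1-P)w\} - \{P+(1-P)w\}(1-w)\pi T\right]}{n\pi^2 P^2}.$$
Proof. We have
$$MSE(\hat{T}_w) = E(\hat{T}_w - T)^2 \approx T^2 E(e_1 - e_2)^2 = T^2\left[E(e_1^2) + E(e_2^2) - 2E(e_1 e_2)\right].$$
Substituting the expectations above, with $\lambda_1 = \pi T$, gives
$$MSE(\hat{T}_w) = \frac{T(1-T)}{n\pi} + \frac{T^2}{\pi^2}\left[V(\hat{\pi}_w) - \frac{\pi(1-\pi)}{n}\right].$$
Thus the mean square error of the estimator $\hat{T}_w$ up to terms of order $n^{-1}$ is as stated. Hence the theorem.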
The proof is straightforward and so omitted.
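The first-order argument for the ratio estimator of T can be checked numerically. The Python sketch below assumes the delta-method form $T^2[E(e_1^2) + E(e_2^2) - 2E(e_1 e_2)]$ with $E(e_1 e_2) = (1-\pi)/(n\pi)$, and assumes the variance expressions for the estimators of $\pi$ that follow from the stated response probabilities. The function names and the parameter values are hypothetical.

```python
def first_order_mse_t(pi, t, n, n_var_pi):
    """First-order MSE of a ratio estimator lam1_hat / pi_hat.

    n_var_pi is n times the variance of the estimator of pi.
    Written without dividing by t so that t = 0 is handled.
    """
    term_lam1 = t * (1 - pi * t) / (n * pi)          # T^2 * E(e1^2)
    term_pi = t ** 2 * n_var_pi / (n * pi ** 2)      # T^2 * E(e2^2)
    term_cov = 2 * t ** 2 * (1 - pi) / (n * pi)      # 2 * T^2 * E(e1 e2)
    return term_lam1 + term_pi - term_cov


def n_var_huang(pi, t, p):
    """n * V(pi_hat) for the two-stage Warner-type device (assumed form)."""
    return pi * (1 - pi) + p * (1 - p) * (1 - pi * t) / (2 * p - 1) ** 2


def n_var_proposed(pi, t, p, w):
    """n * V(pi_hat) for the three-statement device (assumed form)."""
    c = p + (1 - p) * w
    mu = p * pi + (1 - p) * w
    return (mu * (1 - mu) - c * (1 - c) * pi * t) / p ** 2


if __name__ == "__main__":
    pi, t, n, p, w = 0.3, 0.2, 100, 0.7, 0.5         # hypothetical values
    print("two-stage Warner device:",
          first_order_mse_t(pi, t, n, n_var_huang(pi, t, p)))
    print("three-statement device:",
          first_order_mse_t(pi, t, n, n_var_proposed(pi, t, p, w)))
```

Both results share the leading term T(1-T)/(n pi); they differ only through the excess of n times the variance over pi(1-pi), which is where the choice of device enters.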

Efficiency comparison through numerical illustration
To have a tangible idea about the magnitude of the relative efficiency of the suggested procedure with respect to Huang's (2004) and the direct response procedures, we have computed the percent relative efficiencies of the proposed estimators $(\hat{\pi}_w, \hat{T}_w)$ for P = 0.6, 0.7, 0.8; T = 0.10, 0.15, 0.20, 0.25, 0.30; w = 0.10, 0.30, 0.50, 0.70, 0.90; and $\pi$ = 0.1 (0.1) 0.5. The findings are displayed in Tables 1, 2 and 3.
Tables 1 and 2 show that the proposed estimators $\hat{\pi}_w$ and $\hat{T}_w$ are more efficient than $\hat{\pi}_H$ and $\hat{T}_H$ respectively. This fact can also be observed from Figures 1 and 2.
There is substantial gain in efficiency by using the proposed estimator $\hat{\pi}_w$ over the direct estimator $\hat{\pi}_D$ for all values of $(P, \pi, T, w)$ considered here; see Figure 3. Finally, we conclude that the proposed procedures are superior to Huang's (2004) procedure, and hence to Chang and Huang's (2001) procedure and the usual direct procedure.
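The kind of comparison described in this section can be sketched in Python. The code below assumes that the percent relative efficiency is the ratio of the competitor's variance (or mean square error) to the variance of the proposed estimator, multiplied by 100, and it assumes the variance and mean square error expressions implied by the stated response probabilities. The sample size n is not given in the text, so the value used here is hypothetical, as are the function names; the printed values are not claimed to match the paper's tables.

```python
def var_huang(pi, t, n, p):
    """Variance of the two-stage Warner-type estimator (assumed form)."""
    return pi * (1 - pi) / n + p * (1 - p) * (1 - pi * t) / (n * (2 * p - 1) ** 2)


def var_proposed(pi, t, n, p, w):
    """Variance of the three-statement estimator (assumed form)."""
    c = p + (1 - p) * w
    mu = p * pi + (1 - p) * w
    return (mu * (1 - mu) - c * (1 - c) * pi * t) / (n * p ** 2)


def mse_direct(pi, t, n):
    """MSE of the direct estimator: binomial variance plus squared bias."""
    return pi * t * (1 - pi * t) / n + (pi * (1 - t)) ** 2


def pre(reference, proposed):
    """Percent relative efficiency of the proposed over the reference."""
    return 100.0 * reference / proposed


if __name__ == "__main__":
    n = 100                                   # hypothetical sample size
    t, pi = 0.10, 0.1
    for p in (0.6, 0.7, 0.8):
        for w in (0.10, 0.50, 0.90):
            v_w = var_proposed(pi, t, n, p, w)
            print(p, w,
                  round(pre(var_huang(pi, t, n, p), v_w), 2),
                  round(pre(mse_direct(pi, t, n), v_w), 2))
```

The efficiency relative to the two-stage Warner-type estimator does not depend on n, since both variances scale as 1/n, whereas the efficiency relative to the direct estimator grows with n because the squared bias of the direct estimator does not shrink.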

Conclusion
This paper presents an improvement on Huang's (2004) randomized response model. We have suggested a new randomized response procedure with the help of the randomized response procedure discussed in Singh et al. (1995). We have proposed an estimator of $\pi$, the population proportion of a sensitive group, and an estimator of T, the probability that a respondent belonging to the sensitive group tells the truth when questioned directly. The exact variance of the estimator of $\pi$ has been obtained and compared with those of Huang's (2004) estimator and the direct estimator. The mean squared error of the proposed estimator of T has been derived to the first degree of approximation and compared with that of Huang's (2004) estimator of T. It is found that the proposed randomized response model is more efficient than the one suggested by Huang (2004) and the direct response procedure. We have also provided an unbiased estimator of the mean square error of the direct estimator with the help of the proposed randomized response procedure. The proposed randomized response procedure is therefore recommended for use in survey sampling practice.

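The main properties of the estimator $\hat{\pi}_w$ are given in the following theorem.
Theorem 1. The estimator $\hat{\pi}_w$ is unbiased with the variance given by
$$V(\hat{\pi}_w) = \frac{\pi(1-\pi)}{n} + \frac{(1-P)\left[P\pi(1-2w) + w\{1-(1-P)w\} - \{P+(1-P)w\}(1-w)\pi T\right]}{nP^2}.$$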

Theorem 4. An unbiased estimator of the mean square error of the direct estimator $\hat{\pi}_D$ can be obtained with the help of the proposed randomized response procedure.

Fig. 1: The percent relative efficiency of the proposed estimator $\hat{\pi}_w$ with respect to Huang's (2004) estimator $\hat{\pi}_H$

Table 2: The percent relative efficiency of the proposed estimator $\hat{T}_w$ with respect to Huang's (2004) estimator $\hat{T}_H$