Some Improved Estimators of Population Mean using Two-Phase Sampling Scheme in the Presence of Non-Response

In this paper, we propose some improved estimators of the population mean that use information on two auxiliary variables under a two-phase sampling scheme with non-response. We consider the situation in which the study variable and the first auxiliary variable suffer from non-response, while the second auxiliary variable is free from non-response. Expressions for the bias and mean square error of the proposed estimators are derived up to the first order of approximation. The efficiency of the proposed estimators is compared with that of the usual mean estimator and some well-known existing estimators of the population mean. The theoretical results are also illustrated with empirical data.


Introduction
In some practical situations, information on more than one auxiliary variable is readily available, and all such variables can be used to obtain more precise estimates of population parameters. Olkin (1958) was the first to propose a multivariate ratio estimator of the population mean using information on several auxiliary variables. His work was extended by Srivastava (1966), Rao and Mudholkar (1967), Singh (1967) and Sahai (1980). Moreover, Chand (1975), Kiregyera (1980), Kadilar and Cingi (2005), Shukla and Thakur (2012), Kumar and Sharma (2020) and others have suggested a variety of estimators of the population mean using information on two auxiliary variables.
Dealing with non-response is of great importance in sample surveys. Non-response arises when the investigator fails to obtain information from some of the units selected for the survey. If the non-responding units are very similar to the responding units, non-response has little effect. In practice, however, the responding units usually differ from the non-responding units, so non-response can seriously affect the estimates. The problem of non-response was first tackled by Hansen and Hurwitz (1946), who introduced the technique of sub-sampling the non-respondents. Since then, several authors have dealt with non-response by proposing various estimators of population parameters. Authors such as Cochran (1977), Olkin (1986), Khare and Srivastava (1993), Khare and Srivastava (1995), Okafor and Lee (2000), Tabasum and Khan (2004), Tabasum and Khan (2006), Singh and Kumar (2007), Singh and Kumar (2009), Chaudhary and Smarandache (2014), Pal and Singh (2018) and Khare and Sinha (2019) have discussed the problem of non-response in the two-phase sampling scheme.
Estimating parameters in two-phase sampling becomes more complex when non-response occurs on both the study and the auxiliary variables. This situation was handled by Chaudhary and Kumar (2016). To obtain improved estimators of the population mean in this situation, we use information on two auxiliary variables. The proposed estimators assume that the additional (second) auxiliary variable is free from non-response. We further assume that the first auxiliary variable is more highly correlated with the study variable than the second auxiliary variable is. To illustrate the theoretical results, we consider two different sets of empirical data.

Sampling Strategy
Suppose a population consists of $N$ units $(U_1, U_2, \ldots, U_N)$. Let $Y$ be the study variable with population mean $\bar{Y}$, and let $X$ and $Z$ be two auxiliary variables with population means $\bar{X}$ and $\bar{Z}$, respectively. Let $y_i$, $x_i$ and $z_i$ $(i = 1, 2, \ldots, N)$ be the observations on the $i$th unit of the population for $Y$, $X$ and $Z$, respectively. We assume that non-response occurs on the study variable $Y$ and the auxiliary variable $X$, whereas the auxiliary variable $Z$ is free from non-response. We further assume that the population mean $\bar{X}$ of $X$ is unknown. We therefore first estimate $\bar{X}$ by double (two-phase) sampling and then use this estimate, together with the information on the other auxiliary variable $Z$, to estimate $\bar{Y}$. A larger first-phase sample of $n'$ units is selected from the $N$ units by simple random sampling without replacement (SRSWOR), and a smaller second-phase sub-sample of $n$ units $(n < n')$ is then selected from the $n'$ units, again by SRSWOR. At the first phase, $n'_1$ of the $n'$ units respond and $n'_2$ units do not respond on $X$. A sub-sample of $h'_2 = n'_2/L$ $(L > 1)$ units is selected from the $n'_2$ non-respondents by SRSWOR, and information is collected from all $h'_2$ units [see Hansen and Hurwitz (1946)]. The first-phase estimators of $\bar{X}$ and $\bar{Z}$ are, respectively,

$$\bar{x}'^{*} = \frac{n'_1 \bar{x}'_{n_1} + n'_2 \bar{x}'_{h_2}}{n'} \tag{1}$$

$$\bar{z}' = \frac{1}{n'} \sum_{i=1}^{n'} z_i \tag{2}$$

where $\bar{x}'_{n_1}$ and $\bar{x}'_{h_2}$ are the means of $X$ based on the $n'_1$ responding units and the $h'_2$ sub-sampled non-responding units, respectively. The variances of $\bar{x}'^{*}$ and $\bar{z}'$ are, respectively,

$$V(\bar{x}'^{*}) = \left(\frac{1}{n'} - \frac{1}{N}\right) S_X^2 + \frac{W_2 (L-1)}{n'} S_{X2}^2 \tag{3}$$

$$V(\bar{z}') = \left(\frac{1}{n'} - \frac{1}{N}\right) S_Z^2 \tag{4}$$

where $S_X^2$ and $S_{X2}^2$ are the population mean squares of $X$ for the entire population and for the non-response group, respectively, $S_Z^2$ is the population mean square of $Z$, and $W_2$ is the non-response rate in the population.
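The Hansen and Hurwitz (1946) estimator described above weights the mean of the responding units by their count and the mean of the sub-sampled non-respondents by the number of non-respondents. The following is a minimal Python sketch, assuming the observations are held in plain lists; the function names are hypothetical. It computes $(n'_1 \bar{x}'_{n_1} + n'_2 \bar{x}'_{h_2})/n'$, the variance $(1/n' - 1/N) S_X^2 + W_2 (L-1)/n' \, S_{X2}^2$, and the ordinary SRSWOR variance $(1/n' - 1/N) S_Z^2$ used for the non-response-free variable $Z$.

```python
def hansen_hurwitz_mean(resp_values, subsample_values, n2):
    """Hansen-Hurwitz mean: (n1 * mean(responding) + n2 * mean(sub-sample)) / (n1 + n2).

    resp_values:      observations on the n1 responding units
    subsample_values: observations on the h2 units sub-sampled from the n2 non-respondents
    n2:               number of non-respondents in the sample (not h2)
    """
    n1 = len(resp_values)
    if n1 + n2 == 0:
        raise ValueError("empty sample")
    if n2 > 0 and not subsample_values:
        raise ValueError("non-respondents present but no sub-sample values")
    resp_total = sum(resp_values)  # equals n1 * mean of the responding units
    sub_total = n2 * sum(subsample_values) / len(subsample_values) if n2 > 0 else 0.0
    return (resp_total + sub_total) / (n1 + n2)


def hh_variance(n, N, S2, S2_nonresp, W2, L):
    """(1/n - 1/N) * S2 + W2 * (L - 1) / n * S2_nonresp for a Hansen-Hurwitz mean of n units."""
    return (1.0 / n - 1.0 / N) * S2 + W2 * (L - 1) / n * S2_nonresp


def srswor_variance(n, N, S2):
    """(1/n - 1/N) * S2: variance of a plain SRSWOR mean, e.g. z-bar' for Z."""
    return (1.0 / n - 1.0 / N) * S2


# Hypothetical numbers: 3 respondents, 4 non-respondents of whom 2 were re-contacted (L = 2).
print(hansen_hurwitz_mean([10, 12, 14], [20, 22], n2=4))  # 120/7, about 17.142857
print(hh_variance(n=100, N=1000, S2=4.0, S2_nonresp=9.0, W2=0.2, L=3))  # about 0.072
```

The same helper applies unchanged to the second-phase estimators of $\bar{Y}$ and $\bar{X}$, with $n'$ replaced by $n$.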
At the second phase, $n_1$ of the $n$ units respond and $n_2$ units do not respond on the study variable $Y$ and the auxiliary variable $X$; recall that the auxiliary variable $Z$ is free from non-response. Following the Hansen and Hurwitz (1946) technique of sub-sampling the non-respondents, we select a sub-sample of $h_2 = n_2/L$ $(L > 1)$ units from the $n_2$ non-respondents by SRSWOR and collect information from all $h_2$ units. The Hansen and Hurwitz (1946) estimators of $\bar{Y}$ and $\bar{X}$ at the second phase are, respectively,

$$\bar{y}^{*} = \frac{n_1 \bar{y}_{n_1} + n_2 \bar{y}_{h_2}}{n} \tag{5}$$

$$\bar{x}^{*} = \frac{n_1 \bar{x}_{n_1} + n_2 \bar{x}_{h_2}}{n} \tag{6}$$

where $\bar{y}_{n_1}$ and $\bar{y}_{h_2}$ are the means of the study variable based on the $n_1$ responding units and the $h_2$ sub-sampled non-responding units, respectively, and $\bar{x}_{n_1}$ and $\bar{x}_{h_2}$ are the corresponding means of the auxiliary variable $X$.
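The variances of $\bar{y}^{*}$ and $\bar{x}^{*}$ are, respectively,

$$V(\bar{y}^{*}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_Y^2 + \frac{W_2 (L-1)}{n} S_{Y2}^2 \tag{7}$$

$$V(\bar{x}^{*}) = \left(\frac{1}{n} - \frac{1}{N}\right) S_X^2 + \frac{W_2 (L-1)}{n} S_{X2}^2 \tag{8}$$

where $S_Y^2$ and $S_{Y2}^2$ are the population mean squares of the study variable for the entire population and for the non-response group, respectively.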
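The variance formula $V(\bar{y}^{*}) = (1/n - 1/N) S_Y^2 + W_2 (L-1)/n \, S_{Y2}^2$ can be checked by simulation. The sketch below is a minimal check under stated assumptions: a hypothetical synthetic population in which a fixed 30% of units form the non-response group (the deterministic response model behind the Hansen and Hurwitz scheme), and $n_2/L$ rounded up to an integer. Because an SRSWOR sub-sample of an SRSWOR sample is itself an SRSWOR sample of size $n$, the sketch draws the $n$ units directly. All function names are hypothetical.

```python
import math
import random
import statistics


def make_population(N=2000, n_nonresp=600, seed=1):
    """Hypothetical population; the first n_nonresp units form the non-response group."""
    rng = random.Random(seed)
    # Non-respondents get a shifted mean so that ignoring them would bias the estimate.
    y = [rng.gauss(50.0, 5.0) + (5.0 if i < n_nonresp else 0.0) for i in range(N)]
    responds = [i >= n_nonresp for i in range(N)]
    return y, responds


def hh_draw(y, responds, n, L, rng):
    """One draw of ybar*: SRSWOR of n units plus Hansen-Hurwitz sub-sampling."""
    sample = rng.sample(range(len(y)), n)
    resp_total = sum(y[i] for i in sample if responds[i])
    nonresp = [i for i in sample if not responds[i]]
    n2 = len(nonresp)
    sub_mean = 0.0
    if n2:
        h2 = math.ceil(n2 / L)  # assumption: n2 / L rounded up to an integer
        sub_mean = statistics.fmean(y[i] for i in rng.sample(nonresp, h2))
    return (resp_total + n2 * sub_mean) / n


def theoretical_var(y, responds, n, L):
    """(1/n - 1/N) * S_Y^2 + W2 * (L - 1) / n * S_Y2^2."""
    N = len(y)
    y_nonresp = [v for v, r in zip(y, responds) if not r]
    W2 = len(y_nonresp) / N
    return ((1 / n - 1 / N) * statistics.variance(y)
            + W2 * (L - 1) / n * statistics.variance(y_nonresp))


y, responds = make_population()
rng = random.Random(2)
draws = [hh_draw(y, responds, n=200, L=2, rng=rng) for _ in range(5000)]
print("mean of draws:", statistics.fmean(draws), "population mean:", statistics.fmean(y))
print("simulated V:", statistics.variance(draws), "formula:", theoretical_var(y, responds, 200, 2))
```

The simulated mean should sit close to $\bar{Y}$ (the estimator is unbiased) and the simulated variance close to the formula; rounding $h_2$ up makes the simulated value very slightly smaller.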
Chaudhary and Kumar (2016) proposed ratio- and regression-type estimators of the population mean $\bar{Y}$ using the auxiliary variable $X$ when non-response occurs on both the study and the auxiliary variables and the population mean $\bar{X}$ is unknown. The estimators are, respectively,

$$T^{*}_{1ratio} = \bar{y}^{*} \, \frac{\bar{x}'^{*}}{\bar{x}^{*}} \tag{9}$$

$$T^{*}_{2reg} = \bar{y}^{*} + b^{*}_{yx} \left(\bar{x}'^{*} - \bar{x}^{*}\right) \tag{10}$$

where $b^{*}_{yx} = s^{*}_{xy} / s^{*2}_{x}$, and $s^{*}_{xy}$ and $s^{*2}_{x}$ are the unbiased estimators of $S_{XY}$ and $S_X^2$, respectively, based on the $n_1 + h_2$ units. Here $S_{XY} = \rho_{XY} S_X S_Y$, where $\rho_{XY}$ is the population correlation coefficient between $Y$ and $X$.
The mean square errors (MSE) of $T^{*}_{1ratio}$ and $T^{*}_{2reg}$ up to the first order of approximation are given by equations (11) and (12), respectively, where $\rho_{XY2}$ is the population correlation coefficient between $Y$ and $X$ for the non-response group and $R = \bar{Y}/\bar{X}$.
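Once the Hansen and Hurwitz means are available, the two existing estimators are simple plug-in formulas. The sketch below assumes the standard two-phase forms $\bar{y}^{*}\bar{x}'^{*}/\bar{x}^{*}$ for the ratio-type estimator and $\bar{y}^{*} + b^{*}_{yx}(\bar{x}'^{*} - \bar{x}^{*})$ for the regression-type estimator, with $b^{*}_{yx} = s^{*}_{xy}/s^{*2}_{x}$; the sample moments are taken as given inputs rather than computed, and the function names are hypothetical.

```python
def ratio_estimate(ybar_star, xbar_star, xbar1_star):
    """Two-phase ratio form: ybar* * xbar'* / xbar* (form assumed, see lead-in)."""
    return ybar_star * xbar1_star / xbar_star


def regression_estimate(ybar_star, xbar_star, xbar1_star, b_yx):
    """Two-phase regression form: ybar* + b_yx * (xbar'* - xbar*) (form assumed)."""
    return ybar_star + b_yx * (xbar1_star - xbar_star)


def slope(s_xy, s2_x):
    """b*_yx = s*_xy / s*^2_x, from sample moments based on the n1 + h2 units."""
    return s_xy / s2_x


# Hypothetical inputs: the larger first-phase sample suggests a larger X mean (6 vs 5).
b = slope(3.0, 1.5)
print(ratio_estimate(10.0, 5.0, 6.0))          # 12.0
print(regression_estimate(10.0, 5.0, 6.0, b))  # 12.0
```

In both forms, when the first-phase mean $\bar{x}'^{*}$ exceeds the second-phase mean $\bar{x}^{*}$, the estimate of $\bar{Y}$ is adjusted upward (for a positive slope), which is how the extra first-phase information on $X$ enters.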

Proposed Estimators
We now propose some improved ratio- and regression-type estimators of the population mean $\bar{Y}$ that use information on one more auxiliary variable $Z$, for the situation in which the study variable $Y$ and the auxiliary variable $X$ suffer from non-response while the auxiliary variable $Z$ is free from non-response. As before, the population mean $\bar{X}$ is assumed to be unknown. The proposed ratio- and regression-type estimators are denoted by $T^{**}_{1ratio}$ and $T^{**}_{2reg}$ (equations (13) and (14)), respectively, where $b^{*}_{xz} = s^{*}_{xz}/s^{*2}_{z}$ is an estimator of the population regression coefficient $\beta_{XZ} = S_{XZ}/S_Z^2$. Here $s^{*}_{xz}$ and $s^{*2}_{z}$ are the unbiased estimators of $S_{XZ}$ and $S_Z^2$, respectively, based on the $n_1 + h_2$ units, and $S_{XZ} = \rho_{XZ} S_X S_Z$, where $\rho_{XZ}$ is the population correlation coefficient between $X$ and $Z$.
To obtain the biases and mean square errors of $T^{**}_{1ratio}$ and $T^{**}_{2reg}$, we use large-sample approximations. The derivation is expressed in terms of the relative error terms $e_0, e_1, e'_1, e_2, e'_2, e_3, e_4, e_5$ of the component estimators and uses $\rho_{YZ}$, the population correlation coefficient between $Y$ and $Z$. We express equation (13) in terms of $e_0, e_1, e'_1, e_2$ and neglect terms of degree higher than two in these errors, which gives equation (15). Taking expectations on both sides of equation (15) gives the bias of $T^{**}_{1ratio}$ up to the first order of approximation. Squaring both sides of equation (15), taking expectations and neglecting terms of degree higher than two in $e_0, e_1, e'_1, e_2$ gives the MSE of $T^{**}_{1ratio}$ up to the first order of approximation, equation (17).
Similarly, we express equation (14) in terms of $e_0, e_1, e'_1, e_2, e'_2, e_3, e_4, e_5$ and ignore terms of degree higher than two in these errors, which gives equation (18). Taking expectations on both sides of equation (18) gives the bias of $T^{**}_{2reg}$ up to the first order of approximation. Squaring both sides of equation (18), taking expectations and neglecting terms of degree higher than two gives the MSE of $T^{**}_{2reg}$ up to the first order of approximation, equation (20).
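The first-order steps described above (express the estimator in error terms, keep terms up to degree two, then take expectations) can be illustrated on the existing ratio estimator rather than the proposed one, assuming its form $T^{*}_{1ratio} = \bar{y}^{*}\bar{x}'^{*}/\bar{x}^{*}$. Writing each Hansen and Hurwitz mean as the true mean times one plus a relative error, with $E(e_0) = E(e_1) = E(e'_1) = 0$ because these means are unbiased, a sketch of the expansion is:

```latex
\bar{y}^{*} = \bar{Y}(1+e_0), \qquad \bar{x}^{*} = \bar{X}(1+e_1), \qquad \bar{x}'^{*} = \bar{X}(1+e'_1)

T^{*}_{1ratio} = \bar{Y}(1+e_0)(1+e'_1)(1+e_1)^{-1}
              = \bar{Y}(1+e_0)(1+e'_1)\left(1 - e_1 + e_1^2 - \cdots\right)

T^{*}_{1ratio} - \bar{Y} \approx \bar{Y}\left(e_0 - e_1 + e'_1 + e_1^2 - e_0 e_1 + e_0 e'_1 - e_1 e'_1\right)

\mathrm{Bias}(T^{*}_{1ratio}) \approx \bar{Y}\left[E(e_1^2) - E(e_0 e_1) + E(e_0 e'_1) - E(e_1 e'_1)\right]

\mathrm{MSE}(T^{*}_{1ratio}) \approx \bar{Y}^2 \, E\left(e_0 - e_1 + e'_1\right)^2
```

For the MSE only the linear part is squared, since squaring any second-degree term produces terms of degree three or higher, which are neglected at this order.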

Efficiency Comparisons
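In this section, we compare the efficiency of the proposed estimators with that of some existing estimators.
Case (i): A comparison of equations (7) and (17) shows that $T^{**}_{1ratio}$ is more efficient than $\bar{y}^{*}$ whenever $MSE(T^{**}_{1ratio}) < V(\bar{y}^{*})$.
Case (ii): From equations (11) and (17), $T^{**}_{1ratio}$ is more efficient than $T^{*}_{1ratio}$ whenever $MSE(T^{**}_{1ratio}) < MSE(T^{*}_{1ratio})$.
Case (iii): A comparison of equations (7) and (20) shows that $T^{**}_{2reg}$ is more efficient than $\bar{y}^{*}$ whenever $MSE(T^{**}_{2reg}) < V(\bar{y}^{*})$.
Case (iv): From equations (12) and (20), $T^{**}_{2reg}$ is more efficient than $T^{*}_{2reg}$ whenever $MSE(T^{**}_{2reg}) < MSE(T^{*}_{2reg})$.
Cases (i) to (iv) give the conditions under which the proposed estimators $T^{**}_{1ratio}$ and $T^{**}_{2reg}$ are preferred over the existing estimators $\bar{y}^{*}$, $T^{*}_{1ratio}$ and $T^{*}_{2reg}$.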
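Efficiency comparisons of this kind can also be explored by Monte Carlo simulation. The sketch below compares the empirical MSE of $\bar{y}^{*}$ with the existing two-phase ratio and regression forms $\bar{y}^{*}\bar{x}'^{*}/\bar{x}^{*}$ and $\bar{y}^{*} + b(\bar{x}'^{*} - \bar{x}^{*})$ on a hypothetical population in which $Y$ is strongly correlated with $X$. Several assumptions are made and are not taken from the paper: a fixed non-response group, $n_2/L$ rounded up, independent sub-samples of non-respondents at each phase, one sub-sample shared by $Y$ and $X$ at the second phase, and an ordinary least-squares slope in place of $s^{*}_{xy}/s^{*2}_{x}$. All names are hypothetical.

```python
import math
import random
import statistics


def make_correlated_population(N=2000, n_nonresp=600, seed=3):
    """Hypothetical population: Y = X + noise; the first n_nonresp units never respond."""
    rng = random.Random(seed)
    x = [rng.gauss(50.0, 5.0) for _ in range(N)]
    y = [xi + rng.gauss(0.0, 2.0) for xi in x]
    responds = [i >= n_nonresp for i in range(N)]
    return x, y, responds


def hh_means(units, variables, responds, L, rng):
    """Hansen-Hurwitz means of each variable over `units`.

    One sub-sample of ceil(n2 / L) non-respondents is shared by all variables
    (assumption). Also returns the n1 + h2 units actually observed.
    """
    resp = [i for i in units if responds[i]]
    nonresp = [i for i in units if not responds[i]]
    sub = rng.sample(nonresp, math.ceil(len(nonresp) / L)) if nonresp else []
    means = []
    for v in variables:
        total = sum(v[i] for i in resp)
        if sub:
            total += len(nonresp) * statistics.fmean(v[i] for i in sub)
        means.append(total / len(units))
    return means, resp + sub


def one_replicate(x, y, responds, n_first, n, L, rng):
    """Draw the two-phase sample once; return (ybar*, ratio form, regression form)."""
    first = rng.sample(range(len(x)), n_first)           # phase 1: n' units by SRSWOR
    xbar1 = hh_means(first, [x], responds, L, rng)[0][0]  # xbar'*
    second = rng.sample(first, n)                         # phase 2: n of the n' units
    (ybar, xbar), observed = hh_means(second, [y, x], responds, L, rng)
    # Least-squares slope over the n1 + h2 observed units: a simple plug-in,
    # not necessarily the estimator s*_xy / s*^2_x used in the source.
    xs = [x[i] for i in observed]
    ys = [y[i] for i in observed]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((a - mx) * (c - my) for a, c in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return ybar, ybar * xbar1 / xbar, ybar + b * (xbar1 - xbar)


def empirical_mse(x, y, responds, n_first=600, n=150, L=2, reps=1000, seed=4):
    """Monte Carlo MSE of ybar*, the ratio form and the regression form."""
    rng = random.Random(seed)
    ybar_pop = statistics.fmean(y)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        for k, est in enumerate(one_replicate(x, y, responds, n_first, n, L, rng)):
            totals[k] += (est - ybar_pop) ** 2
    return [t / reps for t in totals]


x, y, responds = make_correlated_population()
mse_mean, mse_ratio, mse_reg = empirical_mse(x, y, responds)
print("ybar*:", mse_mean, "ratio:", mse_ratio, "regression:", mse_reg)
```

With a strongly correlated auxiliary variable, both adjusted forms should show a clearly smaller MSE than $\bar{y}^{*}$; the same harness could be extended to the proposed estimators once their exact forms are coded.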

Empirical Study
To examine the behaviour of the proposed estimators, we illustrate the results obtained in the previous sections with numerical data. We consider two different data sets in support of the theoretical results.
Data Set-1: We have used the data on population I considered by Anderson (1958), two of whose characteristics are considered as the auxiliary variables $X$ and $Z$, respectively. The population details are given below:
Table 1 shows the variance/MSE of the estimators $\bar{y}^{*}$, $T^{*}_{1ratio}$, $T^{**}_{1ratio}$, $T^{*}_{2reg}$ and $T^{**}_{2reg}$, together with their percentage relative efficiency (PRE) with respect to the usual mean estimator $\bar{y}^{*}$.
Data Set-2: Here, the data considered by Singh (1967) have been used to demonstrate the theoretical results. In this data set, the number of females employed is the study variable $Y$, whereas the number of females in service and the number of educated females are the auxiliary variables $X$ and $Z$, respectively. The characteristics of the population are given below:
Table 2 likewise shows the variance/MSE of the estimators $\bar{y}^{*}$, $T^{*}_{1ratio}$, $T^{**}_{1ratio}$, $T^{*}_{2reg}$ and $T^{**}_{2reg}$, together with their PRE with respect to the usual mean estimator $\bar{y}^{*}$.
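The PRE columns in tables of this kind follow the usual definition $PRE(T, \bar{y}^{*}) = 100 \times V(\bar{y}^{*}) / MSE(T)$, so values above 100 favour $T$ over the usual mean estimator. A minimal Python sketch of that computation, with hypothetical function names and purely illustrative input values (not the figures in Tables 1 and 2):

```python
def percent_relative_efficiency(var_usual, mse_estimator):
    """PRE(T, ybar*) = 100 * V(ybar*) / MSE(T); values above 100 favour T."""
    return 100.0 * var_usual / mse_estimator


def pre_table(var_usual, mses):
    """Map each estimator name to its PRE with respect to ybar*."""
    return {name: percent_relative_efficiency(var_usual, mse) for name, mse in mses.items()}


# Hypothetical inputs for illustration only.
print(pre_table(2.0, {"ybar*": 2.0, "T*_1ratio": 1.0, "T**_1ratio": 0.8}))
```

By construction the usual estimator $\bar{y}^{*}$ always has a PRE of 100.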

Concluding Remarks
We have proposed some improved ratio- and regression-type estimators of the population mean using information on two auxiliary variables in the double (two-phase) sampling scheme under non-response. A detailed theoretical study of the properties of the proposed estimators has been presented, together with a comparison of the proposed estimators with the usual mean estimator and some other existing estimators of the population mean.
To support the theoretical results, we have presented an empirical study based on two real data sets. Tables 1 and 2 show that the proposed estimators $T^{**}_{1ratio}$ and $T^{**}_{2reg}$ have smaller MSEs, and hence higher PREs, than the existing estimators $\bar{y}^{*}$, $T^{*}_{1ratio}$ and $T^{*}_{2reg}$.