Islamabad

In this study, different sampling designs are compared using the Household Integrated Economic Survey (HIES) data of the North West Frontier Province (NWFP) for 1998-99 and 2001-02, collected from the Federal Bureau of Statistics, Statistical Division, Government of Pakistan, Islamabad. The performance of the estimators is also assessed using the bootstrap and the jackknife. HIES adopts a two-stage stratified random sample design. In the first stage, enumeration blocks and villages are treated as the Primary Sampling Units (PSUs), and the sample PSUs are selected with probability proportional to size. The Secondary Sampling Units (SSUs), i.e., households, are selected by systematic sampling with a random start. HIES uses a single study variable. We have compared the HIES technique with four other designs: stratified simple random sampling, stratified systematic sampling, stratified ranked set sampling and stratified two-phase sampling. Ratio and regression methods were applied with two variables: income (y) and household size (x). The jackknife and the bootstrap are used for variance replication. Simple random sampling with sample sizes of 462 to 561 gave moderate variances with both the jackknife and the bootstrap. Systematic sampling gave a moderate variance at a sample size of 467. With the jackknife under systematic sampling, the variance of the regression estimator was greater than that of the ratio estimator for sample sizes of 467 to 631; at a sample size of 952, the variance of the ratio estimator becomes greater than that of the regression estimator. Ranked set sampling turns out to be the most efficient of the designs compared: with both the jackknife and the bootstrap, it gives the minimum variance even at the smallest sample size (467). Two-phase sampling performed poorly. The multi-stage sampling applied by HIES gave large variances, especially when used with a single study variable.


Introduction
The importance of household income and expenditure statistics for a country is well recognized: they are needed to track changes in the level of living, to guide policy makers in framing socio-economic development policies, and to initiate financial measures for improving the economic conditions of the people. The availability of information at different points in time is helpful in evaluating the changes that occur, as a result of economic development, in the consumption pattern, the incidence of poverty, the trend in saving propensities and the preferences of different groups of the population. Moreover, information on the per capita income of the household sector may also be of use in evaluating the validity of the National Income estimates obtained through conventional methods.
In Pakistan, household income and consumption data are reported by the Household Integrated Economic Survey (HIES) and the Pakistan Integrated Household Survey (PIHS). The HIES has been conducted, with some breaks, since 1963, and it saw some major developments during the 1990s. In 1990 the HIES questionnaires were revised in order to address the requirements of a new system of national accounts. The four surveys of 1990-91, 1993-94 and 1996-97 followed the design of these new questionnaires. In 1998, the HIES data collection methods and questionnaires were changed to reflect the integration of the HIES with the PIHS.
The national average household size was 6.8 members in 1998, but it differs between rural and urban areas and between provinces. Household size in Sindh is slightly larger than in Punjab, while households in NWFP and Baluchistan have approximately one more member than households in Punjab and Sindh. According to the results of the last three surveys, the number of earners per household has tended to increase in both urban and rural areas. The household size reached 6.96 in 2001-02, as described in HIES 2001-02.
HIES applies a multistage stratified random sample design for estimation. In this study, we compare different sampling techniques for estimation from HIES data.

Methodology
In this study, HIES data for two years were collected from the Statistical Division, Islamabad. The data were used to develop ideas for the future regarding income and household size. Income groups with respect to household sizes for Pakistan and its provinces in 1998-99 and 2001-02 were taken and estimated, where x = household size and y = income.

Methods of Estimation
Occasions arise where the estimation of the population mean or total of a variable y is assisted by information on a subsidiary variable x. Two ways to do this are ratio estimation and regression estimation.

Ratio method - Stratified Sample
The ratio estimator is most effective when the relationship between y and x is linear through the origin.
As discussed by Yates (1981), each stratum is treated separately using the formula for a random sample, and the population estimates are built up by summing the estimates for the separate strata, with division by N for population means. The $s_{q_i}^2$ are estimated separately for each stratum, using the value of the ratio appropriate to that stratum.
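The stratum-by-stratum ratio estimate described above can be illustrated with a minimal Python sketch. It assumes the usual separate ratio estimator, in which each stratum ratio $\bar{y}_h / \bar{x}_h$ is multiplied by the known stratum total of x. The function name, the dictionary layout and the numbers are hypothetical and are not taken from the HIES computations.

```python
from statistics import mean

def separate_ratio_total(strata):
    """Separate ratio estimate of the population total of y.

    `strata` is a list of dicts (a hypothetical layout) holding, for each
    stratum, the sampled x values, the sampled y values and the known
    population total of x in that stratum ("X_total").
    """
    total = 0.0
    for s in strata:
        r_h = mean(s["y"]) / mean(s["x"])  # stratum ratio: y-bar / x-bar
        total += r_h * s["X_total"]        # ratio estimate of the stratum y total
    return total

# Made-up illustrative numbers, not HIES data.
strata = [
    {"x": [4, 6, 8], "y": [40.0, 60.0, 80.0], "X_total": 600},
    {"x": [5, 7, 9], "y": [100.0, 140.0, 180.0], "X_total": 700},
]
N = 200  # assumed number of units in the population

y_total_hat = separate_ratio_total(strata)
y_mean_hat = y_total_hat / N  # division by N gives the population mean
```

Because the ratio is computed within each stratum, strata with very different income-to-household-size ratios do not distort one another.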

Regression method - Stratified Sample
The linear regression estimator is more efficient than the ratio estimator except when the regression line of y on x passes through the neighborhood of the origin, in which case the efficiency of the two estimators is almost equal. The procedure is the same in every stratum. Cochran (1977) has described that in stratified sampling the population of N units is first divided into subpopulations of $N_1, N_2, \dots, N_L$ units, respectively. These subpopulations are non-overlapping, and together they comprise the whole of the population, so that $N_1 + N_2 + \dots + N_L = N$. The subpopulations are called strata. To obtain the full benefit from stratification, the values of the $N_h$ must be known. When the strata have been determined, a sample is drawn from each, the drawings being made independently in different strata. The sample sizes within the strata are denoted by $n_1, n_2, \dots, n_L$, respectively.
As the data were in stratified form, we have used stratification in every technique.
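As an illustration of regression estimation applied separately in each stratum, the following Python sketch assumes the standard linear regression estimator $\bar{y}_h + b_h(\bar{X}_h - \bar{x}_h)$ within a stratum and combines the strata with weights $N_h / N$. The function names, the dictionary keys and the numbers are hypothetical.

```python
from statistics import mean

def regression_mean(x, y, X_bar):
    """Linear regression estimate of the mean of y in one stratum."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                        # least-squares slope of y on x
    return y_bar + b * (X_bar - x_bar)   # adjust y-bar using the known X-bar

def stratified_regression_mean(strata):
    """Weight the stratum regression estimates by N_h / N."""
    N = sum(s["N_h"] for s in strata)
    return sum(
        s["N_h"] / N * regression_mean(s["x"], s["y"], s["X_bar"])
        for s in strata
    )

# Made-up strata in which y is exactly linear in x, so the result is easy to check.
strata = [
    {"x": [4, 6, 8], "y": [30.0, 40.0, 50.0], "X_bar": 7.0, "N_h": 100},
    {"x": [5, 7, 9], "y": [70.0, 90.0, 110.0], "X_bar": 6.0, "N_h": 300},
]
y_mean_reg = stratified_regression_mean(strata)
```

In this toy data the stratum estimates are 45 and 80, so the weighted mean is 0.25 × 45 + 0.75 × 80 = 71.25.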

Simple Random Sampling
As discussed by Thompson (1992), simple random sampling is a method of selecting n units out of N such that every one of the $\binom{N}{n}$ distinct samples has an equal chance of being drawn. In practice, a simple random sample is drawn unit by unit, either by means of a table of random numbers or by means of a computer program that produces such a table. The above procedure is used independently in each stratum to get the final results. In simple random sampling, the procedures of ratio and regression estimation are applied after the sample has been drawn randomly.
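A minimal Python sketch of drawing a simple random sample independently in each stratum is given below. The frame, the stratum names and the sample sizes are invented for illustration; `random.sample` draws without replacement, so every subset of the requested size is equally likely.

```python
import random

def stratified_srs(frame, sizes, seed=0):
    """Draw a simple random sample without replacement in every stratum.

    `frame` maps a stratum label to the list of its units, and `sizes`
    maps the same labels to the stratum sample sizes n_h.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return {h: rng.sample(units, sizes[h]) for h, units in frame.items()}

# Hypothetical frame with two strata and roughly 25% sampling in each.
frame = {"urban": list(range(100)), "rural": list(range(100, 300))}
sizes = {"urban": 25, "rural": 50}
sample = stratified_srs(frame, sizes, seed=1)
```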

Systematic Sampling
A sample obtained by randomly selecting one element from the first k elements in the sampling frame, and every k-th element thereafter, is called a 1-in-k systematic sample. For stratified systematic sampling, the same procedure is used in every stratum. In systematic sampling there are k possible systematic samples. The estimates are calculated using the above ratio and regression methods after the sample has been drawn systematically.
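The 1-in-k selection can be sketched in Python as follows. The frame and the value of k are made up, and the sketch assumes a frame held as an ordered list, so that slicing with step k picks every k-th unit after the random start.

```python
import random

def systematic_sample(units, k, rng):
    """1-in-k systematic sample: random start in the first k units, then every k-th."""
    start = rng.randrange(k)  # random start among positions 0, ..., k-1
    return units[start::k]

def stratified_systematic(frame, k, seed=0):
    """Apply the same 1-in-k procedure independently in every stratum."""
    rng = random.Random(seed)
    return {h: systematic_sample(units, k, rng) for h, units in frame.items()}

# Hypothetical ordered frame with two strata.
frame = {"urban": list(range(20)), "rural": list(range(100, 140))}
sample = stratified_systematic(frame, k=4, seed=1)

# The k possible systematic samples of the "urban" stratum.
all_possible = [frame["urban"][s::4] for s in range(4)]
```

Only k distinct samples exist in each stratum, which is why the variance of a systematic sample depends heavily on the ordering of the frame.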

Ranked Set Sampling
In ranked set sampling, a sample of n elements is drawn from a population of N elements by simple random sampling. The drawing is repeated independently n times, so that we have n independent samples of size n each. Next, each sample is ranked. Then, for the final sample, we choose the element with the smallest rank from the first sample, the element with the second smallest rank from the second sample, and so on (Kowalczyk, 2004). Then the ratio and regression methods are applied.
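The selection steps above can be sketched in Python as follows. As a simplifying assumption, the sketch ranks the drawn units by their own numeric values; in practice the ranking may be based on an auxiliary variable or on judgment. The population is made up.

```python
import random

def ranked_set_sample(population, n, seed=0):
    """Draw n independent simple random samples of size n and keep, from the
    i-th ranked sample, the unit with the i-th smallest value."""
    rng = random.Random(seed)
    rss = []
    for i in range(n):
        ranked = sorted(rng.sample(population, n))  # i-th independent sample, ranked
        rss.append(ranked[i])                       # i = 0 keeps the smallest, and so on
    return rss

# Hypothetical population of 100 numeric values.
population = list(range(1, 101))
rss = ranked_set_sample(population, n=5, seed=3)
rss_mean = sum(rss) / len(rss)  # mean of the ranked set sample
```

Only n of the n × n drawn units are retained, which is the price paid for the extra information carried by the ranks.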

Two Phase Sampling
A multi-phase sample collects basic information from a large sample of units and then collects more detailed information from a smaller sample, which may be a sub-sample of the first. The most common form of multi-phase sampling is two-phase sampling, but three or more phases are also possible.
Multi-phase sampling is useful when the frame lacks auxiliary information that could be used to stratify the population or to screen out part of the population.
Multi-phase sampling is beneficial when there is insufficient budget to collect information from the whole sample, or when doing so would create excessive burden on the respondent, or even when there are very different costs of collection for different questions on a survey.
We used stratification in both phases. In ordinary stratification we can use population values, whereas in two-phase sampling we must use their estimates obtained from the preliminary sample of size m. The estimate of the mean is the one given by Kish (1965). The variance of the regression estimate is approximated as discussed by Mukhopadhyay (1998), and a corresponding approximation is used for the variance of the ratio estimate.
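The idea of replacing population values by estimates from the preliminary sample can be illustrated with a short Python sketch. It assumes the common double-sampling-for-stratification form, in which the stratum weights are estimated as $w_h = m_h / m$ from the first-phase sample of size m and the stratum means come from the second-phase sub-sample. The function name and all data are hypothetical, and this need not be the exact estimator used in the study.

```python
from collections import Counter
from statistics import mean

def two_phase_stratified_mean(first_phase_labels, second_phase_y):
    """Estimate the mean of y using stratum weights from the first phase.

    `first_phase_labels` lists the stratum label of each of the m units in
    the preliminary sample; `second_phase_y` maps each label to the y values
    observed in the second-phase sub-sample of that stratum.
    """
    m = len(first_phase_labels)
    counts = Counter(first_phase_labels)  # m_h for each stratum
    return sum(counts[h] / m * mean(ys) for h, ys in second_phase_y.items())

# Made-up example: m = 10 preliminary units, then a sub-sample of two per stratum.
labels = ["a"] * 6 + ["b"] * 4
second_phase_y = {"a": [10.0, 20.0], "b": [30.0, 50.0]}
y_mean_2p = two_phase_stratified_mean(labels, second_phase_y)
```

Here the estimated weights are 0.6 and 0.4 and the stratum means are 15 and 40, giving an estimate of 25.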

Two Stage Sampling
In multi-stage sampling, a frame is required at each stage for the units that are to be selected at that stage. Initially, a frame is taken by which the first-stage units may be defined and selected. For the second stage of selection, a frame is required by which the second-stage units may be defined within the first-stage units that have been selected.
One of the advantages of multi-stage sampling is that second-stage frames are only required for selected first-stage units and so on.
According to Som (1973), the combined unbiased estimator of $Y_h$ from all the $n_h$ FSUs is the arithmetic mean of the estimates obtained from the individual FSUs.

The Replication methods for Variance Estimation
For variance replication, we have used 1) the jackknife and 2) the bootstrap.

Jackknife
The jackknife is a statistical method for judging the uncertainty of estimators derived from small samples, without assumptions about the underlying probability distribution. The method consists of forming new samples by omitting, in turn, one of the observations of the original sample. The estimator under study is calculated for each of the samples thus generated, and the distribution thus obtained allows one to draw conclusions about the estimator's sensitivity to individual observations. In the procedure given by Chaudhuri and Stenger (1992), we form n delete-1 estimates $\hat{\theta}_{(i)}$, i = 1, 2, …, n, by computing the estimator from the sample that leaves the i-th point out of the dataset.
The jackknife estimate of standard error is $\widehat{se}_{jack} = \sqrt{\frac{n-1}{n} \sum_{i=1}^{n} \left( \hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)} \right)^2}$, where $\hat{\theta}_{(\cdot)} = \frac{1}{n} \sum_{i=1}^{n} \hat{\theta}_{(i)}$. For stratified sampling, the jackknife is applied independently in each stratum by omitting one observation at a time from the dataset, as given by Lohr (1999).
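The delete-1 procedure can be sketched in Python as follows, using the usual jackknife standard error $\sqrt{\frac{n-1}{n}\sum_i (\hat{\theta}_{(i)} - \hat{\theta}_{(\cdot)})^2}$. The function names and the data are hypothetical; for a stratified design the same function would be called separately on each stratum.

```python
from math import sqrt
from statistics import mean, stdev

def jackknife_se(data, estimator):
    """Delete-1 jackknife standard error of `estimator` applied to `data`."""
    n = len(data)
    # theta_(i): the estimator recomputed with the i-th observation left out
    theta_i = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = mean(theta_i)
    return sqrt((n - 1) / n * sum((t - theta_dot) ** 2 for t in theta_i))

# Check on the sample mean, where the jackknife reproduces s / sqrt(n).
data = [2.0, 4.0, 6.0, 8.0]
se_jack = jackknife_se(data, mean)
se_classic = stdev(data) / sqrt(len(data))

# The same function applied to a ratio estimator on made-up (x, y) pairs.
def ratio(sample):
    return sum(y for _, y in sample) / sum(x for x, _ in sample)

pairs = [(4, 40.0), (6, 66.0), (8, 72.0), (5, 55.0)]
se_ratio = jackknife_se(pairs, ratio)
```

The agreement of `se_jack` with `se_classic` for the sample mean is a convenient sanity check before the function is applied to ratio or regression estimators.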

Bootstrapping
The bootstrap is a method for assessing the trustworthiness of a statistic; it generalizes the idea of the standard deviation of a mean.
It is a re-sampling procedure for assessing the accuracy of an estimator, and in effect it uses computing power as a substitute for theoretical analysis. Shao and Tu (1995) have given the following bootstrap algorithm. We have pairs $(y_i, x_i)$, i = 1, 2, …, N, where the $y_i$ are random and the $x_i$ are fixed. We call this a regression experiment. In stratified sampling designs, re-sampling is carried out independently in each stratum. The main drawback of the bootstrap is that it is time consuming.
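For a regression experiment with fixed $x_i$, one common way to carry out the re-sampling is the residual bootstrap, sketched below in Python. This is an assumption about the form of the algorithm: the sketch re-samples the empirical residuals with replacement and does not reproduce the exact steps of Shao and Tu (1995). The function names and the data are hypothetical.

```python
import random
from statistics import mean, variance

def fit_line(x, y):
    """Least-squares intercept and slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b * x_bar, b

def bootstrap_slope_variance(x, y, B=200, seed=0):
    """Residual-bootstrap variance of the slope, keeping the x values fixed."""
    rng = random.Random(seed)
    a_hat, b_hat = fit_line(x, y)
    resid = [yi - (a_hat + b_hat * xi) for xi, yi in zip(x, y)]
    slopes = []
    for _ in range(B):
        e_star = rng.choices(resid, k=len(resid))  # residuals drawn with replacement
        y_star = [a_hat + b_hat * xi + e for xi, e in zip(x, e_star)]
        slopes.append(fit_line(x, y_star)[1])      # refit on the bootstrap sample
    return variance(slopes)

# Made-up data: a noisy line, and an exact line for which the variance is zero.
x = [1, 2, 3, 4, 5]
v_noisy = bootstrap_slope_variance(x, [2.1, 3.9, 6.2, 7.8, 10.1])
v_exact = bootstrap_slope_variance(x, [3.0, 5.0, 7.0, 9.0, 11.0])
```

Each of the B replications refits the regression, which is why the method is costly for large samples and many strata.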

Results and Discussion
The HIES data set collected from the Federal Bureau of Statistics, Islamabad, was in SPSS format. From this data set, the data for NWFP were extracted for further analysis. The data were in stratified form, in 33 strata.
A number of macros were developed in MINITAB for the calculations and results.
A sample size of approximately 25% of N was taken.

Ranked Set Sampling
It is the most efficient sampling technique for these data. It gives the minimum variances at a sample size of 467, and it works well with both the jackknife and the bootstrap.

Two Phase Sampling
It gives large variances. With the bootstrap, the smaller sample sizes of 1353 (HIES 1998-99) and 544 (HIES 2001-02) give the desired results. The jackknife performs poorly with stratified two-phase sampling, as the variances are extremely large.

Multi-Stage Sampling
It results in large variances, although it is useful when the frame is incomplete.
The results of the above designs are given in Table 1 and Table 2.

Results
Results of the various designs are shown in Table 1 below.
In the ratio and regression formulas, X = total of x for the population; $\bar{x}$, $\bar{y}$ = means of x and y for the sample; S = summation over sample values; n = number of units in the sample; N = number of units in the population; Y = total of y for the population; f = sampling fraction.
Kowalczyk (2004) has written in detail about ranked set sampling. The same procedure is used for every stratum. The elements $y_{1(1:n)}, y_{2(2:n)}, \dots, y_{n(n:n)}$ constitute the ranked set sample. The mean of the ranked set sample is denoted by $\bar{y}_{RSS}$.
In two-stage sampling, $y_{hij}$ is the value of the study variable in the j-th selected SSU (household or field) of the i-th selected FSU (village) in the h-th stratum.

In the bootstrap algorithm, $\hat{\alpha}$ and $\hat{\beta}$ are the values of the regression parameters estimated from the regression experiment. $e^*_i$ is selected from $e_1, e_2, \dots, e_N$ using sampling with replacement with the help of the normal distribution. C* and D* are calculated from the bootstrap sample $(y^*_i, x_i)$, i = 1, 2, …, N, using μ and σ² calculated from the relation with B known. The third step is repeated B times.

Table: Sampling plan for a stratified two-stage design with PPS sampling at the first stage and systematic (sys) or simple random (srs) sampling at the second stage, in the h-th stratum (h = 1, 2, …, L).
Stratified simple random sampling was applied 50 times, and the mean of the means, the mean of the variances and the mean of the ratios were obtained. Stratified simple random sampling is a good choice for these data: it gives small variances with the jackknife. With the bootstrap at a sample size of 462, however, the variance of the ratio estimates was smaller than that of the regression estimates, which contradicts the theoretical expectation that the regression estimator is at least as efficient as the ratio estimator. A slightly bigger sample, i.e., 561, gives the expected ordering.
Stratified systematic sampling is a bad choice for the HIES 2001-02 data. With the jackknife, it gives larger variances for the regression estimates than for the ratio estimates, and only a very large sample size, such as 952, gives the expected ordering. With the bootstrap, a sample of 631 gives the appropriate results.