Stratified Two-Phase Ranked Set Sampling

We propose an alternative two-phase stratified ranked set sampling procedure. The performances of the proposed estimators are compared by simulation studies using both real and simulated data sets. The proposed two-phase stratified regression estimator is found to outperform its competitors in the literature. AMS Subject Classification: 62D05


Introduction
Ranked set sampling (RSS), introduced by McIntyre (1952), was first used to estimate mean pasture and forage yield. RSS is employed when precise measurement of the variable of interest is difficult or expensive, but the units can easily be ranked without measurement by an inexpensive method such as visual perception, judgment or auxiliary information. For example, in estimating the mean height of trees in a forest, one can easily rank the heights of a small sample of two or three trees standing nearby by visual inspection without measuring them. In estimating the number of bacterial cells per unit volume, one can easily arrange two or three test tubes in order of concentration using optical instruments without measuring exact values. In RSS, instead of selecting a single sample of size m, we select m sets of samples, each of size m. In each set, we rank all the elements but measure only one of them. Finally, the average of the m measured units is taken as an estimate of the population mean. Takahasi and Wakimoto (1968) and Takahasi (1970) provided the theoretical justification for ranked set sampling. They proved that when the ranking is perfect, the sample mean of the RSS is an unbiased estimator of the population mean and its variance is smaller than that of the sample mean under simple random sampling with replacement (SRSWR) of the same size. Dell and Clutter (1972) proved that the sample mean based on the RSS is unbiased for the population mean regardless of ranking errors and is at least as precise as the SRSWR sample mean of the same size. Stokes (1977) considered the performance of the Dell and Clutter estimator when the regression of the study variable (y) on the ranking variable (x) is linear, and y and x follow a certain model.
Pak.j.stat.oper.res. Vol.XV No.IV 2019 pp867-879
Yu and Lam (1997) proposed a regression estimator for the case where x and y follow a bivariate normal distribution and found, on the basis of simulation studies, that it performs better than the naïve estimator unless the correlation between x and y is low (|ρ| < 0.4). Kadilar et al. (2006) and Arnab and Olaomi (2015) proposed improved estimators of $\mu_y$, the population mean of the study variable y, using the ranking variable x as an auxiliary variable when the population mean $\mu_x$ of x is unknown. Zamanzade and Al-Omari (2016) developed a new ranked set sampling scheme for estimating the population mean and variance, called neoteric ranked set sampling (NRSS), under perfect and imperfect ranking conditions, while Mahdizadeh and Zamanzade (2018) introduced stratified pair ranked set sampling (SPRSS) and used it to estimate the population mean, with some theoretical results. In this paper, we propose two alternative estimators for two-phase sampling in which, in the first phase, information is collected only on the ranking variable x. Based on the observed x-values, the first-phase sample is divided into a number of homogeneous strata. From each stratum so formed, ranked set samples are selected independently using proportional allocation. The performances of the proposed estimators are compared by simulation studies using both real tree data collected by Platt et al. (1988) and generated bivariate normal data. We found that the proposed two-phase stratified regression estimator performs better, in terms of relative bias (RB) and mean-square error (MSE), than the naïve and Yu and Lam (1997) estimators for the tree data, and that it performs better in most situations for the simulated bivariate data.

Ranked set sampling by the SRSWR method
First, we choose a small number m (the set size) such that one can easily rank m elements of the population with sufficient accuracy. The RSS is then selected as follows. Select a sample of $m^2$ units from a population U by the SRSWR method. Allocate these $m^2$ units at random into m sets, each of size m. Rank the units within each set with respect to the variable of interest y from 1 (minimum) to m (maximum) by a very inexpensive method such as eye inspection. At this stage, no actual measurement is made. After the ranking has been completed, the unit holding rank i (i = 1, ..., m) in the i th set is actually measured. This completes one cycle of the sampling. The process is repeated for r cycles to obtain the desired sample of size n = mr. Thus, in an RSS, a total of $m^2 r$ units are drawn from the population, but only mr of them are measured and the remaining mr(m − 1) are discarded. We call these mr measured observations the "ranked set sample". Since ordering a large number of observations is difficult, the sample size n = mr is increased by increasing the number of cycles r.
It is well known that $\hat{\mu}(rss, mr)$, the sample mean of the RSS of size n = mr, is unbiased for the population mean $\mu_y$.
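The selection procedure above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `ranked_set_sample`, the synthetic normal population and the seed are assumptions made here, and perfect ranking is imitated by sorting on the true y values, which in practice would be replaced by a cheap ranking method such as visual inspection.

```python
import random
import statistics


def ranked_set_sample(population, m, r, rng):
    """Return the m*r measured y values of a ranked set sample (perfect ranking assumed)."""
    measured = []
    for _ in range(r):  # one pass per cycle
        # Draw m*m units by SRSWR; consecutive blocks of m form the m sets.
        units = [rng.choice(population) for _ in range(m * m)]
        sets = [units[i * m:(i + 1) * m] for i in range(m)]
        for i, one_set in enumerate(sets):
            # Rank the set and "measure" only the unit holding rank i + 1.
            measured.append(sorted(one_set)[i])
    return measured


# Hypothetical population used only for illustration.
rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(5000)]

sample = ranked_set_sample(population, m=3, r=4, rng=rng)
rss_mean = statistics.fmean(sample)  # RSS estimate of the population mean
```

Each cycle draws 9 units and measures 3 of them, so with m = 3 and r = 4 a total of 36 units are drawn and 12 are measured.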

Judgment ranking
Sometimes perfect ranking (ranking without error) is not possible. In such cases, we use judgment ranking, where each selected set is ranked by an approximate method such as visual inspection, expert opinion or the use of a concomitant variable. Several tests have been developed in the literature to assess the assumption of perfect ranking in RSS, including those of Frey et al. (2007), Zamanzade et al. (2012) and Zamanzade and Vock (2018). Let | i j k y be the j th smallest "judgment order statistic" corresponding to order statistic

Population mean with unknown $\mu_x$
When the population mean $\mu_x$ of x is unknown, Yu and Lam (1997) considered a two-phase sampling procedure in which, in the first phase, a relatively large sample $\tilde{s}$ of size $\tilde{n}$ is selected by simple random sampling without replacement (SRSWOR) from a population of size N and only information on the auxiliary variable x is collected. In the second phase, a subsample s of size n (= rm) is selected from $\tilde{s}$ using ranked set sampling with r cycles, and information on the study variable y is obtained using x as the ranking variable. The proposed estimator for the population mean
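The two-phase selection described above can be sketched as follows. This is only an illustration under stated assumptions: the names `two_phase_rss` and `regression_estimate` are hypothetical, the ranked sets are assumed to be drawn without replacement from the first-phase sample (so the first-phase size must be at least m·m·r), and the estimator shown is a generic two-phase regression-type estimator, which may differ in detail from the Yu and Lam (1997) estimator.

```python
import random
import statistics


def two_phase_rss(pop, n_first, m, r, rng):
    """pop is a list of (x, y) pairs; returns (first-phase x values, measured (x, y) pairs)."""
    if n_first < m * m * r:
        raise ValueError("first-phase sample must contain at least m*m*r units")
    first = rng.sample(pop, n_first)      # phase 1: SRSWOR, only x is used
    pool = rng.sample(first, m * m * r)   # units to be ranked (assumed without replacement)
    measured = []
    for c in range(r):
        for i in range(m):
            start = (c * m + i) * m
            ranked = sorted(pool[start:start + m], key=lambda u: u[0])  # rank on x only
            measured.append(ranked[i])    # y is measured for the unit of rank i + 1
    return [u[0] for u in first], measured


def regression_estimate(first_x, measured):
    """Generic regression-type estimate: ybar + slope * (first-phase xbar - RSS xbar)."""
    xs = [u[0] for u in measured]
    ys = [u[1] for u in measured]
    xbar, ybar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return ybar + (sxy / sxx) * (statistics.fmean(first_x) - xbar)


# Hypothetical population: y is linearly related to x plus noise.
rng = random.Random(0)
xs_pop = [rng.gauss(20.0, 4.0) for _ in range(2000)]
pop = [(x, 3.0 + 1.5 * x + rng.gauss(0.0, 2.0)) for x in xs_pop]

first_x, measured = two_phase_rss(pop, n_first=100, m=3, r=4, rng=rng)
estimate = regression_estimate(first_x, measured)
```

The first-phase mean of x stands in for the unknown $\mu_x$, which is why only the cheap variable has to be observed on the large sample.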

Two-phase stratified ranked set sampling
Initially, a relatively large sample $\tilde{s}$ of size $\tilde{n}$ is selected from the entire population by the SRSWOR method. From each selected unit of $\tilde{s}$, information is obtained only on the concomitant variable x, as in the two-phase sampling of Yu and Lam (1997). Here, we assume that the condition for two-phase sampling holds, i.e. collecting data on x is much cheaper than collecting data on the study variable y. Based on the observed values of x, the sampled units are classified into H strata so that each stratum is homogeneous with respect to the study variable y. The number of strata will depend on the characteristics of the variable y and on the sample size $\tilde{n}$. For example, using eye estimates of height or the date of plantation, one can classify plants as small, medium or big. Similarly, using the CD4 counts of HIV patients, one may classify the conditions of HIV-infected patients as bad, very bad or severe.
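A minimal sketch of the stratification step is given below. The source does not specify how the strata boundaries are chosen, so this sketch assumes strata of near-equal size formed from the sorted x values; the function names, the allocation of cycles in proportion to first-phase stratum size, and the weighted-mean estimator are likewise assumptions for illustration rather than the authors' exact procedure.

```python
import random
import statistics


def stratify_by_x(first, n_strata):
    """Split first-phase (x, y) units into near-equal strata by sorted x (assumed rule)."""
    ordered = sorted(first, key=lambda u: u[0])
    n = len(ordered)
    cuts = [h * n // n_strata for h in range(n_strata + 1)]
    return [ordered[cuts[h]:cuts[h + 1]] for h in range(n_strata)]


def rss_within(stratum, m, r, rng):
    """Ranked set sample of size m*r from one stratum, ranking on x only."""
    measured = []
    for _ in range(r):
        for i in range(m):
            one_set = sorted(rng.sample(stratum, m), key=lambda u: u[0])
            measured.append(one_set[i])  # measure y for the unit of rank i + 1
    return measured


def stratified_rss_mean(first, n_strata, m, total_cycles, rng):
    """Weighted mean of stratum RSS means, weights = first-phase stratum shares."""
    n = len(first)
    estimate = 0.0
    for stratum in stratify_by_x(first, n_strata):
        weight = len(stratum) / n
        r_h = max(1, round(total_cycles * weight))  # proportional allocation of cycles
        sample = rss_within(stratum, m, r_h, rng)
        estimate += weight * statistics.fmean([y for _, y in sample])
    return estimate


# Hypothetical first-phase sample of (x, y) pairs.
rng = random.Random(0)
first = [(x, 3.0 + 1.5 * x + rng.gauss(0.0, 2.0))
         for x in (rng.gauss(20.0, 4.0) for _ in range(120))]
estimate = stratified_rss_mean(first, n_strata=3, m=3, total_cycles=6, rng=rng)
```

Because the strata are built from x alone, no y value is needed before the second phase; y is read here only for the units that the ranked set sample selects.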

Comparison of stratified and un-stratified ranked set sampling strategies
It is very difficult to compare the performances of the proposed estimators $\hat{\mu}_{2s}$ [...] Table 1 and Table 2.

Simulation Results
The relative biases of all the estimators are very low in general. For the tree data they range from -0.2260 to 0.0518. The relative biases for the simulated bivariate normal data are much lower than those for the tree data and vary from -0.014 to 0.023. The estimators using auxiliary information possess higher relative efficiency than the naïve estimator in almost all situations. The proposed two-phase stratified regression estimator performs best, followed by the two-phase un-stratified regression estimator. The proposed stratified regression estimator performs best in all situations for the tree data, with a maximum PRE of 239.2576. The stratified estimator performs better than the naïve estimator in general, but in some isolated situations it is less efficient than the naïve estimator (with a minimum PRE of 86.252). For a given combination of ($\tilde{n}$, n), the PREs of all the estimators for both the tree data and the simulated data decrease with m. For a given $\tilde{n}$ and m, the PRE of the estimators decreases with n. The relative efficiencies of all the estimators for the simulated data increase with the correlation coefficient ρ. The Yu-Lam estimator performs slightly better than the proposed two-phase estimator on a few occasions.
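The performance measures reported above can be computed from replicated simulation estimates as in the sketch below. The definitions used are the conventional ones and are assumed here, since this excerpt does not state them: relative bias is the mean error divided by the true mean, and PRE is 100 times the ratio of the naïve estimator's MSE to the competing estimator's MSE. The function names and the toy numbers are hypothetical.

```python
import statistics


def relative_bias(estimates, true_mean):
    """(average estimate - true mean) / true mean."""
    return (statistics.fmean(estimates) - true_mean) / true_mean


def mse(estimates, true_mean):
    """Mean squared error of the replicated estimates around the true mean."""
    return statistics.fmean([(e - true_mean) ** 2 for e in estimates])


def percent_relative_efficiency(estimates, naive_estimates, true_mean):
    """PRE of an estimator relative to the naive estimator (100 means equal MSE)."""
    return 100.0 * mse(naive_estimates, true_mean) / mse(estimates, true_mean)


# Toy replicated estimates, not results from the paper.
true_mean = 10.0
naive = [8.0, 12.0]
proposed = [9.0, 11.0]

rb = relative_bias(proposed, true_mean)                        # 0.0
pre = percent_relative_efficiency(proposed, naive, true_mean)  # 400.0
```

A PRE above 100 indicates that the estimator has a smaller MSE than the naïve estimator, and a PRE below 100 indicates the opposite.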

Conclusion
Stokes (1977) recommended a regression estimator for ranked set sampling when the population mean of the auxiliary variable is known. Yu and Lam (1997) proposed a regression estimator in two-phase sampling for the case where the population mean of the auxiliary variable is unknown. We propose an alternative two-phase stratified ranked set sampling procedure. On the basis of real and simulated data, the proposed regression estimator is found to outperform the other estimators in most situations, especially for the real tree data. We therefore suggest that, instead of using two-phase sampling, one should use two-phase stratified sampling for small strata to improve on the efficiency of the Yu-Lam estimator. Determining the asymptotic distribution of the proposed estimators, the coverage probabilities of the asymptotic confidence intervals and bootstrap confidence intervals following Akgul et al. (2018) are subjects of our future research.