Entropy Estimation from Judgment Post Stratified Data

This article concerns entropy estimation using the judgment post stratification (JPS) sampling scheme. Some nonparametric estimators are developed and shown to be consistent. Monte Carlo simulations are used to compare these estimators with their competitors in simple random sampling. The results indicate that the new estimators are preferable.


Introduction
Judgement post stratification (JPS) sampling scheme, introduced by MacEachern et al. (2004), has wide applications in situations where auxiliary information is available to induce an additional ranking structure in a simple random sampling (SRS) scheme. This structure is obtained via a ranking process that determines the position of each measured unit of the simple random sample among $H-1$ additional independent units drawn from the target population. These positions are then used to place homogeneous units of the simple random sample in the same strata; therefore, by the theory of stratified sampling in survey sampling designs, judgement post stratified data are expected to yield more efficient estimators.
To draw a judgement post stratified sample of size $N$ using set size $H$, one first draws a simple random sample of size $N$, say $Y_1,\dots,Y_N$. Then, for each $i=1,\dots,N$, $H-1$ additional units are drawn from the population to form a set of size $H$ containing $Y_i$. This set is then ranked from smallest to largest and the rank of $Y_i$ is recorded. The ranking in the JPS sampling scheme may be done by any inexpensive method that does not require actual quantification of the units in the set (e.g. eye inspection, personal judgement or a covariate). If the ranking is based on eye inspection, the researcher should be blinded to the actual value of $Y_i$ to avoid bias in the ranking process. The term judgement rank indicates that the ranking in the JPS sampling scheme is done without actual measurement of the additional units, and is therefore prone to errors. Let $F$ be the cumulative distribution function (CDF) of the population. A judgement post stratified sample of size $N$, consisting of a simple random sample of size $N$ together with the corresponding judgement ranks, can be represented as $(Y_1,R_1),\dots,(Y_N,R_N)$, where $R_i \in \{1,\dots,H\}$ is the judgement rank of $Y_i$.
The main difference between the JPS sampling scheme and ranked set sampling (RSS) lies in the ranking process. In the JPS setting, the ranking is performed after the sample units are measured, so the judgement ranks are only loosely attached to the measured units and can be ignored. Therefore, a judgement post stratified sample can still be analyzed with standard SRS procedures. This is very useful when the researcher believes that the ranking is too poor, or when the required statistical method has not yet been developed for the JPS setting. In the RSS setting, however, the ranking is performed prior to measurement of the sample units, so the judgement ranks are strongly attached to the units and cannot be disregarded. A ranked set sample must therefore be analyzed with a procedure specially developed for that design. Up-to-date references for the RSS scheme can be found in Wolfe (2012).
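The drawing procedure described above can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions: ranking quality is controlled by a hypothetical concomitant $X = \rho (Y-\mu)/\sigma + \sqrt{1-\rho^2}\,Z$ with $Z \sim N(0,1)$, a simple linear model of the kind often used in ranked-set simulations, where $\rho = 1$ gives perfect ranking and smaller $\rho$ gives noisier judgement ranks. The function name `draw_jps_sample` and its arguments are illustrative and not taken from the paper.

```python
import numpy as np

def draw_jps_sample(n, set_size, rho, sampler, mu, sigma, rng):
    """Return (y, r): n measured values Y_i and their judgement ranks R_i in 1..set_size.

    Assumption (hypothetical ranking model): units are ranked by the concomitant
    X = rho * (Y - mu) / sigma + sqrt(1 - rho**2) * Z, Z ~ N(0, 1), with 0 <= rho <= 1.
    """
    y = np.empty(n)
    r = np.empty(n, dtype=int)
    for i in range(n):
        # units[0] plays the role of the measured unit Y_i; units[1:] are the
        # H - 1 additional units, which are ranked but never measured.
        units = sampler(rng, set_size)
        x = rho * (units - mu) / sigma + np.sqrt(1.0 - rho**2) * rng.standard_normal(set_size)
        y[i] = units[0]
        # Judgement rank of Y_i: 1 + number of set members with a smaller concomitant.
        r[i] = 1 + int(np.sum(x < x[0]))
    return y, r

# Example: N = 20, H = 4, fairly good ranking (rho = 0.9), standard normal population.
rng = np.random.default_rng(2024)
normal = lambda g, k: g.standard_normal(k)
y, r = draw_jps_sample(20, 4, 0.9, normal, 0.0, 1.0, rng)
```

Because all units are i.i.d., taking the first unit of each set as the measured one is equivalent to drawing the simple random sample first and the $H-1$ extra units afterwards. Dropping `r` leaves an ordinary simple random sample, which mirrors the point above that JPS data can still be analyzed with SRS procedures.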
Both RSS and JPS sampling schemes are useful in situations in which exact measurement of sample units is expensive or time-consuming, but ranking them (without obtaining their precise values) is easy and cheap. Such situations frequently arise in forestry (Halls and Dell, 1966), medicine (Chen et al., 2005) and environmental monitoring (Kvam, 2003; …). A lot of research has been done on the JPS sampling scheme in recent years. Wang …
In Section 2, we discuss the problem of CDF estimation in the JPS setting. In Section 3, we propose some nonparametric entropy estimators for judgment post stratified data and prove that they are consistent. In Section 4, we compare the proposed entropy estimators with their counterparts in the SRS setting. We end with a conclusion in Section 5.

Nonparametric estimation of CDF in JPS sampling scheme
The standard CDF estimator of the JPS sampling scheme, denoted here by $\hat{F}_{st}$, averages the empirical distribution functions of the non-empty judgement strata. Dastbaravarde et al. (2013) showed that this estimator is unbiased for the population CDF and has smaller variance than its SRS rival, the empirical distribution function (EDF), provided that the sample size $N$ is not too small. They also proved that this estimator is strongly consistent and established its asymptotic normality as $N \to \infty$. If there is no empty stratum, it follows from properties of isotonic regression that the two formulas above are equivalent (Robertson and Waltman, 1968). It is worth mentioning that the asymptotic behavior of the CDF estimators proposed by Wang et al. (2012) is the same as that of $\hat{F}_{st}$.
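For concreteness, the standard JPS CDF estimator is commonly written in the JPS literature as $\hat{F}_{st}(t) = \frac{1}{d_N}\sum_{h=1}^{H} I(n_h>0)\,\frac{1}{n_h}\sum_{i=1}^{N} I(Y_i \le t,\ R_i = h)$, where $(Y_i, R_i)$ are the measured values and judgement ranks, $n_h$ is the number of units with judgement rank $h$, and $d_N$ is the number of non-empty strata. The sketch below implements that form next to the SRS EDF; it is an illustration under this assumed notation, not a reproduction of the paper's own formula, and the function names are hypothetical.

```python
import numpy as np

def jps_cdf(t, y, r, set_size):
    """Standard JPS CDF estimate at t: mean of stratum-wise EDFs over non-empty strata."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    total = np.zeros_like(t)
    nonempty = 0  # d_N, the number of non-empty judgement strata
    for h in range(1, set_size + 1):
        yh = y[r == h]
        if yh.size == 0:
            continue  # empty stratum: skipped, and the average is taken over d_N strata
        nonempty += 1
        total += np.mean(yh[:, None] <= t[None, :], axis=0)
    return total / nonempty

def edf(t, y):
    """SRS empirical distribution function at t (ignores the judgement ranks)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.mean(y[:, None] <= t[None, :], axis=0)

y = np.array([0.2, 1.5, -0.3, 0.9, 2.1, -1.0])
r = np.array([1, 3, 1, 2, 3, 1])
print(jps_cdf(0.5, y, r, 3), edf(0.5, y))  # [0.33333333] [0.5]
```

In this toy sample the three strata hold 3, 1 and 2 units, so the JPS estimate weights each stratum equally (1/3 each) instead of weighting each unit equally as the EDF does; this reweighting is where the efficiency gain of post stratification comes from when the ranks are informative.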

Nonparametric estimation of entropy
Let $Y$ be a continuous random variable with density function $f(y)$ and CDF $F(y)$. The entropy $H(f)$ of this random variable, as a measure of uncertainty, is defined by Shannon (1948) as
$$H(f) = -\int_{-\infty}^{\infty} f(y)\log f(y)\,dy. \qquad (8)$$
The problem of estimating $H(f)$ has been considered by many researchers in the literature. Vasicek (1976) was the first to propose an estimator of $H(f)$ based on spacings. His estimator is obtained by using the fact that Equation (8) can be rewritten as
$$H(f) = \int_{0}^{1} \log\left(\frac{d}{dp}F^{-1}(p)\right)dp.$$
The estimator is constructed by replacing the distribution function $F$ by the EDF and estimating the derivative $\frac{d}{dp}F^{-1}(p)$ by a function of order statistics. Let $Y_{(1)} \le \dots \le Y_{(N)}$ be an ordered random sample of size $N$ from the population of interest. Vasicek's (1976) entropy estimator is given by
$$HV_{m,N} = \frac{1}{N}\sum_{i=1}^{N}\log\left\{\frac{N}{2m}\left(Y_{(i+m)} - Y_{(i-m)}\right)\right\},$$
where $Y_{(i)} = Y_{(1)}$ for $i < 1$, $Y_{(i)} = Y_{(N)}$ for $i > N$, and $m < N/2$ is a positive integer called the window size.
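Vasicek's spacing estimator can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the function name `vasicek_entropy` is hypothetical. The boundary convention $Y_{(i)} = Y_{(1)}$ for $i<1$ and $Y_{(i)} = Y_{(N)}$ for $i>N$ is implemented by clipping the order-statistic indices.

```python
import numpy as np

def vasicek_entropy(y, m):
    """Vasicek (1976) spacing estimator of H(f) with window size m, 1 <= m < N/2."""
    y = np.sort(np.asarray(y, dtype=float))
    n = y.size
    if not 1 <= m < n / 2:
        raise ValueError("window size m must satisfy 1 <= m < N/2")
    idx = np.arange(n)                      # 0-based position of Y_(i)
    upper = y[np.minimum(idx + m, n - 1)]   # Y_(i+m), with Y_(j) = Y_(N) for j > N
    lower = y[np.maximum(idx - m, 0)]       # Y_(i-m), with Y_(j) = Y_(1) for j < 1
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

# Toy check: for the ordered sample 0, 1, 2, 3 with m = 1 the scaled spacings are
# 2, 4, 4, 2, so the estimate is (log 2 + log 4 + log 4 + log 2) / 4 = 1.5 log 2.
print(vasicek_entropy([3, 0, 2, 1], 1))  # 1.0397...
```

For a large uniform(0, 1) sample the estimate should be close to the true entropy $0$, and for a large standard normal sample close to $\tfrac{1}{2}\log(2\pi e) \approx 1.4189$.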
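Vasicek (1976) showed that his estimator is consistent for $H(f)$ as $N \to \infty$, $m \to \infty$ and $m/N \to 0$. The next proposition establishes the consistency of the proposed estimators.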
where the last limit holds because, under a consistent ranking process, $F_Z$ is a consistent estimator of $F$.

Monte Carlo comparison
We compare the performance of the proposed entropy estimators in the JPS sampling scheme with their competitors in the SRS scheme via Monte Carlo simulation in terms of root mean square error (RMSE). We generated 10,000 judgement post stratified samples of sizes $N = 10, 20, 30, 50$ with set sizes $H = 3, 4, 10$ from the standard normal, standard exponential and standard uniform distributions, so that the effects of increasing both the sample size ($N$) and the set size ($H$) on the performance of the estimators can be examined. We control the quality of ranking by using a concomitant variable in the adaptive perceptual model proposed by Dell and Clutter (1972). It is worth mentioning that all entropy estimators underestimate the true value of the population entropy.
The analogous results for the standard exponential distribution are presented in Table 2. We observe that for the exponential distribution the JPS entropy estimators do not provide any improvement; their performances are slightly worse than those of their SRS counterparts. Table 3 presents the simulation results when the parent distribution is standard uniform. The performance of the entropy estimators in this case is very similar to that in Table 1. The only clear difference is that the RMSE and bias of the entropy estimators are smaller in magnitude for the standard uniform distribution than for the standard normal distribution.
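The SRS side of such a Monte Carlo comparison can be sketched as follows: bias and RMSE of Vasicek's estimator over 10,000 replications for the three parent distributions and the sample sizes used above, with true entropies $\tfrac{1}{2}\log(2\pi e) \approx 1.4189$ (standard normal), $1$ (standard exponential) and $0$ (standard uniform). The window-size rule $m = \lfloor \sqrt{N} + 0.5 \rfloor$ is an assumption (a common heuristic), not necessarily the paper's choice, and the helper names are hypothetical; the proposed JPS estimators are not sketched here.

```python
import numpy as np

def vasicek_entropy(y, m):
    """Vasicek (1976) spacing estimator with clipped order-statistic indices."""
    y = np.sort(y)
    n = y.size
    idx = np.arange(n)
    spacing = y[np.minimum(idx + m, n - 1)] - y[np.maximum(idx - m, 0)]
    return np.mean(np.log(n / (2.0 * m) * spacing))

TRUE_ENTROPY = {
    "normal": 0.5 * np.log(2 * np.pi * np.e),
    "exponential": 1.0,
    "uniform": 0.0,
}
SAMPLERS = {
    "normal": lambda g, n: g.standard_normal(n),
    "exponential": lambda g, n: g.exponential(1.0, n),
    "uniform": lambda g, n: g.uniform(0.0, 1.0, n),
}

def srs_bias_rmse(dist, n, reps=10_000, seed=0):
    """Monte Carlo bias and RMSE of the SRS Vasicek estimator for one (distribution, N)."""
    rng = np.random.default_rng(seed)
    m = int(np.floor(np.sqrt(n) + 0.5))  # assumed window-size rule
    est = np.array([vasicek_entropy(SAMPLERS[dist](rng, n), m) for _ in range(reps)])
    err = est - TRUE_ENTROPY[dist]
    return err.mean(), np.sqrt(np.mean(err**2))

if __name__ == "__main__":
    for dist in ("normal", "exponential", "uniform"):
        for n in (10, 20, 30, 50):
            bias, rmse = srs_bias_rmse(dist, n)
            print(f"{dist:12s} N={n:2d}  bias={bias:+.3f}  RMSE={rmse:.3f}")
```

A JPS counterpart would replace the inner sampling step with a judgement post stratified draw and the estimator with one of the proposed JPS estimators, keeping the same seeds and replication count so that the two columns are directly comparable.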

Conclusion
In this paper, we developed some nonparametric entropy estimators for the judgement post stratification sampling scheme. The estimators were obtained by using different cumulative distribution function estimators in the JPS setting. We proved that the proposed entropy estimators are consistent. Our simulation results show that the entropy estimators in the JPS setting typically perform better than their competitors in the SRS setting, especially when the quality of ranking is fairly good.
In this paper, we confined our attention to estimation of entropy. However, it would also be interesting to evaluate the performance of different entropy estimators for goodness-of-fit tests in the JPS setting. This will be studied in future work.