A Bayesian Justification for Random Sampling in Sample Survey

In the usual Bayesian approach to survey sampling the sampling design plays a minimal role, at best. Although a close relationship between exchangeable prior distributions and simple random sampling has been noted, how to formally integrate simple random sampling into the Bayesian paradigm is not clear. Recently it has been argued that the sampling design can be thought of as part of a Bayesian's prior distribution. We will show here that under this scenario simple random sampling can be given a Bayesian justification in survey sampling.


1. Introduction
In the Bayesian approach to survey sampling one must specify a prior distribution over the parameter space for the entire population of possible values of the characteristic of interest. Once a sample is observed, the posterior distribution is just the conditional distribution of the unobserved units given the values of the observed units, computed under the prior distribution for the population. This posterior does not depend on the sampling design used to select the sample. The Bayesian approach to finite population sampling was elegantly described in the writings of D. Basu. For further discussion see his collection of essays in Ghosh (1988). In theory one can use the prior distribution to select an optimal, purposeful sample (Zacks, 1969), but this is almost never done in practice. A problem with the Bayesian approach is that it can be difficult to find prior distributions which make use of available prior information about the population.
The sampling design plays a fundamental role in the standard frequentist theory for survey sampling. The design is the only source of randomness in the model, since units in the sample are assumed to be observed without error, and it is upon the selection probabilities that the frequentist properties of estimators are based.
It was noted in Godambe (1955) that many sampling designs can be thought of as being defined conditionally on the order in which the units appear in the sample. He then suggested that from a theoretical perspective it is convenient to ignore this fact and just consider the unordered sample (where the order is ignored). A good reason for doing this was pointed out in Murthy (1957), where it was demonstrated that for any estimator which depends on the ordered values there exists another estimator which uses only the unordered values and which has the same expectation but smaller variance, except when the two estimators are the same. This application of the Rao-Blackwell theorem was also discussed by Pathak (1961). Because of this, most sampling theory has concentrated on unordered designs, although Raj (1956) is one example where the order in which the sample was drawn was considered. Meeden and Noorbaloochi (2010) noted that given a design, but before any data have been collected, the actual units that will appear in the sample are unknown. They argued that this suggests the design could be considered as part of the Bayesian's prior distribution. There they considered prior distributions which were defined in two steps. First, using a design, they randomly assigned an order to the units in the population, and then, conditional on a given order, they specified a distribution for the possible values of the units. They showed that this approach gives a flexible method to incorporate prior information into survey sampling problems. Ericson (1969) presented subjective Bayesian models for survey sampling when the labels contain little prior information about the units in the population. His prior distributions were exchangeable, and in section 2.2 he discussed the ``intimate similarities'' between a subjective exchangeable prior distribution and an objective distribution introduced by the design using simple random sampling. In this note we will show how in the Meeden and Noorbaloochi (2010) framework these ``intimate similarities'' can be formally expressed, which in turn yields a Bayesian justification for simple random sampling.

2. Some notation
Consider a population of size $N$ and let $U = \{u_1, u_2, \ldots, u_N\}$ be a set of $N$ labels which identify the units in the population. We let $u$ denote a typical label. Let $y = \{y_u : u \in U\}$ denote a typical unknown set of population values. Here we assume that each $y_u$ can only take on the values 0 and 1. $y$ is an unordered set since the population labels have no order. But since order will be important for us we let $u = (u_1, u_2, \ldots, u_N)$ denote the labels in some fixed order. Then $y_u$ denotes $y$ arranged in this standard default order. Hence the set of possible values for $y_u$ is given by
$$Y(0,1) = \{y_u : \mbox{such that for each } i, \ y_{u_i} = 0 \mbox{ or } 1\}. \eqno(1)$$
When we write $y$ its order does not matter, while it does matter in $y_u$. If $\sigma$ is a permutation of $1, 2, \ldots, N$ we let $u(\sigma)$ be the permutation $\sigma$ applied to $u$ to give a new order for the labels. Then $y_{u(\sigma)}$ denotes the values of $y$ arranged in the order determined by $u(\sigma)$. Let $\Pi$ be the set of all possible permutations of $1, 2, \ldots, N$. Since order will matter for us another space of interest is
$$Y(0,1,\Pi) = \{(\sigma, y_{u(\sigma)}) : \mbox{where } y_u \in Y(0,1) \mbox{ and } \sigma \in \Pi\}. \eqno(2)$$
For each fixed $y_u$ this set will contain $N!$ points, one for each possible permutation $\sigma$. For each $\sigma$ the point $(\sigma, y_{u(\sigma)})$ consists of the permutation along with the order of $y_u$ under this permutation.
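Since the spaces $Y(0,1)$ and $Y(0,1,\Pi)$ are finite, they can be enumerated directly. The following sketch (a toy example with a hypothetical $N = 3$; all variable names are illustrative) checks their sizes: $2^N$ ordered vectors and $2^N \cdot N!$ pairs $(\sigma, y_{u(\sigma)})$.

```python
import itertools

# Toy population with a hypothetical N = 3 labeled units, each y_u in {0, 1}.
N = 3

# Y(0,1): all ordered 0/1 vectors in the default order u; there are 2^N of them.
Y01 = list(itertools.product([0, 1], repeat=N))

# Pi: all N! permutations of 1, ..., N (0-based indices here).
Pi = list(itertools.permutations(range(N)))

# Y(0,1,Pi): the pairs (sigma, y_u(sigma)); N! points for each fixed y_u.
Y01Pi = [(sigma, tuple(y[i] for i in sigma)) for y in Y01 for sigma in Pi]

assert len(Y01) == 2 ** N
assert len(Y01Pi) == 2 ** N * len(Pi)  # 2^N * N! points in total
```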
Consider a sampling design such as simple random sampling without replacement (srs), where units are selected one at a time without replacement until the desired sample size, say $n$, is reached. At the point where the units have been selected but before their $y$ values are observed, we can imagine continuing the sampling procedure until all the units from the population have been selected and given an order. This is just a thought experiment and is not something that would be implemented. However we see that the srs design can be extended in a natural way to define a probability distribution on $\Pi$. When the design is srs the resulting distribution is just the uniform distribution on $\Pi$. Before the labels are selected and the characteristic of interest observed we can think of both $\sigma$ and $y_{u(\sigma)}$ as unknown. Observing the data results in partial information about both of them. From the Bayesian perspective this means we could define a joint prior distribution over the pair on the space $Y(0,1,\Pi)$. In the next section we will compare two different approaches using this setup when little is known a priori about the population.
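This thought experiment is easy to simulate. The sketch below (illustrative only; `extended_srs_order` is a hypothetical helper) continues the srs draws without replacement until every unit has been selected, and checks empirically that the induced distribution on $\Pi$ is close to uniform.

```python
import random
from collections import Counter

# srs draws units one at a time without replacement; continuing past the
# sample size n until all N units are drawn assigns a full order to the
# population, i.e. produces a random permutation of the labels.
def extended_srs_order(labels, rng):
    pool = list(labels)
    order = []
    while pool:
        # each remaining unit is equally likely to be drawn next
        order.append(pool.pop(rng.randrange(len(pool))))
    return tuple(order)

# Empirically the induced distribution on the N! = 6 orders is close to uniform.
rng = random.Random(0)
counts = Counter(extended_srs_order("abc", rng) for _ in range(60000))
assert len(counts) == 6
assert all(abs(c / 60000 - 1 / 6) < 0.02 for c in counts.values())
```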

3. Comparing two Bayesians
If we are assuming that $Y(0,1,\Pi)$ is the space of possible values for the unknowns, then a prior distribution may be defined on this space. The joint distribution for $(\sigma, y_{u(\sigma)})$ may be written as a marginal for $\sigma$ and a conditional for $y_{u(\sigma)}$ given $\sigma$. This means that we can use the probability distribution on $\Pi$, coming from srs, as the marginal for $\sigma$. Then, given an order $\sigma$, it remains to define the conditional distribution of $y_{u(\sigma)}$. Under this setup a prior distribution will be of the form
$$p(\sigma, y_{u(\sigma)}) = p(\sigma)\, p(y_{u(\sigma)} \mid \sigma). \eqno(3)$$
Then the prior distribution for $y$ is just the marginal distribution for $y$ under the above model.
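A small enumeration illustrates this construction. In the sketch below (a toy example with a hypothetical flat prior $p(t)$ on the population total and $N = 4$; all names are illustrative), the marginal for $y$ induced by the two-step prior, with $p(\sigma)$ uniform from srs and the conditional supported on vectors with all 1's preceding all 0's, is computed directly; it coincides with the exchangeable prior $p(t)/\binom{N}{t}$.

```python
import itertools
from collections import defaultdict
from math import comb, factorial

# Hypothetical flat prior p(t) on the population total t (N = 4 for illustration).
N = 4
p_t = {t: 1 / (N + 1) for t in range(N + 1)}

# Two-step prior: p(sigma) is uniform on Pi (the srs extension), and given sigma
# the conditional puts mass p(t) on the vector with t 1's followed by N - t 0's.
# Marginalizing over sigma gives the induced prior on the ordered vectors y_u.
marginal = defaultdict(float)
for sigma in itertools.permutations(range(N)):
    for t, pt in p_t.items():
        sorted_vec = (1,) * t + (0,) * (N - t)
        y_u = [0] * N
        for i, pos in enumerate(sigma):
            y_u[pos] = sorted_vec[i]  # unit in position sigma[i] takes the i-th sorted value
        marginal[tuple(y_u)] += pt / factorial(N)

# The induced marginal is exchangeable: p(y_u) = p(t) / C(N, t) with t = sum(y_u).
for y_u, prob in marginal.items():
    t = sum(y_u)
    assert abs(prob - p_t[t] / comb(N, t)) < 1e-12
```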
We now wish to compare a Bayesian who incorporates srs into their prior as described above with the usual Bayesian approach where the design is ignored.We will call them the design Bayesian and the standard Bayesian and denote them by DB and SB respectively.
For the SB her prior will be defined on $Y(0,1)$; the labels are known to her but will play no role in her analysis. Since she knows the labels, her prior beliefs about the unknown $y_u$ values are contained in $p(y_u)$. If these beliefs are exchangeable then she will use a prior of the form
$$p(y_u) = p(t)\Big/\binom{N}{t} \quad \mbox{when } \sum_{i=1}^{N} y_{u_i} = t \mbox{ for } t = 0, 1, \ldots, N. \eqno(4)$$
Suppose in a random sample of size $n$ there were observed $x$ 1's and $n - x$ 0's; then her posterior will be of the form
$$p(t \mid x) \propto p(t)\,\binom{N-n}{t-x}\Big/\binom{N}{t} \quad \mbox{for } t = x, x+1, \ldots, x + N - n. \eqno(5)$$
We now turn to the DB, who assumes that the labels exist but that they are not known to him. For him
$$Y^*(0,1) = \{y_u : \mbox{where } y_u \mbox{ has } t \mbox{ 1's followed by } N-t \mbox{ 0's for some } t = 0, 1, \ldots, N\} \eqno(6)$$
is the set of sensible possible values for $y_u$, rather than the set given in equation 1. From this it follows that his prior distribution will be restricted to the subset of $Y(0,1,\Pi)$ whose second coordinate belongs to $Y^*(0,1)$. But given a sample of size $n$ with $x$ 1's his posterior distribution will be
$$p(t \mid x) \propto p(t)\,\binom{t}{x}\binom{N-t}{n-x}\Big/\binom{N}{n} \quad \mbox{for } t = x, x+1, \ldots, x + N - n, \eqno(7)$$
where he is using the design probabilities to compute his posterior. Now, assuming that the two Bayesians are using the same $p(t)$, these two posteriors will be identical if
$$\binom{N-n}{t-x}\Big/\binom{N}{t} \propto \binom{t}{x}\binom{N-t}{n-x}\Big/\binom{N}{n} \quad \mbox{for } t = x, x+1, \ldots, x + N - n \eqno(8)$$
for fixed $x$, $n$ and $N$. But this is easy to verify, since
$$\binom{t}{x}\binom{N-t}{n-x}\Big/\binom{N}{n} = \binom{n}{x}\binom{N-n}{t-x}\Big/\binom{N}{t}$$
and the factor $\binom{n}{x}$ does not depend on $t$.
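The equality of the two posteriors can also be checked numerically. The sketch below (hypothetical values $N = 10$, $n = 4$, $x = 2$ and a flat $p(t)$; all names are illustrative) normalizes both sets of weights over the support $t = x, \ldots, x + N - n$ and confirms they agree.

```python
from math import comb

def normalize(w):
    s = sum(w.values())
    return {t: v / s for t, v in w.items()}

# Hypothetical values: population size N, sample size n, x observed 1's, flat p(t).
N, n, x = 10, 4, 2
p_t = {t: 1 / (N + 1) for t in range(N + 1)}

support = range(x, x + N - n + 1)  # t = x, x+1, ..., x + N - n

# SB posterior weights: p(t) C(N-n, t-x) / C(N, t)  (exchangeable prior).
sb = normalize({t: p_t[t] * comb(N - n, t - x) / comb(N, t) for t in support})
# DB posterior weights: p(t) C(t, x) C(N-t, n-x) / C(N, n)  (srs design probabilities).
db = normalize({t: p_t[t] * comb(t, x) * comb(N - t, n - x) / comb(N, n) for t in support})

for t in support:
    assert abs(sb[t] - db[t]) < 1e-12  # the two posteriors coincide
```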
So the SB, who assumes that the labels exist but carry no information, and the DB, who does not know the labels but makes use of the simple random sampling design when making his inferences, have exactly the same posterior distribution given the sample.

4. Discussion
We have argued that once a Bayesian considers the sampling design as part of the prior distribution, then in the situation where little is known about the population it makes sense that their inferences can be based on the fact that the sample was drawn at random. Furthermore, their posterior distribution is identical to the posterior of a standard Bayesian who assumes that the labels carry no information and whose prior beliefs are exchangeable. To see why, consider the DB and assume he is using srs to define $p(\sigma)$. Now for a fixed value of $\sigma$, since there are no labels, there are only $N+1$ sensible values for $y_{u(\sigma)}$, namely the members of $Y^*(0,1)$. An ordering of the population values containing $t$ 1's arises from exactly $t!\,(N-t)!$ of the $N!$ equally likely permutations, and so receives probability $p(t)\,t!\,(N-t)!/N! = p(t)\big/\binom{N}{t}$, which is exactly the same probability assigned to such an ordering by the SB in equation 4.
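The counting step in this argument can be verified directly: an ordering with $t$ 1's among $N$ positions corresponds to $t!\,(N-t)!$ of the $N!$ permutations, so under the uniform $p(\sigma)$ it receives probability $p(t)/\binom{N}{t}$. A minimal check, using an arbitrary illustrative $N = 7$:

```python
from math import comb, factorial

# Each arrangement of t 1's among N positions arises from t!(N-t)! of the N!
# equally likely permutations, so t!(N-t)!/N! = 1/C(N, t); multiplied by p(t)
# this recovers the SB's exchangeable prior probability p(t)/C(N, t).
N = 7
for t in range(N + 1):
    assert factorial(t) * factorial(N - t) * comb(N, t) == factorial(N)
```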