Bayesian Analysis for the Paired Comparison Model with Order Effects (Using Non-informative Priors)

Sometimes it may be difficult for a panelist to rank or compare more than two objects or treatments at the same time. For this reason, the paired comparison method is used. In this study, the Davidson and Beaver (1977) model for paired comparisons with order effects is analyzed through the Bayesian approach. For this purpose, the posterior means and the posterior modes obtained under non-informative priors are compared.


Introduction
Sometimes it may be difficult for a panelist to simultaneously rank or compare more than two treatments (objects, items, options, stimuli, etc.), especially when the differences between treatments are small or the criteria are rather subtle. For this reason, paired comparison data are regarded as more reliable and can be obtained more readily from panelists. In a paired comparison trial, a panelist is given a pair of treatments and is asked to pick the better one with respect to a given attribute. This process is repeated for all pairs of the treatments under study.
This method is widely used in industry for assessing customer preference and designing products using trained panelists. For example, in taste testing it is often difficult for a panelist to cope with more than two tastes at the same time, and the introduction of a third may be confusing. A good review of the paired comparison models, including their analysis, is given by Bradley (1976). David (1988) gives a detailed survey of the literature and references concerning the method of paired comparisons. Further literature can be found in Augustin (2004), David (2004), Hatzinger et al. (2004) and the references cited therein. Some recent developments include Abbas and Aslam (2009), who develop a paired comparison model based on the Cauchy distribution and analyze it via the Bayesian approach. Abbas and Aslam (2010) perform a Bayesian analysis of the chi-square models suggested by Stern (1990).

The Davidson and Beaver Model
Consider a paired comparison trial with a set of $m$ treatments $T_1, T_2, \ldots, T_m$. Each pair formed from the set of $m$ treatments is ranked $r_{ij}$ times by being presented to a respondent who is asked to indicate a preference for one treatment of the pair. It is assumed that the responses to the treatments can be described in terms of an underlying continuum on which the relative worths of the treatments can be located. We denote the worth, an index of relative merit of the treatment $T_i$, by $\theta_i$, $\theta_i > 0$, where, without loss of generality, $\sum_{i=1}^{m} \theta_i = 1$. A model can be based on the idea that when a panelist is confronted with treatment $T_i$, he responds with an unconscious or latent variable $X_i$. The assumed mechanism is that he prefers treatment $T_i$ to treatment $T_j$ if $X_i > X_j$.
Bradley and Terry (1952) propose a model of paired comparisons for the trial mentioned above. They assume that the difference between two latent variables, i.e., $(X_i - X_j)$, has a logistic (squared hyperbolic secant) density with location parameter $(\ln\theta_i - \ln\theta_j)$. The probability $P(X_i > X_j \mid \theta_i, \theta_j)$ that the treatment $T_i$ is preferred to the treatment $T_j$ $(i \ne j)$, when the treatments $T_i$ and $T_j$ are compared, is then defined as:
$$P(X_i > X_j \mid \theta_i, \theta_j) = \frac{\theta_i}{\theta_i + \theta_j}. \tag{1}$$
The Bradley-Terry model is defined by (1).
Davidson and Beaver (1977) propose a modification of the Bradley-Terry model to account for the effect of the order of presentation of the treatments within a pair. A multiplicative order effect is suggested as an alternative to the additive order effect proposed by Beaver and Gokhale (1975). An important feature of the Bradley-Terry model is that the values $\ln\theta_1, \ln\theta_2, \ldots, \ln\theta_m$ can be used to represent the merits of the treatments under study on a linear scale. Thus it is natural to assume that the logarithms of the worths, as opposed to the worths themselves, are affected additively by the order of presentation. This is equivalent to assuming that when treatments $T_i$ and $T_j$ appear together in a pair, their relative worths are subject to a multiplicative within-pair order effect $\gamma_{ij}$. It is assumed that $\gamma_{ij} = \gamma_{ji}$, i.e., the within-pair order effect depends only on the treatment pair. The resulting preference probabilities for the ordered pair $(T_i, T_j)$, in which $T_i$ is presented first, are given by
$$P(T_i \text{ preferred} \mid (T_i, T_j)) = \frac{\theta_i}{\theta_i + \gamma_{ij}\theta_j}, \qquad P(T_j \text{ preferred} \mid (T_i, T_j)) = \frac{\gamma_{ij}\theta_j}{\theta_i + \gamma_{ij}\theta_j}. \tag{2}$$
Here $\gamma_{ij} > 0$. When $\gamma_{ij} = 1$, there is no order effect and the model reduces to the Bradley-Terry model (1). When $\gamma_{ij} > 1$, the worth of the treatment presented second becomes inflated, while if $\gamma_{ij} < 1$, the treatment presented first gains an advantage. The case $\gamma_{ij} = \gamma$ for all $(i, j)$ is of interest because of the considerable reduction in the number of parameters required to specify the model.
The multiplicative model (2) arises naturally in the setting of the linear model (David 1988). Suppose that an individual experiences sensations $X_i$ and $X_j$ when presented the pair $(i, j)$, and that the response $i \to j$ ($i$ preferred to $j$) results when $X_i > X_j$, which is interpreted to mean that the sensation to treatment $T_i$ comes closer to the ideal sensation than that for treatment $T_j$. It is noted by Bradley (1953) that when the difference $X_i - X_j$ has a logistic distribution with location parameter $(\ln\theta_i - \ln\theta_j)$, one obtains the preference probabilities of the Bradley-Terry model. Now suppose that for the ordered pair $(i, j)$, the response $i \to j$ results when $X_i - X_j > \eta_{ij}$, where $\eta_{ij}$ is interpreted as a shift on the sensation scale which arises because of the order of presentation. Using the logistic distribution for $X_i - X_j$, one obtains the preference probabilities given by the multiplicative model (2) with $\eta_{ij} = \ln\gamma_{ij}$.
So far it has been assumed that each response to a pair of treatments consists of an indication of preference for one member of the pair.
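As a rough illustration of the multiplicative order effect in model (2), the two preference probabilities for an ordered pair can be computed as below. This is only a sketch: the function name, the argument names `theta_i`, `theta_j` (worths) and `gamma` (common order effect), and the numeric worths are assumptions chosen for illustration, not values from the study.

```python
def davidson_beaver_probs(theta_i, theta_j, gamma=1.0):
    """Preference probabilities for the ordered pair (T_i, T_j).

    T_i is the treatment presented first; gamma is the multiplicative
    within-pair order effect (gamma = 1 gives the Bradley-Terry model).
    Returns (P(T_i preferred), P(T_j preferred)).
    """
    if theta_i <= 0 or theta_j <= 0 or gamma <= 0:
        raise ValueError("worths and order effect must be positive")
    denom = theta_i + gamma * theta_j
    return theta_i / denom, gamma * theta_j / denom


# Hypothetical worths, used only to show the direction of the effect.
no_effect = davidson_beaver_probs(0.4, 0.6, gamma=1.0)   # Bradley-Terry case
second_favoured = davidson_beaver_probs(0.4, 0.6, gamma=2.0)
print(no_effect, second_favoured)
```

With `gamma` above 1 the second-presented treatment gains probability relative to the Bradley-Terry case, and with `gamma` below 1 the first-presented treatment does.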

Notations and Likelihood Function for the Model
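Let $w_{ijk}(1)$ and $w_{ijk}(2)$ be the random variables associated with the ranks of the treatments in the $k$th repetition of the ordered treatment pair $(T_i, T_j)$, $i \ne j = 1, 2, \ldots, m$, $k = 1, \ldots, r_{ij}$. They are defined as:
$w_{ijk}(1) = 1$ or $0$ according as the treatment $T_i$ is preferred to treatment $T_j$ or not in the $k$th repetition of the comparison.
$w_{ijk}(2) = 1$ or $0$ according as the treatment $T_j$ is preferred to treatment $T_i$ or not in the $k$th repetition of the comparison.
Since each response indicates a preference for exactly one member of the pair, $w_{ijk}(1) + w_{ijk}(2) = 1$.
Now we derive the likelihood function of the data for the Davidson and Beaver model stated in (2). We constrain the treatment parameters of the model to be positive and to sum to unity. These conditions ensure that the parameters are well defined and identifiable.
The probability of the observed result in the $k$th repetition of the pair $(T_i, T_j)$ is
$$P_{ijk} = \left(\frac{\theta_i}{\theta_i + \gamma\theta_j}\right)^{w_{ijk}(1)} \left(\frac{\gamma\theta_j}{\theta_i + \gamma\theta_j}\right)^{w_{ijk}(2)}. \tag{3}$$
Hence, the likelihood function of the observed outcome $x$ (where $x$ represents the data $(w_{ijk}(1), w_{ijk}(2))$ of the trial) is
$$L(x; \theta_1, \ldots, \theta_m, \gamma) = \prod_{i \ne j} \prod_{k=1}^{r_{ij}} P_{ijk} = \prod_{i \ne j} \frac{\theta_i^{\,w_{ij}(1)} (\gamma\theta_j)^{w_{ij}(2)}}{(\theta_i + \gamma\theta_j)^{r_{ij}}}, \tag{4}$$
where $w_{ij}(s) = \sum_{k} w_{ijk}(s)$ for $s = 1, 2$, $0 < \theta_i < 1$, $i = 1, 2, \ldots, m$, $\sum_{i=1}^{m} \theta_i = 1$ and $\gamma > 0$. Here $\gamma$ is the order effect parameter and $\theta_1, \theta_2, \ldots, \theta_m$ are the treatment parameters.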
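For the two-treatment case, the logarithm of this likelihood can be sketched in a few lines. The code assumes a single common order effect `gamma`, the constraint `theta2 = 1 - theta1`, and summarises the data by two counts: `a`, the number of times T1 is preferred in the `r12` trials presented in the order (T1, T2), and `b`, the number of times T2 is preferred in the `r21` trials presented in the order (T2, T1). These names and the counts in the usage line are hypothetical and are not the data of Table 1.

```python
import math


def log_likelihood(theta1, gamma, a, b, r12, r21):
    """Log-likelihood of the order-effect model for m = 2 treatments.

    a : wins of T1 when presented first, out of r12 trials in order (T1, T2)
    b : wins of T2 when presented first, out of r21 trials in order (T2, T1)
    Returns -inf outside the parameter space 0 < theta1 < 1, gamma > 0.
    """
    if not (0.0 < theta1 < 1.0) or gamma <= 0.0:
        return float("-inf")
    theta2 = 1.0 - theta1
    d12 = theta1 + gamma * theta2   # denominator for the order (T1, T2)
    d21 = theta2 + gamma * theta1   # denominator for the order (T2, T1)
    ll_12 = (a * math.log(theta1)
             + (r12 - a) * math.log(gamma * theta2)
             - r12 * math.log(d12))
    ll_21 = (b * math.log(theta2)
             + (r21 - b) * math.log(gamma * theta1)
             - r21 * math.log(d21))
    return ll_12 + ll_21


# Hypothetical counts, for illustration only.
print(log_likelihood(0.35, 1.2, a=15, b=30, r12=50, r21=50))
```

Working on the log scale avoids underflow, since the likelihood itself is extremely small for a hundred or so comparisons.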

The Choice of Prior Distribution
Bayesian analysis is a statistical procedure which endeavors to estimate the parameters of an underlying distribution based on the observed data. We begin with the derivation of the prior distribution of the parameters, include an assessment of the likelihood function of the sample observations derived from the distribution identified by the parameters, and then merge them to yield a posterior distribution of the parameters of interest. The entire parametric inference is then based on this posterior distribution of the parameters of interest, conditional upon the data. In practice, it is common to assume a uniform distribution over the appropriate range of values of the parameters for the prior distribution. Adams (2005) discusses the advantages of the Bayesian approach in detail.
A prior distribution quantifies information about a parameter before any data are gathered. A prior which expresses specific, definite information about a random variable is an informative prior. In some cases, however, such as multiparameter situations, it becomes difficult to formalize any available prior information into a distribution. In such cases, when little prior information is known or prior elicitation is difficult, the analysis is done by choosing a prior which reflects little prior information. These priors are known as non-informative priors. A prior distribution is non-informative if it is flat relative to the likelihood function. Thus a prior distribution is non-informative if it has minimal impact on the posterior distribution of the parameter of interest and is dominated by the likelihood, that is, it does not change very much over the region in which the likelihood is appreciable and does not assume large values outside that region. A prior which has these properties is said to be a locally uniform prior. Other names for non-informative priors are reference priors, vague priors, ignorance priors or flat priors.
Many approaches for the choice of a non-informative prior have been given. One way of eliminating the subjectivity in the choice of prior is to use a flat or diffuse prior distribution that is uniform across all possible values of the parameter. Such a non-informative diffuse prior is simply a constant, i.e., $p(\theta) = c$ for $\theta$ belonging to the parameter space. With a diffuse prior, the posterior is proportional to the likelihood, i.e., $p(\theta \mid x) \propto c\,L(x; \theta) \propto L(x; \theta)$.
Another approach is to use the Jeffreys prior. It satisfies the local uniformity property for non-informative priors. It is the prior based on the Fisher information matrix.

Reference (Jeffreys) Prior for the Parameters of the Model
A non-informative prior has been suggested by Jeffreys (1946, 1961) which is frequently used in situations where one does not have much information about the parameters. It is defined as the density of the parameters proportional to the square root of the determinant of the Fisher information matrix.

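Let $p(x \mid \theta)$ denote the density of $x$ given $\theta$. The Fisher information is
$$I(\theta) = -E\left[\frac{\partial^2 \ln p(x \mid \theta)}{\partial \theta^2}\right].$$
If $\theta$ is a $p \times 1$ vector, then $I(\theta)$ has $(i, j)$th element
$$I_{ij}(\theta) = -E\left[\frac{\partial^2 \ln p(x \mid \theta)}{\partial \theta_i \, \partial \theta_j}\right], \qquad i, j = 1, \ldots, p.$$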

Properties of the Jeffreys' Prior
The Jeffreys prior has several properties that make it an attractive non-informative prior. It is invariant under one-to-one transformations of the parameter, in the sense that we get consistent answers in any parameterization. Bernardo (1979) shows that the Jeffreys prior is the appropriate reference prior if no parameters are regarded as nuisance parameters and the joint posterior distribution of all the parameters is asymptotically normal. Another important aspect of the prior is that it is not affected by a restriction on the parameter space.
The likelihood function (4) belongs to the exponential family, and it follows from the regularity conditions of Johnson and Ladalla (1979) that the posterior distribution is asymptotically normal. Here no parameter is regarded as a nuisance parameter, so the Jeffreys prior is the appropriate choice of non-informative prior and hereafter will be called the reference (Jeffreys) prior.
Let us consider the case for the number of treatments m=2.
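For two treatments, the unnormalized Jeffreys prior can be sketched numerically as follows. The sketch assumes that the data consist of two independent binomial experiments, one per presentation order, with `r12` and `r21` trials, a common order effect `gamma`, and `theta2 = 1 - theta`. Under that assumption the Fisher information for the pair `(theta, gamma)` is the sum over the two orders of `r * grad(p) grad(p)^T / (p (1 - p))`, where `p` is the probability that the first-presented treatment is preferred. The function names are illustrative, and this is not necessarily the closed form used in the study.

```python
import math


def fisher_information(theta, gamma, r12, r21):
    """Entries (I_tt, I_tg, I_gg) of the 2 x 2 Fisher information
    for (theta, gamma), with theta2 = 1 - theta."""
    d12 = theta + gamma * (1.0 - theta)
    d21 = (1.0 - theta) + gamma * theta
    p12 = theta / d12            # P(T1 preferred | order (T1, T2))
    p21 = (1.0 - theta) / d21    # P(T2 preferred | order (T2, T1))
    # (dp/dtheta, dp/dgamma, p, number of trials) for each presentation order
    terms = [
        (gamma / d12 ** 2, -theta * (1.0 - theta) / d12 ** 2, p12, r12),
        (-gamma / d21 ** 2, -theta * (1.0 - theta) / d21 ** 2, p21, r21),
    ]
    i_tt = i_tg = i_gg = 0.0
    for dp_dt, dp_dg, p, r in terms:
        weight = r / (p * (1.0 - p))   # binomial information per unit of p
        i_tt += weight * dp_dt * dp_dt
        i_tg += weight * dp_dt * dp_dg
        i_gg += weight * dp_dg * dp_dg
    return i_tt, i_tg, i_gg


def jeffreys_unnormalized(theta, gamma, r12, r21):
    """Square root of the determinant of the Fisher information."""
    i_tt, i_tg, i_gg = fisher_information(theta, gamma, r12, r21)
    det = max(i_tt * i_gg - i_tg ** 2, 0.0)   # guard against rounding error
    return math.sqrt(det)


print(jeffreys_unnormalized(0.5, 1.0, 50, 50))
```

Multiplying this quantity by the likelihood gives the unnormalized joint posterior under the Jeffreys prior.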

Bayesian Analysis of the Model
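Using the data given in Table 1, the Bayesian analysis of the model with order effect for two treatments with an equal number of comparisons for each pair, $r_{ij} = 50$ $(i, j = 1, 2)$, is presented using the non-informative priors: the uniform and the reference (Jeffreys) priors. The (joint) posterior distribution using the uniform prior for the treatment parameter $\theta_1$ (with constraint $\theta_2 = 1 - \theta_1$) and the order effect parameter $\gamma$ is proportional to the likelihood function (4):
$$p(\theta_1, \gamma \mid x) \propto L(x; \theta_1, 1 - \theta_1, \gamma), \qquad 0 < \theta_1 < 1, \; \gamma > 0,$$
where $K = 6.7355 \times 10^{-29}$ is the normalizing constant. Here we may replace $\theta_1$ by $\theta$ for simplicity.
The (marginal) posterior densities of the parameters $\theta_1$ and $\gamma$ are:
$$p(\theta_1 \mid x) = \int_0^{\infty} p(\theta_1, \gamma \mid x) \, d\gamma, \tag{5}$$
$$p(\gamma \mid x) = \int_0^{1} p(\theta_1, \gamma \mid x) \, d\theta_1. \tag{6}$$
The (marginal) posterior densities of the remaining parameters can be derived similarly. The (joint) posterior distribution using the Jeffreys prior for the treatment parameter $\theta_1$ (with constraint $\theta_2 = 1 - \theta_1$) and the order effect parameter $\gamma$ is
$$p_J(\theta_1, \gamma \mid x) \propto \sqrt{\det I(\theta_1, \gamma)} \; L(x; \theta_1, 1 - \theta_1, \gamma), \qquad 0 < \theta_1 < 1, \; \gamma > 0,$$
where $K = 5.3549 \times 10^{-29}$ is the normalizing constant. The (marginal) posterior densities of the parameters $\theta_1$ and $\gamma$ may be found using (5) and (6).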

The Posterior Estimates
The posterior means and joint posterior modes of the parameters are taken as the estimates of the parameters. The means of the parameters under the uniform and Jeffreys priors are evaluated using the quadrature method for the data set given in Table 1. We evaluate the expressions
$$E(\theta_1 \mid x) = \int_0^{\infty} \int_0^{1} \theta_1 \, p(\theta_1, \gamma \mid x) \, d\theta_1 \, d\gamma, \qquad E(\gamma \mid x) = \int_0^{\infty} \int_0^{1} \gamma \, p(\theta_1, \gamma \mid x) \, d\theta_1 \, d\gamma,$$
respectively, to find the mean estimates of the worth and order effect parameters, with $E(\theta_2 \mid x) = 1 - E(\theta_1 \mid x)$. Using the uniform prior, the posterior means of the parameters $\theta_1$, $\theta_2$ and $\gamma$ are evaluated to be 0.31972, 0.68028 and 1.296670, whereas using the Jeffreys prior, these estimates are 0.32050, 0.67950 and 1.230970. The posterior means of the parameters evaluated using the uniform prior are thus very close to those produced using the Jeffreys prior.
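The quadrature step can be sketched with a simple midpoint rule over a rectangular grid. This is a minimal illustration under the uniform prior for two treatments, not the routine used in the study: the counts `a` and `b` (wins of the first-presented treatment in the orders (T1, T2) and (T2, T1)), the grid sizes, and the truncation of the order effect at `gamma_max` are all assumptions made here. Because Table 1 is not reproduced in this sketch, the counts in the usage line are hypothetical and the output is not expected to match the estimates reported above.

```python
import math


def log_lik(theta, gamma, a, b, r12, r21):
    """Log-likelihood for m = 2 with theta2 = 1 - theta and common gamma."""
    t2 = 1.0 - theta
    return (a * math.log(theta) + (r12 - a) * math.log(gamma * t2)
            - r12 * math.log(theta + gamma * t2)
            + b * math.log(t2) + (r21 - b) * math.log(gamma * theta)
            - r21 * math.log(t2 + gamma * theta))


def posterior_means(a, b, r12, r21, n_theta=200, n_gamma=400, gamma_max=10.0):
    """Posterior means of (theta1, theta2, gamma) under a uniform prior,
    using midpoint quadrature on (0, 1) x (0, gamma_max)."""
    thetas = [(i + 0.5) / n_theta for i in range(n_theta)]
    gammas = [(j + 0.5) * gamma_max / n_gamma for j in range(n_gamma)]
    logs = [[log_lik(t, g, a, b, r12, r21) for g in gammas] for t in thetas]
    peak = max(max(row) for row in logs)   # rescale to avoid underflow
    norm = sum_theta = sum_gamma = 0.0
    for t, row in zip(thetas, logs):
        for g, value in zip(gammas, row):
            w = math.exp(value - peak)     # unnormalized posterior weight
            norm += w
            sum_theta += t * w
            sum_gamma += g * w
    mean_theta = sum_theta / norm
    return mean_theta, 1.0 - mean_theta, sum_gamma / norm


# Hypothetical counts, for illustration only.
print(posterior_means(a=15, b=30, r12=50, r21=50))
```

The cell areas cancel in the ratio, so only the unnormalized weights are needed. For the Jeffreys prior, the same loop applies after adding the logarithm of the prior to each grid value.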
The estimates of the parameters of interest which maximize the posterior density are termed the joint posterior modes. These are found by solving the equations obtained by equating to zero the first partial derivatives of the logarithm of the posterior density with respect to the unknown parameters; under the uniform prior, this is the logarithm of the likelihood function up to an additive constant. That is, we solve:
$$\frac{\partial \ln p(\theta_1, \gamma \mid x)}{\partial \theta_1} = 0, \qquad \frac{\partial \ln p(\theta_1, \gamma \mid x)}{\partial \gamma} = 0.$$
The modal estimates using the uniform prior are found for the parameters $\theta_1$, $\theta_2$ and $\gamma$ by executing a computer program written in the SAS package. The posterior joint mode of the vector $(\theta_1, \theta_2, \gamma)$ is obtained as $(\theta_1 = 0.318665, \theta_2 = 0.681335, \gamma = 1.22676)$. Similarly, the modal estimates using the Jeffreys prior are found by running another SAS program, and the joint posterior mode of the parameter vector $(\theta_1, \theta_2, \gamma)$ is obtained as $(\theta_1 = 0.319267, \theta_2 = 0.680733, \gamma = 1.145057)$. Here we observe that the estimates produced using the two non-informative priors, the Jeffreys and the uniform, are also very close.
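For two treatments and a uniform prior, the mode can also be checked by hand, under the assumption that the data are two independent binomial counts, one per presentation order. In that case the model has two free parameters and two preference proportions, so the score equations are solved by matching the model probabilities to the observed proportions. The sketch below is not the SAS program mentioned above; the function name and the counts `a`, `b` (wins of the first-presented treatment in the orders (T1, T2) and (T2, T1)) are hypothetical. A numerical optimizer would still be needed under the Jeffreys prior.

```python
import math


def uniform_prior_mode(a, b, r12, r21):
    """Joint posterior mode (theta1, theta2, gamma) under a uniform prior
    for m = 2, obtained by matching observed preference proportions.

    Uses odds(T1 first wins) = theta1 / (gamma * theta2) and
         odds(T2 first wins) = theta2 / (gamma * theta1).
    """
    if not (0 < a < r12 and 0 < b < r21):
        raise ValueError("closed form needs 0 < a < r12 and 0 < b < r21")
    odds12 = a / (r12 - a)
    odds21 = b / (r21 - b)
    ratio = math.sqrt(odds12 / odds21)           # theta1 / theta2
    theta1 = ratio / (1.0 + ratio)
    gamma = 1.0 / math.sqrt(odds12 * odds21)     # common order effect
    return theta1, 1.0 - theta1, gamma


# Hypothetical counts, for illustration only.
theta1, theta2, gamma = uniform_prior_mode(a=15, b=30, r12=50, r21=50)
print(theta1, theta2, gamma)
```

A quick sanity check is that `theta1 / (theta1 + gamma * theta2)` recovers the observed proportion `a / r12`, and likewise for the reversed order.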

Conclusions
Here we perform the Bayesian analysis of the Davidson and Beaver model for paired comparisons based on two non-informative priors, the Jeffreys and the uniform. We derive the posterior means and the modal estimates of the model parameters for an illustration based on the observed data. The results reveal that the posterior estimates obtained using the two types of non-informative priors are very similar, which gives confidence in using either of the priors. The results also exhibit robustness with respect to the choice of non-informative prior. This means that the simpler uniform prior is a reasonable option in this case.
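In the expressions above, $w_{ij}(1) = \sum_{k} w_{ijk}(1)$ is the frequency of preference for the treatment presented first (treatment $i$), $w_{ij}(2) = \sum_{k} w_{ijk}(2)$ is the frequency of preference for the treatment presented second (treatment $j$), and $r_{ij}$ is the number of times treatment $T_i$ is compared with treatment $T_j$. The total number of wins of the treatment $T_i$ is $\sum_{j \ne i} [w_{ij}(1) + w_{ji}(2)]$. The Fisher information $I(\theta)$ is a $p \times p$ matrix, and the Jeffreys prior is defined through the square root of its determinant, i.e., $p_J(\theta) \propto \sqrt{\det I(\theta)}$. The likelihood function for the parameters $\gamma$, $\theta_1$ and $\theta_2$ of the Davidson and Beaver model is proportional to
$$\frac{\theta_1^{\,w_{12}(1)} (\gamma\theta_2)^{w_{12}(2)}}{(\theta_1 + \gamma\theta_2)^{r_{12}}} \cdot \frac{\theta_2^{\,w_{21}(1)} (\gamma\theta_1)^{w_{21}(2)}}{(\theta_2 + \gamma\theta_1)^{r_{21}}}.$$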