Linking Diversity and Disparity Measures

The purpose of this paper is to examine links between the diversity measures (Patil and Taillie 1982) and the disparity measures (Lindsay 1994), quantities apparently developed for somewhat different purposes. We demonstrate that numerous diversity measures satisfying all the desirable criteria mentioned by Patil and Taillie can be defined by the generating functions of certain disparities and the associated residual adjustment functions. This provides the statistician and the ecologist with a wide class of flexible indices for the statistical measurement of diversity.


Introduction
The statistical measurement of diversity is an extremely important practical problem. Diversity, under various names, has been a very significant concept in the ecological, biological, economic, social, physical and management sciences. See Atkinson (1970), Finkelstein and Friedberg (1967), Greenberg (1956), Hart (1971), Lieberson (1969), Nei (1973), and Sen (1974), among others. There is a vast literature on diversity-related issues. Here we discuss only some of the important, relevant references.
Two widely used indices of diversity are Shannon's (1948) index and Simpson's (1949) index. Good (1953) suggested a more general diversity measure which includes Shannon's and Simpson's indices. Baczkowski et al. (1997, 1998, 2000) discussed a further generalization of Good's index.
In practice, diversity has been interpreted and measured in different ways. One approach in measuring biological diversity is to consider joint dissimilarity of species in a community. Using this approach based on an intrinsic notion of dissimilarity between individuals of a population, Rao (1982a, b) developed the axiomatic theory for diversity. However, this approach has had a limited impact on the ecological practice of measuring biodiversity (Solow and Polasky 1994; Champely and Chessel 2002). Rao (1982a, b) extended the concept of analysis of variance (ANOVA) to the more general analysis of diversity (ANODIV), which can be used for qualitative data as well. His work generalized the work on the analysis of one-way classified categorical data, called CATANOVA, of Light and Margolin (1971) and Anderson and Landis (1980). Nayak (1986a) discussed a generalization of the CATANOVA methods of these authors using Rao's quadratic entropy. For a general class of diversity measures, Nayak (1986b) discussed sampling distributions of quantities arising in ANODIV.
In this paper, however, we restrict our attention to the widely accepted traditional approach of measuring ecological diversity, in which one considers the relative abundances in a community without regard to the differences between species. For this approach, Patil and Taillie (1982) provided a formal definition and logical development of diversity as a concept and worked out a related theory for the statistical measurement of diversity. They defined diversity of a community as the average rarity of species within the community, and proposed a family of measures called diversity indices of degree $\beta$. Below we mention some recent works on the application of these diversity indices.
For square contingency tables having nominal categories, Tomizawa (1994) proposed two measures to represent the degree of departure from symmetry using the average of Shannon's index and the average of Simpson's index, respectively. Tomizawa et al. (1998) gave a generalization of the two measures using the average of the diversity index of Patil and Taillie (1982, Sec 3.2).
For a two-way contingency table with a nominal explanatory variable and a nominal response variable, Tomizawa et al. (1997) defined measures which describe the proportional reduction in variation from the marginal distribution to the conditional distributions of the response using Patil and Taillie's (1982) diversity index. Tomizawa and Ebi (1998) extended the work of Tomizawa et al. to multi-way contingency tables.
In the context of developing robust and fully efficient inference procedures under count data models, Lindsay (1994) defined a class of density-based divergences, called disparities. A disparity is a measure of average discrepancy between two densities, which in statistical inference are the model density and an appropriate nonparametric density estimator obtained from the sample data. Lindsay's class of disparities includes the well-known and well-studied Hellinger distance (HD), the more recent negative exponential disparity (NED), which is an excellent competitor to the HD in generating robust statistics (see Basu et al. 1997), Pearson's chi-square, the likelihood disparity, and the Kullback-Leibler divergence. This class of disparities contains some important subclasses, namely the blended weight chi-squares, the blended weight Hellinger distances (Lindsay 1994), and the celebrated power divergence family (Cressie and Read 1984).
The development of the class of disparities is the natural culmination of the study of density-based divergences. Beran (1977) first showed that the robust minimum Hellinger distance estimator attains full asymptotic efficiency at the model, something which traditional robust estimators such as the M-estimators fail to do. Among others, Tamura and Boos (1986) and Simpson (1987, 1989) further pursued Beran's work. Lindsay (1994) presented a comprehensive approach, developed the class of disparities, and extended the range of choice beyond the Hellinger distance. Many new members of the class of disparities share and sometimes improve upon the desirable properties of the Hellinger distance.
Several disparity-based analogs (Simpson 1989; Bhandari et al. 2000) of the likelihood ratio test are excellent robust alternatives to the usually non-robust likelihood ratio test. As another application of disparities, Basu and Sarkar (1994) investigated disparity-based goodness-of-fit tests for multinomial models under simple as well as composite hypotheses, thus generalizing the Cressie-Read power divergence approach. This line of research has been pursued by Shin et al. (1995, 1996) and Jeong and Sarkar (2000), among others. Thus a considerable amount of research in robust inference and goodness-of-fit testing is based on disparities. For a comprehensive description see Basu et al. (2011).
The form of the diversity index of degree $\beta$ defined by Patil and Taillie (1982) is strikingly similar to that of the well-known power divergence of Cressie and Read (1984). This makes one wonder about possible connections. More generally, since diversity of a community is about measuring the average rarity of its species, and disparity is about measuring the average discrepancy between suitable densities, the question arises: Is there a link between diversity and disparity measures? For example, can one generate diversity measures using the functions related to disparities? Such enquiries motivated the work of the present paper, and we hope that we have presented at least a partial answer to the above questions.
The rest of the paper is organized as follows. A short review of diversity measures is given in Section 2, whereas a brief discussion of disparities is provided in Section 3. Section 4 presents the links between diversity and disparity measures. Finally, Section 5 contains some concluding remarks.

Diversities
We briefly review the diversity measures introduced by Patil and Taillie (1982). Suppose a certain quantity is distributed among a countable set of categories, labeled $i = 1, 2, \ldots$, with $\pi_i$ as the proportionate share received by category $i$ and $\sum_i \pi_i = 1$. This quantity may be discrete (e.g. biological organisms, errors in a bank ledger) or continuous (e.g. biomass, energy, income).
For concreteness of further discussion on the concept and measurement of diversity, Patil and Taillie consider a community of biological organisms grouped into species and call $\pi = (\pi_1, \pi_2, \ldots)$ the species (relative) abundance vector. A community may be identified with the pair $C = (s, \pi)$, where $s$ is the number of nonzero components of $\pi$. Thus $s$ is the number of species that are physically present in the community. Assume that $s$ is finite. A community is called completely even when all $s$ nonzero components of $\pi$ are equal to $1/s$. The ranked abundances are obtained by arranging the components of $\pi$ in decreasing order.
Given a community $C = (s, \pi)$, let $R(i; \pi)$ denote a numerical measure of rarity to be associated with species $i$, $i = 1, 2, \ldots$. Then a diversity measure of the community is an average rarity of its species, and the diversity index associated with the measure of rarity $R$ is defined by

$$\Delta(C) = \sum_i \pi_i R(i; \pi).$$

Obviously, the diversity measure $\Delta$ depends on the choice of the function $R$ measuring rarity of species within the community.
Assume that the rarity measure $R(i; \pi)$ depends only on the numerical value of $\pi_i$. This phenomenon is called dichotomy, and the resulting diversity index $\Delta(C)$ is known as dichotomous. We will write $R(i; \pi) = R(\pi_i)$. Note that the function $R$ measuring rarity is defined on the interval $(0,1]$; $R(0)$ is inherently undefined, and $R(1) = 0$ is a natural normalizing requirement. Since rarer species correspond to smaller values of $\pi$, $R(\pi)$ should be a decreasing function of $\pi$.
Since $R$ is decreasing with $R(1) = 0$, the function $R$ is nonnegative on $(0,1]$. Patil and Taillie (1982) listed these conditions in their Criterion C1.
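One obtains three widely used indices of ecological diversity, namely the species count $s - 1$, the Shannon index $-\sum_i \pi_i \log \pi_i$, and the Simpson index $1 - \sum_i \pi_i^2$, by taking

$$R(\pi) = \frac{1}{\pi} - 1, \qquad R(\pi) = -\log \pi, \qquad R(\pi) = 1 - \pi \tag{1}$$

respectively. All three assign diversity value zero to a single-species community. The above three measures are special cases of the diversity index of degree $\beta$, denoted by $\Delta_\beta$ and defined by using the measure of rarity, defined on $(0,1]$,

$$R_\beta(\pi) = \frac{1 - \pi^{\beta}}{\beta},$$

with the limiting form $-\log \pi$ at $\beta = 0$. The $R$ functions in equation (1) correspond to $\beta = -1, 0$ and $1$ respectively under this setup.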
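The relation between the three classical indices and the index of degree $\beta$ can be checked numerically. The following is a minimal sketch, assuming the rarity function $R_\beta(\pi) = (1 - \pi^\beta)/\beta$ of Patil and Taillie (1982) with the limit $-\log \pi$ at $\beta = 0$; the function names and the example abundance vector are hypothetical and chosen only for illustration.

```python
import math

def rarity(p, beta):
    """Assumed rarity R_beta(p) = (1 - p**beta) / beta, with limit -log(p) at beta = 0."""
    if beta == 0:
        return -math.log(p)
    return (1.0 - p ** beta) / beta

def diversity(abundances, beta):
    """Average rarity: sum of p * R_beta(p) over the species that are present (p > 0)."""
    return sum(p * rarity(p, beta) for p in abundances if p > 0)

# Hypothetical relative abundances of a three-species community (they sum to 1).
community = [0.5, 0.3, 0.2]

species_count = diversity(community, -1)  # equals s - 1
shannon = diversity(community, 0)         # equals -sum p log p
simpson = diversity(community, 1)         # equals 1 - sum p^2
```

For a single-species community, `diversity([1.0], beta)` is zero for each of the three values of `beta`, in line with the normalization $R(1) = 0$.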
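Patil and Taillie imposed another desirable condition on the diversity measures through their Criterion C2: For two communities $C = (s, \pi)$ and $C' = (s', \pi')$, $\Delta(C') \geq \Delta(C)$ whenever $C'$ is obtained from $C$ by introducing a species or by a transfer of abundance, which are defined in the following.
The community $C'$ is obtained from $C$ by introducing a species if $s' = s + 1$ and if there are two distinct positive integers $i$ and $j$ such that $\pi_j = 0$ and part of the abundance of species $i$ is reassigned to the new species $j$, all other abundances being unchanged. The community $C'$ is obtained from $C$ by a transfer of abundance if $s' = s$ and if there are positive integers $i$ and $j$ such that $\pi_i > \pi_j > 0$ and a positive amount of abundance not exceeding $\pi_i - \pi_j$ is moved from species $i$ to species $j$, all other abundances being unchanged.
To state conditions under which a diversity index $\Delta(C)$ satisfies their Criteria C1 and C2, Patil and Taillie defined an auxiliary function $V$ by $V(\pi) = \pi R(\pi)$ for $0 < \pi \leq 1$ and $V(0) = 0$. The function $V$ may be discontinuous at 0. Then, the Criteria C1-C2 are satisfied if the auxiliary function $V$ is concave on the closed interval $[0,1]$. Thus, the diversity index $\Delta_\beta$ satisfies Criteria C1-C2 for $\beta \geq -1$.
Another desirable condition imposed on the diversity measures, stated in Patil and Taillie's Criterion C3, is that $\Delta(\pi)$ be a concave function of $\pi$. The motivation for this condition comes from the consideration that the diversity in a mixture of populations should not be smaller than the average of diversities within individual populations (Rao 1982b, Sec 2). Criterion C3 is satisfied by a diversity measure $\Delta$ if the corresponding auxiliary function $V$ is concave on the closed interval $[0,1]$, and, in particular, by $\Delta_\beta$ if $\beta \geq -1$.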
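The effect of a transfer of abundance and the mixture property behind Criterion C3 can be illustrated numerically. The sketch below assumes the Simpson index (rarity $R(\pi) = 1 - \pi$, so that $V(\pi) = \pi(1 - \pi)$); the helper names and all abundance vectors are hypothetical, and the checks illustrate the criteria on one example rather than prove them.

```python
def simpson(abundances):
    """Simpson index: sum of V(p) = p * (1 - p) over the abundance vector."""
    return sum(p * (1.0 - p) for p in abundances)

def transfer(abundances, i, j, h):
    """Move an amount h of abundance from species i to species j."""
    moved = list(abundances)
    moved[i] -= h
    moved[j] += h
    return moved

def mix(a, b, w):
    """Abundance vector of a mixture of two populations with weights w and 1 - w."""
    return [w * x + (1.0 - w) * y for x, y in zip(a, b)]

# Transfer from the most abundant to the least abundant species (h <= 0.6 - 0.1).
before = [0.6, 0.3, 0.1]
after = transfer(before, 0, 2, 0.2)

# Mixture of two hypothetical populations over the same three species.
pop_a = [0.7, 0.2, 0.1]
pop_b = [0.1, 0.3, 0.6]
mixture = mix(pop_a, pop_b, 0.5)

gain_from_transfer = simpson(after) - simpson(before)
gain_from_mixing = simpson(mixture) - 0.5 * (simpson(pop_a) + simpson(pop_b))
```

Both gains are non-negative in this example, which is what Criterion C2 and Criterion C3 require of a concave $V$.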

When thus standardized, the function $A(\delta)$ is called the residual adjustment function (RAF) of the disparity.

The RAF plays a major role in determining the second-order efficiency and robustness of the minimum disparity estimators (MDEs). The RAF of a disparity controls the impact of large outliers in much the same way as the $\psi$ function of the M-estimation procedure. For an extensive discussion, see Lindsay (1994), Basu and Lindsay (1994) and Basu et al. (2011).
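How an RAF limits the influence of large outliers can be seen by evaluating it at large Pearson residuals. The sketch below uses the standard RAF forms from the disparity literature for the likelihood disparity (LD), the Hellinger distance (HD) and the negative exponential disparity (NED), namely $A(\delta) = \delta$, $A(\delta) = 2[(\delta+1)^{1/2} - 1]$ and $A(\delta) = 2 - (2+\delta)e^{-\delta}$; these formulas are stated here as assumptions, and the function names and the grid of residuals are hypothetical.

```python
import math

def raf_ld(delta):
    """Assumed RAF of the likelihood disparity: A(delta) = delta."""
    return delta

def raf_hd(delta):
    """Assumed RAF of the Hellinger distance: A(delta) = 2 * (sqrt(delta + 1) - 1)."""
    return 2.0 * (math.sqrt(delta + 1.0) - 1.0)

def raf_ned(delta):
    """Assumed RAF of the negative exponential disparity: A(delta) = 2 - (2 + delta) * exp(-delta)."""
    return 2.0 - (2.0 + delta) * math.exp(-delta)

# Hypothetical grid of Pearson residuals, from the lower bound -1 up to a large outlier.
residuals = [-1.0, -0.5, 0.0, 1.0, 5.0, 10.0]
table = {
    "LD": [raf_ld(d) for d in residuals],
    "HD": [raf_hd(d) for d in residuals],
    "NED": [raf_ned(d) for d in residuals],
}
```

All three functions vanish at $\delta = 0$ and have unit slope there. For large positive $\delta$ the LD value grows linearly, the HD value grows like a square root, and the NED value stays bounded, which is the sense in which the latter two downweight outliers.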

Links between diversities and disparities
The Criteria C1-C3 of Patil and Taillie (1982) for a diversity measure are satisfied if the corresponding rarity function $R(\pi)$ defined on the interval $(0,1]$ (i) is non-negative, (ii) is decreasing, and (iii) has $R(1) = 0$; and (iv) the auxiliary function $V(\pi) = \pi R(\pi)$ is concave on $[0,1]$. To define either (a) $R(\pi)$ (and hence $V(\pi) = \pi R(\pi)$) or (b) $V(\pi)$ (and hence $R(\pi) = V(\pi)/\pi$) satisfying the four properties (i)-(iv), one may use the disparity generating function $G$ or the associated residual adjustment function $A$ on the interval $[-1, 0]$ or $[0,1]$. We present four cases depending on three factors, namely, whether the function $G$ or $A$ is non-negative or non-positive, increasing or decreasing, and convex or concave on the interval $[-1,0]$ or $[0,1]$. Assume that the first and second derivatives of $G$ and $A$ are well defined on $[-1,0]$ or $[0,1]$.
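Properties (i)-(iv) can be screened numerically for any candidate rarity function before attempting a proof. The following is a minimal sketch, assuming that property (iv) is concavity of $V(\pi) = \pi R(\pi)$ on $[0,1]$ with $V(0) = 0$; the checker `check_rarity`, its grid size and its tolerance are hypothetical, and a grid check of this kind can reveal violations but cannot establish the properties.

```python
import math

def check_rarity(R, n=200, tol=1e-12):
    """Screen a rarity function R on an equally spaced grid of (0, 1]."""
    grid = [k / n for k in range(1, n + 1)]             # points 1/n, 2/n, ..., 1
    values = [R(p) for p in grid]
    V = [0.0] + [p * r for p, r in zip(grid, values)]   # V(0) = 0 and V(p) = p * R(p)
    return {
        "non_negative": all(v >= -tol for v in values),
        "decreasing": all(a >= b - tol for a, b in zip(values, values[1:])),
        "zero_at_one": abs(R(1.0)) <= tol,
        # Non-positive second differences on an equally spaced grid indicate concavity.
        "V_concave": all(V[k - 1] - 2.0 * V[k] + V[k + 1] <= tol for k in range(1, n)),
    }

# Candidate obtained from the Hellinger distance RAF by R(p) = -A(p - 1) (assumed form).
hellinger_report = check_rarity(lambda p: 2.0 * (1.0 - math.sqrt(p)))

# A hypothetical counterexample: non-negative and decreasing, but V is not concave.
cubic_report = check_rarity(lambda p: (1.0 - p) ** 3)
```

The second candidate passes (i)-(iii) but fails the concavity screen, which shows why property (iv) has to be checked separately from the first three.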
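Case 1. Suppose on the interval $[-1, 0]$ an RAF $A(\delta)$, which is non-positive and increasing with $A(0) = 0$, is convex. (1a) On $(0,1]$, define $R(\pi) = -A(\pi - 1)$.
Application 1 of 1a: For the Hellinger distance, $A(\delta) = 2[(\delta+1)^{1/2} - 1]$, we then obtain the rarity function $R(\pi) = 2(1 - \pi^{1/2})$.
Application 2 of 1a: The RAF of the power divergence family is given by

$$A_\lambda(\delta) = \frac{(\delta+1)^{\lambda+1} - 1}{\lambda+1}.$$

On $(0,1]$, define

$$R(\pi) = -A_\lambda(\pi - 1) = \frac{1 - \pi^{\lambda+1}}{\lambda+1}.$$

Thus, Patil and Taillie's rarity function is the negative of a shift (by unity) of the RAF of Cressie and Read's (1984) power divergence with $\beta = \lambda + 1$.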
Remark 1. Note that $A_\lambda(\delta)$ is convex on $[-1,0]$ only if $\lambda \geq 0$. Thus, strictly speaking, application of 1a only shows that properties (i)-(iv) are satisfied for $R$ in Application 2 of 1a for $\lambda \geq 0$, but Patil and Taillie showed that properties (i)-(iv) are satisfied for this $R$ for $\lambda \geq -2$ (that is, $\beta \geq -1$). In fact, for $R(\pi) = -A(\pi - 1)$ as in (9), a direct calculation shows that $V''(\pi) = -2A'(\pi - 1) - \pi A''(\pi - 1)$, which for $A = A_\lambda$ equals $-(\lambda + 2)\pi^{\lambda}$ and is non-positive whenever $\lambda \geq -2$. Therefore, properties (i)-(iv) are satisfied for $R$ in (11) for $\lambda \geq -2$. Thus the convexity condition imposed on the residual adjustment function $A$ in Case 1 is sufficient, but not necessary. This condition on $A$ is assumed to make the resulting $V$ concave.
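The correspondence between the power divergence RAF and Patil and Taillie's rarity function, and the concavity claim of Remark 1, can be checked numerically. The sketch below assumes the RAF $A_\lambda(\delta) = [(\delta+1)^{\lambda+1} - 1]/(\lambda+1)$, the construction $R(\pi) = -A_\lambda(\pi - 1)$, and the rarity $R_\beta(\pi) = (1 - \pi^\beta)/\beta$ with $\beta = \lambda + 1$; the function names, the grid and the chosen values of $\lambda$ are hypothetical.

```python
def raf_power_divergence(delta, lam):
    """Assumed RAF A_lambda(delta) = ((delta + 1)**(lam + 1) - 1) / (lam + 1), lam != -1."""
    return ((delta + 1.0) ** (lam + 1.0) - 1.0) / (lam + 1.0)

def rarity_case_1a(p, lam):
    """Rarity built from the RAF by a shift and a sign change: R(p) = -A(p - 1)."""
    return -raf_power_divergence(p - 1.0, lam)

def rarity_patil_taillie(p, beta):
    """Assumed rarity R_beta(p) = (1 - p**beta) / beta, beta != 0."""
    return (1.0 - p ** beta) / beta

def second_difference_of_V(p, lam, h=1e-3):
    """Central second difference of V(p) = p * R(p); non-positive values indicate concavity."""
    V = lambda q: q * rarity_case_1a(q, lam)
    return V(p - h) - 2.0 * V(p) + V(p + h)

grid = [k / 20 for k in range(1, 20)]   # interior points of (0, 1)
lams = [-1.5, -0.5, 0.0, 1.0, 2.0]      # hypothetical values, all >= -2 and != -1

# Largest discrepancy between the two rarity functions when beta = lam + 1.
max_gap = max(abs(rarity_case_1a(p, lam) - rarity_patil_taillie(p, lam + 1.0))
              for p in grid for lam in lams)

# Concavity screen for V, including values of lam below 0 where A is not convex.
all_concave = all(second_difference_of_V(p, lam) <= 1e-12 for p in grid for lam in lams)
```

The values $\lambda = -1.5$ and $\lambda = -0.5$ are included deliberately: the RAF is not convex there, yet the screen still finds $V$ concave, consistent with the remark that convexity of $A$ is sufficient but not necessary.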

Case 2. Suppose on the interval $[-1,0]$ the function $G(\delta)$, which is convex, is non-negative and decreasing. [...]
Application 1 of 2a: [...] Thus, Pearson's chi-square corresponds to Patil and Taillie's diversity index of order 2.
Application 2 of 2a: For the blended weight chi-square (BWCS) family (Lindsay 1994), its (convex) $G$ function is given by $G(\delta) = \delta^2 / [2(\alpha\delta + 1)]$. [...]
Application 3 of 2a: For the blended weight Hellinger distance (BWHD) family (Lindsay 1994), its (convex) $G$ function is given by $G(\delta) = \delta^2 / \{2[\alpha(\delta+1)^{1/2} + 1 - \alpha]^2\}$. [...]
Case 3. Suppose on the interval $[0,1]$ an RAF $A(\delta)$, which is non-negative and increasing with $A(0) = 0$, [...] Now consider the RAF of the generalized negative exponential disparity (Jeong and Sarkar 2000). [...]
Application 2 of 3a: For the Hellinger distance [...]
Case 4. Suppose on the interval $[0,1]$ the function $G(\delta)$, which is convex, is non-negative and increasing. [...]
Application 4 of 4a: For the Kullback-Leibler divergence, [...]

  
In Table 1 we present several rarity functions (diversities) obtained from different well-known disparities.
Remark 2. Use of Case 1 and Case 2 (alternatively Case 3 and Case 4) may sometimes lead to the development of the same diversity (i.e., the same rarity function $R$) from the same disparity. For example, using the RAF

Concluding remarks
Diversity measures have wide practical use for the measurement of ecological biodiversity. In the traditional approach of measuring ecological diversity, one considers the relative abundances in a community without regard to the differences between species. For this scenario, the work by Patil and Taillie (1982) developed the appropriate concepts and provided a formal definition together with a logical framework. These authors defined diversity of a community as the average rarity of species within the community, and proposed a family of measures called diversity indices of degree $\beta$.
The diversity measures defined under this approach are characterized by certain basic requirements. It turns out that a large and rich class of such measures satisfying these basic requirements may be constructed following the structure of the estimating functions in density-based minimum distance estimation. In this paper we demonstrate the construction of several such diversity measures based on different minimum distance procedures. The minimum disparity estimators considered here have shown a great deal of variation in their behavior, and we expect that the corresponding class of diversity measures will show a similar variety of responses, leading to different interpretations.
Given the usefulness of measures of diversity, we expect that the link between the actual diversities and the class of minimum distance procedures will act as a major facilitator in constructing useful diversity measures with the required properties. Clearly, more detailed studies and investigations will be required to select the more suitable measures from within this class. But the availability of the class itself leads to useful gains, and represents the basis of important future selections.
of PCS in Case 2a will result in the same rarity function. This does not happen in general. For example, using the RAF of the HD in Case 1a we get the rarity function $R(\pi) = 2(1 - \pi^{1/2})$ [...] As a second example, one can see that use of the RAF $A(\delta) = 2 - (2+\delta)e^{-\delta}$ of the NED in Case 2a produces a different rarity function [...]
averaging is done with respect to the model density. An $x$-value is called an outlier if it has a large positive Pearson residual $\delta(x)$.