Vector Exponential Models and Second Order Inference



Introduction
Exponential models are widely used in contemporary statistics, offering rich model flexibility with relatively easy analysis that is largely immune to high data dimension. In basic theory they support the uniformly most powerful unbiased and similar tests (Lehmann & Romano, 2005, Section 4.4); in model building they provide the structure to go beyond Normal theory to generalized linear models (Nelder & Wedderburn, 1972; McCullagh & Nelder, 1989); with graphical models they provide key structure (Lauritzen, 1996); in machine learning they are a primary ingredient (Wainwright & Jordan, 2008); in approximation theory they offer a basis for saddlepoint methodology (Daniels, 1954); in current inference theory they underpin higher order methods (Barndorff-Nielsen, 1986); and for determining the connections between Bayesian and frequentist inference they give access to the needed location models (Welch & Peers, 1963).

   
The exponential model $f(y;\theta) = \exp\{\varphi(\theta)\, s(y) - \kappa(\theta)\}\, h(y)$ can be viewed as an exponential tilt of a basic density or relative density $h(y)$. By suitably recentering the variable and the parameter relative to observed data $y^0$ and corresponding $\hat\theta^0$ we can work with the more transparent form of model
$$g(s;\varphi) = \exp\{\ell(\varphi) + \varphi s\}\, g(s),$$
where $\ell(\varphi)$ is the reexpressed observed log-likelihood with observed data value $s = 0$. Asymptotic theory examines the effects of increasing data size $n$ and produces remarkably accurate approximations for the density and the distribution function at the observed data point.
These approximations use the log-likelihood function $\ell(\varphi; s) = \log g(s;\varphi)$, which is typically directly available from the original model as $\log f(y;\theta)$ provided allowance is made for the parameter change from $\theta$ to $\varphi$. The approximations also typically use two intermediate measures of departure, the signed likelihood root $r$ and the maximum likelihood departure $q$ in the canonical $\varphi$ scaling:
$$r = \mathrm{sign}(\hat\varphi - \varphi)\,[2\{\ell(\hat\varphi; s) - \ell(\varphi; s)\}]^{1/2}, \qquad q = (\hat\varphi - \varphi)\,\hat\jmath_{\varphi\varphi}^{1/2},$$
where $\hat\varphi = \hat\varphi(s)$ is the value that maximizes the likelihood $\ell(\varphi; s)$ and $\hat\jmath_{\varphi\varphi} = -\ell_{\varphi\varphi}(\hat\varphi; s)$ is the curvature of the likelihood at that maximum. For a model with data, the values of $r$ and $q$ are part of the usual output from many statistical packages and come directly from the observed likelihood function: $r$ comes from the vertical change in likelihood and $q$ comes from the horizontal change, the maximum likelihood value less the parameter value of interest.
The highly accurate approximations for the density and distribution functions are expressed in terms of $r$ and $q$ together with $\phi(z)$ and $\Phi(z)$, the standard Normal density and distribution functions; accuracy is third order, with $k$ constant to that order. These approximations from Daniels (1954) and Barndorff-Nielsen (1991) provide exceptional access to statistical inference, both theoretical and practical; for some recent discussion see Fraser (2011).
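As a numerical illustration of the two departure measures, the following sketch computes $r$ and $q$ for a sample of $n$ exponential life times with canonical parameter $\varphi = -\theta$ and canonical variable $s = \sum y_i$. It then combines them as $r^* = r - r^{-1}\log(r/q)$ with $p \approx \Phi(r^*)$; this combining formula is the familiar Barndorff-Nielsen form and is taken here as an assumption rather than as the exact display intended in the text. The sample values ($n = 5$, $s = 8$, $\theta = 1$) and all function names are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard Normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def loglik(phi, s, n):
    """Log-likelihood for n exponential life times, canonical phi = -theta < 0."""
    return n * math.log(-phi) + phi * s

def departures(theta, s, n):
    """Signed likelihood root r and maximum likelihood departure q."""
    phi = -theta
    phi_hat = -n / s                  # value maximizing loglik
    j_hat = n / phi_hat ** 2          # curvature -l''(phi_hat)
    drop = 2.0 * (loglik(phi_hat, s, n) - loglik(phi, s, n))
    r = math.copysign(math.sqrt(drop), phi_hat - phi)
    q = (phi_hat - phi) * math.sqrt(j_hat)
    return r, q

def p_third_order(theta, s, n):
    """Phi(r*) with the assumed form r* = r - log(r/q)/r (undefined at phi_hat = phi)."""
    r, q = departures(theta, s, n)
    return norm_cdf(r - math.log(r / q) / r)

def p_exact(theta, s, n):
    """P(S <= s) for S ~ Gamma(n, rate theta) with integer n (Erlang form)."""
    x = theta * s
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k) for k in range(n))

n, s_obs, theta = 5, 8.0, 1.0
r, q = departures(theta, s_obs, n)
print("r, q:", r, q)
print("first order Phi(r):", norm_cdf(r))
print("Phi(r*):", p_third_order(theta, s_obs, n))
print("exact:", p_exact(theta, s_obs, n))
```

For these values $\Phi(r^*)$ agrees with the exact Erlang probability far more closely than the first order value $\Phi(r)$ does, even with $n = 5$.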
In particular, the approximations lead to a simple proof of the Welch & Peers (1963) theorem, and provide the stronger statement that a scalar exponential model is a location model to second order, which thus justifies the Jeffreys (1946) choice of root information as a second order prior for the scalar exponential model; this is described from a Taylor expansion view in § 2 and § 3. We then pursue the Welch-Peers direction and derive the second order form for a vector exponential model in § 4. Some discussion in § 5 focuses on the implications for general statistical inference but does not pursue the details; some immediate consequences will be developed in Fraser et al. (2012). Thus our present material can be viewed as a contribution to statistical theory and to the interpretation of statistical theory, particularly the profound Welch-Peers result.

Welch-Peers and log-model expansions
Statistics has two rather different methodologies for statistical inference: the frequentist, which is based on frequency properties in the statistical model, and the Bayesian, which augments this with a prior density purporting to describe origins for the particular parameter value. The literature records support for one or the other of these, or discusses conflicts between or within the approaches, or proposes alternative approaches such as that of Fisher (1956). A profound link among these emerged with Welch & Peers (1963), who showed that the two approaches lead to the same result to second order in the presence of a scalar exponential model, provided the prior used is the Jeffreys root information prior; for some recent discussion see Fraser (2011).
A more transparent route to this Welch & Peers (1963) result is available using a Taylor expansion of the log-model in terms of appropriately standardized variable and parameter; for some background see Fraser & Reid (1993) and Cakmak et al. (1998).
These expansions are usually examined to third order, but for the Welch-Peers result to be discussed here a second order expansion suffices, and is flexible for extension to the vector parameter context.
For the present scalar exponential model $g(s;\varphi)$ we first consider the log-model using a centered and scaled version of $\varphi$, a centered and scaled version of $s$, plus expansion to the second order; this leads to a second order expansion of the log-likelihood in which the cubic coefficient is determined by the negative third derivative of log-likelihood at the maximum. The standardized statistical model then has a second order form in which $h(s)$ has been determined to the second order by expanding the second exponential in (3), then using the pure Normal density, and then returning one term to the exponent.
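One way to write out this expansion is sketched below. The symbol $\alpha$ for the standardized negative third derivative and the explicit scaling by $n^{-1/2}$ are assumptions made for illustration; they need not match the notation of the displays referred to as (3) to (5).

```latex
% Assumed standardization: maximum at \varphi = 0, unit observed information,
% and negative third derivative \alpha / n^{1/2} at the maximum.
\begin{align*}
\ell(\varphi) &= -\tfrac12\varphi^2 - \frac{\alpha}{6n^{1/2}}\varphi^3 + O(n^{-1}),\\
g(s;\varphi) &= \exp\Bigl\{\varphi s - \tfrac12\varphi^2\Bigr\}
                \exp\Bigl\{-\frac{\alpha}{6n^{1/2}}\varphi^3\Bigr\}\, h(s),\\
h(s) &= \phi(s)\Bigl\{1 + \frac{\alpha}{6n^{1/2}}(s^3 - 3s)\Bigr\} + O(n^{-1})
      = (2\pi)^{-1/2}\exp\Bigl\{-\tfrac12 s^2 + \frac{\alpha}{6n^{1/2}}(s^3 - 3s)\Bigr\} + O(n^{-1}),\\
g(s;\varphi) &= (2\pi)^{-1/2}\exp\Bigl\{-\tfrac12(s-\varphi)^2
                - \frac{\alpha}{6n^{1/2}}\varphi^3
                + \frac{\alpha}{6n^{1/2}}(s^3 - 3s)\Bigr\} + O(n^{-1}).
\end{align*}
```

The form of $h(s)$ follows from requiring the density to integrate to one for each $\varphi$: under the Normal$(\varphi, 1)$ density, $E(s^3 - 3s) = \varphi^3$, which cancels the cubic term from the log-likelihood to order $n^{-1/2}$.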
The information is the negative Hessian of the log-likelihood, or equivalently the negative derivative of the score; this then gives the Welch-Peers prior. The posterior distribution and its survivor value present the conditional probability consequences of introducing or formally assuming probability properties for the mathematical prior $\pi(\beta)$ when no such properties are part of the given. Introductory calculus then shows that the two integrals, the $p$-value and the posterior survivor value, are equal. This gives a $p$-value or confidence calibration for the Bayes approach, and provides a primary or fundamental connection between Bayesian and frequentist methodologies. For some recent discussion see Fraser et al. (2010) and Fraser & Reid (2011), and for some background on the Bayes-frequentist connection see Fraser (2011).
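The equality of the two integrals can be checked numerically for any location density. The sketch below uses the density of the log of a standard exponential variable, $f(z) = \exp(z - e^{z})$, purely as an illustrative assumption; the function names, the data value `y0`, the parameter value `beta` and the trapezoid rule are likewise hypothetical choices.

```python
import math

def f(z):
    """Location density used for illustration: log of a standard exponential variable."""
    return math.exp(z - math.exp(z))

def trapezoid(func, a, b, m=20000):
    """Composite trapezoid rule on [a, b] with m subintervals."""
    h = (b - a) / m
    total = 0.5 * (func(a) + func(b))
    for i in range(1, m):
        total += func(a + i * h)
    return total * h

TAIL = 40.0  # f(z) is negligible for z < -TAIL

def p_value(beta, y0):
    """Frequentist p-value: integral of f(y - beta) over y up to the observed y0."""
    return trapezoid(lambda y: f(y - beta), beta - TAIL, y0)

def posterior_survivor(beta, y0):
    """Bayes survivor value under a flat prior: integral of f(y0 - b) over b from beta upward."""
    return trapezoid(lambda b: f(y0 - b), beta, y0 + TAIL)

y0, beta = 0.3, -0.5
print(p_value(beta, y0), posterior_survivor(beta, y0))
```

Both integrals reduce to the distribution function of $f$ evaluated at $y^0 - \beta$, which is why they agree for every $\beta$ and not just for the value chosen here.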
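Then for the scalar exponential model considered here, using the location results from the next section, we have the Welch & Peers (1963) result that the frequentist $p$-value $p(\beta; \hat\beta^0)$ and the Bayes survivor value $s(\beta; \hat\beta^0)$ are equal to the second order, where
$$p(\beta; \hat\beta^0) = \int_{-\infty}^{\hat\beta^0} f(\hat\beta - \beta)\, d\hat\beta, \qquad s(\beta; \hat\beta^0) = \int_{\beta}^{\infty} c\, f(\hat\beta^0 - \tilde\beta)\, d\tilde\beta,$$
with the constant prior $c = 1$ for $\beta$: the two integrals are numerically equal so that $p(\beta; \hat\beta^0) = s(\beta; \hat\beta^0)$. In § 3 we determine the second order location parameter $\beta$.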

Location parameterization and the reexpressed model
We use the information function to rescale on the parameter space and thereby derive a new parameterization. For the standardized model (2) the information $j_{\varphi\varphi}(\varphi; s) = -\ell_{\varphi\varphi}(\varphi)$ arises as a second derivative; the root information can then be viewed as a rate for the given parameter, and accordingly we can rescale locally to obtain an increment for a new parameterization and then integrate to obtain the new $\beta$ in terms of the old $\varphi$:
$$d\beta = j_{\varphi\varphi}^{1/2}(\varphi)\, d\varphi, \qquad \beta(\varphi) = \int_{0}^{\varphi} j_{\varphi\varphi}^{1/2}(t)\, dt,$$
where the lower limit of integration for convenience is at $\hat\varphi^0$, which is $0$ in the standardized notation. The reverse transformation, expressing the old $\varphi$ in terms of the new $\beta$, is then obtained by inverting this relation. Somewhat similarly we obtain the connections between the old and the new variables, using (9).
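A numerical sketch of this reparameterization follows. It assumes the second order standardized log-likelihood $\ell(\varphi) = -\varphi^2/2 - a\varphi^3/6$ with $a = \alpha n^{-1/2}$, for which the information is $1 + a\varphi$ and the integral of its root gives $\beta = \varphi + a\varphi^2/4$ to second order; the corresponding variable relation is taken as $s = \hat\beta + a\hat\beta^2/4$. The symbol `a`, the function names and the test values are illustrative assumptions, not the paper's notation. The code integrates the root information numerically, and also checks by finite differences that the order-$a$ term of the log-density of $\hat\beta$ depends on $(\hat\beta, \beta)$ only through $z = \hat\beta - \beta$.

```python
import math

def root_information(phi, a):
    """Root of j(phi) = -l''(phi) = 1 + a*phi for l(phi) = -phi^2/2 - a*phi^3/6."""
    return math.sqrt(1.0 + a * phi)

def beta_numeric(phi, a, m=2000):
    """beta(phi): midpoint-rule integral of the root information from 0 to phi."""
    h = phi / m
    return h * sum(root_information((i + 0.5) * h, a) for i in range(m))

def beta_second_order(phi, a):
    """Second order form of the same integral."""
    return phi + a * phi ** 2 / 4.0

def log_density_beta_hat(beta_hat, beta, a):
    """Log-density of beta_hat, dropping the constant -log(2*pi)/2.

    Uses s = beta_hat + a*beta_hat^2/4, phi = beta - a*beta^2/4 and the assumed
    standardized model exp{-(s-phi)^2/2 - a*phi^3/6 + a*(s^3-3s)/6} ds.
    """
    s = beta_hat + a * beta_hat ** 2 / 4.0
    phi = beta - a * beta ** 2 / 4.0
    jacobian = 1.0 + a * beta_hat / 2.0      # ds / d(beta_hat)
    return (-0.5 * (s - phi) ** 2 - a * phi ** 3 / 6.0
            + a * (s ** 3 - 3.0 * s) / 6.0 + math.log(jacobian))

def first_order_coefficient(beta_hat, beta, eps=1e-5):
    """Central-difference derivative of the log-density with respect to a at a = 0."""
    return (log_density_beta_hat(beta_hat, beta, eps)
            - log_density_beta_hat(beta_hat, beta, -eps)) / (2.0 * eps)

a = 0.1
print(beta_numeric(0.8, a), beta_second_order(0.8, a))
for beta_hat, beta in [(1.2, 0.5), (0.2, -0.5), (-0.3, -1.0)]:   # each pair has z = 0.7
    print(first_order_coefficient(beta_hat, beta), -(beta_hat - beta) ** 3 / 12.0)
```

Under these assumptions the order-$a$ coefficient works out to $-z^3/12$, so the three pairs, which share $z = 0.7$, give the same value; this is the sense in which the reexpressed model is location to second order.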
We now reexpress the standardized model in terms of the new parameter $\beta$ and the related maximum likelihood variable $\hat\beta$. For this we make the change of variable from $s$ to $\hat\beta$ and obtain a sequence of equalities, where the first equality brings one term down from the exponent to form the factor before $ds$, the second equality comes from moving the Normal density to the exponent, the third collects the quadratic and cubic terms to order

Vector exponential model and the second order reexpression
Consider a full $p$-dimensional continuous exponential model with canonical parameter $\varphi(\theta)$ and canonical variable $s(y)$, as reexpressed in the standardized form used in the preceding section. In the $p = 2$ case the coordinates can be ordered so as to maintain the integrity of an interest parameter; we can then take $\hat\jmath^{1/2}$ to be the right positive lower triangular square root of the observed information matrix. The model is then expanded to the second order, with coefficients given by the third derivatives of negative log-likelihood with respect to $\varphi_i, \varphi_j, \varphi_k$, calculated in the standardized coordinates.
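One reading of the triangular square root is a lower triangular matrix $L$ with positive diagonal satisfying $L^{T}L = \hat\jmath$. The rescaled parameter $L(\varphi - \hat\varphi^0)$ then has identity observed information, while its first coordinate remains a multiple of the first coordinate of $\varphi$ alone. This reading, the $2 \times 2$ information matrix used below and the helper names are assumptions for illustration.

```python
import math

def right_lower_root(j):
    """Lower triangular L with positive diagonal and L^T L = j (2x2, positive definite j)."""
    (a, b), (_, c) = j
    l22 = math.sqrt(c)
    l21 = b / l22
    l11 = math.sqrt(a - l21 ** 2)
    return [[l11, 0.0], [l21, l22]]

def matmul(x, y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]

def inverse_lower(l):
    """Inverse of a 2x2 lower triangular matrix."""
    (l11, _), (l21, l22) = l
    return [[1.0 / l11, 0.0], [-l21 / (l11 * l22), 1.0 / l22]]

j_hat = [[4.0, 1.2], [1.2, 2.0]]          # hypothetical observed information
L = right_lower_root(j_hat)
L_inv = inverse_lower(L)

# Information in the rescaled coordinates: (L^{-1})^T j_hat L^{-1}.
standardized = matmul(transpose(L_inv), matmul(j_hat, L_inv))
print(L)
print(standardized)                        # close to the identity matrix
```

Note that this factorization differs from the usual Cholesky form $\hat\jmath = LL^{T}$; the ordering $L^{T}L$ is what keeps the first coordinate of the rescaled parameter free of the second.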
We now derive $h(s)$ to second order following the pattern from (3) to (4) and (5) in § 2.

Discussion
Exponential models can be used to construct larger models in applications, to deconstruct larger models to determine $p$-values for interest parameters, and also to get highly accurate approximations for those $p$-values. With this as background we have discussed the Welch & Peers (1963) demonstration that Bayes analysis can reproduce standard results to the second order, but just in the scalar-parameter exponential model context.
We have then developed the second order version of the vector parameter exponential model, for general purposes and for seeking vector-parameter extensions of the Welch-Peers result. We view the second order exponential model as a contribution to statistical theory; some aspects of the vector Welch-Peers result have been mentioned in Fraser et al. (2005) and further aspects will be reported separately (Fraser et al., 2012).
to second order: see Reid & Fraser (2010) and Fraser et al. (2011). The essence of the Welch & Peers (1963) analysis comes from a location relationship between a reexpressed parameter, say $\beta$, a reexpressed variable, say $\hat\beta$, and an associated model $f(\hat\beta - \beta)\, d\hat\beta$; we demonstrate this essence of Welch & Peers (1963) in § 3. But first we record a fundamental consequence that comes from a location model. With such a model, say $f(y - \beta)$ with scalar variable, scalar parameter and data $y^0$, we have the $p$-value $p(\beta) = \int_{-\infty}^{y^0} f(y - \beta)\, dy$, which records the statistical position of the data $y^0$ in the population labelled $\beta$; it can be viewed as the primitive or mother-of-all $p$-values. And with the addition of a flat prior for $\beta$, as motivated by location invariance, we have the posterior distribution $f(y^0 - \beta)\, d\beta$.


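$n^{-1/2}$, and the fourth expresses the model in terms of the centered pivot $z = \hat\beta - \beta$. We thus see that the model for $\hat\beta$ is location with constant observed information $\hat\jmath_{\beta\beta} = 1$. A simple example with immediate verification: consider $y$ with an exponential life model $f(y;\theta) = \theta\exp(-\theta y)$ with positive $y$ and $\theta$. The canonical variable is $y$, up to sign, and on the logarithmic scale the pivot $z = \hat\beta - \beta$ depends on the data and parameter only through $\theta y$, whose distribution is free of the parameter; thus $\hat\beta$ has a location distribution exactly, which is centered at $\beta$.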
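The exponential life example can be verified directly. In the sketch below the location parameter is taken as $\beta = \log\theta$ with $\hat\beta = -\log y$ (one choice of orientation, assumed here), and the root information prior for $\theta$ is $\theta^{-1}$. The function names, the midpoint rule and the values $\theta = 1.5$, $y^0 = 0.4$ are hypothetical. The frequentist $p$-value $P(\hat\beta \le \hat\beta^0; \theta)$ and the posterior survivor value $P(\Theta \ge \theta \mid y^0)$ are both computed by numerical integration and compared with the closed form $\exp(-\theta y^0)$.

```python
import math

def midpoint(func, a, b, m=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / m
    return h * sum(func(a + (i + 0.5) * h) for i in range(m))

def density(y, theta):
    """Exponential life model f(y; theta) = theta * exp(-theta * y)."""
    return theta * math.exp(-theta * y)

def p_value(theta, y0, span=40.0):
    """P(beta_hat <= beta_hat0; theta) with beta_hat = -log y, i.e. P(Y >= y0; theta)."""
    return midpoint(lambda y: density(y, theta), y0, y0 + span / theta)

def posterior_survivor(theta, y0, span=40.0):
    """P(Theta >= theta | y0) under the root information prior 1/theta."""
    weight = lambda t: (1.0 / t) * density(y0, t)    # prior times likelihood
    upper = theta + span / y0                        # weight is negligible beyond this
    return midpoint(weight, theta, upper) / midpoint(weight, 0.0, upper)

theta, y0 = 1.5, 0.4
print(p_value(theta, y0), posterior_survivor(theta, y0), math.exp(-theta * y0))
```

With the opposite orientation of $\beta$ both quantities become $1 - \exp(-\theta y^0)$, so the agreement between the frequentist and Bayes values does not depend on that choice.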


by recentering the canonical parameter. With moderate regularity we then further standardize to obtain observed information $\hat\jmath^0_{\varphi\varphi} = I$, the identity matrix. For this it is often convenient to order the coordinates of $\varphi$, such as $\varphi = (\varphi_1, \varphi_2)^{T}$