On the Consistency of a Class of Nonlinear Regression Estimators

In this paper, we study conditions sufficient for strong consistency of a class of estimators of the parameters of nonlinear regression models. The study considers continuous functions depending on a vector of parameters and a set of random regressors. The estimators chosen are minimizers of a generalized form of the signed-rank norm. The generalization allows us to make consistency statements about minimizers of a wide variety of norms, including the $L_1$ and $L_2$ norms. By implementing trimming, it is shown that high breakdown estimates can be obtained based on the proposed dispersion function.


1. Introduction
Over the last twenty-five years considerable work has been done on robust procedures for linear models. Several classes of robust estimates have been proposed for these models. One such class is the generalized signed-rank class of estimates. This class uses an objective function which depends on the choice of a score function, $\varphi$. If $\varphi$ is monotone then the objective function is a norm, and the geometry of the resulting robust analysis (estimation, testing, and confidence procedures) is similar to the geometry of the traditional least squares (LS) analysis; see McKean and Schrader (1980). Generally this robust analysis is highly efficient relative to the LS analysis; see the monograph by Hettmansperger and McKean (1998) for a discussion. For the simple location model, if Wilcoxon scores, $\varphi(u) = u$, are used then this estimate is the famous Hodges-Lehmann estimate, while if sign scores are used, $\varphi(u) \equiv 1$, it is the sample median. If the monotonicity of $\varphi$ is relaxed then high breakdown estimates can be obtained; see Hössjer (1994). Thus the signed-rank family of robust estimates for the linear model contains estimates which range from highly efficient to high breakdown, and they generalize traditional nonparametric procedures in the simple location problem.
Many interesting problems, though, are nonlinear in nature. Traditional procedures based on LS estimation have been used for years. Since these LS procedures for nonlinear models use the Euclidean norm, they are as easily interpreted as their linear model counterparts. The asymptotic theory for nonlinear LS has been developed by Jennrich (1969) and Wu (1981), among others. In this paper, we propose a nonlinear analysis based on the signed-rank objective function. The objective function is a norm if $\varphi$ is monotone; hence, the estimates are easily interpretable. We keep our development quite general, though, so as to also include nonlinear versions of Hössjer-type estimates. Hence our estimates include the nonlinear extensions of the signed-rank Wilcoxon estimate and the $L_1$ estimate, as well as the extensions of high breakdown linear model estimates. Thus we offer a rich family of estimates from which to select for nonlinear models. Abebe and McKean (2007) studied the asymptotic properties of the Wilcoxon estimator for the general nonlinear model. Just as in linear models, this estimator was shown to be efficient but sensitive to local changes in the direction of $\mathbf{x}$. Jurečková (2008) studied the asymptotic properties of general rank tests using regression rank scores for the nonlinear model. Her approach uses the asymptotic equivalence of regression quantiles and regression rank scores. This limits the set of score functions that can be used. In contrast, our proposed estimator allows for a set of scores generated by any nondecreasing bounded score function that has at most a finite number of discontinuities.
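The location-model examples mentioned above can be made concrete numerically. The following sketch is illustrative only: the data, the grid-search minimizer, and the function names are invented for this example, not taken from the paper. It minimizes the signed-rank dispersion for Wilcoxon scores and for sign scores; by the classical theory, the minimizers are the Hodges-Lehmann estimate and the sample median, respectively.

```python
import numpy as np

def sr_location(sample, phi):
    """Grid-search minimizer of the signed-rank dispersion
    D(theta) = sum_i phi(R_i / (n + 1)) * |x_i - theta|,
    where R_i is the rank of |x_i - theta| (one-sample location model)."""
    grid = np.linspace(sample.min(), sample.max(), 4001)

    def dispersion(theta):
        z = np.abs(sample - theta)
        ranks = np.argsort(np.argsort(z)) + 1  # ranks 1..n of |x_i - theta|
        return np.sum(phi(ranks / (len(z) + 1.0)) * z)

    return grid[np.argmin([dispersion(t) for t in grid])]

x = np.array([1.2, 3.4, 0.7, 2.8, 5.1, 2.2, 1.9])

hl = sr_location(x, lambda u: u)                  # Wilcoxon scores phi(u) = u
med = sr_location(x, lambda u: np.ones_like(u))   # sign scores phi(u) = 1
print(round(hl, 2), round(med, 2))
```

With Wilcoxon scores the minimizing set is determined by the pairwise Walsh averages (the Hodges-Lehmann estimate), and with sign scores it is the sample median; the grid resolution limits the answers to roughly three decimal places.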
In Section 2 we present our family of estimates for nonlinear models.
In Section 3, we show that these estimates are strongly consistent under certain assumptions. We discuss these assumptions, contrasting them with the assumptions for currently existing estimates. The same section contains a general discussion of interesting special cases such as the $L_1$ and the Wilcoxon. Section 4 discusses the conditions needed to achieve positive breakdown of our estimator. In Section 5 we provide the proofs of our theory.

2. Definition and Existence
Consider the following general regression model: $y_i = f(\mathbf{x}_i, \theta_0) + e_i$, $i = 1, \ldots, n$, where $\theta_0 \in \Theta$, $\mathbf{x}_i$ is a vector of independent variables, and $V = \{(\mathbf{x}_i, y_i) : i = 1, \ldots, n\}$ is the set of sample data points.
We shall assume that $\Theta$ is compact, $\theta_0$ is an interior point of $\Theta$, and $f(\mathbf{x}, \theta)$ is a continuous function of $\theta$ for each $\mathbf{x}$ and a measurable function of $\mathbf{x}$ for each $\theta \in \Theta$.
We define the estimator of $\theta_0$ to be any vector minimizing the dispersion $D_n(\theta) = \sum_{i=1}^{n} a(i)\, \rho\big(|z(\theta)|_{(i)}\big)$, where $z_i(\theta) = y_i - f(\mathbf{x}_i, \theta)$, $|z(\theta)|_{(1)} \le \cdots \le |z(\theta)|_{(n)}$ are the ordered absolute residuals, and $a(i) = \varphi(i/(n+1))$. The function $\rho : [0, \infty) \to [0, \infty)$ is continuous and strictly increasing.
The scores are generated by a bounded score function $\varphi$ that has at most a finite number of discontinuities. This estimator will be denoted by $\hat{\theta}_n$. A lemma of Jennrich (1969) implies the existence of a minimizer of $D_n(\theta)$ over the compact set $\Theta$. We adopt Doob's (1994) convention and denote by $L_p$, $1 \le p < \infty$, the space of measurable functions on $(0,1)$ with integrable $p$-th power, and by $L_\infty$ the space of essentially bounded measurable functions. All integrals are with respect to Lebesgue measure on $(0,1)$. The range of integration will be assumed to be $(0,1)$ unless specified otherwise.
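The definition can be illustrated with a small numerical sketch. This is not part of the formal development: the exponential model, the noise level, and the grid minimization over a compact set are invented for the example; the dispersion itself follows the generalized signed-rank form defined above.

```python
import numpy as np

def sr_dispersion(theta, x, y, f, phi, rho):
    """Generalized signed-rank dispersion
    D_n(theta) = sum_i a(R_i) * rho(|z_i(theta)|),
    with z_i(theta) = y_i - f(x_i, theta), R_i the rank of |z_i(theta)|,
    and scores a(i) = phi(i / (n + 1))."""
    z = np.abs(y - f(x, theta))
    ranks = np.argsort(np.argsort(z)) + 1
    return np.sum(phi(ranks / (len(z) + 1.0)) * rho(z))

f = lambda x, th: np.exp(th * x)      # illustrative nonlinear model
phi = lambda u: u                     # Wilcoxon scores
rho = lambda w: w                     # identity rho

rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0, 50)
y = f(x, 0.7) + 0.05 * rng.standard_normal(50)

# minimize over a compact parameter set Theta = [0, 1.5] by grid search
grid = np.linspace(0.0, 1.5, 301)
theta_hat = grid[np.argmin([sr_dispersion(t, x, y, f, phi, rho) for t in grid])]
print(round(theta_hat, 2))
```

The recovered value sits near the generating parameter $0.7$; any numerical optimizer over $\Theta$ could replace the grid search.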

3. Consistency
Let $(\Omega, \mathcal{F}, P)$ be a probability space. For $i = 1, \ldots, n$, assume that A1: $\mathbf{x}_i$ and $e_i$ are independent random variables with distributions $H$ and $G$, respectively; A2: for $1 \le q \le \infty$, there exists a function $h \in L_q$ dominating the densities $g_\theta$ of $|z(\theta)|$ uniformly in $\theta \in \Theta$; and A3: $G$ has a density $g$ that is symmetric about $0$ and strictly decreasing on $[0, \infty)$.
As usual, we let a.s. denote almost sure convergence, i.e., pointwise convergence everywhere except possibly on an event in $\mathcal{F}$ of probability $0$. With $1/p + 1/q = 1$, A2 puts $h$ and $\varphi$ in conjugate spaces when $p \in (1, \infty)$. Hölder's inequality then ensures that the product is integrable. Furthermore, if $\rho$ is a convex function, an application of Minkowski's inequality shows that separate conditions on $e$ and $f$ are sufficient.
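Explicitly, the Hölder step referred to here reads as follows (with conjugate exponents $1/p + 1/q = 1$, $\varphi \in L_p$, $h \in L_q$):

```latex
\int_0^1 \lvert \varphi(t)\, h(t) \rvert \, dt
  \;\le\; \Bigl( \int_0^1 \lvert \varphi \rvert^{p} \Bigr)^{1/p}
          \Bigl( \int_0^1 \lvert h \rvert^{q} \Bigr)^{1/q}
  \;=\; \lVert \varphi \rVert_{p}\, \lVert h \rVert_{q} \;<\; \infty .
```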

Some Corollaries
Next some special cases of interest are considered. We consider the $L_1$, least squares, and signed-rank Wilcoxon estimators, and their trimmed variations. All these cases involve a convex $\rho$, and hence Remark 2 is directly applicable. Trimming is implemented by "chopping off" the ends of the score-generating function $\varphi$ [cf. Hössjer (1994)]. The proofs follow from Theorem 1 in a straightforward manner.
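The effect of chopping off the upper end of the score function can be seen in a small numerical sketch. The linear model, the data, the single outlier, and the grid search below are invented for illustration; the chopped score $\varphi = I_{(\alpha,\beta)}$ with $\rho(w) = w^2$ corresponds to the trimmed least squares flavor discussed in this section.

```python
import numpy as np

def trimmed_dispersion(theta, x, y, f, alpha, beta):
    """Dispersion with chopped indicator scores phi = I_(alpha, beta) and
    rho(w) = w^2: a residual contributes only if its rank fraction
    R_i / (n + 1) falls strictly between alpha and beta."""
    z = np.abs(y - f(x, theta))
    u = (np.argsort(np.argsort(z)) + 1) / (len(z) + 1.0)
    a = ((u > alpha) & (u < beta)).astype(float)
    return np.sum(a * z ** 2)

f = lambda x, th: th * x
x = np.arange(1.0, 21.0)
y = 2.0 * x
y[-1] = 100.0                          # one gross outlier

grid = np.linspace(1.0, 3.0, 401)
# phi = 1 on (0, 1): ordinary least squares, pulled toward the outlier
ls = grid[np.argmin([trimmed_dispersion(t, x, y, f, 0.0, 1.0) for t in grid])]
# chop the top 10% of ranks: the outlier's rank fraction is excluded
lts = grid[np.argmin([trimmed_dispersion(t, x, y, f, 0.0, 0.9) for t in grid])]
print(round(ls, 2), round(lts, 2))
```

The untrimmed fit is dragged away from the true slope $2$ by the single corrupted response, while the trimmed fit ignores the largest-ranked residual and recovers the slope.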

Least Squares, Least Trimmed Squares
Let $\varphi = I_A$ be the indicator of a set $A \subset (0,1)$ and take $\rho(w) = w^2$. If $\varphi \equiv 1$, the dispersion function is the least squares dispersion; if $A = (\alpha, \beta)$ with $0 < \alpha < \beta < 1$, the dispersion function becomes the least trimmed squares dispersion. The following corollary gives the sufficient conditions for the strong consistency of the least squares estimator by taking $p = q = 2$ in Theorem 1.

Corollary 1. If B1 and B2 (the identification and domination conditions of Theorem 1 with $p = q = 2$) hold, and B3: $G$ has a density $g$ that is symmetric about $0$ and strictly decreasing on $[0, \infty)$, then the least squares (least trimmed squares) estimator is strongly consistent for $\theta_0$. Jennrich (1969) establishes the strong consistency of the least squares estimator under assumptions which, in the notation of this paper, we label J1, J2, and J3. Assumptions B2 and J2 are identical. B3 and J3, while not generally comparable, are identical in most practical situations where a symmetric, unimodal error density is assumed. Proceeding to compare B1 and J1, assume that B1 fails to hold, that is, there exists a point at which the identification condition is violated; then J1 fails. The converse is also immediate. Hence our assumptions reduce to the assumptions of Jennrich (1969) in the case of least squares.
For linear models, the consistency of the least trimmed squares estimator was established by Víšek (2006). He considers the estimator to be nonlinear, since a subset of the data is considered, and establishes consistency using two different approaches: (1) an asymptotic linearity argument and (2) the uniform law of large numbers of Andrews (1987). Čížek (2006) applied the approach used in Víšek (2006) and studied least trimmed squares estimators for nonlinear regression models. His study included models with certain types of dependence, such as $\alpha$-mixing. The conditions given in Víšek (2006) and Čížek (2006) are general; however, our approach establishes consistency for a much larger class of models and estimators. The sufficient conditions for the strong consistency of the $L_1$ and trimmed absolute deviations estimators follow from Theorem 1 by taking $p = \infty$ and $q = 1$. These are given in the following corollary.

Corollary 2. If C1 and C2 (the identification and domination conditions of Theorem 1 with $p = \infty$ and $q = 1$) hold, and C3: $G$ has a density $g$ that is symmetric about $0$ and strictly decreasing on $[0, \infty)$, then the $L_1$ (trimmed absolute deviations) estimator is strongly consistent for $\theta_0$.
We next compare the result in Corollary 2 with the one given by Oberhofer (1982), who proves weak consistency by imposing the following conditions.
O1: if $\Theta^*$ is a closed set not containing $\theta_0$, then there exist numbers $\delta > 0$ and $n_0$ such that a separation condition, expressed in terms of $d_\theta(\mathbf{x}) = f(\mathbf{x}; \theta) - f(\mathbf{x}; \theta_0)$, holds for all $n \ge n_0$; O2: a domination condition on $f$ for all $\theta \in \Theta$; and O3: a condition on the error distribution. Here O3 is weaker than C3. However, O2 is stronger than C2. Following contrapositive arguments similar to those in the least squares case, we can easily show that O1 is also stronger than C1 (see also Oberhofer (1982), p. 318). For a detailed discussion of this and sufficient conditions for O1, the reader is referred to Oberhofer (1982).

Signed-Rank Wilcoxon
Corollary 3. If D1 and D2 (the identification and domination conditions of Theorem 1) hold, and D3: $G$ has a density $g$ that is symmetric about $0$ and strictly decreasing on $[0, \infty)$, then the signed-rank Wilcoxon estimator is strongly consistent for $\theta_0$.

Remark 4. Normal Scores
The frequently used normal scores are generated by $\varphi(u) = \Phi^{-1}((u+1)/2)$, where $\Phi$ represents the standard normal distribution function. These scores were first proposed by Fraser (1957). Since $\varphi$ needs to be bounded for our approach to work, our results do not directly extend to the case of normal scores. However, we may use Winsorized normal scores such as $\varphi_k(u) = \min\{\Phi^{-1}((u+1)/2),\, k\}$. Usually we take $k = 4$.
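A small sketch of bounded normal scores in the spirit of this remark; the capping form used below is an assumption for illustration (only the constant $k = 4$ comes from the text).

```python
from statistics import NormalDist

def winsorized_normal_scores(u, k=4.0):
    """Normal scores phi(u) = Phi^{-1}((u + 1) / 2) for u in (0, 1),
    capped at the constant k so that the score function stays bounded,
    as the consistency theory requires."""
    nd = NormalDist()
    return [min(nd.inv_cdf((ui + 1.0) / 2.0), k) for ui in u]

scores = winsorized_normal_scores([0.1, 0.5, 0.9, 0.9999999])
print([round(s, 3) for s in scores])
```

The uncapped normal score diverges as $u \to 1$; the capped version remains nondecreasing but never exceeds $k$.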

4. Breakdown Point
One of the virtues of the estimators discussed in this paper is that they allow for trimming. This in turn provides estimates that are robust when one or more of the model assumptions are violated. In this section we consider the breakdown point of our estimator as a measure of its robustness. Assuming that the true value of the parameter to be estimated is in the interior of the parameter space $\Theta$, breakdown represents a severe form of inconsistency in which the estimator converges to a point on the boundary of $\Theta$ instead of to $\theta_0$.
Recall the definition of the finite sample breakdown point $\varepsilon^*$, where $\hat{\theta}(V)$ is the estimate obtained based on the sample $V$. In nonlinear regression, however, this definition of the breakdown point fails, since $\varepsilon^*$ is not invariant to nonlinear reparameterizations. For a discussion of this, see Stromberg and Ruppert (1992). We adopt the definition of breakdown point for nonlinear models given by Stromberg and Ruppert (1992). The definition proceeds by defining finite sample upper and lower breakdown points, $\varepsilon_U$ and $\varepsilon_L$, which depend on the regression model $f$: for any $\mathbf{x}_0$, they are defined through the behavior of $f(\mathbf{x}_0, \hat{\theta}(V'))$ over corrupted samples $V'$, and the finite sample breakdown point is then defined from $\varepsilon_U$ and $\varepsilon_L$. Here $[b]$ stands for the greatest integer less than or equal to $b$. This forces at least the first half of the ordered absolute residuals to contribute to the dispersion function. In light of this, the dispersion function may be written as $D_n(\theta) = \sum_{i=1}^{k_n} a(i)\, \rho\big(|z(\theta)|_{(i)}\big)$, where $k_n = \max\{i : a(i) > 0\}$. The following theorem is a version of Theorem 3 of Stromberg and Ruppert (1992). We impose the same conditions but give the result in terms of $k_n$. The results given are for upper breakdown; analogues for lower breakdown are straightforward. The proof is obtained by modifying Stromberg and Ruppert's (1992) proof of their Theorem 3. In the following, $\#(A)$ denotes the cardinality of the set $A$. Theorem 2. Assume, for some fixed $\mathbf{x}_0$, the conditions of Theorem 3 of Stromberg and Ruppert (1992). Theorem 2 establishes that even when the regression function $f$ lies on the boundary for a portion of the data, the bias of the estimator of $\theta_0$ remains within reasonable bounds if trimming is implemented. The following corollary gives the asymptotic (as $n \to \infty$) breakdown point of $\hat{\theta}_n$.
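The resistance provided by trimming at $[n/2]$ can be seen in a toy computation; the model, the contamination pattern, and the grid minimizer below are invented for illustration of the idea, not taken from the paper.

```python
import numpy as np

def half_trimmed_dispersion(theta, x, y, f):
    """Dispersion built from only the smallest [n/2] + 1 ordered absolute
    residuals, so the largest residuals cannot dominate the fit."""
    z = np.sort(np.abs(y - f(x, theta)))
    h = len(z) // 2 + 1
    return np.sum(z[:h] ** 2)

f = lambda x, th: th * x
x = np.arange(1.0, 21.0)
y = 2.0 * x
y[:8] = 500.0                 # replace 8 of the 20 responses by gross outliers

grid = np.linspace(0.0, 5.0, 1001)
theta_hat = grid[np.argmin([half_trimmed_dispersion(t, x, y, f) for t in grid])]
print(round(theta_hat, 2))
```

Even with 8 of 20 points corrupted, the half-trimmed fit recovers the slope, since the contaminated residuals fall outside the smallest $[n/2] + 1$ order statistics near the true parameter.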

Corollary 4. Let the conditions of Theorem 2 hold. The resulting breakdown point is reminiscent of the breakdown point of a linear function of order statistics, which is equal to the smaller of the two fractions of mass at either end of the distribution that receive zero weight; the same result was given by Hampel (1971, Section 7(i)) for one-sample location estimators based on linear functions of order statistics.
Consider the class of models of the form $f(\mathbf{x}, \theta) = g(\mathbf{x}'\theta)$, where $g(t)$ is monotone increasing in $t$. This class of models is considered by Stromberg and Ruppert (1992) and contains popular models like the logistic regression model. An alternative to the breakdown definition of Stromberg and Ruppert (1992) was given by Sakata and White (1995). Under our assumptions, this definition reduces to the one used in the current paper, as shown in Theorem 2.3 of Sakata and White (1995).

5. Proofs
Let $U_{(1)} \le \cdots \le U_{(n)}$ be the order statistics from a sample of $n$ i.i.d. uniform $(0,1)$ random variables, and let $g_n : (0,1) \to \mathbb{R}$, $n = 1, 2, \ldots$, be Lebesgue measurable functions. The following is Corollary 2.1 of van Zwet (1980), stated in the notation of this paper and given for completeness.
Suppose there exists a function $J \in L_p$ dominating the scores. For our purposes, let $I_A$ denote the indicator of the set $A$ and take $J = \varphi$. Notice that $J_n$ is a step function, and thus the uniform integrability condition in assumption (ii) of Lemma 1 reduces to a condition on $\sup_i |J_n(i/(n+1))|$. This condition is satisfied if $J_n(t) \to \varphi(t)$ for all $t \in (0,1)$, provided that $\varphi$ has at most a finite number of discontinuities. Thus if $\varphi$ satisfies (5.2) and $g \in L_q$, all the conditions of Lemma 1 hold. The following corollary is a special case of this result.
Let $\psi$ be a continuous Borel measurable function. A formal proof of Corollary 5 may be constructed along the lines described in the preceding paragraph, with the function $g$ defined through a function $\psi : \mathbb{R} \to \mathbb{R}$ satisfying the conditions of Lemma 1. Then, under A1-A3, Theorem 2 of Jennrich (1969) gives (5.3).
To establish (5.4) we follow a strategy similar to the one in Hössjer (1994).
Therefore, setting $\delta = \min\{\delta_1, \delta_2\}$, the bound holds for all $n$ and all $\theta, \theta'$. Proof of Theorem 1. By Lemma 1 of Wu (1981), it suffices to establish the behavior of the difference $D_n(\theta) - D_n(\theta_0)$. Now take the liminf of all three parts as $n \to \infty$. Since the functions min and max are continuous, the required inequality follows. The proof is complete.

Here $\mathbf{x}_1$ and $e_1$ are independent random variables (carried by $(\Omega, \mathcal{F}, P)$) with distributions $H$ and $G$, respectively. We shall write $\mathbf{x}$, $e$, and $|z(\theta)|$ for $\mathbf{x}_1$, $e_1$, and $|z_1(\theta)|$, respectively. Let $G_\theta$ denote the distribution of $|z(\theta)|$.

If $\varphi \equiv 1$, the dispersion function given by (2.2) is the least squares dispersion function. If $0 < \alpha < \beta < 1$ and $\varphi = I_{(\alpha, \beta)}$, it is the least trimmed squares dispersion function.

$L_1$, Trimmed Absolute Deviations
The $L_1$ estimator corresponds to the case where $\varphi \equiv 1$ and $\rho(w) = w$ for $w \ge 0$. A situation similar to the least trimmed squares estimator holds for the trimmed absolute deviations estimator. The sufficient conditions for the strong consistency of the $L_1$ estimator are given in Corollary 2. The following corollary gives the sufficient conditions for the strong consistency of the signed-rank Wilcoxon estimator. The proof is analogous to the proof of Corollary 2 and thus omitted.
Let $V$ be the set of sample data points, and let $\mathcal{V}_m$ be the set of all data sets obtained by replacing any $m$ points in $V$ by arbitrary points. The finite sample breakdown point of an estimator $\hat{\theta}$ is defined as in Donoho and Huber (1983).

The finite sample upper and lower breakdown points are defined analogously by replacing $\varepsilon$ by $\varepsilon_U$ and $\varepsilon_L$, respectively, in the above definition. Stromberg and Ruppert (1992) also show that their definition agrees with the usual breakdown point in the case of a linear regression (i.e., $f(\mathbf{x}, \theta) = \mathbf{x}'\theta$), and it behaves as expected for nonlinear least squares regression. Assume the scores $a_n(i)$ are nonnegative and let $k_n = \max\{i : a_n(i) > 0\}$.

Corollary 5. Let $W_1, \ldots, W_n$ be a random sample from a distribution $F$ with support on $[0, \infty)$. Let $\psi$ be a continuous Borel measurable function.

The result follows from expression (5.1) and Corollary 5, which also furnishes the limiting function.

For the statement given in (5.6) to hold, it suffices to show that $D^*$, being continuous on the compact set $\Theta^*$, is uniformly continuous on $\Theta^*$. Then the collection of balls $B(\theta, \delta_\theta)$, $\theta \in \Theta^*$, forms an open covering of $\Theta^*$. But $\Theta^*$ is compact; hence there is a finite subcovering, and the required uniform bound holds almost surely.