A Redescending M-Estimator for Detection and Deletion of Outliers in Regression Analysis

Outliers in a statistical analysis strongly affect the performance of ordinary least squares; such outliers need to be detected and extreme outliers deleted. This paper proposes a redescending M-estimator that is more efficient and robust than other existing redescending M-estimators. The proposed method is applied to real-life data to verify its effectiveness in detecting and deleting outliers. A Monte Carlo simulation study is also used to investigate the performance of the newly proposed method. The results from the real-life data and the Monte Carlo simulation show that the proposed method is effective in the detection and deletion of extreme outliers compared with other existing redescending M-estimators.


Introduction
The standard multiple regression model in matrix notation is given as

Y = X\beta + \varepsilon \qquad (1)

where Y = (y_1, y_2, \dots, y_n)' is an n \times 1 vector of observations on the response, X is an n \times (p+1) matrix of observations on the explanatory variables, \beta = (\beta_0, \beta_1, \dots, \beta_p)' is a (p+1) \times 1 vector of regression coefficients and \varepsilon = (\varepsilon_1, \varepsilon_2, \dots, \varepsilon_n)' is an n \times 1 vector of random error components. According to Sokal and Rohlf (2012), Ordinary Least Squares (OLS) regression fits a line to bivariate data such that the squared vertical distance from each data point to the line is minimized across all data points. OLS estimates are obtained by minimizing the sum of squared errors (SSE), given as

SSE = \sum_{i=1}^{n} \varepsilon_i^2 = \varepsilon'\varepsilon = (Y - X\beta)'(Y - X\beta). \qquad (2)

Some of the assumptions of Ordinary Least Squares are: E(\varepsilon) = 0, E(\varepsilon\varepsilon') = \sigma^2 I_n, X is a non-stochastic matrix, and \varepsilon \sim N(0, \sigma^2 I_n).
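As a concrete illustration of equations (1) and (2), the following sketch fits an OLS model by minimizing the SSE; it is a minimal example with simulated data, and the variable names (X, y, beta_hat) are illustrative rather than taken from the paper.

import numpy as np

# Illustrative data: n observations, an intercept column plus two explanatory variables.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, 5.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# OLS estimate minimizing SSE = (Y - X beta)'(Y - X beta),
# i.e. beta_hat = (X'X)^{-1} X'Y, computed via a least-squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

residuals = y - X @ beta_hat
sse = residuals @ residuals
print("OLS estimates:", beta_hat, "SSE:", sse)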

Review of M-estimators
M-estimators are robust estimators developed to give less weight to observations that are outliers. They were introduced by Huber (1964) and can be regarded as a generalization of Maximum Likelihood Estimation; hence the "M". The Maximum Likelihood Estimator (MLE) is a method of estimating the parameters of a model by maximizing the model's likelihood function. Considering the linear model in equation (1), the fitted model is \hat{y}_i = x_i'\hat{\beta}, with residuals r_i = y_i - \hat{y}_i, and the OLS estimator minimizes

\sum_{i=1}^{n} r_i^2. \qquad (6)

Replacing the squared error term in equation (6) by \rho(r_i), the M-estimator is given as

\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho(r_i), \qquad (7)

where \rho(\cdot) is the objective function of an M-estimator.
Standardizing the residuals, r_i, equation (7) can also be written as

\hat{\beta}_M = \arg\min_{\beta} \sum_{i=1}^{n} \rho\left(\frac{r_i}{\sigma}\right), \qquad (8)

where \sigma is the scale parameter, given as

\sigma = \frac{\mathrm{MAD}}{0.6745}, \qquad (9)

and MAD is the Median Absolute Deviation, given as

\mathrm{MAD} = \mathrm{median}_i \left| r_i - \mathrm{median}_j(r_j) \right|.

Since the standard deviation is not resistant to outliers, the Median Absolute Deviation (MAD) is used as a measure of spread in robust regression.
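A small sketch of the MAD-based scale estimate in equation (9) and the scaled residuals u_i = r_i/\sigma used throughout the paper; the function names are illustrative, and the residuals are assumed to be available as a NumPy array.

import numpy as np

def mad_scale(residuals):
    # Robust scale estimate: sigma = MAD / 0.6745,
    # where MAD = median(|r_i - median(r)|).
    mad = np.median(np.abs(residuals - np.median(residuals)))
    return mad / 0.6745

def scaled_residuals(residuals):
    # u_i = r_i / sigma: residuals scaled by the MAD-based scale estimate.
    return residuals / mad_scale(residuals)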
The influence function describes the sensitivity of the overall estimate to outlying data. Hampel (1974) showed that the robustness of an estimator is measured by its influence function. Differentiating the objective function in equation (8) gives rise to the influence function (\psi-function), that is,

\psi(u_i) = \frac{\partial \rho(u_i)}{\partial u_i}, \qquad u_i = \frac{r_i}{\sigma},

where \psi(u_i) is the influence function and \sigma is the scale parameter.
Draper and Smith (1998) defined the weight function, w(u_i), as

w(u_i) = \frac{\psi(u_i)}{u_i}.

Huber M-estimator
Huber (1964) proposed the Huber M-estimator, whose influence function, \psi(u_i), is

\psi(u_i) =
\begin{cases}
u_i, & |u_i| \le c \\
c\,\mathrm{sign}(u_i), & |u_i| > c
\end{cases}

where c is an arbitrary value known as the tuning constant and u_i are the residuals scaled by the Median Absolute Deviation (MAD). The Huber \psi-function is non-decreasing, and the tuning constant c = 1.345 yields 95% efficiency at the normal distribution (the tuning constant c determines the degree of robustness of an M-estimator). The Huber estimator is not robust when the outliers present in the data are in the x-direction (leverage points).
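A minimal sketch of the Huber \psi-function above and the corresponding weight function w(u) = \psi(u)/u with c = 1.345; the function names are illustrative.

import numpy as np

def psi_huber(u, c=1.345):
    # Huber psi-function: identity inside [-c, c], clipped to +/- c outside.
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, u, c * np.sign(u))

def weight_huber(u, c=1.345):
    # Weight function w(u) = psi(u)/u = min(1, c/|u|).
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, 1.0, c / np.maximum(np.abs(u), c))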
Redescending M-estimators are estimators whose \psi-functions redescend to zero; they are the M-estimators that reject extreme outliers. Some of the redescending estimators discussed in the literature are:

Hampel M-estimator
Hampel's three-part redescending M-estimator was proposed by Hampel et al. (1986) in the Princeton Robustness Study. It has three tuning constants a, b and c, and its \psi-function is given as

\psi(u_i) =
\begin{cases}
u_i, & 0 \le |u_i| \le a \\
a\,\mathrm{sign}(u_i), & a < |u_i| \le b \\
a\,\mathrm{sign}(u_i)\,\dfrac{c - |u_i|}{c - b}, & b < |u_i| \le c \\
0, & |u_i| > c
\end{cases}

where a, b, c are positive constants with 0 < a \le b < c < \infty and u_i are the residuals scaled by the Median Absolute Deviation (MAD). The drawback of this estimator is that its influence function is not differentiable.
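A sketch of the three-part \psi-function above; the default constants a = 1.7, b = 3.4 and c = 8.5 are common illustrative choices, not values prescribed in this paper.

import numpy as np

def psi_hampel(u, a=1.7, b=3.4, c=8.5):
    # Hampel three-part redescending psi-function:
    # linear, then constant, then linearly redescending to zero.
    u = np.asarray(u, dtype=float)
    au, s = np.abs(u), np.sign(u)
    return np.select(
        [au <= a, au <= b, au <= c],
        [u, a * s, a * s * (c - au) / (c - b)],
        default=0.0,
    )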

Tukey's Biweight M-estimator
Beaton and Tukey (1974) proposed Tukey's biweight M-estimator, whose \psi-function is given as

\psi(u_i) =
\begin{cases}
u_i \left[ 1 - \left( \dfrac{u_i}{c} \right)^2 \right]^2, & |u_i| \le c \\
0, & |u_i| > c
\end{cases}

where c is an arbitrary value known as the tuning constant and u_i are the residuals scaled over MAD. For Tukey's biweight, c = 4.685 gives 95% efficiency at the normal distribution. The performance of Tukey's biweight estimator is encouraging: its influence function is differentiable and smooth compared with the methods proposed by Huber (1964) and Hampel et al. (1986).
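A sketch of Tukey's biweight \psi-function and the corresponding weight function w(u) = \psi(u)/u with c = 4.685; the function names are illustrative.

import numpy as np

def psi_tukey(u, c=4.685):
    # Tukey biweight psi-function: u * (1 - (u/c)^2)^2 inside [-c, c], zero outside.
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, u * (1.0 - (u / c) ** 2) ** 2, 0.0)

def weight_tukey(u, c=4.685):
    # Weight function w(u) = (1 - (u/c)^2)^2 inside [-c, c], zero outside,
    # so extreme outliers receive zero weight.
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= c, (1.0 - (u / c) ** 2) ** 2, 0.0)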

Alarm M-estimator
Alamgir et al. (2013) proposed the Alarm redescending M-estimator for robust regression and outlier detection; in its \psi-function, c is the tuning constant and u_i are the residuals scaled over MAD.
The Alarm estimator is based on a modified hyperbolic tangent (tanh)-type weight function. Its mean square errors (MSE) are the smallest when compared with those of Huber (1964), Beaton and Tukey (1974) and Hampel et al. (1986), yielding efficient results. For the Alarm M-estimator, c = 3 gives approximately 95% efficiency at the normal distribution.

The Proposed Estimator
In this section, a new influence function, \psi(u_i), is proposed, based on a modification of Tukey's biweight \psi-function, with 95% efficiency at the normal distribution. We introduce a function that is smooth and differentiable for all u_i, where u_i are the residuals scaled over the Median Absolute Deviation (MAD) and c is the tuning constant.
In addition, we multiply this function by Tukey's biweight \psi-function, resulting in the proposed influence function, \psi(u_i), where c is the tuning constant and u_i are the residuals for the i-th observation scaled over MAD.
By integrating \psi(u_i) with respect to u_i, we obtain the corresponding objective function, \rho(u_i), where c is the tuning constant and u_i are the residuals scaled over MAD.

Derivation of equation (19)
For the second part of ( ) , we use the same argument in Beaton and Tukey (1974) by substituting for in equation (21).
The proposed \rho(u_i) satisfies the standard properties of the objective function of an M-estimator.
Dividing the proposed \psi(u_i) by u_i gives the weight function, w(u_i). Graphs of the proposed objective, influence and weight functions are shown in Figures 1 to 3.

[Figures 1-3: Graphs of the proposed objective, influence and weight functions.]

Figure 1 shows that the values of the objective function are non-negative; the graph also indicates that the objective function is symmetric, differentiable and continuous. In Figure 2, the proposed estimator redescends to zero, assigning zero influence to extreme outliers and thereby rejecting them. Figure 3 indicates that good observations are assigned larger weights while outliers receive smaller weights; in addition, extreme outliers are assigned zero weight, which implies that they are eliminated from the dataset.

Simulation Design
The Monte Carlo simulation method is used to generate random data from different probability distributions. The purpose of the simulation study is to determine the extent to which the estimates differ from their true values (robustness). We took the true parameters to be 1, 2 and 5 for \beta_0, \beta_1 and \beta_2 respectively. Each simulation case was replicated M = 1000 times. The estimates of each estimator were calculated in each iteration, and the mean of the M replicated estimates,

\bar{\hat{\beta}} = \frac{1}{M} \sum_{j=1}^{M} \hat{\beta}_j,

was recorded for each estimator. Robustness of an estimator is measured using the absolute bias, given as

|\mathrm{Bias}| = \left| \bar{\hat{\beta}} - \beta \right|,

efficiency of an estimator is measured using the mean square error (MSE), defined as

\mathrm{MSE} = \frac{1}{M} \sum_{j=1}^{M} \left( \hat{\beta}_j - \beta \right)^2,

and the variance of the estimator is defined as

\mathrm{Var} = \frac{1}{M} \sum_{j=1}^{M} \left( \hat{\beta}_j - \bar{\hat{\beta}} \right)^2.

The estimator with the lowest MSE is the most efficient: the smaller the MSE, the more efficient the estimator. Simulated data were generated (including percentage mixtures of contaminated and uncontaminated data) for both simple and multiple regression, using two sample sizes, n = 20 and n = 200. The percentages of outliers considered in the simulation study were as follows: for the x-direction, contamination of 20% and 30%; for the y-direction, contamination of 20%, 30% and 40%.
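The sketch below illustrates the bias, variance and MSE formulas above; it is not the authors' simulation code, and it assumes the M replicated estimates are stored row-wise in a NumPy array.

import numpy as np

def simulation_summary(estimates, beta_true):
    # estimates: array of shape (M, p) with the M replicated estimates of p coefficients.
    # beta_true: array of the true coefficient values.
    beta_bar = estimates.mean(axis=0)                       # mean of the M replicated estimates
    abs_bias = np.abs(beta_bar - beta_true)                 # absolute bias
    variance = ((estimates - beta_bar) ** 2).mean(axis=0)   # variance of the estimator
    mse = ((estimates - beta_true) ** 2).mean(axis=0)       # mean square error
    return abs_bias, variance, mse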

Algorithm for the Simulation Studies
1. Compute the initial estimates using Least Median of Squares (LMS).
2. Obtain the corresponding residuals from the initial estimates.
3. Compute the corresponding weights based on the proposed weight function.
4. Calculate the new estimates of the regression coefficients using weighted least squares (a sketch of these steps is given below).
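A minimal sketch of steps 1-4 as an iteratively reweighted least-squares loop, under the usual practice of repeating steps 2-4 until convergence. The LMS initial fit is delegated to a user-supplied routine (the name lms_fit below is hypothetical), and weight_fn stands in for the proposed weight function, whose exact form is given by the paper's equations.

import numpy as np

def irls(X, y, weight_fn, lms_fit, max_iter=50, tol=1e-8):
    beta = lms_fit(X, y)                      # Step 1: initial estimates from LMS (user-supplied)
    for _ in range(max_iter):
        r = y - X @ beta                      # Step 2: residuals from the current fit
        sigma = np.median(np.abs(r - np.median(r))) / 0.6745
        w = weight_fn(r / sigma)              # Step 3: weights from the chosen weight function
        W = np.diag(w)
        beta_new = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)  # Step 4: weighted least squares
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

For example, weight_fn could be the weight_tukey function sketched earlier, and any routine returning LMS coefficients could serve as lms_fit.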

Choice of the Tuning Constant
A simulation study was carried out to determine the choice of the tuning constant. A tuning constant of c = 3 gives the best results for estimating the true parameters and for detecting and deleting outliers.

Results
The simulated results for the proposed estimator and those of the OLS, Huber, Hampel, Bisquare (Biweight) and Alarm estimators are discussed as follows:

Discussion of simulated results for data without outliers
Tables 1 and 7 present detailed simulated results for uncontaminated data from simple and multiple regression respectively. The OLS estimator, having the least MSE, outperformed the Huber, Hampel, Bisquare, Alarm and proposed estimators; that is, it is the most efficient estimator. Similarly, the OLS, Huber, Hampel, Bisquare, Alarm and proposed estimators are all close to their true parameter values, indicating robustness.

Based on data generated with 30% outliers in the x-direction in a multiple regression, shown in Table 9, the results indicate that the Alarm estimator is the most efficient, having the smallest MSE, while the Hampel estimator is second. The third most efficient is the proposed estimator, followed by the remaining three estimators (the OLS, Huber and Bisquare estimators). Furthermore, outliers strongly affect the slopes of all the estimators (the proposed, Hampel, Alarm, OLS, Huber and Bisquare estimators). All the estimators performed badly in this category.

Discussion of simulated results for data with outliers in the response, that is, in the y-direction
Furthermore, the results in Tables 4 and 10 (20% outliers in the y-direction for simple and multiple regression respectively) indicate that the proposed estimator is the most efficient and robust, followed closely by the Bisquare estimator. The Alarm, Hampel and Huber estimators follow thereafter, while the OLS estimator is the least efficient. Tables 5 and 11 present the results for 30% outliers in the y-direction for simple and multiple regression respectively. The proposed estimator competes favourably with the Bisquare estimator as the most efficient, although the Bisquare estimator improves as the sample size increases. The least efficient is the OLS estimator, followed by the Huber and then the Hampel estimator. Nevertheless, the proposed and Bisquare estimators are more robust than the Hampel, Alarm, OLS and Huber estimators.
Similarly, in Tables 6 and 12 (results for 40% outliers in the y-direction for simple and multiple regression respectively), the proposed and Bisquare estimators are again more efficient and robust than the Hampel, Alarm, OLS and Huber estimators.

Data Analysis
In this section, we apply the proposed estimator to real-life data to verify its effectiveness in detecting and deleting outliers. These datasets have been used extensively by other researchers in the area of robust regression.

Example 1: Telephone-Call Data (Simple Regression Case)
This is a real regression dataset with a few outliers in the y-direction. The dataset is taken from the Belgian Statistical Survey (Rousseeuw and Leroy, 1987). The data contain 24 observations on 2 variables: the dependent variable is the number of telephone calls made from Belgium and the independent variable is the year.

Example 2: Hawkins, Bradu and Kass Data (Multiple Regression Case)
This artificial dataset, taken from Rousseeuw and Leroy (1987), was generated for testing the performance of robust estimators. The data contain 75 observations in four dimensions (one response and three explanatory variables). The first 10 observations are bad leverage points, and the next four points are good leverage points (i.e., their x_i are outlying, but the corresponding y_i fit the model quite well). A summary of the estimates of the model parameters for the Hawkins, Bradu and Kass data is presented in Table 14. With smaller Residual Standard Errors (RSE), the Alarm, Hampel, Biweight and proposed estimators performed better than the OLS and Huber estimators. In addition, the OLS and Huber estimators used all the data in the analysis, while the Alarm, Hampel and proposed methods detected and deleted 10 outliers in the robust fit. The Biweight estimator detected and deleted 4 outliers.

Conclusion
A redescending M-estimator was proposed, and the graphs of its objective, influence and weight functions satisfied the standard properties of these functions. Simulation studies were carried out to ascertain the effectiveness of the proposed redescending M-estimator and to compare it with other existing methods. The mean square error (MSE) and absolute bias were used for comparison under two different sample sizes.
From the simulated results, it was evident that the ordinary least squares estimator outperformed the other estimators on uncontaminated (clean) data. On the other hand, all the estimators performed well when outliers are in the y-direction, with the proposed estimator topping the list as the most efficient and robust estimator, followed closely by the Biweight, Alarm and Hampel estimators. When the outliers are leverage points, the proposed and Alarm estimators take the lead as the most efficient and robust estimators.
In addition, robust regression models were fitted to the telephone-call data and the Hawkins, Bradu and Kass data to illustrate the ability of the proposed estimator to detect and delete outliers and to compare it with the existing ones. The results from the two robust fits showed that the proposed method can successfully detect and delete outliers; in this comparison, the proposed estimator, alongside the Alarm, Hampel and Biweight (only when outliers are in the response) estimators, showed great resistance to outliers.