On the Influential Points in the Functional Circular Relationship Models

If the interest is to calibrate two instruments, then functional relationship models are more appropriate than regression models. Fitting a straight line when both variables are circular and subject to errors has not received much attention. In this paper, we consider the problem of detecting influential points in two functional relationship models for circular variables. The first is developed based on the simple circular regression model, denoted by (SC), while the second is derived from the complex linear regression model and denoted by (CL). The covariance matrices are derived and then the COVRATIO statistics are formulated for both models. The cut-off points are obtained and the power of performance is examined via simulation studies. The performance of the COVRATIO statistics depends on the concentration of the errors, the sample size and the level of contamination. In the case of a linear relationship between two circular variables, the COVRATIO statistic of the (SC) model performs better than that of the (CL) model. Furthermore, a novel diagram, the so-called spoke plot, is utilized to detect possible influential points. For illustration purposes, the proposed procedures are applied to real data of wind directions measured by two different instruments. The COVRATIO statistics and the spoke plot were able to identify two observations as influential points.


Introduction
In practice, calibrating two or more instruments that produce angular measurements can be handled statistically via circular functional relationship models (see Caires and Wyatt, 2003; Hassan et al., 2010), which take into account that the compared variables are subject to errors. The existence of one or more influential points in a data set is therefore likely to affect the efficiency of a suggested model. To date, only two published papers consider the problem of influential points in circular functional relationship models. The first studies the linear functional relationship model for circular variables (SC), which was proposed by Caires and Wyatt (2003) and for which the COVRATIO statistic was derived by Hussin et al. (2010) to detect possible influential points. The second treats the circular variables as complex numbers: the complex linear functional relationship model (CL) was developed and its COVRATIO statistic derived by Hussin and Abuzaid (2012). Caires and Wyatt (2003) fixed the slope parameter to be one, and Hussin et al. (2010) later derived the COVRATIO statistic based on the intercept and concentration parameters only.
Thus, introducing the slope parameter into the model will make the covariance matrix more informative.
On the other hand, the investigation of the performance of the COVRATIO statistic for the (CL) model is questionable: the real and imaginary components were contaminated separately and, as reported by Hussin and Abuzaid (2012), the power of the COVRATIO statistic was more than 0.15 in the contamination-free case.
In this paper, we compare the performance of the COVRATIO statistics in detecting influential points in the (SC) and the (CL) models by addressing two issues: the first is to introduce the slope parameter into the (SC) model, and the second is to contaminate both models consistently.
The rest of the paper is organized as follows. The following section presents the two functional relationship models considered for circular data. Section 3 discusses the derivation of the COVRATIO statistics, the calculation of the cut-off points and the power of performance. A real data set is presented and analyzed in Section 4.

Functional Relationship Models of Circular Variables
Fitting a straight line when both variables are circular and subject to errors has not received much attention. In the following two subsections we consider two functional relationship models for circular variables and derive their corresponding COVRATIO statistics.

Linear Functional Relationship Model for Circular Variables (SC)
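For any two circular random variables $X$ and $Y$ measured with errors, Caires and Wyatt (2003) proposed the following model:
$$x_i = X_i + \delta_i, \qquad y_i = Y_i + \varepsilon_i, \qquad Y_i = \alpha + X_i \pmod{2\pi}, \qquad i = 1, \dots, n, \tag{1}$$
where $\delta_i$ and $\varepsilon_i$ are independently distributed with von Mises distributions, that is, $\delta_i \sim VM(0, \kappa)$ and $\varepsilon_i \sim VM(0, \nu)$. Under the same assumptions, Hussin (2008) extended model (1) to the (SC) model, which is given by:
$$x_i = X_i + \delta_i, \qquad y_i = Y_i + \varepsilon_i, \qquad Y_i = \alpha + \beta X_i \pmod{2\pi}, \qquad i = 1, \dots, n, \tag{2}$$
where $\beta$ is the slope parameter. There are $(n+4)$ parameters to be estimated, i.e. $\alpha$, $\beta$, $\kappa$, $\nu$ and the incidental parameters $X_1, \dots, X_n$. The estimates are obtained iteratively, with $\hat{\beta}_1$ and $\hat{X}_{i1}$ being improvements of $\hat{\beta}_0$ and $\hat{X}_{i0}$, respectively. An estimate of $\kappa$ can then be found for any value of the ratio of concentration parameters $\lambda$, and the estimate of the concentration parameter, $\hat{\kappa}$, may be obtained by using the approximation given in Fisher (1993). The second partial derivatives of the log-likelihood function with respect to the parameters are obtained, and Fisher's information matrix is then formed and inverted to give the covariance matrix (for the detailed derivation, see Hussin, 2008).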
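As a rough illustration of the (SC) model and of the Fisher (1993) approximation mentioned above, the following Python sketch simulates observations with von Mises errors and recovers the concentration from the mean resultant length. The function names, the distribution chosen for the true values $X_i$, the relation $\nu = \lambda\kappa$ and the use of the true $X_i$ when estimating $\kappa$ are assumptions made only for this example; the iterative estimation scheme of Hussin (2008) is not reproduced here.

```python
import numpy as np

def simulate_sc(n, alpha=0.0, beta=1.0, kappa=10.0, lam=1.0, seed=0):
    """Simulate (X, x, y) from Y = alpha + beta*X (mod 2*pi) with von Mises errors."""
    rng = np.random.default_rng(seed)
    # Assumption: true values X_i drawn from VM(pi/4, 3); the text does not fix this.
    X = rng.vonmises(np.pi / 4, 3.0, size=n)
    delta = rng.vonmises(0.0, kappa, size=n)        # error on x
    eps = rng.vonmises(0.0, lam * kappa, size=n)    # error on y, assuming nu = lam * kappa
    x = np.mod(X + delta, 2 * np.pi)
    y = np.mod(alpha + beta * X + eps, 2 * np.pi)
    return X, x, y

def kappa_from_rbar(rbar):
    """Piecewise approximation to the inverse of A(kappa) = I1(kappa) / I0(kappa)."""
    if rbar < 0.53:
        return 2 * rbar + rbar**3 + 5 * rbar**5 / 6
    if rbar < 0.85:
        return -0.4 + 1.39 * rbar + 0.43 / (1 - rbar)
    return 1.0 / (rbar**3 - 4 * rbar**2 + 3 * rbar)

X, x, y = simulate_sc(n=5000, kappa=10.0)
# Illustration only: the true X is used here, although it is unknown in practice.
rbar = float(np.mean(np.cos(x - X)))
kappa_hat = kappa_from_rbar(rbar)
print(round(kappa_hat, 2))
```

With a large simulated sample the recovered value should lie close to the concentration used to generate the errors, which gives a quick check that the simulation and the approximation agree.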

Complex Linear Functional Relationship Model for Circular Variables (CL)
For any two circular random variables $X$ and $Y$, Hussin and Abuzaid (2012) proposed the complex linear functional relationship model, referred to here as model (3), in which $\delta_j$ and $\varepsilon_j$ are independently distributed errors from bivariate complex Gaussian distributions. Because not all of the maximum likelihood estimators (MLEs) of the model parameters have a closed form, the estimates may be obtained iteratively. The asymptotic properties of the estimators are obtained from Fisher's information matrix. For large values of $n$, the estimates are normally distributed, and this can be used to estimate the standard errors of the parameter estimates (see Hussin and Abuzaid, 2012).

COVRATIO Statistic
Many procedures based on the row-deletion approach have been derived to identify influential points in linear regression models (see Belsley et al., 1980). The COVRATIO statistic is defined as the ratio of the determinants of the covariance matrices for the full and reduced data, and is given by
$$COVRATIO_{(-i)} = \frac{|COV|}{|COV_{(-i)}|},$$
where $COV$ is the covariance matrix for the full data set and $COV_{(-i)}$ is the covariance matrix obtained by excluding the $i$th row. If the ratio is close to unity, then there is no significant difference between the covariance matrices, i.e. the $i$th observation is not influential. Recently, the COVRATIO statistic has been adapted to circular regression models by Abuzaid et al. (2011). The determinant of the coefficient covariance matrix for the (SC) model is referred to as expression (4), and that for the (CL) model as expression (5).
Assuming that the ratio of concentration parameters $\lambda = 1$, random errors from the von Mises distribution with mean 0 and concentration parameter $\kappa = 5, 10, 15, 20$ and $30$ are added to the observed variables as given in model (2). Thus, the variances of the random errors of the (CL) model are 0.2, 0.1, 0.067, 0.05 and 0.03, respectively. The values of the error concentration parameters are chosen to minimize their variation compared to the modeled variables.
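The row-deletion definition of COVRATIO can be sketched in Python as follows. Because expressions (4) and (5) are not reproduced here, the sketch takes the covariance-fitting routine as an argument and demonstrates it with an ordinary least-squares covariance matrix as a stand-in. The names `covratio` and `ols_cov` and the toy data are hypothetical choices for this example, not the authors' implementation.

```python
import numpy as np

def covratio(data, cov_fn):
    """Return |COV| / |COV_(-i)| for every row i of a two-column data array."""
    full = np.linalg.det(cov_fn(data))
    ratios = np.empty(len(data))
    for i in range(len(data)):
        reduced = np.delete(data, i, axis=0)   # drop the i-th row
        ratios[i] = full / np.linalg.det(cov_fn(reduced))
    return ratios

def ols_cov(data):
    """Stand-in covariance: covariance matrix of OLS intercept and slope estimates."""
    x, y = data[:, 0], data[:, 1]
    A = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    s2 = resid @ resid / (len(y) - 2)          # residual variance
    return s2 * np.linalg.inv(A.T @ A)

rng = np.random.default_rng(1)
x = rng.uniform(0, 2, size=30)
y = x + rng.normal(0, 0.1, size=30)
y[7] += 3.0                                     # plant one outlying observation
ratios = covratio(np.column_stack([x, y]), ols_cov)
suspect = int(np.argmax(np.abs(ratios - 1)))    # row whose ratio is farthest from unity
print(suspect)
```

Passing the covariance routine as a callable keeps the deletion loop independent of the model, so a routine based on either functional relationship model could be substituted without changing `covratio`.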
The generated circular data are fitted by models (2) and (3) independently. The COV for the (SC) and the (CL) models is calculated by using expressions (4) and (5), and $|COVRATIO_{(-i)} - 1|$ is then obtained for each row $i$ in order to specify the maximum value.
The process is repeated 2000 times for each combination of sample size $n$ and concentration parameter $\kappa$ (and variance values). Then the 10%, 5% and 1% upper percentiles of the maximum values of $|COVRATIO_{(-i)} - 1|$ are calculated. The results show that the percentiles are independent of the variation parameters: the standard deviations of the cut-off points obtained across the considered variation range between 0.001 and 0.019. Table 1 presents the cut-off points, which are the means of the percentiles, with the associated standard deviations in parentheses, for each sample size $n$.
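The percentile step described above might be sketched as follows. The helper `mc_cutoffs` and its callback argument are hypothetical: the callback is expected to simulate one uncontaminated sample of size `n` and return $\max_i |COVRATIO_{(-i)} - 1|$. The toy callback in the usage lines is only a placeholder so that the sketch runs, and its numbers have no connection to Table 1.

```python
import numpy as np

def mc_cutoffs(max_stat_fn, n, reps=2000, upper=(10, 5, 1), seed=0):
    """Upper percentiles of a simulated maximum statistic under no contamination."""
    rng = np.random.default_rng(seed)
    maxima = np.array([max_stat_fn(n, rng) for _ in range(reps)])
    # The 10%, 5% and 1% upper percentiles are the 90th, 95th and 99th percentiles.
    return {u: float(np.percentile(maxima, 100 - u)) for u in upper}

def toy_max_stat(n, rng):
    """Placeholder statistic: largest absolute value of n standard normal draws."""
    return float(np.abs(rng.normal(size=n)).max())

cutoffs = mc_cutoffs(toy_max_stat, n=30)
for u, c in cutoffs.items():
    print(f"upper {u}% cut-off: {c:.3f}")
```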

For all sample sizes, the cut-off points of the COVRATIO statistic of the (SC) model are smaller than the corresponding cut-off points of the (CL) model. For the small sample size ($n = 10$), the cut-off points exceed one, reflecting the inappropriateness of the COVRATIO statistics of both models for detecting influential points in small samples. Furthermore, at a given level of significance, the cut-off point is a decreasing function of the sample size, which may reflect the diminishing weight of a single point relative to the whole sample.

Power of Performance
The performance of the two statistics is examined numerically via a series of simulation studies by considering five sample sizes, $n = 10, 30, 50, 70$ and $100$, and two values of the concentration parameter, $\kappa = 10$ and $20$.
Two different types of association between circular variables are considered. The first is a linear association ($\beta = 1$), and the second is a nonlinear form of association.
Use is made of the fact that the bivariate von Mises distribution with a large concentration parameter $\kappa$ tends to a bivariate normal distribution whose variance depends on the concentration and on $\rho$, where $\rho$ is the circular correlation between two circular random variables (Singh et al., 2002). A procedure similar to that described in Subsection 3.2 is used to generate the data and, for the purpose of comparison between the COVRATIO statistics for the (SC) and the (CL) models, the generated data are contaminated at observation $d$. In order to generate two observed circular dependent random variables $X$ and $Y$, for each sample size $n$, a sample from the bivariate von Mises distribution is generated based on the rejection sampling algorithm proposed by Best and Fisher (1979), with the parameter values chosen so that the variance of each variable becomes 0.375.
The process is repeated 2000 times, and the power of performance is calculated as the percentage of correct detections of the contaminated observation at position $d$. The results of the simulation study show that, in all cases, the power of performance is an increasing function of the contamination level. Figure 1 shows that the power of performance is a decreasing function of the sample size $n$. On the other hand, Figure 2 shows that the COVRATIO statistic for the (SC) model performs more efficiently than that for the (CL) model when the association between the circular random variables is linear. The power is reasonable and consistent: it starts at almost zero when the contamination level is 0 and approaches 100% as the contamination level goes to 1.
In contrast to the case in which the circular variables are linearly correlated, Figure 3 shows that the COVRATIO statistic for the (SC) model performs less efficiently than that for the (CL) model when the association is nonlinear.
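The power calculation can be illustrated with a short Python sketch. Several pieces are assumptions made for the example rather than details taken from the text: the contamination rule (a shift of the $d$th observation by the contamination level times $\pi$, modulo $2\pi$), the stand-in detection statistic (the circular distance between paired observations, used here in place of $|COVRATIO_{(-i)} - 1|$), the cut-off value and all function names.

```python
import numpy as np

def circ_dist(a, b):
    """Circular distance in [0, pi] between angles a and b."""
    return np.pi - np.abs(np.pi - np.mod(a - b, 2 * np.pi))

def power(level, n=30, kappa=20.0, d=0, cutoff=1.5, reps=2000, seed=0):
    """Percentage of replications in which observation d is correctly flagged."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        X = rng.uniform(0, 2 * np.pi, size=n)                 # hypothetical true angles
        x = np.mod(X + rng.vonmises(0.0, kappa, size=n), 2 * np.pi)
        y = np.mod(X + rng.vonmises(0.0, kappa, size=n), 2 * np.pi)
        y[d] = np.mod(y[d] + level * np.pi, 2 * np.pi)        # assumed contamination rule
        stat = circ_dist(x, y)                                # stand-in statistic
        i = int(np.argmax(stat))
        hits += int(i == d and stat[i] > cutoff)              # correct detection only
    return 100.0 * hits / reps

for level in (0.0, 0.4, 0.8):
    print(level, power(level, reps=500))
```

A detection is counted only when the largest statistic both exceeds the cut-off and occurs at position `d`, which mirrors the definition of power as the percentage of correct detections.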

Numerical Example
A real data set consisting of 129 pairs of observations of wind direction was recorded by two different instruments: an HF radar system and an anchored wave buoy. Figure 4 shows the spoke plot of the wind direction data (Zubairi et al., 2008). The COVRATIO statistics for the (SC) and the (CL) models are obtained and shown in Figure 5 and Figure 6, respectively. The statistics identify two observations, 38 and 111, as influential points. To investigate the effect of these two points, they are deleted and the data are refitted using the (SC) and the (CL) models. The values of the parameter estimates and their standard errors are given in Table 2; for both models, the estimates of the intercept and slope become closer to zero and one, respectively. The standard error of the slope parameter in the (SC) model is smaller for the reduced data than for the full data, whereas for the (CL) model the standard error is almost the same. This indicates that the COVRATIO statistic is more efficient for the (SC) model than for the (CL) model in the considered data. The contamination procedure of the (CL) model has shown a reasonable performance compared to the procedure used previously by Hussin and Abuzaid (2012). The application of the proposed statistics for both models to the wind data has led to the consistent conclusion that two points are influential.
Other functional relationship models for circular data based on nonlinear association between variables need to be studied.

3.2 Cut-off Points of COVRATIO Statistic
The statistic $|COVRATIO_{(-i)} - 1|$ is used to detect a suspected influential point when its value exceeds the cut-off point. A Monte Carlo simulation study is carried out to obtain the cut-off points of the COVRATIO statistics for the (SC) and the (CL) models. Seven different sample sizes, $n = 10, 30, 50, 70, 100, 150$ and $200$, are used. Use is made of the fact that the von Mises distribution with a large concentration parameter $\kappa$ tends to the normal distribution with variance $1/\kappa$ (Jammalamadaka and SenGupta, 2001). We generate the variable $X$ of size $n$ from the von Mises distribution and, without loss of generality, the parameters of the (SC) and the (CL) models are fixed at $\alpha = 0$ and $\beta = 1$. Then the observed values of the variable $Y$ are calculated based on models (2) and (3) separately.
The COV values are calculated. Next, the $i$th row is excluded from the generated data. In the contamination step, the value of $y_d$ after contamination is governed by the level of contamination.
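The normal approximation with variance $1/\kappa$ can be checked numerically. The sketch below, with hypothetical helper names, compares $1/\kappa$ with the sample variance of simulated von Mises angles for the concentration values used in the simulation study; it is only a sanity check and not part of the authors' procedure.

```python
import numpy as np

def approx_variance(kappa):
    """Variance of the limiting normal distribution for large kappa."""
    return 1.0 / kappa

def simulated_variance(kappa, size=200_000, seed=0):
    """Sample variance of angles drawn from VM(0, kappa), treated as linear values."""
    rng = np.random.default_rng(seed)
    return float(np.var(rng.vonmises(0.0, kappa, size=size)))

for kappa in (5, 10, 15, 20, 30):
    print(kappa, round(approx_variance(kappa), 3), round(simulated_variance(kappa), 3))
```

The agreement between the two columns should tighten as the concentration grows, which is the sense in which the approximation holds for large $\kappa$.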

Fig. 4: Spoke plot of wind direction data measured by both techniques.
An identifier of influential points in functional relationship models of circular variables is derived and tested for two types of models. If two circular variables are linearly correlated, then the COVRATIO statistic of the (SC) model performs better than that of the (CL) model, and vice versa for other types of association.

Table 1: Cut-off points for the null distribution of $|COVRATIO_{(-i)} - 1|$.
The inner ring represents the HF radar while the outer ring represents the anchored wave buoy. Since almost none of the lines cross the inner circle, the data are highly correlated.

Table 2.
It is noticeable that, for both models, the estimates of the intercept and slope become closer to zero and one, respectively.