On Zeroes in Sign and Signed Rank Tests

When zeroes (or ties within pairs) occur in data being analyzed with a sign test or a signed rank test, nonparametric methods textbooks and software consistently recommend that the zeroes be deleted and the data analyzed as though zeroes did not exist. This advice is not consistent with the objectives of the majority of applications. In most settings a better approach would be to view the tests as testing hypotheses about a population median. There are relatively simple p-values available that are consistent with this viewpoint of the tests. These methods produce tests with good properties for testing a different (often more appropriate) set of hypotheses than those addressed by tests that delete the zeroes.

The sign test has a lengthy history in statistics, including its early application by Arbuthnot (1710) in the eighteenth century and its formal description by Dixon and Mood (1946). Throughout, there has been substantial controversy (Randles 2001) about the role and use of zero (neutral) responses.

Example 1:
A study was conducted with, as one of its objectives, determining whether taking dichloroacetate (DCA) affects patients' hearts. Since DCA is typically administered to correct energy metabolism disorders, effects on heart rate, whether increases or decreases, could be viewed as an undesirable side effect. Measurements of heart rate both before and 30 minutes after administration of DCA are displayed in Table 1. We focus our attention on the difference (after − before) column in this table. There are 15 positive differences, 3 negative differences and 2 zeroes. When conducting the sign test on these data, what role should the 2 zero observations play?
Statisticians with practical experience often claim that zeroes, which represent "no change in condition", are meaningful and important responses that should not be discarded. They argue that in most, but not all, settings, the zeroes should lend credence to the null hypothesis.
Some authors have used the zeroes to improve the power of the sign test as a test of (1). See, for example, Starks (1979), Suissa and Shuster (1991), and Presnell (1996). The tests discussed in these papers have the property that, with $n_+$ and $n_-$ fixed, the p-values generally decrease as $n_0$ increases. Thus, zeroes add credence to the alternative when using these methods.
The purpose of this article is to recommend that, when observed zeroes must be handled, the sign and signed rank tests be viewed as tests about population medians. This would be consistent with the point estimates and confidence intervals that correspond to these tests, since they estimate population medians. It also ensures that zeroes are viewed as meaningful and as lending credence to the null hypothesis. In the majority of problem settings, this is the more appropriate viewpoint toward zeroes. Moreover, as this article shows, there are simple, practical ways to find p-values for the tests corresponding to this viewpoint.

The Median Sign Test
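Consider the multinomial model as in Figure 1, in which the numbers of positive, zero and negative responses among $n$ observations, $(N_+, N_0, N_-)$, follow a multinomial distribution with probabilities $(p_+, p_0, p_-)$. Let $\theta$ denote the population median, and consider the one-sided test of the hypothesis $H_0: \theta = 0$ vs $H_a: \theta > 0$.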
Therefore, as a test of (3), the p-value is

$$p = P[Y \ge n_+ \mid Y \sim \mathrm{Bin}(n, 0.5)], \qquad (6)$$

where $n = n_+ + n_0 + n_-$. Here the zeroes are combined with the negatives, and both types are considered "failures" in the binomial setting. Using this p-value has sometimes been described as the ultraconservative approach to handling zeroes in the sign test. But it is actually a very appropriate and powerful test of (3), which is a distinctly different objective from (4), the problem addressed by the usual (delete zeroes) sign test. The sign test is often described as a test about the population median; see, for example, Hollander et al (2014), page 90. Yet, when it comes to handling zeroes, this objective is usually abandoned.
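Because the one-sided p-value above only needs an upper binomial tail, it is easy to compute directly. The following is a minimal sketch in standard-library Python that pools the zeroes with the negatives and, for comparison, also computes the one-sided delete-zeroes p-value. The function names are hypothetical labels of our own; the counts are the DCA counts from Example 1.

```python
from math import comb


def median_sign_pvalue_one_sided(n_pos, n_zero, n_neg):
    """P[Y >= n_pos] with Y ~ Bin(n, 0.5), where n includes the zeroes.

    Zeroes are pooled with the negatives as binomial "failures".
    """
    n = n_pos + n_zero + n_neg
    return sum(comb(n, k) for k in range(n_pos, n + 1)) / 2 ** n


def delete_zeroes_pvalue_one_sided(n_pos, n_neg):
    """Usual one-sided sign test: zeroes are dropped before testing."""
    m = n_pos + n_neg
    return sum(comb(m, k) for k in range(n_pos, m + 1)) / 2 ** m


# DCA counts from Example 1: 15 positive, 2 zero, 3 negative differences.
print(median_sign_pvalue_one_sided(15, 2, 3))
print(delete_zeroes_pvalue_one_sided(15, 3))
```

For the DCA counts the median version gives $21700/2^{20} \approx 0.021$, while deleting the zeroes gives $988/2^{18} \approx 0.0038$; the two zeroes lend credence to the null.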
Fong et al (2003) identified the two-sided version of this problem as an interesting one. They noted that doubling the smallest tail probability, i.e.,

$$p = 2P[Y \ge \max(n_+, n_-) \mid Y \sim \mathrm{Bin}(n, 0.5)], \qquad (9)$$

leads to p-values that are much too large and, in fact, may exceed 1. They proposed a p-value, given in (10), that uses the greatest integer function $[|\cdot|]$. This is very simple and easy to implement: it only requires $\mathrm{Bin}(n, 0.5)$ tables. The denominator in (10) is the maximum value possible for the numerator, so the p-value in (10) is always less than or equal to one.
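A quick way to see the problem with doubling the smallest tail, as in (9), is to evaluate it for counts with many zeroes. The sketch below (standard-library Python, hypothetical function names) keeps the zeroes in $n$ and doubles the smaller tail without capping the result; the counts used (one positive, eighteen zeroes, one negative) are hypothetical and not from the DCA data.

```python
from math import comb


def binom_upper_tail(n, k):
    """P[Y >= k] for Y ~ Bin(n, 0.5)."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def doubled_tail_pvalue(n_pos, n_zero, n_neg):
    """Doubled smaller tail; zeroes stay in n and no cap at 1 is applied."""
    n = n_pos + n_zero + n_neg
    return 2 * binom_upper_tail(n, max(n_pos, n_neg))


# Hypothetical counts: 1 positive, 18 zeroes, 1 negative.
print(doubled_tail_pvalue(1, 18, 1))
```

Here the printed value is $2(1 - 2^{-20}) \approx 2$, because with so many zeroes the upper tail at $\max(n_+, n_-) = 1$ is almost 1 before doubling.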
We propose that the two-sided test be based on $S^* = \max(N_+, N_-)$, given the value of $n_0$. If $p_0$ were known, we could construct a p-value at the boundary of the null via

$$p(p_0) = P[S^* \ge s^* \mid (N_+, N_0, N_-) \sim \mathrm{Mult}(n, p_+^*, p_0^*, p_-^*)],$$

where $s^* = \max(n_+, n_-)$ is the observed value of $S^*$, $p_+^* = 0.5$, $p_0^* = \min(p_0, 0.5)$ and $p_-^* = 1 - p_+^* - p_0^*$. In practice, we would use $p(\hat{p}_0)$, where $\hat{p}_0 = \min(0.5, n_0/n)$. This can be viewed as a plug-in bootstrap p-value, where we have estimated the unknown $p_0$. It is more complex than (10), but is also less conservative. With modern computing packages and languages, the proposed p-value can be found easily via (12), where $b(k \mid n, p)$ is the binomial probability function and $F^*(\cdot)$ is the distribution function of a binomial$(n - k, \pi)$ random variable with $\pi = \{2(1 - \hat{p}_0)\}^{-1}$. The p-value in (12) has some nice properties. It is equal to the usual two-sided binomial p-value when there are no zeroes, it has a natural relationship to the one-sided p-value in (6), and it is relatively easy to compute.
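Instead of the closed form (12), the plug-in p-value $p(\hat{p}_0)$ can also be computed by brute-force enumeration of the multinomial distribution at the null boundary. The following is a sketch under that assumption, not the authors' code; the function names are hypothetical, and the double loop costs $O(n^2)$ multinomial evaluations, which is fine for moderate $n$.

```python
from math import comb


def multinomial_pmf(k_pos, k_zero, k_neg, p_pos, p_zero, p_neg):
    """Probability of the counts (k_pos, k_zero, k_neg) under a trinomial model."""
    n = k_pos + k_zero + k_neg
    coef = comb(n, k_pos) * comb(n - k_pos, k_zero)
    return coef * p_pos ** k_pos * p_zero ** k_zero * p_neg ** k_neg


def median_sign_pvalue_two_sided(n_pos, n_zero, n_neg):
    """Plug-in p-value P[max(N+, N-) >= s*] under Mult(n, 0.5, p0*, p-*).

    p0* = p0_hat = min(0.5, n0 / n) and p-* = 1 - 0.5 - p0*.
    """
    n = n_pos + n_zero + n_neg
    s_obs = max(n_pos, n_neg)
    p_zero = min(0.5, n_zero / n)   # plug-in estimate of p0, capped at 0.5
    p_pos = 0.5                     # boundary of the median null
    p_neg = 1.0 - p_pos - p_zero
    total = 0.0
    for k_pos in range(n + 1):
        for k_zero in range(n - k_pos + 1):
            k_neg = n - k_pos - k_zero
            if max(k_pos, k_neg) >= s_obs:
                total += multinomial_pmf(k_pos, k_zero, k_neg, p_pos, p_zero, p_neg)
    return min(total, 1.0)  # guard against floating-point overshoot


# Scan in the spirit of Table 2: n+ = 15 and n- = 3 fixed, zeroes varied.
for n0 in range(0, 11, 2):
    print(n0, round(median_sign_pvalue_two_sided(15, n0, 3), 4))
```

With $n_0 = 0$ the zero cell has probability 0, so the enumeration collapses to $P[\max(N_+, n - N_+) \ge s^*]$ with $N_+ \sim \mathrm{Bin}(n, 0.5)$, the usual two-sided binomial p-value.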
To illustrate the influence of zeroes on the p-values, Table 2 uses the DCA data with $n_+ = 15$ and $n_- = 3$ fixed, while varying the number of zeroes. The median sign tests are testing hypothesis (7) instead of hypothesis (1), which is tested by the usual sign test (delete zeroes). The median sign tests have p-values that increase as the number of zeroes increases. The proposed p-value does not differ substantially from (10), described by Fong, Kwan, Lam and Lam, but the proposed p-values are generally somewhat smaller.

Power Functions
The power of the two-sided tests based on the p-values described earlier, namely the usual sign test (2), the test of Fong, Kwan, Lam and Lam (10) and the proposed test (12), was compared for fixed probabilities $p_0 = 0.1, 0.2, 0.4$, while varying $\delta = p_+ - p_-$. Note that, for the median sign test, the boundary of the null hypothesis occurs when $\delta = p_0$. Graphs of the actual power curves (enumerated, not simulated) are shown in Figure 2 for different sample sizes with $p_0 = 0.2$ and $\alpha = 0.05$. Figure 2 shows that the proposed p-value controls the level and improves the power of the median sign test for smaller sample sizes. The proposed p-values (12) are typically smaller than those in (10), but for a fixed $n$ the two tests may have the same rejection region because of their discrete nature. The cases pictured are ones in which the rejection regions differ; they show that the proposed method can improve power because of its smaller p-values.
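Enumerated (rather than simulated) power can be sketched in a few lines: sum the multinomial probabilities of every outcome $(N_+, N_0, N_-)$ whose p-value is at most $\alpha$. The sketch below compares the usual two-sided sign test, taken here (as an assumption) to be the standard exact binomial test on the nonzero responses, with the plug-in median sign p-value $p(\hat{p}_0)$. The Fong et al. p-value (10) is left out because its formula is not reproduced above, and all function names are hypothetical.

```python
from functools import lru_cache
from math import comb


def trinomial_pmf(k_pos, k_zero, k_neg, p_pos, p_zero, p_neg):
    """Probability of the counts (k_pos, k_zero, k_neg) under Mult(n, p+, p0, p-)."""
    n = k_pos + k_zero + k_neg
    coef = comb(n, k_pos) * comb(n - k_pos, k_zero)
    return coef * p_pos ** k_pos * p_zero ** k_zero * p_neg ** k_neg


def usual_sign_pvalue(n_pos, n_zero, n_neg):
    """Usual two-sided sign test: zeroes are deleted (n_zero is ignored)."""
    m = n_pos + n_neg
    if m == 0:
        return 1.0
    tail = sum(comb(m, j) for j in range(max(n_pos, n_neg), m + 1)) / 2 ** m
    return min(1.0, 2 * tail)


@lru_cache(maxsize=None)
def proposed_pvalue(n_pos, n_zero, n_neg):
    """Plug-in p-value P[max(N+, N-) >= s*] under Mult(n, 0.5, p0_hat, 0.5 - p0_hat)."""
    n = n_pos + n_zero + n_neg
    s_obs = max(n_pos, n_neg)
    p_zero = min(0.5, n_zero / n)
    p_neg = 0.5 - p_zero
    total = 0.0
    for a in range(n + 1):
        for z in range(n - a + 1):
            b = n - a - z
            if max(a, b) >= s_obs:
                total += trinomial_pmf(a, z, b, 0.5, p_zero, p_neg)
    return min(total, 1.0)


def exact_power(pvalue_fn, n, p0, delta, alpha=0.05):
    """Exact power by enumerating every (N+, N0, N-) outcome of size n.

    p0 is fixed and delta = p+ - p-, so p+ = (1 - p0 + delta) / 2
    and p- = (1 - p0 - delta) / 2.
    """
    p_pos = (1 - p0 + delta) / 2
    p_neg = max(0.0, (1 - p0 - delta) / 2)  # guard against rounding below 0
    power = 0.0
    for a in range(n + 1):
        for z in range(n - a + 1):
            b = n - a - z
            if pvalue_fn(a, z, b) <= alpha:
                power += trinomial_pmf(a, z, b, p_pos, p0, p_neg)
    return power


# Power of both tests at n = 20, p0 = 0.2; delta = 0.2 is the median-null boundary.
for delta in (0.0, 0.2, 0.4, 0.6):
    print(delta,
          round(exact_power(usual_sign_pvalue, 20, 0.2, delta), 3),
          round(exact_power(proposed_pvalue, 20, 0.2, delta), 3))
```

Caching the p-value function matters because the same outcome is revisited for every value of $\delta$.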

Signed Rank Tests
In the signed rank test, we assume $X_1, X_2, \ldots, X_n$ are independent and identically distributed with distribution function $F(\cdot)$. Under the null hypothesis, the distribution This assigns the same p-value as would be found by the signed rank test if the observed zeroes were actually negative numbers very close to zero. This approach is thus analogous to (6), because the zeroes are counted as evidence against the alternative.
where again $T^+$ corresponds to the use of all $2^n$ equally likely sign sets attached to the ranks $1, 2, \ldots, n$. This p-value is the analogue of the one proposed by Fong et al (2003) for the sign test. The denominator is the largest possible value of the numerator. It is computationally very easy, as it only uses tables of the null distribution of $T^+$. Table 3 displays the p-values of the signed rank tests for the DCA data, varying the number of zeroes. For this particular data set, the p-value initially decreases as the number of zeroes increases, but it eventually increases sharply as the number of zeroes keeps increasing. Pratt's p-values decrease as the number of zeroes increases for these data.
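The one-sided signed rank p-value described above, which treats each zero as a negative value smaller in magnitude than every nonzero difference, can be computed exactly by building the null distribution of $T^+$ over all $2^n$ sign sets. The following Python sketch is one way to do this under stated assumptions: zeroes receive ranks $1, \ldots, n_0$, tied nonzero absolute differences receive midranks (doubled internally so every rank is an integer), and the function name is hypothetical. It does not implement the Fong-type analogue, whose formula is not reproduced here.

```python
from fractions import Fraction


def signed_rank_pvalue_zeroes_negative(diffs):
    """One-sided P[T+ >= t+] with zeroes treated as tiny negative values."""
    n = len(diffs)
    n0 = sum(1 for d in diffs if d == 0)
    nonzero = sorted((d for d in diffs if d != 0), key=abs)
    # Zeroes take the smallest ranks 1..n0 (doubled) and carry a negative sign.
    doubled_ranks = [2 * r for r in range(1, n0 + 1)]
    signs = [False] * n0
    i = 0
    while i < len(nonzero):
        j = i
        while j + 1 < len(nonzero) and abs(nonzero[j + 1]) == abs(nonzero[i]):
            j += 1
        # Tied positions i..j share midrank ((n0+i+1) + (n0+j+1)) / 2; store it doubled.
        doubled_mid = (n0 + i + 1) + (n0 + j + 1)
        for k in range(i, j + 1):
            doubled_ranks.append(doubled_mid)
            signs.append(nonzero[k] > 0)
        i = j + 1
    t_obs = sum(r for r, positive in zip(doubled_ranks, signs) if positive)
    # Null distribution of (doubled) T+ over all 2**n equally likely sign sets.
    counts = {0: 1}
    for r in doubled_ranks:
        nxt = dict(counts)                  # this rank gets a negative sign
        for s, c in counts.items():         # this rank gets a positive sign
            nxt[s + r] = nxt.get(s + r, 0) + c
        counts = nxt
    upper = sum(c for s, c in counts.items() if s >= t_obs)
    return Fraction(upper, 2 ** n)


# Hypothetical differences, not the DCA data.
print(signed_rank_pvalue_zeroes_negative([0, 0, 3, -1, 4, 2, 2, 5]))
```

Returning a `Fraction` keeps the p-value exact; for example, with differences $(0, 0, 1, 2, 3)$ the observed $T^+ = 12$ and the p-value is $5/32$, larger than the $1/8$ obtained for $(1, 2, 3)$ alone.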

Power Simulation
The powers of the two-sided tests were simulated when handling zeroes in the manner suggested by Wilcoxon (14), by Pratt (16) and by the proposed method (20). The model used a fixed probability $p_0$ at 0 and probability $(1 - p_0)$ spread over a continuous distribution that is symmetric around a location parameter $\theta$. As $\theta$ increases from 0, the value of $p_+$ also increases, eventually exceeding 0.5. The powers were simulated using a normal distribution and a Cauchy distribution for the continuous part of the distribution. The normal distribution results are displayed in Figure 3.
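The simulation model can be made concrete for the normal case. Assuming, for illustration, that the continuous part is $N(\theta, 1)$, we have $p_+ = (1 - p_0)\Phi(\theta)$, so $p_+$ first exceeds 0.5 when $\theta > \Phi^{-1}\{1/(2(1 - p_0))\}$; for $\theta$ between 0 and this value the population median is still 0. The sketch below uses Python's `statistics.NormalDist` and `random.Random.gauss`; the unit scale and all function names are assumptions of ours, and a Cauchy continuous part could be substituted in the sampler.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # continuous part assumed to be N(theta, 1)


def p_plus(theta, p0):
    """P(X > 0) = (1 - p0) * Phi(theta) for the zero-inflated normal model."""
    return (1 - p0) * STD_NORMAL.cdf(theta)


def median_boundary_theta(p0):
    """Location at which p+ reaches 0.5; below it the population median is 0."""
    if not 0 <= p0 < 0.5:
        raise ValueError("p+ can only exceed 0.5 when 0 <= p0 < 0.5")
    return STD_NORMAL.inv_cdf(1 / (2 * (1 - p0)))


def sample_zero_inflated_normal(n, p0, theta, rng):
    """Draw n values: exactly 0 with probability p0, otherwise N(theta, 1)."""
    return [0.0 if rng.random() < p0 else rng.gauss(theta, 1.0) for _ in range(n)]


for p0 in (0.1, 0.2, 0.4):
    print(p0, round(median_boundary_theta(p0), 3))

rng = random.Random(2024)
sample = sample_zero_inflated_normal(20, 0.2, 0.5, rng)
print(sample)
```

The boundary grows with $p_0$, which is why a median test needs a larger shift in $\theta$ to reject when zeroes are common.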

Conclusions
In most applications, the zeroes are meaningful, and a test about a population median is more appropriate than simply deleting the zeroes. This is not always the case, however. Consider the AZT data reported by Makutch and Parks (1988), which give serum antigen levels for 20 AIDS patients before and after treatment with AZT. Some of the patients had serum antigen levels of 0 before treatment, so their levels could only go up or stay the same. These data include several types of zeroes with different interpretations, so including the zeroes in the analysis would seem to be problematic.
So, how should one decide whether or not to include zeroes in the analysis? The researcher needs to decide whether the focus is on (a) what a typical response is (a difference, with paired data), or (b) which type of change (increase or decrease) is more prevalent.
If a typical (median) response is the focus, then a median sign or median signed rank test as proposed in this paper is the proper way to handle the zeroes. This is completely analogous to the paired t-test, where typical is interpreted as the average response (difference) and zeroes are always included. If, on the other hand, the focus is on (b), then the zeroes are irrelevant and should be discarded. Asking whether a population like the one shown in Figure 1 (right panel) should be detected may help to elicit the choice between (a) and (b).