Research Article
Science Journal of Mathematics & Statistics
ISSN: 2276-6324
http://www.sjpub.org/sjms.html
Published By Science Journal Publication
International Open Access Journal
© Author(s) 2013. CC Attribution 3.0 License.
Nonparametric Hypothesis Testing Report
Loc Nguyen
Vietnam Institute of Mathematics
Accepted 17th July, 2013. Not Published Yet
Abstract
This report is a brief survey of nonparametric hypothesis testing. It includes
four main sections on hypothesis testing, one additional section discussing
goodness-of-fit, and a conclusion section.
The sign test section gives an overview of nonparametric testing, which begins
with the test on the sample median without the assumption of a normal distribution.
The signed-rank test section and the rank-sum test section concern improvements on
the sign test. The advantage of the signed-rank test is that it can test the sample
mean under the assumption of a symmetric distribution. The rank-sum test dispenses
with the task of assigning and counting plus signs, and so it is the most effective
method among the ranking test methods.
The nonparametric ANOVA section discusses the application of analysis of variance
(ANOVA) in the nonparametric model. ANOVA is useful for comparing and evaluating
several data samples at the same time.
The nonparametric goodness-of-fit test section, an additional section, focuses on
a different hypothesis, which measures the distribution similarity between two
samples. It determines whether two samples have the same distribution, regardless
of the form of that distribution.
The last section is the conclusion.
Note that in this report the terms sample and data sample have the same meaning. A
sample contains many data points. Each data point is also called an observation.
Keywords: Overview of nonparametric testing, Nonparametric ANOVA section
1.0 Sign test
Nonparametric testing is used when nothing is known about the sample
distribution; concretely, there is no assumption of normality.
Nonparametric testing begins with the test on the sample median. If the
distribution is symmetric, the median is identical to the mean. The median
$\tilde{\mu}$ is the data point at which the data on the left side and the
data on the right side have equal cumulative probability:
$P(D < \tilde{\mu}) = P(D > \tilde{\mu}) = 0.5$
If the sample is not large and there is no assumption of normality, the
median is used as an approximation of the population mean. Given the null
hypothesis H0: $\tilde{\mu} = \tilde{\mu}_0$ and the alternative hypothesis
H1: $\tilde{\mu} \neq \tilde{\mu}_0$, the so-called sign test [1, pp.
656-660] is performed in the following steps:
1. Assign plus signs to sample data points whose values are greater
than $\tilde{\mu}_0$ and minus signs to those whose values are less than
$\tilde{\mu}_0$. Values equal to $\tilde{\mu}_0$ are not considered. Plus
signs and minus signs represent the right side and the left side of
$\tilde{\mu}_0$, respectively.
2. If the number of plus signs is nearly equal to the number of
minus signs, then the null hypothesis H0 is not rejected; otherwise H0
is rejected. In other words, a proportion of plus signs that is
significantly different from 0.5 leads to rejecting H0 in favor of H1.
The reason for accepting H0 is that the probabilities that data points (or
observations) fall on the left side and on the right side of $\tilde{\mu}_0$
are both equal to 0.5, which supports the assertion that $\tilde{\mu}_0$ is
the real median. Note that the terms data point, sample point, sample value
and observation are used interchangeably.
In the case of the alternative hypothesis H1: $\tilde{\mu} < \tilde{\mu}_0$,
H0 is rejected in favor of H1 if the proportion of plus signs is
significantly less than 0.5. In the case of the alternative hypothesis
H1: $\tilde{\mu} > \tilde{\mu}_0$, H0 is rejected in favor of H1 if the
proportion of plus signs is significantly greater than 0.5. Now let X be the
discrete random variable representing the number of plus signs and suppose
that X follows the binomial distribution B(X; n; p), where n is the total
number of sample data points (excluding values equal to $\tilde{\mu}_0$) and
p is the probability that a plus sign is assigned to a data point. Because
the proportion of plus signs is 0.5 when H0: $\tilde{\mu} = \tilde{\mu}_0$
is true, the parameter p is set to 0.5. Given that the distribution of plus
signs is B(X; n; 0.5), given the significance level α, and letting x be the
observed value of X (that is, x = the number of plus signs), there are
three following tests [1, pp. 657-660]:
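H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} \neq \tilde{\mu}_0$:
in the case x < n/2, H0 is rejected if $2P(X \le x) < \alpha$; in the case
x > n/2, H0 is rejected if $2P(X \ge x) < \alpha$. This test belongs to the
two-sided test family.
H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} < \tilde{\mu}_0$:
H0 is rejected if $P(X \le x) < \alpha$. This test belongs to the one-sided
test family.
H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} > \tilde{\mu}_0$:
H0 is rejected if $P(X \ge x) < \alpha$. This test belongs to the one-sided
test family.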
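The exact tests above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `binom_cdf` and `sign_test`, the `alternative` argument and the sample values are all assumptions made for this example, and values equal to the hypothesized median are dropped as in step 1.

```python
from math import comb


def binom_cdf(x, n, p=0.5):
    """Cumulative probability P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, x + 1))


def sign_test(data, median0, alternative="two-sided"):
    """Exact sign test of H0: median = median0.

    Returns (x, n, p_value), where x is the number of plus signs and
    n is the number of data points that differ from median0.
    """
    plus = sum(1 for d in data if d > median0)
    minus = sum(1 for d in data if d < median0)
    n = plus + minus                    # ties with median0 are not considered
    x = plus
    lower = binom_cdf(x, n)             # P(X <= x)
    upper = 1.0 - binom_cdf(x - 1, n)   # P(X >= x)
    if alternative == "less":           # H1: median < median0
        p_value = lower
    elif alternative == "greater":      # H1: median > median0
        p_value = upper
    elif 2 * x < n:                     # two-sided, x < n/2
        p_value = min(1.0, 2.0 * lower)
    elif 2 * x > n:                     # two-sided, x > n/2
        p_value = min(1.0, 2.0 * upper)
    else:                               # x == n/2: no evidence against H0
        p_value = 1.0
    return x, n, p_value


# Hypothetical sample of 11 measurements, tested against median0 = 1.8.
sample = [1.5, 2.2, 0.9, 1.3, 2.0, 1.6, 1.8, 1.5, 2.0, 1.2, 1.7]
x, n, p_value = sign_test(sample, 1.8)
print(x, n, p_value)  # 3 plus signs out of n = 10, p-value 0.34375
```

With α = 0.05, the p-value 0.34375 exceeds α, so H0 would not be rejected for this illustrative sample.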
Note that P(...) is the cumulative probability of the binomial distribution
B(X; n; 0.5); for example,
$P(X \le x) = \sum_{k=0}^{x} \binom{n}{k} 0.5^{k} \, 0.5^{n-k}$
When n is large enough, for instance n > 10, B(X; n; 0.5) is approximated by
the normal distribution, so that the standardized variable
$Z = \frac{X - 0.5n}{\sqrt{0.25n}}$
approximately follows the standard normal distribution N(Z; 0; 1). Let z be
the observed value of Z, where
$z = \frac{x - 0.5n}{\sqrt{0.25n}}$
there are three following tests:
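H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} \neq \tilde{\mu}_0$:
H0 is rejected if $|z| > z_{\alpha/2}$, where $z_{\alpha/2}$ is the upper
100α/2 percentage point of the standard normal distribution.
H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} < \tilde{\mu}_0$:
H0 is rejected if $z < -z_{\alpha}$, where $z_{\alpha}$ is the upper 100α
percentage point of the standard normal distribution.
H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} > \tilde{\mu}_0$:
H0 is rejected if $z > z_{\alpha}$.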
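The large-sample version of the sign test can be sketched as follows. The function name `sign_test_normal` and its arguments are assumptions for illustration; the statistic is z = (x − 0.5n) / √(0.25n) with no continuity correction, and the one-sided branches use the one-sided critical value z_α.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_normal(x, n, alpha=0.05, alternative="two-sided"):
    """Normal approximation to the sign test.

    x is the number of plus signs and n the number of non-tied points.
    Returns (z, reject), where reject is True when H0 is rejected.
    """
    z = (x - 0.5 * n) / sqrt(0.25 * n)
    std_normal = NormalDist()  # standard normal N(0, 1)
    if alternative == "less":       # H1: median < median0
        reject = z < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "greater":  # H1: median > median0
        reject = z > std_normal.inv_cdf(1 - alpha)
    else:                           # H1: median != median0
        reject = abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    return z, reject


# Hypothetical counts: 20 plus signs among n = 25 non-tied observations.
z, reject = sign_test_normal(20, 25, alpha=0.05, alternative="greater")
print(z, reject)  # z = 3.0, which exceeds the critical value, so H0 is rejected
```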
In the case of the paired test H0: $\tilde{\mu}_1 - \tilde{\mu}_2 = d_0$, in
which we need to know how much the median $\tilde{\mu}_1$ shifts from the
other median $\tilde{\mu}_2$, the sign test is applied in a similar way with
a small change. If d0 = 0, H0 states that $\tilde{\mu}_1$ equals
$\tilde{\mu}_2$. We compute all deviations between two samples X and Y, where
$\tilde{\mu}_1$ is the sample median of X and $\tilde{\mu}_2$ is the sample
median of Y. Let di = xi – yi be the deviation between xi ∈ X and yi ∈ Y.
Plus signs (minus signs) are assigned to the deviations di that are greater
(less) than d0. The sign test is then applied to these plus signs and minus
signs by the method discussed above.
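A minimal sketch of the paired variant is given below, assuming two samples of equal length whose elements are paired by position. The function name `paired_sign_test` and the data are hypothetical; only the two-sided exact p-value is computed, using the fact that B(X; n; 0.5) is symmetric.

```python
from math import comb


def paired_sign_test(xs, ys, d0=0.0):
    """Two-sided exact sign test of H0: median1 - median2 = d0.

    Returns (plus, minus, p_value). Deviations equal to d0 are dropped.
    """
    if len(xs) != len(ys):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(xs, ys)]   # d_i = x_i - y_i
    plus = sum(1 for d in diffs if d > d0)
    minus = sum(1 for d in diffs if d < d0)
    n = plus + minus
    # By symmetry of B(n, 0.5), P(X >= plus) equals P(X <= minus), so the
    # two-sided p-value is twice the lower tail at min(plus, minus).
    k = min(plus, minus)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return plus, minus, min(1.0, 2.0 * tail)


# Hypothetical paired measurements.
xs = [12, 15, 9, 20, 14]
ys = [10, 15, 11, 16, 13]
print(paired_sign_test(xs, ys))  # (3, 1, 0.625)
```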
2.0 Signed-rank test
The sign test focuses on whether or not the observations differ from the
value in the null hypothesis but does not consider the magnitude of the
difference. The Wilcoxon signed-rank test [1, pp. 660-663], based on the
assumption of a symmetric and continuous distribution, considers both the
sign and the magnitude of the difference. The median $\tilde{\mu}$ is
identical to the mean μ under the symmetry assumption. The test includes the
following four steps [1, pp. 660-663]:
1. Calculate all deviations between the data points and μ0, giving D
= {d1, d2,…, dn} where di = xi – μ0 and di ≠ 0. Note that each data
point xi is an instance of the random variable X.
2. Assign a rank ri to each deviation di without regard to sign; for
instance, rank 1 and rank n are assigned to the smallest and the
largest absolute deviation, respectively. If two or more absolute
deviations have the same value, they are assigned the average rank.
For example, if the 3rd, 4th and 5th deviations have the same value,
they all receive the rank (3+4+5) / 3 = 4. We have a set of ranks R =
{r1, r2,…, rn} where ri is the rank of di.
3. Let $w^{+}$ and $w^{-}$ be the sums of the ranks whose corresponding
deviations are positive and negative, respectively. We have
$w^{+} = \sum_{d_i > 0} r_i$, $w^{-} = \sum_{d_i < 0} r_i$ and
$w = \min(w^{+}, w^{-})$. Note that w is the smaller of $w^{+}$ and
$w^{-}$.
4. In favor of H1: μ < μ0, H0 is rejected if $w^{+}$ is sufficiently
small. In favor of H1: μ > μ0, H0 is rejected if $w^{-}$ is sufficiently
small. In the case of the two-sided test H1: μ ≠ μ0, H0 is rejected if
w is sufficiently small. The concept "sufficiently small" is defined
via thresholds or pre-computed critical values; see [Walpole, Myers,
Myers, Ye 2012, p. 759] for critical values. The value $w^{+}$, $w^{-}$
or w is sufficiently small if it is smaller than a certain critical
value with respect to the significance level α.
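Steps 1 to 3 can be sketched in Python as below. The function name `signed_rank_statistics` and the data are assumptions for illustration. The sketch stops at the statistics w+, w– and w; the comparison in step 4 still requires a critical value looked up from a table for the chosen α. Ties are detected by exact equality of absolute deviations, which is adequate for the integer data used here.

```python
def signed_rank_statistics(data, mu0):
    """Return (w_plus, w_minus, w) for the Wilcoxon signed-rank test."""
    # Step 1: deviations from mu0, discarding zero deviations.
    devs = [x - mu0 for x in data]
    devs = [d for d in devs if d != 0]

    # Step 2: rank absolute deviations, giving tied values their average rank.
    order = sorted(range(len(devs)), key=lambda i: abs(devs[i]))
    ranks = [0.0] * len(devs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute deviations starting at i.
        while j + 1 < len(order) and abs(devs[order[j + 1]]) == abs(devs[order[i]]):
            j += 1
        average_rank = ((i + 1) + (j + 1)) / 2.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    # Step 3: sum the ranks of positive and of negative deviations.
    w_plus = sum(r for d, r in zip(devs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(devs, ranks) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)


# Hypothetical sample tested against mu0 = 50; the value 50 itself is dropped.
print(signed_rank_statistics([52, 48, 55, 47, 53, 50], 50))  # (10.0, 5.0, 5.0)
```

As a quick consistency check, w+ and w– always add up to n(n+1)/2, here 5·6/2 = 15.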
In the case of the paired test H0: μ1 – μ2 = d0, the deviation di in step 1
is calculated from d0 and the two samples X and Y, so di = xi – yi – d0 where
xi ∈ X and yi ∈ Y. Note that μ1 and μ2 are taken from X and Y, respectively.
Steps 2, 3 and 4 are performed in a similar way.
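Let $W^{+}$ be the random variable whose observed value is $w^{+}$. If
n ≥ 15, then $W^{+}$ approaches the normal distribution with mean
$\mu_{W^{+}} = \frac{n(n+1)}{4}$
and variance
$\sigma^{2}_{W^{+}} = \frac{n(n+1)(2n+1)}{24}$
We can normalize $W^{+}$ so as to define the critical region via the
percentage point $z_{\alpha}$ of the standard normal distribution:
$Z_{W^{+}} = \frac{W^{+} - \mu_{W^{+}}}{\sigma_{W^{+}}}$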
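The normalization of W+ can be illustrated with a short Python sketch, assuming the mean n(n+1)/4 and the variance n(n+1)(2n+1)/24 and no correction for ties. The function name `signed_rank_z` and the numbers are hypothetical.

```python
from math import sqrt


def signed_rank_z(w_plus, n):
    """Standardize the signed-rank statistic w_plus for n non-zero deviations."""
    mean = n * (n + 1) / 4.0
    variance = n * (n + 1) * (2 * n + 1) / 24.0
    return (w_plus - mean) / sqrt(variance)


# Hypothetical values: n = 15 gives mean 60 and variance 310.
z = signed_rank_z(25, 15)
print(round(z, 4))  # -1.9879
```

The resulting z is compared with the percentage points of the standard normal distribution in the same way as in the large-sample sign test.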
3.0 Rank-sum test
The rank-sum test [1, pp. 665-667] is a variant of the signed-rank test. Suppose
there are two samples $X = \{x_1, x_2, \ldots, x_{n_1}\}$ and
$Y = \{y_1, y_2, \ldots, y_{n_2}\}$ and the null
hypothesis is specified as H0: μ1 = μ2 where μ1 and μ2 are taken from X