Research Article
Science Journal of Mathematics & Statistics, ISSN: 2276-6324
Published by Science Journal Publication, http://www.sjpub.org/sjms.html
International Open Access Journal
© Author(s) 2013. CC Attribution 3.0 License.

Nonparametric Hypothesis Testing Report

Loc Nguyen
Vietnam Institute of Mathematics

Accepted 17th July, 2013. Not Published Yet

Abstract

This report is a brief survey of nonparametric hypothesis testing. It includes four main sections about hypothesis testing, one additional section discussing goodness-of-fit, and a conclusion section. The sign test section gives an overview of nonparametric testing, which begins with the test on the sample median without assuming a normal distribution. The signed-rank test section and the rank-sum test section concern improvements of the sign test. The main advantage of the signed-rank test is that it can test the sample mean under the assumption of a symmetric distribution. The rank-sum test discards the task of assigning and counting plus signs, and so it is the most effective method among the ranking test methods. The nonparametric ANOVA section discusses the application of analysis of variance (ANOVA) in the nonparametric model. ANOVA is useful for comparing and evaluating several data samples at the same time. The nonparametric goodness-of-fit test section, an additional section, focuses on a different hypothesis, which measures the distribution similarity between two samples. It determines whether two samples have the same distribution regardless of the form of that distribution. The last section is the conclusion. Note that in this report the terms sample and data sample have the same meaning. A sample contains many data points. Each data point is also called an observation.

Keywords: Overview of nonparametric testing, Nonparametric ANOVA section

1.0 Sign test

Nonparametric testing is used when nothing is known about the sample distribution; concretely, normality is not assumed.
Nonparametric testing begins with the test on the sample median. If the distribution is symmetric, the median is identical to the mean. The median $\tilde{\mu}$ is the point at which the data on the left side and the data on the right side have equal cumulative probability:

$P(D < \tilde{\mu}) = P(D > \tilde{\mu}) = 0.5$

If the sample is not large and normality is not assumed, the median serves as an approximation of the population mean. Given the null hypothesis H0: $\tilde{\mu} = \tilde{\mu}_0$ and the alternative hypothesis H1: $\tilde{\mu} \neq \tilde{\mu}_0$, the so-called sign test [1, pp. 656-660] is performed in the following steps:

1. Assign plus signs to sample data points whose values are greater than $\tilde{\mu}_0$ and minus signs to those whose values are less than $\tilde{\mu}_0$. Values equal to $\tilde{\mu}_0$ are not considered. Plus signs and minus signs represent the right side and the left side of $\tilde{\mu}_0$, respectively.

2. If the number of plus signs is nearly equal to the number of minus signs, the null hypothesis H0 is not rejected; otherwise H0 is rejected. In other words, a proportion of plus signs that is significantly different from 0.5 leads to rejecting H0 in favor of H1.

The rationale is that, under H0, a data point (or observation) falls on the left side or the right side of $\tilde{\mu}_0$ with equal probability 0.5, which is exactly what it means for $\tilde{\mu}_0$ to be the true median. Note that the terms data point, sample point, sample value and observation are identical.

If the alternative hypothesis is H1: $\tilde{\mu} < \tilde{\mu}_0$, H0 is rejected in favor of H1 when the proportion of plus signs is significantly less than 0.5. If the alternative hypothesis is H1: $\tilde{\mu} > \tilde{\mu}_0$, H0 is rejected in favor of H1 when the proportion of plus signs is significantly greater than 0.5.

Now let X be the discrete random variable representing the number of plus signs, and suppose that X follows the binomial distribution B(X; n; p), where n is the total number of sample data points (excluding those equal to $\tilde{\mu}_0$) and p is the probability that a plus sign is assigned to a data point.
Because the proportion of plus signs is 0.5 when H0: $\tilde{\mu} = \tilde{\mu}_0$ is true, the parameter p is set to 0.5. Given that the number of plus signs follows B(X; n; 0.5), given the significance level α, and letting x be the observed value of X (that is, the number of plus signs, so that the sample proportion of plus signs is x/n), there are three tests [1, pp. 657-660]:

• H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} \neq \tilde{\mu}_0$: in case of x < n/2, H0 is rejected if $2P(X \le x) < \alpha$; in case of x > n/2, H0 is rejected if $2P(X \ge x) < \alpha$. This is a two-sided test.
• H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} < \tilde{\mu}_0$: H0 is rejected if $P(X \le x) < \alpha$. This is a one-sided test.
• H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} > \tilde{\mu}_0$: H0 is rejected if $P(X \ge x) < \alpha$. This is a one-sided test.

Note that P(…) is the cumulative probability of the binomial distribution B(X; n; 0.5); for example,

$P(X \le x) = \sum_{k=0}^{x} \binom{n}{k} 0.5^{k} \, 0.5^{n-k}$

When n is large enough, for instance n > 10, B(X; n; 0.5) is well approximated by a normal distribution, so the standardized variable $Z = \frac{X - 0.5n}{\sqrt{0.25n}}$ approximately follows the standard normal distribution N(Z; 0; 1). Let $z = \frac{x - 0.5n}{\sqrt{0.25n}}$ be the observed value of Z; there are three tests:

• H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} \neq \tilde{\mu}_0$: H0 is rejected if $|z| > z_{\alpha/2}$, where $z_{\alpha/2}$ is the upper 100(α/2) percentage point of the standard normal distribution.
• H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} < \tilde{\mu}_0$: H0 is rejected if $z < -z_{\alpha}$.
• H0: $\tilde{\mu} = \tilde{\mu}_0$ and H1: $\tilde{\mu} > \tilde{\mu}_0$: H0 is rejected if $z > z_{\alpha}$.

In the case of the paired test H0: $\tilde{\mu}_1 - \tilde{\mu}_2 = d_0$, where we need to know how much the median $\tilde{\mu}_1$ shifts from the other median $\tilde{\mu}_2$, the sign test is applied in a similar way with a small change. If d0 = 0, H0 states that $\tilde{\mu}_1$ equals $\tilde{\mu}_2$. We compute all deviations between two samples X and Y, where $\tilde{\mu}_1$ is the median of X and $\tilde{\mu}_2$ is the median of Y. Let di = xi – yi be the deviation between xi ∈ X and yi ∈ Y. Plus signs (minus signs) are assigned to the deviations di that are greater (less) than d0. The sign test is then applied to these plus signs and minus signs by the method discussed above.
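The sign test described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reference implementation: the function names `binom_cdf`, `sign_test` and `sign_test_z`, the `alternative` argument, and the sample values are all introduced here for illustration and do not come from the report.

```python
from math import comb, sqrt

def binom_cdf(x, n, p=0.5):
    """Cumulative probability P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(0, x + 1))

def sign_test(data, median0, alternative="two-sided"):
    """Exact sign test. Returns (x, n, p_value), where x is the number
    of plus signs and n the number of points not equal to median0."""
    kept = [v for v in data if v != median0]   # values equal to median0 are dropped
    n = len(kept)
    x = sum(1 for v in kept if v > median0)    # number of plus signs
    lower = binom_cdf(x, n)                    # P(X <= x)
    upper = 1.0 - binom_cdf(x - 1, n)          # P(X >= x)
    if alternative == "less":                  # H1: median < median0
        p_value = lower
    elif alternative == "greater":             # H1: median > median0
        p_value = upper
    elif x < n / 2:                            # two-sided, few plus signs
        p_value = 2.0 * lower
    elif x > n / 2:                            # two-sided, many plus signs
        p_value = 2.0 * upper
    else:                                      # x == n/2: no evidence against H0
        p_value = 1.0
    return x, n, p_value

def sign_test_z(x, n):
    """Normal approximation z = (x - 0.5 n) / sqrt(0.25 n)."""
    return (x - 0.5 * n) / sqrt(0.25 * n)

# Illustrative values only (hypothetical sample, hypothesized median 1.8).
data = [1.5, 2.2, 0.9, 1.3, 2.0, 1.6, 1.8, 1.5, 2.0, 1.2, 1.7]
x, n, p_value = sign_test(data, 1.8)
z = sign_test_z(x, n)
print(x, n, p_value, z)
```

With these illustrative values there are x = 3 plus signs among n = 10 retained points, so the two-sided p-value is 2P(X ≤ 3) = 0.34375 and H0 would not be rejected at α = 0.05. H0 is rejected when the returned p-value is smaller than α, which matches the three rules listed above.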
2.0 Signed-rank test

The sign test uses only whether each observation lies above or below the hypothesized value; it does not consider the magnitude of the difference. The Wilcoxon signed-rank test [1, pp. 660-663], which assumes a symmetric and continuous distribution, considers both the sign of the difference and its magnitude. Under the symmetry assumption, the median $\tilde{\mu}$ is identical to the mean μ, so the hypotheses are stated in terms of μ and μ0. The test includes the following four steps [1, pp. 660-663]:

1. Calculate all deviations between the data points and μ0. We have D = {d1, d2,…, dn} where di = xi – μ0 and di ≠ 0. Note that data point xi is an instance of random variable X.

2. Assign a rank ri to each deviation di without regard to sign; for instance, rank 1 is assigned to the smallest absolute deviation and rank n to the largest absolute deviation. If two or more absolute deviations have the same value, they are assigned the average of the ranks they would occupy. For example, if the 3rd, 4th and 5th deviations have the same value, they all receive the rank (3+4+5) / 3 = 4. We have a set of ranks R = {r1, r2,…, rn} where ri is the rank of di.

3. Let $w^+$ and $w^-$ be the sums of ranks whose corresponding deviations are positive and negative, respectively. We have $w^+ = \sum_{d_i > 0} r_i$, $w^- = \sum_{d_i < 0} r_i$ and $w = \min(w^+, w^-)$.

4. In favor of H1: μ < μ0, H0 is rejected if $w^+$ is sufficiently small. In favor of H1: μ > μ0, H0 is rejected if $w^-$ is sufficiently small. In the case of the two-sided test H1: μ ≠ μ0, H0 is rejected if w is sufficiently small.

The concept "sufficiently small" is defined via thresholds or pre-computed critical values; see [Walpole, Myers, Myers, Ye 2012, p. 759] for critical values. The value $w^+$, $w^-$ or w is sufficiently small if it is smaller than a certain critical value with respect to the significance level α.
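The four steps above can be sketched as follows. This is a minimal illustration that assumes exact (for example integer) data so that ties in absolute deviation can be detected with `==`; real-valued data would need a tolerance. The function name `signed_rank` and the sample values are hypothetical and are not taken from the report. The code only computes the rank sums; the comparison against a critical value from a table is left to the reader.

```python
def signed_rank(data, mu0):
    """Return (w_plus, w_minus, w) for the Wilcoxon signed-rank test."""
    # Step 1: deviations from mu0, dropping zero deviations.
    devs = [v - mu0 for v in data if v != mu0]

    # Step 2: rank the absolute deviations, giving tied values
    # the average of the ranks they occupy.
    order = sorted(range(len(devs)), key=lambda i: abs(devs[i]))
    ranks = [0.0] * len(devs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of equal absolute deviations.
        while j + 1 < len(order) and abs(devs[order[j + 1]]) == abs(devs[order[i]]):
            j += 1
        avg_rank = ((i + 1) + (j + 1)) / 2.0   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Step 3: rank sums for positive and negative deviations.
    w_plus = sum(r for d, r in zip(devs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(devs, ranks) if d < 0)
    return w_plus, w_minus, min(w_plus, w_minus)

# Hypothetical sample and hypothesized mean, for illustration only.
sample = [12, 15, 9, 20, 7, 14, 18, 11]
w_plus, w_minus, w = signed_rank(sample, 13)
print(w_plus, w_minus, w)
```

For this hypothetical sample the deviations are -1, 2, -4, 7, -6, 1, 5, -2, the tied absolute values 1 and 2 receive ranks 1.5 and 3.5, and the rank sums are 19 for the positive deviations and 17 for the negative ones, so w = 17. A useful sanity check is that the two sums always add up to n(n+1)/2, here 36.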
In the case of the paired test H0: μ1 – μ2 = d0, the deviation di in step 1 is calculated from d0 and the two samples X and Y, so di = xi – yi – d0 where xi ∈ X and yi ∈ Y. Note that μ1 and μ2 are taken from X and Y, respectively. Steps 2, 3 and 4 are performed in the same way.

Let $W^+$ be the random variable whose observed value is $w^+$. If n ≥ 15, then $W^+$ approaches the normal distribution with mean $\mu_{W^+} = \frac{n(n+1)}{4}$ and variance $\sigma_{W^+}^2 = \frac{n(n+1)(2n+1)}{24}$. We can standardize $W^+$ so as to define the critical region via the percentage point zα of the standard normal distribution:

$Z_{W^+} = \frac{W^+ - \mu_{W^+}}{\sigma_{W^+}}$

3.0 Rank-sum test

Rank-sum test [1, pp. 665-667] is a variant of the signed-rank test. Suppose there are two samples X = {x1, x2,…, $x_{n_1}$} and Y = {y1, y2,…, $y_{n_2}$} and the null hypothesis is specified as H0: μ1 = μ2 where μ1 and μ2 are taken from X