
In statistics, the term non-parametric statistics covers a range of topics: distribution-free methods, which do not rely on the assumption that the data are drawn from a given probability distribution. As such it is the opposite of parametric statistics. It includes non-parametric statistical models, inference and statistical tests.

A non-parametric statistic can refer to a statistic (a function on a sample) whose interpretation does not depend on the population fitting any parametrized distribution. Statistics based on the ranks of observations are one example of such statistics, and these play a central role in many non-parametric approaches. Non-parametric regression refers to modelling where the structure of the relationship between variables is treated non-parametrically, but where there may nevertheless be parametric assumptions about the distribution of model residuals.

Non-parametric methods are widely used for studying populations that take on a ranked order (such as movie reviews receiving one to four stars). The use of non-parametric methods may be necessary when data have a ranking but no clear numerical interpretation, such as when assessing preferences; in terms of levels of measurement, for data on an ordinal scale.

As non-parametric methods make fewer assumptions, their applicability is much wider than that of the corresponding parametric methods. In particular, they may be applied in situations where less is known about the application in question. Also, because they rely on fewer assumptions, non-parametric methods are more robust. Another justification for the use of non-parametric methods is simplicity: in certain cases, even when the use of parametric methods is justified, non-parametric methods may be easier to use. Owing both to this simplicity and to their greater robustness, non-parametric methods are seen by some statisticians as leaving less room for improper use and misunderstanding. The wider applicability and increased robustness of non-parametric tests come at a cost: in cases where a parametric test would be appropriate, non-parametric tests have less power.
In other words, a larger sample size can be required to draw conclusions with the same degree of confidence.

Non-parametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from data. The term non-parametric is not meant to imply that such models completely lack parameters, but that the number and nature of the parameters are flexible and not fixed in advance. A histogram is a simple non-parametric estimate of a probability distribution; kernel density estimation provides better estimates of the density than histograms. Non-parametric regression and semiparametric regression methods have been developed based on kernels, splines, and wavelets. Data envelopment analysis provides efficiency coefficients similar to those obtained by multivariate analysis without any distributional assumption.

Sign test

In statistics, the sign test can be used to test the hypothesis that there is "no difference" between the continuous distributions of two random variables X and Y, in the situation when we can draw paired samples from X and Y. It is a non-parametric test which makes very few assumptions about the nature of the distributions under test; this means that it has very general applicability but may lack the statistical power of other tests such as the paired-samples t-test. Formally, let p = Pr(X > Y), and test the null hypothesis H0: p = 0.50. In other words, the null hypothesis states that, for a random pair of measurements (xi, yi), each of xi and yi is equally likely to be the larger.

Method

Independent pairs of sample data are collected from the populations: {(x1, y1), (x2, y2), ..., (xn, yn)}.
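The procedure described next (drop tied pairs, count the pairs with a positive difference, and refer that count to a Binomial(m, 0.5) distribution) is small enough to sketch with the standard library alone; the paired data below are made up purely for illustration:

```python
from math import comb

def sign_test_p(xs, ys):
    """Two-sided sign test of H0: Pr(X > Y) = 0.5.

    Tied pairs are dropped; under H0 the number of pairs with
    y - x > 0 follows a Binomial(m, 0.5) distribution.
    """
    diffs = [y - x for x, y in zip(xs, ys) if x != y]
    m = len(diffs)
    w = sum(d > 0 for d in diffs)
    def pmf(k):
        return comb(m, k) * 0.5 ** m
    lower = sum(pmf(k) for k in range(0, w + 1))   # Pr(W <= w)
    upper = sum(pmf(k) for k in range(w, m + 1))   # Pr(W >= w)
    return min(1.0, 2 * min(lower, upper))

# Hypothetical paired measurements (illustrative only)
x = [142, 140, 144, 144, 142, 146, 149, 150, 142]
y = [138, 136, 147, 139, 143, 141, 143, 145, 136]
p = sign_test_p(x, y)   # 2 * Pr(W <= 2) = 0.1796875 here
```

With two of the nine non-tied pairs positive, the two-sided p-value is 2 × Pr(W ≤ 2) ≈ 0.18, so these hypothetical data give no evidence against H0.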

Pairs for which there is no difference are omitted, possibly leaving a reduced sample of m pairs.[1] Then let w be the number of pairs for which yi − xi > 0. Assuming that H0 is true, W follows a binomial distribution, W ~ b(m, 0.5). The "W" is for Frank Wilcoxon, who developed the test and, later, the more powerful Wilcoxon signed-rank test.[2]

Significance testing

Since the test statistic is expected to follow a binomial distribution, the standard binomial test is used to calculate significance. The normal approximation to the binomial distribution can be used for large sample sizes, m > 25.[1] The left-tail value is computed by Pr(W ≤ w), which is the p-value for the alternative H1: p < 0.50. This alternative means that the X measurements tend to be higher. The right-tail value is computed by Pr(W ≥ w), which is the p-value for the alternative H1: p > 0.50. This alternative means that the Y measurements tend to be higher. For a two-sided alternative H1 the p-value is twice the smaller tail value.

Chi-square test

A chi-square test (also chi-squared or χ² test) is any statistical hypothesis test in which the sampling distribution of the test statistic is a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true, meaning that the sampling distribution (if the null hypothesis is true) can be made to approximate a chi-square distribution as closely as desired by making the sample size large enough. Some examples of chi-squared tests where the chi-square distribution is only approximately valid:
- Pearson's chi-square test, also known as the chi-square goodness-of-fit test or chi-square test for independence. When mentioned without any modifiers or without other precluding context, this test is usually understood (for an exact test used in place of the χ² test, see Fisher's exact test).
- Yates' chi-square test, also known as Yates' correction for continuity.
- Mantel–Haenszel chi-square test.
- Linear-by-linear association chi-square test.
- The portmanteau test in time-series analysis, testing for the presence of autocorrelation.
- Likelihood-ratio tests in general statistical modelling, for testing whether there is evidence of the need to move from a simple model to a more complicated one (where the simple model is nested within the complicated one).

One case where the distribution of the test statistic is an exact chi-square distribution is the test that the variance of a normally distributed population has a given value based on a sample variance. Such a test is uncommon in practice because values of variances to test against are seldom known exactly.

Chi-square test for variance in a normal population

If a sample of size n is taken from a population having a normal distribution, then there is a well-known result (see distribution of the sample variance) which allows a test to be made of whether the variance of the population has a pre-determined value. For example, a manufacturing process might have been in stable condition for a long period, allowing a value for the variance to be determined essentially without error. Suppose that a variant of the process is being tested, giving rise to a small sample of product items whose variation is to be tested. The test statistic T in this instance could be set to be the sum of squares about the sample mean, divided by the nominal value for the variance (i.e. the value to be tested as holding). Then T has a chi-square distribution with n − 1 degrees of freedom. For example, if the sample size is 21, the acceptance region for T for a significance level of 5% is the interval 9.59 to 34.17.

Median test

In statistics, Mood's median test is a special case of Pearson's chi-square test. It is a non-parametric test that tests the null hypothesis that the medians of the populations from which two samples are drawn are identical. The data in each sample are assigned to two groups, one consisting of data whose values are higher than the median value in the two groups combined, and the other consisting of data whose values are at the median or below. A Pearson's chi-square test is then used to determine whether the observed frequencies in each group differ from the expected frequencies derived from a distribution combining the two groups.

The test has low power (efficiency) for moderate to large sample sizes, and is largely regarded as obsolete; the Wilcoxon–Mann–Whitney U two-sample test should be considered instead. Siegel & Castellan (1988, p. 124) suggest that there is no alternative to the median test when one or more observations are "off the scale." The relevant difference between the two tests is that the median test only considers the position of each observation relative to the overall median, whereas the Wilcoxon–Mann–Whitney test takes the ranks of each observation into account. Thus the latter test is usually the more powerful of the two.
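A minimal sketch of this procedure, using only the standard library and invented data: pool the two samples, find the grand median, count how many values of each sample lie above it, and apply Pearson's chi-square (without continuity correction) to the resulting 2 × 2 table:

```python
from statistics import median

def mood_median_stat(a, b):
    """Mood's median test statistic: Pearson's chi-square on the
    2 x 2 table of counts above vs. at-or-below the grand median."""
    grand = median(a + b)
    table = [[sum(v > grand for v in s),    # above the grand median
              sum(v <= grand for v in s)]   # at the median or below
             for s in (a, b)]
    n = sum(map(sum, table))
    chi2 = 0.0
    for i in (0, 1):
        for j in (0, 1):
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2

# Deliberately extreme illustrative samples
a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
stat = mood_median_stat(a, b)   # 8.0: every a below, every b above the grand median
```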

Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test for the case of two related samples or repeated measurements on a single sample. It can be used as an alternative to the paired Student's t-test when the population cannot be assumed to be normally distributed. The test is named for Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples (Wilcoxon, 1945).

Like the paired or related-sample t-test, the Wilcoxon test involves comparisons of differences between measurements, so it requires that the data are measured at an interval level of measurement. However, it does not require assumptions about the form of the distribution of the measurements. It should therefore be used whenever the distributional assumptions that underlie the t-test cannot be satisfied.

Set-up

Suppose we collect 2n observations, two observations on each of n subjects. Let i denote the subject being referred to, let the first observation measured on subject i be denoted by xi and the second by yi, and pair xi and yi together for each i.

Assumptions

Let Zi = Yi − Xi for i = 1, ..., n.
1. The differences Zi are assumed to be independent.
2. Each Zi comes from the same continuous population and is symmetric about a common median θ.
3. The values that Xi and Yi represent are ordered, so the comparisons "greater than", "less than", and "equal to" are meaningful.

Procedure

The null hypothesis tested is H0: θ = 0. The Wilcoxon signed-rank statistic W+ is computed by ordering the absolute values |Z1|, ..., |Zn| and giving each ordered |Zi| a rank Ri. Denote the positive Zi values with φi = I(Zi > 0), where I(.) is an indicator function. The Wilcoxon signed-rank statistic W+ is defined as

W+ = Σ_{i=1}^{n} φi Ri.
It is often used to test the difference between scores of data collected before and after an experimental manipulation, in which case the central point under the null hypothesis would be expected to be zero. Scores exactly equal to the central point are excluded and the absolute values of the deviations from the central point of the remaining scores are ranked such that the smallest deviation has a rank of 1. Tied scores are assigned a mean rank. The sums for the ranks of scores with positive and negative deviations from the central point are then calculated separately. A value S is defined as the smaller of these two rank sums. S is then compared to a table of all possible

distributions of ranks to calculate p, the statistical probability of attaining S from a population of scores that is symmetrically distributed around the central point.

As the number of scores used, n, increases, the distribution of all possible ranks S tends towards the normal distribution. So although for n ≤ 20 exact probabilities would usually be calculated, for n > 20 the normal approximation is used. The recommended cutoff varies from textbook to textbook; here we use 20, although some put it lower (10) or higher (25).

The Wilcoxon test was popularised by Siegel (1956) in his influential textbook on non-parametric statistics. Siegel used the symbol T for the value defined here as S. In consequence, the test is sometimes referred to as the Wilcoxon T test, and the test statistic is reported as a value of T.

Example

Subject (i)    Xi     Yi    Sign of Xi − Yi    Xi − Yi    |Xi − Yi|    Rank of |Xi − Yi|    Signed rank
1             125    110          +               15          15               7                 +7
2             115    122          −               −7           7               3                 −3
3             130    125          +                5           5              1.5               +1.5
4             140    120          +               20          20               9                 +9
5             140    140                           0           0               –                  –
6             115    124          −               −9           9               4                 −4
7             140    123          +               17          17               8                 +8
8             125    137          −              −12          12               6                 −6
9             140    135          +                5           5              1.5               +1.5
10            135    145          −              −10          10               5                 −5
1. The sign of Xi − Yi is denoted in the sign column by either (+) or (−). If Xi and Yi are equal, the pair is thrown out.
2. The values of Xi − Yi and their absolute values are given in the next two columns.
3. The last two columns are the ranks. The absolute-rank column has no signs, and the signed-rank column gives the ranks along with their signs.
4. The data are ranked from the smallest absolute value to the largest. In the case of a tie, the ranks involved are added together and divided by the number of tied values. For example, in this data set there were two instances of the value 5. The ranks corresponding to 5 are 1 and 2; their sum is 3, and dividing by the number of ties gives a mean rank of 1.5, which is assigned to both instances of 5.
5. The test statistic W+ is the sum of all of the positive values in the signed-rank column, and W− is the sum of the absolute values of the negative entries. For this example, W+ = 27 and W− = 18. The minimum of these is 18.
6. Lastly, this test statistic is compared with a table of critical values. If the test statistic is less than or equal to the critical value based on the number of observations n, then the null hypothesis is rejected in favor of the alternative hypothesis; otherwise, the null cannot be rejected. In this case the test statistic is W = 18 and the critical value is 8 for a two-tailed p-value of 0.05. The test statistic must be less than or equal to this to be significant at this level, so in this case the null hypothesis cannot be rejected.

Mann–Whitney–Wilcoxon test for two independent samples

In statistics, the Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW), Wilcoxon rank-sum test, or Wilcoxon–Mann–Whitney test) is a non-parametric test for assessing whether two independent samples of observations have equally large values. It is one of the best-known non-parametric significance tests.
It was proposed initially by Frank Wilcoxon in 1945, for equal sample sizes, and extended to arbitrary sample sizes and in other ways by H. B. Mann and D. R. Whitney (1947). MWW is virtually identical to performing an ordinary parametric two-sample t-test on the data after ranking over the combined samples.

Although Mann and Whitney (1947) developed the MWW test under the assumption of continuous responses with the alternative hypothesis being that one distribution is stochastically greater than the other, there are many other ways to formulate the null and alternative hypotheses such that the MWW test will give a valid test.[1] A very general formulation is to assume that:
1. All the observations from both groups are independent of each other.
2. The responses are ordinal or continuous measurements (i.e. one can at least say, of any two observations, which is the greater).
3. Under the null hypothesis the distributions of both groups are the same, so that the probability of an observation from one population (X) exceeding an observation from the second population (Y) equals the probability of an observation from Y exceeding an observation from X; that is, there is a symmetry between populations with respect to the probability of randomly drawing the larger observation.
4. Under the alternative hypothesis the probability of an observation from one population (X) exceeding an observation from the second population (Y) (after correcting for ties) is not equal to 0.5. The alternative may also be stated in terms of a one-sided test, for example: P(X > Y) + 0.5·P(X = Y) > 0.5.

If we add stricter assumptions than those above, such that the responses are assumed continuous and the alternative is a location shift (i.e. F1(x) = F2(x + δ)), then we can interpret a significant MWW test as showing a significant difference in medians.
Under this location shift assumption, we can also interpret the MWW as assessing whether the Hodges–Lehmann estimate of the difference in central tendency between the two populations differs significantly from zero.

The Hodges–Lehmann estimate for this two-sample problem is the median of all possible differences between an observation in the first sample and an observation in the second sample.

The general null hypothesis of a symmetry between populations with respect to obtaining a larger observation is sometimes stated more narrowly as both populations having exactly the same distribution. However, such a specific formulation of the MWW test is not consistent with the original formulation of Mann and Whitney (1947); furthermore, it leads to problems with interpretation of the test results when the two distributions have different variances: for example, the power of the test is less than or equal to the significance level if both populations have normal distributions with the same mean but different variances. In fact, if we formulate the null hypothesis as X and Y having the same distribution, the alternative hypothesis must be that the distributions of X and Y are the same except for a shift in location; otherwise the test may have little power (or no power at all) to reject the null hypothesis.

Calculations

The test involves the calculation of a statistic, usually called U, whose distribution under the null hypothesis is known. In the case of small samples, the distribution is tabulated, but for sample sizes above ~20 there is a good approximation using the normal distribution. Some books tabulate statistics equivalent to U, such as the sum of ranks in one of the samples, rather than U itself. The U test is included in most modern statistical packages. It is also easily calculated by hand, especially for small samples. There are two ways of doing this.

For small samples a direct method is recommended. It is very quick, and gives an insight into the meaning of the U statistic.
1. Choose the sample for which the ranks seem to be smaller (the only reason to do this is to make computation easier). Call this "sample 1," and call the other sample "sample 2."
2. Taking each observation in sample 1, count the number of observations in sample 2 that are smaller than it (count a half for any that are equal to it).
3. The total of these counts is U.

For larger samples, a formula can be used:
1. Arrange all the observations into a single ranked series. That is, rank all the observations without regard to which sample they are in.
2. Add up the ranks for the observations which came from sample 1. The sum of ranks in sample 2 follows by calculation, since the sum of all the ranks equals N(N + 1)/2, where N is the total number of observations.
3. U is then given by

U1 = R1 − n1(n1 + 1)/2,

where n1 is the sample size for sample 1, and R1 is the sum of the ranks in sample 1. Note that there is no specification as to which sample is considered sample 1. An equally valid formula for U is

U2 = R2 − n2(n2 + 1)/2.

The smaller of U1 and U2 is the value used when consulting significance tables. The sum of the two values is given by

U1 + U2 = R1 + R2 − n1(n1 + 1)/2 − n2(n2 + 1)/2.

Knowing that R1 + R2 = N(N + 1)/2 and N = n1 + n2, and doing some algebra, we find that the sum is

U1 + U2 = n1 n2.

The maximum value of U is the product of the sample sizes for the two samples; in such a case, the "other" U would be 0. The Mann–Whitney U is also equivalent to the area under the receiver operating characteristic curve, which can then be readily calculated.

Examples

Illustration of calculation methods

Suppose that Aesop is dissatisfied with his classic experiment in which one tortoise was found to beat one hare in a race, and decides to carry out a significance test to discover whether the results could be extended to tortoises and hares in general. He collects a sample of 6 tortoises and 6 hares, and makes them all run his race. The order in which they reach the finishing post (their rank order, from first to last) is as follows, writing T for a tortoise and H for a hare:

THHHHHTTTTTH

What is the value of U? Using the direct method, we take each tortoise in turn, and count the number of hares it is beaten by (lower rank), getting 0, 5, 5, 5, 5, 5, which means U = 25. Alternatively, we could take each hare in turn, and count the number of tortoises it is beaten by. In this case, we get 1, 1, 1, 1, 1, 6, so U = 6 + 1 + 1 + 1 + 1 + 1 = 11. Note that the sum of these two values for U is 36, which is 6 × 6. Using the indirect method: the sum of the ranks achieved by the tortoises is 1 + 7 + 8 + 9 + 10 + 11 = 46, therefore U = 46 − (6 × 7)/2 = 46 − 21 = 25; the sum of the ranks achieved by the hares is 2 + 3 + 4 + 5 + 6 + 12 = 32, leading to U = 32 − 21 = 11.

Illustration of object of test

A second example illustrates the point that the Mann–Whitney does not test for equality of medians. Consider another hare and tortoise race, with 19 participants of each species, in which the outcomes are as follows:

HHHHHHHHHTTTTTTTTTTHHHHHHHHHHTTTTTTTTT

The median tortoise here comes in at position 19, and thus actually beats the median hare, which comes in at position 20. However, the value of U (for hares) is 100: using the quick method of calculation described above, each of the last 10 hares is beaten by 10 tortoises, so U = 10 × 10. Consulting tables, or using the approximation below, shows that this U value gives significant evidence that hares tend to do better than tortoises (p < 0.05, two-tailed).
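Both calculation methods, applied to the two races above, can be sketched as follows (the z-score uses the normal approximation given in the next subsection):

```python
from math import sqrt

def u_direct(sample1, sample2):
    """Direct method: for each observation in sample1, count the
    sample2 observations that beat it (half credit for ties)."""
    return sum((b < a) + 0.5 * (b == a) for a in sample1 for b in sample2)

# Finishing order THHHHHTTTTTH, expressed as ranks
tortoises = [1, 7, 8, 9, 10, 11]
hares = [2, 3, 4, 5, 6, 12]
u_t = u_direct(tortoises, hares)          # 0+5+5+5+5+5 = 25
u_h = u_direct(hares, tortoises)          # 1+1+1+1+1+6 = 11
u_indirect = sum(tortoises) - 6 * 7 / 2   # R1 - n1(n1+1)/2 = 25

# Second race: U (for hares) = 100 with n1 = n2 = 19
m_u = 19 * 19 / 2                          # mean of U under H0
s_u = sqrt(19 * 19 * (19 + 19 + 1) / 12)   # sd of U under H0
z = (100 - m_u) / s_u                      # about -2.35
```

|z| ≈ 2.35 exceeds 1.96, matching the claim that this U value is significant at p < 0.05, two-tailed.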
Obviously this is an extreme distribution that would be spotted easily, but in a larger sample something similar could happen without it being so apparent. Notice that the problem here is not that the two distributions of ranks have different variances; they are mirror images of each other, so their variances are the same, but they have very different skewness.

Normal approximation

For large samples, U is approximately normally distributed. In that case, the standardized value

z = (U − mU) / σU,

where mU and σU are the mean and standard deviation of U, is approximately a standard normal deviate whose significance can be checked in tables of the normal distribution. mU and σU are given by

mU = n1 n2 / 2   and   σU = √(n1 n2 (n1 + n2 + 1) / 12).
The formula for the standard deviation is more complicated in the presence of tied ranks; the full formula is given in the textbooks referenced below. However, if the number of ties is small (and especially if there are no large tie bands), ties can be ignored when doing calculations by hand. Computer statistical packages will use the correctly adjusted formula as a matter of routine. Note that since U1 + U2 = n1 n2, the mean n1 n2 / 2 used in the normal approximation is the mean of the two values of U. Therefore, the absolute value of the z statistic calculated will be the same whichever value of U is used.

Wald–Wolfowitz runs test

The runs test (also called the Wald–Wolfowitz test after Abraham Wald and Jacob Wolfowitz) is a non-parametric statistical test that checks a randomness hypothesis for a two-valued data sequence. More precisely, it can be used to test the hypothesis that the elements of the sequence are mutually independent.

A "run" of a sequence is a maximal non-empty segment of the sequence consisting of adjacent equal elements. For example, the sequence "++++---+++--++++++----" consists of six runs, three of which consist of +'s and the others of −'s. If the +'s and −'s alternate randomly, the number of runs R in a sequence of length N, for which it is given that there are N+ occurrences of + and N− occurrences of − (so N = N+ + N−), is a random variable whose conditional distribution, given N+ and N−, is approximately normal, with:[1]

mean μ = 2 N+ N− / N + 1

variance σ² = 2 N+ N− (2 N+ N− − N) / (N² (N − 1)) = (μ − 1)(μ − 2) / (N − 1).

These parameters do not depend on the "fairness" of the process generating the elements of the sequence, in the sense that +'s and −'s need not have equal probabilities, but only on the assumption that the elements are independent and identically distributed. If there are many more or many fewer runs than expected, the hypothesis of statistical independence of the elements may be rejected.

Runs tests can be used to test:
1. the randomness of a distribution, by taking the data in the given order and marking with + the data greater than the median, and with − the data less than the median (numbers equalling the median are omitted);
2. whether a function fits well to a data set, by marking the data exceeding the function value with + and the other data with −. For this use, the runs test, which takes into account the signs but not the distances, is complementary to the chi-square test, which takes into account the distances but not the signs. The Kolmogorov–Smirnov test is more powerful, if it can be applied.
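A sketch of the computation on the example sequence above, using the mean and variance just given:

```python
from math import sqrt
from itertools import groupby

def runs_test_z(seq):
    """Number of runs in a +/- sequence and the z-score of the
    normal approximation to its null distribution."""
    runs = sum(1 for _ in groupby(seq))   # maximal blocks of equal symbols
    n_pos = seq.count("+")
    n_neg = len(seq) - n_pos
    n = n_pos + n_neg
    mu = 2 * n_pos * n_neg / n + 1
    var = (mu - 1) * (mu - 2) / (n - 1)
    return runs, (runs - mu) / sqrt(var)

runs, z = runs_test_z("++++---+++--++++++----")   # 6 runs
```

Here 13 +'s and 9 −'s give an expected 11.6 runs; observing only 6 yields z ≈ −2.55, suggesting the symbols are not ordered randomly.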

Kolmogorov–Smirnov test

In statistics, the Kolmogorov–Smirnov test (K–S test) is a form of minimum distance estimation used as a non-parametric test of equality of one-dimensional probability distributions, used to compare a sample with a reference probability distribution (one-sample K–S test) or to compare two samples (two-sample K–S test). The Kolmogorov–Smirnov statistic quantifies a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution, or between the empirical distribution functions of two samples. The null distribution of this statistic is calculated under the null hypothesis that the samples are drawn from the same distribution (in the two-sample case) or that the sample is drawn from the reference distribution (in the one-sample case). In each case, the distributions considered under the null hypothesis are continuous distributions but are otherwise unrestricted.

The two-sample K–S test is one of the most useful and general non-parametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.

The Kolmogorov–Smirnov test can be modified to serve as a goodness-of-fit test. In the special case of testing for normality of the distribution, samples are standardized and compared with a standard normal distribution. This is equivalent to setting the mean and variance of the reference distribution equal to the sample estimates, and it is known that using the sample to modify the null hypothesis reduces the power of a test.

Kolmogorov–Smirnov statistic

The empirical distribution function Fn for n iid observations Xi is defined as

Fn(x) = (1/n) Σ_{i=1}^{n} I(Xi ≤ x),

where I(Xi ≤ x) is the indicator function, equal to 1 if Xi ≤ x and equal to 0 otherwise. The Kolmogorov–Smirnov statistic for a given cumulative distribution function F(x) is

Dn = sup_x |Fn(x) − F(x)|,

where sup_x is the supremum of the set of distances. By the Glivenko–Cantelli theorem, if the sample comes from distribution F(x), then Dn converges to 0 almost surely. Kolmogorov strengthened this result by effectively providing the rate of this convergence (see below). The Donsker theorem provides a yet stronger result. In practice, the statistic requires a relatively large number of data points to properly reject the null hypothesis.

Kolmogorov distribution

The Kolmogorov distribution is the distribution of the random variable

K = sup_{t ∈ [0,1]} |B(t)|,

where B(t) is the Brownian bridge. The cumulative distribution function of K is given by[2]

Pr(K ≤ x) = 1 − 2 Σ_{k=1}^{∞} (−1)^(k−1) e^(−2k²x²).

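The series for the CDF of K converges very rapidly, so it is easy to evaluate numerically; as a check, x = 1.358 is the familiar 5% critical value, so Pr(K ≤ 1.358) should be close to 0.95:

```python
from math import exp

def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x) = 1 - 2 * sum_{k>=1} (-1)**(k-1) * exp(-2 * k**2 * x**2)."""
    if x <= 0:
        return 0.0
    return 1.0 - 2.0 * sum(
        (-1) ** (k - 1) * exp(-2 * k * k * x * x) for k in range(1, terms + 1)
    )

p = kolmogorov_cdf(1.358)   # about 0.95
```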
Kolmogorov–Smirnov test

Under the null hypothesis that the sample comes from the hypothesized distribution F(x),

√n Dn → sup_t |B(F(t))|

in distribution, where B(t) is the Brownian bridge. If F is continuous, then under the null hypothesis √n Dn converges to the Kolmogorov distribution,

√n Dn → K,

which does not depend on F. This result may also be known as the Kolmogorov theorem; see Kolmogorov's theorem for disambiguation. The goodness-of-fit test or the Kolmogorov–Smirnov test is constructed by using the critical values of the Kolmogorov distribution. The null hypothesis is rejected at level α if

√n Dn > Kα,

where Kα is found from

Pr(K ≤ Kα) = 1 − α.
The asymptotic power of this test is 1. If the form or parameters of F(x) are determined from the data Xi, the critical values determined in this way are invalid. In such cases, Monte Carlo or other methods may be required, but tables have been prepared for some cases. Details for the required modifications to the test statistic and for the critical values for the normal distribution and the exponential distribution have been published by Pearson & Hartley (1972, Table 54). Details for these distributions, with the addition of the Gumbel distribution, are also given by Shorack & Wellner (1986, p. 239).

Two-sample Kolmogorov–Smirnov test

The Kolmogorov–Smirnov test may also be used to test whether two underlying one-dimensional probability distributions differ. In this case, the Kolmogorov–Smirnov statistic is

D_{n,n′} = sup_x |F1,n(x) − F2,n′(x)|,

where F1,n and F2,n′ are the empirical distribution functions of the first and the second sample respectively. The null hypothesis is rejected at level α if

√(n n′ / (n + n′)) D_{n,n′} > Kα.
Note that the two-sample test checks whether the two data samples come from the same distribution. This does not specify what that common distribution is (e.g. normal or not normal).

Setting confidence limits for the shape of a distribution function

While the Kolmogorov–Smirnov test is usually used to test whether a given F(x) is the underlying probability distribution of Fn(x), the procedure may be inverted to give confidence limits on F(x) itself. If one chooses a critical value of the test statistic Dα such that P(Dn > Dα) = α, then a band of width ±Dα around Fn(x) will entirely contain F(x) with probability 1 − α.

The Kolmogorov–Smirnov statistic in more than one dimension

The Kolmogorov–Smirnov test statistic needs to be modified if a similar test is to be applied to multivariate data. This is not necessarily straightforward, because the maximum difference between two joint cumulative distribution functions is not generally the same as the maximum difference of any of the complementary distribution functions: the maximum difference will differ depending on which of Pr(X < x and Y < y), Pr(X < x and Y > y), or either of the other two possible arrangements is used. One might require that the result of the test should not depend on which choice is made.

One approach to generalizing the Kolmogorov–Smirnov statistic to higher dimensions which meets the above concern is to compare the cdfs of the two samples with all possible orderings, and take the largest of the set of resulting K–S statistics. In d dimensions, there are 2^d − 1 such orderings. One such variation is due to Peacock (1983) and another to Fasano & Franceschini (1987); see Lopes et al. (2007) for a comparison and computational details. Critical values for the test statistic can be obtained by simulations, but depend on the dependence structure in the joint distribution.
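For the one-sample statistic, Fn jumps only at the sample points, so the supremum can be computed exactly by checking each order statistic; a sketch with a made-up sample tested against the Uniform(0, 1) CDF F(x) = x:

```python
def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF.

    The supremum is attained at (or just before) an order statistic,
    so it suffices to compare i/n and (i-1)/n with F at each sorted point.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

d = ks_statistic([0.1, 0.2, 0.5, 0.9], lambda x: x)   # 0.3, attained at x = 0.2
```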

Kruskal–Wallis one-way analysis of variance

In statistics, the Kruskal–Wallis one-way analysis of variance by ranks (named after William Kruskal and W. Allen Wallis) is a non-parametric method for testing equality of population medians among groups. It is identical to a one-way analysis of variance with the data replaced by their ranks. It is an extension of the Mann–Whitney U test to three or more groups.

Since it is a non-parametric method, the Kruskal–Wallis test does not assume a normal population, unlike the analogous one-way analysis of variance. However, the test does assume an identically shaped and scaled distribution for each group, except for any difference in medians.

Method

1. Rank all data from all groups together; i.e., rank the data from 1 to N ignoring group membership. Assign any tied values the average of the ranks they would have received had they not been tied.
2. The test statistic is given by

K = (N − 1) Σ_i n_i (r̄_i − r̄)² / Σ_i Σ_j (r_ij − r̄)²,

where:
- n_i is the number of observations in group i,
- r_ij is the rank (among all observations) of observation j from group i,
- N is the total number of observations across all groups,
- r̄_i = (Σ_j r_ij) / n_i is the average rank of the observations in group i,
- r̄ = (N + 1)/2 is the average of all the r_ij.
3. Notice that the denominator of the expression for K is exactly (N − 1)N(N + 1)/12 when there are no ties, and r̄ = (N + 1)/2. Thus

K = 12/(N(N + 1)) Σ_i n_i (r̄_i − (N + 1)/2)² = 12/(N(N + 1)) Σ_i n_i r̄_i² − 3(N + 1).

Notice that the last formula only contains the squares of the average ranks.

4. A correction for ties can be made by dividing K by 1 − Σ_{i=1}^{G} (t_i³ − t_i) / (N³ − N), where G is the number of groupings of different tied ranks, and t_i is the number of tied values within grouping i that are tied at a particular value. This correction usually makes little difference in the value of K unless there are a large number of ties.
5. Finally, the p-value is approximated by Pr(χ²_{g−1} ≥ K), where g is the number of groups. If some n_i values are small (i.e., less than 5) the probability distribution of K can be quite different from this chi-square distribution. If a table of the chi-square probability distribution is available, the critical value of chi-square, χ²_{α: g−1}, can be found by entering the table at g − 1 degrees of freedom and looking under the desired significance or alpha level. The null hypothesis of equal population medians would then be rejected if K ≥ χ²_{α: g−1}. Appropriate multiple comparisons would then be performed on the group medians.
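A sketch of steps 1–3 with tiny invented groups (all values distinct, so no tie handling or correction is needed), using the average-rank form of K:

```python
def kruskal_wallis_k(groups):
    """K = 12 / (N(N+1)) * sum_i n_i * rbar_i**2 - 3(N+1).

    Assumes all values are distinct, so plain integer ranks suffice.
    """
    pooled = sorted(v for g in groups for v in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}   # ranks 1..N
    n_total = len(pooled)
    weighted = sum(
        len(g) * (sum(rank[v] for v in g) / len(g)) ** 2 for g in groups
    )
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

k = kruskal_wallis_k([[1, 2], [7, 8], [4, 5]])   # 32/7, about 4.571
```

With g − 1 = 2 degrees of freedom, the 5% chi-square critical value is 5.99, so K ≈ 4.57 would not reject equal medians for these illustrative data.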

Binomial test

In statistics, the binomial test is an exact test of the statistical significance of deviations from a theoretically expected distribution of observations into two categories.

Common use

The most common use of the binomial test is the case where the null hypothesis is that two categories are equally likely to occur. Tables are widely available giving the significance of observed numbers of observations in the categories for this case. However, as the example below shows, the binomial test is not restricted to this case. Where there are more than two categories, and an exact test is required, the multinomial test, based on the multinomial distribution, must be used instead of the binomial test.

Large samples

For large samples such as the example below, the binomial distribution is well approximated by convenient continuous distributions, and these are used as the basis for alternative tests that are much quicker to compute: Pearson's chi-square test and the G-test. However, for small samples these approximations break down, and there is no alternative to the binomial test.

Example

Suppose we have a board game that depends on the roll of a die, and special importance attaches to rolling a 6. In a particular game, the die is rolled 235 times, and 6 comes up 51 times. If the die is fair, we would expect 6 to come up 235/6 = 39.17 times. Is the proportion of 6s significantly higher than would be expected by chance, on the null hypothesis of a fair die? To find an answer to this question using the binomial test, we consult the binomial distribution B(235, 1/6) to find out the probability of finding exactly 51 sixes in a sample of 235 if the true probability of rolling a 6 on each trial is 1/6. We then find the probability of finding exactly 52, exactly 53, and so on up to 235, and add all these probabilities together.
In this way, we obtain the probability of the observed result (51 sixes) or a more extreme result (more than 51 sixes). In this example, the result is 0.0265443, so it is unlikely (significant at the 5% level) that the die is not loaded to give many 6s (one-tailed test). Clearly a die could roll too few sixes as easily as too many, and we would be just as suspicious, so we should use the two-tailed test, which considers the probability of a particular effect size either above or below expectation. Here the effect size is 11.83, since 51 sixes were found against 39.17 expected. So now we have to find the probability that the die would roll a six 27 times or fewer (39.17 expected minus 11.83, an equal effect size below expectation). Summing the probabilities of 27 or fewer sixes yields 0.0172037. Adding this to the first result gives 0.0437480, which is significant at the 5% significance level. If the cost of a false accusation were high, we might impose a more stringent requirement, such as a 1% significance level, in which case we could not reject the null hypothesis of a fair die with sufficient certainty.
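The two tail sums in this example can be reproduced directly from the binomial pmf with the standard library:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p_six = 235, 1 / 6
upper = sum(binom_pmf(k, n, p_six) for k in range(51, n + 1))  # Pr(51 or more sixes)
lower = sum(binom_pmf(k, n, p_six) for k in range(0, 28))      # Pr(27 or fewer sixes)
two_sided = upper + lower                                      # about 0.0437
```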
