
Non Parametric Tests

Introduction

In the previous lesson, it was assumed that the population from which the samples were drawn is normally distributed. Sometimes the investigator has no prior knowledge about the distribution of the sampled population, and attempting to guess the distribution can result in serious errors in the decision.

Nonparametric or distribution-free statistics do not identify or specify the distribution of the population from which the samples were drawn. A nonparametric test is a statistical procedure in which no hypothesis is made about specific values of the population parameters.

Objectives

After completing this module, you are expected, but not limited, to:

1. explain the advantages and disadvantages of nonparametric tests;

2. know when and how to use the Chi-square, Wilcoxon signed-rank, Mann-Whitney U, Friedman, and Kruskal-Wallis tests, respectively.

Advantages and Disadvantages

The advantages of nonparametric tests are the following:

1. most nonparametric tests involve easy computation and fewer mathematical details, and are easier to understand and apply;

2. they may be used to test data that are measured only on a nominal or ordinal scale;

3. many nonparametric tests make it possible to work with small sample sizes, which is very helpful to the experimenter involved in pilot studies or dealing with rare types of characteristics;

4. in multi-sample cases, nonparametric tests are available for testing hypotheses concerning observations drawn from several different populations without checking assumptions about the populations;

5. probability statements associated with most nonparametric tests are exact probabilities, which are independent of the shape of the population.

The major disadvantage of nonparametric tests is that if in fact all parametric assumptions were satisfied in the data and the values were of the required level of measurement, nonparametric tests would disregard much of the information contained in the data. It is as though the information conveyed by the figures used in the parametric test were ignored by its nonparametric counterpart. In general, nonparametric tests applied to data which are normal or nearly normal are not as powerful as the equivalent parametric methods.
Chi-square Test

The Chi-square test has three important functions, namely: the goodness-of-fit test, the test for independence, and the test for the equality of variances. This module will focus only on the application of the chi-square test for independence. The chi-square test for independence is used when the researcher is interested in detecting a significant association/relationship between two nominal/categorical variables.

Procedure for Test of Independence

1. Summarize the data using a contingency table (into rows and columns)
2. Determine the row and column total
3. Compute the expected frequency of each cell using the formula

Eij = (ith row total) x (jth column total) / N = (Ri x Cj) / N
4. Compute the chi-square statistic using the formula

X² = Σ (Oij - Eij)² / Eij

where Oij is the observed frequency and Eij is the expected frequency of cell (i, j)

5. Reject the null hypothesis if the computed chi-square is greater than the tabulated chi-square at the chosen level of significance with (r - 1)(c - 1) degrees of freedom

Sample Problem

A student council conducted a survey to study the independence of gender and opinion concerning the proposal "Comfort Room with fee". Two hundred fifty-five students were randomly selected and interviewed, with the following results:

Gender     In Favor   Opposed   Undecided   Total
Female        50         70        30        150
Male          25         30        50        105
Total         75        100        80        255

Solution

1. Ho: Gender and their opinion about CR with fee are independent

Ha: Gender and their opinion about CR with fee are not independent
2. Level of significance = 0.05 and sample size n = 255

3. Test Statistics : Chi-square test

4. Critical Region : Reject Ho if X2c > 5.991

5. Computations: Compute the chi-square test statistic using the formula

X2c = {(50 - 44)²/44} + {(25 - 31)²/31} + {(70 - 59)²/59} + {(30 - 41)²/41} + {(30 - 47)²/47} + {(50 - 33)²/33}

X2c = 22.07976 ≈ 22.080

6. Decision : Since X2c > 5.991, therefore reject Ho and conclude that gender and their opinion
concerning the proposal are not independent
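As an illustrative sketch (not part of the original module), steps 1 to 5 of the procedure can be worked through in Python using only the standard library, with the survey table above as input:

```python
# Chi-square test of independence for the "Comfort Room with fee" survey.
observed = [[50, 70, 30],   # Female: In Favor, Opposed, Undecided
            [25, 30, 50]]   # Male

# Step 2: row totals, column totals, and grand total
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Step 3: expected frequency of each cell, Eij = Ri * Cj / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Step 4: chi-square statistic, the sum of (O - E)^2 / E over all cells
chi_sq = sum((o - e) ** 2 / e
             for o_row, e_row in zip(observed, expected)
             for o, e in zip(o_row, e_row))

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 3), df)  # 22.08 2
```

Using exact (unrounded) expected frequencies gives 22.08, matching the hand computation.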

Wilcoxon Signed Rank Test

The Wilcoxon signed-rank test is applicable to the case of two dependent or related samples and aims to detect significant differences between the two groups. This test gives importance to both the direction and the magnitude of the observed differences between the groups. The Wilcoxon signed-rank test requires the data to be continuous, but the assumption of normality is not required. This test is a nonparametric alternative to the t-test for two dependent or correlated samples; hence, the data layout is similar to that of the t-test for two dependent samples.

Procedure

1. For each pair of observations, determine di = Xi - Yi

2. Rank these di's without respect to sign (rank the absolute values).

3. Affix to each rank the sign (+ or -) of its corresponding di.

4. Determine n, the number of nonzero di's.

5. Determine the sum of all positive ranks, denoted by T+, and the sum of all negative ranks, T-. Let T denote the smaller of T+ and T-.

6. When the sample size is small (n ≤ 15), the critical value of T can be found in Table A.14 of the book "Business Statistics: A Contemporary Decision Making," 2nd edition, by Ken Black (this book is available in the library). If the computed T is less than the critical value of T in Table A.14, reject the null hypothesis.

7. For large samples (n > 15), the test statistic is given by the formula

z = [T - n(n + 1)/4] / sqrt[n(n + 1)(2n + 1)/24]
Sample Problem

A researcher wishes to determine if there is systematic difference between the readings of the two digital
weighing scales. The following data were obtained:

Sample No.    1      2      3      4      5      6      7      8
Scale A      50.0   82.5   53.8   85.4   75.4   63.5   35.8   25.3
Scale B      49.9   82.7   53.8   85.3   75.4   63.7   35.7   24.9
di            0.1   -0.2    0.0    0.1    0.0   -0.2    0.1    0.4
Rank          2     -4.5           2            -4.5    2      6

Use the 0.05 level of significance to test whether there is a significant difference between the readings of the two scales. (The weights are expressed in grams.)

Solution

1. Ho : There is no significant difference between the two scales

Ha : There is a significant difference between the two scales

2. The level of significance is 0.05 and n = 6

3. Test Statistics : Wilcoxon Signed-Ranked Test

4. Critical Region : Reject Ho if the observed T < T critical value

5. Computations:

The sum of the positive ranks is T+ = 12 and the sum of the negative ranks is T- = 9, so T = 9. The critical value of T is 1.

6. Decision : Since the computed T is greater than the tabular T value, there is not enough
evidence to show that the two scales are different.
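The signed-rank bookkeeping above can be sketched in Python (an illustration, standard library only), using the weighing-scale data from the sample problem:

```python
# Wilcoxon signed-rank computation for the two weighing scales.
scale_a = [50.0, 82.5, 53.8, 85.4, 75.4, 63.5, 35.8, 25.3]
scale_b = [49.9, 82.7, 53.8, 85.3, 75.4, 63.7, 35.7, 24.9]

# Steps 1 and 4: differences, keeping only the nonzero ones
d = [round(a - b, 1) for a, b in zip(scale_a, scale_b)]
d = [x for x in d if x != 0]
n = len(d)

# Step 2: rank the |d|'s, giving tied values their average rank
abs_sorted = sorted(abs(x) for x in d)
def avg_rank(value):
    positions = [i + 1 for i, v in enumerate(abs_sorted) if v == value]
    return sum(positions) / len(positions)

# Steps 3 and 5: sums of the positive and the negative ranks
t_plus = sum(avg_rank(abs(x)) for x in d if x > 0)
t_minus = sum(avg_rank(abs(x)) for x in d if x < 0)
t = min(t_plus, t_minus)
print(n, t_plus, t_minus, t)  # 6 12.0 9.0 9.0
```

This reproduces n = 6, T+ = 12, and T- = 9 from the solution.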

The Mann-Whitney U test


The Mann-Whitney U test is a nonparametric alternative to the t-test for two independent samples. It enables us to test whether two independent samples come from the same population. The null hypothesis to be tested is that the two independent samples come from identical populations, without considering the assumption of normality. The test requires only that the data be continuous.

Procedure

1. Assign the smaller of the two groups as sample 1. If the sample sizes are equal, either group may be
assigned as sample 1.

2. Rank together the scores for both groups in a single series. The smallest score gets the rank of 1, the
next smallest 2, and so on. In the case of tie values, each of the tied values gets their average rank.

3. Determine the sum of the ranks in sample 1, denoted by W1, and the sum of the ranks in sample 2, W2.

4. Calculate the U statistic using the formula: U = [n1n2 + n1(n1+1)/2 ] - W1

5. Use Table A.13 of the book "Business Statistics: A Contemporary Decision Making," 2nd edition, by Ken Black (this book is available in the library). For n1 and n2 less than or equal to 10, locate the value of U in the left column. The intersection of U and n1 is the p-value for a one-tailed test. For a two-tailed test, double the p-value in the table.

6. Reject the null hypothesis if the p-value of U is less than the specified level of significance.

7. For large samples, use the normal approximation

z = [U - n1n2/2] / sqrt[n1n2(n1 + n2 + 1)/12]

Sample Problem

A classroom teacher wishes to compare the performance of students in statistics under two methods of teaching. Two independent samples of sizes 7 and 8 were randomly selected, and the following data were obtained. Is there a significant difference between the performance of students under the two methods of teaching statistics? Use the 0.05 level of significance.

Method A 82 81 86 75 77 83 85 Total
Ranks 7 6 11 1 3 8 10 W1 = 46
Method B 76 90 89 87 84 79 88 78
Ranks 2 15 14 12 9 5 13 4 W2 = 74

Solution
1. Ho : There is no significant difference between the two Methods of teaching

Ha : There is a significant difference between the two Methods of teaching

2. The level of significance is 0.05 and n1 = 7 n2 = 8

3. Test Statistics : Mann-Whitney U Test

4. Critical Region : Reject Ho if the observed probability is < level of significance

5. Computations: U1 = 7(8) + [7(8)/2] - 46 = 38 U2 = 7(8) + [8(9)/2] - 74 = 18

Hence U = 18, and Table A.13 yields a p-value of 0.1405. Since this problem is a two-tailed test, the p-value is doubled, giving a final p-value of 0.2810.

6. Decision: Since the p-value of 0.2810 is greater than the specified level of significance, there is not sufficient evidence to reject the null hypothesis; conclude that the two methods of teaching had the same effect on the performance of students in statistics.
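The ranking and the U formula from the procedure can be sketched in Python (an illustration, standard library only), with the teaching-methods data above:

```python
# Mann-Whitney U computation for the two teaching methods.
method_a = [82, 81, 86, 75, 77, 83, 85]        # sample 1 (smaller group)
method_b = [76, 90, 89, 87, 84, 79, 88, 78]

# Step 2: rank all scores together (this data set has no ties)
combined = sorted(method_a + method_b)
rank = {score: i + 1 for i, score in enumerate(combined)}

# Step 3: rank sums of the two samples
w1 = sum(rank[x] for x in method_a)
w2 = sum(rank[x] for x in method_b)

# Step 4: U statistic for each group; the test uses the smaller one
n1, n2 = len(method_a), len(method_b)
u1 = n1 * n2 + n1 * (n1 + 1) // 2 - w1
u2 = n1 * n2 + n2 * (n2 + 1) // 2 - w2
u = min(u1, u2)
print(w1, w2, u1, u2, u)  # 46 74 38 18 18
```

This reproduces W1 = 46, W2 = 74, and U = 18 from the solution.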

The Friedman Two-Way ANOVA

The usual analysis of variance (ANOVA) is applicable when the populations involved are normally distributed and the scale of measurement is interval. However, when the scale of measurement is ordinal, the Friedman two-way analysis of variance is more appropriate. This test was developed by M. Friedman in 1937.

The Friedman two-way ANOVA tests the hypothesis that the k repeated measures or matched groups
come from the populations with identical medians.

Procedure of the test

(1) Cast the scores in a two way table having n rows (number of paired samples) and k columns (number
of groups to be compared); (2) Rank the scores in each row (1 to k); (3) Determine the sum of the ranks
of each columns denoted by C1, C2, .. Ck ; and (4) Compute the value of Fr using the formula.

5. Compare the computed Fr with the tabular chi-square value with (k-1) degrees of freedom. Reject the
null hypothesis if the computed Fr is greater than or equal to the tabular chi-square value.

Sample Problem

The following data represent the grades of 8 students in Math, Science, and English. Test the hypothesis that there is no significant difference between the performance of students in the three subjects.

Student No.   Math           Science        English
              Grade   Rank   Grade   Rank   Grade   Rank
1 86 1 85 2 79 3
2 82 2 80 3 86 1
3 79 3 87 1 80 2
4 90 1 89 2 85 3
5 76 3 82 2 93 1
6 82 3 86 2 91 1
7 88 1 83 3 84 2
8 78 3 86 1 82 2
Total 17 16 15

Solution

1. Ho : There is no significant difference between the three subjects

Ha : There is a significant difference between the three subjects

2. The level of significance is 0.05 and n = 8

3. Test Statistics : Friedman Test

4. Critical Region : Reject Ho if the observed Fr > X2tab = 5.991

5. Computations:

C1 = 17, C2 = 16, C3 = 15

Fr = [12/(8 x 3 x 4)](17² + 16² + 15²) - 3(8)(4) = 0.25

6. Decision : Since the computed Fr is less than 5.991, therefore we conclude that there is no
significant difference between the three subjects
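The within-row ranking and the Fr formula can be sketched in Python (an illustration, standard library only), using the grades from the sample problem, where rank 1 goes to the highest grade in each row as in the table:

```python
# Friedman statistic for the grades of 8 students in three subjects.
grades = [  # Math, Science, English per student
    [86, 85, 79], [82, 80, 86], [79, 87, 80], [90, 89, 85],
    [76, 82, 93], [82, 86, 91], [88, 83, 84], [78, 86, 82],
]
n, k = len(grades), len(grades[0])

# Steps 2-3: rank within each row (no ties here), then total per column
col_sums = [0] * k
for row in grades:
    order = sorted(row, reverse=True)
    for j, g in enumerate(row):
        col_sums[j] += order.index(g) + 1

# Step 4: Fr = [12 / (n k (k + 1))] * sum(Cj^2) - 3 n (k + 1)
fr = 12 / (n * k * (k + 1)) * sum(c * c for c in col_sums) - 3 * n * (k + 1)
print(col_sums, fr)  # [17, 16, 15] 0.25
```

This reproduces the column totals 17, 16, 15 and Fr = 0.25 from the solution.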

The Kruskal-Wallis One-Way ANOVA

The Kruskal-Wallis one-way analysis of variance is a useful test for deciding whether k independent samples with an ordinal measurement come from populations with the same medians. This test was developed by William H. Kruskal and W. Allen Wallis in 1952.

Procedure

(1) Cast the data into a two-way table; (2) rank all the numerical values for the k groups in a single series from 1 to N. The smallest value gets the rank of 1, the next smallest 2, and so on; in the case of tied values, each of the tied values gets their average rank; (3) determine the rank total Rj of each group; and (4) compute the value of H using the formula

H = [12 / (N(N + 1))] Σ (Rj²/nj) - 3(N + 1)

(5) Reject the null hypothesis if the computed H exceeds the tabular chi-square value with (k - 1) degrees of freedom.

Sample Problem

The following data represent the scores of a random sample of students in each section during the first long examination. Is there a significant difference between the performance of students in the three sections? Use the 0.01 level of significance.

Section    Samples                               Total
A          87   45   75   65   82
  Ranks    18    3   14   11   17                 63
B          78   57   66   49   56   46
  Ranks    16    8   12    5    7    4            52
C          53   76   59   73   43   32   62
  Ranks     6   15    9   13    2    1   10       56

Solution

1. Ho : There is no significant difference between the three sections

Ha : There is a significant difference between the three sections

2. The level of significance is 0.01 and n1 = 5 n2 = 6 n3 = 7

3. Test Statistics : Kruskal-Wallis One-Way ANOVA

4. Critical Region : Reject Ho if H computed > 9.210 (the tabular chi-square at the 0.01 level with 2 degrees of freedom)

5. Computations: H = [12/(18 x 19)] x [(63²/5) + (52²/6) + (56²/7)] - 3(19) = 2.385

6. Decision : Since H is less than the chi-square tabulated at (k-1) degrees of freedom, therefore accept
the null hypothesis and conclude that there is no significant difference between the performance of
students in the three sections during the first long examination.
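The single-series ranking and the H formula can be sketched in Python (an illustration, standard library only), with the three sections' scores from the sample problem:

```python
# Kruskal-Wallis H statistic for the three-section exam scores.
sections = {
    "A": [87, 45, 75, 65, 82],
    "B": [78, 57, 66, 49, 56, 46],
    "C": [53, 76, 59, 73, 43, 32, 62],
}

# Step 2: rank all scores in a single series (no ties in this data)
all_scores = sorted(s for scores in sections.values() for s in scores)
rank = {s: i + 1 for i, s in enumerate(all_scores)}
n_total = len(all_scores)

# Step 3: rank total Rj of each group
rank_sums = {g: sum(rank[s] for s in scores) for g, scores in sections.items()}

# Step 4: H = [12 / (N (N + 1))] * sum(Rj^2 / nj) - 3 (N + 1)
h = (12 / (n_total * (n_total + 1))
     * sum(r * r / len(sections[g]) for g, r in rank_sums.items())
     - 3 * (n_total + 1))
print(rank_sums, round(h, 3))  # {'A': 63, 'B': 52, 'C': 56} 2.385
```

This reproduces the rank totals 63, 52, 56 and H = 2.385 from the solution.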

Learning Activities

Directions:

1. Follow the procedure/steps in the hypothesis testing, analysis and interpretation of data.
2. Use 0.05 level of significance

3. Assume that the assumptions of the parametric tests were violated.

4. Based on the data collected in the learning activities of the first module, perform the
following:

a. Test the hypothesis that there is no significant difference between the performance of
students in Math and English.

b. Test the hypothesis that there is no significant difference between the performance of
male and female students in Math.

c. Test the hypothesis that there is no significant difference between the performance of
students in Math, English and Science.

d. Test the hypothesis that there is no significant difference between the performance of
students in Math when grouped according to year/grade level.

e. Test the hypothesis that there is no significant association between the year/grade level
and socio-economic status of the students.

MODULE 7:

Tests of Relationships and Association

Introduction

There are many statistical investigations in which the main objective of the study is to determine whether there exists a significant relationship or association between two or more variables; the correlation coefficient is basically the appropriate statistical tool for this measurement. Correlation analysis primarily tells us the magnitude or degree to which two variables are related. It is useful in expressing how efficiently one variable predicts the value of another variable. It also tells us whether the variability of one variable reflects the variability of another variable. This module deals with the most common types of relationship between two variables with varying levels of measurement: interval, ordinal, or nominal. It will also help you compute and interpret the degree of relationship between these variables.

Objectives

After completing this module, you are expected but not limited to:

1. determine the appropriate tool to measure the degree of relationship between two variables;

2. ascertain the applicability of the different correlation coefficients;

3. interpret the degree of relationship/association between two variables;


4. solve problems involving relationships/associations between variables.

Interpretation of Correlation Coefficient

Coefficient        Interpretation
-1.00              perfect negative correlation
-0.76 to -0.99     very high negative correlation
-0.51 to -0.75     high negative correlation
-0.26 to -0.50     moderately small negative correlation
-0.01 to -0.25     very small negative correlation
 0.00              no correlation
 0.01 to 0.25      very small positive correlation
 0.26 to 0.50      moderately small positive correlation
 0.51 to 0.75      high positive correlation
 0.76 to 0.99      very high positive correlation
 1.00              perfect positive correlation


Pearson Product Moment Correlation Coefficient


The most important and widely used measure of relationship between two quantitative variables (usually on an interval scale) is the Pearson product moment correlation coefficient. Its formula is given by

r = [nΣXY - (ΣX)(ΣY)] / sqrt{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}

The Pearson product moment correlation coefficient can also be obtained easily using a scientific calculator with an LR mode.

Data Layout

n       X      Y      XY       X²       Y²
1       X1     Y1     X1Y1     X1²      Y1²
2       X2     Y2     X2Y2     X2²      Y2²
...     ...    ...    ...      ...      ...
n       Xn     Yn     XnYn     Xn²      Yn²
Total   ΣXi    ΣYi    ΣXiYi    ΣXi²     ΣYi²

Sample Problem

A principal of a public high school wishes to investigate how well entrance examination scores predict the grade point average of freshman students. The data for a random sample of 15 freshman students are as follows:

Student   Entrance Score (X)   GPA (Y)   XY      X²      Y²
1 68 85 5780 4624 7225
2 56 80 4480 3136 6400
3 79 85 6715 6241 7225
4 53 79 4187 2809 6241
5 46 86 3956 2116 7396
6 80 87 6960 6400 7569
7 40 78 3120 1600 6084
8 69 83 5757 4761 6889
9 34 76 2584 1156 5776
10 26 75 1950 676 5625
11 76 88 6688 5776 7744
12 85 95 8075 7225 9025
13 52 78 4056 2704 6084
14 30 77 2310 900 5929
15 49 81 3969 2401 6561
Total 843 1233 70557 52525 101773

r = 0.858083 ≈ 0.858

Testing for the Significance of Pearson r

1. Ho: r = 0, or there is no significant relationship between entrance scores and GPA

Ha: r ≠ 0, or there is a significant relationship between entrance scores and GPA

2. level of significance = 0.05 and sample size n = 15

3. Test Statistics : t-test

4. Critical Region : Reject Ho if |tc| > 2.160

5. Computations: Compute the t-test statistic using the formula

tc = r√(n - 2) / √(1 - r²) = 0.858083 √13 / √(1 - 0.736306) = 6.025

6. Decision : Since tc > 2.160, therefore reject the Ho and conclude that there is significant relation
between entrance examination score and the grade point average of freshmen students.
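The whole computation, r from the column totals and then its t statistic, can be sketched in Python (an illustration, standard library only), using the entrance-score/GPA data above:

```python
# Pearson r and its t statistic for the entrance-score / GPA data.
import math

x = [68, 56, 79, 53, 46, 80, 40, 69, 34, 26, 76, 85, 52, 30, 49]  # entrance
y = [85, 80, 85, 79, 86, 87, 78, 83, 76, 75, 88, 95, 78, 77, 81]  # GPA
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sxx = sum(a * a for a in x)
syy = sum(b * b for b in y)

# r = [n*Sxy - Sx*Sy] / sqrt{[n*Sxx - Sx^2][n*Syy - Sy^2]}
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))

# t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with t(0.025, n - 2) = 2.160
t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
print(round(r, 3), round(t, 3))  # 0.858 6.025
```

The result exceeds the critical value 2.160 comfortably, so the decision to reject Ho is unchanged.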

Spearman Rank Correlation Coefficient

The spearman rank correlation coefficient is the best known measure of relationship between two
variables based on ranks (ordinal scale). It is applicable when quantitative measurements of the
variables are not normally distributed and could be ranked in two ordered series. Its formula is given by

Data Layout

n       X      Y      Rank of X   Rank of Y   di     di²
1       X1     Y1     .           .           d1     d1²
2       X2     Y2     .           .           d2     d2²
...     ...    ...    .           .           ...    ...
n       Xn     Yn     .           .           dn     dn²
Total                                                Σdi²

Sample Problem

An administrator wishes to determine significant relationship between the self- evaluation and
supervisors' evaluation of their faculty members. A random sample of 10 selected faculty members were
asked to rate their overall performance on a scale ranging from 1 to 5 (5 as the highest rate). Their rating
are given as

Faculty   Supervisors'     Self-Evaluation   Rank of X   Rank of Y   di    di²
Member    Evaluation (X)   (Y)
1 3.50 4.52 10 8 2 4
2 4.23 4.69 5 4 1 1
3 4.51 4.72 2 2 0 0
4 3.98 4.29 8 9 -1 1
5 4.35 4.70 4 3 1 1
6 4.05 4.60 7 6 1 1
7 4.19 4.59 6 7 -1 1
8 4.46 4.63 3 5 -2 4
9 4.63 4.75 1 1 0 0
10 3.75 4.20 9 10 -1 1
Total                                                                Σdi² = 14

rs = 1 - 6(14)/[10(10² - 1)] = 0.915152 ≈ 0.915

Testing for the Significance of Spearman rs


1. Ho: rs = 0, or there is no significant relationship between self-evaluation and supervisors' evaluation

Ha: rs ≠ 0, or there is a significant relationship between self-evaluation and supervisors' evaluation

2. Level of significance = 0.05 and sample size n = 10

3. Test Statistics : t-test

4. Critical Region : Reject Ho if |tc| > 2.306

5. Computations: Compute the t-test statistic using the formula

tc = rs√(n - 2) / √(1 - rs²) = 0.915152 √8 / √(1 - 0.837503) = 6.421

6. Decision : Since tc > 2.306, therefore reject Ho and conclude that there is a significant relationship
between the supervisors' evaluation and the self-evaluation of the faculty members
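The ranking and the rs formula can be sketched in Python (an illustration, standard library only), using the faculty-evaluation data above, where rank 1 goes to the highest rating as in the table and there are no tied ratings:

```python
# Spearman rank correlation for the faculty evaluation data.
sup = [3.50, 4.23, 4.51, 3.98, 4.35, 4.05, 4.19, 4.46, 4.63, 3.75]
self_eval = [4.52, 4.69, 4.72, 4.29, 4.70, 4.60, 4.59, 4.63, 4.75, 4.20]
n = len(sup)

def ranks(values):
    # rank 1 = highest value, matching the table (no ties in this data)
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

# di = (rank of X) - (rank of Y) for each faculty member
d = [rx - ry for rx, ry in zip(ranks(sup), ranks(self_eval))]
sum_d2 = sum(di * di for di in d)

# rs = 1 - 6 * sum(d^2) / [n (n^2 - 1)]
rs = 1 - 6 * sum_d2 / (n * (n * n - 1))
print(sum_d2, round(rs, 3))  # 14 0.915
```

This reproduces Σdi² = 14 and rs = 0.915 from the worked table.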

Phi Coefficient

The phi coefficient is a measure of association based on the chi-square statistic. It is applicable only to a 2x2 contingency table where the variables are genuinely dichotomous, such as gender (male or female), employment status (employed or unemployed), or religion (Catholic or non-Catholic). Its formula is given by

φ = √(X² / n)

where X² is the chi-square statistic and n is the total frequency.

For computational purposes, it is desirable to convert the above formula into the form

φ = (ad - bc) / √[(a + b)(c + d)(a + c)(b + d)]

The schematic 2 x 2 table is given below.

X variable      Y variable            Total
                1          2
1               a          b          a + b
2               c          d          c + d
Total           a + c      b + d      n

Sample Problem

A survey was conducted to determine if there is a significant association between gender and students' opinion on the parking policy of the university. A random sample of 200 students was selected from the office of the registrar, with the following results (expected frequencies in parentheses):

Gender        Favor (1)       Not Favor (0)    Total
Female (0)    a = 85 (81)     b = 35 (39)      a + b = 120
Male (1)      c = 50 (54)     d = 30 (26)      c + d = 80
Total         a + c = 135     b + d = 65       n = 200

The expected frequencies are computed as Eij = Ri x Cj / n and are shown beside the observed frequencies in the table above.

φ = √(1.519468 / 200) = 0.0871627 ≈ 0.087

Testing for the Significance of phi coefficient

1. Ho: φ = 0, or there is no significant association between gender and their opinion

Ha: φ ≠ 0, or there is a significant association between gender and their opinion

2. level of significance = 0.05 and sample size n = 200

3. Test Statistics : Chi-square test

4. Critical Region : Reject Ho if X2c > 3.841

5. Computations: Compute the Chi-square test statistics using the formula


X2c = {(85 - 81)²/81} + {(50 - 54)²/54} + {(35 - 39)²/39} + {(30 - 26)²/26} = 1.519468

X2c = 1.519

6. Decision : Since X2c < 3.841, therefore do not reject Ho and conclude that there is no significant
association between gender and their opinion on the university parking policy
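Both forms of the phi coefficient can be checked in Python (an illustration, standard library only), using the 2x2 parking-policy table above:

```python
# Phi coefficient for the 2x2 gender-by-opinion table.
import math

a, b = 85, 35   # Female: Favor, Not Favor
c, d = 50, 30   # Male:   Favor, Not Favor
n = a + b + c + d

# Computational form: phi = (ad - bc) / sqrt[(a+b)(c+d)(a+c)(b+d)]
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Equivalent chi-square form: phi = sqrt(X^2 / n), hence X^2 = n * phi^2
chi_sq = n * phi * phi
print(round(phi, 3), round(chi_sq, 3))  # 0.087 1.519
```

This confirms that the computational form agrees with the chi-square statistic of 1.519 obtained in step 5.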

Contingency/Cramer's V Coefficient

Contingency Coefficient

The contingency coefficient is the oldest and most generally useful measure of association based on the chi-square statistic. It has general applicability, preferably with nominal data arranged in more than a 2x2 contingency table. Its formula is given by

C = √[X² / (X² + n)]

where X² is the chi-square statistic and n is the total frequency.

Cramer's V Coefficient

Cramer's V coefficient is a measure of the degree of association or relationship between two sets of nominal data. It is useful when we have only categorical information about one or both sets of attributes. Its formula is based on the chi-square statistic and is given by

V = √[X² / (n(L - 1))]

where X² is the chi-square statistic, n is the total frequency, and L is the minimum of the number of rows and columns.
Sample Problem

A study was conducted to determine whether there is a significant association between the highest educational attainment of the father/mother and the number of children. A sample of 525 households was randomly selected, with the following results:

Educational     Number of Children
Attainment      0 to 1     2 to 3     4 and above   Total
Elementary      50 (64)    70 (95)    130 (90)      250
High School     25 (39)    88 (58)    40 (55)       153
College         60 (31)    42 (46)    20 (44)       122
Total           135        200        190           525

(Expected frequencies are shown in parentheses.)

Testing for the Significance of the Association

1. Ho: There is no significant association between educational attainment and number of children

Ha: There is a significant association between educational attainment and number of children

2. Level of significance = 0.05 and sample size n = 525

3. Test Statistics : Chi-square test

4. Critical Region : Reject Ho if X2c > 9.488

5. Computations: Compute the Chi-square test statistics using the formula

X2c = {(50 - 64)²/64} + {(25 - 39)²/39} + {(60 - 31)²/31} + {(70 - 95)²/95} + {(88 - 58)²/58} +

{(42 - 46)²/46} + {(130 - 90)²/90} + {(40 - 55)²/55} + {(20 - 44)²/44}

X2c = 91.54

6. Decision : Since X2c > 9.488, therefore reject Ho and conclude that there is a significant association
between educational attainment and number of children in the household

The degree of association computed using the contingency coefficient is

C = √[91.54 / (91.54 + 525)] = 0.385329 ≈ 0.385

The degree of association computed using Cramer's V coefficient is

V = √[91.54 / (525 x 2)] = 0.295264 ≈ 0.295
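The whole chain, chi-square statistic then C and V, can be sketched in Python (an illustration, standard library only), using the education-by-children table above with exact (unrounded) expected frequencies:

```python
# Contingency coefficient C and Cramer's V for the 3x3 table.
import math

observed = [[50, 70, 130],   # Elementary
            [25, 88, 40],    # High School
            [60, 42, 20]]    # College

row_t = [sum(r) for r in observed]
col_t = [sum(c) for c in zip(*observed)]
n = sum(row_t)

# Chi-square statistic from exact expected frequencies Eij = Ri * Cj / n
chi_sq = sum((observed[i][j] - row_t[i] * col_t[j] / n) ** 2
             / (row_t[i] * col_t[j] / n)
             for i in range(3) for j in range(3))

# C = sqrt(X^2 / (X^2 + n)); V = sqrt(X^2 / (n (L - 1))), L = min(rows, cols)
c_coef = math.sqrt(chi_sq / (chi_sq + n))
v_coef = math.sqrt(chi_sq / (n * (3 - 1)))
print(round(chi_sq, 2), round(c_coef, 3), round(v_coef, 3))  # 91.54 0.385 0.295
```

This reproduces X²c = 91.54, C = 0.385, and V = 0.295 from the solution.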

Testing the Significance of the Correlation Coefficient

1. Ho: r = 0    Ha: r ≠ 0 (or r < 0, or r > 0)

2. Specify the level of significance and the sample size n

3. Test Statistics : t-test

4. Critical Region : Reject Ho if

tc > ttab with n - 2 degrees of freedom for Ha: r > 0
tc < -ttab with n - 2 degrees of freedom for Ha: r < 0
|tc| > ttab with n - 2 degrees of freedom for Ha: r ≠ 0

Note: Reject Ho if the computed p-value of the test statistic is less than the
specified level of significance.

5. Computations: Compute the t-test statistic using the formula

tc = r√(n - 2) / √(1 - r²)

6. Decision:
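The formula in step 5 can be wrapped in a small helper (an illustrative sketch in Python, standard library only); the critical value still has to be read from a t table with n - 2 degrees of freedom. Applying the standard formula t = r√(n - 2)/√(1 - r²) to this module's two coefficients gives t values of about 6.0 and 6.4, both well past their critical values:

```python
# t statistic for testing Ho: rho = 0 given a correlation r from n pairs.
import math

def t_for_r(r, n):
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# The module's examples: Pearson r = 0.858083 (n = 15), Spearman rs = 0.915152 (n = 10)
print(round(t_for_r(0.858083, 15), 3))  # 6.025
print(round(t_for_r(0.915152, 10), 3))  # 6.421
```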

Note : The chi-square statistic must be significant before the Phi, Contingency, and Cramer's V
coefficients are computed, respectively.

Learning Activities

Directions
1. Follow the procedure/steps in the hypothesis testing, analysis and interpretation of data.

2. Use 0.05 level of significance

3. Based on the data collected in the learning activities of the first module, perform the
following:

a. What is the degree of relationship between gender and academic performance in math?

b. What is the degree of relationship between academic performance in math and English?

c. What is the degree of relationship between gender and the presence of computer at home?

d. What is the degree of relationship between academic performance in math and Science when
both variables were converted into an ordinal scale?

e. What is the degree of relationship between Socio-Economic Status and highest educational
attainment of their parents?

http://faculty.vassar.edu/lowry/utest.html (mann-whitney)

http://faculty.vassar.edu/lowry/kw3.html (kruskal wallis)

http://faculty.vassar.edu/lowry/corr_rank.html (spearman rank)

http://faculty.vassar.edu/lowry/corr_stats.html (correlation)
