Introduction
The previous lesson assumed that the population from which the samples were drawn is normally distributed. Sometimes, however, the investigator has no prior knowledge of the distribution of the sampled population and can only guess at it, which may lead to serious errors in the decision.
Nonparametric, or distribution-free, statistics do not identify or specify the distribution of the population from which the samples were drawn. A nonparametric test is a statistical procedure in which no hypothesis is made about specific values of the population parameters.
Objectives
After completing this module, you are expected to be able to do the following, among others:
2. know when and how to use the chi-square, Wilcoxon signed-rank, Mann-Whitney U, Friedman, and Kruskal-Wallis tests.
Nonparametric tests offer the following advantages:
1. most nonparametric tests involve easy computation and fewer mathematical details, and are easier to understand and apply;
2. they may be used to test data that are measured only on a nominal or ordinal scale;
3. many nonparametric tests make it possible to work with small sample sizes, which is very helpful to an experimenter conducting a pilot study or dealing with rare types of characteristics;
4. in multi-sample cases, nonparametric tests are available for testing hypotheses concerning observations drawn from several different populations without checking assumptions about those populations;
5. probability statements associated with most nonparametric tests are exact probabilities, which are independent of the shape of the population.
The major disadvantage of nonparametric tests is that, if all the parametric assumptions are in fact satisfied and the data are of the required level of measurement, nonparametric tests disregard much of the information contained in the data. It is as though the information conveyed by the figures used in the parametric test were ignored by its nonparametric counterpart. In general, nonparametric tests applied to data that are normal or nearly normal are not as powerful as the equivalent parametric methods.
Chi-square Test
The chi-square test has three important functions, namely: the goodness-of-fit test, the test for independence, and the test for the equality of variance. This module focuses only on the application of the chi-square test for independence. The chi-square test for independence is used when the researcher is interested in detecting a significant association or relationship between two nominal (categorical) variables.
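1. Summarize the data in a contingency table with r rows and c columns.
2. Determine the row and column totals.
3. Compute the expected frequency of each cell using the formula
$$E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n}$$
4. Compute the chi-square statistic from the observed frequencies $O_{ij}$ and the expected frequencies $E_{ij}$:
$$\chi^2_c = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
5. Reject the null hypothesis if the computed chi-square is greater than the tabulated chi-square at the $\alpha$ level of significance and (r-1)(c-1) degrees of freedom.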
Sample Problem
A student council conducted a survey to study whether gender and opinion on the proposal "Comfort Room with fee" are independent. Two hundred fifty-five students were randomly selected and interviewed, with the following results:

                       Opinion
Gender     In Favor   Opposed   Undecided   Total
Female        50         70        30        150
Male          25         30        50        105
Total         75        100        80        255
Solution
1. Ho: Gender and opinion about the CR with fee are independent.
Ha: Gender and opinion about the CR with fee are not independent.
2. Level of significance = 0.05 and sample size n = 255
$$\chi^2_c = \frac{(50-44)^2}{44} + \frac{(25-31)^2}{31} + \frac{(70-59)^2}{59} + \frac{(30-41)^2}{41} + \frac{(30-47)^2}{47} + \frac{(50-33)^2}{33}$$
6. Decision: Since $\chi^2_c > 5.991$, reject Ho and conclude that gender and opinion concerning the proposal are not independent.
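The chi-square computation for the gender-by-opinion table can be sketched in Python as follows. This is a minimal illustration, not part of the original module: the function name `chi_square_independence` is made up for this example, and the code keeps the expected frequencies unrounded, so its result differs slightly from a hand computation that rounds them to whole numbers. The critical value 5.991 is the one quoted in the decision step.

```python
def chi_square_independence(observed):
    """Return (chi2, dof, expected) for a table of observed counts.

    `observed` is a list of rows; each row is a list of cell counts.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency of a cell = (row total)(column total) / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Rows: Female, Male. Columns: In Favor, Opposed, Undecided.
observed = [[50, 70, 30], [25, 30, 50]]
chi2, dof, expected = chi_square_independence(observed)

CRITICAL_VALUE = 5.991  # tabulated chi-square at alpha = 0.05, df = 2
reject_h0 = chi2 > CRITICAL_VALUE
print(f"chi-square = {chi2:.3f}, df = {dof}, reject Ho: {reject_h0}")
```

Keeping the expected frequencies as fractions avoids rounding error; the decision is the same either way.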
The Wilcoxon signed-rank test applies to two dependent (related) samples and aims to detect a significant difference between the two groups. The test takes into account both the direction and the magnitude of the observed differences between the groups. It requires the data to be continuous, but it does not assume normality. It is the nonparametric alternative to the t-test for two dependent (correlated) samples; hence, the data layout is the same as for that t-test.
Procedure
5. Determine T+, the sum of all positive ranks, and T-, the sum of all negative ranks. Let T denote the smaller of T+ and T-.
6. When the sample size is small (n ≤ 15), the critical value of T can be found in Table A.14 of the book "Business Statistics: A Contemporary Decision Making" 2nd edition by Ken Black (this book is available in the library). If the computed T is less than the critical value of T in Table A.14, reject the null hypothesis.
7. For large samples (n > 15), T is approximately normally distributed and the test statistic is given by the formula
$$z = \frac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$
Sample Problem
A researcher wishes to determine whether there is a systematic difference between the readings of two digital weighing scales. The following data were obtained:
Sample No.   1  2  3  4  5  6  7  8
Ranks        2  -4.5  2  -4.5  2  6
Use the 0.05 level of significance to test whether there is a significant difference between the readings of the two scales (the weights are expressed in grams).
Solution
5. Computations:
The sum of the positive ranks is T+ = 12 and the sum of the negative ranks is T- = 9, so T = 9. The critical value of T is 1.
6. Decision: Since the computed T is greater than the tabular T value, there is not enough evidence to show that the two scales are different.
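The signed-rank bookkeeping can be sketched in Python as follows. The scale readings themselves are not shown above, so the `differences` list is hypothetical: it was chosen only so that its signed ranks match the ones in the problem (2, -4.5, 2, -4.5, 2, 6), with two zero differences dropped. The helper names are made up for this example, and `wilcoxon_z` uses the usual normal approximation for large samples, which is assumed here to be the formula intended in step 7.

```python
import math


def average_ranks(values):
    """Rank values from smallest (rank 1) up; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_t(differences):
    """Return (T_plus, T_minus, T, n) after dropping zero differences."""
    nonzero = [d for d in differences if d != 0]
    ranks = average_ranks([abs(d) for d in nonzero])
    t_plus = sum(r for d, r in zip(nonzero, ranks) if d > 0)
    t_minus = sum(r for d, r in zip(nonzero, ranks) if d < 0)
    return t_plus, t_minus, min(t_plus, t_minus), len(nonzero)


def wilcoxon_z(t, n):
    """Normal approximation for large n (assumed form of the step 7 formula)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (t - mean) / sd


# Hypothetical differences (scale 1 minus scale 2), in grams.
differences = [1, -2, 1, 0, -2, 1, 3, 0]
t_plus, t_minus, t, n = wilcoxon_t(differences)
print(f"T+ = {t_plus}, T- = {t_minus}, T = {t}, n = {n}")
```

With n = 6 nonzero differences the small-sample table applies, so `wilcoxon_z` is included only to show the large-sample case.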
Procedure
1. Assign the smaller of the two groups as sample 1. If the sample sizes are equal, either group may be assigned as sample 1.
2. Rank the scores of both groups together in a single series. The smallest score gets rank 1, the next smallest rank 2, and so on. In the case of ties, each tied value gets the average of the tied ranks.
3. Determine W1, the sum of the ranks in sample 1, and W2, the sum of the ranks in sample 2.
4. Compute
$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - W_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - W_2$$
and let U be the smaller of $U_1$ and $U_2$.
5. Use Table A.13 of the book "Business Statistics: A Contemporary Decision Making" 2nd edition by Ken Black (this book is available in the library). For n1 and n2 less than or equal to 10, locate the value of U in the left column. The entry at the intersection of U and n1 is the p-value for a one-tailed test. For a two-tailed test, double the p-value in the table.
6. Reject the null hypothesis if the p-value of U is less than the specified level of significance.
Sample Problem
A classroom teacher wishes to compare the performance of students in statistics under two methods of teaching. Two independent samples with a combined size of 15 (7 students under Method A and 8 under Method B) were randomly selected, and the following data were obtained. Is there a significant difference between the performance of students under the two methods of teaching statistics? Use the 0.05 level of significance.

Method A   82  81  86  75  77  83  85        Total
Ranks       7   6  11   1   3   8  10        W1 = 46
Method B   76  90  89  87  84  79  88  78
Ranks       2  15  14  12   9   5  13   4    W2 = 74
Solution
1. Ho: There is no significant difference between the two methods of teaching.
Hence U = 18, and Table A.13 yields a p-value of 0.1405. Since this is a two-tailed test, the p-value is doubled, giving a final p-value of 0.2810.
6. Decision: Since the p-value of 0.2810 is greater than the specified level of significance, there is not sufficient evidence to reject the null hypothesis. The data do not show a difference between the two methods of teaching in their effect on the performance of students in statistics.
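The rank sums and the value of U in this problem can be reproduced with a short Python sketch. The function names are made up for this example, and the U formulas are the common textbook form, $U_1 = n_1 n_2 + n_1(n_1+1)/2 - W_1$ and likewise for $U_2$, with U taken as the smaller of the two; this is assumed to be what the solution used, since it gives the same U = 18. The p-value still has to be read from Table A.13.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) up; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(sample1, sample2):
    """Return (W1, W2, U), where U is the smaller of U1 and U2."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))  # rank both groups together
    w1 = sum(ranks[:n1])
    w2 = sum(ranks[n1:])
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - w1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - w2
    return w1, w2, min(u1, u2)


method_a = [82, 81, 86, 75, 77, 83, 85]      # sample 1 (the smaller group)
method_b = [76, 90, 89, 87, 84, 79, 88, 78]  # sample 2
w1, w2, u = mann_whitney_u(method_a, method_b)
print(f"W1 = {w1}, W2 = {w2}, U = {u}")
```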
The usual analysis of variance (ANOVA) is applicable when the populations involved are normally distributed and the data are measured on an interval scale. However, when the data are measured on an ordinal scale, the Friedman two-way analysis of variance is more appropriate. This test was developed by M. Friedman in 1937.
The Friedman two-way ANOVA tests the hypothesis that the k repeated measures or matched groups come from populations with identical medians.
1. Cast the scores in a two-way table having n rows (number of paired samples) and k columns (number of groups to be compared).
2. Rank the scores in each row (1 to k).
3. Determine the sum of the ranks of each column, denoted by C1, C2, ..., Ck.
4. Compute the value of Fr using the formula
$$F_r = \frac{12}{nk(k+1)} \sum_{j=1}^{k} C_j^2 - 3n(k+1)$$
5. Compare the computed Fr with the tabular chi-square value with (k-1) degrees of freedom. Reject the null hypothesis if the computed Fr is greater than or equal to the tabular chi-square value.
Sample Problem
The following data represent the grades of 8 students in Math, Science, and English. Test the hypothesis that there is no significant difference between the performance of the students in the three subjects.

                  Math           Science          English
Student No.   Grade   Rank    Grade   Rank    Grade   Rank
1              86      1       85      2       79      3
2              82      2       80      3       86      1
3              79      3       87      1       80      2
4              90      1       89      2       85      3
5              76      3       82      2       93      1
6              82      3       86      2       91      1
7              88      1       83      3       84      2
8              78      3       86      1       82      2
Total                 17              16              15
Solution
5. Computations:
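C1 = 17, C2 = 16, C3 = 15
$$F_r = \frac{12}{(8)(3)(4)}\left(17^2 + 16^2 + 15^2\right) - 3(8)(4) = 96.25 - 96 = 0.25$$
6. Decision: Since the computed Fr is less than 5.991, do not reject the null hypothesis and conclude that there is no significant difference between the three subjects.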
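The Friedman computation for the grades table can be checked with the following Python sketch. The function names are made up for this example. As in the table, the highest grade in each row gets rank 1, and the simple ranking helper assumes there are no ties within a row, which holds for these data. The formula used is the standard $F_r = \frac{12}{nk(k+1)}\sum C_j^2 - 3n(k+1)$, which reproduces the Fr = 0.25 reported in the solution.

```python
def rank_row_descending(row):
    """Rank one row so that the highest score gets rank 1 (assumes no ties)."""
    ordered = sorted(row, reverse=True)
    return [ordered.index(value) + 1 for value in row]


def friedman_fr(scores):
    """Return (column rank sums, Fr) for a table with n rows and k columns."""
    n = len(scores)
    k = len(scores[0])
    rank_table = [rank_row_descending(row) for row in scores]
    col_sums = [sum(col) for col in zip(*rank_table)]
    fr = 12 / (n * k * (k + 1)) * sum(c ** 2 for c in col_sums) - 3 * n * (k + 1)
    return col_sums, fr


# Each row is one student: [Math, Science, English]
grades = [
    [86, 85, 79],
    [82, 80, 86],
    [79, 87, 80],
    [90, 89, 85],
    [76, 82, 93],
    [82, 86, 91],
    [88, 83, 84],
    [78, 86, 82],
]
col_sums, fr = friedman_fr(grades)

CRITICAL_VALUE = 5.991  # tabulated chi-square at alpha = 0.05, df = k - 1 = 2
print(f"C = {col_sums}, Fr = {fr:.2f}, reject Ho: {fr >= CRITICAL_VALUE}")
```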
The Kruskal-Wallis one-way analysis of variance is a useful test for deciding whether k independent samples measured on an ordinal scale come from populations with the same median. This test was developed by William H. Kruskal and W. Allen Wallis in 1952.
Procedure
1. Cast the data into a two-way table.
2. Rank all the numerical values of the k groups in a single series from 1 to N. The smallest value gets rank 1, the next smallest rank 2, and so on. In the case of ties, each tied value gets the average of the tied ranks.
3. Determine the rank total Rj of each group.
4. Compute the value of H using the formula
$$H = \frac{12}{N(N+1)} \sum_{j=1}^{k} \frac{R_j^2}{n_j} - 3(N+1)$$
where $n_j$ is the number of observations in group j.
Sample Problem
The following data represent the scores of a random sample of students from each section on the first long examination. Is there a significant difference between the performance of students in the three sections? Use the 0.01 level of significance.

Section          Samples                          Total
A   Scores   87  45  75  65  82
    Ranks    18   3  14  11  17                    63
B   Scores   78  57  66  49  56  46
    Ranks    16   8  12   5   7   4                52
C   Scores   53  76  59  73  43  32  62
    Ranks     6  15   9  13   2   1  10            56
Solution
6. Decision: Since H is less than the tabulated chi-square at (k-1) degrees of freedom, do not reject the null hypothesis and conclude that there is no significant difference between the performance of students in the three sections on the first long examination.
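The value of H is not shown in the solution above, so the following Python sketch computes it from the section scores. The function names are made up for this example, the formula is the standard $H = \frac{12}{N(N+1)}\sum R_j^2/n_j - 3(N+1)$ without a tie correction, and the critical value 9.210 is the standard chi-square table entry for $\alpha = 0.01$ with 2 degrees of freedom rather than a figure taken from the module.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) up; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Return (rank totals per group, H) for a list of independent samples."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)  # rank all N values in a single series
    total_n = len(pooled)
    rank_totals = []
    start = 0
    for group in groups:
        rank_totals.append(sum(ranks[start:start + len(group)]))
        start += len(group)
    weighted = sum(r ** 2 / len(g) for r, g in zip(rank_totals, groups))
    h = 12 / (total_n * (total_n + 1)) * weighted - 3 * (total_n + 1)
    return rank_totals, h


section_a = [87, 45, 75, 65, 82]
section_b = [78, 57, 66, 49, 56, 46]
section_c = [53, 76, 59, 73, 43, 32, 62]
rank_totals, h = kruskal_wallis_h([section_a, section_b, section_c])

CRITICAL_VALUE = 9.210  # chi-square table, alpha = 0.01, df = 2 (assumed table value)
print(f"R = {rank_totals}, H = {h:.3f}, reject Ho: {h > CRITICAL_VALUE}")
```

The rank totals agree with the table (63, 52, and 56), and H falls well below the critical value, in line with the decision.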
Learning Activities
Directions:
1. Follow the procedure/steps in the hypothesis testing, analysis and interpretation of data.
2. Use 0.05 level of significance
4. Based on the data collected in the learning activities of the first module, perform the
following:
a. Test the hypothesis that there is no significant difference between the performance of
students in Math and English.
b. Test the hypothesis that there is no significant difference between the performance of
male and female students in Math.
c. Test the hypothesis that there is no significant difference between the performance of
students in Math, English and Science.
d. Test the hypothesis that there is no significant difference between the performance of
students in Math when grouped according to year/grade level.
e. Test the hypothesis that there is no significant association between the year/grade level
and socio-economic status of the students.
MODULE 7:
Introduction
Many statistical investigations aim to determine whether a significant relationship or association exists between two or more variables; the correlation coefficient is the appropriate statistical tool for this purpose. Correlation analysis primarily tells us the magnitude, or degree, to which two variables are related. It is useful in expressing how well one variable predicts the value of another. It also tells us whether the variability of one variable indicates the variability of another. This module deals with the most common types of relationship between two variables at various levels of measurement: interval, ordinal, or nominal. It will also help you compute and interpret the degree of relationship between these variables.
Objectives
After completing this module, you are expected to be able to do the following, among others:
1. determine the appropriate tool to measure the degree of relationship between two variables;
Coefficient          Interpretation
-1.00                perfect negative correlation
-0.76 to -0.99       very high negative correlation
-0.51 to -0.75       high negative correlation
-0.26 to -0.50       moderately small negative correlation
-0.01 to -0.25       very small negative correlation
 0.00                no correlation
 0.01 to 0.25        very small positive correlation
 0.26 to 0.50        moderately small positive correlation
The Pearson product-moment correlation coefficient can be easily obtained using a scientific calculator with an LR (linear regression) mode.
Data Layout
N       X      Y      XY       X^2      Y^2
1       X1     Y1     .        .        .
2       X2     Y2     .        .        .
.       .      .      .        .        .
N       Xn     Yn     .        .        .
Total   ΣXi    ΣYi    ΣXiYi    ΣXi^2    ΣYi^2
Sample Problem
The principal of a public high school wishes to investigate how well entrance examination scores are related to the grade point average of freshman students. The data for a random sample of 15 freshman students are as follows:
Student   Entrance Score (X)   GPA (Y)     XY      X^2      Y^2
1                68               85      5780     4624     7225
2                56               80      4480     3136     6400
3                79               85      6715     6241     7225
4                53               79      4187     2809     6241
5                46               86      3956     2116     7396
6                80               87      6960     6400     7569
7                40               78      3120     1600     6084
8                69               83      5727     4761     6889
9                34               76      2584     1156     5776
10               26               75      1950      676     5625
11               76               88      6688     5776     7744
12               85               95      8075     7225     9025
13               52               78      4056     2704     6084
14               30               77      2310      900     5929
15               49               81      3969     2401     6561
Total           843             1233     70557    52525   101773
r = 0.858083 ≈ 0.858
tc = 3.09356994 ≈ 3.094
6. Decision: Since tc > 2.160, reject Ho and conclude that there is a significant relationship between the entrance examination score and the grade point average of freshman students.
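The value of r can be reproduced from the column totals of the table with a short Python sketch. The function name is made up for this example, and the formula is the standard computational form of the Pearson coefficient, $r = \dfrac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$, which is assumed to be what the calculator's LR mode evaluates.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation from the raw-score sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


entrance_score = [68, 56, 79, 53, 46, 80, 40, 69, 34, 26, 76, 85, 52, 30, 49]
gpa = [85, 80, 85, 79, 86, 87, 78, 83, 76, 75, 88, 95, 78, 77, 81]

r = pearson_r(entrance_score, gpa)
print(f"r = {r:.6f}")
```

On the interpretation scale used in this module, a coefficient of this size falls in the "very high positive" range.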
The Spearman rank correlation coefficient is the best-known measure of the relationship between two variables based on ranks (ordinal scale). It is applicable when the quantitative measurements of the variables are not normally distributed but can be ranked in two ordered series. Its formula is given by
$$r_s = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$
where $d_i$ is the difference between the rank of $X_i$ and the rank of $Y_i$.
Data Layout
n       X      Y      Rank of X   Rank of Y   di     di^2
1       X1     Y1     .           .           d1     d1^2
2       X2     Y2     .           .           .      d2^2
.       .      .      .           .           .      .
.       .      .      .           .           .      .
n       Xn     Yn     .           .           dn     dn^2
Total                                                Σdi^2
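The data layout above maps directly onto a few lines of Python. This is a sketch only: the function name and the two rank lists are hypothetical values made up for illustration, and the formula is the standard one for untied ranks, $r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2-1)}$.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman rank correlation from two lists of ranks (no tie correction)."""
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical ranks for five subjects, for illustration only.
rank_x = [1, 2, 3, 4, 5]
rank_y = [2, 1, 3, 5, 4]

rs = spearman_rs(rank_x, rank_y)
print(f"rs = {rs:.3f}")
```

When many observations are tied, as can happen with ratings on a short scale, this shortcut formula is only approximate.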
Sample Problem
An administrator wishes to determine whether there is a significant relationship between the self-evaluation and the supervisors' evaluation of faculty members. A random sample of 10 faculty members were asked to rate their overall performance on a scale from 1 to 5 (5 being the highest rating). Their ratings are given as
rs = 0.915152 ≈ 0.915
tc = 3.303047108 ≈ 3.303
6. Decision: Since tc > 2.306, reject Ho and conclude that there is a significant relationship between the supervisors' evaluation and the self-evaluation of the faculty members.
Phi Coefficient
The phi coefficient is a measure of association based on the chi-square statistic. It is applicable only to a 2x2 contingency table in which the variables are genuinely dichotomous, such as gender (male or female), employment status (employed or unemployed), religion (Catholic or non-Catholic), etc. Its formula is given by
$$\phi = \sqrt{\frac{\chi^2}{n}}$$
For computation purposes, it is desirable to convert the above formula into the form
$$\phi = \frac{ad - bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$$
where a, b, c, and d are the four cell frequencies and n = a + b + c + d.
The schematic 2 x 2 table is given below.
                  Y variable
X variable      1        2        Total
1               a        b        a+b
2               c        d        c+d
Total           a+c      b+d      n
Sample Problem
A survey was conducted to determine whether there is a significant association between gender and students' opinion on the parking policy of the university. A random sample of 200 students was selected from the office of the registrar, with the following results:
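                        Opinion
Gender          Favor (1)       Not Favor (0)    Total
Female (0)      a = 85 [81]     b = 35 [39]      a+b = 120
Male (1)        c = 50 [54]     d = 30 [26]      c+d = 80
Total           a+c = 135       b+d = 65         n = 200
Expected frequencies are shown in brackets.
$$\phi = \frac{(85)(30) - (35)(50)}{\sqrt{(120)(80)(135)(65)}} = 0.087162727 \approx 0.087$$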
$\chi^2_c$ = 1.519
6. Decision: Since $\chi^2_c < 3.841$, do not reject Ho and conclude that there is no significant association between gender and opinion on the university parking policy.
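The phi coefficient and the chi-square value for the parking-policy table can be checked with the following Python sketch. The function name is made up for this example; the computational form $(ad - bc)/\sqrt{(a+b)(c+d)(a+c)(b+d)}$ and the relation $\chi^2 = n\phi^2$ are the standard ones for a 2 x 2 table and are assumed to be the formulas intended above.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    numerator = a * d - b * c
    denominator = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return numerator / denominator


# Female: 85 favor, 35 not favor. Male: 50 favor, 30 not favor.
a, b, c, d = 85, 35, 50, 30
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi2 = n * phi ** 2  # chi-square recovered from phi for a 2 x 2 table

CRITICAL_VALUE = 3.841  # tabulated chi-square at alpha = 0.05, df = 1
print(f"phi = {phi:.4f}, chi-square = {chi2:.3f}, reject Ho: {chi2 > CRITICAL_VALUE}")
```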
Contingency/Cramer's V Coefficient
Contingency Coefficient
The contingency coefficient is the oldest and most useful measure of association based on the chi-square statistic. It is generally applicable, preferably to nominal data arranged in a contingency table larger than 2x2. Its formula is given by
$$C = \sqrt{\frac{\chi^2}{n + \chi^2}}$$
Cramer's V Coefficient
Cramér's V coefficient is a measure of the degree of association or relationship between two sets of nominal data. It is useful when we have only categorical information about one or both sets of attributes. Its formula is based on the chi-square statistic and is given by
$$V = \sqrt{\frac{\chi^2}{n(L - 1)}}$$
where $\chi^2$ is the chi-square statistic, n is the total frequency, and L is the smaller of the number of rows and the number of columns.
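Both coefficients follow directly from a chi-square value, as the Python sketch below illustrates. The function names are made up for this example, and the formulas are the standard ones, $C = \sqrt{\chi^2/(n + \chi^2)}$ and $V = \sqrt{\chi^2/(n(L-1))}$. For data, the sketch reuses the 2 x 3 gender-by-opinion table from the chi-square section of the previous module, with unrounded expected frequencies; that choice is only for illustration.

```python
import math


def chi_square(observed):
    """Chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for o, c in zip(row, col_totals)
    )


def contingency_coefficient(chi2, n):
    """C = sqrt(chi2 / (n + chi2))."""
    return math.sqrt(chi2 / (n + chi2))


def cramers_v(chi2, n, rows, cols):
    """V = sqrt(chi2 / (n (L - 1))), with L the smaller of rows and cols."""
    smaller = min(rows, cols)
    return math.sqrt(chi2 / (n * (smaller - 1)))


# Rows: Female, Male. Columns: In Favor, Opposed, Undecided.
observed = [[50, 70, 30], [25, 30, 50]]
n = sum(sum(row) for row in observed)
chi2 = chi_square(observed)

c_coef = contingency_coefficient(chi2, n)
v_coef = cramers_v(chi2, n, rows=len(observed), cols=len(observed[0]))
print(f"chi-square = {chi2:.3f}, C = {c_coef:.3f}, V = {v_coef:.3f}")
```

For a table with two rows, L - 1 = 1, so V reduces to $\sqrt{\chi^2/n}$, the same expression as the phi coefficient.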
Sample Problem
A study was conducted to determine whether there is a significant association between the highest educational attainment of the father/mother and the number of siblings. A sample of 525 households was randomly selected, with the following results:
Expected Frequency
1. Ho: There is no significant association between educational attainment and number of children.
Ha: There is a significant association between educational attainment and number of children.
$$\chi^2_c = \frac{(50-64)^2}{64} + \frac{(25-39)^2}{39} + \frac{(60-31)^2}{31} + \frac{(70-95)^2}{95} + \frac{(88-58)^2}{58} + \cdots$$
6. Decision: Since $\chi^2_c > 9.488$, reject Ho and conclude that there is a significant association between educational attainment and number of children in the household.
C = 0.29526401 ≈ 0.295
1. Ho: r = 0    Ha: r ≠ 0 (two-tailed), or r < 0 or r > 0 (one-tailed)
Note: Reject Ho if the computed level of significance (p-value) of the test statistic is less than the specified level of significance.
6. Decision:
Note: The phi, contingency, and Cramér's V coefficients should be computed only after the chi-square statistic has been found to be significant.
Learning Activities
Directions
1. Follow the procedure/steps in the hypothesis testing, analysis and interpretation of data.
3. Based on the data collected in the learning activities of the first module, perform the
following:
a. What is the degree of relationship between gender and academic performance in math?
b. What is the degree of relationship between academic performance in math and English?
c. What is the degree of relationship between gender and the presence of computer at home?
d. What is the degree of relationship between academic performance in math and Science when
both variables were converted into an ordinal scale?
e. What is the degree of relationship between Socio-Economic Status and highest educational
attainment of their parents?
http://faculty.vassar.edu/lowry/utest.html (mann-whitney)
http://faculty.vassar.edu/lowry/corr_stats.html (correlation)