
COMPUTER-AIDED DATA

ANALYSIS FOR RESEARCH


Gabino P. Petilos, Ph.D.
Education Supervisor II
Commission on Higher Education
Regional Office VIII
ROLE OF STATISTICS IN RESEARCH

Flow of the research process: PROBLEM → Identification of Variables (Variable A, Variable B, Variable C, ... etc.) → Measurement of Variables (direct or indirect) → Collection of Data → Analysis of Data → Interpretation of Data


Variable
A characteristic or attribute of persons or
objects that can assume different values

Constant
A characteristic or attribute of persons
that does not change
Illustration 1.

Population: Public School Teachers in Region 8

• Gender
• Age
• Length of Service
• Marital Status
• Salary
• Educational Attainment
• Work Performance
• Religion
• Level of Motivation
• Attitude towards Teaching
• Tenure
Illustration 2.

Population: Senior High School Students currently enrolled in Region 8
• Gender
• Age
• Attitude towards Schooling
• Level of Interest in Math
• Math Ability
• Reading Ability
• Type of High School Enrolled in
• Curriculum (Strand)
Variables may be:

 Quantitative – a variable that


assumes numerical values

 Qualitative – a variable that cannot


assume numerical values (only
attributes)
Quantitative Variables

 Age
 Length of Service
 Salary
 Work Performance (scores)
 Level of Motivation (scores)
 Attitude towards Teaching
(scores)
Qualitative Variables
• Gender
• Marital Status
• Educational Attainment
• Religion
• Tenure
Classification of Variables
 According to FUNCTIONAL RELATIONSHIP
Independent Variable (or variate) –
variable that explains the variation of
another variable;

Dependent Variable (response or


criterion variable) – variable that is
influenced by another variable
Temporal Sequence

The independent variable X precedes the dependent variable Y:  X → Y
Classification of Variables

 According to CONTINUITY OF SCALE


Continuous Variable – variable that can assume an
unlimited number of intermediate
values within a specified range of
values.

Examples: Height, Weight, Age, Teaching Experience


Attitude towards Teaching

Discrete Variable – variable that can take on only


designated values (finite or
countable)

Examples: Number of Children in a Family,


Number of service vehicles of government
agencies
Classification of Variables

 According to SCALE OF MEASUREMENT


Measurement – the assignment of numbers to the categories of a variable according to rules (may be an arbitrary rule or a standard rule)
Illustrations:
Variable: Sex
Categories: Male and Female
Measurement (Arbitrary): Assign 1 if sex is male
Assign 0 if sex is female
Variable: Age
Categories: Years
Measurement (standard): Use the number of years as
measure of age

Variable: Attitude towards Teaching


Categories: Very Positive to Very Negative
Measurement (Arbitrary): Use a 5 point scale
SCALES OF MEASUREMENT
 NOMINAL SCALE
 ORDINAL SCALE
 INTERVAL SCALE
 RATIO SCALE
NOMINAL Scale (KEY WORD: LABEL)

• Establishes equivalence or difference


between the attributes of the objects or
respondents

• In this scale, numbers are used as mere


labels of the categories of the variable;

• The numbers cannot be meaningfully


ordered
NOMINAL Scale (KEY WORD: LABEL)

Examples: Sex: 1 – male; 0 – female


Here, 1 & 0 serve only as labels of the
categories of sex.

Religion: 1 – Roman Catholic


2 – Protestant
3 – INC
4 – Others

Marital Status:
1 – single; 2 – married; 3 – others
ORDINAL Scale (KEY WORD: RANK)

• This scale possesses all characteristics of


the nominal scale, i.e., numbers are
used as labels

• The numbers can be meaningfully


ordered

• But difference between successive


categories may not be equal
ORDINAL Scale (KEY WORD: RANK)

Examples: SES: 1 – Low; 2 – Average; 3 - High

Educational Attainment:
3 – Doctorate Degree Holder
2 – Master’s Degree Holder
1 – Bachelor’s Degree Holder

Salary Grade: 1, 2, 3, ... , 30


INTERVAL Scale (KEY WORDS: EQUAL INTERVAL)

• This scale possesses all characteristics


of the ordinal scale, i.e., numbers are
used as labels and they can be
meaningfully ranked.

• The differences between successive


categories can be assumed equal

• But there is no true zero point of the


scale
INTERVAL Scale (KEY WORDS: EQUAL INTERVAL)

Examples: IQ Score

Temperature in Degree Celsius

Achievement Score in Standardized


Tests
RATIO Scale (KEY WORDS: TRUE ZERO POINT)

• This scale possesses all characteristics


of the interval scale, i.e., numbers are
used as labels, they can be
meaningfully ranked, and differences
between successive categories are
equal.

• The scale has a true zero point (the


point that indicates the complete
absence of the characteristic being
measured.)
RATIO Scale (KEY WORDS: TRUE ZERO POINT)

Examples: Age
Height
Weight
Work Experience
Number of children in a family
Number of times absent in a year
Relationship of the concepts scale, variable, data:

Nominal Scale: 1 – male; 0 – female
Nominal Variable: Sex
Nominal Data: 20 males and 30 females
Relationship of the concepts scale, variable, data:

Ordinal Scale: 3 – High; 2 – Average; 1 – Low
Ordinal Variable: SES
Ordinal Data: High SES = 10; Average SES = 20; Low SES = 30

Ordinal Scale: 9, 10, 11, 12
Ordinal Variable: Salary Grade
Ordinal Data: 9, 10, 11, 12
Relationship of the concepts scale, variable, data:

Interval Scale: 100, 105, 102, 110, 115
Interval Variable: IQ Scores
Interval Data: 100, 105, 102, 110, 115

Interval Scale: 36, 37, 36, 35, 38
Interval Variable: Temperature in Degrees Celsius
Interval Data: 36, 37, 36, 35, 38
Relationship of the concepts scale, variable, data:

Ratio Scale: 110, 115, 150, 160
Ratio Variable: Height (in cm)
Ratio Data: 110, 115, 150, 160

Ratio Scale: 49, 50, 50, 53, 55
Ratio Variable: Weight (in kg)
Ratio Data: 49, 50, 50, 53, 55
Remark: It is possible to downgrade data from
higher level of measurement to lower
level but NOT THE OTHER WAY
AROUND

Interval/Ratio Data: scores on a standardized test – 60, 85, 90
Ordinal Data: ranked scores – 1, 2, 3
Nominal Data: scores categorized as Pass–Fail – Passed = 2, Failed = 1
Remark: However, downgrading of data results
in loss of information and we will be
constrained to use a less powerful
statistical test.

Power of a statistical test is the probability of correctly rejecting a false null hypothesis.
INDEPENDENT SAMPLES
DISTINCT POPULATIONS

[Diagram] Three distinct populations – P1 (Parents), P2 (Teachers), P3 (Students) – each yielding its own sample: S1 (parents), S2 (teachers), S3 (students).

Independent or uncorrelated samples


DEPENDENT SAMPLES
(Repeated Measures Design)

Pretest Scores Intervention Posttest Scores

Dependent or correlated samples


DEPENDENT SAMPLES
(Matched Groups Design)

Population of
Paired Subjects

Sample of
Paired Subjects

Sample 1 Sample 2

Dependent or correlated samples


Probability – a measure of the likelihood of occurrence of events, which ranges in value from 0 (if an event cannot occur) to 1 (if the event is sure to occur).

The concept of probability is important in inferential statistics since we often attach a measure of reliability to the inference we make from the sample to the population.
The p-value (probability value) associated with a test statistic is used as the basis for deciding whether the null hypothesis is rejected or not.

P-value or attained level of significance is


defined as the probability of obtaining a value of
a test statistic as extreme or more extreme than
the one observed when the null hypothesis is
true.
Illustration:

Research Hypothesis: A given coin is biased

Null Hypothesis: The Coin is Fair; i.e.,


P(H) = ½ and P(T) = 1/2

One way of testing the null hypothesis is to toss the coin 12 times, say, and observe the number of heads H (or tails T) among the resulting outcomes.
Illustration:

If the null hypothesis is true, an outcome such


as 6H (or 6T) is not surprising (likely to
occur if Ho is TRUE) and tends to SUPPORT
the null hypothesis.

Note that if the null hypothesis is true, the


probability of occurrence of 6T or 6H will be
very high.
Illustration:

Moreover, if the null hypothesis is true, an


outcome such as 0H (or 12T) is surprising
(unlikely to occur) and CASTS DOUBT on the
truth of the null hypothesis.

Also, if the null hypothesis is true, the


probability of occurrence of 12T and 0H (0T and
12H) should be very low.
Tabled probability of outcomes if the coin is tossed 12 times:

Number of Heads H    Expected Frequency of Occurrence    P(H = h) if Null Hypothesis is True
12                   1                                   .00024
11                   12                                  .0029
10                   66                                  .0161
9                    220                                 .0537
8                    495                                 .1208
7                    792                                 .1936
6                    924                                 .2256
5                    792                                 .1936
4                    495                                 .1208
3                    220                                 .0537
2                    66                                  .0161
1                    12                                  .0029
0                    1                                   .00024
Total number of outcomes = 4096                          Total probability = 1.000

Note: If the null hypothesis is true, P(H) = P(T) = ½ or 0.50


Note that the probability of obtaining 10 heads (or a value more extreme) if the null hypothesis is TRUE is

P(H ≥ 10) = P(10H) + P(11H) + P(12H)
          = .0161 + .0029 + .00024
          = .019, which is interpreted as "highly unlikely"

On the other hand, the probability of obtaining 6 heads or a value more extreme than 6 heads is

P(H ≥ 6) = P(6H) + P(7H) + P(8H) + P(9H) + P(10H) + P(11H) + P(12H)
         = .2256 + .1936 + .1208 + .0537 + .0161 + .0029 + .00024
         = .6127, which is interpreted as "likely"
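These tail probabilities can be checked numerically. The sketch below is a minimal illustration assuming Python with SciPy is available; it is not part of the original SPSS workflow.

```python
# Minimal sketch (Python + SciPy assumed): probabilities for 12 tosses of a
# fair coin, and the tail probabilities used above as p-values.
from scipy.stats import binom

n, p = 12, 0.5

# P(H = h) for every possible number of heads under the null hypothesis
for h in range(n + 1):
    print(h, round(binom.pmf(h, n, p), 5))

# P(H >= 10): 10 heads or a more extreme result
print(binom.sf(9, n, p))   # survival function P(H > 9), roughly .019

# P(H >= 6): 6 heads or a more extreme result
print(binom.sf(5, n, p))   # roughly .613
```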
Problem:

What probability of an outcome (criterion) are we going to use as evidence that the hypothesis is FALSE and must be rejected? This probability is set by the researcher and must be "small". This probability is called the "level of significance", denoted by α.

Thus, if the probability of an outcome is less than α, we decide to reject the hypothesis.
Decision Rule for rejecting a null hypothesis:

If the null hypothesis is TRUE, a small probability associated with an outcome casts doubt on the truth of the null hypothesis, and if this probability is smaller than or equal to the criterion set by the researcher, the null hypothesis is rejected.

Example: Suppose α = .05

If p-value = .049 < α, H0 is rejected. (Any p-value which is less than or equal to α leads to the rejection of H0.)

If p-value = .051 > α, H0 is not rejected. (Any p-value which is greater than α leads to the non-rejection of H0.)
Decision Rule for rejecting a null hypothesis:

If data are manually analyzed, we use a statistical


table and the decision rule is:

Reject the null hypothesis if and only if the


computed value of the test statistic is greater than
or equal to the critical (tabular) value.
Decision Rule for rejecting a null hypothesis:

If data are analyzed using computer software, p-values are automatically generated and the decision rule is:

Reject the null hypothesis if and only if the p-value associated with the computed test statistic is less than or equal to the specified level of significance α.
Equivalence between the two rules:

If the computed value is greater than the critical value, it will fall in the rejection region when plotted; hence we reject the null hypothesis.

Note that this computed value will cut off an area to the right (the p-value) that is less than the given level of significance α. Hence, we have to reject the null hypothesis when the p-value ≤ α.
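As a worked illustration of this equivalence, the sketch below (assuming Python with SciPy) applies both decision rules to a hypothetical two-tailed t-test with df = 8 and α = .05; the computed t-value is illustrative only.

```python
# Minimal sketch (Python + SciPy assumed): equivalence of the critical-value
# rule and the p-value rule for a two-tailed t-test, df = 8, alpha = .05.
from scipy.stats import t

alpha, df = 0.05, 8
t_computed = 3.281                       # hypothetical computed test statistic

t_critical = t.ppf(1 - alpha / 2, df)    # critical (tabular) value, about 2.306
p_value = 2 * t.sf(abs(t_computed), df)  # two-tailed p-value, about .011

# Rule 1 (manual analysis): reject H0 if computed value >= critical value
print(t_computed >= t_critical)          # True -> reject H0

# Rule 2 (computer output): reject H0 if p-value <= alpha
print(p_value <= alpha)                  # True -> reject H0
```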
Some Statistical Tests Used for Comparing Groups:
(Experimental & Causal Comparative Research Designs)

Interval or Ratio data (population means are compared):
• 2 independent groups – t-test for independent samples
• at least 3 independent groups – One-Way ANOVA
• 2 dependent/correlated groups – t-test for dependent samples
• at least 3 dependent/correlated groups – Repeated Measures ANOVA

Ordinal data (population medians are compared):
• 2 independent groups – Wilcoxon Rank Sum Test
• at least 3 independent groups – Kruskal-Wallis One-Way ANOVA
• 2 dependent/correlated groups – Wilcoxon Signed Ranks Test
• at least 3 dependent/correlated groups – Friedman's Two-Way ANOVA

Nominal data (population proportions are compared):
• 2 independent groups – Chi-square Test or Fisher Exact Test
• at least 3 independent groups – Chi-square Test
• 2 dependent/correlated groups – McNemar Test
• at least 3 dependent/correlated groups – Cochran's Q Test
THE t-TEST FOR TWO
INDEPENDENT SAMPLES
Assumptions (t-test for independent samples):

1. Both sampled populations have relative


frequency distributions that are approximately
normal.

2. The population variances are equal


(Homogeneity of variance assumption)

Homogeneous population

Heterogeneous population
Assumptions:

3. The samples are randomly and independently


selected from the populations.

The manner of selecting an element in the


population does not influence the manner of
selecting the succeeding element.

4. Data are at least interval


TEST STATISTIC:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S_p^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \qquad \text{or} \qquad t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
Where:
x̄1 is the mean of group 1; s1² is the variance of group 1
x̄2 is the mean of group 2; s2² is the variance of group 2
Sp² is the pooled variance defined by

$$S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
ILLUSTRATION:
Research Question: Is there a significant difference
between the mean math achievement
scores of students exposed to the
constructivist model and those
exposed to the lecture method?

Null hypothesis: There is no significant difference between


the mean achievement scores of
students exposed to the two methods of
teaching mathematics.
ALTERNATIVE HYPOTHESIS:

Directional: The mean achievement score of students


exposed to the constructivist model is
significantly higher than those exposed to the
lecture method.

Non-Directional: There is a significant difference between


the mean achievement scores of students
exposed to the constructivist model and
those exposed to the lecture method.

Note: A directional alternative hypothesis requires a one-


tailed test while a non-directional one requires a two-
tailed test.
Hypothetical Data

Group 1 Group 2
(Constructivist) (Lecture Method)
x1 x2
84 79
81 81
89 78
85 82
85 80

What are the variables of the study?


 Method of Teaching – independent variable
 Achievement Score – dependent variable
Variable name – one word alphanumeric
Example: method

Variable Label: Example:


Method of Teaching

Value labels (labels of the categories of a


qualitative variable)

Example: Method of Teaching is qualitative

Group Code Label


1 1 Constructivist
2 2 Lecture Method

Measure:
Method – Nominal
Score - Scale
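Before walking through the SPSS dialogs, the same analysis can be checked numerically. The sketch below assumes Python with SciPy is available and reuses the hypothetical scores above; it mirrors, but is not, the SPSS procedure.

```python
# Minimal sketch (Python + SciPy assumed): Levene's test and the pooled-variance
# t-test for the constructivist vs. lecture-method scores.
from scipy import stats

constructivist = [84, 81, 89, 85, 85]   # Group 1
lecture        = [79, 81, 78, 82, 80]   # Group 2

# Homogeneity of variance assumption; center='mean' mirrors the mean-based
# Levene statistic (roughly F = 0.44 here)
print(stats.levene(constructivist, lecture, center='mean'))

# t-test for two independent samples with pooled variance (equal_var=True)
t_stat, p_two_tailed = stats.ttest_ind(constructivist, lecture, equal_var=True)
print(t_stat, p_two_tailed)   # roughly t = 3.28, p = .011 (two-tailed)
```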
To analyze
the data:
Define your Test Variable
(dependent variable) and
your Grouping Variable
(independent Variable) by
CLICKING the button
with arrow

Provide the value labels for


method of teaching
(qualitative variable) by
CLICKING the button
“Define Groups”
Click Define Groups

Type 1 for Group 1


and 2 for Group 2

then

Click Continue

After defining the


value labels, click OK
Homogeneity of Variance Assumption:

This assumption requires that the population variances of


the groups compared are equal. Thus, the null hypothesis
to be tested is:


H0: σ1² = σ2²   If the null hypothesis is not rejected, report the pooled variance estimate t-value.

H1: σ1² ≠ σ2²   If the null hypothesis is rejected, report the separate variance estimate t-value.

The test statistic for testing the above null hypothesis is Levene's Test. If the p-value associated with the F-value is less than α = 0.05, say, reject H0; otherwise do not reject H0.
Output1: Descriptive Statistics (Highlight the number of cases N, Means
(rounded to one decimal place), and the standard deviation
(rounded to two decimal places)
Output 2:
1. Levene’s Test gives information if the Homogeneity of Variance
Assumption of the population variances is satisfied.
H0: σ1² = σ2²
Value of the test statistic: F = 0.439 and the p-value = .526 (not significant).
Hence, we can assume that the population variances are equal and the computed t value to be reported is 3.281 with 8 degrees of freedom (pooled variance estimate).

*** If the F value is significant, we report the t-value with 6.2 degrees
of freedom (separate variance estimate).

Output 2:
2. To test the null hypothesis H0: μ1 = μ2 of equality between the population means, we report the computed t-value of 3.281 with df = 8. Since the associated p-value of the test statistic is .011 (two-tailed) < α = 0.05 (and the corresponding one-tailed p-value, .011/2 ≈ .006, is also < α = 0.05), the null hypothesis is rejected.

There is sufficient evidence to show that the mean score posted by the students exposed to the constructivist model (84.8) is significantly higher than the mean posted by the students exposed to the lecture method (80.0), t = 3.281; p-value = .011.

P-value
THE T-TEST FOR
COMPARING TWO
DEPENDENT SAMPLES
TEST STATISTIC:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sum D^2 - \dfrac{\left(\sum D\right)^2}{n}}{n(n-1)}}}$$

Where:

x1 is the mean of group 1;


x2 is the mean of group 2;
n is the total number of paired data/observations;
D is the difference between paired scores/data
ILLUSTRATION:
Management training programs are often instituted in order to teach supervisory skills and thereby increase productivity. Suppose a set of examinations was administered to each of ten supervisors before such a training program began, and similar examinations were administered at the end of the program. The examinations are designed to measure supervisory skills, with higher scores indicating increased skills.
The results of the test are shown in the table below. Is the training program effective?
Supervisor 1 2 3 4 5 6 7 8 9 10
Score Before 63 93 84 72 65 72 91 84 71 80

Score After 78 92 91 80 69 85 99 82 81 87
NULL HYPOTHESIS:

There is no significant difference between the mean scores


on supervisory skills posted by the participants before and
after the training.

ALTERNATIVE HYPOTHESIS:
The mean posttest score on supervisory skills is
significantly higher than the mean pretest score.

(There is a significant improvement in the mean score on


supervisory skills after the training).
WORKSHEET:

BEFORE AFTER
63 78
93 92
84 91
72 80
65 69
72 85
91 99
84 82
71 81
80 87
x̄ = 77.5   x̄ = 84.4
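The paired analysis can also be reproduced numerically. The sketch below assumes Python with SciPy and uses the before/after scores above.

```python
# Minimal sketch (Python + SciPy assumed): paired-samples t-test for the
# before/after supervisory-skills scores.
from scipy import stats

before = [63, 93, 84, 72, 65, 72, 91, 84, 71, 80]
after  = [78, 92, 91, 80, 69, 85, 99, 82, 81, 87]

t_stat, p_two_tailed = stats.ttest_rel(after, before)
print(t_stat, p_two_tailed)   # roughly t = 4.02, df = 9, p = .003 (two-tailed)

# For the directional alternative (posttest mean > pretest mean),
# the one-tailed p-value here is half the two-tailed value.
print(p_two_tailed / 2)
```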
Data view

Click Variable View:


Under the column Name: Type pre_test, then
post_test as variable names

Variable Labels: Type “Pretest Score on


Supervisory Skills” for pre_test and “Posttest
Score on Supervisory Skills” for post_test and

Click: “Scale” under the column measure

Variable view
To analyze
the data:
Data view
Click the arrow button
to transfer the pretest
variable to the right
box.

Do the same for the


posttest variable

Click ok.
P-value

Based on the results of the analysis, the posttest mean score on


supervisory skills (84.4) is significantly higher than the pretest
mean (77.5), t = 4.022, df = 9, p = .003.
ONE WAY ANALYSIS OF
VARIANCE
What factors explain the variation of teachers’
emotional exhaustion (level of burnout)?

Let the circle below represent the total variation of the


teachers’ emotional exhaustion scores and suppose we
identify only one variable, marital status, as the factor that
contributes to such variation.

This total variation may be partitioned into two sources:


1. due to marital status
2. due to all other factors different from marital status
(lumped together and is called error)
PARTITION OF THE TOTAL VARIATION:

Explained Variation
(due to Marital Status)

Unexplained Variation (Error)


(Due to all other factors marital status excluded)

Note that the proportion of total variance explained by marital


status is very large compared to the proportion unexplained
(error variance). If marital status explains a large proportion of
the total variance, it means that the emotional exhaustion of
the teachers would differ across the categories of marital
status.
PARTITION OF THE TOTAL VARIATION:

Explained Variation
(due to Marital Status)

Unexplained Variation (Error)


(Due to all other factors marital status excluded)

On the other hand, if marital status explains only a small proportion of the total variance, it means that marital status is not an important factor and the emotional exhaustion scores are expected to be comparable across the categories of marital status.
The test statistic that we can use to test the equality of
the mean exhaustion scores across categories of
marital status is the ratio of the variance due to marital
status divided by the variance due to error, i.e.,

F = (Variance due to Marital Status) / (Variance due to Error)

A large value of F would occur when the numerator is


large compared to the denominator. Hence, large
values of F (much greater than 1.0) would lead to the
rejection of the null hypothesis.
Research Question:
Does teachers' emotional exhaustion vary across categories of marital status?

Null Hypothesis:
There is no significant difference in the mean level of
emotional exhaustion among teachers with different marital
status.

Alternative Hypothesis:

There is a significant difference in the level of emotional


exhaustion among teachers with different marital status.
Sample Data:

Group 1 (single):          34 38 35 31 34 34 34 27 34 34 35 34 31 30 29 31 26
Group 2 (married):         34 22 30 16 30 14 22 30 35 14 23 35 16 25 17
Group 3 (widow/separated): 13 21 28 13 14 16 25 24 19 14 17 14

What are the variables of the study?
• Marital Status – independent variable
• Emotional Exhaustion Score – dependent variable

(Encode the data using SPSS.)
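A numerical check of the one-way ANOVA is sketched below, assuming Python with SciPy; the three group lists follow the column reading of the table above.

```python
# Minimal sketch (Python + SciPy assumed): Levene's test and one-way ANOVA for
# the emotional-exhaustion scores of the three marital-status groups.
from scipy import stats

single    = [34, 38, 35, 31, 34, 34, 34, 27, 34, 34, 35, 34, 31, 30, 29, 31, 26]
married   = [34, 22, 30, 16, 30, 14, 22, 30, 35, 14, 23, 35, 16, 25, 17]
widow_sep = [13, 21, 28, 13, 14, 16, 25, 24, 19, 14, 17, 14]

# Homogeneity of variance across the three groups
print(stats.levene(single, married, widow_sep, center='mean'))

# One-way ANOVA: F = variance due to marital status / variance due to error
f_stat, p_value = stats.f_oneway(single, married, widow_sep)
print(f_stat, p_value)
```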
Encoded Data Using SPSS. Note that we only have two columns since there are only
two variables (Data View worksheet).
Click “Variable View” then do the following:
1. Under the column Name: Type e_score; then ms

2. Type the variable Label under the column “Label”:


for e_score - Emotional Exhaustion Score
for ms - Marital Status

3. Indicate the labels of the values of the categorical variable by clicking the
right portion of the cell under the column Values:
Thus: Value: 1 Label: Single - then click “Add”
Value: 2 Label: Married - then click “Add”
Value: 3 Label: Widow/Separated - then click “Add”

4. Under the column “Measure”,


click “Scale” for “e-score” since it is interval
click “Nominal” for “ms” since it is categorical
To analyze
the data:
Click “One way ANOVA” and declare the dependent variable by clicking the
button for “Dependent List” as well as the independent variable for the “Factor”.
The result is shown in the exhibit below.

For the desired output:


1. Click Options
For the desired output:
1. Click Options, click “Descriptive”;
Click “Homogeneity of variance test”
2. Click “Continue” to go back to the previous window, then click “Post Hoc”
for the pairwise comparison test.
3. Click "Scheffe" – the most conservative pairwise comparison test
4. Click “Continue” to go back to the previous window.
5. Click “ok” to generate the outputs.
Output 1: descriptive statistics

Output 2: Test for equality of population variances

P-value
Output3: Test of equality among the population means

P-value

Output4: Pairwise comparison of two means


Output5: Clustering of the means
REPEATED MEASURES
ANALYSIS OF VARIANCE
Assumptions:

1. Scores in each population are normally distributed around the population mean μ

2. The population variances are equal (Homogeneity of variance assumption)

3. The correlations among pairs of levels of the repeated variable are constant (Assumption of Sphericity).

4. Data are at least interval


Research Question:
Does repeated experience with the Licensure Examination for
Teachers (LET) lead to better scores, even without any
intervening study?
Null Hypothesis:
There is no significant difference in the mean scores of the students on the four sessions of taking the LET examination.

Alternative Hypothesis:
There is a significant difference in the mean scores of the students on the four sessions of taking the LET examination.
Data: Scores on four practice sessions

Session 1 Session 2 Session 3 Session 4


178 180 179 183
156 160 163 165
182 179 185 188
145 152 154 165
113 110 116 118
162 165 164 170
134 129 132 136
152 160 162 168
145 149 151 154
178 180 176 183
129 132 138 140
149 146 152 154
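A numerical check is sketched below, assuming Python with pandas and statsmodels; AnovaRM reports the sphericity-assumed F value (it does not apply Greenhouse-Geisser or Huynh-Feldt corrections), and the column names are illustrative.

```python
# Minimal sketch (Python + pandas + statsmodels assumed): repeated-measures
# ANOVA on the four LET practice sessions.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

scores = [
    [178, 180, 179, 183], [156, 160, 163, 165], [182, 179, 185, 188],
    [145, 152, 154, 165], [113, 110, 116, 118], [162, 165, 164, 170],
    [134, 129, 132, 136], [152, 160, 162, 168], [145, 149, 151, 154],
    [178, 180, 176, 183], [129, 132, 138, 140], [149, 146, 152, 154],
]

# Reshape to long format: one row per (subject, session) observation
long = pd.DataFrame(
    [(subj, sess + 1, score)
     for subj, row in enumerate(scores, start=1)
     for sess, score in enumerate(row)],
    columns=["subject", "session", "score"],
)

result = AnovaRM(long, depvar="score", subject="subject", within=["session"]).fit()
print(result)   # F statistic and p-value for the Session effect
```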
Data encoded using SPSS (Data View Worksheet)
Variable View Worksheet
1. Under the column “Name”, type Session1, Session2, Session3, Session4
2. Under the column “Measure”, click “Scale” for all variables
(The result is shown in the second exhibit below.)
To Analyze: Click Analyze, General Linear Model, Repeated Measures…
Result after clicking “Repeated Measures”
1. Type the within subjects variable: Session
2. Type the number of levels, 4 then click “Add”
3. Click Define (to obtain the exhibit at the right)
4. Transfer the 4 levels of Session to the right, one after the other by clicking the
arrow
Result after transferring the levels of “session” to the right:

NEXT: Click “Options”; Transfer the variable “Session” to the box “Display Means
for”;
Check “Compare Means”; Click “Bonferroni”; You may click “Descriptive” if you
want to show the means per session. (The result is shown in the exhibit at the
right), Click “Continue” to go back to the previous window; click “ok”.
Reading the outputs

The mean scores and standard deviations per session are indicated. Note
that the mean scores tend to increase as the number of times of taking
the LET examination increases.
P-value

Mauchly’s Test of Sphericity tests equality of variances of the


differences of paired scores in all combination of treatment levels.

If significant, the sphericity assumption has not been met; we can use a corrected ANOVA depending on the estimated value of sphericity (epsilon), or we can use an alternative method of analyzing the data.

If not significant, the assumption of sphericity has been met.

In the output, Mauchly's test is not significant, χ² = 8.238, p = .146.
When the estimated sphericity (epsilon in the output) is > .75, use Huynh-Feldt
values; When < .75, use Greenhouse-Geisser values. Since sphericity can be
assumed, the ANOVA table will be as follows:

P-value

Based on the results of the analysis, the null hypothesis is rejected,


F = 22.238, p < .001. The mean scores on the four sessions are
significantly different.
The pairwise comparison test using Bonferroni technique indicates
that the mean score posted in the first session is significantly lower
than the means during the 3rd and 4th sessions. The mean scores
during the 2nd and 3rd session are significantly lower than the mean
score during the 4th session. Overall, the mean score during the 4th
session is significantly higher than in the first three sessions.
WILCOXON RANK SUM TEST
(MANN WHITNEY U TEST)
Click “Analyze”  “Nonparametric Tests” to generate the
window shown at the right.

Using the window at the right, Click “Fields”


Transfer the variables to the appropriate boxes
Click “Run”
Output 1 (IBM SPSS): P-value

The null hypothesis is rejected because the p-value is .016 (< α = .05).
Using the Legacy Dialogs

Click “2 Independent Samples”


Using the Legacy Dialogs

Transfer the variables to the appropriate boxes; Click “Define Groups”


and encode the codes for “Methods of Teaching”; Click “Continue”;
then Click “OK”
Output 2 Using the Legacy Dialogs: P-value

Based on the medians, it can be said that the median score posted by
the students exposed to the constructivist model is higher than those
exposed to the lecture method and this is significant at the .05 level of
significance, z = 2.312, p = .016. Hence, the null hypothesis is
rejected.
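The same comparison can be sketched in Python with SciPy (assumed below); the p-value may differ slightly from the SPSS output depending on how ties and exact vs. asymptotic methods are handled.

```python
# Minimal sketch (Python + SciPy assumed): Wilcoxon Rank Sum / Mann-Whitney U
# test for the constructivist vs. lecture-method scores used earlier.
from scipy import stats

constructivist = [84, 81, 89, 85, 85]
lecture        = [79, 81, 78, 82, 80]

u_stat, p_value = stats.mannwhitneyu(constructivist, lecture,
                                     alternative='two-sided')
print(u_stat, p_value)   # SPSS reports z = 2.312, p = .016
```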
WILCOXON SIGNED RANKS TEST
Click “Analyze”  “Nonparametric Tests” to generate the
window shown at the right.

Using the window at the right, Click “Fields”


Transfer the variables to the appropriate boxes
Click “Run”
Output 1 (IBM SPSS):
P-value

The null hypothesis is rejected because the p-value = .012 (< α = .05).
Using the Legacy Dialogs

Click “2 Related Samples”


Using the Legacy Dialogs

Transfer the paired variables to the appropriate boxes one at a time


(or by highlighting both and clicking the arrow button; Click “Options”
then check “Quartiles” to generate the medians; then Click “OK”
Output 2 Using the Legacy Dialogs:

P-value

The results indicate that the median posttest score (83.5) is higher than the median pre-test score (76) and this is significant at the .05 level of significance, z = 2.501, p = .012. Hence, the null hypothesis is rejected.
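A numerical sketch, assuming Python with SciPy, using the paired supervisory-skills scores from the earlier example:

```python
# Minimal sketch (Python + SciPy assumed): Wilcoxon Signed Ranks test for the
# paired before/after supervisory-skills scores.
from scipy import stats

before = [63, 93, 84, 72, 65, 72, 91, 84, 71, 80]
after  = [78, 92, 91, 80, 69, 85, 99, 82, 81, 87]

stat, p_value = stats.wilcoxon(after, before)
print(stat, p_value)
# SPSS reports the result as z = 2.501, p = .012; SciPy reports the signed-rank
# sum, and the two-tailed p-value should be comparable.
```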
KRUSKAL WALLIS ONE WAY
ANOVA
Click “Analyze”  “Nonparametric Tests” to generate the
window shown at the right.

Using the window at the right, Click “Fields”


Transfer the variables to the appropriate boxes
Click “Run”
Output 1 (IBM SPSS):

P-value

The null hypothesis is rejected because the p-value


is less than .001 (highly significant)
Using the Legacy Dialogs

Click “K Independent Samples”


Using the Legacy Dialogs

Transfer the variables to the appropriate boxes one at a time; Click


“Define Range” then encode the smallest and highest codes for the
categorical independent variable; Click “Continue”; then Click “OK”
Using the Legacy Dialogs

To generate the median scores of the three groups, click “Analyze” 


“Compare Means”  “Means”. Next, transfer the variables to the
appropriate boxes, transfer the statistics at the right (Mean, Number
of Cases, Standard Deviation) to the left box, select “Median” statistic
from the left and transfer it to the right box; click “continue”; click
“OK”.
Output 2 Using the Legacy Dialogs:

P-value

The Kruskal-Wallis test indicates that the median emotional exhaustion scores of the three groups are significantly different, χ² = 22.535, d.f. = 2, p < .001. Hence, the null hypothesis is rejected.
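A numerical sketch, assuming Python with SciPy, using the emotional-exhaustion data from the one-way ANOVA example; the pairwise follow-up anticipates the Bonferroni procedure described on the next slides.

```python
# Minimal sketch (Python + SciPy assumed): Kruskal-Wallis one-way ANOVA and
# Bonferroni-adjusted pairwise Mann-Whitney follow-ups (criterion .05/3 = .0167).
from scipy import stats

single    = [34, 38, 35, 31, 34, 34, 34, 27, 34, 34, 35, 34, 31, 30, 29, 31, 26]
married   = [34, 22, 30, 16, 30, 14, 22, 30, 35, 14, 23, 35, 16, 25, 17]
widow_sep = [13, 21, 28, 13, 14, 16, 25, 24, 19, 14, 17, 14]

h_stat, p_value = stats.kruskal(single, married, widow_sep)
print(h_stat, p_value)   # SPSS reports chi-square = 22.535, d.f. = 2, p < .001

pairs = {"single vs married": (single, married),
         "single vs widow/separated": (single, widow_sep),
         "married vs widow/separated": (married, widow_sep)}
for name, (a, b) in pairs.items():
    print(name, stats.mannwhitneyu(a, b, alternative='two-sided'))
```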
Pairwise Comparison Test:

We can run the Wilcoxon Rank Sum Test for each


pair of groups. Since there are three groups, we
will run this test three times (G1 vs G2), (G1 vs.
G3), (G2 vs. G3).

To control the level of significance α = .05, we divide this value by the number of pairwise comparisons (3). Thus, .05/3 = 0.0167. Two groups will be declared significantly different if the p-value of the test statistic is less than 0.0167. This is called the Bonferroni technique for comparing groups.
Pairwise Comparison Test:

To control the error rate α = 0.05, we first divide this value by the number of pairwise comparisons (3). Thus,

0.05 / 3 = 0.0167

Hence, p-values less than 0.0167 indicate that the two groups are significantly different.
Output 1:

The median emotional scores of the single and married teachers are
significantly different, z = 2.877, p = .004 (< .0167).
Output 2:

The median emotional scores of the single and widowed/separated


teachers are significantly different, z = 4.442, p < .001 (< .0167).
Output 3:

The median emotional scores of the married and widowed/separated


teachers are significantly different, z = 2.502, p = .012 (< .0167).
FRIEDMAN’S TWO WAY ANOVA
Click “Analyze”  “Nonparametric Tests” to generate the
window shown at the right.

Using the window at the right, Click “Fields”


Transfer the variables to the appropriate boxes by
clicking each variable and pressing the arrow button.
Click “Run”
Output 1 (IBM SPSS):
P-value

The null hypothesis is rejected because the p-value is less than .001 (< α = .05).
Using the Legacy Dialogs

Click “K Related Samples”


Using the Legacy Dialogs

Transfer the paired variables to the appropriate boxes one at a time


(or by highlighting all and clicking the arrow button; click “Statistics”
and check quartiles to generate the medians, click “Continue” then
Click “OK”
Output 2 Using the Legacy Dialogs:

Results show that the median scores on the four LET practice sessions are significantly different, χ² = 25.900, d.f. = 3, p < .001. Hence, the null hypothesis is rejected.
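A numerical sketch, assuming Python with SciPy, using the four LET practice sessions (the same data as the repeated-measures ANOVA example):

```python
# Minimal sketch (Python + SciPy assumed): Friedman's two-way ANOVA by ranks on
# the four related LET practice sessions.
from scipy import stats

session1 = [178, 156, 182, 145, 113, 162, 134, 152, 145, 178, 129, 149]
session2 = [180, 160, 179, 152, 110, 165, 129, 160, 149, 180, 132, 146]
session3 = [179, 163, 185, 154, 116, 164, 132, 162, 151, 176, 138, 152]
session4 = [183, 165, 188, 165, 118, 170, 136, 168, 154, 183, 140, 154]

chi2_stat, p_value = stats.friedmanchisquare(session1, session2, session3, session4)
print(chi2_stat, p_value)   # roughly chi-square = 25.9, d.f. = 3, p < .001
```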
Pairwise Comparison Test Using the Bonferroni Approach

Compare two groups at a time using the Wilcoxon Signed Ranks Test

Click “Analyze” 
“Nonparametric
Tests”  “Legacy
Dialogues” ”2
Related Samples”
Pairwise Comparison Test Using the Bonferroni Approach

Select the pair of variables to be compared and transfer them to the


right box by clicking the arrow button.

Once done,
Click “OK”.
Outputs:

Criterion for declaring significantly different pairs using the Bonferroni technique:

0.05 / 6 = 0.0083

Hence, two groups are significantly different if the p-value of the test statistic is less than 0.0083.

significant significant significant


THE CHI-SQUARE TEST
for Homogeneity of Samples
TWO INDEPENDENT SAMPLES
Example :

In a study conducted on the use of seat


belts in preventing fatalities, records of the
last 100 vehicular accidents were
reviewed. These 100 accidents involved
238 persons. Each person was classified
as using or not using seat belts when the
accident happened and as injured fatally or
a survivor.
Research Question:
Is there a significant difference in the proportion
of persons who are fatally injured between those
who wear seatbelts and those who do not?

Null Hypothesis:
There is no significant difference in the
proportion of persons who are fatally injured
between those who wear seatbelts and those
who do not.
Alternative Hypothesis (Non-directional):
There is a significant difference in the proportion
of persons who are fatally injured between those
who wear seatbelts and those who do not.

Alternative Hypothesis (Directional):


The proportion of persons who are fatally injured
is higher for those who do not wear seatbelts
than those who wear seatbelts.
Data:

Injured Fatally?   Wearing Seat Belt?              Total
                   Yes (1)          No (0)
Yes (1)            9                88              97
No (0)             23               118             141
Total              32               206             238
Test Statistic:

$$\chi^2 = \sum_{\text{over all cells}} \frac{(o - e)^2}{e}$$

where o is the observed frequency and e is the expected frequency.

For contingency tables, the degrees of freedom is given by

df = (r − 1)(c − 1)

where r is the number of categories of the row variable and c is the number of categories of the column variable.
The tabled data is actually a summary of the
responses of the respondents which can be
encoded directly using SPSS

• Use codes for the


responses: 1 – Yes; 0 – No

• Encode the frequency


counts in a column

• Put the corresponding


codes for the two variables

• So that SPSS will read the


data as frequency counts,
we have to set on the
“weight cases” command
1. Click “Data”; then “Weight Cases”
to show the window on top right.
2. Highlight the data “Freq”, Click “Weight cases by” then Click
the arrow button; Click “ok”
Check if “Weight On” already appears at the bottom right corner
of the worksheet
To Analyze the data: Click “Analyze” ”Descriptive Statistics”
 “Crosstabs” to obtain the window at the right.

Transfer the row variable “Injured Fatally” and the column


variable “Wearing Seatbelts” by highlighting them one at a
time and clicking the arrow button.
Click “Statistics” then check “Chi-
square”, then click “Continue”;
Click “Cells” then check “Column”;
then Click “Continue”; then Click
“OK”
Output (Descriptive Statistics):

Note that 28.1% of those who wore seatbelts were


injured fatally while 42.7% of those who did not wear
seatbelts were also injured fatally. Is 42.7% significantly
higher than 28.1%?
Output (Inferential Statistics):

If there is no cell with an expected frequency of less than 5, we report the Pearson Chi-square value. Since χ² = 2.443 and the associated probability is .118 (which is greater than α = .05), the null hypothesis is NOT rejected. There is not sufficient evidence to conclude that wearing seat belts significantly reduces fatal injuries.
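A numerical sketch, assuming Python with SciPy, entering the seat-belt data as a 2 × 2 contingency table:

```python
# Minimal sketch (Python + SciPy assumed): chi-square test of homogeneity for
# the seat-belt data.
from scipy.stats import chi2_contingency

#             belt: yes   belt: no
table = [[ 9,  88],   # injured fatally: yes
         [23, 118]]   # injured fatally: no

# correction=False gives the Pearson chi-square reported in the slides;
# the default applies Yates' continuity correction for 2 x 2 tables.
chi2_stat, p, df, expected = chi2_contingency(table, correction=False)
print(chi2_stat, df, p)   # roughly chi-square = 2.44, df = 1, p = .118
print(expected)           # expected frequencies (all at least 5 here)
```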
Assumption on the use of the Chi-square Test:

“No more than 20% of the cells must have expected


frequencies of less than 5”

For 2×2 tables, if the smallest expected frequency is less than 5, or for a very small sample size (N ≤ 20), use the FISHER EXACT PROBABILITY TEST.
In the SPSS output shown in the previous slide, there is no cell with
expected frequency of less than 5, so reporting the Chi-square value is
correct. If there is a cell with expected frequency of less than 5, we
have to report the Fisher’s Exact Test. Note that the Fisher’s Exact
Probability is .084 which is greater than 0.05 criterion. Hence the null
hypothesis is also NOT rejected.
FISHER EXACT PROBABILITY TEST

Do male and female presidents of the 10 State Colleges and


Universities (SUCs) differ on their opinion regarding the integration of
CHED supervised tertiary schools to the different SUCs in the region?
Each president was asked if he/she is for or against the planned
integration of the schools to the SUCs. The data are shown in the table
below

Based on the data, 4 out of 5 or 80% of the male presidents are in favour while only 2 out of 5 or 40% of the female presidents are in favour.
FISHER EXACT PROBABILITY TEST

Null Hypothesis:
There is no significant difference between the proportion of male and
female SUC presidents who are in favor of integrating the CHED
supervised tertiary schools to the different SUCs

Alternative Hypothesis (Directional):


The proportion of male presidents who are in favour of the integration
is significantly higher than the female presidents.

Alternative Hypothesis (Non-Directional):


There is a significant difference between the proportion of male and
female SUC presidents who are in favor of integrating the CHED
supervised tertiary schools to the different SUCs
FISHER EXACT PROBABILITY TEST

Hence, the probability (one-tailed) of obtaining a frequency


distribution as extreme or more extreme than the one
observed when the null hypothesis is true is given by

P = 0.238 + 0.0238 = 0.262

Since 0.262 is greater than α = 0.05, the null hypothesis is not rejected. This means that the proportions of male and female presidents who are in favor of the integration of CHED supervised tertiary schools to the SUCs are not significantly different.
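A numerical sketch, assuming Python with SciPy, using the 2 × 2 table of presidents' opinions:

```python
# Minimal sketch (Python + SciPy assumed): Fisher exact test for the SUC
# presidents' opinions (4 of 5 males in favour, 2 of 5 females in favour).
from scipy.stats import fisher_exact

#             in favour   against
table = [[4, 1],   # male presidents
         [2, 3]]   # female presidents

# One-tailed (directional) test, matching the hand computation P = 0.262
odds_ratio, p_one_tailed = fisher_exact(table, alternative='greater')
print(p_one_tailed)   # roughly 0.262

# Two-tailed (non-directional) test
print(fisher_exact(table, alternative='two-sided'))
```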
FISHER EXACT PROBABILITY TEST
SPSS OUTPUT:
THREE INDEPENDENT SAMPLES
Example :

Suppose we want to compare the effectiveness of three methods of teaching physics, namely the Lecture Method (Method 1), the Modular Method (Method 2) and Using CAI Materials (Method 3). Students are randomly assigned to three groups, and the three teaching methods are randomly assigned to the three groups as well.

The dependent variable in this case is the performance of the


students in the final examination in Physics. If the data are scores,
the One way ANOVA will be applicable. But suppose the
performance of the student is categorized into one of the following
categories: Below Satisfactory (score of 74 and below), Fair (75 –
79), Satisfactory (80 – 84), and Above Satisfactory (85 and above).
THREE INDEPENDENT SAMPLES

It would be interesting to determine how many of the


students in each group would have scores falling within
each of these four categories. A comparison of the
frequencies or proportions can be done descriptively. But
if we want to test whether the proportions within each
category of the dependent variable differ significantly
from one another, the Chi-Square test of significance can
be used.
THREE INDEPENDENT SAMPLES
Ho: The distribution of grades of students exposed to the
three teaching methods will not differ significantly. (or
There is no significant difference between the
proportion of students in each of the three groups
who obtained Above Satisfactory ratings)

Ha: The distribution of grades of students exposed to the


three teaching methods will differ significantly. (or
There is a significant difference between the
proportion of students in each of the three groups
who obtained Above Satisfactory ratings)
THREE INDEPENDENT SAMPLES

Performance Category       Method of Teaching                      TOTAL
                           Lecture (3)   Modular (2)   CAI (1)
Above Satisfactory (4)     9             20            18          47
Satisfactory (3)           12            18            21          51
Fair (2)                   15            10            8           33
Below Satisfactory (1)     24            12            6           42
TOTAL                      60            60            53          173
The tabled data is actually a summary of the
responses of the respondents which can be
encoded directly using SPSS
• Use codes for the responses:
• Perf_Cat: 1 – Above Satisfactory
2 – Satisfactory
3 – Fair
4 – Below Satisfactory
• Method: 1 – Lecture
2 – Modular
3 – CAI

• Encode the frequency counts in a


column

• Put the corresponding codes for the


two variables

• So that SPSS will read the data as


frequency counts, we have to set on
the “weight cases” command
1. Click “Data”; then “Weight Cases”
to show the window on top right.
2. Highlight the data “Freq”, Click “Weight cases by” then Click
the arrow button; Click “OK”
Check if “Weight On” already appears at the bottom right corner
of the worksheet
To Analyze the data: Click “Analyze” ”Descriptive Statistics”
 “Crosstabs” to obtain the window at the right.

Transfer the row variable “Performance Category” and the


column variable “Method of Teaching” by highlighting them
one at a time and clicking the arrow button.
Click “Statistics”  check “Chi-
square”; click “Continue”;
Click “Cells”  check “Column”;
Click “Continue”; then Click “OK”
Output (Descriptive Statistics):

Among the three groups, those exposed to CAI method


posted the highest proportion (34.0%) followed by those
exposed to Modular (33.3%), then by those exposed to
the Lecture Method (15.0%).
Output (Inferential Statistics):

If the number of cells with an expected frequency of less than 5 is only 20% or less, we report the Pearson Chi-square value. Since χ² = 20.647 and the associated probability is p = .002 (which is less than α = .05), the null hypothesis is rejected.

If more than 20% of the cells have expected frequencies of less than 5,
either we collapse categories (provided the resulting category has
meaning) or we gather more data.
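A numerical sketch, assuming Python with SciPy, entering the 4 × 3 table directly:

```python
# Minimal sketch (Python + SciPy assumed): chi-square test of homogeneity for
# the performance-by-teaching-method table.
from scipy.stats import chi2_contingency

#             Lecture  Modular  CAI
table = [[ 9, 20, 18],   # Above Satisfactory
         [12, 18, 21],   # Satisfactory
         [15, 10,  8],   # Fair
         [24, 12,  6]]   # Below Satisfactory

chi2_stat, p, df, expected = chi2_contingency(table)
print(chi2_stat, df, p)   # roughly chi-square = 20.6, df = 6, p = .002

# Rule of thumb check: no more than 20% of cells should have an expected
# frequency below 5.
print((expected < 5).mean())
```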
MCNEMAR TEST
McNemar Test

The McNemar test for the significance of changes


is applicable to “before-and-after” designs in
which each subject is its own control and in which
the measurements are made on either a nominal
or ordinal scale.
McNemar Test

To test the significance of any observed change


using this test, a fourfold table of frequencies is
used to represent the first and second sets of
responses from the same individuals. In this
table, + and – signs are used to denote different
responses arranged as shown below:
McNemar Test
                 After
                 −        +
Before    +      A        B
          −      C        D

A - denotes the number of individuals whose responses were


POSITIVE on the first measure and NEGATIVE on the second
measure;

D - the number of individuals whose responses changed from


NEGATIVE to POSITIVE

B and C are the respondents who responded the same (POSITIVE


for B and NEGATIVE for C) on both measures.
EXAMPLE:

How consistent are people in their voting habits? Do people vote for
the same party from election to election? Below are the results of a
poll in which people were asked if they had voted for NP or LP in each
of the last two presidential elections.

1992 Elections     1998 Elections              Total
                   LP          NP
NP                 27          117              144
LP                 178         23               201
Total              205         140              345

Note: % of people who voted for NP in 1992 was 144/345 = 41.74%
      % of people who voted for NP in 1998 was 140/345 = 40.58%
Ho: There is no significant difference between the
proportion of those who voted for NP during
the 1992 elections and those who voted the
same party affiliation during the 1998
elections.

H1: The proportion of those who voted for NP


during the 1998 elections was significantly
lower than those who voted for NP during the
1992 elections.
We use the following codes for the party affiliation:
1 – NP; 2 – LP and encode the data as follows:

1992 1998 FREQ


1 2 27
1 1 117
2 2 178
2 1 23

We encode the data in similar manner as we did for the Chi-square test
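A numerical sketch, assuming Python with statsmodels, using the 2 × 2 concordance table built from the frequencies above:

```python
# Minimal sketch (Python + statsmodels assumed): McNemar test for the change in
# NP votes between the 1992 and 1998 elections.
from statsmodels.stats.contingency_tables import mcnemar

#               1998: NP   1998: LP
table = [[117,  27],   # 1992: NP
         [ 23, 178]]   # 1992: LP

# exact=True applies the binomial test to the discordant pairs (27 and 23)
result = mcnemar(table, exact=True)
print(result.statistic, result.pvalue)
```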
To Analyze the data: Click "Analyze" → "Nonparametric Tests" → "Legacy Dialogs" → "2 Related Samples" to obtain the window shown in the next slide.
Transfer the paired variables to the right; Uncheck
“Wilcoxon”, Check “McNemar”, then Click “OK”.
IBM SPSS

LEGACY DIALOGS
COCHRAN’S Q TEST
Assumptions:

1. Responses are binary and from k matched


groups.

2. The subjects are independent of one another and


were selected at random from a larger population.

3. The sample size is sufficiently "large". (A rule of thumb: the number of subjects for which the responses are not all 0's or 1's, n, should be ≥ 4 and nk ≥ 24. This assumption is not required for the exact binomial McNemar Test.)
Example:

An experimental study was conducted to


identify the method of teaching that would best
improve the conceptual understanding of the
students in Chemistry. Fifteen (15) sets of matched
individuals were selected and randomly assigned to
the three groups. The dependent variable of the
study was the students’ performance in the test to be
given after the experiment. Each student’s
performance was coded as 1 if the student passes
the test and 0 if he fails. The data are shown below.
DATA (1 – PASS; 0 – FAIL)

Pair   METHOD A   METHOD B   METHOD C
1      1          1          1
2      1          0          1
3      0          0          0
4      1          1          0
5      0          0          1
6      0          0          0
7      1          0          0
8      1          1          0
9      1          0          1
10     1          0          0
11     0          0          0
12     1          0          0
13     1          1          1
14     1          0          1
15     0          0          0
Ho: There is no significant difference in the
proportion of subjects who pass the test in each
of the three groups.

H1: There is a significant difference in the proportion


of subjects who pass the test in each of the
three groups
Click “Analyze”  “Nonparametric Tests” to generate the
window shown at the right.

Using the window at the right, Click “Fields”


Transfer the paired variables to the right, then click “RUN”.
IBM SPSS
Using the Legacy Dialogues:

Transfer the paired variables


to the right; Uncheck
“Friedman” then Check
“Cochran’s Q”  Click
“Statistics”  Check
“Descriptives”  Click “OK”
LEGACY DIALOGS

Based on the output, the three proportions (66.7%, 26.7%, and 40.0%)
are significantly different, Q = 6.222, d.f. = 2, p-value = .045.
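The Q statistic can be reproduced directly from its formula. The sketch below assumes Python with NumPy and SciPy and uses the 15 matched triples above.

```python
# Minimal sketch (Python + NumPy + SciPy assumed): Cochran's Q for the matched
# pass/fail data (1 = pass, 0 = fail).
import numpy as np
from scipy.stats import chi2

data = np.array([
    [1, 1, 1], [1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1],
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 0, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0],
])

k = data.shape[1]               # number of related groups (3 methods)
col_totals = data.sum(axis=0)   # passes per method: 10, 4, 6
row_totals = data.sum(axis=1)   # passes per matched set
grand_total = data.sum()        # T = 20

q = (k - 1) * (k * (col_totals ** 2).sum() - grand_total ** 2) \
    / (k * grand_total - (row_totals ** 2).sum())
p_value = chi2.sf(q, df=k - 1)
print(q, p_value)               # roughly Q = 6.22, d.f. = 2, p = .045
```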
PAIRWISE COMPARISON USING THE MCNEMAR TEST:

Click “Analyze”  Click “Nonparametric Tests”  “Legacy Dialogues”  “2


Related samples”. Then transfer the paired variables to the right one at a
time. Check “McNemar”, then click “OK”.
Outputs:

Using the Bonferroni technique with the criterion set at .05/3 = 0.0167, no two groups are significantly different. Note that the Cochran's Q Test was significant, yet McNemar using the Bonferroni approach failed to detect two groups which are significantly different.

McNemar is less powerful since only the discordant pairs are used in the computation of the test statistic.
ALTERNATIVE TO MCNEMAR: Minimum Required Difference (MRD)

Where k = number of groups; N = number of paired cases


ALTERNATIVE TO MCNEMAR: Minimum Required Difference (MRD)

Pair    METHOD A   METHOD B   METHOD C   Sum   Sum²
1       1          1          1          3     9
2       1          0          1          2     4
3       0          0          0          0     0
4       1          1          0          2     4
5       0          0          1          1     1
6       0          0          0          0     0
7       1          0          0          1     1
8       1          1          0          2     4
9       1          0          1          2     4
10      1          0          0          1     1
11      0          0          0          0     0
12      1          0          0          1     1
13      1          1          1          3     9
14      1          0          1          2     4
15      0          0          0          0     0
Total   10         4          6          20    42
%       66.7%      26.7%      40.0%

$$T = \sum_{i=1}^{N}\sum_{j=1}^{k} Y_{i,j} = 20 \qquad R = \sum_{i=1}^{N}\left(\sum_{j=1}^{k} Y_{i,j}\right)^2 = 42$$

$$\alpha_{adj} = \frac{.05}{3} = 0.0167 \qquad z_{adj} = 2.13 \;\text{(obtained using the normal table)}$$
ALTERNATIVE TO MCNEMAR: Minimum Required Difference (MRD)

$$MRD = z_{adj}\sqrt{\frac{2(kT - R)}{N^2\,k(k-1)}} = 2.13\sqrt{\frac{2\left[(3)(20) - 42\right]}{15^2\,(3)(2)}} = .3478 = 34.78\%$$
Comparison i j Absolute Minimum Decision
Difference Required
Absolute
Difference

A vs. B 66.7 26.7 40.0 34.7 Reject


A vs. C 66.7 40.0 26.7 34.7 Do not reject
B vs. C 26.7 40.0 13.3 34.7 Do not reject

Conclusion: Only the effects of Method A and Method B are significantly different.
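The MRD procedure above can be reproduced numerically. The sketch below assumes Python with NumPy and follows the formula on the preceding slides.

```python
# Minimal sketch (Python + NumPy assumed): Minimum Required Difference (MRD)
# applied to the same matched pass/fail data.
import numpy as np

data = np.array([
    [1, 1, 1], [1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1],
    [0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0],
    [0, 0, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1], [0, 0, 0],
])
N, k = data.shape
T = data.sum()                      # grand total = 20
R = (data.sum(axis=1) ** 2).sum()   # sum of squared row totals = 42
z_adj = 2.13                        # from the normal table for alpha_adj = .0167

mrd = z_adj * np.sqrt(2 * (k * T - R) / (N ** 2 * k * (k - 1)))
print(mrd)                          # roughly 0.3478, i.e. 34.78 percentage points

props = data.mean(axis=0) * 100     # 66.7%, 26.7%, 40.0%
for i in range(k):
    for j in range(i + 1, k):
        diff = abs(props[i] - props[j])
        print(i, j, round(diff, 1),
              "Reject" if diff > mrd * 100 else "Do not reject")
```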
Thank you !
Establishing Relationship Between Variables:
(Descriptive Correlational Research Designs)

• Variable X at least interval, Variable Y at least interval – Pearson's r
• Variable X at least ordinal, Variable Y at least ordinal – Spearman's Rho
• Variable X nominal (dichotomous), Variable Y at least interval – Point-Biserial
• Variable X nominal (at least three categories), Variable Y at least interval (dependent variable) – Eta (measure of non-linear correlation)
• Variable X categorical, Variable Y categorical – Chi-square based measures: Contingency Coefficient (r × r tables), Cramer's Coefficient (r × c tables), Phi Coefficient (2 × 2 tables)
• Variable X categorical (unordered), Variable Y categorical (unordered) – PRE-based measure: Lambda Coefficient
• Variable X categorical (ordered), Variable Y categorical (ordered) – PRE-based measure: Gamma
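For the first three measures in the list, a minimal numerical sketch assuming Python with SciPy (the small data vectors are made up purely for illustration):

```python
# Minimal sketch (Python + SciPy assumed): Pearson's r, Spearman's rho, and the
# point-biserial correlation on illustrative data.
from scipy import stats

x = [2, 4, 5, 7, 9]
y = [10, 12, 15, 18, 24]
print(stats.pearsonr(x, y))            # interval by interval: Pearson's r
print(stats.spearmanr(x, y))           # ordinal by ordinal: Spearman's rho

group = [0, 0, 1, 1, 1]                # dichotomous nominal variable
print(stats.pointbiserialr(group, y))  # nominal (dichotomous) by interval
```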
