
Lesson 1

Measures of Central Tendency:


The Mean, Median, and Mode
One of the most basic purposes of statistics is simply to enable us to make sense of
large numbers. For example, if you want to know how the students in your school are
doing in the statewide achievement test, and somebody gives you a list of all 600 of
their scores, that's useless. This everyday problem is even more obvious and staggering when you're dealing, let's say, with the population data for the nation. We've got to be able to consolidate and synthesize large numbers to reveal their
collective characteristics and interrelationships, and transform them from an
incomprehensible mass to a set of useful and enlightening indicators.
The Mean
One of the most useful and widely used techniques for doing this, one which you already know, is the average or, as it is known in statistics, the mean. And you
know how to calculate the mean: you simply add up a set of scores and divide by the
number of scores. Thus we have our first and perhaps the most basic statistical
formula:
X̄ = ΣX / N
Where:
X̄ (sometimes called the X-bar) is the symbol for the mean.
Σ (the capital Greek letter sigma) is the symbol for summation.
X is the symbol for the scores.
N is the symbol for the number of scores.
So this formula simply says you get the mean by summing up all the scores and
dividing the total by the number of scores, the old average, which in this case we're all familiar with, so it's a good place to begin.
This is pretty simple when you have only a few numbers. For example, if you have
just 6 numbers (3, 9, 10, 8, 6, and 5), you insert them into the formula for the mean,
and do the math:
X̄ = ΣX / N = (3 + 9 + 10 + 8 + 6 + 5) / 6 = 41 / 6 ≈ 6.8
But we usually have many more numbers to deal with, so let's do a couple of examples where the numbers are larger and show how the calculations should be done. In our first example, we're going to compute the mean salary of 36 people. Column A of Table 1 shows the salaries (ranging from $20K to $70K), and column B shows how
many people earned each of the salaries.
Table 1
Example 1 of Method for Computing the Mean
A B C
Salary (X) Frequency (f) fX
$20K 1 20
$25K 2 50
$30K 3 90
$35K 4 140
$40K 5 200
$45K 6 270
$50K 5 250
$55K 4 220
$60K 3 180
$65K 2 130
$70K 1 70
Sum 36 1,620

To get the ΣfX for our formula, we multiply the number of people in each salary
category by the salary for that category (e.g., 1 x 20, 2 x 25, etc.), and then total those
numbers (the ones in column C). Thus we have:
X̄ = ΣfX / N = 1,620 / 36 = 45, i.e., a mean salary of $45K.
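If you'd like to check this kind of calculation by machine, here is a minimal Python sketch that reproduces it from the Table 1 frequencies (salaries expressed in thousands of dollars); the variable names are just illustrative.

# Mean salary from a frequency table (values in $K), as in Table 1.
salaries =    [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
frequencies = [ 1,  2,  3,  4,  5,  6,  5,  4,  3,  2,  1]

n = sum(frequencies)                                       # N = 36
total = sum(x * f for x, f in zip(salaries, frequencies))  # sum of fX = 1,620
mean = total / n                                           # 1,620 / 36 = 45.0

print(f"N = {n}, sum of fX = {total}, mean = {mean}")      # a mean salary of $45K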
And this is how the distribution of these salaries looks:
Figure 1
Distribution of Example 1 Salaries

The scores in this distribution are said to be normally distributed, i.e., clustered
around a central value, with decreasing numbers of cases as you move to the extreme
ends of the range. Thus the term normal curve.
So, computing the mean is pretty simple. Piece of cake, right? Not so fast.
In our second example, let's look at what happens if we change just six people's salaries in Table 1. Let's suppose that the three people who made $60K actually made $200K, that the two who made $65K made $205K, and that the one person who made $70K made $210K. The revised salary table is the same except for these changes.
Table 2
Example 2 of Method for Computing the Mean
A B C
Salary (X) Frequency (f) fX
$20K 1 20
$25K 2 50
$30K 3 90
$35K 4 140
$40K 5 200
$45K 6 270
$50K 5 250
$55K 4 220
$200K 3 600
$205K 2 410
$210K 1 210
Sum 36 2,460
But before we recompute the mean, let's look at how different the distribution now is.
Figure 2
Distribution of Example 2 Salaries

Now, using the revised numbers in Table 2, we compute the mean as follows:
X̄ = ΣfX / N = 2,460 / 36 ≈ 68.3, i.e., a mean salary of about $68.3K.
What this shows is that changing the salaries of just six individuals to extreme values
greatly affects the mean. In this case, it raised the mean from $45K to $68.3K (an
increase of 52%), even though all the other scores remained the same. In fact, the
mean is a figure that no person in the group has, hardly a figure we would think of
as "average" for the group.
The important lesson here is that the mean is intended to be a measure of central
tendency, but it works usefully as such only if the data on which it is based are more
or less normally distributed (as in Figure 1). The presence of extreme scores distorts
the mean, and, in this case, gives us a mean salary ($68.3K) that is not a very good
indication of the "average" salary of this group of 36 individuals.
So if we know or suspect that our data contain some extreme scores that would distort the mean, what can we use instead to get a better measure of central tendency? One such measure is the median, and we move on to learn about that now.
The Median
If your data are normally distributed (like those in Figure 1), the preferred measure
of central tendency is the mean. However, if your data are not normally distributed
(like those in Figure 2), the median is a better measure of central tendency, for
reasons we'll see in a moment.
The median is the point in the distribution above which and below which 50% of the
scores lie. In other words, if we list the scores in order from highest to lowest (or
lowest to highest) and find the middle-most score, that's the median.
For example, suppose we have the following scores: 2, 12, 4, 11, 3, 7, 10, 5, 9, 6. The
next step is to array them in order from lowest to highest.
2
3
4
5
6
7
9
10
11
12
Since we have 10 scores, and 50% of 10 is 5, we want the point above which and below which there are five scores. Careful. If you count up from the bottom, you might think the median is 6. But that's not right, because there are four scores below 6 and five above it. So how do we deal with that problem? We deal with it by understanding that in statistics, a measurement or score is regarded not as a point but as an interval ranging from half a unit below to half a unit above its value. So in this case, the actual midpoint or median of this distribution, the point above which and below which 50% of the scores lie, is 6.5.
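If you want to verify that result, Python's standard library computes the median of an even number of scores by averaging the two middle values, which gives the same 6.5 here; this is just a quick check.

from statistics import median

scores = [2, 12, 4, 11, 3, 7, 10, 5, 9, 6]

# With an even number of scores, the median is the average of the two
# middle values once the scores are sorted: (6 + 7) / 2 = 6.5.
print(sorted(scores))
print(median(scores))  # 6.5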
As we saw with the mean, things are pretty simple when we have only a few numbers. But how do we find the median when we have more numbers and more than one person with the same score? It's not difficult. Let's use the salary data in Table 1.
Table 3
Example 1 of Method for Computing the Median
Salary Range Frequency
$20K $19.5K-20.5K 1
$25K $24.5K-25.5K 2
$30K $29.5K-30.5K 3
$35K $34.5K-35.5K 4
$40K $39.5K-40.5K 5
$45K $44.5K-45.5K 6
$50K $49.5K-50.5K 5
$55K $54.5K-55.5K 4
$60K $59.5K-60.5K 3
$65K $64.5K-65.5K 2
$70K $69.5K-70.5K 1
Sum 36
The salaries are already in order from lowest to highest, so the next step in finding
the median is to determine how many individuals (ratings, scores, or whatever) we
have. Those are shown in the frequency column, and the total is 36. So our N = 36,
and we want to find the salary point above which and below which 50%, or 18, of the
individuals fall. If we count up from the bottom through the $40K level, we have 15,
and we need three more. But if we include the $45K level (in which there are 6), we
have 21, three more than we need. Thus, we need 3, or 50% (.5), of the 6 cases in the $45K category. We add that fraction of the interval width (.5 x $1K) to the lower limit of the interval in which we know the median lies ($44.5K-$45.5K), and this gives us a value of $45K.
In this case, the mean and the median are the same, as they always are in normal
distributions. So in situations like this, the mean is the preferred measure.
But things aren't always so neat and tidy. Let's now compute the median for the
salary data in Table 2, which we know (from Figure 2) are not normally distributed.
Table 4
Example 2 of Method for Computing the Median
Salary Range Frequency
$20K $19.5K-20.5K 1
$25K $24.5K-25.5K 2
$30K $29.5K-30.5K 3
$35K $34.5K-35.5K 4
$40K $39.5K-40.5K 5
$45K $44.5K-45.5K 6
$50K $49.5K-50.5K 5
$55K $54.5K-55.5K 4
$200K $199.5K-200.5K 3
$205K $204.5K-205.5K 2
$210K $209.5K-210.5K 1
Sum 36
The N is the same (36), so we go through exactly the same calculations we did for the
data in Table 3. When we do that (count up from the bottom, find that we need half
the cases in the $45K category to get 50% (18) of the total, and do so by adding .5 of the interval width to the lower limit of that category), we get exactly the same result ($45K) we did with the data in Table 3. In other words, those six extreme cases (the six whose salaries changed from $60K, $65K, and $70K to $200K, $205K, and $210K) don't affect the median even though they made a big change in the mean. They are still above the midpoint, and it doesn't matter how far above it they are in the calculation of the median.
This example illustrates dramatically what the median is and why it's a better measure of central tendency than the mean when we have extreme scores.
We've done the calculations for the median in a simple, descriptive way (arraying the scores in order, counting up to the mid-category, dividing it as necessary, etc.), but just so you won't feel slighted, here is the statistical formula for doing what we've just done, followed by a short computational sketch.
Mdn = L + ((N/2 - Σf_b) / f_w) × i
Where:
Mdn is the median.
L is the lower limit of the interval containing the median.
N is the total number of scores.
Σf_b is the sum of the frequencies, or number of scores, below the interval containing the median.
f_w is the frequency, or number of scores, within the interval containing the median.
i is the size or range of the interval.
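Here is a minimal Python sketch of that formula, applied to the Table 3 salaries (expressed in thousands, with an interval width of $1K); the function and variable names are just illustrative.

# Median from a grouped frequency table: Mdn = L + ((N/2 - cum_f_below) / f_within) * i
salaries =    [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]   # in $K (Table 3)
frequencies = [ 1,  2,  3,  4,  5,  6,  5,  4,  3,  2,  1]

def grouped_median(values, freqs, interval=1.0):
    n = sum(freqs)
    cumulative = 0
    for value, f in zip(values, freqs):
        if cumulative + f >= n / 2:
            lower_limit = value - interval / 2        # e.g., 44.5 for the $45K interval
            return lower_limit + ((n / 2 - cumulative) / f) * interval
        cumulative += f

print(grouped_median(salaries, frequencies))  # 45.0, i.e., $45K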
The Mode
The third and last of the measures of central tendency we'll be dealing with in this course is the mode. It's very simple: the mode is the most frequently occurring score or value. In our case (see Figures 1 and 2), that value is $45K. But sometimes we may have odd distributions with two peaks. Even if the peaks are not exactly equal, these are referred to as bi-modal distributions.
Let's assume we have such a bi-modal distribution of salaries, as shown in Table 5 and Figure 3.
Table 5
Bi-Modal Distribution of Salaries
A B C
Salary (X) Frequency (f) fX
$20K 1 20
$25K 3 75
$30K 4 120
$35K 6 210
$40K 3 120
$45K 1 45
$50K 3 150
$55K 5 275
$60K 6 360
$65K 3 195
$70K 1 70
Sum 36 1,640

Figure 3
Example of a Bi-Modal Distribution


Before we talk about the mode, use the formulas and calculation procedures you've just learned to calculate the mean and median for the salaries in Table 5 (the fX data are in column C).
When you look at this distribution of salaries, as shown graphically in Figure 3, it's hard to discern any central tendency. The mean (which you just calculated) is about $45.6K, a salary no one in the group earns, and the median is $45.5K, which, while it is the middle-most value (50% of the cases lie above and below it), certainly doesn't give us a meaningful indication of the central tendency in this distribution, because there isn't any.
Therefore, the most informative general statement we can make about this
distribution is to say that it is bi-modal.
You now know the three principal measures of central tendency (the mean, the median, and the mode), when they should be used, and how to calculate them, so we now move on to the other side of the central-tendency coin: dispersion.







Lesson 2
The Standard Deviation and the Normal Curve
A Measure of Dispersion: The Standard Deviation
For various important reasons we'll see as we get further into this course, we often
want to know not only what the central tendency is in a set of scores or values (i.e.,
the mean, the median, or the mode), but also how bunched up or
spread out the scores are. The most widely used indicator of dispersion is the
standard deviation, which, in a nutshell, is based on the deviation of each score from
the mean.
To illustrate, compare the distribution of test scores in Figures 4 and 5. The first is
flat and spread out, while the second is concentrated and bunched up closely around
the mean.
Figure 4
Graphic Display of Flat or Spread-Out Score Distribution

Figure 5
Display of a Narrow or Concentrated Distribution

Note that the mean and median of these two quite different distributions are the same (X̄ = 150, Mdn = 150), so simply calculating and reporting those two measures of
central tendency would fail to reveal how different the dispersion of scores is
between the two groups. But we can do this by calculating the standard deviation.
The standard deviation provides us with a measure of just how spread out the scores
are: a high standard deviation means the scores are widely spread; a low standard
deviation means they're bunched up closely on either side of the mean.
We'll now calculate the standard deviation for both these distributions. The formula
for the standard deviation is:
σ = √(Σd² / N)
Where:
σ (the lowercase Greek sigma) is the standard deviation.
Σd² is the sum of the squared deviations of the scores from the mean (with grouped data, Σfd²).
N is the number of cases.
The numbers we need to calculate the standard deviation for Figure 4, the flat
distribution, are in Table 6.
Table 6
Data for Figure 4, the Flat Distribution
A B C D E
Test Score (X)  Frequency (f)  X - Mean (d)  fd  fd²
100 8 -50 -400 20,000
110 13 -40 -520 20,800
120 17 -30 -510 15,300
130 20 -20 -400 8,000
140 21 -10 -210 2,100
150 22 0 0 0
160 21 10 210 2,100
170 20 20 400 8,000
180 17 30 510 15,300
190 13 40 520 20,800
200 8 50 400 20,000
SUM: f = 180; Σfd² = 132,400
Column A displays the test scores (X).
Column B shows how many people got each test score (f).
Column C is the test score minus the mean (X minus the mean or d).
Column D is the deviation multiplied by the frequency (fd).
Column E is the squared deviation multiplied by the frequency (fd²).
Of course, to get the deviation of each score from the mean (column C), we have to
calculate the mean, and you already know how to do that. We now have what we
need to calculate the standard deviation for the flat distribution in Figure 4:
σ = √(Σfd² / N) = √(132,400 / 180) = √735.6 ≈ 27
You can do the last part of this calculation, taking the square root of 132,400/180 (which is about 736), by using the square-root button on your hand calculator.
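Or, if you prefer, here is a minimal Python sketch of the whole calculation for the Table 6 data; it recomputes the mean and then the frequency-weighted squared deviations.

from math import sqrt

# Test scores and frequencies for the flat distribution (Table 6).
scores =      [100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200]
frequencies = [  8,  13,  17,  20,  21,  22,  21,  20,  17,  13,   8]

n = sum(frequencies)                                            # 180
mean = sum(x * f for x, f in zip(scores, frequencies)) / n      # 150.0
sum_fd2 = sum(f * (x - mean) ** 2 for x, f in zip(scores, frequencies))  # 132,400

sigma = sqrt(sum_fd2 / n)
print(round(sigma, 1))  # about 27.1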
Now let's compute the standard deviation for the data in Figure 5. The data are in
Table 7, and you follow the same steps we've just completed.
Table 7
Example of a Narrow or Concentrated Distribution
A B C D E
Test Score (X)  Frequency (f)  X - Mean (d)  fd  fd²
100 0 -50 0 0
110 0 -40 0 0
120 0 -30 0 0
130 10 -20 -200 4,000
140 45 -10 -450 4,500
150 70 0 0 0
160 45 10 450 4,500
170 10 20 200 4,000
180 0 30 0 0
190 0 40 0 0
200 0 50 0 0
SUM: f = 180; Σfd² = 17,000
σ = √(Σfd² / N) = √(17,000 / 180) = √94.4 ≈ 10
The two standard deviations provide a statistical indication of how different the
distributions are: 27 for the spread-out distribution and 10 for the bunched-up
distribution.
So once we know the mean and median, why do we need to know the standard
deviation? What use is it?
The standard deviation is important because, regardless of the mean, it makes a great
deal of difference whether the distribution is spread out over a broad range or
bunched up closely around the mean. For example, suppose you have two classes
whose mean reading scores are the same. With only that information, you would be
inclined to teach the two classes in the same way. But suppose you discover that the
standard deviation of one of the classes is 27 and the other is 10, as in the examples
we just finished working with. That means that in the first class (the one where σ = 27), you have many students throughout the entire range of performance. You'll need to have teaching strategies for both the gifted and the challenged. But in the second class (the one where σ = 10), you don't have any gifted or challenged students.
They're all average, and your teaching strategy will be entirely different.
The Normal Curve
Before we leave the standard deviation, it's a good time to learn a little more about
the normal curve. We'll be coming back to it later.
First, why is it called the normal curve? The reason is that so many things in life are
distributed in the shape of this curve: IQ, strength, height, weight, musical ability,
resistance to disease, and so on. Not everything is normally distributed, but a great many things are. Thus the term normal curve.
In Figure 6, we have a set of scores which are normally distributed. The range is from
0 to 200, the mean and median are 100, and the standard deviation is 20. In a normal
curve, the standard deviation indicates precisely how the scores are distributed. Note
that the percentage of scores is marked off by standard deviations on either side of
the mean. In the range between 80 and 120 (that's one standard deviation on either side of the mean), there are 68.26% of the cases. In other words, in a normal distribution, roughly two thirds of the scores lie within one standard deviation on either side of the mean. If we go out to two standard deviations on either side of the mean, we will include 95.44% of the scores; and if we go out three standard deviations, that will encompass 99.74% of the scores.
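If you have the scipy library available, you can verify these percentages directly from the cumulative normal distribution; this is simply a check on the figures quoted above (the tiny differences are due to rounding).

from scipy.stats import norm

# Proportion of a normal distribution lying within k standard deviations of the mean.
for k in (1, 2, 3):
    proportion = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {proportion:.2%}")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%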
Another way to think about this is to realize that in this distribution, if you have a score that's within one standard deviation of the mean, i.e., between 80 and 120, that's pretty average: two thirds of the people are concentrated in that range. But if you have a score that's two or three standard deviations away from the mean, that is clearly a deviant score, i.e., very high or very low. Only a small percentage of the cases lie that far from the mean.
This is valuable to understand in its own right, and will become useful when we take
up determining the significance of the difference between means, which we're going to do next in Lesson 3.
Figure 6
Normal Curve Showing the Percent of Cases Lying Within 1, 2, and 3 Standard
Deviations From the Mean


















Lesson 3
Testing the Difference Between Means: The t-Test
This is one of the most important parts of this course in basic statistics. Here we're going to learn about testing the significance of the difference between means. What does that mean?
Suppose you're the superintendent, and one of your principals bursts into your office enthusiastically and says, "I know you'll be happy to learn that after our big effort
this year in reading, my third graders improved from 187 to 195 on the state reading
test!"
You immediately ask her, "Is the 8-point difference between those means statistically
significant?" When her eyes glaze over and she says, "Huh?" you smile forbearingly (because you've taken this course in basic statistics and she hasn't), and you patiently explain to her that simply because there is a numerical difference between last year's and this year's mean scores doesn't mean that there is a real difference. It could be due to chance variation in the scores.
So how do we know when the difference between two means is probably a real
difference, not one due to chance? We have to say "probably" because nothing in
statistics is absolutely certain (as is the case with most things in life). But there are
statistical tests which can tell us how likely it is that a difference between two means is due to chance.
One of the most widely used statistical methods for testing the difference between
means, and the one we're going to get you up to speed on, is called the t-test.
Let's go back to the salary data we worked with in Table 1 of Lesson 1, but now let's compare the mean salary of that group with that of another group, and ask whether the
mean salaries of the two groups are significantly different.
First, let's look at the formula for the t-test, and determine what we need to make the
computation:
t = (X̄1 - X̄2) / √(s1²/n1 + s2²/n2)
Where:
X̄1 is the mean for Group 1.
X̄2 is the mean for Group 2.
n1 is the number of people in Group 1.
n2 is the number of people in Group 2.
s1² is the variance for Group 1.
s2² is the variance for Group 2.
The only thing in this formula you're not familiar with is the symbol s², which stands for the variance. The variance is simply the standard deviation without the square root, i.e., it's nothing more than the sum of the squared deviations of all the scores from the mean divided by n - 1.
The formula above is for testing the significance of the difference between two independent samples, i.e., groups of different people. If we wanted to test the
difference between, say, the pre-test and post-test means of the same group of people,
we would use a different formula for dependent samples. That formula is:

Where:
is the sum of all the individuals pre-post score differences.
is the sum of all the individuals pre-post score differences squared.
is the number of paired observations.
But for now, we'll test the significance of the difference between the mean salaries of two different groups. You can try the formula for dependent samples on your own. (I knew you'd welcome that opportunity.)
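If you would rather let software do the work, scipy provides ready-made functions for both situations. The two small score lists below are made-up numbers purely for illustration; they are not data from this lesson.

from scipy import stats

# Independent samples: two different groups of people (made-up scores).
group_1 = [12, 15, 14, 10, 13, 16, 11, 14]
group_2 = [14, 18, 13, 15, 17, 16, 15, 19]
t_ind, p_ind = stats.ttest_ind(group_1, group_2)
print(f"independent samples: t = {t_ind:.2f}, p = {p_ind:.3f}")

# Dependent samples: pre-test and post-test scores for the same people.
pre  = [10, 12, 9, 14, 11, 13]
post = [12, 14, 10, 15, 13, 16]
t_rel, p_rel = stats.ttest_rel(pre, post)
print(f"dependent samples:   t = {t_rel:.2f}, p = {p_rel:.3f}")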
Tables 8 and 9 provide the numbers we need to compute the t-test for the difference
in mean salaries of the two groups.
Table 8
Salaries and t-Test Calculation Data for Group 1
A B C D E
Salary (X)  Frequency (f)  X - Mean (d)  fd  fd²
20 1 -25 -25 625
25 2 -20 -40 800
30 3 -15 -45 675
35 4 -10 -40 400
40 5 -5 -25 125
45 6 0 0 0
50 5 5 25 125
55 4 10 40 400
60 3 15 45 675
65 2 20 40 800
70 1 25 25 625
SUM: f = 36; Σfd² = 5,250
The variance (s²) = 5,250 / (36 - 1) = 150
Table 9
Salaries and t-Test Calculation Data for Group 2
A B C D E
Salary (X)  Frequency (f)  X - Mean (d)  fd  fd²
20 0 -27 0 0
25 2 -22 -44 968
30 3 -17 -51 867
35 3 -12 -36 432
40 4 -7 -28 196
45 6 -2 -12 24
50 6 3 18 54
55 5 8 40 320
60 3 13 39 507
65 2 18 36 648
70 2 23 46 1,058
SUM: f = 36; Σfd² = 5,074
The variance (s²) = 5,074 / (36 - 1) ≈ 145
You can see from a quick inspection of the two tables that the salary distributions are similar, although Group 2 has a few more people making higher salaries. The mean of the second group (which has been calculated for you) is slightly higher (47 vs. 45 for the first group), and its variance is slightly smaller (145 vs. 150). So let's plug the numbers into the t-test formula and see what we get:
t = (47 - 45) / √(150/36 + 145/36) = 2 / √8.19 = 2 / 2.86 ≈ 0.70
We now know that t ≈ 0.70. So what does that mean? Is the difference between the two means statistically significant or not? To find out whether a t value of any size is significant, we simply look it up in a table that can be found in the appendices of any statistics textbook. The quick answer in this case is no, it is not statistically significant. That is, the $2K difference in the mean salaries of these two groups could easily have occurred by chance.
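Here is a minimal Python sketch of that calculation, plugging the summary values from Tables 8 and 9 (means and variances, salaries in $K) into the independent-samples formula above.

from math import sqrt

# Summary values from Tables 8 and 9 (salaries in $K).
mean_1, var_1, n_1 = 45, 150, 36    # Group 1
mean_2, var_2, n_2 = 47, 145, 36    # Group 2 (mean and variance rounded, as in the text)

t = (mean_2 - mean_1) / sqrt(var_1 / n_1 + var_2 / n_2)
print(round(t, 2))  # about 0.70, far below the roughly 2.0 needed for significance at df = 70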
But that's the quick and dirty answer. There's more about statistical significance we need to understand, so we're going to that important topic now, and we'll return to this example after we've done that.















Lesson 4
Statistical Significance and
The Type I and Type II Errors
Certainty and Uncertainty: Universes and Samples
Why do we have to use statistical tests, anyway? When we have two groups with
different means, why can't we just say that one is higher than the other, and that's it?
The reason is that the difference between the means of the two groups may be due to
chance, and if we were to make the comparison again, the difference might be turned
around.
How can that be? The two main reasons are sampling and measurement error. The
particular sample we have may not be representative of the universe from which it is
drawn. Also, tests and measuring instruments are not perfect.
For example, suppose that within the next hour we could somehow magically
measure the height of every adult man and woman in the world, and we found that
the mean height of the men was 5'6" and the mean for the women was 5'3". Since we have measured the entire universe of adult men and women, those are the averages, not estimates of them based on samples. We don't need to run a t-test to see whether the 3-inch difference between the means is statistically significant. That is the difference.
But if, as is almost always the case in whatever we do, we have to use a sample, we have to account for the fact that the sample, no matter how carefully drawn, may not be representative of the universe. Usually it is, but sometimes it's not.
A good way to understand this important point is to realize that if we were to take 100 random samples of 1,000 people each, the means of those samples would themselves form a normal curve (just like the ones we worked on in Lesson 2). In other words, the means of some of those samples would fall as much as (or more than) two or even three standard deviations on either side of the collective mean of the 100,000 people.
When we take just one sample, which is what we usually have to work with, the
chances are it's close to the real mean, simply because most of the sample means are clustered close to that mean (remember, 68% of the values lie within 1 standard deviation of the mean). But we can't be sure. The sample we're working with just might be one of those lying out at the extremes of the normal curve.
That's why we have tests of statistical significance. They can't tell us for sure whether the means we're comparing are close to the true means, but they can give us a good estimate of the probability that that's the case.
Scientific Knowledge and the Null Hypothesis
As you've probably realized by now, scientists and statisticians understand that error and uncertainty are inevitable, but they're very uncomfortable with them. Thus, one of the basic tenets of science, which is reflected in statistics, is the requirement that nothing be admitted into the body of scientific knowledge unless we're as sure as we can be that it's true. In other words, there is a strong conservative bias in science and statistics. Scientists would rather be guilty of waiting until there's enough evidence to be sure than accept a finding prematurely and be wrong. In statistics, this takes the form of what is called the "null hypothesis." Basically, the null hypothesis says that whenever you are, for example, setting out to compare two means, you begin with the assumption, indeed the assertion, that there is no difference between the means. And in order to conclude that there is a difference, your task is to disprove the null hypothesis.
Levels of Significance
Now this leads to a very difficult decision. To understand the difficulty, let's first go back to the t-test of the two means we ran in Lesson 3. We found that, for that test, t ≈ 0.70. In order to find out whether the difference between the means is statistically significant (i.e., how likely it is to be due to chance), we look up the value of t in one of the statistical significance tables found in the appendices of all statistics texts. The t-test table we need is reproduced below.
Table 10
t-Test Values Required to Reject the Null Hypothesis at the .05 and .01 Levels of
Confidence (Two-Tailed Test)
__________________________________________________________________________
Degrees of Freedom (df) .05 .01
__________________________________________________________________________
20 2.09 2.85
21 2.08 2.83
22 2.07 2.82
23 2.07 2.81
24 2.06 2.80
25 2.06 2.79
26 2.06 2.78
27 2.05 2.77
28 2.05 2.76
29 2.05 2.76
30 2.04 2.75
35 2.03 2.72
40 2.02 2.71
45 2.01 2.70
50 2.01 2.68
55 2.00 2.67
60 2.00 2.66
65 2.00 2.66
70 2.00 2.65
75 1.99 2.64
80 1.99 2.64
85 1.99 2.64
90 1.99 2.63
95 1.99 2.63
100 1.98 2.63
Infinity 1.96 2.58
________________________________________________________________________
In order to use this table, we enter it with our t value (0.70) and something called "degrees of freedom." The degrees of freedom are simply (n1 - 1) + (n2 - 1) or, in our case, 70. Note that there are two columns of t values, one labeled .05 and the other labeled .01. If we go down to the degrees of freedom nearest to ours, which is 70, we find that both the .05 and the .01 t values are substantially larger than our 0.70. So we didn't achieve a large enough t value to reject the null hypothesis, i.e., to be able to conclude that the difference wasn't due to chance.
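If you don't have a printed table handy, scipy can generate the same critical values; this sketch reproduces the df = 70 row of Table 10 (for a two-tailed test, the .05 critical value corresponds to the 97.5th percentile of the t distribution and the .01 value to the 99.5th).

from scipy.stats import t

df = 70
crit_05 = t.ppf(0.975, df)   # two-tailed .05 critical value
crit_01 = t.ppf(0.995, df)   # two-tailed .01 critical value
print(round(crit_05, 2), round(crit_01, 2))  # about 1.99 and 2.65, essentially the df = 70 row of Table 10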
Why do we have two columns, one labeled .05 and the other .01? Because those are
the two levels of significance commonly used in statistical analysis. When the null hypothesis is true, a t value as large as (or larger than) the one in the .05 column will occur by chance only 5 percent of the time, whereas a t value as large as (or larger than) the one in the .01 column will occur by chance only 1 percent of the time.
Type I and Type II Errors
The choice of which significance level to use (.05, .01, or lower or higher) is the difficult choice that you as the researcher must make. If you decide to accept the .05 level of confidence, which requires a smaller t value, you can more easily reject the null hypothesis and declare that there is a statistically significant difference between the means than if you select the .01 level, but you run a 5 percent risk of rejecting a null hypothesis that is actually true. This is the Type I error.
On the other hand, if you select the .01 level, you run that risk only 1 percent of the time. But since the .01 level requires a larger t value, you will less often be able to reject the null hypothesis and say that there is a statistically significant difference between the means when in fact there is one. This is the Type II error. It is crucially important to an understanding of even basic statistics that we have a clear understanding of these two errors. If you spend a little time with Table 11, it will help you achieve this understanding.
Table 11
Accepting and Rejecting Null Hypotheses and the Making of Type I and Type II Errors*

A. The null hypothesis is really true, i.e., there is not a real difference between the means of the two groups.
1. If you accept the null hypothesis, you have concluded that there is not a real difference between the means of the two groups, which, in fact, is the case. That was a good decision.
2. If you reject the null hypothesis, you have concluded that there is a real difference between the means of the two groups when, in fact, there is not. That was a bad decision: you made the Type I error.

B. The null hypothesis is really false, i.e., there is a real difference between the means of the two groups.
3. If you accept the null hypothesis, you have concluded that there is not a real difference between the means of the two groups when, in fact, there is. That was a bad decision: you made the Type II error.
4. If you reject the null hypothesis, you have concluded that there is a real difference between the means of the two groups, which, in fact, is the case. That was a good decision.
*This table was adapted from a similar one found in Neil Salkind's Statistics for People Who (Think They) Hate Statistics, Sage Publications, 2000, p. 176.
Table 11 and the work we've done in this lesson should make the mysteries of statistical significance and Type I and Type II errors clear. When you're reading a professional journal and you encounter a discussion of the difference between the means of two groups, and the authors conclude by saying "t = 2.64, p < .01, df = 70, two-tailed test," you will immediately know that:
1. The t-test of the two means yielded a t value of 2.64.
2. A t value of 2.64 with df = 70 (using the two-tailed rather than the one-tailed test) is statistically significant beyond the .01 level of confidence, i.e., likely to occur by chance less than 1 time in 100.
So, this knowledge is a major step forward in your journey to master basic statistics.
And we've got a few more neat things to cover.


Lesson 5
The Effect Test
Take another look at Table 10 in Lesson 4, which provides the significance levels for
the t-test. You probably noticed that the size of the t value needed to reject the null
hypothesis (and enable you to declare that there is a statistically significant
difference between two means) is dependent on the size of the samples on which the
means are based. With df = 20, you need a t value of 2.09 to reach the .05 level of
significance; but with df = 100, you need a t value of only 1.98.
In other words, if you have small Ns, you will need a large difference between the
means to achieve statistical significance; but if you have very large Ns, you will need
only a very small difference to be able to declare that the difference between the
means is statistically significant.
So why is that of more than technical interest? Because we don't want to mistake statistical significance for educational significance. Suppose you are comparing the mean reading scores of students in your traditional program with those in a new reading program. There are 500 students in each program, at the end of the year there is a 3-point difference favoring the new program, and that 3-point difference is
statistically significant beyond the .001 level of confidence. The proponents of the
new program are likely to cite that finding as clear research evidence of the
superiority of the new program and call on you, as the superintendent, to junk the
traditional program, even though the new program is substantially more costly.
But you should be wary of that recommendation. Why? Because the mean difference
in reading scores, even though it's statistically significant, is very small. Is a 3-point difference likely to have any practical significance, or even be observable? Probably
not. Even if the difference were a few points greater, would such a difference justify
the expenditure of substantially more funds? Probably not.
It turns out that statisticians have developed a test that is intended to give some help
when confronting the question of whether a difference between two means is of
practical consequence. It's called the Effect Test.
The formula for the Effect Test is:
E = (X̄1 - X̄2) / ((s1 + s2) / 2)
Where:
E is the effect size.
X̄1 is the mean of Group 1.
X̄2 is the mean of Group 2.
s1 is the standard deviation of Group 1.
s2 is the standard deviation of Group 2.
As you can see, the formula simply divides the difference between the two means by the average of the standard deviations of the two groups.
There is a general consensus that an effect size of .33 or greater indicates that the
difference has practical meaning or significance.
Let's do an example.
We have two groups with mean reading test scores of 188 and 185 and standard
deviations of 30 and 32. N = 500 for both groups, and the difference between the
means is statistically significant. We plug the numbers into the Effect Test formula as
follows:
E = (188 - 185) / ((30 + 32) / 2) = 3 / 31 ≈ .10
The effect size (about .10) does not reach the .33 level, so the 3-point difference between the means would not be regarded as practically consequential, even though it's statistically significant.
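Here is a minimal Python sketch of the Effect Test formula above, applied to this example and to the one that follows.

def effect_size(mean_1, mean_2, sd_1, sd_2):
    """Difference between means divided by the average standard deviation."""
    return (mean_1 - mean_2) / ((sd_1 + sd_2) / 2)

# The example above: means of 188 and 185, standard deviations of 30 and 32.
print(round(effect_size(188, 185, 30, 32), 2))   # about 0.10, well below the .33 threshold

# The second example that follows: means of 193 and 182, same standard deviations.
print(round(effect_size(193, 182, 30, 32), 2))   # about 0.35, above the .33 threshold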
But suppose the two means are 193 and 182, and the Ns and standard deviations are
the same. Then we have:
E = (193 - 182) / ((30 + 32) / 2) = 11 / 31 ≈ .35
The 11-point difference between the means (with the associated score variability as
reflected in the standard deviations) exceeds the .33 threshold for practical
significance. So in this case, we would be justified in saying that the difference
between the two groups is not only statistically significant, it can also be regarded as
having some practical educational meaning.
However, in the final analysis, you, as the experienced educator and administrator,
must make the judgment about practical meaning. Many times you will be presented
with mean differences that are large by any practical standard, but because of small
Ns or large variances, they're not statistically significant. In those cases, the judgment
is fairly easy: You would be on very soft ground making policy and budgetary
decisions based on differences that are not statistically significant.
But the other case is more difficult. If you have a mean difference that is both
statistically significant and practically significant as indicated by the effect size, you
still have to be the judge of whether that difference justifies changing programs,
spending more money, hiring or firing staff, and so on.
The new knowledge you now have about how to determine statistical and practical
significance adds greatly to your ability to make decisions about the effectiveness of
educational programs and the formulation of educational policies, but there are no
automatic answers. You, as the responsible administrator, must bring your
experience to bear in making the final decision.














Lesson 6
Correlation
What is a Correlation?
Thus far we've covered the key descriptive statistics (the mean, median, mode, and standard deviation), and we've learned how to test the difference between means.
But often we want to know how two things (usually called "variables" because they
vary from high to low) are related to each other.
For example, we might want to know whether reading scores are related to math
scores, i.e., whether students who have high reading scores also have high math
scores, and vice versa. The statistical technique for determining the degree to which
two variables are related (i.e., the degree to which they co-vary) is, not surprisingly,
called correlation.
There are several different types of correlation, and we'll talk about them later, but in this lesson we're going to spend most of our time on the most commonly used type: the Pearson product moment correlation. This correlation, signified by the symbol r, ranges from -1.00 to +1.00. A correlation of 1.00, whether it's positive or negative, is a perfect correlation. It means that as scores on one of the two variables increase or decrease, the scores on the other variable increase or decrease in exact proportion, something you'll probably never see in the real world. A correlation of 0 means there's no relationship between the two variables, i.e., when scores on one of the variables go up, scores on the other variable may go up, down, or anywhere in between. You'll see a lot of those.
Thus, a correlation of .8 or .9 is regarded as a high correlation, i.e., there is a very close relationship between scores on one of the variables and scores on the other. Correlations of .2 or .3 are regarded as low correlations, i.e., there is some relationship between the two variables, but it's a weak one. Knowing people's score on one variable wouldn't allow you to predict their score on the other variable very well.
Computing the Pearson Product Moment Correlation
Let's do a correlation to see how the formula works and what it produces. The
formula for the Pearson product moment correlation is:
r_xy = [nΣXY - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}
Where:
r_xy is the correlation coefficient between X and Y.
n is the size of the sample.
X is the individual's score on the X variable.
Y is the individual's score on the Y variable.
XY is the product of each X score times its corresponding Y score.
X² is the individual X score squared.
Y² is the individual Y score squared.
Let's see what the correlation is between 30 students' reading scores and their math
scores. The data we need to compute the formula are given in Table 12.
Table 12
Reading and Math Scores and the Associated Data for Computing the Pearson
Product Moment Correlation (N=30)
X (Reading Scores)  Y (Math Scores)  X²  Y²  XY
191 180 36481 32400 34380
103 101 10609 10201 10403
187 173 34969 29929 32351
108 103 11664 10609 11124
180 170 32400 28900 30600
118 113 13924 12769 13334
178 171 31684 29241 30438
127 122 16129 14884 15494
176 168 30976 28224 29568
134 130 17956 16900 17420
165 150 27225 22500 24750
147 145 21609 21025 21315
160 150 25600 22500 24000
157 154 24649 23716 24178
155 145 24025 21025 22475
168 164 28224 26896 27552
150 145 22500 21025 21750
172 170 29584 28900 29240
145 130 21025 16900 18850
185 179 34225 32041 33115
140 141 19600 19881 19740
195 193 38025 37249 37635
135 136 18225 18496 18360
100 101 10000 10201 10100
130 128 16900 16384 16640
125 121 15625 14641 15125
105 106 11025 11236 11130
120 118 14400 13924 14160
115 112 13225 12544 12880
110 108 12100 11664 11880
Total (Σ)  4381  4227  664583  616805  639987
So, we plug the numbers from this table into the formula and do the math:
r = [30(639,987) - (4,381)(4,227)] / √{[30(664,583) - 4,381²][30(616,805) - 4,227²]}
or
r = (19,199,610 - 18,518,487) / √[(19,937,490 - 19,193,161)(18,504,150 - 17,867,529)]
or
r = 681,123 / √(744,329 x 636,621) = 681,123 / 688,372 ≈ .99
In this case, the correlation between reading and math scores is remarkably high
(because I concocted the numbers so it would turn out that way). With real scores, it
would be high, but not that high. If you glance over the numbers in Table 12, even before we've computed the correlation you can easily see (in this small sample of 30) that high scores in reading tend to go with high scores in math, low reading scores tend to go with low math scores, and so on. But, of course, you wouldn't be able to see that pattern so easily if you had a sample of 500.
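If you would rather not push all of those sums through a calculator, the computational formula can be written out in a few lines of Python; the two short score lists here are made up purely for illustration, but you could substitute the 30 pairs from Table 12 and get the r of about .99 we just computed.

from math import sqrt

def pearson_r(x, y):
    """Pearson product moment correlation, using the computational formula."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# A small made-up example: five students' reading and math scores.
reading = [100, 120, 140, 160, 180]
math_scores = [104, 118, 145, 158, 175]
print(round(pearson_r(reading, math_scores), 2))  # about 0.99 for these made-up scores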
Positive and Negative Correlations
I pointed out above that a correlation can vary from +1.00 to -1.00. The correlation we just computed is a positive correlation. That is, high reading scores go with high math scores, low with low, and so on. However, we could have a negative correlation. This is not something bad; it simply denotes an association in which high scores on one variable go with low scores on the other. For example, if we were computing a correlation between, say, the amount of time students watch television and their achievement scores, we would find a negative correlation: heavy TV watching is associated with lower achievement scores, and vice versa. Such a correlation might be something like -.71.
Determining Statistical Significance
OK, so we have a correlation coefficient. What precisely does it mean, and how do we interpret it? It's not a percentage, as many people mistakenly think.
First, we can determine its statistical significance in the same way we did with the t-test: we can look it up in a table in the appendices of any statistics text. In the case of our .99 correlation between reading and math scores, if we look that up in the table for correlations, we find that the value needed to reject the null hypothesis at the .01 level of confidence (and declare that the correlation is statistically significant, i.e., unlikely to be due to chance) for our sample of 30 (df = 28) is about .45.
So if we were stating this finding in a research report, we could say that the correlation of reading scores with math scores is r = .99, p < .01, df = 28. (Now see how smart you are, because you know what all that means.)
Practical vs. Statistical Significance
But we have the same issue we had with the t-test: determining practical vs. statistical significance. We don't have an effect test, as we did with the t-test, but we have something similar. It has an imposing name, the coefficient of determination, but you'll be ecstatically happy to learn that it's very simple.
The coefficient of determination is nothing more than r². You simply multiply r by itself, and you've got it. OK, you've got it, but what does it mean? The coefficient of determination, r², tells us how much of the variance in one of the variables is accounted for by the variance in the other variable. Thus, if we have a correlation of .60 between, say, students' achievement scores and a measure of their socioeconomic status, r² = .36. That means that 36% of the variance in the students' achievement scores (not 60%, as the correlation of .60 might suggest) can be accounted for by variance in their socioeconomic status. But that also means that the remaining variance (64%) in achievement scores cannot be accounted for by socioeconomic status and is attributable to many other factors, such as study time, intelligence, motivation, quality of instruction, and so on.
Other Correlations
All the correlations we've talked about so far have been based on what we call interval data, i.e., data where the distance between adjacent scores or values is the same: the distance between a 65 and a 66 is assumed to be the same as the distance between a 14 and a 15. But many times we want to determine the relationship between two variables when that is not the case. Suppose, for example, we want to compute the correlation between students' class rank in their junior year and their class rank in their senior year. Ranks are not the same as scores; there may be a much smaller (or bigger) difference between ranks 1 and 2 than between ranks 8 and 10 (like the difference between the first two teams and the last two teams in football or baseball). If the data we have are ranks rather than scores, we can't use the product moment formula, but there is another correlation formula for use with ranks (it's called Spearman's rho).
And suppose we want to determine the relationship between two variables when one
is based on what is called nominal or categorical data, and the other is interval data.
An example would be correlating gender with achievement scores. Again, the
product moment correlation can't be used, but there is a special formula for doing a correlation with these disparate types of data. In this case, it's called the point biserial correlation.
Table 13 displays the several different types of correlation for use with variables
based on different levels of measurement. In this course, we're not going to compute them. But with the knowledge and skills you've developed thus far, when you encounter situations where the variables you want to correlate are based on different levels of measurement (interval, ordinal, or nominal), you'll be able to select the type you need.
Table 13
Alternative Types of Correlation for Different Levels of Measurement*

Variable X: Interval (reading scores)
Variable Y: Interval (math scores)
Correlation being computed: correlation between reading and math achievement
Type of correlation: Pearson product moment (r)

Variable X: Ordinal (class rank in the junior year)
Variable Y: Ordinal (class rank in the senior year)
Correlation being computed: correlation between class rank in the last two years of high school
Type of correlation: Spearman rank coefficient (rho, or ρ)

Variable X: Nominal (social class: high, middle, or low)
Variable Y: Ordinal (rank in high school graduating class)
Correlation being computed: correlation between social class and rank in high school
Type of correlation: Rank biserial coefficient (r_bs)

Variable X: Nominal (family configuration, e.g., intact or single parent)
Variable Y: Interval (grade point average)
Correlation being computed: correlation between family configuration and grade point average
Type of correlation: Point biserial coefficient (r_pb)

Variable X: Nominal (voting preference, Republican or Democrat)
Variable Y: Nominal (gender, i.e., male or female)
Correlation being computed: correlation between voting preference and gender
Type of correlation: Phi coefficient (φ)

*This table was adapted from a similar one found in Neil Salkind's Statistics for People Who (Think They) Hate Statistics, Sage Publications, 2000, p. 101.
Correlation and Cause
Before we conclude this lesson, we need to understand one of the most important
facts about correlation, namely, that it does not necessarily indicate cause. It may be
that one of the variables does in fact cause the other, but we don't know that just
from the fact that the two are correlated.
Smoking and Lung Cancer
It is now an established fact that smoking causes lung cancer, but that conclusion
could not be reached simply because there is a correlation between the two. When the association between smoking and lung cancer first appeared, and many argued that it indicated that smoking caused lung cancer, the tobacco companies argued that there were other factors that could explain the relationship, e.g., smoking is more common among blue-collar workers, who also have greater exposure to other toxic substances; smokers drink more and lead more stressful lives; and so on. And logically they were
right. It took other kinds of direct physiological evidence and animal experiments to
prove that the association was indeed causal.
We often find strong correlations where clearly a causal relationship makes no sense.
For example, we may find a strong correlation between car sales and college
attendance. Neither one of these is causing the other; both increase during financially
prosperous times.
Wine Consumption and Heart Disease
But it is when two correlated variables seem likely to be causally related to one
another that we tend to jump to the unsupportable conclusion that one causes the
other. For example, when we hear about a correlation between an increase in stork
nests and the birth rate in Germany, we laugh it off as clearly due to some unknown
third factor. But when we hear that moderate wine consumption is associated with
lower rates of heart disease, we're ready to conclude immediately (especially if we're wine lovers) that there is obviously some medically beneficial element in wine. But when these reports first came out, skeptics (they were probably statisticians) pointed out that other things could account for the association between moderate wine consumption and lower rates of heart disease: moderate wine drinkers are likely to be more educated, to be non-smokers, to get more exercise, and to have lower rates of obesity. Again, as it has turned out, other kinds of physiological evidence do support the conclusion that moderate wine consumption is medically beneficial, but we can't conclude that just on the basis of the correlation.
The Important Lesson About Correlation and Cause
The important lesson here is that the correlation coefficient is a highly useful statistic
for determining the relationship between variables, but a correlation does not
demonstrate a causal relationship between the variables.
The same holds for differences between means. If, for example, we give a pre-test
and a post-test to students who have participated in a new reading program, and we
find that the increase in the mean reading score is both statistically and practically
significant, that does not entitle us to conclude that the new program caused the
increase. Any number of other factors could account for the increase: the students were older, and they had been exposed to many other influences and experiences that could have improved their reading, and probably did. To determine how much, if
any, of the improvement was caused by the new program, we would have to employ
a control group (or some other method for determining "the expectation of non-
treatment"). This would tell us how much improvement occurred in comparable
students who had the same experiences except for the new reading program.









Lesson 7
Chi Square
Parametric and Non-Parametric Statistics
Most of the statistics we've learned so far (the mean, the standard deviation, the t-test, and the product moment correlation) belong to a category called parametric statistics. That's because it is assumed that the data used to compute them have certain
parameters or meet certain conditions. One of these is that the variances are similar;
another is that the sample is large enough to be representative of the universe from
which it is drawn. We used examples of 30 or more cases when we worked on the
mean, the t-test, and the product moment correlation because there is a general
consensus among statisticians that this is the minimum-size sample to use with
parametric tests. You should keep this in mind when using these tests in your
practicum and in your own research.
But what do we do when we can't meet these conditions? Happily, there's another category of statistics, and you shouldn't be surprised to learn that it's called non-parametric statistics. We can do many of the same things with non-parametric statistics. They're regarded as somewhat less powerful than parametric statistics, but they're not to be looked down on. When conditions call for them, they are the tools to use.
Chi Square
One of the most useful of the non-parametric statistics is chi square. We use it when
our data consist of people distributed across categories, and we want to know
whether that distribution is different from what we would expect by chance (or
another set of expectations). We don't have scores, and we don't have means. We just have counts, or frequencies. In other words, we have nominal data.
For example, suppose we have the data in Table 14 that display the number of
students who elect different majors, and we want to know whether those numbers
differ from chance. In other words, are some majors selected more often than others,
or is the selection pattern essentially random?
Table 14
Number of Students Selecting Different Majors
Pre-Med  Computer Sciences  English Literature  Education  Engineering  Total
50  85  25  60  80  300
The null hypothesis here, of course, is that there is no difference between this distribution of major selections and what would be expected by chance. So what chi
square does is compare these numbers (the observed frequencies) with those that
would be expected by chance (the expected frequencies).
The formula for chi square is:
χ² = Σ[(O - E)² / E]
Where:
χ² is the value for chi square.
Σ is the sum (taken over all the categories).
O is the observed frequency.
E is the expected frequency.
The first question in doing the calculation is, how do we get the expected
frequencies? That's easy. If we are testing the observed frequencies (those in Table 14) against what we would expect by chance, then, since we have five categories of majors, we would expect one-fifth of the individuals to fall in each category. One-
fifth (20%) of 300 is 60. So if the selection of majors is largely a chance pattern, we
would expect to find 60 people in each category.
Table 15 displays the observed and expected frequencies for each major, computes the difference between them (O - E), squares that difference ((O - E)²), divides each squared difference by the expected frequency ((O - E)²/E), and sums those quantities to give us our χ², which is 39.17.
Table 15
Observed and Expected Frequencies for the Selection of Majors
Major  O (observed frequency)  E (expected frequency)  O - E  (O - E)²  (O - E)²/E
Pre-Med  50  60  -10  100  1.67
Computer Sciences  85  60  25  625  10.42
English Literature  25  60  -35  1,225  20.42
Education  60  60  0  0  0.00
Engineering  80  60  20  400  6.67
Total  300  300      39.17
By now, you know the next step: determining if we can reject the null hypothesis. We
do it the same way we did for the t-test and the correlation. We enter the chi square
significance table (which I have handy, but you don't) with our chi square value (39.17) and the appropriate degrees of freedom. For chi square, the degrees of freedom are equal to the number of rows (categories) minus one (R - 1). In our case we have five rows, so df = 4.
Entering the chi square table with our result of 39.17 and df = 4, we find that we need
a chi square value of 13.28 to reject the null hypothesis at the .01 level of confidence.
We clearly have that, so we can say that the distribution of major selections is not simply a chance pattern; or, χ² = 39.17, p < .01, df = 4.
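If you have scipy available, the whole calculation (including the p value, so you don't even need the printed table) takes only a couple of lines; when no expected frequencies are supplied, scipy assumes they are equal across categories, which is exactly the chance pattern we tested against.

from scipy.stats import chisquare

observed = [50, 85, 25, 60, 80]          # majors from Table 14
result = chisquare(observed)             # expected defaults to 60 in each category

print(round(result.statistic, 2))        # about 39.17, as computed by hand in Table 15
print(result.pvalue)                     # far below .01, so we reject the null hypothesis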












Lesson 8
Summarizing the Steps and Moving On
In the statistical tests we've calculated (the t-test, the correlation, and χ²), we've gone through a series of steps that you'll go through when you compute any statistical test. Recapping, here they are:
1. First, determine the level of measurement you have. Are the data you have
interval, ordinal, or nominal?
2. If you have interval data, determine whether they meet the requirements of a
parametric test (adequate sample size and variance similarity).
3. Based on the determinations you made in (1) and (2), select the statistical test (t, r, χ², or whatever).
4. Calculate the values required, plug them into the formula, and compute the
test. (Now that you have gone through these calculations and understand
them, the labor can be done for you by any one of the available statistical
software packages.)
5. Select the level of risk you want to take in rejecting the null hypothesis and
making (or avoiding) the Type I and Type II errors. Usually that will be .05 or
.01.
6. Enter the appropriate significance table (e.g., for t, r, or χ²) with the test result and the proper degrees of freedom.
7. Determine whether your test result is large enough to reject the null
hypothesis and enable you to conclude that it is statistically significant.
8. If it is statistically significant, use whatever additional tests may be available
(e.g., the effect test, the coefficient of determination, etc.) and your own
reasoned judgment to determine if the result is also practically significant.
* * * *
Congratulate yourself. The fact that you understand these steps and can execute them
shows how far you've come. You now have a good grip on basic statistics. You can understand them in research journals, and you can use them in your practicum and in your own research. And you are now in a position to go on to more advanced statistics (I know you can't wait).
References
I have not provided a set of references because there are literally dozens of
introductory statistics texts, and just about any of them will do. You definitely
should have one of these texts for reference purposes, especially for the significance
tables they all provide. My favorite, and the one I highly recommend, is Neil
Salkind's Statistics for People Who (Think They) Hate Statistics. Sage Publications,
2000.
Statistical Software
This short course has taken you through both the explanation of the major statistical
concepts and the actual computation of the most common statistical tests you will be
encountering in the research literature and using in your own research.
Now that you have this essential, basic understanding, you won't need to do any computations by hand. There are software applications that will do that for you.
Once you enter the data, they will compute a correlation in less than a second, and
provide you with the significance levels.
