
Measures of Central Tendency and Variability


Measures of Central
Tendency
• A measure of central tendency is a descriptive statistic
that describes the average, or typical value of a set of
scores

• There are three common measures of central tendency:

• Mean

• Median

• Mode
The Mode
• The mode is the score that occurs most frequently in a set of data
[Figure: frequency chart; horizontal axis: Score on Exam 1 (75 to 95); vertical axis: Frequency]
Bimodal Distribution
• When a distribution has two “modes,” it is called bimodal
[Figure: frequency chart; horizontal axis: Score on Exam 1 (75 to 95); vertical axis: Frequency]
Multimodal Distribution
• If a distribution has more than 2 “modes,” it is called multimodal
[Figure: frequency chart; horizontal axis: Score on Exam 1 (75 to 95); vertical axis: Frequency]
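The three cases above (one mode, two modes, more than two) can be sketched in a few lines of Python. The score lists are hypothetical, made up for illustration; they are not the data behind the charts. The helper name `describe_modes` is also an assumption.

```python
from statistics import multimode

# Hypothetical exam scores, invented for this sketch (not the slide data).
unimodal = [75, 80, 85, 85, 85, 90, 95]
bimodal = [75, 80, 80, 85, 90, 90, 95]


def describe_modes(scores):
    """Return the most frequent score(s) and a label for the distribution."""
    modes = multimode(scores)  # every value tied for the highest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label


print(describe_modes(unimodal))
print(describe_modes(bimodal))
```

Note that when every score occurs exactly once, `multimode` returns all of them, so this sketch would label such data "multimodal" even though it is more natural to say there is no mode.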
When to Use the Mode
• The mode is not a very useful measure of central tendency

• It is insensitive to large changes in the data set

• That is, two data sets that are very different from each other can have the same mode

• The mode is primarily used with nominally scaled data

[Figures: two frequency charts, one with a horizontal axis from 1 to 10 and one with a horizontal axis from 10 to 100]
The Median

• The median is simply another name for the 50th percentile;

• It is the score in the middle;

• Half of the scores are larger than the median and half of the scores are smaller than the median
How to Calculate the
Median

• Sort the data from highest to lowest

• Find the score in the middle, at position $\frac{N + 1}{2}$

• If N (the number of scores) is even, the median is the average of the middle two scores
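The steps above can be sketched as a small Python function. The function name `median_by_position` is an assumption, and the sketch assumes a non-empty list of numbers.

```python
def median_by_position(scores):
    """Median via the (N + 1) / 2 position rule; assumes a non-empty list."""
    ordered = sorted(scores, reverse=True)  # highest to lowest, as in the steps
    n = len(ordered)
    if n % 2 == 1:
        # Odd N: (N + 1) / 2 is a whole number; convert to a 0-based index.
        position = (n + 1) // 2
        return ordered[position - 1]
    # Even N: (N + 1) / 2 ends in .5, so average the two scores around it.
    upper = ordered[n // 2 - 1]
    lower = ordered[n // 2]
    return (upper + lower) / 2


print(median_by_position([10, 8, 14, 15, 7, 3, 3, 8, 12, 10, 9]))
print(median_by_position([24, 18, 19, 42, 16, 12]))
```

Sorting from lowest to highest would give the same median; the descending order is kept only to mirror the slide.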
Median Example
●What is the median of the following scores:

10 8 14 15 7 3 3 8 12 10 9

●Sort the scores:



15 14 12 10 10 9 8 8 7 3 3

●Determine the middle score:

middle = $\frac{N + 1}{2} = \frac{11 + 1}{2}$ = 6th

●Middle score = median = 9
Median Example
●What is the median of the following scores: 24 18 19 42 16 12

●Sort the scores: 42 24 19 18 16 12

●Determine the middle score: middle = $\frac{N + 1}{2} = \frac{6 + 1}{2}$ = 3.5

●Median = average of the 3rd and 4th scores: $\frac{19 + 18}{2}$ = 18.5
When to use the Median
●The median is often used when the distribution of scores is either positively or negatively skewed, because the extreme scores in the tail will not overly influence the median.
The Mean
●The mean is the arithmetic average of all the scores ($\frac{\Sigma X}{N}$)

● The mean of a population is represented by the Greek letter µ; the mean of a sample is represented by $\bar{X}$
Calculating the Mean
●Calculate the mean of the following data:

1 5 4 3 2

●Sum the scores (ΣX): 1 + 5 + 4 + 3 + 2 = 15

●Divide the sum (ΣX = 15) by the number of scores (N = 5): 15 / 5 = 3

● Mean: $\bar{X}$ = 3
Rounding Off the Mean
When to use the Mean
●You should use the mean when

●The data are interval or ratio scaled (ordinal data too)

●Data are not skewed

●The mean is preferred because it is sensitive to every score

●If you change one score in the data set, the mean will change
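A short sketch of what "sensitive to every score" means in practice. It reuses the scores 1 5 4 3 2 from the mean calculation; the changed list, in which 5 is replaced by 50, is a hypothetical modification made up for illustration.

```python
from statistics import mean, median

original = [1, 5, 4, 3, 2]
changed = [1, 50, 4, 3, 2]  # hypothetical: one score changed from 5 to 50

# The mean moves when a single score changes; here the median does not.
print(mean(original), median(original))
print(mean(changed), median(changed))
```

This is also why the mean is recommended only when the data are not skewed: one extreme score is enough to pull it away from the bulk of the scores.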
Comparison of Measures of Central Tendency

Measure: Mean
Strengths:
•Unique – there’s exactly one mean for any data set
•Factors in all values in the set
•Easy to understand
Weaknesses:
•Can be adversely affected by one or two unusually high or low values
•Can be time-consuming to calculate for large data sets

Measure: Median
Strengths:
•Divides a data set neatly into two groups
•Not affected by one or two extreme values
Weaknesses:
•Can ignore the effects of large or small values even if they are important to consider

Measure: Mode
Strengths:
•Very easy to find
•Describes the most typical case
•Can be used with categorical data like candidate preference, choice of major, etc.
Weaknesses:
•May not exist for a data set
•May not be unique
•Can be very different from mean and median if the most typical case happens to be near the low or high end of the range
Distributions
Measures of
Variability
Variability

• The goal for variability is to obtain a measure of how spread out the scores are in a distribution.

• A measure of variability usually accompanies a measure of central tendency as basic descriptive statistics for a set of scores.
Variability
• When the population variability is small, all of the
scores are clustered close together and any
individual score or sample will necessarily
provide a good representation of the entire set.

• On the other hand, when variability is large and scores are widely spread, it is easy for one or two extreme scores to give a distorted picture of the general population.
Variability
Central Tendency and Variability

• Central tendency describes the central point of the distribution, and variability describes how the scores are scattered around that central point.

• Together, central tendency and variability are the two primary values that are used to describe a distribution of scores.
Measuring Variability
• Variability can be measured with

– the range

– the standard deviation/variance

– the coefficient of variance

• In each case, variability is determined by measuring distance.
The Range
• The range is the total distance covered by the
distribution, from the highest score to the
lowest score (using the upper and lower real
limits of the range).

Range = Highest value – lowest value



Finding the Range of a Data
Set
The first list below is the weights
of the dogs in the first picture, and
the second is the weights of the
dogs in the second picture. Find
the mean, median, and range for
each list, then describe any
observations you can make based
on the results.
1st: 70, 73, 58, 60
2nd: 30, 85, 40, 125, 42, 75, 60, 55
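One way to check the three requested statistics is a short Python sketch using the two lists above. The helper name `summarize` is an assumption, and the range is computed with the simple formula Range = Highest value – lowest value.

```python
from statistics import mean, median

first = [70, 73, 58, 60]
second = [30, 85, 40, 125, 42, 75, 60, 55]


def summarize(weights):
    """Mean, median, and range (highest minus lowest) of a list of weights."""
    return {
        "mean": mean(weights),
        "median": median(weights),
        "range": max(weights) - min(weights),
    }


print(summarize(first))
print(summarize(second))
```

The two means are close (65.25 lb and 64 lb), but the ranges are very different (15 lb and 95 lb), which is the kind of observation the exercise asks for.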

Variance and Standard
Deviation
If most of the values are similar, but there’s just
one unusually high value, the range will make it
look like there’s a lot more variation than there
actually is.
For this reason, we will next define variance and
standard deviation, which are much more reliable
measures of variation.
The Standard Deviation
• Standard deviation measures the standard
distance between a score and the mean.
Procedure for finding the
Variance and Standard Deviation
Step 1 Find the mean.
Step 2 Subtract the mean from each data value in
the data set.
Step 3 Square the differences.
Step 4 Find the sum of the squares.
Step 5 Divide the sum by n – 1 to get the
variance, where n is the number of data
values.
Step 6 Take the square root of the variance to get
the standard deviation.
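The six steps can be sketched directly in Python. The function name `sample_variance_and_sd` is an assumption; the data are the eight dog weights listed earlier in this section.

```python
import math


def sample_variance_and_sd(data):
    """Follow the six steps for the sample variance and standard deviation."""
    n = len(data)
    mean = sum(data) / n                    # Step 1: find the mean
    deviations = [x - mean for x in data]   # Step 2: subtract the mean
    squares = [d ** 2 for d in deviations]  # Step 3: square the differences
    total = sum(squares)                    # Step 4: sum the squares
    variance = total / (n - 1)              # Step 5: divide by n - 1
    sd = math.sqrt(variance)                # Step 6: square root of the variance
    return variance, sd


weights = [30, 85, 40, 125, 42, 75, 60, 55]
variance, sd = sample_variance_and_sd(weights)
print(round(variance, 1), round(sd, 1))
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which also divide by n − 1 and should agree with this sketch.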
Finding the Variance and
Standard Deviation
Find the variance and standard deviation for the
weights of the eight dogs in the second picture at
the beginning of this section. The weights are
listed again for reference.
30, 85, 40, 125, 42, 75, 60, 55
Finding the Variance and
Standard Deviation
Solution:
Step 1 Find the mean weight.
We found the mean of 64 lb
Step 2 Subtract the mean from each data value.
30 - 64 = -34, 85 - 64 = 21, 40 - 64 = -24, 125 - 64 = 61,
42 - 64 = -22, 75 - 64 = 11, 60 - 64 = -4, 55 - 64 = -9
Step 3 Square each result.
$(-34)^2$ = 1,156, $(21)^2$ = 441, $(-24)^2$ = 576, $(61)^2$ = 3,721,
$(-22)^2$ = 484, $(11)^2$ = 121, $(-4)^2$ = 16, $(-9)^2$ = 81
Finding the Variance and
Standard Deviation
Solution:
Step 4 Find the sum of the squares.
1,156 + 441 + 576 + 3,721 + 484 + 121 + 16 + 81 = 6,596
Step 5 Divide the sum by n - 1 to get the variance,
where n is the sample size. In this case, n is 8, so
n - 1 = 7.
Variance = 6,596/7 ≈ 942.3
Step 6 Take the square root of the variance to get
standard deviation.
Standard Deviation = sqrt(942.3) ≈ 30.7 lb
Finding the Variance and
Standard Deviation
Data ($X$)    $X - \bar{X}$    $(X - \bar{X})^2$
30            -34              1,156
85            21               441
40            -24              576
125           61               3,721
42            -22              484
75            11               121
60            -4               16
55            -9               81
Standard Deviation

To understand the significance of standard deviation, we’ll look at the process one step at a time.

Step 1 Compute the mean. Variation is a measure of how far the data vary from the mean, so it makes sense to begin there.

Step 2 Subtract the mean from each data value. In this step, we are literally calculating how far away from the mean each data value is.
Standard Deviation

Step 3 Square the differences. This solves the problem of those differences adding to zero: when we square them, they’re all positive.

Step 4 Add the squares. In the next two steps, we’re getting an approximate average of the squares of the individual variations from the mean. First we add them, then…
Standard Deviation

Step 5 Divide the sum by n − 1. It seems like dividing by the number of values (n) here is a good idea, but it turns out that when we’re using a sample from a larger population to compute mean and variance, dividing by n − 1 makes the sample variance more likely to be a true reflection of the population variance. In any case, at this point we have an approximate average of the squares of the individual variations from the mean.
Standard Deviation

Step 6 Take the square root of the variance. This “undoes” the square we did in Step 3. It will return the units of our answer to the units of the original data, giving us a good measure of how far the typical data value varies from the mean.
Properties of the Standard Deviation
• If a constant is added to every score in a distribution, the standard deviation will not be changed.

• If you visualize the scores in a frequency distribution histogram, then adding a constant will move each score so that the entire distribution is shifted to a new location.

• The center of the distribution (the mean) changes, but the standard deviation remains the same.
Properties of the Standard Deviation
• If each score is multiplied by a constant, the standard deviation will be multiplied by the same constant.

• Multiplying by a constant will multiply the distance between scores, and because the standard deviation is a measure of distance, it will also be multiplied.
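Both properties can be checked numerically. The sketch below reuses the scores 1 5 4 3 2 from the mean calculation; the constants 10 and 3 are arbitrary choices made for illustration.

```python
import math
from statistics import mean, stdev

scores = [1, 5, 4, 3, 2]
shifted = [x + 10 for x in scores]  # add a constant to every score
scaled = [x * 3 for x in scores]    # multiply every score by a constant

# Adding a constant moves the mean but leaves the standard deviation alone.
print(mean(scores), mean(shifted))
print(math.isclose(stdev(scores), stdev(shifted)))

# Multiplying by 3 multiplies the standard deviation by 3.
print(math.isclose(stdev(scaled), 3 * stdev(scores)))
```

For a negative constant the standard deviation is multiplied by the constant's absolute value, since a distance cannot be negative.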
Interpreting Standard deviation

A professor has two sections of Math 115 this semester. The 8:30 A.M. class has a mean score of 74% with a standard deviation of 3.6%. The 2 P.M. class also has a mean score of 74%, but a standard deviation of 9.2%. What can we conclude about the students’ averages in these two sections?
Interpreting Standard deviation

• Solution
In relative terms, the morning class has a small
standard deviation and the afternoon class has a
large one. So even though they have the same
mean, the classes are quite different. In the
morning class, most of the students probably
have scores relatively close to the mean, with few
very high or very low scores. In the afternoon
class, the scores vary more widely, with a lot of
high scores and a lot of low scores that average
out to a mean of 74%.
Coefficient of Variance
• Used to compare standard deviations when the units are different

• The result is expressed as a percentage

• Denoted by CVar

• $\text{CVar} = \frac{SD}{M} \times 100\%$, where SD is the standard deviation and M is the mean
Coefficient of Variance
• Example:

• The mean for the number of pages of a sample of women’s fitness magazines is 132, with a variance of 23; the mean for the number of advertisements of a sample of women’s fitness magazines is 182, with a variance of 62. Compare the variations.
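A possible way to work this example in Python. Because the problem states variances rather than standard deviations, the sketch takes the square root first; the function name `cvar_percent` is an assumption.

```python
import math


def cvar_percent(sd, mean):
    """Coefficient of variance: standard deviation over mean, as a percent."""
    return sd / mean * 100


# The example gives variances, so convert each to a standard deviation first.
pages_cvar = cvar_percent(math.sqrt(23), 132)
ads_cvar = cvar_percent(math.sqrt(62), 182)

print(round(pages_cvar, 1), round(ads_cvar, 1))
```

Under these assumptions the number of advertisements (about 4.3%) varies slightly more, relative to its mean, than the number of pages (about 3.6%).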
The Mean and Standard Deviation
as Descriptive Statistics
• If you are given numerical values for the mean
and the standard deviation, you should be able
to construct a visual image (or a sketch) of the
distribution of scores.

• As a general rule, about 70% of the scores will be within one standard deviation of the mean, and about 95% of the scores will be within a distance of two standard deviations of the mean.
Chebyshev’s Theorem

• The theorem states that at least 75% of the data values will fall within 2 standard deviations of the mean of the data set.

• At least 88.89% of the data values will fall within 3 standard deviations of the mean.

• Applies to any distribution regardless of its shape.
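The 75% and 88.89% figures are the k = 2 and k = 3 cases of the general Chebyshev bound, which guarantees that at least 1 − 1/k² of the values lie within k standard deviations of the mean (for k > 1). The slides do not state this formula; the sketch below, with the assumed function name `chebyshev_lower_bound`, shows how the two percentages arise from it.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k) * 100, 2))
```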
Chebyshev’s Theorem

• For example, if variable 1 has a mean of 70 and a standard deviation of 1.5, at least 75% of the data values fall between 67 and 73.

• 70 + 2(1.5) = 70 + 3 = 73; and

• 70 − 2(1.5) = 70 − 3 = 67.
Chebyshev’s Theorem

• Between what values do at least 75% and at least 88.89% of the data fall if the data set has a mean of 70 and a standard deviation of 10?
Chebyshev’s Theorem

• PROBLEM:

• The mean price of houses in a certain neighbourhood is Php1,000,000, and the standard deviation is Php200,000. Find the price range for which at least 75% of the houses will sell.
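Problems of this kind reduce to computing mean ± k standard deviations, with k = 2 for the 75% guarantee and k = 3 for 88.89%. A minimal sketch, with the assumed helper name `chebyshev_interval`:

```python
def chebyshev_interval(mean, sd, k):
    """Interval of values within k standard deviations of the mean."""
    return mean - k * sd, mean + k * sd


# House prices: at least 75% fall within 2 standard deviations of the mean.
print(chebyshev_interval(1_000_000, 200_000, 2))

# Mean 70, standard deviation 10: the 75% and 88.89% intervals.
print(chebyshev_interval(70, 10, 2))
print(chebyshev_interval(70, 10, 3))
```

So at least 75% of the houses should sell for between Php600,000 and Php1,400,000.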
The Empirical(Normal) Rule

• When a distribution is bell-shaped (or what is called normal), the following are true.

• Approximately 68% of the data values will fall within 1 standard deviation of the mean.

• Approximately 95% of the data values will fall within 2 standard deviations of the mean.

• Approximately 99.7% of the data values will fall within 3 standard deviations of the mean.
The Empirical Rule

The figure below illustrates the Empirical Rule with $\bar{X}$ = mean and s = standard deviation.
The Empirical(Normal) Rule

• Suppose the scores on a national achievement exam have a mean of 480 and a standard deviation of 90. If scores are normally distributed, then approximately 68% will fall between 390 and 570. Approximately 95% of the scores will fall between 300 and 660. And approximately 99.7% will fall between 210 and 750.
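The three percentages can be checked against an exact normal model with Python's `statistics.NormalDist`. This is only a sketch: it assumes the exam scores follow a normal distribution with mean 480 and standard deviation 90 exactly, and the helper name `share_within` is an assumption.

```python
from statistics import NormalDist

exam = NormalDist(mu=480, sigma=90)


def share_within(dist, k):
    """Proportion of a normal distribution within k standard deviations."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    low, high = exam.mean - k * exam.stdev, exam.mean + k * exam.stdev
    print(low, high, round(share_within(exam, k), 3))
```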
Normal Distributions

A wide variety of quantities in the real world, like sizes of individuals in a population, IQ scores, and many others, tend to exhibit the same phenomenon: the largest number of individuals have values somewhere in the middle of the range, and the classes farther away from the center have smaller frequencies. In fact, it’s so common that frequency distributions of this type came to be known as normal distributions.
Normal Distributions

A normal distribution is a continuous, symmetric, bell-shaped distribution
Normal Distributions

A probability distribution whose values are spread symmetrically, with most of the results located near the mean, is called a normal distribution. Values are equally likely to fall above or below the mean. Values cluster close to the mean and then tail off symmetrically away from it.
Normal Distributions

Some Properties of a Normal Distribution


1. The value in the middle of the distribution, which
appears most often in the sample, is the mean.
2. The distribution is symmetric about the mean. This
means that the graph has two halves that are mirror
images on either side of the mean value.
3. This is the key fact: the area under any portion of the
curve is the percentage (in decimal form) of data
values that fall between the values that begin and
end that region.
4. The total area under the entire curve is 1.
Normal Distributions
The graph below shows a normal distribution for heights of women in the United States. The numbers on the horizontal axis are heights in inches, and some areas are labeled for reference.
(a) What is the mean height?
(b) What percentage of women are between 57.4 and 59.1 inches tall?
(c) If there are 31,806 women at a stadium concert, how many of them would you expect to be between 63.7 and 66.0 inches tall?
Normal Distributions

SOLUTION
(a) The mean is the value in the very center of a normal
distribution. This would be the highest point on the graph,
which is labeled 63.7. So the mean height for American
women is 63.7 inches.
(b) The diagram indicates that the area under the normal
graph between 57.4 and 59.1 is 0.034. This is the decimal
form of the percentage of data values that fall in that range.
Converting 0.034 to percent form by moving the decimal
point two places right, we get 3.4%. So we’d expect that
about 3.4% of women would have heights in that range.
Normal Distributions

SOLUTION
(c) In this case, the area under that portion of the graph is 0.303, so we’d expect 30.3% of women to have heights between 63.7 and 66.0 inches. In particular, we’d expect 30.3% of the women at the concert to have a height in that range:
30.3% of 31,806 = 0.303 × 31,806 = 9,637.218

We’d expect about 9,637 women to be between 63.7 and 66.0 inches tall.
Normal Distributions

EXAMPLE
According to the website answerbag.com, the
mean height for male humans is 5 feet 9.3 inches,
with a standard deviation of 2.8 inches. If this is
accurate, out of 1,000 randomly selected men,
how many would you expect to be between 5 feet
6.5 inches and 6 feet 0.1 inch?
Normal Distributions

SOLUTION
The given range of heights corresponds to those
within 1 standard deviation of the mean, so we
would expect about 68% of men to fall in that
range. In this case, we expect about 680 men to
be between 5 feet 6.5 inches and 6 feet 0.1 inch.
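The reasoning in this solution can be retraced in Python: convert the heights to inches, confirm that the limits sit one standard deviation from the mean, and compare the 68% rule of thumb with an exact normal model. The helper name `to_inches` is an assumption, and the normal model is assumed to hold exactly.

```python
from statistics import NormalDist


def to_inches(feet, inches):
    """Convert a height given in feet and inches to inches."""
    return feet * 12 + inches


mean_in = to_inches(5, 9.3)
sd_in = 2.8
low = to_inches(5, 6.5)
high = to_inches(6, 0.1)

# Both limits are one standard deviation (2.8 in) away from the mean.
print(round(mean_in - low, 1), round(high - mean_in, 1))

# Exact share under an assumed normal model, scaled to 1,000 men.
heights = NormalDist(mu=mean_in, sigma=sd_in)
expected_men = 1000 * (heights.cdf(high) - heights.cdf(low))
print(round(expected_men))
```

The exact model gives a count a few men above the 680 obtained from the rounded 68% figure, which is consistent with "about 680".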
The Standard Normal Distribution

The standard normal distribution is a normal distribution with mean 0 and standard deviation 1. The values under the curve shown indicate the proportion of area in each section.
Z Score

For a data value $X$ from a sample with mean $\bar{X}$ and standard deviation $s$, the z score is $z = \frac{X - \bar{X}}{s}$.
A data point is greater than the mean if z > 0 and less than the mean if z < 0.
z scores are typically rounded to two decimal places.
Z Score

Example: According to the website answerbag.com, the mean height for male humans is 5 feet 9.3 inches, with a standard deviation of 2.8 inches. Find the z score for a man who is 6 feet 4 inches tall and describe what it tells us.
Z Score

SOLUTION
Use the formula for z scores with mean 5 feet 9.3
inches and standard deviation 2.8 inches. Note that
we converted the heights to inches to make it easier
to subtract.
$z = \frac{76\text{ in} - 69.3\text{ in}}{2.8\text{ in}} \approx 2.39$
This means that 6′4″ is 2.39 standard deviations
above the mean.
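The same calculation as a minimal Python sketch; the function name `z_score` is an assumption.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


height_in = 6 * 12 + 4     # 6 feet 4 inches = 76 inches
mean_in = 5 * 12 + 9.3     # 5 feet 9.3 inches = 69.3 inches

z = z_score(height_in, mean_in, 2.8)
print(round(z, 2))
```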
Z Score

There are two main companies that offer standardized college entrance exams, ACT and SAT. Since each has a completely different scoring scale, it’s really difficult to compare the scores of students that took different exams.

One year the ACT had a mean score of 21.2 and a standard
deviation of 5.1. That same year, the SAT had a mean score
of 1498 and a standard deviation of 347. Suppose that a
scholarship committee is considering two students, one who
scored 26 on the ACT and another who scored 1800 on the
SAT. Both are pretty good scores, but which one is better?
Z Score
Solution

26 ACT: $z = \frac{26 - 21.2}{5.1} \approx 0.94$
1800 SAT: $z = \frac{1800 - 1498}{347} \approx 0.87$
The student with 26 on the ACT did better. He/She is
0.94 standard deviations above the mean, while the
student who scored 1800 on the SAT is 0.87 standard
deviations above the mean.
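Because z scores are unit-free, the comparison can be written as a direct comparison of two numbers. A small sketch, with the variable names chosen for illustration:

```python
def z_score(x, mean, sd):
    """Standardize a score: distance from the mean in standard deviations."""
    return (x - mean) / sd


act_z = z_score(26, 21.2, 5.1)
sat_z = z_score(1800, 1498, 347)

# The larger z score is the better result relative to its own exam.
better = "ACT" if act_z > sat_z else "SAT"
print(round(act_z, 2), round(sat_z, 2), better)
```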
Z Score and Area

The value of z scores is that they will allow us to find areas under a normal curve using only areas under a standard normal curve, which can be read from a table
Z Score and Area

Two important facts about the Standard Normal Curve
1. The area under any normal curve is divided into two equal halves at the mean. Each of the halves has area 0.500.
2. The area between z = 0 and a positive z score is the same as the area between z = 0 and the negative of that z score.
Z Score and Area

Find the area under the standard normal distribution
1. Between z = 1.55 and z = 2.25
2. Between z = −0.60 and z = −1.35
3. Between z = 1.50 and z = −1.75
4. To the right of z = 1.70
5. To the right of z = −0.95
6. To the left of z = −2.20
7. To the left of z = 1.95
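Instead of a printed table, the areas can be obtained from the cumulative distribution function of the standard normal curve. The sketch below uses Python's `statistics.NormalDist`; the helper names `area_between`, `area_right`, and `area_left` are assumptions. Answers read from a four-decimal table may differ slightly in the last digit because of rounding.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def area_left(z):
    """Area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


def area_right(z):
    """Area to the right of z: the total area (1) minus the area to the left."""
    return 1 - STANDARD_NORMAL.cdf(z)


def area_between(a, b):
    """Area between two z scores, given in either order."""
    low, high = sorted((a, b))
    return STANDARD_NORMAL.cdf(high) - STANDARD_NORMAL.cdf(low)


print(round(area_between(1.55, 2.25), 4))
print(round(area_between(-0.60, -1.35), 4))
print(round(area_between(1.50, -1.75), 4))
print(round(area_right(1.70), 4))
print(round(area_right(-0.95), 4))
print(round(area_left(-2.20), 4))
print(round(area_left(1.95), 4))
```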
