
Chapter 5

Understanding and Comparing Distributions


Objectives:
Learn how to make side-by-side graphs to compare two
or more groups
Homework:
Review reading pg. 80-94
pg. 95 # 6, 8 (# 7 as an example)

Copyright 2010 Pearson Education, Inc.

deals with the real world, not MathWorld. The real world is more complicated and
more interesting than MathWorld.
Don't round or truncate intermediate results. Keep the full precision that your
technology can carry.
Report statistics to one decimal place more than the precision of the data.
Focus on the meaning in the Tell section and not on the minor differences in
numeric results. Don't sweat the small differences.
Your answers need not match those in the back of the book to the last digit and
your interpretation is more important than your calculation.


Slide 5 - 3

The Big Picture

We can answer much more interesting questions about variables
when we compare distributions for different groups.
Below is a histogram of the Average Wind Speed for every
day in 1989.


Slide 5 - 4

The Big Picture (cont.)

The distribution is unimodal and skewed to the right.
The high value may be an outlier.
Median daily wind speed is about 1.90 mph and the IQR is
reported to be 1.78 mph.
Can we say more?


Slide 5 - 5

The Five-Number Summary

The five-number summary of a distribution reports its
median, quartiles, and extremes (maximum and minimum).
Example: The five-number summary for the daily wind
speed is:

Max      8.67
Q3       2.93
Median   1.90
Q1       1.15
Min      0.20
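
As a minimal sketch of how a five-number summary might be computed, the Python below uses a small made-up data set (the slides do not list the 1989 wind data). The quartile rule is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves, leaving the middle value out of both halves when the count is odd. Other common conventions can give slightly different quartiles. The function name is illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of at least two numbers.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves; the middle value is left out of both halves when the
    number of values is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical wind speeds in mph, for illustration only.
speeds = [0.5, 1.0, 1.5, 1.5, 2.0, 2.5, 3.0, 3.5, 6.0]
print(five_number_summary(speeds))  # (0.5, 1.25, 2.0, 3.25, 6.0)
```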

Slide 5 - 6

Daily Wind Speed: Making Boxplots

A boxplot is a graphical display of the five-number summary.
Boxplots are useful when comparing groups.
Boxplots are particularly good at pointing out outliers.


Slide 5 - 7

Constructing Boxplots
1. Draw a single vertical axis spanning the range of the data.
Draw short horizontal lines at the lower and upper quartiles
and at the median. Then connect them with vertical lines to
form a box.

Slide 5 - 8

Constructing Boxplots (cont.)


2. Erect fences around the main part of the data.
The upper fence is 1.5 IQRs above the upper quartile.
The lower fence is 1.5 IQRs below the lower quartile.
Note: the fences only help with constructing the boxplot
and should not appear in the final display.
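
As a worked check, the fences for the daily wind speeds can be computed from the quartiles in the five-number summary (Q1 = 1.15 mph, Q3 = 2.93 mph). This is only a sketch; the function name and the `multiplier` parameter are illustrative and not from the text.

```python
def fences(q1, q3, multiplier=1.5):
    """Return (lower_fence, upper_fence), placed `multiplier` IQRs beyond the quartiles."""
    iqr = q3 - q1
    return (q1 - multiplier * iqr, q3 + multiplier * iqr)

# Quartiles of the daily wind speeds (mph) from the five-number summary.
lower, upper = fences(1.15, 2.93)
print(f"IQR = {2.93 - 1.15:.2f} mph")
print(f"lower fence = {lower:.2f} mph, upper fence = {upper:.2f} mph")
```

The IQR is 2.93 - 1.15 = 1.78 mph, so the fences sit 1.5 x 1.78 = 2.67 mph beyond the quartiles: about -1.52 mph and 5.60 mph. The maximum of 8.67 mph lies above the upper fence, which agrees with the earlier remark that the high value may be an outlier. The minimum of 0.20 mph is inside the lower fence.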

Slide 5 - 9

Constructing Boxplots (cont.)


3. Use the fences to grow whiskers.
Draw lines from the ends of the box up and down to the most
extreme data values found within the fences.
If a data value falls outside one of the fences, we do not
connect it with a whisker.

Slide 5 - 10

Constructing Boxplots (cont.)


4. Add the outliers by displaying any data values beyond the
fences with special symbols.
We often use a different symbol for far outliers that are
farther than 3 IQRs from the quartiles.
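
Steps 2 through 4 can be sketched in Python as follows. The data are hypothetical, the quartiles are passed in by hand, and the function name and returned dictionary keys are illustrative choices, not part of the text.

```python
def boxplot_parts(values, q1, q3):
    """Split data into whisker ends, outliers, and far outliers.

    Whiskers reach the most extreme values inside the 1.5-IQR fences.
    Values beyond the fences are outliers; those farther than 3 IQRs
    from the nearer quartile are reported separately as far outliers.
    """
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    lo_far, hi_far = q1 - 3 * iqr, q3 + 3 * iqr

    inside = [v for v in values if lo_fence <= v <= hi_fence]
    outliers = [v for v in values
                if (lo_far <= v < lo_fence) or (hi_fence < v <= hi_far)]
    far_outliers = [v for v in values if v < lo_far or v > hi_far]
    return {
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(outliers),
        "far_outliers": sorted(far_outliers),
    }

# Hypothetical data with Q1 = 2 and Q3 = 6, so IQR = 4.
# Fences: -4 and 12. Far-outlier limits: -10 and 18.
data = [0, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 8, 13, 20]
parts = boxplot_parts(data, q1=2, q3=6)
print(parts)
# {'whiskers': (0, 8), 'outliers': [13], 'far_outliers': [20]}
```

For the wind speeds, 3 IQRs above Q3 is 2.93 + 3(1.78) = 8.27 mph, so the maximum of 8.67 mph would be marked as a far outlier under this rule.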


Slide 5 - 11

Wind Speed: Making Boxplots (cont.)

Compare the histogram and boxplot for daily wind speeds:

How does each display represent the distribution?



Slide 5 - 12

Comparing Groups

It is almost always more interesting to compare groups.
With histograms, note the shapes, centers, and spreads of
the two distributions.

What does this graphical display tell you?


Slide 5 - 13

Comparing Groups (cont.)

Boxplots offer an ideal balance of information and simplicity, hiding
the details while displaying the overall summary information.
We often plot them side by side for groups or categories we wish
to compare.

What do these boxplots tell you?


Slide 5 - 14

What About Outliers?

If there are any clear outliers and you are reporting the mean
and standard deviation, report them with the outliers present
and with the outliers removed. The differences may be quite
revealing.
Note: The median and IQR are not likely to be affected by the
outliers.
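
A small illustration of this advice, using made-up salary figures with one clear outlier (the values are hypothetical and are not the data from any exercise in the chapter):

```python
from statistics import mean, median, stdev

# Hypothetical weekly salaries in dollars; the last value is a clear outlier.
salaries = [100, 110, 115, 120, 120, 125, 130, 140, 360]
trimmed = [s for s in salaries if s != 360]

for label, values in (("with outlier", salaries), ("without outlier", trimmed)):
    print(f"{label}: mean = {mean(values):.1f}, SD = {stdev(values):.1f}, "
          f"median = {median(values)}")
```

Removing the one high value drops the mean from about 146.7 to 120 and the standard deviation from about 80.8 to about 12.2, while the median stays at 120 in both cases.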


Slide 5 - 15

States visited dotplot ->boxplot


Slide 5 - 16

Chapter 5
Understanding and Comparing Distributions
Objectives:
Learn how to make side-by-side graphs to compare two
or more groups
Homework:
Review reading pg. 80-94
pg. 96-102
# 10, 11, 13, 14,
# 38 (# 37 as an example)

2. Create parallel boxplots. Label your graph clearly.
3. Write a few sentences comparing the distributions.
4. Which restaurant pays the higher average salary?
5. Why is the mean salary misleading?
6. At which restaurant would you rather work? Give a sound
statistical justification for your decision.

Slide 5 - 18

The distribution of salaries at Mooseburgers is symmetric, with a
typical salary of about $134.50. The distribution of salaries at
McTofu is also symmetric, with the exception of one very high
salary, Sally's $360.
A typical salary at McTofu is lower, around $120. The distributions
of salaries are fairly compact at both restaurants, with
interquartile ranges of $21 at Mooseburgers and $20.50 at
McTofu. In fact, with the exception of the outlier, the distribution
of salaries at McTofu is similar to the distribution of salaries at
Mooseburgers, but generally about $15 lower.


Slide 5 - 19

Timeplots: Order, Please!

For some data sets, we are interested in how the data
behave over time. In these cases, we construct timeplots
of the data.


Slide 5 - 20

*Re-expressing Skewed Data to Improve Symmetry

When the data are skewed it can be hard to summarize
them simply with a center and spread, and hard to decide
whether the most extreme values are outliers or just part
of a stretched out tail.
How can we say anything useful about such data?


Slide 5 - 21

*Re-expressing Skewed Data to Improve Symmetry (cont.)

One way to make a skewed distribution more symmetric is to
re-express or transform the data by applying a simple function
(e.g., logarithmic function).
Note the change in skewness from the raw data (previous slide)
to the transformed data (right):
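
The effect of a logarithmic re-expression can be sketched numerically. The data below are hypothetical and deliberately extreme, and the skewness measure used (the moment coefficient of skewness) is an illustrative choice that the slides do not name. Logarithms require positive values.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data spanning several orders of magnitude.
raw = [1, 10, 100, 1000, 10000]
logged = [math.log10(v) for v in raw]

print(f"skewness of raw data: {skewness(raw):.2f}")
print(f"skewness after log10: {skewness(logged):.2f}")
```

The raw values have a strongly positive skewness, while their base-10 logarithms (0, 1, 2, 3, 4) are evenly spaced and have a skewness of essentially zero. Real data rarely become perfectly symmetric, but the direction of the change is the same.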


Slide 5 - 22

What Can Go Wrong? (cont.)

Avoid inconsistent scales, either within the display or
when comparing two displays.
Label clearly so a reader knows what the plot displays.
Good intentions, bad plot:


Slide 5 - 23

What Can Go Wrong? (cont.)

Beware of outliers.
Be careful when comparing groups that have very different
spreads.
Consider these side-by-side boxplots of cotinine levels:
Re-express . . .

Slide 5 - 24

# 22 The graph is not appropriate.
Boxplots are for quantitative data, and these are categorical
data, although coded as numbers. The numbers used for hair color
and eye color are arbitrary, so the boxplot and any accompanying
statistics for eye color make no sense.


Slide 5 - 25

What have we learned?

We've learned the value of comparing data groups and looking
for patterns among groups and over time.
We've seen that boxplots are very effective for comparing
groups graphically.
We've experienced the value of identifying and investigating
outliers.
We've graphed data that have been measured over time against
a time axis and looked for long-term trends both by eye and
with a data smoother.
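
The slides do not say which data smoother is meant. As one simple possibility, a centered moving average can be sketched as below; the function name, the window length, and the data are all assumptions made for illustration.

```python
def moving_average(values, window=3):
    """Centered moving average over a time-ordered list.

    `window` must be a positive odd number no longer than the data.
    Only positions with a full window get a smoothed value, so the
    result is shorter than the input by window - 1 values.
    """
    if window < 1 or window % 2 == 0 or window > len(values):
        raise ValueError("window must be a positive odd number <= len(values)")
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Hypothetical daily wind speeds (mph) in time order.
daily = [2, 4, 3, 5, 7, 6, 8]
print(moving_average(daily))  # [3.0, 4.0, 5.0, 6.0, 7.0]
```

The raw values bounce up and down from day to day, but the smoothed values rise steadily, which makes the underlying upward trend easier to see on a timeplot.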


Slide 5 - 26
