Chapter 3
Displaying and Summarizing Quantitative Data
Wednesday
Announcements
Office hours for the TAs and myself have been
posted on Blackboard under Course Information.

Note that there are no office hours on Fridays.
There is a worksheet posted on Blackboard
under Chapter 3, Lecture. We will likely use this
in class on Friday. Please bring a copy to class or
have it available on your laptop in class. It will
not be handed in.
Make sure you are reading your book. Assigned
readings are listed on the syllabus.
Wednesday
Announcements
Assignments:
For homework (online and written), make sure to
read the Guidelines posted on Blackboard under
Course Information.
For written homework, if we cannot read your
handwriting, you may lose points.
You have a written homework and online homework
due next Wednesday, the 10th.
Survey: Complete before Monday at 8am. It is part of
your first lab grade.
Keep working on your Personal Glossary! Link on
Blackboard for each chapter.
Chapter Outline
Review Quantitative Variables
Describing Quantitative Variables Graphically
Describing Quantitative Variables Numerically
Quantitative Variables
Variables with numbers as values.
Age
Weight
Height
Number of siblings
Describing One Quantitative Variable
Distribution of a variable
Summary of the different values observed for the variable
Includes three features: shape, center, and spread
Spread is also known as variation or variability.
Always start by making a picture
Histogram
Stem-and-Leaf Display
Displaying Quantitative Data
Histogram
Stem-and-leaf display
Example: Percent of Population of Hispanic Origin
Who: 50 states
What: Percent of each state's population of Hispanic origin
Where: United States
When: 2000
Why: Looking at demographic changes over time
How: U.S. Census
Histogram of Percent of Population of Hispanic Origin
Displaying Quantitative Data
Histogram
Divides the values of the variable into equal-width piles (called bins).
Count the # of whos belonging to each bin.
Plot the bin values on the x-axis.
Plot the # of whos belonging to each bin on the y-axis.
Height of each bar = # of whos with values in the range of that bin.
Picture of the distribution.
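The histogram recipe above can be sketched in Python. This is a minimal sketch, assuming bins of the form [low, low + width) that start at 0 with a width of 10 (both chosen only for illustration). Because the Hispanic-origin values are not listed in these notes, it uses the Barry Bonds home-run counts that appear later in the chapter. The helper name `bin_counts` is hypothetical.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values (whos) fall in each equal-width bin.

    Bin k covers [start + k*width, start + (k+1)*width). Any value at or
    beyond the last edge goes in the final bin. Assumes all values >= start.
    """
    counts = [0] * n_bins
    for v in values:
        k = min(int((v - start) // width), n_bins - 1)
        counts[k] += 1
    return counts


# Barry Bonds home runs per season, 1986-2007 (22 seasons).
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

START, WIDTH, N_BINS = 0, 10, 8
counts = bin_counts(bonds_hr, START, WIDTH, N_BINS)
for k, count in enumerate(counts):
    low = START + k * WIDTH
    # A text "bar": its length is the number of seasons in this bin.
    print(f"{low:>2}-{low + WIDTH:<2} | {'#' * count}")
```

Changing `WIDTH` changes the picture the histogram suggests, so it can be worth trying more than one bin width on the same data.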
Stem-and-Leaf Display
Generally used for smaller data sets.
Groups the data like a histogram does.
Still shows the original values (or close to them).
Two columns
Left: Stem
Right: Leaf
Leaf
Contains the last digit of each value.
Arranged in increasing order moving away from the stem.
Stem
Contains the rest of the digits of each value.
Usually arranged in increasing order from top to bottom.
JMP does the opposite: increasing order from bottom to top!
A key at the bottom explains how to decode the stem and leaf.
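The stem-and-leaf rules above can be sketched in Python, assuming whole-number data: the leaf is the last digit and the stem is everything before it (`divmod(value, 10)`). This sketch uses the 1954-1976 home-run data from the median example later in this chapter. It lists stems in increasing order from top to bottom (the textbook convention, not JMP's) and keeps empty stems so gaps stay visible. The function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):            # sorting puts each stem's leaves in order
        stem, leaf = divmod(v, 10)      # e.g. 34 -> stem 3, leaf 4
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):   # keep empty stems as gaps
        rows.append(f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem]))
    return rows


# Home runs per season, 1954-1976 (23 seasons, from the example table).
hr = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
      44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

for row in stem_and_leaf(hr):
    print(row)
print("Key: 3 | 4 means 34 home runs")
```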
Stem-and-Leaf Plot of Percent of Population of Hispanic Origin
Interpreting Histograms and Stem-and-Leaf Displays
Shape
Number of Modes
Symmetry
Outliers
Looking at Distributions - Shape
Shape
How many humps (called modes)?
None = uniform
One = unimodal
Two = bimodal
Three or more = multimodal

Shape
Is it symmetric?
Symmetric = roughly equal on both sides
Skewed = values pile up on one side, with a tail stretching out toward the other

Skewed to the right: more of the smaller values; the tail trails off toward the larger values
Skewed to the left: more of the larger values; the tail trails off toward the smaller values

Shape
Are there any outliers?
Interesting observations that stand apart from the rest of the data
Can affect the results of statistical methods
Percent of Population of Hispanic Origin
How many modes?
Skewed in which direction?
Are there potential outliers?
Looking at Distributions - Center

Where is the typical value located?
Center
Median
Mean
Looking at Distributions - Spread

How far apart are the values?
Variation (Spread)
Range
Interquartile Range = IQR
Standard deviation = s
Looking at Distributions - Center and Spread (or Variation)
Two common options:
First
Median (Center)
Range (Variation)
IQR (Variation)
Second
Mean (Center)
Standard deviation (Variation)
Median
50th percentile
50% of the observations are below the median
50% of the observations are above the median
Median is the middle number

Measures the center of the observations
Different calculation when
n is odd
n is even
Median (n is odd)
Order the data from smallest to largest.
Median is the middle number on the list.
It is the ((n+1)/2)th number from the bottom.
Ex: If n=11, median is the (11 + 1)/2 = 6th
number from the bottom.
Ex: If n=37, median is the (37 + 1)/2 = 19th
number from the bottom.
Example
Year  HR    Year  HR    Year  HR
54    13    62    45    70    38
55    27    63    44    71    47
56    26    64    24    72    34
57    44    65    32    73    40
58    30    66    44    74    20
59    39    67    39    75    12
60    40    68    29    76    10
61    34    69    44
Example (n is odd)
10 12 13 20 24 26 27 29 30 32 34 34
38 39 39 40 40 44 44 44 44 45 47
Median is the (23+1)/2 = 12th number from
the bottom
Median = 34
Median (n is even)
Median is the average of the two middle numbers.
(n+1)/2 will be halfway between these two numbers.
Ex: If n = 10, (10 + 1)/2 = 5.5, so the median is the average of the 5th and 6th numbers from the bottom.
Ex: If n = 28, (28 + 1)/2 = 14.5, so the median is the average of the 14th and 15th numbers from the bottom.
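The odd and even median rules can be combined into one small Python function. This sketch follows the (n+1)/2 position rule literally; Python's built-in `statistics.median` should give the same answer and is used only as a cross-check. It is applied to both home-run examples in this chapter (n = 23 and n = 22).

```python
import statistics


def median(values):
    """Median using the (n + 1)/2 position rule (positions counted from 1)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    if n % 2 == 1:
        # n odd: the ((n + 1)/2)th value, e.g. n = 23 -> 12th value.
        return ordered[(n + 1) // 2 - 1]
    # n even: (n + 1)/2 falls halfway between the (n/2)th and (n/2 + 1)th
    # values, e.g. n = 22 -> 11.5, so average the 11th and 12th values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


hr_1954_1976 = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
                44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]
bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

for data in (hr_1954_1976, bonds_hr):
    print(len(data), median(data), statistics.median(data))
```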
Barry Bonds
Year  HR    Year  HR    Year  HR
86    16    93    46    00    49
87    25    94    37    01    73
88    24    95    33    02    46
89    19    96    42    03    45
90    33    97    40    04    45
91    25    98    37    05    5
92    34    99    34    06    26
                        07    28
Example (n is even)
5 16 19 24 25 25 26 28 33 33 34
34 37 37 40 42 45 45 46 46 49 73
(22+1)/2 = 11.5
Median is the average of the 11th and 12th numbers from the bottom (34 and 34).
Median = 34
Properties of the Median

Which observations affect the median?
For Barry Bonds, 73 is an outlier

Does this observation affect the median?
Example of finding the median
The incarceration rate (per 100,000) for each U.S. state in 2008 was recorded.
Below is a histogram of the data.
Histogram x-axis: Incarceration rate (per 100,000)
Incarceration Rates - Highlights
148 185 197 200 209 226 238 247 265 269
286 288 302 309 317 323 340 361 363 366
373 373 376 378 385 387 403 416 432 434
439 443 445 445 448 458 468 472 474 479
495 508 552 556 572 648 648 654 686 867
148 = Maine (1st)
185 = Minnesota (2nd)
247 = Nebraska (8th)
309 = Iowa (14th)
373 = Illinois (21st)
867 = Louisiana (50th)
Incarceration Rates
Compute the median
148 185 197 200 209 226 238 247 265 269
286 288 302 309 317 323 340 361 363 366
373 373 376 378 385 387 403 416 432 434
439 443 445 445 448 458 468 472 474 479
495 508 552 556 572 648 648 654 686 867
Range
Measures variation (spread)
Minimum = 0th percentile
Maximum = 100th percentile
Range = Maximum - Minimum
Spread of all the observations, from smallest to largest
Example - Barry Bonds
Minimum = 5
Maximum = 73
Range = 73 - 5 = 68
Properties of the Range

Which observations affect the range?
For Barry Bonds, 73 is an outlier

Does this observation affect the range?
IQR (Interquartile Range)
Measures variation (spread)
IQR = Q3 - Q1
25th percentile = Q1
75th percentile = Q3
Variability of the middle 50% of the observations
Finding Q1 and Q3
In general,
Q1 is the median of the lower half of the
ordered observations.
Q3 is the median of the upper half of the
ordered observations.
The exact calculations used by the textbook and by JMP
differ slightly.
Example - Barry Bonds
Order the home runs from smallest to largest.
Q1 = Median of Lower Half = 25
Q3 = Median of Upper Half = 45
IQR = 45 - 25 = 20
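The "median of each half" rule can be sketched in Python as below. This is only one of several quartile conventions: for odd n this sketch leaves the middle value out of both halves, which is an assumption, and the textbook and JMP use slightly different calculations, so their quartiles may not match this sketch for every data set. For the 22 Bonds seasons n is even, so the halves are simply the lowest 11 and the highest 11 values.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quartiles_by_halves(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # lowest n//2 values
    upper = ordered[half + n % 2:]    # highest n//2 values (middle skipped if n is odd)
    return median_of_sorted(lower), median_of_sorted(upper)


bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

q1, q3 = quartiles_by_halves(bonds_hr)
print("Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
```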
5-Number Summary
Minimum
Q1
Median
Q3
Maximum
Example - Barry Bonds
Minimum = 5
Q1 = 25
Median = 34
Q3 = 45
Maximum = 73
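A short Python sketch that assembles the 5-number summary, plus the range and IQR, for the Bonds data. It uses the median-of-halves convention for Q1 and Q3 (an assumption; JMP and the textbook may differ slightly on other data sets), and the function names are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Minimum, Q1, median, Q3, maximum (quartiles = medians of the halves)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return {
        "Minimum": ordered[0],
        "Q1": _median(ordered[:half]),
        "Median": _median(ordered),
        "Q3": _median(ordered[half + n % 2:]),
        "Maximum": ordered[-1],
    }


bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

summary = five_number_summary(bonds_hr)
for name, value in summary.items():
    print(f"{name} = {value}")
print("Range =", summary["Maximum"] - summary["Minimum"])
print("IQR =", summary["Q3"] - summary["Q1"])
```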
Incarceration Rates
Compute the quartiles
148 185 197 200 209 226 238 247 265 269
286 288 302 309 317 323 340 361 363 366
373 373 376 378 385 387 403 416 432 434
439 443 445 445 448 458 468 472 474 479
495 508 552 556 572 648 648 654 686 867
Median = 386
Q1 =
Q3 =
Incarceration Rates
JMP gives different quartiles
Incarceration Rates
Compute the IQR
148 185 197 200 209 226 238 247 265 269
286 288 302 309 317 323 340 361 363 366
373 373 376 378 385 387 403 416 432 434
439 443 445 445 448 458 468 472 474 479
495 508 552 556 572 648 648 654 686 867
Q1 = 302
Q3 = 472
IQR =
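As a check on the hand calculation, the sketch below applies the median-of-halves rule to the 50 incarceration rates. With n = 50 the halves are the lowest 25 and the highest 25 rates, so Q1 and Q3 are the 13th values of each half. This is a sketch under that convention only (and the split as written assumes n is even, as it is here); JMP reports different quartiles for these data.

```python
def median_of_sorted(ordered):
    """Median of an already-sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


# 2008 incarceration rates (per 100,000), ordered from smallest to largest.
rates = [148, 185, 197, 200, 209, 226, 238, 247, 265, 269,
         286, 288, 302, 309, 317, 323, 340, 361, 363, 366,
         373, 373, 376, 378, 385, 387, 403, 416, 432, 434,
         439, 443, 445, 445, 448, 458, 468, 472, 474, 479,
         495, 508, 552, 556, 572, 648, 648, 654, 686, 867]

ordered = sorted(rates)
half = len(ordered) // 2                 # 25 values in each half when n = 50
med = median_of_sorted(ordered)          # average of the 25th and 26th values
q1 = median_of_sorted(ordered[:half])    # 13th value of the lower half
q3 = median_of_sorted(ordered[half:])    # 13th value of the upper half
print("Median =", med, "Q1 =", q1, "Q3 =", q3, "IQR =", q3 - q1)
```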
Mean
Ordinary average
Add up all observations.
Divide by the number of observations.
Mean
Formula
n observations
y1, y2, y3, ..., yn are the observations.
$$\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{\sum_{i=1}^{n} y_i}{n}$$
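Example
Barry Bonds HRs per season
$$\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{16 + 25 + 24 + \cdots + 28}{22} = \frac{762}{22} \approx 34.64$$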
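The mean formula translates directly into code: add up the observations and divide by n. A minimal Python sketch using the 22 Bonds seasons; `statistics.mean` from the standard library should agree and is shown only as a cross-check.

```python
import statistics


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

print("sum =", sum(bonds_hr), "n =", len(bonds_hr))
print("mean =", round(mean(bonds_hr), 2))
print("statistics.mean:", round(statistics.mean(bonds_hr), 2))
```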
Properties of the Mean

What effect do the observations have on the
mean?
For Barry Bonds, 73 is an outlier. What effect
does this observation have on the mean?
Standard Deviation
Denoted by the letter s.
Measures variability (spread) around the mean.
Values closer to the mean = smaller contribution to s.
Values far away from the mean = larger contribution to s.
s depends on how far, on average, the values are from the mean.
What is the same?
What is different?
What is one measure of center that we could compare all the data points to, in order to describe how variable the data points are?
Calculate the deviations from the mean
All three data sets have n = 5 and mean 20.
A: xi = 20, 20, 20, 20, 20; deviations xi - x̄ = 0, 0, 0, 0, 0
B: yi = 10, 15, 20, 25, 30; deviations yi - ȳ = -10, -5, 0, 5, 10
C: zi = 5, 10, 15, 20, 50; deviations zi - z̄ = -15, -10, -5, 0, 30
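A small Python sketch of the deviation step, using the A, B, and C values from the deviations table (five values each, all with mean 20). It also prints the sum of each set of deviations, which is always 0 because the positive and negative deviations cancel; that cancellation is why the raw deviations cannot simply be added up to measure spread.

```python
datasets = {
    "A": [20, 20, 20, 20, 20],
    "B": [10, 15, 20, 25, 30],
    "C": [5, 10, 15, 20, 50],
}

deviations = {}
for name, values in datasets.items():
    mean = sum(values) / len(values)
    deviations[name] = [v - mean for v in values]   # each value minus its data set's mean
    print(name, "mean =", mean, "deviations =", deviations[name],
          "sum =", sum(deviations[name]))
```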
How can we convert this to one number to
describe the variability?

How do we get rid of the negatives?
Calculate the squared deviations
A (xi = 20, 20, 20, 20, 20): deviations 0, 0, 0, 0, 0; squared deviations (xi - x̄)² = 0, 0, 0, 0, 0
B (yi = 10, 15, 20, 25, 30): deviations -10, -5, 0, 5, 10; squared deviations (yi - ȳ)² = 100, 25, 0, 25, 100
C (zi = 5, 10, 15, 20, 50): deviations -15, -10, -5, 0, 30; squared deviations (zi - z̄)² = 225, 100, 25, 0, 900
Now, how can we convert this to one number
to describe the variability?
Sum of the squared deviations
The deviations always sum to 0 (A: 0, B: 0, C: 0), but the squared deviations do not:
A: Σ(xi - x̄)² = 0 + 0 + 0 + 0 + 0 = 0
B: Σ(yi - ȳ)² = 100 + 25 + 0 + 25 + 100 = 250
C: Σ(zi - z̄)² = 225 + 100 + 25 + 0 + 900 = 1250
Divide by (n-1), not n
With n = 5, divide each sum of squared deviations by n - 1 = 4 to get the variance s²:
A: s² = 0/4 = 0
B: s² = 250/4 = 62.5
C: s² = 1250/4 = 312.5
Compare s² (variance) to the graphs
Dataset   s²
A         0
B         62.5
C         312.5
In the process we squared the deviations.
How do you un-square a value?
Standard Deviation
Dataset   s²      s = √s²
A         0       0
B         62.5    7.91
C         312.5   17.68
What do the standard deviations represent?
Roughly how far, on average, the values are from the mean.
Compare s (standard deviation) to the graphs
Dataset   s²      s
A         0       0
B         62.5    7.91
C         312.5   17.68
Properties of standard deviation

Can the standard deviation be negative?
Standard Deviation
$$s = \sqrt{\frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
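The standard deviation formula can be followed step by step in Python: find the mean, square each deviation, add them up, divide by n - 1, then take the square root. A minimal sketch using the A, B, and C data sets from earlier in the chapter (20, 20, 20, 20, 20; 10, 15, 20, 25, 30; and 5, 10, 15, 20, 50); the helper name is hypothetical.

```python
import math


def sample_variance_and_sd(values):
    """Return (s squared, s), dividing the sum of squared deviations by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("s needs at least two observations")
    y_bar = sum(values) / n
    ss = sum((y - y_bar) ** 2 for y in values)   # sum of squared deviations
    s2 = ss / (n - 1)                             # divide by n - 1, not n
    return s2, math.sqrt(s2)                      # un-square with the square root


datasets = {
    "A": [20, 20, 20, 20, 20],
    "B": [10, 15, 20, 25, 30],
    "C": [5, 10, 15, 20, 50],
}

results = {name: sample_variance_and_sd(v) for name, v in datasets.items()}
for name, (s2, s) in results.items():
    print(f"{name}: s^2 = {s2:g}, s = {s:.2f}")
```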
Standard Deviation
Usually calculated using a computer or calculator.
Choose the n-1 option on a calculator.
If calculating by hand, make a table.
Standard Deviation of Number of Home Runs per Season for Barry Bonds
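In Python's standard library, `statistics.stdev` divides by n - 1 (the s used in this chapter), while `statistics.pstdev` divides by n, which corresponds to the option not to choose on a calculator. A minimal sketch comparing the two for Bonds's 22 seasons; it is only a cross-check, not the software used in class.

```python
import statistics

bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

s = statistics.stdev(bonds_hr)      # sample standard deviation: divides by n - 1
s_n = statistics.pstdev(bonds_hr)   # divides by n instead (not the s used here)
print("mean =", round(statistics.mean(bonds_hr), 2))
print("s (divide by n - 1) =", round(s, 2))
print("divide by n =", round(s_n, 2))
```

Dividing by n - 1 always gives a slightly larger value than dividing by n, and the difference shrinks as n grows.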
Properties of s
What effect do the observations have on the
value of s?
For Barry Bonds, 73 is an outlier. What effect
does this observation have on the value of s?
General Properties of s
Can the standard deviation be negative?
Can the standard deviation be 0?
s has the same units as the data.
Variance = s²
Comparing standard deviations
Look at the pairs of graphs on the handout.
For each pair, determine which has the larger
standard deviation, or if they are the same.
Comparison of the Mean and Median
Median = 50th percentile (middle number)
Mean = fair share value (balancing point)
Mean vs. Median

Mean and Median are generally similar when
Distribution is symmetric with no outliers
Mean and median are generally different
when either
Distribution is skewed
Outliers are present
Influence of Outliers on the Mean and Median
Small Example: Income in a small town of 6 people
$25,000 $27,000 $29,000
$35,000 $37,000 $38,000
Mean income is $31,833
Median income is $32,000
Influence of Outliers on the Mean and Median
Bill Gates moves to town.
$25,000 $27,000 $29,000
$35,000 $37,000 $38,000 $100,000,000
The mean income is $14,313,000
The median income is $35,000
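The two income lists can be compared directly. In this minimal Python sketch, one extreme income moves the mean by millions of dollars, while the median moves only from $32,000 to $35,000.

```python
import statistics

town = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
with_gates = town + [100_000_000]   # Bill Gates moves to town

for label, incomes in (("before", town), ("after", with_gates)):
    mean = sum(incomes) / len(incomes)
    median = statistics.median(incomes)
    print(f"{label}: mean = ${mean:,.0f}, median = ${median:,.0f}")
```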
Influence of Outliers
Summaries not affected by outliers are called
ROBUST (or resistant).

Center
Median = Robust
Mean = Not Robust
Variation (Spread)
Range = Not Robust
IQR = Robust
s = Not Robust
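To see the robust and not-robust labels in action, the Python sketch below recomputes the Bonds summaries with and without the 73-home-run season. It leaves out the IQR because, for the 21 remaining seasons, n is odd and the result depends on which quartile convention is used.

```python
import statistics

bonds_hr = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]
without_73 = [hr for hr in bonds_hr if hr != 73]   # drop the 73-home-run season

for label, data in (("all 22 seasons", bonds_hr), ("without 73", without_73)):
    print(f"{label}: median = {statistics.median(data)}, "
          f"mean = {statistics.mean(data):.2f}, "
          f"range = {max(data) - min(data)}, "
          f"s = {statistics.stdev(data):.2f}")
```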
Influence of Skewness on the Mean and Median
The observations in the tail influence the
mean.
These observations do not (usually) influence
the median.
Skewed to the right (tail toward the large values)
Mean > median
Skewed to the left (tail toward the small values)
Mean < median
Mean vs. Median
Always be skeptical when means are reported for skewed data, such as
Income
Housing prices
Course grades
Comparison of Range, IQR, and Standard Deviation
Report the range and IQR when you report the median.
Report the standard deviation when you report the mean.
Which summaries are the best?
5-Number Summary
Distribution is skewed
Outliers are present
Mean and Standard Deviation
Distribution is symmetric with no outliers
ALWAYS GET A PICTURE OF YOUR DATA.
