Wednesday
Announcements
Office hours for the TAs and myself have been
Wednesday
Announcements
Assignments:
For homework (online and written), make sure to
Chapter Outline
Review Quantitative Variables
Describing Quantitative Variables Graphically
Describing Quantitative Variables Numerically
Quantitative Variables
Variables with numbers as values.
Age
Weight
Height
Number of siblings
Displaying Quantitative Data
Histogram
Stem-and-leaf display
Origin
Where: United States
When: 2000
Why: Looking at demographic changes over
time
How: U.S. Census
Displaying Quantitative Data
Histogram
Divides the values of the variable into equal-width piles (called bins).
Count the number of whos (cases) belonging to each bin.
Plot bin values on the x-axis.
Plot the number of whos belonging to each bin on the y-axis.
Compare heights of bars: each height = number of whos with values in the range of the bin.
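The binning steps above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure JMP uses: the function name `histogram_counts`, the bin start, the bin width, and the ages are all hypothetical values chosen for the example, and every value is assumed to be at or above the starting edge.

```python
from collections import Counter


def histogram_counts(values, start, bin_width):
    """Count how many values fall in each equal-width bin [lo, lo + bin_width)."""
    # Bin index k covers start + k*bin_width up to (not including) the next edge.
    index_counts = Counter((v - start) // bin_width for v in values)
    n_bins = max(index_counts) + 1
    return [
        (start + k * bin_width, start + (k + 1) * bin_width, index_counts.get(k, 0))
        for k in range(n_bins)
    ]


# Hypothetical ages (illustrative only, not from the lecture data).
ages = [18, 19, 19, 20, 21, 21, 22, 24, 27, 33]
bins = histogram_counts(ages, start=15, bin_width=5)

for lo, hi, count in bins:
    # One '#' per case, so the bar length plays the role of the bar height.
    print(f"{lo:>2}-{hi:<2} {'#' * count}")
```

Each printed row is one bin; the number of `#` marks is the count that would be plotted on the y-axis.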
Stem-and-Leaf Display
Picture of Distribution.
Generally used for smaller data sets.
Group data like histograms.
Still have original values (or close to it).
Stem-and-Leaf Display
Two columns
Left: Stem
Right: Leaf
Leaf
Contains the last digit of the values.
Arranged in increasing order away from stem.
Stem
Contains the rest of the values.
Usually arranged in increasing order from top to bottom.
JMP does opposite, increasing order from bottom to top!
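A stem-and-leaf display for two-digit data can be sketched as below. The helper name `stem_and_leaf` is hypothetical, the code assumes non-negative whole numbers, and the 23 values are the home-run counts from the ordered list used in the median example later in these notes. Stems are printed in increasing order from top to bottom (the usual arrangement, not the JMP one).

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the display as a list of text lines, one per stem."""
    leaves_by_stem = defaultdict(list)
    for v in sorted(values):
        # Leaf = last digit, stem = the remaining leading digits.
        leaves_by_stem[v // 10].append(v % 10)
    lines = []
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        # Stems with no values still get a (blank) row so gaps stay visible.
        leaves = "".join(str(d) for d in leaves_by_stem.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


hr = [10, 12, 13, 20, 24, 26, 27, 29, 30, 32, 34, 34,
      38, 39, 39, 40, 40, 44, 44, 44, 44, 45, 47]
display = stem_and_leaf(hr)
print("\n".join(display))
```

Because every leaf is kept, the original values can be read back from the display, which is the advantage over a histogram noted above.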
Percent of Population of Hispanic Origin
How many modes?
Any outliers?
Median
50th percentile
50% of the observations are below the median
50% of the observations are above the median
Median (n is odd)
Order the data from smallest to largest.
Median is the middle number on the list.
It is the (n+1)/2-th number from the bottom.
Ex: If n=11, median is the (11 + 1)/2 = 6th number from the bottom.
Ex: If n=37, median is the (37 + 1)/2 = 19th number from the bottom.
Example
Year  HR    Year  HR    Year  HR
54    13    62    45    70    38
55    27    63    44    71    47
56    26    64    24    72    34
57    44    65    32    73    40
58    30    66    44    74    20
59    39    67    39    75    12
60    40    68    29    76    10
61    34    69    44
Example (n is odd)
Order the data from smallest to largest.
10 12 13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47
n = 23, so the median is the (23 + 1)/2 = 12th number from the bottom.
Median = 34
Median (n is even)
Order the data from smallest to largest.
Median is the average of the two middle numbers.
(n+1)/2 will be halfway between these two numbers.
Ex: If n=10, (10 + 1)/2 = 5.5, so the median is the average of the 5th and 6th numbers from the bottom.
Barry Bonds
Year  HR    Year  HR    Year  HR
86    16    93    46    00    49
87    25    94    37    01    73
88    24    95    33    02    46
89    19    96    42    03    45
90    33    97    40    04    45
91    25    98    37    05     5
92    34    99    34    06    26
                        07    28
Example (n is even)
Order the data from smallest to largest.
5 16 19 24 25 25 26 28 33 33 34 34 37 37 40 42 45 45 46 46 49 73
n = 22, so (22+1)/2 = 11.5
Median is the average of the 11th and 12th numbers from the bottom: (34 + 34)/2 = 34
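Both median rules (n odd and n even) fit in one small function. The sketch below is only an illustration; the name `median` and the variable names are choices made here, and the two lists are the home-run counts from the two examples above.

```python
def median(values):
    """Middle value for odd n; average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th value, which is index n//2 when counting from 0.
        return ordered[mid]
    # Even n: (n+1)/2 falls halfway between the two middle positions.
    return (ordered[mid - 1] + ordered[mid]) / 2


# n = 23 (odd): the first home-run example.
hr_odd = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
          44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]
# n = 22 (even): Barry Bonds, 1986 through 2007.
hr_bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

print(median(hr_odd))    # 12th of 23 ordered values
print(median(hr_bonds))  # average of the 11th and 12th of 22 ordered values
```

Python's standard library offers `statistics.median`, which follows the same two rules.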
Incarceration Rates-Highlights
148 185 197 200 209 226 238 247 265 269
286 288 302 309 317 323 340 361 363 366
373 373 376 378 385 387 403 416 432 434
439 443 445 445 448 458 468 472 474 479
495 508 552 556 572 648 648 654 686 867
148 = Maine (1st)
185 = Minnesota (2nd)
247 = Nebraska (8th)
309 = Iowa (14th)
373 = Illinois (21st)
867 = Louisiana (50th)
Incarceration Rates
Compute the median
Range
Measures variation (spread)
Minimum = 0th percentile
Maximum = 100th percentile
Range = Maximum - Minimum
Total variability of the observations
Finding Q1 and Q3
In general,
Q1 is the median of the lower half of the ordered observations.
Q3 is the median of the upper half of the ordered observations.
Actual calculations from the textbook and JMP are slightly different.
5-Number Summary
Minimum
Q1
Median
Q3
Maximum
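The five summary values can be computed with the "median of each half" rule described above. This is a sketch under one stated assumption: when n is odd, the middle observation is left out of both halves (textbooks differ on this, and the slides do not say which convention applies). The function names are hypothetical; the data are the 50 ordered incarceration rates from these notes.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    # Assumption: for odd n the middle value belongs to neither half.
    upper = ordered[-half:]
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


rates = [148, 185, 197, 200, 209, 226, 238, 247, 265, 269,
         286, 288, 302, 309, 317, 323, 340, 361, 363, 366,
         373, 373, 376, 378, 385, 387, 403, 416, 432, 434,
         439, 443, 445, 445, 448, 458, 468, 472, 474, 479,
         495, 508, 552, 556, 572, 648, 648, 654, 686, 867]

minimum, q1, med, q3, maximum = five_number_summary(rates)
data_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                    # spread of the middle half of the data
print(minimum, q1, med, q3, maximum, data_range, iqr)
```

With n = 50 each half holds 25 values, so Q1 and Q3 are the 13th values of the lower and upper halves.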
Incarceration Rates
Compute the quartiles
Median = 386
Q1 =
Q3 =
Incarceration Rates
JMP gives different quartiles
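Software packages interpolate between observations in different ways, which is why quartiles can disagree with the hand calculation. The sketch below uses the two methods of Python's `statistics.quantiles` purely as an illustration of that effect; it makes no claim about which rule JMP applies.

```python
import statistics

rates = [148, 185, 197, 200, 209, 226, 238, 247, 265, 269,
         286, 288, 302, 309, 317, 323, 340, 361, 363, 366,
         373, 373, 376, 378, 385, 387, 403, 416, 432, 434,
         439, 443, 445, 445, 448, 458, 468, 472, 474, 479,
         495, 508, 552, 556, 572, 648, 648, 654, 686, 867]

# n=4 asks for the three cut points Q1, median, Q3.
exclusive = statistics.quantiles(rates, n=4, method="exclusive")
inclusive = statistics.quantiles(rates, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
```

Both methods agree on the median but place Q1 and Q3 slightly differently from each other and from the median-of-halves values, so it is worth stating which rule was used when reporting quartiles.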
Incarceration Rates
Compute the IQR
Q1 = 302
Q3 = 472
IQR =
Mean
Ordinary average
Add up all observations.
Divide by the number of observations.
Mean
Formula
n observations
y1, y2, y3, ..., yn are the observations.
$$\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
Example
Barry Bonds HRs per season
$$\bar{y} = \frac{y_1 + y_2 + y_3 + \cdots + y_n}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$$
What is the mean?
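The mean for this example can be checked directly: add up all observations, then divide by how many there are. A minimal sketch, using the 22 Barry Bonds season totals from the table earlier in these notes (the variable names are choices made here):

```python
# Barry Bonds home runs per season, 1986 through 2007 (n = 22).
hr_bonds = [16, 25, 24, 19, 33, 25, 34, 46, 37, 33, 42,
            40, 37, 34, 49, 73, 46, 45, 45, 5, 26, 28]

total = sum(hr_bonds)   # add up all observations
n = len(hr_bonds)       # number of observations
mean = total / n        # divide the total by n

print(f"total = {total}, n = {n}, mean = {mean:.2f}")
```

The mean comes out a little above the median of 34 found earlier for the same data.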
Standard Deviation
Denoted by letter s.
Measures variability (spread) from mean.
Values closer to mean = smaller contribution to s.
Values far away from mean = larger contribution to s.
s depends on how far away values are on
What is different?
Three datasets of n = 5 observations, each with mean 20.

Dataset A
xi    Deviation from mean    Squared deviation
20      0                       0
20      0                       0
20      0                       0
20      0                       0
20      0                       0
Sum of squared deviations = 0

Dataset B
yi    Deviation from mean    Squared deviation
10    -10                     100
15     -5                      25
20      0                       0
25      5                      25
30     10                     100
Sum of squared deviations = 250

Dataset C
zi    Deviation from mean    Squared deviation
 5    -15                     225
10    -10                     100
15     -5                      25
20      0                       0
50     30                     900
Sum of squared deviations = 1250
Standard Deviation
Dataset   s2       s
A           0      0
B        62.5   7.91
C       312.5  17.68
Standard Deviation
$$s = \sqrt{\frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$
Standard Deviation
Usually calculated using a computer or calculator.
Choose the n-1 option on the calculator.
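As a sketch of what the n-1 option computes, the function below follows the formula step by step: deviations from the mean, squared, summed, divided by n - 1, then square-rooted. The three five-value datasets are an assumption: they are values consistent with the sums of squared deviations (0, 250, 1250) and the variances 62.5 and 312.5 shown earlier. The name `sample_sd` is hypothetical.

```python
import math
import statistics


def sample_sd(values):
    """Standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    # Values far from the mean contribute large squared deviations.
    sum_sq = sum((v - mean) ** 2 for v in values)
    return math.sqrt(sum_sq / (n - 1))


# Assumed datasets, each with mean 20 (see the lead-in above).
datasets = {
    "A": [20, 20, 20, 20, 20],
    "B": [10, 15, 20, 25, 30],
    "C": [5, 10, 15, 20, 50],
}

for name, values in datasets.items():
    s = sample_sd(values)
    # statistics.stdev also divides by n - 1, so the two should agree.
    print(name, round(s ** 2, 1), round(s, 2), round(statistics.stdev(values), 2))
```

Dataset A shows that s is 0 only when every observation equals the mean, and because s is a square root it is never negative.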
Properties of s
What effect do the observations have on the
value of s?
General Properties of s
Can the standard deviation be negative?
Can the standard deviation be 0?
Comparing standard deviations
Look at the pairs of graphs on the handout.
For each pair, determine which has the larger standard deviation.
when either
Distribution is skewed
Outliers are present
people
$25,000 $27,000 $29,000
$35,000 $37,000 $38,000
Influence of Outliers
Summaries not affected by outliers are called robust.
Variation (Spread)
Range = Not Robust
IQR = Robust
s = Not Robust
mean.
These observations do not (usually) influence the median.
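The difference in robustness is easy to see numerically. The sketch below starts from the six salaries listed above and appends one extreme value; that seventh salary of $1,000,000 is hypothetical, added only to illustrate the effect of an outlier.

```python
import statistics

salaries = [25_000, 27_000, 29_000, 35_000, 37_000, 38_000]
# Hypothetical outlier, not from the lecture data.
with_outlier = salaries + [1_000_000]

for label, data in (("without outlier", salaries), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    med = statistics.median(data)
    sd = statistics.stdev(data)      # n - 1 divisor
    spread = max(data) - min(data)   # range
    print(f"{label}: mean={mean:,.0f} median={med:,.0f} s={sd:,.0f} range={spread:,}")
```

The median moves only to the next ordered value, while the mean, s, and the range are all pulled strongly toward the extreme salary.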
Skewed to the right (large values)
Mean > median
Skewed to the left (small values)
Mean < median
skewed data
Income
Housing prices
Course grades
Median Value
Report Standard Deviation when you report Mean Value