Statistics
Data → Information
Example
An engineering school student is anxious about their statistics
course, since they've heard the course is difficult. The professor
provides last term's final exam marks to the student. What can be
discerned from this list of numbers?
Statistics
Data → Information
Data: list of last term's marks (95, 89, 70, 65, 78, 57, ...)
Information: new information about the statistics class, e.g. class average, proportion of class receiving A's, most frequent mark, marks distribution, etc.
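A minimal Python sketch of turning the listed marks into information. It uses only the six marks legible on the slide (the original list continues), and the cutoff of 80 for an A is an assumption, not something the slide states.

```python
from statistics import mean, multimode

# Only the six marks legible on the slide; the full list is longer.
marks = [95, 89, 70, 65, 78, 57]

A_CUTOFF = 80  # assumed threshold for an A; not given in the source

class_average = mean(marks)
proportion_a = sum(m >= A_CUTOFF for m in marks) / len(marks)
# multimode returns every value tied for the highest count; here each
# mark occurs once, so all six marks come back.
most_frequent = multimode(marks)

print(f"Class average: {class_average:.2f}")
print(f"Proportion receiving A's: {proportion_a:.2f}")
print(f"Most frequent mark(s): {most_frequent}")
```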
Key Statistical Concepts
Population: described by parameters
Sample: a subset of the population, described by statistics
Descriptive Statistics
[Figure: pie chart by cause category (CoalMine 15.6%, DamFailure 8.9%, GasExplosion 62.2%, OilFire 8.9%; Lightning and Nuclear shares not legible) and bar chart "Chart of Cause" (x-axis: Cause; y-axis: Count)]
Descriptive Statistics
Descriptive statistics involves arranging, summarizing, and
presenting a set of data in such a way that useful information
is produced.
Statistics
Data → Information
Population (parameter) and Sample (statistic): inference runs from the sample statistic to the population parameter
Classification of Data
Data: Qualitative or Quantitative (Interval)
Numerical Methods for Describing Qualitative Data
Example
Graphical Methods for Describing Qualitative Data
Bar Chart
Pie Chart
Pareto Diagram
Pie Chart
[Figure: pie chart by cause; legible slices: DamFailure 8.9%, GasExplosion 62.2%]
Bar Chart
[Figure: bar chart "Chart of Cause"; x-axis: Cause (CoalMine, DamFailure, GasExplosion, Lightning, Nuclear, OilFire); y-axis: Count (0 to 30)]
Pareto Diagram
[Figure: Pareto diagram "Chart of Cause"; x-axis: Cause, ordered GasExplosion, CoalMine, DamFailure, OilFire, Lightning, Nuclear; y-axis: percent within all data (0 to 100)]
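The ordering behind a Pareto diagram can be sketched in Python as follows. This is an illustration only: it uses the four percentage shares that are legible on the slides, and leaves out Lightning and Nuclear because their values did not survive.

```python
# Shares (percent of all data) legible on the slides. Lightning and
# Nuclear are omitted because their values are not legible.
shares = {
    "CoalMine": 15.6,
    "DamFailure": 8.9,
    "GasExplosion": 62.2,
    "OilFire": 8.9,
}

# Pareto order: largest category first. Python's sort is stable, so
# tied categories keep their original relative order.
ordered = sorted(shares.items(), key=lambda item: item[1], reverse=True)

pareto = []
running_total = 0.0
for cause, percent in ordered:
    running_total += percent
    pareto.append((cause, percent, round(running_total, 1)))

for cause, percent, cumulative in pareto:
    print(f"{cause:<13} {percent:>5.1f}%  cumulative {cumulative:>5.1f}%")
```

The cumulative column stops short of 100% because the two illegible categories are excluded.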
Graphical Methods for Describing Quantitative Data
Dot plots
Stem-and-leaf display
Histograms
Example
Dotplots
[Figure: dotplot of MPG]
Histograms
[Figure: histogram of MPG; x-axis: MPG (30 to 45); y-axis: Frequency (0 to 35)]
Frequency Distribution Example
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example
Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Find range: 58 - 12 = 46
Select number of classes: 5 (usually between 5 and 15)
Compute class interval (width): 10 (46/5 = 9.2, rounded up to 10)
Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
Compute class midpoints: 15, 25, 35, 45, 55
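The steps above can be sketched in Python as follows. The rule used to pick the first boundary (the multiple of the class width at or below the minimum) is an assumption; the slide only lists the resulting boundaries.

```python
import math

raw = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
       32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

data = sorted(raw)                           # ascending order
data_range = data[-1] - data[0]              # 58 - 12 = 46
num_classes = 5                              # chosen by the analyst
width = math.ceil(data_range / num_classes)  # 46 / 5 = 9.2, rounded up to 10

# Assumption: start at the multiple of the width at or below the minimum.
start = (data[0] // width) * width           # 10
boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [(lo + hi) // 2 for lo, hi in zip(boundaries, boundaries[1:])]

print("Range:", data_range)
print("Width:", width)
print("Boundaries:", boundaries)
print("Midpoints:", midpoints)
```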
Frequency Distribution Example
Class                 Frequency   Relative Frequency   Percentage
10 but less than 20   3           .15                  15
20 but less than 30   6           .30                  30
30 but less than 40   5           .25                  25
40 but less than 50   4           .20                  20
50 but less than 60   2           .10                  10
Total                 20          1.00                 100
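A short Python sketch that rebuilds this table from the raw data, counting each value in the class whose lower boundary it meets or exceeds and whose upper boundary it falls below. The variable names are illustrative.

```python
data = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
        32, 13, 12, 38, 41, 43, 44, 27, 53, 27]
boundaries = [10, 20, 30, 40, 50, 60]
n = len(data)

rows = []
for lo, hi in zip(boundaries, boundaries[1:]):
    # "lo but less than hi": lower boundary included, upper excluded
    frequency = sum(lo <= x < hi for x in data)
    rows.append((lo, hi, frequency, frequency / n, 100 * frequency / n))

for lo, hi, frequency, relative, percent in rows:
    print(f"{lo} but less than {hi}: {frequency:>2}  {relative:.2f}  {percent:.0f}")

frequencies = [row[2] for row in rows]
print("Total:", sum(frequencies))
```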
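Frequency Distribution Example
Class                 Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20   3           15           3                      15
20 but less than 30   6           30           9                      45
30 but less than 40   5           25           14                     70
40 but less than 50   4           20           18                     90
50 but less than 60   2           10           20                     100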
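Cumulative columns are running totals of the class frequencies. A minimal Python sketch, assuming the class frequencies 3, 6, 5, 4, 2 from the frequency distribution of the 20 raw values:

```python
from itertools import accumulate

classes = ["10-20", "20-30", "30-40", "40-50", "50-60"]
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)  # 20 observations

# Running total of the frequencies, then the same totals as a percentage of n.
cumulative_frequency = list(accumulate(frequencies))
cumulative_percentage = [100 * c / n for c in cumulative_frequency]

for label, cf, cp in zip(classes, cumulative_frequency, cumulative_percentage):
    print(f"{label}: cumulative frequency {cf:>2}, cumulative percentage {cp:.0f}")
```

The last class always reaches the full count and 100%, which is a quick check that no observation was dropped.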
Histogram Example
Class                 Class Midpoint   Frequency
10 but less than 20   15               3
20 but less than 30   25               6
30 but less than 40   35               5
40 but less than 50   45               4
50 but less than 60   55               2
[Figure: histogram "Daily High Temperature"; x-axis: class midpoints (5 to 65); y-axis: frequency (0 to 7); no gaps between bars]
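As a rough illustration, the histogram can be approximated in plain text by printing one bar per class. This is a sketch only; the layout with one `#` per observation is an arbitrary choice, not the slide's chart.

```python
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

# One text bar per class, labelled by its midpoint.
lines = [f"{m:>3} | {'#' * f} ({f})" for m, f in zip(midpoints, frequencies)]
print("\n".join(lines))
```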
Numerical Methods for Describing Quantitative Data
Measures of Central Tendency
Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
Median: midpoint of ranked values
Mode: most frequently observed value
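Mean
            Population   Sample
Size        N            n
Mean        μ            X̄

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

[Figure: two dot plots on a 0 to 10 axis, one with Mean = 3 and one with Mean = 4]
$\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
$\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$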
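The two calculations can be checked with a few lines of Python. The function name `sample_mean` is illustrative; it applies the sum-divided-by-n formula directly.

```python
def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

first = [1, 2, 3, 4, 5]
second = [1, 2, 3, 4, 10]   # same data with the largest value replaced by 10

mean_first = sample_mean(first)    # 15 / 5
mean_second = sample_mean(second)  # 20 / 5

print(mean_first, mean_second)
```

Changing a single value from 5 to 10 moves the mean from 3 to 4, which shows how strongly the mean responds to extreme values.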
Median
[Figure: two dot plots on a 0 to 10 axis]
Median = 3
Median = $\frac{3 + 4}{2} = 3.5$
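A minimal Python sketch of the median rule: with an odd number of ranked values it is the middle one, and with an even number it is the average of the two middle ones. The data behind the slide's dot plots is not recoverable, so the two samples below are hypothetical and were chosen only to reproduce the values 3 and 3.5.

```python
from statistics import median

def median_of(values):
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd n: the middle value
    return (ranked[mid - 1] + ranked[mid]) / 2   # even n: mean of the two middle values

odd_sample = [1, 2, 3, 4, 5]        # hypothetical data
even_sample = [1, 2, 3, 4, 5, 6]    # hypothetical data

median_odd = median_of(odd_sample)
median_even = median_of(even_sample)

# The standard library gives the same results.
print(median_odd, median(odd_sample))
print(median_even, median(even_sample))
```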
Mode
[Figure: two dot plots, one on a 0 to 14 axis with Mode = 9 and one on a 0 to 6 axis with No Mode]
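One way to compute the mode in Python is to count occurrences and report the most frequent value or values. The samples below are hypothetical (the slide's data is not recoverable), and treating "every value occurs once" as "no mode" is the convention assumed here.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []   # assumed convention: no repeated value means no mode
    return sorted(v for v, c in counts.items() if c == top)

with_mode = [2, 5, 9, 9, 9, 11, 13]   # hypothetical data
no_mode = [0, 1, 2, 3, 4, 5, 6]       # hypothetical data

print(modes(with_mode))
print(modes(no_mode))
```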
Mean, Median, Mode
Measures of Variation
Measures of variation give information on the
spread or variability of the data values.
Range
Standard deviation
Variance
Same center,
different variation
Range
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
[Figure: two dot plots on a 7 to 12 axis; both have Range = 12 - 7 = 5]
Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
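The effect of the single outlier can be reproduced in Python with the two lists above. The helper name `value_range` is illustrative.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Eleven 1s, eight 2s, four 3s, then the last two values.
typical = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

range_typical = value_range(typical)
range_outlier = value_range(with_outlier)

print(range_typical, range_outlier)
```

Only one of the 25 values differs between the lists, yet the range grows from 4 to 119.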
Variance and Standard Deviation
Average (approximately) of squared deviations of values from the mean
Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
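The two formulas translate directly into Python. This is a sketch with illustrative function names and a small made-up data set, not code from the course.

```python
import math

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

values = [1, 2, 3, 4, 5]        # illustrative data with mean 3
s2 = sample_variance(values)    # (4 + 1 + 0 + 1 + 4) / 4
s = sample_std(values)

print(s2, s)
```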
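Population vs Sample
Population (size N, described by parameters): $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Sample (subset of size n, described by statistics): $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$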
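Python's standard `statistics` module provides both versions: `pvariance` divides by N and `variance` divides by n - 1. The data below is illustrative and is treated first as a whole population and then as a sample.

```python
from statistics import pvariance, variance

values = [1, 2, 3, 4, 10]   # illustrative data with mean 4

# Squared deviations from the mean: 9 + 4 + 1 + 0 + 36 = 50
population_var = pvariance(values)   # 50 / 5
sample_var = variance(values)        # 50 / 4

print(population_var, sample_var)
```

The sample version is always the larger of the two for the same data, because it divides the same sum by a smaller number.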
Example: Sample Standard Deviation
Sample data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$S = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} \approx 4.3095$
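The arithmetic for this example can be verified step by step in Python, with the standard library's `stdev` as a cross-check.

```python
from statistics import mean, stdev

sample = [10, 12, 14, 15, 17, 18, 18, 24]

x_bar = mean(sample)                                    # 128 / 8
squared_deviations = [(x - x_bar) ** 2 for x in sample]
total = sum(squared_deviations)                         # numerator of the formula
s = (total / (len(sample) - 1)) ** 0.5                  # divide by n - 1, then take the root

print("Mean:", x_bar)
print("Sum of squared deviations:", total)
print("S:", round(s, 4))
print("statistics.stdev:", round(stdev(sample), 4))
```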
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
Measuring variation
Shape of a distribution