Tabular Descriptive Techniques
Cereal choice
Capital expenditure
The waiting time for medical services
Interval Data
Real numbers, e.g. heights, weights, prices, etc.
Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, so it's meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data
The values of nominal data are categories.
Nominal data are also called qualitative or categorical.
For example, responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!)
Other examples: ethnicity; smoking status (smoker, non-smoker).
Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order, a ranking.
For example, a college course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, the order is maintained whatever numbers are assigned to the categories, as long as the numbers keep the same ranking.
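To make the point about order concrete, here is a minimal Python sketch. The first coding is the slide's (poor = 1, ..., excellent = 5); the second coding (10, 20, 35, 70, 100) is a hypothetical alternative chosen only for illustration. Comparisons agree under both codings, while the arithmetic "answer" to 2*fair = very good changes.

```python
# Course ratings coded as on the slide: poor = 1, ..., excellent = 5
ratings = ["poor", "fair", "good", "very good", "excellent"]
codes_a = {r: i + 1 for i, r in enumerate(ratings)}
# Hypothetical alternative coding: different numbers, same ranking
codes_b = dict(zip(ratings, [10, 20, 35, 70, 100]))


def same_order(x, y):
    """True if both codings agree on whether rating x ranks above rating y."""
    return (codes_a[x] > codes_a[y]) == (codes_b[x] > codes_b[y])


def doubled_fair_is_very_good(codes):
    """The slide's nonsense question: does 2*fair = very good?"""
    return 2 * codes["fair"] == codes["very good"]


# Comparisons such as excellent > poor agree for every pair of categories...
order_preserved = all(same_order(x, y) for x in ratings for y in ratings)
print(order_preserved)                       # True
# ...but the arithmetic result depends on the arbitrary coding
print(doubled_fair_is_very_good(codes_a))    # True
print(doubled_fair_is_very_good(codes_b))    # False
```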
Examples: Ordinal Data
A type of categorical data in which order is important.
Class of degree: 1st class, 2:1, 2:2, 3rd class, fail
Degree of illness: none, mild, moderate, acute, chronic
Opinion of students about stats classes: very unhappy, unhappy, neutral, happy, ecstatic!
Classifying data (decision chart):
Categorical? If no: Interval Data. If yes: Categorical Data.
Ordered? If yes: Ordinal Data. If no: Nominal Data.
Nominal data
With nominal data, all we can do is calculate the proportion of data that falls into each category.
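Computing these proportions is a single counting pass. A minimal Python sketch, using the computer-brand counts shown on these slides (IBM 25, Dell 11, Compaq 8, Other 6, total 50); the helper name `frequency_table` is hypothetical:

```python
from collections import Counter


def frequency_table(responses):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(responses)
    n = len(responses)
    return {cat: (freq, freq / n) for cat, freq in counts.items()}


# One response per computer owner, rebuilt from the slide's brand counts
responses = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["Other"] * 6
table = frequency_table(responses)
for brand, (freq, rel) in table.items():
    print(f"{brand:<7}{freq:>4}{rel:>6.0%}")
```

The relative frequencies (50%, 22%, 16%, 12%) are exactly what a pie chart displays; a bar chart can show either column.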
Age -- income
Age   Income
55    75000
42    68000
..    ..

Weight gain
+10
+5
..
Brand    Frequency   Relative frequency
IBM      25          50%
Dell     11          22%
Compaq   8           16%
Other    6           12%
Total    50
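[Figure: pie chart and bar chart of the same data by area. Pie chart: Accounting 28.9%, Marketing 25.3%, Finance 20.6%, General management 14.2%, Other 11.1%. Bar chart (y-axis: Frequency): Accounting 73, Marketing 64, Finance 52, General management 36, Other 28.]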
It's all the same information (based on the same data), just a different presentation.
Collect data
Prepare a frequency distribution
Draw a histogram
With 200 observations, we should have between 7 and 10 classes.
Alternatively, we could use Sturges' formula:
Number of class intervals = 1 + 3.3 log10(n) = 1 + 3.3 log10(200) ≈ 1 + 3.3(2.3) ≈ 8.6
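A minimal Python sketch of the formula exactly as written above (base-10 logarithm, constant 3.3). The constant approximates 1/log10(2) ≈ 3.32, so the expression is essentially 1 + log2(n). The function name is hypothetical.

```python
import math


def sturges_classes(n):
    """Sturges' formula as written on the slide: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


k = sturges_classes(200)
print(round(k, 1))   # 8.6
```

The result still has to be turned into a whole number of classes.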
Number of classes, from the smallest to the largest data sets: 5-7, 7-9, 9-10, 10-11, 11-13, 13-17, 17-20
[Figure: histogram of the telephone bills; x-axis: Bills, y-axis: Frequency]
Relative frequency = Class frequency / Total number of observations
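The "prepare a frequency distribution" step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: equal-width classes, each class including its lower limit and excluding its upper limit (textbooks differ on this convention), and hypothetical bill amounts rather than the 200 observations of Example 2.1.

```python
def frequency_distribution(data, start, width, k):
    """Count observations in k equal-width classes and convert to relative frequencies.

    Assumption: class i covers [start + i*width, start + (i+1)*width), i.e. the
    lower limit is included and the upper limit is not.
    """
    freq = [0] * k
    for x in data:
        i = int((x - start) // width)
        if not 0 <= i < k:
            raise ValueError(f"{x} lies outside the {k} classes")
        freq[i] += 1
    n = len(data)
    # Relative frequency = class frequency / total number of observations
    rel = [f / n for f in freq]
    return freq, rel


# Hypothetical bills, not the 200 observations from Example 2.1
bills = [3.5, 12.0, 14.9, 15.0, 29.99, 42.19, 88.0, 119.63]
freq, rel = frequency_distribution(bills, start=0, width=15, k=8)
print(freq)    # [3, 2, 1, 0, 0, 1, 0, 1]
print(rel[0])  # 0.375
```

Drawing the histogram is then a matter of plotting one bar per class with height `freq[i]` (or `rel[i]`).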
Shapes of histograms
Negatively skewed
Positively skewed
Modal classes
A modal class is the one with the largest
number of observations.
[Figure: a unimodal histogram with one modal class, and a bimodal histogram with two modal classes]
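Finding the modal class is a matter of locating the largest frequency. A minimal Python sketch, using the telephone-bill class frequencies from Example 2.1 (71, 37, 13, 9, 10, 18, 28, 14; the 9 for 45-60 follows from the cumulative counts 130 - 121). Two assumptions: the last class is labelled 105-120 to match the width-15 classes, and the `local_peaks` test (a class higher than both neighbours) is a rough, hypothetical way to spot a second peak, not a formal definition of bimodality.

```python
def modal_classes(classes, freq):
    """Classes with the largest frequency (more than one if tied)."""
    top = max(freq)
    return [c for c, f in zip(classes, freq) if f == top]


def local_peaks(classes, freq):
    """Classes whose frequency exceeds that of each neighbouring class."""
    peaks = []
    for i, f in enumerate(freq):
        left = freq[i - 1] if i > 0 else float("-inf")
        right = freq[i + 1] if i < len(freq) - 1 else float("-inf")
        if f > left and f > right:
            peaks.append(classes[i])
    return peaks


classes = ["0-15", "15-30", "30-45", "45-60", "60-75", "75-90", "90-105", "105-120"]
freq = [71, 37, 13, 9, 10, 18, 28, 14]
print(modal_classes(classes, freq))  # ['0-15']
print(local_peaks(classes, freq))    # ['0-15', '90-105']
```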
Interpreting histograms
Example 2.2: Selecting an investment
[Figure: histograms of the returns on investment A and investment B (x-axis: return, from -15 to 75; y-axis: frequency), with the center of each distribution marked]
The risk from investing in A is smaller.
The possibility of having a high rate of return exists for both investments.
[Figure: histograms of student marks, Marks (Manual) and Marks (Computer); x-axis: marks from 50 to 100, y-axis: Frequency]
Or split it at the tens position (still rounding): the tens digit is the stem and the units digit is the leaf. The length of each line represents the frequency.

Stem | Leaf
 0 | 0000000000111112222223333345555556666666778888999999
 1 | 000001111233333334455555667889999
 2 | 0000111112344666778999
 3 | 001335589
 4 | 124445589
 5 | 33566
 6 | 3458
 7 | 022224556789
 8 | 334457889999
 9 | 00112222233344555999
10 | 001344446699
11 | 124557889
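Splitting at the tens position can be sketched in Python as below. Assumptions: values are non-negative and are first rounded half-up to whole units (the slide says "still rounding" without giving the rule), and the sample values are hypothetical, not the telephone-bill data. Empty stems are kept so the display has no gaps.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Split each observation at the tens position: stem = tens, leaf = units.

    Assumes non-negative data; values are rounded half-up to whole units first.
    """
    stems = defaultdict(list)
    for x in sorted(data):
        v = int(x + 0.5)              # round half up (valid for x >= 0)
        stems[v // 10].append(v % 10)
    lo, hi = min(stems), max(stems)
    # Include empty stems so the display shows gaps in the data
    return {s: "".join(str(d) for d in stems.get(s, [])) for s in range(lo, hi + 1)}


# Hypothetical sample values
sample = [42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.0, 72.88, 83.05]
display = stem_and_leaf(sample)
for stem, leaves in display.items():
    print(f"{stem:>3} | {leaves}")
```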
Cumulative relative frequencies:
first class: .355
next class: .355 + .185 = .540
:
:
last class: .930 + .070 = 1.00
Ogives
Ogives are cumulative relative frequency
distributions.
Example 2.1 - continued
Cumulative relative frequency for telephone bills

Class     Frequency   Cumulative frequency   Cumulative relative frequency
0-15      71          71                     71/200 = .355
15-30     37          108                    108/200 = .540
30-45     13          121                    121/200 = .605
45-60     9           130                    130/200 = .650
60-75     10          140                    140/200 = .700
75-90     18          158                    158/200 = .790
90-105    28          186                    186/200 = .930
105-120   14          200                    200/200 = 1.000
[Figure: ogive for the telephone bills; x-axis: Bills (15 to 120), y-axis: cumulative relative frequency, rising from .355 at 15 to 1.000 at 120]
Constructing an ogive
1) Calculate the relative frequencies.
2) Calculate the cumulative relative frequencies by
adding the current class relative frequency to the
previous class cumulative relative frequency.
3) Graph the cumulative relative frequencies
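Steps 1 and 2 can be sketched in Python with the telephone-bill frequencies from Example 2.1 (the 9 for the 45-60 class follows from the cumulative counts 130 - 121). The upper limit 120 for the last class is an assumption based on the width-15 classes, and the function name is hypothetical.

```python
from itertools import accumulate


def cumulative_relative_frequencies(freq):
    """Step 1: relative frequencies. Step 2: running total of them."""
    n = sum(freq)
    rel = [f / n for f in freq]   # step 1
    return list(accumulate(rel))  # step 2: add each class to the previous total


freq = [71, 37, 13, 9, 10, 18, 28, 14]              # telephone-bill class frequencies
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]   # assumed upper class limits
cum = cumulative_relative_frequencies(freq)
for u, c in zip(upper_limits, cum):
    print(f"{u:>4} {c:.3f}")
```

Step 3 plots each cumulative relative frequency against its class's upper limit (for example with matplotlib), which reproduces the .355, .540, ..., 1.000 points of the ogive.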
around $35
Ogive App.
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Another App.
Pareto Rule (20/80)
Note how this reader is cross-classified according to both variables.
Contingency Table
Interpretation: The relative frequencies in the columns 2 & 3 are similar,
but there are large differences between columns 1 and 2 and between
columns 1 and 3.
This tells us that blue collar workers tend to read different newspapers
from both white collar workers and professionals and that white collar
and professionals are quite similar in their newspaper choice.
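The comparison above rests on column relative frequencies: each count divided by its column total. A minimal Python sketch; the counts and newspaper names below are hypothetical (chosen so that columns 2 and 3 match while column 1 differs), not the survey data from the example.

```python
def column_relative_frequencies(table):
    """Divide each count by its column total so the columns can be compared.

    `table` maps a row label to a list of counts, one per column.
    """
    rows = list(table)
    ncols = len(table[rows[0]])
    col_totals = [sum(table[r][j] for r in rows) for j in range(ncols)]
    return {r: [table[r][j] / col_totals[j] for j in range(ncols)] for r in rows}


# Hypothetical counts; columns: blue collar, white collar, professional
counts = {
    "Paper A": [20, 10, 10],
    "Paper B": [10, 30, 30],
    "Paper C": [20, 10, 10],
}
rel = column_relative_frequencies(counts)
for paper, shares in rel.items():
    print(paper, [f"{p:.0%}" for p in shares])
```

Each column now sums to 100%, so columns with different totals can be compared directly.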
Size   Price
23     315
24     229
26     335
27     261
..     ..
Scatter Diagram
It appears that there is in fact a relationship: the larger the house, the higher the selling price tends to be.
[Figure: example scatter diagram showing no relationship]
Line Chart
Observations measured at the
same point in time are called
cross-sectional data.
Observations measured at
successive points in time are
called time-series data.
Line Chart
From 1987 to 1992, the tax was fairly flat. Starting in 1993, taxes increased rapidly until 2001. Finally, there was a downturn in 2002.
Summary
Interval data, single set of data: Histogram, Ogive, or Stem-and-Leaf Display
Nominal data, single set of data: Frequency and Relative Frequency Tables, Bar and Pie Charts
Nominal data, relationship between two variables: Contingency Table, Bar Charts