Stat

Graphical & Tabular Descriptive Techniques
2.1 Types of data and information

A variable is a characteristic of a population or
sample that is of interest to us.
Cereal choice
Capital expenditure
The waiting time for medical services
Data - the actual values of variables
Interval data are numerical observations

Nominal data are categorical observations
Ordinal data are ordered categorical observations
Interval Data
Real numbers, i.e. heights, weights,
prices, etc.
Also referred to as quantitative or
numerical.
Arithmetic operations can be performed
on interval data, so it is meaningful to
talk about 2*Height, or Price + $1, and so
on.
Nominal Data
The values of nominal data are categories.
Nominal data are also called qualitative or
categorical.
For example, responses to questions about marital
status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic
operations don't make any sense (e.g. does
Widowed ÷ 2 = Married?!)
More Examples: Nominal Data

Type of Bicycle
Mountain bike, road bike, chopper, folding, BMX.
Ethnicity
White British, Afro-Caribbean, Asian, Chinese,
other, etc. (note problems with these categories).
Smoking status
smoker, non-smoker
Ordinal Data
Ordinal data appear to be categorical in nature, but
their values have an order, a ranking:
For example, a college course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on these data
(e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, the order is maintained whatever numeric values are
assigned to the categories, provided they increase with the ranking.
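The point about ordinal codes can be checked with a short sketch. It uses the rating codes from the text plus a second coding (`coding_b`) that is purely hypothetical and chosen only because it keeps the same order; the helper name `same_ranking` is also an illustration, not part of the course material.

```python
# Ratings in their natural order, from worst to best.
RATINGS = ["poor", "fair", "good", "very good", "excellent"]

# The coding from the text, and a hypothetical coding that keeps the same order.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 10, "fair": 25, "good": 30, "very good": 70, "excellent": 100}


def same_ranking(first, second, labels):
    """Return True if both codings rank every pair of labels the same way."""
    return all(
        (first[x] < first[y]) == (second[x] < second[y])
        for x in labels
        for y in labels
    )


# Order comparisons agree under both codings.
print(same_ranking(coding_a, coding_b, RATINGS))

# Arithmetic depends on the arbitrary codes: 2 * fair equals
# "very good" under coding_a but not under coding_b.
print(2 * coding_a["fair"] == coding_a["very good"])
print(2 * coding_b["fair"] == coding_b["very good"])
```

Comparisons such as excellent > poor survive the recoding, while a statement such as 2*fair = very good holds only by accident of one particular coding.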
Examples: Ordinal Data
A type of categorical data in which order is
important.
Class of degree: 1st class, 2:1, 2:2, 3rd class,
fail
Degree of illness: none, mild, moderate,
acute, chronic.
Opinion of students about stats classes: very unhappy, unhappy, neutral, happy,
ecstatic!
Types of Data & Information
[Flowchart: Data. Categorical? If no: Interval Data. If yes: Categorical Data. Ordered? If yes: Ordinal Data. If no: Nominal Data.]
Types of data - examples

Interval data
Age    Income
55     75000
42     68000
..     ..
Weight gain
+10
+5
..
Nominal data
Category   Count   Proportion
IBM        25      50%
Dell       11      22%
Compaq     8       16%
Other      6       12%
Total      50
With nominal data, all we can do is calculate the proportion
of data that falls into each category.
Exploratory Data Analysis
Exploratory data analysis is the process of using simple math and
pictures to summarize data.
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Types of data analysis

Knowing the type of data is necessary to properly
select the technique to be used when analyzing data.
Type of analysis allowed for each type of data:
Interval data: arithmetic calculations
Nominal data: counting the number of observations in each
category
Ordinal data: computations based on an ordering process
2.2 Graphical Techniques for Nominal Data
The only allowable calculation on nominal
data is to count the frequency of each value
of a variable.
When the raw data can be naturally
categorized in a meaningful manner, we can
display frequencies by
Bar charts, which emphasize the frequency of occurrence
of the different categories.
Pie charts, which emphasize the proportion of
occurrences of each category.
Graphical & Tabular Techniques for Nominal Data
First we need to summarize the data in
a table, called a frequency distribution,
that presents the categories and their counts.
A relative frequency distribution lists
the categories and the proportion with
which each occurs.
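As a minimal sketch of both tables, the following builds a frequency distribution and a relative frequency distribution with the standard library. The raw responses are reconstructed from the counts in the nominal data example earlier in these notes (25, 11, 8 and 6 out of 50); the function names are illustrative only.

```python
from collections import Counter

# Raw responses rebuilt from the counts in the nominal data example.
observations = ["IBM"] * 25 + ["Dell"] * 11 + ["Compaq"] * 8 + ["Other"] * 6


def frequency_distribution(values):
    """Count how many times each category occurs."""
    return Counter(values)


def relative_frequency_distribution(values):
    """Proportion of observations falling into each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


freq = frequency_distribution(observations)
rel = relative_frequency_distribution(observations)

# Print category, count and proportion, most frequent category first.
for category, count in freq.most_common():
    print(f"{category:<8}{count:>4}{rel[category]:>8.0%}")
```

Counting is the only calculation involved, which is why the same approach works for any nominal variable.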
Nominal Data (Tabular Summary)
The Pie Chart

The pie chart is a circle, subdivided into
a number of slices that represent the
various categories.
The size of each slice is proportional to
the percentage corresponding to the
category it represents.
The Pie Chart
Category              Percentage
Other                 11.1%
Accounting            28.9%
General management    14.2%
Finance               20.6%
Marketing             25.3%
The angle of the Accounting slice is (28.9/100)(360°) = 104°.
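The slice-angle arithmetic can be repeated for every category. This is a small sketch using the percentages shown on the pie chart; `slice_angle` is a name chosen for illustration.

```python
# Percentages from the pie chart in the text.
shares = {
    "Accounting": 28.9,
    "Marketing": 25.3,
    "Finance": 20.6,
    "General management": 14.2,
    "Other": 11.1,
}


def slice_angle(percent):
    """Central angle, in degrees, of a slice covering `percent` of the circle."""
    return percent / 100 * 360


angles = {category: slice_angle(p) for category, p in shares.items()}
for category, angle in angles.items():
    print(f"{category:<20}{angle:7.2f} degrees")
```

Because the published percentages are rounded and add up to 100.1%, the computed angles sum to slightly more than 360 degrees.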
The Bar Chart

Rectangles represent each category.
The height of the rectangle represents the frequency.
The base of the rectangle is arbitrary
[Figure: bar chart; vertical axis: Frequency (0 to 80); horizontal axis: Area; bar heights: 73, 64, 52, 36, 28.]
It's all the same information
(based on the same data).
Just a different presentation.
The Bar Chart
Use bar charts also when the order in which
nominal data are presented is meaningful.
[Figure: bar chart of the total number of new products introduced in North America in the years 1989, ..., 1994; vertical axis: 0 to 20,000; horizontal axis: years 89 to 94.]
2.3 Graphical Techniques for Interval Data
Example 2.1 Goal: Display & describe
information concerning the monthly bills
of new telephone subscribers
Collect data
Prepare a frequency distribution
Draw a histogram
How many classes to use?
With 200 observations, we should have between 7 & 10 classes.
Alternatively, we could use Sturges' formula:
Number of class intervals = 1 + 3.3 log10(n) = 1 + 3.3(2.3) = 8.6
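A minimal sketch of Sturges' formula as it is written in these notes, with the logarithm taken to base 10. The function name `sturges_classes` is an assumption made for illustration.

```python
import math


def sturges_classes(n):
    """Number of class intervals suggested by 1 + 3.3 * log10(n)."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    return 1 + 3.3 * math.log10(n)


# The telephone bills example has 200 observations.
k = sturges_classes(200)
print(round(k, 1))
```

The result is a guideline, not a rule: in the example it is rounded to a convenient whole number of classes.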
Preparing the histogram

Collect data
Bills
42.19
38.45
29.23
89.35
118.04
110.46
0.00
72.88
83.05
.
.
(There are 200 data points.)
Prepare a frequency distribution
How many classes to use?
Number of observations    Number of classes
Less than 50              5-7
50 - 200                  7-9
200 - 500                 9-10
500 - 1,000               10-11
1,000 - 5,000             11-13
5,000 - 50,000            13-17
More than 50,000          17-20
Class width = [Range] / [# of classes]
            = [Largest observation - Smallest observation] / [# of classes]
            = [119.63 - 0] / [8] = 14.95, rounded up to 15
Building the Histogram

1) Collect the Data
2) Create a frequency distribution for the data
a) Determine the number of classes to use. [8]
b) Determine how large to make each class. [15]
c) Place the data into each class
each item can only belong to one class;
classes contain observations greater than
their lower limits and less than or equal to
their upper limits.
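The classing rule in step (c) can be sketched as follows, using 8 classes of width 15 starting at 0 and the nine bills listed earlier (the full data set of 200 bills is not reproduced in these notes). One assumption is added and marked in the code: an observation equal to the overall lower limit, such as a bill of 0.00, is placed in the first class so that it is not left out.

```python
import math


def class_index(value, lower, width, n_classes):
    """Index of the class (lower limit, upper limit] that contains `value`.

    Assumption: an observation equal to the overall lower limit is placed
    in the first class so that it is not left out.
    """
    if value == lower:
        return 0
    index = math.ceil((value - lower) / width) - 1
    if not 0 <= index < n_classes:
        raise ValueError(f"{value} lies outside the classes")
    return index


def frequency_table(values, lower, width, n_classes):
    """Return a list of (lower limit, upper limit, frequency) rows."""
    counts = [0] * n_classes
    for value in values:
        counts[class_index(value, lower, width, n_classes)] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


# The nine bills listed in the text.
bills = [42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05]
table = frequency_table(bills, lower=0, width=15, n_classes=8)
for low, high, count in table:
    print(f"{low:>3} to {high:>3}: {count}")
```

Each observation lands in exactly one class, and a value equal to an upper limit (for example 15) belongs to the class that ends there.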
Drawing the histogram

Draw a Histogram
Interpreting the Histogram
What information is visible from this histogram?
[Figure: histogram of telephone bills; horizontal axis: Bills (15, 30, ..., 120); vertical axis: Frequency (0 to 80).]
About half of all the bills are small: 71 + 37 = 108.
A few bills are in the middle range: 13 + 9 + 10 = 32.
A relatively large number of bills are large: 18 + 28 + 14 = 60.
Additional notes: Relative frequency

It is often preferable to show the relative frequency
(proportion) of observations falling into each class,
rather than the frequency itself.
Class relative frequency = Class frequency / Total number of observations
Relative frequencies are especially important when
comparing two or more histograms for which
the numbers of observations in the samples studied are
different.
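As a short sketch, the class frequencies of the telephone bills example (71, 37, 13, 9, 10, 18, 28, 14, for 8 classes of width 15) can be converted to relative frequencies like this. The function name is illustrative.

```python
# Class frequencies for the telephone bills example (8 classes of width 15).
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]


def relative_frequencies(counts):
    """Divide each class frequency by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]


rel = relative_frequencies(frequencies)
print([round(r, 3) for r in rel])
```

Since relative frequencies always sum to 1, histograms drawn from samples of different sizes can be placed on the same vertical scale.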
Additional notes: Class width
It is generally best to use equal class widths, but
sometimes unequal class widths are called for.
Unequal class widths are used when the
frequency associated with some classes is too
low. Then
several classes are combined to form a
wider and more populated class.
It is possible to form an open-ended class at the
higher end or lower end of the histogram.
Shapes of histograms
Negatively skewed
Positively skewed
Modal classes
A modal class is the one with the largest
number of observations.
[Figure: a unimodal histogram, with the modal class marked.]
[Figure: a bimodal histogram, with its two modal classes marked.]
Bell shaped histograms

Many statistical techniques require that the
population be bell shaped.
Drawing the histogram helps verify the shape of
the population in question
Interpreting histograms
Example 2.2: Selecting an investment
An investor is considering investing in one
of two investments.
The returns on these investments were
recorded.
From the two histograms, how can the
investor interpret
The expected returns
The spread of the returns (the risk involved with
each investment)
Comparing two Histograms

[Figure: histograms of Return on investment A and Return on investment B; horizontal axis: -15 to 75; vertical axis: frequency, 0 to 18; the center of each histogram is marked.]
Interpretation: The center of the returns of Investment A
is slightly lower than that for Investment B.

[Figure: histograms of Return on investment A and Return on investment B, each with sample size = 50; figure annotations: 17, 34, 46 for A and 16, 26, 43 for B.]
Interpretation: The spread of returns for Investment A
is less than that for Investment B.

[Figure: histograms of Return on investment A and Return on investment B.]
Interpretation: Both histograms are slightly positively
skewed. There is a possibility of large returns.
Conclusion: two Histograms

Example 2.2: Conclusion
It seems that investment A is better, because:
Its expected return is only slightly below that of
investment B.
The risk from investing in A is smaller.
The possibility of having a high rate of return exists
for both investments.
Another example: comparing two histograms
Example 2.3: Comparing students'
performance
Students' performance in two statistics classes was
compared.
The two classes differed in their teaching emphasis:
Class A: mathematical analysis and development of
theory.
Class B: applications and computer-based
analysis.
The final mark for each student in each course was
recorded.
Draw histograms and interpret the results.
Comparing two histograms

[Figure: two histograms of final marks, Marks (Manual) and Marks (Computer); horizontal axis: marks from 50 to 100; vertical axis: Frequency (0 to 40).]
The mathematical emphasis
creates two groups, and a
larger spread.
Stem & Leaf Display

A stem & leaf display retains information about individual observations that
would normally be lost in the creation of a histogram.
Split each observation into two parts, a stem and a leaf.
For example, take the observation value 48.19.
There are several ways to split it up:
We could split it at the decimal point (and round): Stem 48, Leaf 2.
Or split it at the tens position (still rounding): Stem 4, Leaf 8.
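The tens-position split can be sketched as below. The observation 48.19 comes from the text, and the small sample is the list of nine bills shown earlier; the function names and the simple rounding rule (valid for non-negative values only) are assumptions made for this illustration.

```python
from collections import defaultdict


def split_observation(value):
    """Round to the nearest whole number, then split at the tens position."""
    rounded = int(value + 0.5)  # simple rounding, assumes value >= 0
    return rounded // 10, rounded % 10


def stem_and_leaf(values):
    """Map each stem to its sorted list of leaves."""
    display = defaultdict(list)
    for value in values:
        stem, leaf = split_observation(value)
        display[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(display.items())}


print(split_observation(48.19))

# The nine bills listed earlier in the text.
bills = [42.19, 38.45, 29.23, 89.35, 118.04, 110.46, 0.00, 72.88, 83.05]
display = stem_and_leaf(bills)
for stem, leaves in display.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
```

Stems with no observations are simply omitted here; a full display would normally list every stem so that gaps in the data stay visible.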
Stem & Leaf Display
Continue this process for all the observations. Then
use the stems for the classes; each leaf
becomes part of the histogram (based on
Example 2.4 data) as follows:
Stem | Leaf
 0 | 0000000000111112222223333345555556666666778888999999
 1 | 000001111233333334455555667889999
 2 | 0000111112344666778999
 3 | 001335589
 4 | 124445589
 5 | 33566
 6 | 3458
 7 | 022224556789
 8 | 334457889999
 9 | 00112222233344555999
10 | 001344446699
11 | 124557889
The length of each line represents the frequency
of the class defined by the stem.
We still have access to the original data points' values!
Cumulative Relative Frequencies:
first class: .355
next class: .355 + .185 = .540
:
:
last class: .930 + .070 = 1.00
Ogives
Ogives are cumulative relative frequency
distributions.
Example 2.1 - continued
Cumulative relative frequency for telephone bills
Class      Frequency   Cumulative frequency   Cumulative relative frequency
0-15       71          71                     71/200 = .355
15-30      37          108                    108/200 = .540
30-45      13          121                    121/200 = .605
45-60      9           130                    130/200 = .650
60-75      10          140                    140/200 = .700
75-90      18          158                    158/200 = .790
90-105     28          186                    186/200 = .930
105-120    14          200                    200/200 = 1.000
[Figure: ogive for the telephone bills; horizontal axis: Bills (15, 30, ..., 120); plotted cumulative relative frequencies: .355, .540, .605, .650, .700, .790, .930, 1.000.]
Constructing an ogive
1) Calculate the relative frequencies.
2) Calculate the cumulative relative frequencies by
adding the current class relative frequency to the
previous class cumulative relative frequency.
3) Graph the cumulative relative frequencies
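The three steps above can be sketched with the telephone bills frequencies. The sketch assumes the last class runs from 105 to 120, consistent with a class width of 15, and it stops at producing the points to plot rather than drawing the graph; `ogive_points` is an illustrative name.

```python
from itertools import accumulate

# Class upper limits and frequencies from the telephone bills example.
# Assumption: the last class is 105-120, matching the class width of 15.
upper_limits = [15, 30, 45, 60, 75, 90, 105, 120]
frequencies = [71, 37, 13, 9, 10, 18, 28, 14]


def ogive_points(limits, counts):
    """Return (upper limit, cumulative relative frequency) pairs."""
    total = sum(counts)
    relative = [count / total for count in counts]  # step 1
    cumulative = list(accumulate(relative))         # step 2
    return list(zip(limits, cumulative))            # step 3: points to graph


points = ogive_points(upper_limits, frequencies)
for limit, cum in points:
    print(f"{limit:>4}  {cum:.3f}")
```

Plotting these points against the class upper limits and joining them with straight lines gives the ogive.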
Why draw an ogive?
The ogive can be used to answer questions like:
What telephone bill value is at the 50th percentile?
Just under $30 (about $27): the cumulative relative frequency rises from .355 at $15 to .540 at $30.
Ogive App.
Another App.
Pareto Rule (20/80)
2.5 Describing relationships between two variables (bivariate data)
First, determine the type of variables.
To compare two nominal variables,
use contingency tables with bar/pie
charts.
To compare two interval variables, use
scatter diagrams.
A Contingency Table for two Nominal variables
A sample of newspaper readers was asked to report which newspaper they read:
Globe and Mail (1), Post (2), Star (3), or Sun (4), and to indicate whether they
were a blue-collar worker (1), a white-collar worker (2), or a professional (3).
Note how each reader is cross-classified according to both
variables.
Contingency Table
Interpretation: The relative frequencies in the columns 2 & 3 are similar,
but there are large differences between columns 1 and 2 and between
columns 1 and 3.
This tells us that blue collar workers tend to read different newspapers
from both white collar workers and professionals and that white collar
and professionals are quite similar in their newspaper choice.
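A contingency table and its column relative frequencies can be sketched as follows. The codes for newspapers and occupations follow the text, but the ten responses are hypothetical, since the survey data are not reproduced in these notes; the function names are illustrative.

```python
from collections import Counter

NEWSPAPERS = {1: "Globe and Mail", 2: "Post", 3: "Star", 4: "Sun"}
OCCUPATIONS = {1: "Blue collar", 2: "White collar", 3: "Professional"}

# Hypothetical (newspaper, occupation) responses, for illustration only.
responses = [
    (1, 3), (1, 3), (2, 2), (3, 1), (4, 1),
    (4, 1), (2, 3), (1, 2), (3, 2), (4, 2),
]


def contingency_table(pairs):
    """Count the responses in each (newspaper, occupation) cell."""
    return Counter(pairs)


def column_relative_frequencies(pairs):
    """Within each occupation, the proportion reading each newspaper."""
    cells = Counter(pairs)
    column_totals = Counter(occupation for _, occupation in pairs)
    return {
        (paper, occupation): count / column_totals[occupation]
        for (paper, occupation), count in cells.items()
    }


table = contingency_table(responses)
shares = column_relative_frequencies(responses)
for occupation, name in OCCUPATIONS.items():
    row = {
        NEWSPAPERS[paper]: round(shares.get((paper, occupation), 0.0), 2)
        for paper in NEWSPAPERS
    }
    print(name, row)
```

Comparing the columns of relative frequencies, rather than the raw counts, is what allows occupations with different sample sizes to be compared.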
Graphing a contingency table
Professionals tend to read the Globe & Mail more than
twice as often as the Star or Sun.
The Relationship Between Two Interval Variables
A scatter diagram plots one variable against
another.
The independent variable is labeled X, while
the other, dependent variable is labeled Y.
For example: A real estate agent
wants to study the relationship
between house price and house size
X variable: Size
Y variable: Price
Size   Price
23     315
24     229
26     335
27     261
..     ..
Scatter Diagram
It appears that there is in fact a relationship: the greater
the house size, the greater the selling price.
Typical Patterns of Scatter Diagrams

Positive linear relationship
No relationship
Negative nonlinear relationship

This is a weak linear relationship.
A non linear relationship seems to
fit the data better.
Negative linear relationship
Nonlinear (concave) relationship
Line Chart
Observations measured at the
same point in time are called
cross-sectional data.
Observations measured at
successive points in time are
called time-series data.
Time-series data are graphed on a
line chart, which plots the value
of the variable on the vertical
axis against the time periods on
the horizontal axis.
For example, plot the total amounts of U.S. income
tax for the years 1987 to 2002:
Line Chart
From '87 to '92, the tax was fairly flat. Starting in '93, there was a rapid
increase in taxes until 2001. Finally, there was a downturn in 2002.
Summary
                                     Interval Data                                Nominal Data
Single Set of Data                   Histogram, Ogive, or Stem-and-Leaf Display   Frequency and Relative Frequency Tables, Bar and Pie Charts
Relationship Between Two Variables   Scatter Diagram                              Contingency Table, Bar Charts
