Data Type
Qualitative: Summary Table, Bar Chart, Pie Chart
Quantitative: Frequency Distribution, Histogram
Displaying Qualitative Data
Summary Table or Frequency Table
List categories or classes in a column and the total count (or % or
both) of each category in another column
Example: Grades of 20 Students
A, B, A, B, C, A, D, C, C, A, D, B, A, A, D, C, A, B, D, C
Grades   # of Students   % of Students
A        7               35%
B        4               20%
C        5               25%
D        4               20%
Summary/Frequency Table of grades of 20 students
1. What % of students earned at least a C grade?
2. What is the most frequently observed grade?
Ans. 1. 80%; 2. A
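The summary table and the two answers above can be checked with a short script. This is a minimal Python sketch, assuming the 20 grades are stored in a list; the names `grades`, `counts`, `percents`, `at_least_c`, and `most_common_grade` are illustrative and not part of the original notes.

```python
from collections import Counter

# Grades of the 20 students from the example above.
grades = ["A", "B", "A", "B", "C", "A", "D", "C", "C", "A",
          "D", "B", "A", "A", "D", "C", "A", "B", "D", "C"]

counts = Counter(grades)                       # frequency of each category
n = len(grades)
percents = {g: 100 * c / n for g, c in counts.items()}

# Print the summary (frequency) table, one category per row.
for g in sorted(counts):
    print(f"{g}  {counts[g]:2d}  {percents[g]:.0f}%")

# Question 1: percentage of students who earned at least a C (A, B, or C).
at_least_c = sum(percents[g] for g in "ABC")

# Question 2: the most frequently observed grade.
most_common_grade = counts.most_common(1)[0][0]
```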
Bar graph
Bar Graph for Qualitative Data
categories are represented by bars where the height of
each bar is the corresponding frequency or percentage
Bars have the same width, and leave equal spaces between
successive bars
[Figure: bar chart "Distribution of Grades", with Grades (A, B, C, D) on one axis and Number of Students (0 to 8) on the other; bar values: A = 7, B = 4, C = 5, D = 4.]
Pie Chart
Pie Chart for Qualitative Data
categories are represented by slices of a pie
where the size of each slice is proportional to
percentage (35% A, 20% B, 25% C, 20% D) of
class frequency
[Figure: pie chart "Distribution of Grades" with slices A 35%, B 20%, C 25%, D 20%.]
Sample Question
All of the following are characteristics of bar graphs
except
(a) The bars of the graph should be of the same width
(b) Bar graphs are used to depict qualitative or
categorical data
(c) There should be no spaces between bars of the
bar graph
Ans (c)
Sample Question
The following pie chart shows the distribution of students
in a Math course (10% Freshmen, 46% Sophomores,
30% Juniors, 14% Seniors).
What percentage of the class took the course prior
to reaching their senior year?
(a) 44% (b) 86% (c) 54% (d) 14%
Ans. (b)
Displaying Quantitative Data
Ungrouped Frequency Table: Organize data in a
table with two columns
Column 1 (Variable Values)
distinct values (in order of magnitude) of the variable
under consideration
Column 2 (frequency)
the number of times each value is repeated (a third
column that shows percent or proportion of each
class frequency is also recommended).
Example: Number of courses taken by 30 students
Distribution of the # of courses of 30 students
Number of Courses   Number of Students
3                   4
4                   18
5                   6
6                   2
How many students enrolled in at least four courses?
Ans. 26
What percentage of students enrolled in four courses?
Ans. 60%
What percentage of students enrolled in at most five
courses?
Ans. (28/30) × 100% ≈ 93.3%
Data: 3, 4, 5, 6, 3, 4, 5, 6, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
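The ungrouped frequency table and the three answers can be reproduced from the raw data. The following is a small Python sketch under the assumption that the 30 values are typed in as a list; the variable names are illustrative only.

```python
from collections import Counter

# Number of courses taken by each of the 30 students (data from the example).
courses = [3, 4, 5, 6, 3, 4, 5, 6, 3, 3, 4, 4, 4, 4, 4,
           4, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4]

freq = Counter(courses)
n = len(courses)

# Ungrouped frequency table: distinct values in order of magnitude,
# with the frequency and the percent of each value.
for value in sorted(freq):
    print(f"{value}  {freq[value]:2d}  {100 * freq[value] / n:.1f}%")

at_least_four = sum(c for v, c in freq.items() if v >= 4)
pct_four = 100 * freq[4] / n
pct_at_most_five = 100 * sum(c for v, c in freq.items() if v <= 5) / n
```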
Grouped Frequency Distribution
Organize data in tabular form with two columns
Column 1: class intervals with class boundaries
divide the range of data into several (about 5 to 15)
intervals preferably with equal width in such a way that
no data value belongs to two different intervals
Column 2: frequency
the number of data values that fall in the corresponding
class interval in column 1
Example: Grouped Frequency Distribution
Raw Data: 24.0, 26.5, 24.5, 35.6, 27.4, 27.9, 30.3, 37.4,
32.3, 38.7, 14.0, 16.8, 19.1, 20.3, 22.7
Class interval (boundaries)   Midpoint   Frequency
13.99 - 18.99                 16.49      2
18.99 - 23.99                 21.49      3
23.99 - 28.99                 26.49      5
28.99 - 33.99                 31.49      2
33.99 - 38.99                 36.49      3
Midpoint = (Upper Boundary + Lower Boundary) / 2
Width = Upper Boundary - Lower Boundary = 5
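The grouped table for this example can be rebuilt by counting how many raw values fall between each pair of class boundaries. This is a hedged Python sketch: the starting boundary 13.99, the width 5, and the five classes are read off the example, while the rule "lower boundary < x <= upper boundary" is an assumption (no data value lies on a boundary here, so the choice does not affect the counts).

```python
# Raw data from the example above.
data = [24.0, 26.5, 24.5, 35.6, 27.4, 27.9, 30.3, 37.4,
        32.3, 38.7, 14.0, 16.8, 19.1, 20.3, 22.7]

lower, width, k = 13.99, 5.0, 5   # first lower boundary, class width, number of classes

table = []
for j in range(k):
    lo = lower + j * width
    hi = lo + width
    # Assumption: a value belongs to the class when lo < x <= hi.
    freq = sum(1 for x in data if lo < x <= hi)
    midpoint = (lo + hi) / 2
    table.append((round(lo, 2), round(hi, 2), round(midpoint, 2), freq))

for lo, hi, mid, freq in table:
    print(f"{lo:.2f} - {hi:.2f}   {mid:.2f}   {freq}")
```

Because every value falls in exactly one class, the frequencies add up to the number of observations (15).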
Histogram for Quantitative Data
Histograms are graphs of the
frequency or relative frequency of
a variable.
Class intervals (with boundaries)
make up the horizontal axis;
The frequencies or relative
frequencies are displayed on the
vertical axis.
[Figure: histogram with class boundaries on the horizontal axis and frequency (0 to 5) on the vertical axis.]
McClave, Statistics, 11th ed. Chapter 2: Methods for Describing Sets of Data
Histogram of Previous Example
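[Figure: histogram of the grouped data, with class boundaries 13.99, 18.99, 23.99, 28.99, 33.99, 38.99 on the horizontal axis and frequency on the vertical axis; there is no space between bars.]
Class           Freq.
13.99-18.99     2
18.99-23.99     3
23.99-28.99     5
28.99-33.99     2
33.99-38.99     3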
Histogram: Study the shape of the distribution of quantitative data
Symmetric: e.g., heights in inches
Skewed to Left: e.g., test scores of an easy test
Skewed to Right: e.g., income
[Figure: three histograms, one per shape, with vertical axes labeled # of Students.]
What do you see in a Histogram?
Center or location of data
Spread of data
Shape of the distribution of data
skewed, symmetric
Presence of outliers, if any
Presence of multiple peaks, if any, in the data.
Sample Question
Parking times (to the nearest minute) are
recorded for a group of 2000 students. Which
of the following graphs would be most
appropriate to display parking times?
(a) Bar chart
(b) Histogram
(c) Pie chart
Ans (b)
Summary of Tables/Graphs/Charts for one variable
Frequency Tables, Bar and Pie Charts for categorical data
These tables and charts show relative differences among categories;
a pie chart shows percentages, and a bar chart shows the frequency or
relative frequency of each category
Grouped and Ungrouped Frequency Tables and Histograms
Measurement data are often summarized in grouped frequency tables,
count data in ungrouped frequency tables, and both are displayed in
histograms
A histogram shows the shape of the distribution of the data
The choice of intervals greatly affects the shape and may lead to misleading
conclusions
Bivariate Data
Data on two categorical variables from each unit
e.g., Class Rank (FR SO JR SR) and Employment Status
(Full-time, Part-time, Unemployed) of students
Data on one categorical and one quantitative variable
from each unit
e.g., Class Rank and Age of students
Data on two quantitative variables from each unit
e.g., Height and Weight of students
Presenting Bivariate Data
Cross-Table for two categorical variables from each unit
collect class rank (FR, SO, JR, SR) and employment status (full-time,
part-time, unemployed) of 20 students and summarize in a cross-table
                        Employment Status
Class Rank   Full-time   Part-time   Unemployed   Total
Fr           1           2           1            4
So           1           2           1            4
Jr           3           1           2            6
Sr           1           3           2            6
Total        6           8           6            20
What % of the survey is Sr?
Ans. 30%
What proportion of Sr is unemployed?
Ans. 1/3
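The two answers differ in their denominators: the first divides by the grand total, the second by the Sr row total. A minimal Python sketch of that distinction follows, assuming the cell counts of the cross-table are entered as a nested dictionary; the names `table`, `row_totals`, `pct_seniors`, and `prop_sr_unemployed` are illustrative.

```python
from fractions import Fraction

# Cell counts from the cross-table above: rows are class ranks,
# columns are employment statuses.
table = {
    "Fr": {"Full-time": 1, "Part-time": 2, "Unemployed": 1},
    "So": {"Full-time": 1, "Part-time": 2, "Unemployed": 1},
    "Jr": {"Full-time": 3, "Part-time": 1, "Unemployed": 2},
    "Sr": {"Full-time": 1, "Part-time": 3, "Unemployed": 2},
}

row_totals = {rank: sum(row.values()) for rank, row in table.items()}
grand_total = sum(row_totals.values())

# Percentage of all surveyed students who are seniors (divide by grand total).
pct_seniors = 100 * row_totals["Sr"] / grand_total

# Proportion of seniors who are unemployed (divide by the Sr row total).
prop_sr_unemployed = Fraction(table["Sr"]["Unemployed"], row_totals["Sr"])
```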
Data Set
Class Rank   Employment Status
Fr           Full-time
Sr           Part-time
Jr           Unemployed
...          ...
Example
In a factory, three machines produced 1000 items.
machine A produces 70% of the items; 1 of every 20 is defective
machine B produces 10%; 1 of every 100 is defective
machine C produces 20%; 2% are defective
Display the information in a cross table.
                     Quality of Items
Machine   Defective   Non-Defective   Total
A         35          665             700
B         1           99              100
C         4           196             200
Total     40          960             1000
How many items are defective?
Ans. 40
What % of defective items is produced by A?
Ans. 87.5%
What % of items produced by A is defective?
Ans. 5%
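The cell counts follow from multiplying each machine's share of the 1000 items by its defect rate. The sketch below works through that arithmetic in Python; the dictionary layout and variable names are assumptions made for illustration.

```python
total_items = 1000

# (share of production, defect rate) for each machine, as stated in the example.
machines = {
    "A": (0.70, 1 / 20),
    "B": (0.10, 1 / 100),
    "C": (0.20, 0.02),
}

cross_table = {}
for name, (share, defect_rate) in machines.items():
    produced = round(total_items * share)        # row total
    defective = round(produced * defect_rate)    # defective cell
    cross_table[name] = {
        "defective": defective,
        "non_defective": produced - defective,
        "total": produced,
    }

total_defective = sum(row["defective"] for row in cross_table.values())

# Share of all defective items that came from machine A.
pct_defective_from_a = 100 * cross_table["A"]["defective"] / total_defective

# Share of machine A's own output that is defective.
pct_of_a_defective = 100 * cross_table["A"]["defective"] / cross_table["A"]["total"]
```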
Presenting Bivariate Data
One categorical and one quantitative variable
Distribution of Age by Class Rank
Class Rank   Median Age
Fr           19
So           20
Jr           22
Sr           25
Presenting Bivariate Data
Two quantitative variables from each unit
Use scatter plot to study the relationship
between two variables
[Figure: three scatter plots.]
Positive correlation: Weight versus Height
Negative correlation: Heart Failure Rate versus Exercise time (in minutes)
No correlation: GPA versus Toe Size (mm)
Numerical Measures of Quantitative Data
Properties of Quantitative Data
Location/Central Tendency: Mean, Median, Mode
Dispersion/Variation/Spread: Range, Standard deviation, Variance
Relative Position: Quartile, Percentile, Z-score
Summation Notation
Data: $x_1, x_2, \ldots, x_n$
Add all values: $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$
Square each value, then add: $\sum_{i=1}^{n} x_i^2 = x_1^2 + x_2^2 + x_3^2 + \cdots + x_n^2$
Add all values, then square: $\left( \sum_{i=1}^{n} x_i \right)^2 = (x_1 + x_2 + x_3 + \cdots + x_n)^2$
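The three sums are easy to confuse, so a numeric check may help. This short Python sketch uses four illustrative values chosen for this example (not taken from the slide) and shows that "square each value, then add" and "add all values, then square" generally give different results.

```python
# Illustrative data values x_1, ..., x_n (chosen for this sketch).
x = [1, 2, 3, 6]

sum_x = sum(x)                            # add all values: 1 + 2 + 3 + 6
sum_x_squared = sum(v ** 2 for v in x)    # square each value, then add: 1 + 4 + 9 + 36
square_of_sum = sum(x) ** 2               # add all values, then square: 12 ** 2

print(sum_x, sum_x_squared, square_of_sum)
```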
Measures of Location/
Central Tendency
Look for value(s) around which the data tend to
cluster. Three measures of location or center are
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Median: middle number when data values are arranged from low to high or high to low
Mode: most frequently observed value(s)
Mean
Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Data: $x_1 = 1$, $x_2 = 2$, $x_3 = 3$, $x_4 = 6$
mean = 12/4 = 3.0
Data: $x_1 = -1$, $x_2 = 0$, $x_3 = 3$, $x_4 = 6$
mean = 8/4 = 2.0
Note: Mean is affected by extreme values
Population mean is denoted by $\mu$ (mu)
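To see how strongly one extreme value can move the mean, the sketch below compares the first data set from this slide with a hypothetical variant in which the largest value is replaced by 600. The variant is an assumption introduced only for illustration.

```python
from statistics import mean

data = [1, 2, 3, 6]              # first data set from the slide
with_extreme = [1, 2, 3, 600]    # hypothetical: largest value replaced by an extreme one

mean_original = mean(data)           # 12 / 4
mean_extreme = mean(with_extreme)    # 606 / 4

print(mean_original, mean_extreme)
```

Changing a single observation moves the mean from 3 to 151.5, far from where most of the values lie.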
Median
Median is the middle number in the ordered data
(from low to high or high to low)
For even number of observations, median is the mean of
the two middle values.
Median is not affected by extreme values
Example
Data: 1, 5000, 3; Ordered data: 1, 3, 5000;
Median = 3
Example
Data: 1, 50, 3; Ordered data: 1, 3, 50;
Median = 3
Example
Data:1, 50, 3, -4; Ordered data: -4, 1, 3, 50
Median = (1 + 3)/2 = 2
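The rule for odd and even numbers of observations can be written as a small function. This is a minimal sketch; the function name `median` is chosen here for illustration (Python's standard library also provides `statistics.median`).

```python
def median(values):
    """Middle value of the ordered data; for an even number of
    observations, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# The three examples from the slide.
print(median([1, 5000, 3]))
print(median([1, 50, 3]))
print(median([1, 50, 3, -4]))
```

The first two examples give the same median even though 5000 is far larger than 50, which is the sense in which the median is not affected by extreme values.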
Mode
The most frequently observed
value(s). A data set may have
a unique mode (i.e., one value
occurs most often in the data set)
more than one mode
no mode
Example
Data: 1, 5, 3, 3
mode= 3
Example
Data: 1, 5, 3, 3, 5
mode = 3, 5
Example
Data: 1, 5, 3, 3, 5, 1
mode = none
Mode can also be used for categorical data, where you look for the category
with the highest frequency; e.g., among all types of cancer, the one that
occurs most often is the modal cancer
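The three cases (unique mode, more than one mode, no mode) can be sketched in Python. The function below follows the convention used in the examples on this slide, where a data set in which every distinct value occurs equally often is reported as having no mode; that convention and the name `modes` are assumptions of this sketch, and library functions such as `statistics.multimode` behave differently in that case.

```python
from collections import Counter

def modes(values):
    """Return the list of modes; an empty list means 'no mode'.

    Assumption (following the slide's examples): when two or more distinct
    values all occur equally often, the data set has no mode.
    """
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 5, 3, 3]))
print(modes([1, 5, 3, 3, 5]))
print(modes([1, 5, 3, 3, 5, 1]))
```

The same function works for categorical data, for example a list of grade letters.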
Three Measures of Central Tendency
(Mean, Median, Mode)
Mode is the x-value under the peak in each graph
[Figure: three distribution curves, each with the mode marked under its peak.]
Perfectly symmetric data set: Mean = Median
Few extremely high values in the data set: Mean > Median (rightward skewness)
Few extremely low values in the data set: Mean < Median (leftward skewness)
Sample question
For the scores (x)
1, 4, 3, -3, 0
Find $\sum x^2$
(a) 25 (b) 17 (c) 35 (d) none of the above
Ans. (c)
Sample Questions
Find the mode of the data set 1, 1, 5, 5.
(a) 1
(b) 5
(c) 1, 5
(d) 0
(e) no mode
Ans. (e) No mode
True or False
If a data set has no mode, the mode is equal to zero
Ans. False
In the data set 1, 1, 1, 4, 4, 4, both 1 and 4 appear 3 times. So the mode is 3.
Ans. False
In a distribution that is skewed to the right, mean is greater than the median.
Ans. True
STA-2023 students are expected to study at least 45 minutes every day except Friday
Ans. True
Sample Question
Find the median of the dataset: 1, 4, 0, 5, 8.
(a) 0
(b) 5
(c) 4
Ans. (c)
Find the median of the dataset: 1, 4, 2, 10, 8, 6.
(a) 2
(b) 10
(c) 6
(d) 5
Ans. (d)
Numerical Measures of Variability
Measures of variability give us an idea of how
spread out the data are around the center
Data Set I: 10 20 30
Data Set II: 15 20 25
The two sets have the same location (center 20), but the spread of data values
in set I is greater than that in set II
Numerical Measures of Variability
Range = Highest value - Lowest value = H - L
Sample Variance ($s^2$): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
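Both measures can be applied to Data Set I (10, 20, 30) and Data Set II (15, 20, 25) from the preceding slide. The following is a minimal Python sketch, assuming the usual sample variance with divisor n - 1; the function names `value_range` and `sample_variance` are illustrative.

```python
def value_range(values):
    """Range = highest value - lowest value."""
    return max(values) - min(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

set_1 = [10, 20, 30]
set_2 = [15, 20, 25]

for name, data in (("Set I", set_1), ("Set II", set_2)):
    print(name, value_range(data), sample_variance(data))
```

Both sets are centered at 20, but Set I has the larger range and the larger variance, matching the observation that its values are more spread out.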