sqqs1013 chp02

QQS1013 Elementary Statistics
DESCRIPTIVE STATISTICS
2.1 INTRODUCTION
Raw data - Data recorded in the sequence in which they are collected, before they are processed or ranked.
Array data - Raw data arranged in ascending or descending order.
Example 1
Here is a list of questions asked in a large statistics class and the "raw data" given by one
of the students:
1. What is your sex (m=male, f=female)?

Answer : m
2. How many hours did you sleep last night?

Answer: 5 hours
3. Randomly pick a letter – S or Q.

Answer: S
4. What is your height in inches?

Answer: 67 inches
5. What’s the fastest you’ve ever driven a car (mph)?

Answer: 110 mph
Example 2
[Figure: two sample data sets, one of quantitative raw data and one of qualitative raw data]
These data are also called ungrouped data.
Chapter 2: Descriptive Statistics 1

2.2 ORGANIZING AND GRAPHING QUALITATIVE DATA

2.2.1 Frequency Distributions Table
A frequency distribution for qualitative data lists all categories and the
number of elements that belong to each of the categories.
It shows how the frequencies are distributed over the various categories.
It is also called a frequency distribution table or simply a frequency table.
e.g.: The number of students who belong to a certain category is called
the frequency of that category.
2.2.2 Relative Frequency and Percentage Distribution
• A relative frequency distribution is a listing of all categories along with their
relative frequencies (given as proportions or percentages).
• It is commonplace to give the frequency and relative frequency distributions
together.
• Calculating the relative frequency and percentage of a category:
FORMULA
$$\text{Relative frequency of a category} = \frac{\text{Frequency of that category}}{\text{Sum of all frequencies}}$$
$$\text{Percentage (\%)} = (\text{Relative frequency}) \times 100$$

Example 3
A sample of UUM staff-owned vehicles produced by Proton was identified and the
make of each noted. The resulting sample follows (W = Wira, Is = Iswara, Wj =
Waja, St = Satria, P = Perdana, Sv = Savvy):
W   W   P   Is  Is  P   Is  W   St  Wj
Is  W   W   Wj  Is  W   W   Is  W   Wj
Wj  Is  Wj  Sv  W   W   W   Wj  St  W
Wj  Sv  W   Is  P   Sv  Wj  Wj  W   W
St  W   W   W   W   St  St  P   Wj  Sv
Construct a frequency distribution table for these data with their relative frequencies
and percentages.
Solution:
Category   Frequency   Relative Frequency   Percentage (%)
Wira       19          0.38                 38
Iswara     8           0.16                 16
Perdana    4           0.08                 8
Waja       10          0.20                 20
Satria     5           0.10                 10
Savvy      4           0.08                 8
Total      50          1.00                 100
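The table above can also be tallied programmatically. The following is a minimal Python sketch, not part of the original notes; the variable names (`sample`, `freq`, `rel_freq`, `percent`) are chosen here for illustration only.

```python
from collections import Counter

# Sample from Example 3 (W = Wira, Is = Iswara, Wj = Waja,
# St = Satria, P = Perdana, Sv = Savvy).
sample = """
W W P Is Is P Is W St Wj
Is W W Wj Is W W Is W Wj
Wj Is Wj Sv W W W Wj St W
Wj Sv W Is P Sv Wj Wj W W
St W W W W St St P Wj Sv
""".split()

freq = Counter(sample)                      # frequency of each category
n = sum(freq.values())                      # sum of all frequencies
rel_freq = {k: f / n for k, f in freq.items()}
percent = {k: 100 * r for k, r in rel_freq.items()}

# Print one row per category, most frequent first.
for k, f in freq.most_common():
    print(f"{k:<3} {f:>3} {rel_freq[k]:.2f} {percent[k]:.0f}")
```

The relative frequencies always sum to 1 and the percentages to 100, which is a quick check on any hand-built table.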
2.2.3 Graphical Presentation of Qualitative Data

a) Bar Graphs
A graph made of bars whose heights represent the frequencies of the respective
categories.
• Such a graph is most helpful when you have many categories to represent.
• Notice that a gap is inserted between each of the bars.
Types of bar chart:
• simple/vertical bar chart
• horizontal bar chart
• component bar chart
• multiple bar chart
• Simple/Vertical Bar Chart
To construct a vertical bar chart, mark the various categories on the horizontal
axis and mark the frequencies on the vertical axis.
• Horizontal Bar Chart
To construct a horizontal bar chart, mark the various categories on the vertical
axis and mark the frequencies on the horizontal axis.
[Figure: horizontal bar chart "UUM Staff-owned Vehicles Produced by Proton"; vertical axis: Types of Vehicle, horizontal axis: Frequency]
• Component Bar Chart
o To construct a component bar chart, all categories are placed in one bar, and
each bar is divided into components.
o The height of each component should tally with the frequency it represents.
Example 4
Suppose we want to illustrate the information below, representing the number of
people participating in the activities offered by an outdoor pursuits centre during
June of three consecutive years.
           2004   2005   2006
Climbing     21     34     36
Caving       10     12     21
Walking      75     85    100
Sailing      36     36     40
Total       142    167    197
Solution:
[Figure: component bar chart "Activities Breakdown (June)"; horizontal axis: Year (2004, 2005, 2006), vertical axis: Number of participants; components: Climbing, Caving, Walking, Sailing]
• Multiple Bar Chart
o To construct a multiple bar chart, the bars representing the categories are
gathered in groups.
o The height of each bar represents the frequency of its category.
o Useful for making comparisons (two or more values).
[Figure: multiple bar chart "Activities Breakdown (June)"; horizontal axis: Year (2004, 2005, 2006), vertical axis: Number of participants; bars: Climbing, Caving, Walking, Sailing]
o The bar graphs for relative frequency and percentage distributions can
be drawn simply by marking the relative frequencies or percentages,
instead of the frequencies, on the frequency axis.
b) Pie Chart
A circle divided into portions that represent the relative frequencies or
percentages of a population or a sample belonging to different
categories.
• An alternative to the bar chart, useful for summarizing a single
categorical variable if there are not too many categories.
• The chart makes it easy to compare the relative sizes of each
class/category.
The whole pie represents the total sample or population. The pie is divided
into different portions that represent the different categories.
To construct a pie chart, we multiply 360° by the relative frequency of each
category to obtain the degree measure or size of the angle for the
corresponding category.
Example 5
Movie Genre       Frequency   Relative Frequency   Angle Size
Comedy            54          0.27                 360 × 0.27 = 97.2°
Action            36          0.18                 360 × 0.18 = 64.8°
Romance           28          0.14                 360 × 0.14 = 50.4°
Drama             28          0.14                 360 × 0.14 = 50.4°
Horror            22          0.11                 360 × 0.11 = 39.6°
Foreign           16          0.08                 360 × 0.08 = 28.8°
Science Fiction   16          0.08                 360 × 0.08 = 28.8°
Total             200         1.00                 360°
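The angle sizes in Example 5 follow directly from the rule "360° times the relative frequency". A short Python sketch of that calculation is given below; the dictionary name `genres` is an illustrative choice, not something defined in the notes.

```python
# Frequencies from Example 5.
genres = {"Comedy": 54, "Action": 36, "Romance": 28, "Drama": 28,
          "Horror": 22, "Foreign": 16, "Science Fiction": 16}

total = sum(genres.values())

# Angle of each slice = 360 degrees x relative frequency.
angles = {g: 360 * f / total for g, f in genres.items()}

for g, a in angles.items():
    print(f"{g:<16} {a:5.1f} degrees")
```

The angles should add up to 360°, which gives a simple check before drawing the chart.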

c) Line Graph/Time Series Graph
A graph that represents data that occur over a specific period of time.
• Line graphs are widely used because their visual characteristics reveal
data trends clearly and they are easy to create.
When analyzing the graph, look for a trend or pattern that occurs over the
time period.
• For example, is the line ascending (indicating an increase over time) or
descending (indicating a decrease over time)?
• Another thing to look for is the slope, or steepness, of the line. A line
that is steep over a specific time period indicates a rapid increase or
decrease over that period.
Two data sets can be compared on the same graph (called a compound
time series graph) if two lines are used.
Data collected on the same element for the same variable at different points
in time or for different periods of time are called time series data.
• A line graph is a visual comparison of how two variables—shown on the
x- and y-axes—are related or vary with each other. It shows related
information by drawing a continuous line between all the points on a
grid.
• Line graphs compare two variables: one is plotted along the x-axis
(horizontal) and the other along the y-axis (vertical).
• The y-axis in a line graph usually indicates quantity (e.g., RM, number
of sales, litres) or percentage, while the horizontal x-axis often measures
units of time. As a result, the line graph is often viewed as a time series
graph.
Example 6
A transit manager wishes to use the following data for a presentation showing
how Port Authority Transit ridership has changed over the years. Draw a time
series graph for the data and summarize the findings.
Year   Ridership (in millions)
1990   88.0
1991   85.0
1992   75.7
1993   76.6
1994   75.4
Solution:
[Figure: time series graph; horizontal axis: Year (1990 to 1994), vertical axis: Ridership (in millions)]
The graph shows a decline in ridership through 1992, followed by a leveling off in
1993 and 1994.
EXERCISE 1

1. The following data show the method of payment by 16 customers in a supermarket
checkout line (C = cash, CK = check, CC = credit card, D = debit and O =
other).
C   CK  CK  C   CC  D   O   C
CK  CC  D   CC  C   CK  CK  CC
a. Construct a frequency distribution table.

b. Calculate the relative frequencies and percentages for all categories.
c. Draw a pie chart for the percentage distribution.
2. The frequency distribution table below represents the sales of certain products at ZeeZee
Company. For each product, the frequency of sales in a certain period is given.
Find the relative frequency and the percentage of each product. Then,
construct a pie chart using the obtained information.
Type of Product   Frequency   Relative Frequency   Percentage   Angle Size
A                 13
B                 12
C                 5
D                 9
E                 11
3. Draw a time series graph to represent the data for the number of worldwide airline
fatalities for the given years.
Year                1990   1991   1992   1993   1994   1995   1996
No. of fatalities    440    510    990    801    732    557   1132
4. A questionnaire about how people get news resulted in the following information
from 25 respondents (N = newspaper, T = television, R = radio, M = magazine).
N N R T T
R N T M R
M M N R N
T R M N M
T R R N N
a. Construct a frequency distribution for the data.

b. Construct a bar graph for the data.
5. The information below shows the export and import trade (in million RM) for four
months of a certain year. Present these data
in a component bar graph.
Month       Export   Import
September   28       20
October     30       28
November    32       17
December    24       14
6. The following information represents the maximum rainfall in
millimeters (mm) in each state in Malaysia. You are asked to help a
meteorologist in your area to analyze it. Based on your knowledge,
present this information using the most appropriate chart and give your
comments.
State                              Quantity (mm)
Perlis                             435
Kedah                              512
Pulau Pinang                       163
Perak                              721
Selangor                           664
Wilayah Persekutuan Kuala Lumpur   1003
Negeri Sembilan                    390
Melaka                             223
Johor                              876
Pahang                             1050
Terengganu                         1255
Kelantan                           986
Sarawak                            878
Sabah                              456
2.3 ORGANIZING AND GRAPHING QUANTITATIVE DATA
2.3.1 Stem-and-Leaf Display
In a stem-and-leaf display of quantitative data, each value is divided into two
portions: a stem and a leaf. The leaves for each stem are then shown
separately in a display.
• It gives information about the pattern of the data.
• It shows which values are repeated frequently.
Example 7
25 12 9 10 5 12 23 7
13 11 12 31 28 37 6
41 38 44 13 22 18 19
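Solution:
Taking the tens digit as the stem and the units digit as the leaf:
Stem | Leaf
0    | 5 6 7 9
1    | 0 1 2 2 2 3 3 8 9
2    | 2 3 5 8
3    | 1 7 8
4    | 1 4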
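A stem-and-leaf display for the data of Example 7 can be built mechanically. The sketch below is an illustration only and assumes, as is usual for two-digit data, that the tens digit is the stem and the units digit is the leaf; the names `data` and `display` are chosen here.

```python
from collections import defaultdict

# Data from Example 7.
data = [25, 12, 9, 10, 5, 12, 23, 7,
        13, 11, 12, 31, 28, 37, 6,
        41, 38, 44, 13, 22, 18, 19]

display = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # tens digit -> stem, units digit -> leaf
    display[stem].append(leaf)

# Print every stem from the smallest to the largest, even if it has no leaves.
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Because the values are sorted first, the leaves on each stem appear in increasing order, which makes repeated values (such as 12) easy to spot.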
2.3.2 Frequency Distributions

A frequency distribution for quantitative data lists all the classes and the
number of values that belong to each class.
Data presented in the form of a frequency distribution are called grouped data.
The class boundary is given by the midpoint of the upper limit of one
class and the lower limit of the next class. It is also called the real class limit.
To find the midpoint of the upper limit of the first class and the lower limit
of the second class, we divide the sum of these two limits by 2.
e.g.:
$$\text{Class boundary} = \frac{400 + 401}{2} = 400.5$$
Class Width (class size)
FORMULA
$$\text{Class width} = \text{Upper boundary} - \text{Lower boundary}$$
e.g.:
Width of the first class = 600.5 – 400.5 = 200
Class Midpoint or Mark
FORMULA
$$\text{Class midpoint or mark} = \frac{\text{Lower limit} + \text{Upper limit}}{2}$$
e.g.:
$$\text{Midpoint of the 1st class} = \frac{401 + 600}{2} = 500.5$$
Constructing Frequency Distribution Tables
1. To decide the number of classes, we use Sturges' formula:
FORMULA
$$c = 1 + 3.3 \log n$$
where c is the number of classes,
n is the number of observations in the data set, and log is the base-10 logarithm.
2. Class width:
FORMULA
$$i > \frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}} = \frac{\text{Range}}{c}$$
This class width is rounded up to a convenient number.
3. Lower limit of the first class or the starting point:
Use the smallest value in the data set.
Example 8

The following data give the total home runs hit by all players of each of the 30 Major
League Baseball teams during the 2004 season.
i) Number of classes, c = 1 + 3.3 log 30
= 1 + 3.3(1.477)
= 5.87 ≈ 6 classes
ii) Class width,
$$i > \frac{242 - 135}{6} = 17.8 \approx 18$$
iii) Starting point = 135
Table 2.10: Frequency Distribution for Data of Table 2.9
Total Home Runs   Tally       f
135 – 152         |||| ||||   10
153 – 170         ||          2
171 – 188         ||||        5
189 – 206         |||| |      6
207 – 224         |||         3
225 – 242         ||||        4
                              ∑f = 30
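The three construction steps used in Example 8 can be sketched in Python as follows. This is an illustrative sketch rather than the notes' own procedure: it assumes integer-valued data (so each class covers `width` consecutive integers and boundaries lie 0.5 beyond the limits), and it rounds the Sturges value to the nearest whole number, as the example does.

```python
import math

# Values used in Example 8: 30 observations, smallest 135, largest 242.
n, smallest, largest = 30, 135, 242

# Step 1: Sturges' formula, rounded to a whole number of classes.
c = round(1 + 3.3 * math.log10(n))

# Step 2: class width, rounded up to a whole number.
width = math.ceil((largest - smallest) / c)

# Step 3: start at the smallest value and build the class limits.
limits = [(smallest + k * width, smallest + (k + 1) * width - 1)
          for k in range(c)]

# Class boundaries lie halfway between adjacent class limits.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in limits]

for (lo, hi), (lb, ub) in zip(limits, boundaries):
    print(f"{lo} - {hi}   boundaries {lb} to {ub}")
```

With these inputs the last class ends exactly at the largest value, 242, so no observation falls outside the table.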
2.3.3 Relative Frequency and Percentage Distributions
FORMULA
$$\text{Relative frequency of a class} = \frac{\text{Frequency of that class}}{\text{Sum of all frequencies}} = \frac{f}{\sum f}$$
$$\text{Percentage} = (\text{Relative frequency}) \times 100$$
Example 9
(Refer to Example 8)
Table 2.11: Relative Frequency and Percentage Distributions
Total Home Runs   Class Boundaries           Relative Frequency   %
135 – 152         134.5 to less than 152.5   0.3333               33.33
153 – 170         152.5 to less than 170.5   0.0667               6.67
171 – 188         170.5 to less than 188.5   0.1667               16.67
189 – 206         188.5 to less than 206.5   0.2000               20.00
207 – 224         206.5 to less than 224.5   0.1000               10.00
225 – 242         224.5 to less than 242.5   0.1333               13.33
Total                                        1.0000               100
2.3.4 Graphing Grouped Data

a) Histograms
A histogram is a graph in which the class boundaries are marked on the horizontal
axis and either the frequencies, relative frequencies, or percentages are marked
on the vertical axis. The frequencies, relative frequencies or percentages are
represented by the heights of the bars.
In a histogram, the bars are drawn adjacent to each other, and there is a space
between the y-axis and the first bar.
Example 10
(Refer to Example 8)
[Figure: frequency histogram for Table 2.9; horizontal axis: Total home runs (class boundaries 134.5 to 242.5), vertical axis: Frequency]
b) Polygon
A graph formed by joining the midpoints of the tops of successive bars in a
histogram with straight lines is called a polygon.
Example 11
[Figure: frequency polygon for Table 2.11; horizontal axis: Total home runs (class boundaries 134.5 to 242.5), vertical axis: Frequency]
For a very large data set, as the number of classes is increased (and the width of
classes is decreased), the frequency polygon eventually becomes a smooth
curve called a frequency distribution curve or simply a frequency curve.
[Figure: frequency distribution curve]
Shape of Histogram
The shape of a histogram is described in the same way as that of a polygon.
The most common shapes are:
(i) Symmetric
(ii) Right skewed
(iii) Left skewed
[Figure: symmetric histograms]
[Figure: right-skewed and left-skewed histograms]
• Describing data using graphs gives us insight into the main characteristics of the
data.
• When interpreting a graph, we should be very cautious. We should observe
carefully whether the frequency axis has been truncated or whether any axis has
been unnecessarily shortened or stretched.
2.3.5 Cumulative Frequency Distributions

• A cumulative frequency distribution gives the total number of values
that fall below the upper boundary of each class.
Example 12
Using the frequency distribution of Table 2.11,
Total Home Runs   Class Boundaries           f    Cumulative Frequency
135 – 152         134.5 to less than 152.5   10   10
153 – 170         152.5 to less than 170.5   2    10 + 2 = 12
171 – 188         170.5 to less than 188.5   5    12 + 5 = 17
189 – 206         188.5 to less than 206.5   6    17 + 6 = 23
207 – 224         206.5 to less than 224.5   3    23 + 3 = 26
225 – 242         224.5 to less than 242.5   4    26 + 4 = 30
Ogive
An ogive is a curve drawn for the cumulative frequency distribution by joining with
straight lines the dots marked above the upper boundaries of classes at heights
equal to the cumulative frequencies of the respective classes.
Two types of ogive:
(i) less-than ogive
(ii) more-than ogive
First, build a table of cumulative frequencies.
Example 13
(Less-Than Ogive)
Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
Less than 29.5   0
Less than 39.5   5
Less than 49.5   11
Less than 59.5   17
Less than 69.5   20
Less than 79.5   23
Less than 89.5   30
[Figure: less-than ogive; horizontal axis: Earnings (29.5 to 89.5), vertical axis: Cumulative Frequency (0 to 35)]
Example 14
(More-Than Ogive)
Earnings (RM)   Number of students (f)
30 – 39         5
40 – 49         6
50 – 59         6
60 – 69         3
70 – 79         3
80 – 89         7
Total           30

Earnings (RM)    Cumulative Frequency (F)
More than 29.5   30
More than 39.5   25
More than 49.5   19
More than 59.5   13
More than 69.5   10
More than 79.5   7
More than 89.5   0
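Both cumulative frequency tables (Examples 13 and 14) come from the same running total. The following Python sketch shows one way to compute them; the names `less_than` and `more_than` are illustrative and not from the notes.

```python
from itertools import accumulate

# Class boundaries and frequencies from Examples 13 and 14.
boundaries = [29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
f = [5, 6, 6, 3, 3, 7]

# Less-than: nothing lies below the first boundary, then a running total.
less_than = [0] + list(accumulate(f))

# More-than: whatever is not below a boundary lies above it.
total = sum(f)
more_than = [total - F for F in less_than]

for b, lt, mt in zip(boundaries, less_than, more_than):
    print(f"{b:5.1f}   less than: {lt:2d}   more than: {mt:2d}")
```

At every boundary the two cumulative frequencies add up to the total of 30, so either ogive can be derived from the other.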
[Figure: more-than ogive; horizontal axis: Earnings (29.5 to 89.5), vertical axis: Cumulative Frequency (0 to 35)]
2.3.6 Box-Plot
A box plot describes the data graphically using five measures: the smallest value,
the first quartile (Q1), the second quartile (median or Q2), the third quartile (Q3) and
the largest value.
[Figure: box plot for symmetric data, marking the smallest value, Q1, median, Q3 and largest value]
[Figure: box plot for left-skewed data, marking the smallest value, Q1, median, Q3 and largest value]
[Figure: box plot for right-skewed data, marking the smallest value, Q1, median, Q3 and largest value]
2.4 MEASURES OF CENTRAL TENDENCY
2.4.1 Ungrouped Data Measurement

Mean
FORMULA
Mean for population data:
$$\mu = \frac{\sum x}{N}$$
Mean for sample data:
$$\bar{x} = \frac{\sum x}{n}$$
where: $\sum x$ = the sum of all values
N = the population size
n = the sample size
$\mu$ = the population mean
$\bar{x}$ = the sample mean
Example 15
The following data give the prices (rounded to the nearest thousand RM) of five homes sold
recently in Sekayang.
158 189 265 127 191
Find the mean sale price for these homes.
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{158 + 189 + 265 + 127 + 191}{5} = \frac{930}{5} = 186$$
Thus, these five homes were sold for an average price of RM186 thousand
(RM186 000).
• The mean has the advantage that its calculation includes each
value of the data set.
Weighted Mean
Used when the values carry different weights (levels of importance).
• Weighted mean:
FORMULA
$$\bar{x}_w = \frac{\sum wx}{\sum w}$$
where w is a weight.
Example 16
Consider the data on electrical components purchased from a factory in the table
below:
Type    Number of components (w)   Cost/unit (x)
1       1200                       RM3.00
2       500                        RM3.40
3       2500                       RM2.80
4       1000                       RM2.90
5       800                        RM3.25
Total   6000
Solution:
$$\bar{x}_w = \frac{\sum wx}{\sum w} = \frac{1200(3) + 500(3.4) + 2500(2.8) + 1000(2.9) + 800(3.25)}{1200 + 500 + 2500 + 1000 + 800} = \frac{17800}{6000} = 2.967$$
The mean cost of a unit of the component is RM2.97.
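The weighted mean of Example 16 can be reproduced in a few lines of Python. This is a minimal sketch with illustrative variable names (`w`, `x`, `weighted_mean`).

```python
# Example 16: number of components (weights) and cost per unit in RM (values).
w = [1200, 500, 2500, 1000, 800]
x = [3.00, 3.40, 2.80, 2.90, 3.25]

# Weighted mean = sum of (weight x value) divided by the sum of the weights.
weighted_mean = sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

print(f"Weighted mean cost per unit: RM{weighted_mean:.2f}")
```

Note that the unweighted average of the five unit costs would be RM3.07, which overstates the cost because the cheapest component (type 3) is bought in the largest quantity.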
Median
The median is the value of the middle term in a data set that has been
ranked in increasing order.
Procedure for finding the median
Step 1: Rank the data set in increasing order.
Step 2: Determine the depth (position or location) of the median.
FORMULA
$$\text{Depth of median} = \frac{n + 1}{2}$$
Step 3: Determine the value of the median.
Example 17
Find the median for the following data:
10 5 19 8 3
Solution:
(1) Rank the data in increasing order
3 5 8 10 19
(2) Determine the depth of the median
$$\text{Depth of median} = \frac{n + 1}{2} = \frac{5 + 1}{2} = 3$$
(3) Determine the value of the median
The median is located in the third position of the ranked data set.
Hence, the median for the above data = 8
Example 18
Find the median for the following data:
10 5 19 8 3 15
Solution:
(1) Rank the data in increasing order
3 5 8 10 15 19
(2) Determine the depth of the median
$$\text{Depth of median} = \frac{n + 1}{2} = \frac{6 + 1}{2} = 3.5$$
(3) Determine the value of the median
The median is located midway between the 3rd and 4th
positions of the ranked data set.
$$\text{Median} = \frac{8 + 10}{2} = 9$$
Hence, the median for the above data = 9
• The median gives the center of a histogram, with half of the data
values to the left of (or less than) the median and half to the right of (or
more than) the median.
• The advantage of using the median is that it is not influenced by
outliers.
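The three-step procedure (rank, find the depth, read off the value) can be sketched in Python as below. The function name `median_by_depth` is an illustrative choice; Python's standard library also offers `statistics.median`, which gives the same results for these data.

```python
def median_by_depth(data):
    """Median via the depth (n + 1) / 2 of the ranked data."""
    ranked = sorted(data)                 # Step 1: rank in increasing order
    depth = (len(ranked) + 1) / 2         # Step 2: depth of the median
    lower = ranked[int(depth) - 1]        # value at the whole-number position
    if depth.is_integer():                # odd n: the middle value itself
        return lower
    upper = ranked[int(depth)]            # even n: average the two middle values
    return (lower + upper) / 2


print(median_by_depth([10, 5, 19, 8, 3]))       # Example 17
print(median_by_depth([10, 5, 19, 8, 3, 15]))   # Example 18
```

The `- 1` in the index is needed because positions in the notes start at 1 while Python lists start at 0.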
Mode

The mode is the value that occurs with the highest frequency in a data set.
Example 19
1. What is the mode for the given data?
77 69 74 81 71 68 74 73
2. What is the mode for the given data?
77 69 68 74 81 71 68 74 73
Solution:
1. Mode = 74
2. Mode = 68 and 74
• A major shortcoming of the mode is that a data set may have
none or may have more than one mode.
• One advantage of the mode is that it can be calculated for both
kinds of data, quantitative and qualitative.
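Since a data set may have no mode, one mode or several, a helper that returns a list is convenient. The sketch below is illustrative; the function name `modes` and the convention of returning an empty list when every value occurs only once are assumptions made here, not definitions from the notes.

```python
from collections import Counter

def modes(data):
    """Return all values with the highest frequency (empty list if none repeats)."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []                         # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([77, 69, 74, 81, 71, 68, 74, 73]))        # Example 19, part 1
print(modes([77, 69, 68, 74, 81, 71, 68, 74, 73]))    # Example 19, part 2
print(modes(["W", "Is", "W", "P"]))                   # works for qualitative data too
```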
2.4.2 Grouped Data Measurement
Mean
FORMULA
Mean for population data:
$$\mu = \frac{\sum fx}{N}$$
Mean for sample data:
$$\bar{x} = \frac{\sum fx}{n}$$
where x is the midpoint and f is the frequency of a class.
Example 20
The following table gives the frequency distribution of the number of orders received
each day during the past 50 days at the office of a mail-order company. Calculate
the mean.
Number of orders   f
10 – 12            4
13 – 15            12
16 – 18            20
19 – 21            14
                   n = 50
Solution:
Because the data set includes only 50 days, it represents a sample. The value of
$\sum fx$ is calculated in the following table:
Number of orders   f        x    fx
10 – 12            4        11   44
13 – 15            12       14   168
16 – 18            20       17   340
19 – 21            14       20   280
                   n = 50        ∑fx = 832
The value of the sample mean is:
$$\bar{x} = \frac{\sum fx}{n} = \frac{832}{50} = 16.64$$
Thus, this mail-order company received an average of 16.64 orders per day during
these 50 days.
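The grouped-data mean of Example 20 can be checked with a short Python sketch. The variable names are illustrative; the class midpoint is computed as (lower limit + upper limit) / 2, as defined earlier in the chapter.

```python
# Example 20: class limits and frequencies.
classes = [(10, 12), (13, 15), (16, 18), (19, 21)]
f = [4, 12, 20, 14]

# Midpoint of each class = (lower limit + upper limit) / 2.
midpoints = [(lo + hi) / 2 for lo, hi in classes]

n = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, midpoints))
mean = sum_fx / n

print(f"n = {n}, sum fx = {sum_fx}, mean = {mean:.2f}")
```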
Median
Step 1: Construct the cumulative frequency distribution.
Step 2: Decide which class contains the median.
The median class is the first class whose cumulative frequency is
at least n/2.
Step 3: Find the median by using the following formula:
FORMULA
$$\text{Median} = L_m + \left(\frac{\frac{n}{2} - F}{f_m}\right) i$$
where:
n = the total frequency
F = the cumulative frequency before the median class
i = the class width
$L_m$ = the lower boundary of the median class
$f_m$ = the frequency of the median class
Example 21
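Based on the grouped data below, find the median:
Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
Solution:
1st Step: Construct the cumulative frequency distribution
Time to travel to work   Frequency   Cumulative Frequency
1 – 10                   8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                  9           43
41 – 50                  7           50
2nd Step: n/2 = 50/2 = 25, so the median class is the 3rd class (21 – 30), the first class
whose cumulative frequency is at least 25.
3rd Step: With $L_m$ = 20.5, F = 22, $f_m$ = 12 and i = 10,
$$\text{Median} = 20.5 + \left(\frac{25 - 22}{12}\right) 10 = 23$$
Thus, 25 persons take less than 23 minutes to travel to work and another 25
persons take more than 23 minutes to travel to work.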
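The grouped-median steps can be traced in Python as sketched below. The sketch assumes integer class limits, so that each lower boundary lies 0.5 below the lower limit (20.5 for the class 21 – 30) and the class width is upper limit minus lower limit plus 1; the variable names are illustrative. Under these assumptions the formula gives 23.0 for the travel-time data.

```python
from itertools import accumulate

# Example 21: class limits and frequencies.
limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
f = [8, 14, 12, 9, 7]

n = sum(f)
cum = list(accumulate(f))                 # Step 1: cumulative frequencies

# Step 2: median class = first class whose cumulative frequency is >= n/2.
m = next(k for k, F in enumerate(cum) if F >= n / 2)

# Step 3: apply Median = L_m + ((n/2 - F) / f_m) * i.
L_m = limits[m][0] - 0.5                  # lower boundary of the median class
F_before = cum[m - 1] if m > 0 else 0     # cumulative frequency before it
i = limits[m][1] - limits[m][0] + 1       # class width
median = L_m + (n / 2 - F_before) / f[m] * i

print(f"median class: {limits[m]}, median = {median}")
```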
Mode
The mode is the value that has the highest frequency in a data set.
For grouped data, the class mode (or modal class) is the class with the highest frequency.
Formula of mode for grouped data:
FORMULA
$$\text{Mode} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) i$$
where:
$L_{mo}$ is the lower boundary of the class mode
$\Delta_1$ is the difference between the frequency of the class mode and the
frequency of the class before the class mode
$\Delta_2$ is the difference between the frequency of the class mode and the
frequency of the class after the class mode
i is the class width
Example 22
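Based on the grouped data below, find the mode.
Time to travel to work   Frequency
1 – 10                   8
11 – 20                  14
21 – 30                  12
31 – 40                  9
41 – 50                  7
Solution:
Based on the table, the class mode is the 2nd class (11 – 20), so
$L_{mo}$ = 10.5, $\Delta_1$ = 14 – 8 = 6, $\Delta_2$ = 14 – 12 = 2 and i = 10.
$$\text{Mode} = 10.5 + \left(\frac{6}{6 + 2}\right) 10 = 18$$
We can also obtain the mode by using the histogram.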
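The grouped-mode formula can be evaluated for the data of Example 22 as in the Python sketch below. As with the median, the sketch assumes integer class limits (boundaries 0.5 beyond the limits); treating a missing neighbouring class as having frequency 0 is a further assumption made here for the edge cases.

```python
# Example 22: class limits and frequencies.
limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
f = [8, 14, 12, 9, 7]

m = f.index(max(f))                             # class mode (first if tied)
L_mo = limits[m][0] - 0.5                       # lower boundary of class mode
d1 = f[m] - (f[m - 1] if m > 0 else 0)          # difference with class before
d2 = f[m] - (f[m + 1] if m < len(f) - 1 else 0) # difference with class after
i = limits[m][1] - limits[m][0] + 1             # class width

mode = L_mo + d1 / (d1 + d2) * i
print(f"class mode: {limits[m]}, mode = {mode}")
```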

2.4.3 Relationship among Mean, Median & Mode
As discussed in the previous topic, a histogram or a frequency distribution curve
can assume either a skewed shape or a symmetrical shape.
Knowing the values of the mean, median and mode can give us some idea about
the shape of the frequency curve.
For a symmetrical histogram and frequency curve with one peak, the values of the mean,
median and mode are identical and they lie at the center of the distribution.
[Figure: mean, median, and mode for a symmetric histogram and frequency distribution curve]
For a histogram and a frequency curve skewed to the right, the value of the
mean is the largest, that of the mode is the smallest, and the value of
the median lies between these two.
[Figure: mean, median, and mode for a histogram and frequency distribution curve skewed to the right]
For a histogram and a frequency curve skewed to the left, the value of the
mean is the smallest, that of the mode is the largest, and the value
of the median lies between these two.
[Figure: mean, median, and mode for a histogram and frequency distribution curve skewed to the left]
2.5 DISPERSION MEASUREMENT
The measures of central tendency such as mean, median and mode do not
reveal the whole picture of the distribution of a data set.
Two data sets with the same mean may have completely different spreads.
• The variation among the values of observations for one data set may be
much larger or smaller than for the other data set.
2.5.1 Ungrouped Data Measurement
Range
FORMULA
$$\text{Range} = \text{Largest value} - \text{Smallest value}$$
Example 23
Find the range of production for this data set.
Solution:
Range = Largest value – Smallest value
= 267 277 – 49 651
= 217 626
• Disadvantages:
o It is influenced by outliers.
o It is based on two values only. All other values in a data set are ignored.
Variance and Standard Deviation
The standard deviation is the most used measure of dispersion.
The value of the standard deviation tells how closely the values of a data set
are clustered around the mean.
A lower value of the standard deviation indicates that the values of the data set are
spread over a relatively smaller range around the mean.
A larger value of the standard deviation indicates that the values of the data set are spread over
a relatively larger range around the mean (far from the mean).
The standard deviation is obtained by taking the positive square root of the variance:
FORMULA
Variance for population:
$$\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$$
Variance for sample:
$$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$$
FORMULA
Standard deviation for population:
$$\sigma = \sqrt{\sigma^2}$$
Standard deviation for sample:
$$s = \sqrt{s^2}$$
Example 24
Let x denote the total production (in units) of a company.
Company   Production
A         62
B         93
C         126
D         75
E         34
Find the variance and standard deviation.
Solution:
Company   Production (x)   x²
A         62               3844
B         93               8649
C         126              15876
D         75               5625
E         34               1156
Total     ∑x = 390         ∑x² = 35150
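The notes do not state whether the five companies in Example 24 form a sample or a population, so the Python sketch below evaluates both formulas from the same sums. Treating the data as a sample is an assumption; the variable names are illustrative.

```python
import math

# Example 24: production of five companies.
x = [62, 93, 126, 75, 34]

n = len(x)
sum_x = sum(x)                       # 390
sum_x2 = sum(v * v for v in x)       # 35150

# Shared numerator: sum of x^2 minus (sum of x)^2 / n.
numerator = sum_x2 - sum_x ** 2 / n

sample_var = numerator / (n - 1)     # divide by n - 1 for a sample
pop_var = numerator / n              # divide by N for a population
sample_sd = math.sqrt(sample_var)
pop_sd = math.sqrt(pop_var)

print(f"sample:     s^2 = {sample_var:.2f}, s = {sample_sd:.4f}")
print(f"population: sigma^2 = {pop_var:.2f}, sigma = {pop_sd:.4f}")
```

The sample variance is always the larger of the two because it divides the same numerator by n – 1 instead of n.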

The properties of variance and standard deviation:
The standard deviation is a measure of variation of all values from the mean.
The value of the variance and the standard deviation are never negative. Also, larger
values of variance or standard deviation indicate greater amounts of variation.
The value of s can increase dramatically with the inclusion of one or more outliers.
The measurement units of variance are always the square of the measurement units
of the original data while the units of standard deviation are the same as the units
of the original data values.
2.5.2 Grouped Data Measurement
Range
FORMULA
$$\text{Range} = \text{Upper bound of last class} - \text{Lower bound of first class}$$
e.g.:
Class      Frequency
41 – 50    1
51 – 60    3
61 – 70    7
71 – 80    13
81 – 90    10
91 – 100   6
Total      40
Upper bound of last class = 100.5
Lower bound of first class = 40.5
Range = 100.5 – 40.5 = 60
Variance and Standard Deviation

FORMULA
Variance for population:
$$\sigma^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{N}}{N}$$
Variance for sample:
$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}$$
FORMULA
Standard deviation:
Population: $\sigma = \sqrt{\sigma^2}$
Sample: $s = \sqrt{s^2}$
Example 25
Find the variance and standard deviation for the following data:
No. of orders   f
10 – 12         4
13 – 15         12
16 – 18         20
19 – 21         14
Total           n = 50
Solution:
No. of orders   f        x    fx          fx²
10 – 12         4        11   44          484
13 – 15         12       14   168         2352
16 – 18         20       17   340         5780
19 – 21         14       20   280         5600
Total           n = 50        ∑fx = 832   ∑fx² = 14216
Variance,
$$s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{14216 - \frac{(832)^2}{50}}{50 - 1} = \frac{371.52}{49} = 7.5820$$
Standard deviation,
$$s = \sqrt{s^2} = \sqrt{7.5820} = 2.75$$
Thus, the standard deviation of the number of orders received at the office of this mail-order
company during the past 50 days is 2.75.
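The sample variance and standard deviation for the grouped data of Example 25 can be verified with the following Python sketch. The variable names are illustrative; the 50 days are treated as a sample, as in Example 20.

```python
import math

# Example 25: class midpoints and frequencies.
midpoints = [11, 14, 17, 20]
f = [4, 12, 20, 14]

n = sum(f)
sum_fx = sum(fi * xi for fi, xi in zip(f, midpoints))        # sum of f*x
sum_fx2 = sum(fi * xi ** 2 for fi, xi in zip(f, midpoints))  # sum of f*x^2

# Sample variance for grouped data, then its positive square root.
s2 = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
s = math.sqrt(s2)

print(f"sum fx = {sum_fx}, sum fx^2 = {sum_fx2}")
print(f"s^2 = {s2:.4f}, s = {s:.2f}")
```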
2.5.3 Relative Dispersion Measurement

• Used to compare the dispersion of two or more distributions that have different units, OR
• to compare two or more distributions that have the same unit but very different
means.
It is also called the modified coefficient or coefficient of variation, CV.
FORMULA
$$CV = \frac{s}{\bar{x}} \times 100\% \quad \text{(sample)}$$
$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{(population)}$$
where s and $\sigma$ are the sample and population standard deviations.
Example 26
The mean and standard deviation of the monthly salary for two groups of workers
at ABC Company are as follows: Group 1: 700 and 20; Group 2: 1070 and 20. Find the
CV for each group and determine which group is more dispersed.
Solution:
$$CV_1 = \frac{20}{700} \times 100\% = 2.86\%$$
$$CV_2 = \frac{20}{1070} \times 100\% = 1.87\%$$
The monthly salary of the group 1 workers is more dispersed than that of group 2.
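The comparison in Example 26 reduces to one division per group. A minimal Python sketch follows; the function name `coefficient_of_variation` is chosen here for illustration.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (standard deviation / mean) x 100."""
    return sd / mean * 100


cv1 = coefficient_of_variation(20, 700)    # Group 1
cv2 = coefficient_of_variation(20, 1070)   # Group 2

print(f"CV1 = {cv1:.2f}%, CV2 = {cv2:.2f}%")
print("More dispersed:", "Group 1" if cv1 > cv2 else "Group 2")
```

Both groups have the same standard deviation, so the difference in CV comes entirely from the difference in their means.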
2.6 MEASURE OF POSITION
• Determines the position of a single value in relation to other values in a

sample or a population data set.
• Quartiles
Quartiles are three summary measures that divide a ranked data set into four
equal parts.
The 1st quartile, denoted Q1:
FORMULA
$$\text{Depth of } Q_1 = \frac{n + 1}{4}$$
The 2nd quartile is the median of the data set, denoted Q2.
The 3rd quartile, denoted Q3:
FORMULA
$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4}$$
Example 27
The table below lists the total revenue for the 11 top tourism companies in Malaysia.
109.7 79.9 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8
Solution:
Step 1: Arrange the data in increasing order
76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
Step 2: Determine the depth of Q1 and Q3
$$\text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{11 + 1}{4} = 3$$
$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(11 + 1)}{4} = 9$$
Step 3: Determine Q1 and Q3
Q1 is the 3rd value and Q3 is the 9th value of the ranked data:
Q1 = 79.9 ; Q3 = 103.5
Example 28
The table below lists the total revenue for the 12 top tourism companies in Malaysia.
109.7 79.9 74.1 121.2 76.4 80.2 82.1 79.4 89.3 98.0 103.5 86.8
Solution:
Step 1: Arrange the data in increasing order
74.1 76.4 79.4 79.9 80.2 82.1 86.8 89.3 98.0 103.5 109.7 121.2
Step 2: Determine the depth of Q1 and Q3
$$\text{Depth of } Q_1 = \frac{n + 1}{4} = \frac{12 + 1}{4} = 3.25$$
$$\text{Depth of } Q_3 = \frac{3(n + 1)}{4} = \frac{3(12 + 1)}{4} = 9.75$$
Step 3: Determine Q1 and Q3
Q1 lies a quarter of the way from the 3rd to the 4th value, and Q3 lies three quarters
of the way from the 9th to the 10th value:
Q1 = 79.4 + 0.25 (79.9 – 79.4) = 79.525
Q3 = 98.0 + 0.75 (103.5 – 98.0) = 102.125

• Interquartile Range
The difference between the third quartile and the first quartile of a data
set.
FORMULA
$$IQR = Q_3 - Q_1$$
Example 29
Referring to Example 28, calculate the IQR.
Solution:
IQR = Q3 – Q1 = 102.125 – 79.525 = 22.6
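The depth method of Examples 27 to 29 can be sketched in Python as below. The helper names `value_at_depth` and `quartiles` are illustrative. Note that library routines such as `statistics.quantiles` or NumPy's percentile functions support several interpolation conventions, so their defaults may not match the (n + 1)-based depths used in these notes.

```python
def value_at_depth(ranked, depth):
    """Value at a (possibly fractional) 1-based depth in ranked data."""
    k = int(depth)                 # whole part: position of the lower neighbour
    frac = depth - k               # fractional part of the depth
    lower = ranked[k - 1]
    if frac == 0:
        return lower
    return lower + frac * (ranked[k] - lower)   # interpolate to next value


def quartiles(data):
    """Q1 and Q3 using depths (n + 1)/4 and 3(n + 1)/4."""
    ranked = sorted(data)
    n = len(ranked)
    return (value_at_depth(ranked, (n + 1) / 4),
            value_at_depth(ranked, 3 * (n + 1) / 4))


revenue_12 = [109.7, 79.9, 74.1, 121.2, 76.4, 80.2,
              82.1, 79.4, 89.3, 98.0, 103.5, 86.8]   # Example 28
q1, q3 = quartiles(revenue_12)
print(f"Q1 = {q1:.3f}, Q3 = {q3:.3f}, IQR = {q3 - q1:.1f}")
```

The sketch assumes a data set large enough that the depth of Q3 does not exceed n, which holds for the examples here.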

• Quartiles (grouped data)
By analogy with the median formula, the equations for Q1 and Q3 are as follows:
FORMULA
$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i$$
$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i$$
Example 30
Refer to Example 22. Find Q1 and Q3.
Solution:
1st Step: Construct the cumulative frequency distribution
Time to travel to work   Frequency   Cumulative Frequency
1 – 10                   8           8
11 – 20                  14          22
21 – 30                  12          34
31 – 40                  9           43
41 – 50                  7           50
2nd Step: Determine Q1 and Q3
$$\text{Class } Q_1: \frac{n}{4} = \frac{50}{4} = 12.5$$
Class Q1 is the 2nd class.
Therefore,
$$Q_1 = L_{Q_1} + \left(\frac{\frac{n}{4} - F}{f_{Q_1}}\right) i = 10.5 + \left(\frac{12.5 - 8}{14}\right) 10 = 13.7143$$
$$\text{Class } Q_3: \frac{3n}{4} = \frac{3(50)}{4} = 37.5$$
Class Q3 is the 4th class.
Therefore,
$$Q_3 = L_{Q_3} + \left(\frac{\frac{3n}{4} - F}{f_{Q_3}}\right) i = 30.5 + \left(\frac{37.5 - 34}{9}\right) 10 = 34.3889$$
• Interquartile Range
FORMULA
$$IQR = Q_3 - Q_1$$
Example 31
Referring to Example 30, calculate the IQR.
Solution:
IQR = Q3 – Q1 = 34.3889 – 13.7143 = 20.6746
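The grouped quartiles and IQR of Examples 30 and 31 can be reproduced with one helper that takes the target cumulative count (n/4 for Q1, 3n/4 for Q3). This is a sketch under the same assumption as before, namely integer class limits with boundaries 0.5 beyond them; the function name `grouped_quartile` is illustrative.

```python
from itertools import accumulate

# Travel-time data used in Example 30.
limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
f = [8, 14, 12, 9, 7]

n = sum(f)
cum = list(accumulate(f))          # cumulative frequencies: 8, 22, 34, 43, 50


def grouped_quartile(target):
    """Apply L + ((target - F) / f_class) * i, with target = n/4 or 3n/4."""
    k = next(j for j, F in enumerate(cum) if F >= target)   # quartile class
    lower_boundary = limits[k][0] - 0.5
    F_before = cum[k - 1] if k > 0 else 0
    width = limits[k][1] - limits[k][0] + 1
    return lower_boundary + (target - F_before) / f[k] * width


q1 = grouped_quartile(n / 4)
q3 = grouped_quartile(3 * n / 4)
iqr = q3 - q1
print(f"Q1 = {q1:.4f}, Q3 = {q3:.4f}, IQR = {iqr:.4f}")
```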
2.7 MEASURE OF SKEWNESS

• Used to determine the skewness of data (symmetric, left skewed, right skewed).
• Also called the skewness coefficient or Pearson coefficient of skewness.
FORMULA
$$S_k = \frac{\text{Mean} - \text{Mode}}{s}$$
or
$$S_k = \frac{3(\text{Mean} - \text{Median})}{s}$$
• If Sk is positive → right skewed
• If Sk is negative → left skewed
• If Sk = 0 → symmetric
• If Sk takes a value in (-0.9999, -0.0001) or (0.0001,
0.9999) → approximately symmetric.
Example 32
The duration of stay of cancer patients warded in Hospital Seberang Jaya was recorded in a
frequency distribution. From the record, the mean is 28 days, the median is 25 days
and the mode is 23 days. The standard deviation is 4.2 days.
a. What is the type of distribution?
b. Find the skewness coefficient.
Solution:
a. This distribution is right skewed because the mean is the largest value.
b.
$$S_k = \frac{\text{Mean} - \text{Mode}}{s} = \frac{28 - 23}{4.2} = 1.1905$$
OR
$$S_k = \frac{3(\text{Mean} - \text{Median})}{s} = \frac{3(28 - 25)}{4.2} = 2.1429$$
So, from the Sk value, this distribution is right skewed.
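Both versions of the skewness coefficient in Example 32 can be computed as in the Python sketch below. The function names and the `describe` helper are illustrative; the sign rule is the one stated in this section.

```python
def skewness_from_mode(mean, mode, sd):
    """Sk = (mean - mode) / s."""
    return (mean - mode) / sd


def skewness_from_median(mean, median, sd):
    """Sk = 3 (mean - median) / s."""
    return 3 * (mean - median) / sd


def describe(sk):
    """Classify the distribution by the sign of Sk."""
    if sk > 0:
        return "right skewed"
    if sk < 0:
        return "left skewed"
    return "symmetric"


# Example 32: mean 28, median 25, mode 23, standard deviation 4.2 (days).
sk_mode = skewness_from_mode(28, 23, 4.2)
sk_median = skewness_from_median(28, 25, 4.2)
print(f"{sk_mode:.4f} ({describe(sk_mode)})")
print(f"{sk_median:.4f} ({describe(sk_median)})")
```

The two formulas generally give different magnitudes, but for this example they agree on the direction of the skew.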
ADDITIONAL INFORMATION
Use of Standard Deviation
1. Chebyshev’s Theorem
According to Chebyshev's Theorem, for any number k greater than 1, at least
$(1 - 1/k^2)$ of the data values lie within k standard deviations of the mean.
Thus, for example, if k = 2, then
$$1 - \frac{1}{k^2} = 1 - \frac{1}{(2)^2} = 0.75 \text{ or } 75\%$$
Therefore, according to Chebyshev's Theorem, at least 75% of the values of a
data set lie within two standard deviations of the mean.
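The bound in Chebyshev's Theorem is easy to tabulate for several values of k. The following Python sketch is illustrative; the function name `chebyshev_bound` is chosen here.

```python
def chebyshev_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebyshev_bound(k):.1%} of the values")
```

Because the theorem holds for any distribution, these are lower bounds only; a particular data set will often have a larger share of its values within k standard deviations.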
2. Empirical Rule
• For a bell-shaped distribution, approximately
1. 68% of the observations lie within one standard deviation of the mean.
2. 95% of the observations lie within two standard deviations of the mean.
3. 99.7% of the observations lie within three standard deviations of the mean.
Measure of Position
1. Ungrouped Data - Quartile Deviation
• QD is half of the interquartile range.
• It is used to compare the dispersion of two data sets.
• If the QD value is high, the data are more dispersed.
$$\text{Quartile Deviation} = \frac{\text{Interquartile Range}}{2} = \frac{Q_3 - Q_1}{2}$$
2. Ungrouped Data – Percentile
$$P_k = \text{value of the } \left(\frac{kn}{100}\right)\text{th term in a ranked data set}$$
where: k = the number of the percentile
n = the sample size
$$\text{Percentile rank of } x_i = \frac{\text{Number of values less than } x_i}{\text{Total number of values in the data set}} \times 100$$

EXERCISE 2
1. A survey research company asks 100 people how many times they have been to
the dentist in the last five years. Their grouped responses appear below.
Number of Visits Number of Responses

0–4 16
5–9 25
10 – 14 48
15 – 19 11
What are the mean and variance of the data?
2. A researcher asked 25 consumers: “How much would you pay for a television
adapter that provides Internet access?” Their grouped responses are as follows:
Amount ($) Number of Responses
0 – 99 2
100 – 199 2
200 – 249 3
250 – 299 3
300 – 349 6
350 – 399 3
400 – 499 4
500 – 999 2
Calculate the mean, variance, and standard deviation.
3. The following data give the pairs of shoes sold per day by a particular
shoe store in the last 20 days.
85 90 89 70 79 80 83 83 75 76
89 86 71 76 77 89 70 65 90 86
Calculate the
a. mean and interpret the value.
b. median and interpret the value.
c. mode and interpret the value.
d. standard deviation.
4. The following data show the serving time (in minutes) for 40
customers in a post office:
2.0 4.5 2.5 2.9 4.2 2.9 3.5 2.8
3.2 2.9 4.0 3.0 3.8 2.5 2.3 3.5
2.1 3.1 3.6 4.3 4.7 2.6 4.1 3.1
4.6 2.8 5.1 2.7 2.6 4.4 3.5 3.0
2.7 3.9 2.9 2.9 2.5 3.7 3.3 2.4
a. Construct a frequency distribution table with a class width of 0.5.
b. Construct a histogram.
c. Calculate the mode and median of the data.
d. Find the mean of serving time.
e. Determine the skewness of the data.
f. Find the first and third quartile values of the data.
g. Determine the value of the interquartile range.
5. In a survey of a class of final semester students, the following data were
obtained for the number of textbooks owned.
Number of students Number of textbooks owned
12 5
9 5
11 3
15 2
10 1
8 0
Find the average number of textbooks owned per student in the class. Use the weighted mean.
6. The following data represent the ages of 15 people buying lift tickets at a ski area.
15 25 26 17 38 16 60 21
30 53 28 40 20 35 31
Calculate the quartiles and the interquartile range.
7. A student scores 60 on a mathematics test that has a mean of 54 and a standard
deviation of 3, and she scores 80 on a history test with a mean of 75 and a
standard deviation of 2. On which test did she perform better?
8. The following table gives the distribution of the share price of ABC Company,
which was listed on the BSKL in 2005.
Price (RM) Frequency

12 – 14 5
15 – 17 14
18 – 20 25
21 – 23 7
24 – 26 6
27 – 29 3
Find the mean, median and mode for this data.
