
Chapter 3. STATISTICAL DESCRIPTION OF DATA
Frequency Distributions
Grouped Data
Percentiles, Deciles & Quartiles
Graphical Representations
Symmetry and Skewness
Statistical data collected should be
arranged in such a manner that will
allow a reader to distinguish their
essential features. Depending on the
type and the objectives of the person
presenting the information, data may
be presented using one or a
combination of three forms.
Three Forms of Presenting Data

Textual Form - data are presented in paragraph form, especially when they are purely qualitative or when very few numbers are involved.
Tabular Form - data are presented in rows and columns.
Graphical Form - data are presented in visual form.
[Figure: chart of monthly values, January to December, for the years 1991 to 1995; vertical axis from 500 to 4000]
When the data include a large
number of observations, it is
convenient to group the values into
mutually exclusive classes and show
the number of observations occurring
in each class in a tabular form.
Frequency Distribution

A frequency distribution is the


arrangement of data that shows the
frequency of occurrence of values
falling within arbitrarily defined ranges
of the variable known as class
intervals. The smallest and largest
values that fall in a given interval are
called class limits.
Class Frequency and Class Mark

Class frequency refers to the


number of observations falling in a
particular class while the midpoint
between the upper and lower class limits
is called class mark/midpoint.
Steps in Making a Frequency Distribution
1. Find the range.
2. Determine the interval size by dividing the range by the desired number of classes, which is normally not less than 10 and not more than 20.
3. Determine the class limits of the class intervals. Tabulation is facilitated if the lower class limits of the class intervals are multiples of the interval size. The bottom interval must include the lowest score.
4. List the intervals, beginning at the bottom.
5. Tally the frequencies.
6. Summarize these under a column labeled f.
7. Total this column and record the number at the bottom.
Problem:

Construct a frequency distribution


of the given scores on a test.
56 28 42 56 47 39 62 60 54 47
78 82 55 56 41 44 54 42 62 48
62 38 57 55 50 47 42 56 68 53
37 72 65 66 52 52 48 48 42 68
Solution:

Computing for the range:
R = 82 – 28 = 54

Computing for the class interval:
$$i = \frac{54}{10} = 5.4$$
Therefore, the class interval may be 5 or 6.
We choose 5 because it is an odd number, so each class mark is a whole number.

If i = 5, the lowest limit should be 25.
We choose 25 because it is the largest multiple of the chosen interval that does not exceed the smallest value in the set (28).

If the lowest limit is 25, the bottom interval should be 29 – 25.
The interval 29 – 25 contains the lowest score (28).
Classes    Tally      f
84 - 80    /          1
79 - 75    /          1
74 - 70    /          1
69 - 65    ////       4
64 - 60    ////       4
59 - 55    ///////    7
54 - 50    //////     6
49 - 45    //////     6
44 - 40    //////     6
39 - 35    ///        3
34 - 30               0
29 - 25    /          1
N = Σf = 40
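The tabulation above can be reproduced with a short script. This is a minimal sketch, assuming integer scores; the function name `frequency_distribution` is hypothetical and not part of the original material.

```python
from collections import Counter

scores = [
    56, 28, 42, 56, 47, 39, 62, 60, 54, 47,
    78, 82, 55, 56, 41, 44, 54, 42, 62, 48,
    62, 38, 57, 55, 50, 47, 42, 56, 68, 53,
    37, 72, 65, 66, 52, 52, 48, 48, 42, 68,
]

def frequency_distribution(values, size):
    """Return [(lower, upper, f), ...] listed from the top class down."""
    # Lowest limit: largest multiple of the interval size
    # that does not exceed the smallest value.
    lowest = (min(values) // size) * size
    # Class index 0 is the bottom interval.
    counts = Counter((v - lowest) // size for v in values)
    n_classes = (max(values) - lowest) // size + 1
    table = []
    for k in reversed(range(n_classes)):
        lower = lowest + k * size
        table.append((lower, lower + size - 1, counts.get(k, 0)))
    return table

table = frequency_distribution(scores, 5)
for lower, upper, f in table:
    print(f"{upper} - {lower}  {'/' * f:<8} {f}")
print("N =", sum(f for _, _, f in table))
```

Empty classes such as 34 - 30 are kept with a frequency of 0 so that the intervals stay contiguous.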
For Grouped Data ( > 30 values)

MEASURES OF CENTRAL TENDENCY

MEAN
Methods :
1. Midpoint Method
2. Short Method
Midpoint Method
After the f column, make another column and
enter the midpoint (Xm) of each class. Multiply the
frequency with the midpoint and enter it in the
next column. Label the column f Xm. Get the sum.
Use the formula:

$$\bar{x} = \frac{\sum f X_m}{N}$$
Short Method

Choose a class at or near the middle of the


distribution to be designated as the origin. After
the f column, construct the deviation column (d).
Mark the chosen class zero. In succession, write
-1, -2 and so on for classes lower in value than
the origin. In like manner, write 1, 2, 3 and so on
for classes greater in value than the origin.
Construct the f × d column and get the algebraic sum.
Use the formula:

$$\bar{x} = z + \frac{\sum fd}{N}\, i$$

where z = midpoint of the class chosen as origin and i = interval size
Problem:

For the given frequency distribution, compute for the mean using:
Midpoint Method
Short Method

Classes   f
54-50     4
49-45     7
44-40    12
39-35    10
34-30     9
29-25     6
24-20     2
Solution: Using Midpoint Method

Classes   f    Xm   fXm
54-50     4    52   208
49-45     7    47   329
44-40    12    42   504
39-35    10    37   370
34-30     9    32   288
29-25     6    27   162
24-20     2    22    44
N = 50         ΣfXm = 1905

$$\bar{x} = \frac{\sum f X_m}{N} = \frac{1905}{50} = 38.1$$
Using Short Method

Classes   f    d    fd
54-50     4    3    12
49-45     7    2    14
44-40    12    1    12
39-35    10    0     0
34-30     9   -1    -9
29-25     6   -2   -12
24-20     2   -3    -6
N = 50         Σfd = 11

$$\bar{x} = z + \frac{\sum fd}{N}\, i = 37 + \frac{11}{50}(5) = 38.1$$
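Both methods can be checked numerically. The sketch below is illustrative only: the function names `mean_midpoint` and `mean_short` are hypothetical, and the classes are listed from the lowest class up, the reverse of the tables in the text.

```python
# (lower limit, upper limit, frequency), lowest class first
classes = [
    (20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
    (40, 44, 12), (45, 49, 7), (50, 54, 4),
]

def mean_midpoint(classes):
    """Midpoint method: sum of f * Xm divided by N."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def mean_short(classes, origin_index):
    """Short method: z + (sum of f * d) / N * i."""
    lo0, hi0, _ = classes[origin_index]
    z = (lo0 + hi0) / 2          # midpoint of the class chosen as origin
    i = hi0 - lo0 + 1            # interval size
    n = sum(f for _, _, f in classes)
    # d is the signed number of classes away from the origin.
    sum_fd = sum(f * (k - origin_index) for k, (_, _, f) in enumerate(classes))
    return z + sum_fd / n * i

print(round(mean_midpoint(classes), 2))
print(round(mean_short(classes, origin_index=3), 2))  # origin: class 39-35
```

The two functions agree for any choice of origin, because the short method only shifts and rescales the midpoints before averaging.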
MEDIAN

Steps:
1. Find N/2.
2. Find the accumulated sum of the frequencies up to the class that contains N/2.
3. Use the formula:

$$Md = L + \frac{\frac{N}{2} - cf}{f}\, i$$

where L = lower limit of the class which contains N/2
f = frequency of the class containing N/2
cf = cumulative frequency of all classes below the class containing N/2
MODE

Rough Mode (R. Mo) - obtained by inspection and is equal to the Xm of the class having the highest frequency.

Theoretical Mode (T. Mo) = 3Md - 2x̄


Problem:

For the given frequency


distribution in the previous
problem, compute for the:

Median
R. Mode
T. Mode
Solution: Computing for the Median

i = 5

Classes   f    cf
54-50     4
49-45     7
44-40    12
39-35    10    27
34-30     9    17
29-25     6     8
24-20     2     2
N = 50

$$\frac{N}{2} = \frac{50}{2} = 25$$

$$Md = L + \frac{\frac{N}{2} - cf}{f}\, i = 35 + \frac{25 - 17}{10}(5) = 39$$
Computing for the Mode

Classes   f
54-50     4
49-45     7
44-40    12
39-35    10
34-30     9
29-25     6
24-20     2
N = 50

R. Mode = 42

Since x̄ = 38.1 and Md = 39:
$$T.\,Mode = 3Md - 2\bar{x} = 3(39) - 2(38.1) = 40.8$$
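The median and the two modes can be sketched in code as follows. The function names are hypothetical, the classes are listed from the lowest class up, and, as in the text, L is taken as the lower class limit.

```python
# (lower limit, upper limit, frequency), lowest class first
classes = [
    (20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
    (40, 44, 12), (45, 49, 7), (50, 54, 4),
]

def grouped_mean(classes):
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    """Md = L + (N/2 - cf) / f * i, with L the lower class limit."""
    n = sum(f for _, _, f in classes)
    target = n / 2
    cf = 0  # cumulative frequency below the current class
    for lower, upper, f in classes:
        if f > 0 and cf + f >= target:
            size = upper - lower + 1
            return lower + (target - cf) / f * size
        cf += f
    raise ValueError("empty distribution")

def rough_mode(classes):
    """Class mark of the class with the highest frequency."""
    lower, upper, _ = max(classes, key=lambda c: c[2])
    return (lower + upper) / 2

def theoretical_mode(classes):
    """T. Mo = 3 Md - 2 * mean."""
    return 3 * grouped_median(classes) - 2 * grouped_mean(classes)

print(grouped_median(classes))
print(rough_mode(classes))
print(round(theoretical_mode(classes), 2))
```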
Other Measures of Position

Quartiles
Deciles
Percentiles
Quartiles - those which divide the distribution into 4 parts

$$Q_k = L + \frac{\frac{kN}{4} - cf}{f}\, i$$

Deciles - those which divide the distribution into 10 parts

$$D_k = L + \frac{\frac{kN}{10} - cf}{f}\, i$$

Percentiles - those which divide the distribution into 100 parts

$$P_k = L + \frac{\frac{kN}{100} - cf}{f}\, i$$
Problem:

For the given frequency distribution in


the previous problem, compute for:

Q1
D3
P88
Solution: Computing for Q1

i = 5

Classes   f    cf
54-50     4
49-45     7
44-40    12
39-35    10
34-30     9    17
29-25     6     8
24-20     2     2
N = 50

$$\frac{kN}{4} = \frac{(1)50}{4} = 12.5$$

$$Q_1 = L + \frac{\frac{kN}{4} - cf}{f}\, i = 30 + \frac{12.5 - 8}{9}(5) = 32.5$$
Computing for D3

i = 5

Classes   f    cf
54-50     4
49-45     7
44-40    12
39-35    10
34-30     9    17
29-25     6     8
24-20     2     2
N = 50

$$\frac{kN}{10} = \frac{(3)50}{10} = 15$$

$$D_3 = L + \frac{\frac{kN}{10} - cf}{f}\, i = 30 + \frac{15 - 8}{9}(5) = 33.89$$
Computing for P88

i = 5

Classes   f    cf
54-50     4
49-45     7    46
44-40    12    39
39-35    10    27
34-30     9    17
29-25     6     8
24-20     2     2
N = 50

$$\frac{kN}{100} = \frac{(88)50}{100} = 44$$

$$P_{88} = L + \frac{\frac{kN}{100} - cf}{f}\, i = 45 + \frac{44 - 39}{7}(5) = 48.57$$
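The three formulas differ only in the divisor (4, 10 or 100), so a single routine can cover quartiles, deciles and percentiles. This is a minimal sketch; `grouped_quantile` and its `parts` parameter are hypothetical names, and L is taken as the lower class limit, as in the text.

```python
# (lower limit, upper limit, frequency), lowest class first
classes = [
    (20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
    (40, 44, 12), (45, 49, 7), (50, 54, 4),
]

def grouped_quantile(classes, k, parts):
    """Return L + (k*N/parts - cf) / f * i for grouped data.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts
    cf = 0  # cumulative frequency below the current class
    for lower, upper, f in classes:
        if f > 0 and cf + f >= target:
            size = upper - lower + 1
            return lower + (target - cf) / f * size
        cf += f
    raise ValueError("k is out of range for this distribution")

print(round(grouped_quantile(classes, 1, 4), 2))     # Q1
print(round(grouped_quantile(classes, 3, 10), 2))    # D3
print(round(grouped_quantile(classes, 88, 100), 2))  # P88
```

Calling `grouped_quantile(classes, 2, 4)` gives the median, since Q2, D5 and P50 all coincide with it.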
For Grouped Data ( > 30 values)

MEASURES OF VARIATION

RANGE
The range is computed as the
difference between the upper limit of
the highest class interval and the
lower limit of the lowest class interval.
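VARIANCE

$$\sigma^2 = \frac{\sum f (x_m - \bar{x})^2}{N}$$

STANDARD DEVIATION

$$\sigma = \sqrt{\frac{\sum f (x_m - \bar{x})^2}{N}}$$

MEAN DEVIATION

$$D = \frac{\sum f \left| x_m - \bar{x} \right|}{N}$$

QUARTILE DEVIATION

$$Q = \frac{Q_3 - Q_1}{2}$$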
Problem:

For the given frequency


distribution, determine:

variance
standard deviation
mean deviation
quartile deviation
Classes f Classes f
89-85 1 59-55 7
84-80 1 54-50 6
79-75 2 49-45 6
74-70 3 44-40 6
69-65 4 39-35 3
64-60 4 34-30 1
Solution: Computing for the Mean

Classes   f    Xm   fXm
89-85     1    87    87
84-80     1    82    82
79-75     2    77   154
74-70     3    72   216
69-65     4    67   268
64-60     4    62   248
59-55     7    57   399
54-50     6    52   312
49-45     6    47   282
44-40     6    42   252
39-35     3    37   111
34-30     1    32    32
N = 44         ΣfXm = 2443

$$\bar{x} = \frac{\sum f X_m}{N} = \frac{2443}{44} = 55.5$$
Computing for the Variance

Classes   f    xm - x̄   (xm - x̄)²   f(xm - x̄)²
89-85     1     31.5     992.25       992.25
84-80     1     26.5     702.25       702.25
79-75     2     21.5     462.25       924.50
74-70     3     16.5     272.25       816.75
69-65     4     11.5     132.25       529.00
64-60     4      6.5      42.25       169.00
59-55     7      1.5       2.25        15.75
54-50     6     -3.5      12.25        73.50
49-45     6     -8.5      72.25       433.50
44-40     6    -13.5     182.25      1093.50
39-35     3    -18.5     342.25      1026.75
34-30     1    -23.5     552.25       552.25
N = 44         Σf(xm - x̄)² = 7329

$$\sigma^2 = \frac{\sum f (x_m - \bar{x})^2}{N} = \frac{7329}{44} = 166.57$$
Computing for the Standard Deviation

Since $\sigma^2 = 166.57$:

$$\sigma = \sqrt{\sigma^2} = \sqrt{166.57} = 12.906$$
Computing for the Mean Deviation

Classes   f    |xm - x̄|   f|xm - x̄|
89-85     1     31.5       31.5
84-80     1     26.5       26.5
79-75     2     21.5       43.0
74-70     3     16.5       49.5
69-65     4     11.5       46.0
64-60     4      6.5       26.0
59-55     7      1.5       10.5
54-50     6      3.5       21.0
49-45     6      8.5       51.0
44-40     6     13.5       81.0
39-35     3     18.5       55.5
34-30     1     23.5       23.5
N = 44         Σf|xm - x̄| = 465

$$D = \frac{\sum f \left| x_m - \bar{x} \right|}{N} = \frac{465}{44} = 10.6$$
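Computing for the Quartile Deviation

Classes   f    cf
89-85     1    44
84-80     1    43
79-75     2    42
74-70     3    40
69-65     4    37
64-60     4    33
59-55     7    29
54-50     6    22
49-45     6    16
44-40     6    10
39-35     3     4
34-30     1     1
N = 44

$$Q_k = L + \frac{\frac{kN}{4} - cf}{f}\, i$$

For Q1:
$$\frac{kN}{4} = \frac{1(44)}{4} = 11 \qquad Q_1 = 45 + \frac{11 - 10}{6}(5) = 45.83$$

For Q3:
$$\frac{kN}{4} = \frac{3(44)}{4} = 33$$
The cumulative frequency reaches exactly 33 at the top of class 64-60, so L is the lower limit of the next class, 69-65:
$$Q_3 = 65 + \frac{33 - 33}{4}(5) = 65$$

$$Q = \frac{Q_3 - Q_1}{2} = \frac{65 - 45.83}{2} = 9.585$$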
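The range, variance, standard deviation and mean deviation for this distribution can be checked with the sketch below. The function name `variation_measures` is hypothetical. One assumption differs from the hand computation: the code keeps the unrounded mean (2443/44) instead of 55.5, which changes the results only beyond the decimal places shown in the text.

```python
import math

# (lower limit, upper limit, frequency), lowest class first
classes = [
    (30, 34, 1), (35, 39, 3), (40, 44, 6), (45, 49, 6),
    (50, 54, 6), (55, 59, 7), (60, 64, 4), (65, 69, 4),
    (70, 74, 3), (75, 79, 2), (80, 84, 1), (85, 89, 1),
]

def variation_measures(classes):
    n = sum(f for _, _, f in classes)
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]
    mean = sum(f * xm for xm, f in marks) / n
    variance = sum(f * (xm - mean) ** 2 for xm, f in marks) / n
    mean_dev = sum(f * abs(xm - mean) for xm, f in marks) / n
    return {
        # upper limit of the highest class minus lower limit of the lowest
        "range": classes[-1][1] - classes[0][0],
        "mean": mean,
        "variance": variance,
        "std_dev": math.sqrt(variance),
        "mean_dev": mean_dev,
    }

for name, value in variation_measures(classes).items():
    print(f"{name}: {value:.3f}")
```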
Types of Graphs
BAR GRAPH

The bar graph is particularly useful in


presenting data gathered from discrete
variables on a nominal scale. It uses rectangles
or bars to represent discrete classes of data.
The base of each bar corresponds to a class
interval of the frequency distribution and the
heights of the bars represent the frequencies
associated with each class.
HISTOGRAM

The histogram is similar to a bar chart, but the base of each bar spans the class boundaries rather than the class limits.
FREQUENCY POLYGON

A frequency polygon is a line


graph of class frequencies plotted
against class marks.
Problem:

For the following frequency distribution, construct:
bar graph
histogram
frequency polygon

Classes   f
54-50     4
49-45     7
44-40    12
39-35    10
34-30     9
29-25     6
24-20     2
BAR GRAPH
[Figure: bar graph of Frequency (0 to 15) for the classes 20-24 through 50-54; horizontal axis labeled Class Marks]
HISTOGRAM
[Figure: histogram of Frequency (0 to 15) against Class Boundaries; bars labeled with the frequencies 2, 6, 9, 10, 12, 7 and 4]
FREQUENCY POLYGON
[Figure: frequency polygon of Frequency (0 to 15) for the classes 20-24 through 50-54; horizontal axis labeled Classes]
PIE CHART

A pie chart is used to represent


quantities that make up a whole.
Problem:

The following table classifies enrolment in a certain university. Construct a pie chart to show the enrolment distribution.

Engineering        5280
Commerce           3000
Education          1800
Arts & Sciences    1320
Law                 600

[Figure: pie chart with sectors for Engineering, Commerce, Education, Arts & Sciences and Law]
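To draw the pie chart by hand, each enrolment figure is first converted to a share of the total and then to a central angle out of 360 degrees. The sketch below shows that arithmetic; the function name `pie_sectors` is hypothetical.

```python
enrolment = {
    "Engineering": 5280,
    "Commerce": 3000,
    "Education": 1800,
    "Arts & Sciences": 1320,
    "Law": 600,
}

def pie_sectors(counts):
    """Return {name: (percent of total, central angle in degrees)}."""
    total = sum(counts.values())
    return {
        name: (count / total * 100, count / total * 360)
        for name, count in counts.items()
    }

for name, (percent, angle) in pie_sectors(enrolment).items():
    print(f"{name:<16} {percent:5.1f}%  {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a quick check on the arithmetic before drawing the sectors.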
CUMULATIVE FREQUENCY
CURVE (Ogive Curve)

An ogive curve is a line graph obtained


by plotting values from the tabular
arrangement by class intervals whose
frequencies are cumulated. From this
curve, the centile rank of a certain score
can be determined. A centile rank
denotes the percentage of scores that
fall below a specified score in a
distribution.
Problem:

Construct the ogive curve for the given frequency distribution. What scores correspond to C50 and C88? What is the centile rank of a score of 50?
Classes   f    cf    CP (cf/N x 100)
64-60     2   376   100.0
59-55    12   374    99.5
54-50    20   362    96.3
49-45    32   342    91.0
44-40    46   310    82.4
39-35    58   264    70.2
34-30    64   206    54.8
29-25    58   142    37.8
24-20    42    84    22.3
19-15    23    42    11.2
14-10    15    19     5.1
9-5       4     4     1.1
N = 376
[Figure: ogive curve of CP (0 to 120) plotted against the upper limits (UL) 9, 14, 19, ..., 64]

C50 = 33    C88 = 48    Score 50 = C91
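Readings from an ogive can be approximated by linear interpolation between the plotted points. The sketch below is an illustration under two assumptions that are not stated in the text: the curve starts at cumulative frequency 0 at the upper limit 4 (the class below 9-5), and the points are joined by straight lines. The function names are hypothetical. Interpolated values may differ by about one unit from values read off a hand-drawn curve.

```python
# Upper class limits and cumulative frequencies, lowest class first.
# The first point (4, 0) is an assumed starting point for the curve.
upper_limits = [4, 9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64]
cum_freq = [0, 4, 19, 42, 84, 142, 206, 264, 310, 342, 362, 374, 376]

n = cum_freq[-1]
cum_percent = [cf / n * 100 for cf in cum_freq]

def interpolate(x, xs, ys):
    """Linear interpolation of y at x over increasing xs."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("value outside the plotted range")

def score_at_centile(c):
    """Score below which c percent of the scores fall."""
    return interpolate(c, cum_percent, upper_limits)

def centile_rank(score):
    """Percentage of scores falling below the given score."""
    return interpolate(score, upper_limits, cum_percent)

print(round(score_at_centile(50), 1))
print(round(score_at_centile(88), 1))
print(round(centile_rank(50), 1))
```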


Kurtosis and
Skewness

The measures of skewness and


kurtosis indicate the extent of
departure of a distribution from
normal and permit comparison
of two or more distributions.
KURTOSIS (ku)
Kurtosis refers to the flatness or peakedness of a frequency distribution. It describes the shape of the curve of one distribution in relation to that of another. The coefficient of kurtosis is given by:

$$ku = \frac{Q}{P_{90} - P_{10}}$$

where Q is the quartile deviation.
Types of Kurtosis

leptokurtic (ku < 0.263)


mesokurtic (ku = 0.263)
platykurtic (ku > 0.263)
SKEWNESS (sk)
Skewness refers to the symmetry or asymmetry of a frequency distribution. The coefficient of skewness is given by:

$$sk = \frac{3(\bar{x} - md)}{s}$$

If sk = 0, the distribution is symmetric, as in a normal distribution ($\bar{X} = Md = Mo$).
If sk < 0, the distribution is negatively skewed ($Mo > Md > \bar{X}$).
If sk > 0, the distribution is positively skewed ($\bar{X} > Md > Mo$).
Problem:
For a certain frequency distribution, the following data are given:
s = 13.7      Q3 = 155.8     md = 147.25
x̄ = 147      P90 = 167.5    Q1 = 138
D1 = 128.8
Determine the kurtosis and skewness of the distribution. Is it a normal distribution?
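Solution:

$$ku = \frac{Q}{P_{90} - P_{10}} = \frac{\frac{Q_3 - Q_1}{2}}{P_{90} - D_1} = \frac{\frac{155.8 - 138}{2}}{167.5 - 128.8} = 0.23$$

Distribution is leptokurtic.

$$sk = \frac{3(\bar{x} - md)}{s} = \frac{3(147 - 147.25)}{13.7} = -0.05$$

Distribution is negatively skewed.

Since ku is not 0.263 and sk is not 0, the distribution is not normal.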
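The two coefficients can be computed directly from the summary values. The sketch below uses the figures from the worked solution (mean 147, median 147.25); the function names are hypothetical, and the 0.263 cut-off is the one given in the text.

```python
def kurtosis_coefficient(q1, q3, p10, p90):
    """ku = Q / (P90 - P10), where Q is the quartile deviation."""
    return ((q3 - q1) / 2) / (p90 - p10)

def skewness_coefficient(mean, median, s):
    """sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / s

def kurtosis_type(ku, mesokurtic_value=0.263):
    # Classification as listed in the text.
    if ku < mesokurtic_value:
        return "leptokurtic"
    if ku > mesokurtic_value:
        return "platykurtic"
    return "mesokurtic"

# D1 is the same point as P10.
ku = kurtosis_coefficient(q1=138, q3=155.8, p10=128.8, p90=167.5)
sk = skewness_coefficient(mean=147, median=147.25, s=13.7)

print(round(ku, 2), kurtosis_type(ku))
print(round(sk, 2), "negatively skewed" if sk < 0 else "not negatively skewed")
```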
Student Activity
Part I. Answer the following:

1. Define each of the following:


a. class mark c. histogram
b. ogive d. frequency polygon

2. What advantages does each of the


following forms of presenting data offer?
a. textual b. tabular c. graphical
3. Distinguish between:
a. class limits and class boundaries
b. skewness and kurtosis
4. Give the class mark, the class boundaries
and the interval size for each of the
following:
a. 10 – 19
b. 1.5 – 5.0
c. 12.85 – 13.43
Part II. Solve the following using
Microsoft Excel Applications.

The list below gives the weekly food budget


and weekly incomes for 39 households.

1. Construct frequency distribution table for


food budget using i = 25 and determine:
a. mean
b. median
c. rough and theoretical mode
d. skewness
Food Budget Weekly Income Food Budget Weekly Income
1598 1553 1639 1636
1680 1740 1655 1677
1660 1652 1736 1761
1583 1581 1587 1603
1476 1481 1622 1605
1633 1634 1689 1631
1717 1692 1700 1765
1596 1561 1613 1688
1613 1566 1615 1667
1607 1626 1458 1479
1728 1699 1750 1747
1672 1685 1700 1673
1572 1589 1654 1641
1634 1571 1625 1613
1461 1443 1565 1521
1726 1712 1563 1583
1732 1724 1566 1542
1620 1628 1587 1567
1616 1564 1584 1610
1579 1526
2. Construct frequency distribution table for
weekly income using i = 25 and determine:
a) standard deviation
b) mean deviation
c) quartile deviation
d) kurtosis

3. Plot a bar chart for food budget and


superimpose on it the frequency polygon
for weekly income.
4. Take the difference between weekly
income and food budget for each
household and construct a frequency
distribution and cumulative frequency
distribution.
5. Plot the ogive curve for the data in (4).
What score corresponds to a centile
rank of 71?
Proceed to Topic 4
