Descriptive Statistics
Inferential Statistics
Descriptive Statistics
deals with the collection, organization,
presentation, and computation of data to
describe the samples under investigation.
A numerical property of a
population is called a parameter.
Discrete
Continuous
Discrete
For example,
You can count the change in your pocket.
You can count the money in your bank
account.
You could also count the amount of money
in everyone’s bank account.
It might take you a long time to count that
last item, but the point is — it’s still
countable.
Continuous
1. Nominal Level
Examples:
• black, grey, red, brown
• male, female
Levels of Measurement
2. Ordinal Level
Examples:
• C, DE
• Extra Large, Large, Medium, Small
3. Interval Level
4. Ratio Level
Samples:
= {4, 12, 20, 28, 36, 44, 52, 60, 68, 76, 84, 92}
Stratified Sampling
If the population is divided into
subpopulations, called strata, and we
take a random sample from each
stratum, the resulting sample is a
stratified sample.
Stratified Sampling
Proportional Allocation
Equal Allocation
Example:
Stratified Sampling with Proportional
Allocation
Suppose 40% of the student body are
freshmen, 25% sophomores, 20% juniors,
15% seniors.
Stratified sample size (n = 100) with proportional
allocation
Freshmen - 40
Sophomores - 25
Juniors - 20
Seniors - 15
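The proportional allocation above can be reproduced with a few lines of Python. This is only a sketch: the function name `proportional_allocation` and the dictionary layout are illustrative choices, not part of the original slides.

```python
def proportional_allocation(percent_by_stratum, n):
    """Split a total sample of size n across strata in proportion to their share.

    Rounding can make the parts sum to slightly more or less than n in general;
    with the percentages used here the result is exact.
    """
    return {name: round(n * pct / 100) for name, pct in percent_by_stratum.items()}


# Shares of the student body, taken from the example above.
shares = {"Freshmen": 40, "Sophomores": 25, "Juniors": 20, "Seniors": 15}
allocation = proportional_allocation(shares, 100)
print(allocation)
```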
Example
Stratified Sampling with Equal Allocation
Suppose 50 samples are to be chosen from
5 sections
Section N n
A 48 10
B 50 10
C 52 10
D 49 10
E 50 10
N = 249
Cluster Sampling
A sample of convenience is a
sample that already exists and is
available for study; the elements in the
sample are not chosen by a chance
process. By contrast, a probability
sample is obtained by a chance process;
each element in the population has a
certain probability of being selected.
Methods of Collecting Data
Textual
Tabular
Graphical
Methods of Presenting Data
Textual Method
Collected data are presented in narrative and
paragraph forms.
Tabular Method
Data are orderly arranged and presented in rows
and columns for an easier and more
comprehensible comparison of figures.
Graphical Method
Data gathered are presented in visual or pictorial
form. This would enable the researcher to get a
clear view of the relationships of data through
pictures and colored maps.
Organization of Data
Frequency Distribution
Grouped Frequency Distribution
Frequency Distribution
6 11 5 1 6 6 7 5 7 6 1 7 5 3 6
Solution
The distinct values are 1, 3, 5, 6, 7, and 11, which
occur 2, 1, 3, 5, 3, and 1 times, respectively. This
is an adequate description of the frequency
distribution. However, it is also common to
describe the frequency distribution by means of a
table. The first column of the table contains the
distinct values. In the second column, next to each
distinct value, we record its frequency. So the
frequency distribution is represented as
Solution
x f
1 2
3 1
5 3
6 5
7 3
11 1
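As a cross-check, the same frequency table can be tallied in Python. This sketch uses the standard library's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The fifteen observations from the example above.
data = [6, 11, 5, 1, 6, 6, 7, 5, 7, 6, 1, 7, 5, 3, 6]

# Count how often each distinct value occurs.
frequency = Counter(data)

# Print the table in ascending order of x.
print("x", "f")
for value in sorted(frequency):
    print(value, frequency[value])
```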
Grouped Frequency Distribution
Refers to the tabulation of data by category or
class interval with corresponding frequency for
each class.
Age (classes)    No. of employees (class frequencies)
20 – 29          30
30 – 39          35
40 – 49          20
50 – 59          10
60 – 69          5
Lower limits: 20, 30, 40, 50, 60
Upper limits: 29, 39, 49, 59, 69
Procedure for Constructing a
Grouped Frequency Distribution
Range = 78 – 40 = 38
Number of class intervals (C.I.) = 8
Class size (C.S.) = 38 / 8 = 4.75, rounded up to 5
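The class-size step can be sketched as follows, assuming the class size is always rounded up to the next whole number so that the classes cover the full range. The variable names are illustrative.

```python
import math

highest, lowest = 78, 40      # extreme values from the example above
number_of_classes = 8         # chosen number of class intervals

data_range = highest - lowest                          # 38
class_size = math.ceil(data_range / number_of_classes)  # 4.75 rounded up
print(data_range, class_size)
```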
Example
[Figure: histogram of frequency (axis 0 to 30) against class mark (42, 47, 52, 57, 62, 67, 72, 77)]
Graphical Presentation of a
Frequency Distribution
Frequency Polygon - A line graph where the
class frequencies are plotted against the
class marks.
[Figure: frequency polygon of frequency (axis 0 to 35) against class mark (42, 47, 52, 57, 62, 67, 72, 77)]
Cumulative Frequency Polygon - the graph
of a cumulative frequency distribution.
[Figure: cumulative frequency polygon; frequency (axis 0 to 140) against the values 44.5, 49.5, 54.5, 59.5, 64.5, 69.5, 74.5, 79.5]
Pie Graph or Pie Chart - The pie is subdivided into
segments, each proportional in size to the quantity
or percentage it represents.
The entire circle represents the total population.
[Figure: pie chart with the following segments]
Education - 36%
National Defense - 30%
Public Affairs - 18%
Agricultural Administration - 8%
Health - 4%
DILG - 4%
Measures of Centrality and
Variability
Mean
Median
Mode
Uses of Measures of Central
Location
Mean
Varies less from sample to sample; all data in the
distribution are used. It is affected by extreme
values in the distribution. It is the most stable
measure of central location and is associated with
interval or ratio data.
3 Measures of Central Location
Median
Middle value, not affected by the extreme values
Mode
Appropriate for data that is nominal; it measures
popularity; the value with the highest frequency.
Data
Ungrouped: Array or Matrix
Grouped: Frequency Distribution
Computation of Mean for Ungrouped Data
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + x_3 + \dots + x_n}{n}$$
Where:
∑x - sum of x – values
n - number of items or cases
Example
Nine students got the following scores in a
surprise quiz: 83, 68, 62, 80, 66, 94, 67, 72,
56. Find the mean.
$$\bar{X} = \frac{83 + 68 + 62 + 80 + 66 + 94 + 67 + 72 + 56}{9} = \frac{648}{9} = 72$$
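The same computation as a short Python sketch; the variable names are illustrative.

```python
# Quiz scores of the nine students from the example above.
scores = [83, 68, 62, 80, 66, 94, 67, 72, 56]

total = sum(scores)          # sum of the x-values
mean = total / len(scores)   # divide by the number of cases
print(total, mean)
```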
Computation of Mean for Grouped Data
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{n} = \frac{f_1 x_1 + f_2 x_2 + f_3 x_3 + \dots + f_n x_n}{n}$$
Where:
f - frequency
x - class mark
n - number of samples
Example
Compute the mean score of 50 students using the
data below
Class Interval   Frequency   Class Mark   Product of Frequency and Class Mark
(C.I.)           (f)         (x)          (fx)
94 - 100         2           97           194
87 - 93          3           90           270
80 - 86          9           83           747
73 - 79          18          76           1,368
66 - 72          7           69           483
59 - 65          6           62           372
52 - 58          2           55           110
45 - 51          3           48           144
                 n = 50                   ∑fx = 3,688

$$\bar{X} = \frac{\sum fx}{n} = \frac{3{,}688}{50} = 73.76$$
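A minimal sketch of the grouped-mean computation, assuming each class is represented by its class mark (the midpoint of its limits). The tuple layout and names are illustrative.

```python
# (lower limit, upper limit, frequency) for each class in the table above.
classes = [
    (94, 100, 2), (87, 93, 3), (80, 86, 9), (73, 79, 18),
    (66, 72, 7), (59, 65, 6), (52, 58, 2), (45, 51, 3),
]

n = sum(f for _, _, f in classes)
# Class mark x = midpoint of the class limits; accumulate f * x.
sum_fx = sum(f * (low + high) / 2 for low, high, f in classes)
grouped_mean = sum_fx / n
print(n, sum_fx, grouped_mean)
```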
Computation of Median ($\tilde{X}$)
Ungrouped Data
$$\tilde{X} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item}$$
Example
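There are seven college students in a
classroom with ages 18, 19, 20, 21, 18, 20
and 22. Determine the median.
Solution:
Arrange the data in descending or ascending order of
magnitude: 22, 21, 20, 20, 19, 18, 18.
$$\tilde{X} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{7 + 1}{2}\right)^{\text{th}} = \left(\frac{8}{2}\right)^{\text{th}} = 4^{\text{th}} \text{ item}$$
Median = 20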
Example
There are eight men riding in an elevator
ages 18, 19, 48, 28, 46, 20, 22 and 26. Find
the median.
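Solution:
Arranged in descending order: 48, 46, 28, 26, 22, 20, 19, 18.
$$\tilde{X} = \left(\frac{n + 1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{8 + 1}{2}\right)^{\text{th}} = \left(\frac{9}{2}\right)^{\text{th}} = 4.5^{\text{th}} \text{ item}$$
The 4th and 5th items are the middle values.
$$\text{Median} = \frac{26 + 22}{2} = 24$$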
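Both median examples can be checked with a small Python sketch. The function below follows the ((n + 1)/2)th-item rule: it returns the middle item for an odd count and the average of the two middle items for an even count. The name `median_of` is illustrative.

```python
def median_of(values):
    """Median by the ((n + 1) / 2)-th item rule on the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)-th item is a single value.
        return ordered[middle]
    # Even n: average the two items on either side of position (n + 1) / 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


student_ages = [18, 19, 20, 21, 18, 20, 22]
elevator_ages = [18, 19, 48, 28, 46, 20, 22, 26]
print(median_of(student_ages), median_of(elevator_ages))
```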
Remarks:
$$\tilde{X} = LB_{md} + \left(\frac{\frac{n}{2} - F}{f_{md}}\right) i$$
Where:
LBmd - Lower class boundary of the median class
n - number of samples
F - less-than cumulative frequency before
the median class
fmd - frequency of the median class
i - class size or class width
Example
Compute the median of the given set of data
Class Interval   Frequency   Class Boundaries   Less-than cumulative frequency
40 - 44          9           39.5 - 44.5        80
35 - 39          12          34.5 - 39.5        71
30 - 34          15          29.5 - 34.5        59
25 - 29          19          24.5 - 29.5        44    (median class)
20 - 24          14          19.5 - 24.5        25
15 - 19          6           14.5 - 19.5        11
10 - 14          5           9.5 - 14.5         5
                 n = 80
Example
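Middle item in grouped data is the (n/2)th observation; here, 80/2 = 40th.
$$\begin{aligned}
\tilde{X} &= 24.5 + \left(\frac{\frac{80}{2} - 25}{19}\right) 5 \\
&= 24.5 + \left(\frac{40 - 25}{19}\right) 5 \\
&= 24.5 + \frac{15}{19}(5) \\
&= 24.5 + \frac{75}{19} \\
&= 24.5 + 3.95 \\
\tilde{X} &= 28.45
\end{aligned}$$
(50% or 40 of the samples fall below 28.45 and 50% fall above
the computed median)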
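The grouped-median formula can be sketched in Python as below. The function name `grouped_median` and the input layout (lower class boundary and frequency, listed from the lowest class upward) are assumptions made for illustration; the function also assumes equal class widths.

```python
def grouped_median(classes, width):
    """Median for grouped data: LB_md + ((n/2 - F) / f_md) * i.

    classes: (lower class boundary, frequency) pairs in ascending order.
    width:   common class size i.
    """
    n = sum(f for _, f in classes)
    cumulative = 0  # F, the less-than cumulative frequency so far
    for lower_boundary, f in classes:
        if cumulative + f >= n / 2:
            # This class contains the (n/2)-th observation.
            return lower_boundary + (n / 2 - cumulative) / f * width
        cumulative += f


# Classes from the example above, lowest first.
first_example = [(9.5, 5), (14.5, 6), (19.5, 14), (24.5, 19),
                 (29.5, 15), (34.5, 12), (39.5, 9)]
median_1 = grouped_median(first_example, 5)
print(round(median_1, 2))
```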
Example
Below is the frequency distribution of examination
scores in English for Grade 6. Compute for the
median.
Class Interval   Frequency   Class Boundaries   Less-than cumulative frequency
50 - 54          1           49.5 - 54.5        1
55 - 59          3           54.5 - 59.5        4
60 - 64          9           59.5 - 64.5        13
65 - 69          19          64.5 - 69.5        32    (median class)
70 - 74          13          69.5 - 74.5        45
75 - 79          3           74.5 - 79.5        48
80 - 84          2           79.5 - 84.5        50
                 n = 50
Example
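$$\begin{aligned}
\tilde{X} &= 64.5 + \left(\frac{\frac{50}{2} - 13}{19}\right) 5 \\
&= 64.5 + \left(\frac{25 - 13}{19}\right) 5 \\
&= 64.5 + \frac{12}{19}(5) \\
&= 64.5 + 3.16 \\
\tilde{X} &= 67.66
\end{aligned}$$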
Mode ($\hat{X}$)
mode is 95 (unimodal)
$$\hat{X} = LB_{mo} + \left(\frac{f_{mo} - f_b}{2 f_{mo} - f_b - f_a}\right) i$$
Where:
LBmo - Lower class boundary of the modal class
fmo - Frequency of the modal class
fb - Frequency before the modal class
fa - Frequency after the modal class
Example
Solve for the mode in the following distribution of test
scores in Mathematics taken by 52 students.
Class Interval   Frequency   Class Boundaries
90 - 99          3           89.5 - 99.5
80 - 89          9           79.5 - 89.5
70 - 79          18          69.5 - 79.5    (modal class)
60 - 69          12          59.5 - 69.5
50 - 59          8           49.5 - 59.5
40 - 49          2           39.5 - 49.5
                 n = 52
Example
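$$\begin{aligned}
\hat{X} &= 69.5 + \left(\frac{18 - 12}{2(18) - 12 - 9}\right) 10 \\
&= 69.5 + \left(\frac{6}{36 - 21}\right) 10 \\
&= 69.5 + \frac{6}{15}(10) \\
&= 69.5 + \frac{60}{15} \\
&= 69.5 + 4 \\
\hat{X} &= 73.5
\end{aligned}$$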
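A direct translation of the grouped-mode formula into Python, as a sketch. The function and parameter names are illustrative; `f_b` is the frequency of the class just below the modal class and `f_a` the frequency of the class just above it, as in the worked example.

```python
def grouped_mode(lower_boundary, f_mo, f_b, f_a, width):
    """Mode for grouped data: LB_mo + ((f_mo - f_b) / (2 f_mo - f_b - f_a)) * i."""
    return lower_boundary + (f_mo - f_b) / (2 * f_mo - f_b - f_a) * width


# Modal class 70 - 79: boundary 69.5, frequency 18, neighbours 12 and 9, class size 10.
mode = grouped_mode(69.5, f_mo=18, f_b=12, f_a=9, width=10)
print(mode)
```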
Measures of Dispersion
Range is the difference between the highest (H) and
the lowest (L) data values
R = H–L
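Population variance:
$$\sigma^2 = \frac{N\left(\sum x^2\right) - \left(\sum x\right)^2}{N^2}$$
Sample variance:
$$s^2 = \frac{n\left(\sum x^2\right) - \left(\sum x\right)^2}{n(n - 1)}$$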
Standard Deviation
Is the square root of the variance
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Example
$$\mu = \frac{\sum x}{N} = \frac{6 + 11 + 5 + 1 + 6 + 6 + 7 + 5 + 7 + 6}{10} = \frac{60}{10} = 6$$
Example
x     x - µ   (x - µ)²
6     0       0
11    5       25
5     -1      1
1     -5      25
6     0       0
6     0       0
7     1       1
5     -1      1
7     1       1
6     0       0
              ∑(x - µ)² = 54

$$\sigma^2 = \frac{54}{10} = 5.4$$
$$\sigma = \sqrt{5.4} \approx 2.32$$
Another way of computing population
variance
$$\sigma^2 = \frac{N\left(\sum x^2\right) - \left(\sum x\right)^2}{N^2}$$

x     x²
6     36
11    121
5     25
1     1
6     36
6     36
7     49
5     25
7     49
6     36
∑x = 60   ∑x² = 414

$$\sigma^2 = \frac{10(414) - (60)^2}{(10)^2} = \frac{4140 - 3600}{100} = \frac{540}{100} = 5.4$$
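The two routes to the population variance can be compared numerically. The sketch below computes it once from the squared deviations and once from the sums ∑x and ∑x²; the variable names are illustrative.

```python
import math

# The ten observations from the example above, treated as a population.
data = [6, 11, 5, 1, 6, 6, 7, 5, 7, 6]
N = len(data)
mu = sum(data) / N

# Definition: average squared deviation from the mean.
variance_by_definition = sum((x - mu) ** 2 for x in data) / N

# Computational form: (N * sum(x^2) - (sum x)^2) / N^2.
sum_x = sum(data)
sum_x2 = sum(x * x for x in data)
variance_by_sums = (N * sum_x2 - sum_x ** 2) / N ** 2

sigma = math.sqrt(variance_by_definition)
print(variance_by_definition, variance_by_sums, round(sigma, 2))
```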
Sample Variance and Standard Deviation
from Grouped Data
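$$s^2 = \frac{\sum f (x - \bar{x})^2}{n - 1} \qquad sd = \sqrt{\frac{\sum f (x - \bar{x})^2}{n - 1}}$$
or
$$s^2 = \frac{n\left(\sum x^2 f\right) - \left(\sum x f\right)^2}{n(n - 1)} \qquad sd = \sqrt{\frac{n\left(\sum x^2 f\right) - \left(\sum x f\right)^2}{n(n - 1)}}$$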
Example
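$$\begin{aligned}
sd &= \sqrt{\frac{n\left(\sum x^2 f\right) - \left(\sum x f\right)^2}{n(n - 1)}} \\
&= \sqrt{\frac{40(181{,}525) - (2{,}665)^2}{40(39)}} \\
&= \sqrt{\frac{158{,}775}{1{,}560}} \\
&= \sqrt{101.779} \\
sd &= 10.09
\end{aligned}$$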
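The arithmetic above can be verified from the three summary figures alone (n = 40, ∑x²f = 181,525, ∑xf = 2,665). This sketch takes those sums as given and does not reconstruct the underlying table; the variable names are illustrative.

```python
import math

# Summary figures quoted in the example above.
n = 40
sum_x2f = 181_525
sum_xf = 2_665

numerator = n * sum_x2f - sum_xf ** 2          # 158,775
sample_variance = numerator / (n * (n - 1))    # divide by 40 * 39
sd = math.sqrt(sample_variance)
print(numerator, round(sample_variance, 3), round(sd, 2))
```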
Measures of Position
Are used to describe the standing or place occupied by
a data value relative to the rest of the data.
[Figure: parallel scales of quartiles (Q1 to Q4), deciles (D1 to D10), and percentiles (P10 to P100); the median coincides with Q2, D5, and P50]
For Quartile:
$$Q_i = \frac{i(n)}{4}$$
For Decile:
$$D_i = \frac{i(n)}{10}$$
For Percentile:
$$P_i = \frac{i(n)}{100}$$
Example
Find the first and third quartile (Q1, Q3), fifth and seventh
decile (D5, D7) and 30th percentile and 80th percentile
(P30, P80) given the following set of data.
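Data in descending order (n = 12); items are counted from the lowest value:
40, 35, 32, 30, 28, 25, 22, 20, 18, 15, 10, 8
$$Q_1 = \frac{1(12)}{4} = 3\text{rd item}$$
$$Q_3 = \frac{3(12)}{4} = 9\text{th item}$$
$$D_5 = \frac{5(12)}{10} = 6\text{th item}$$
$$D_7 = \frac{7(12)}{10} = 8.4, \text{ or the 9th item}$$
Q1 = 15 (3rd item), D5 = 22 (6th item), Q3 = D7 = 30 (9th item)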
For the percentiles of the same data set:
$$P_{30} = \frac{30(12)}{100} = 3.6 \approx 4\text{th item}$$
$$P_{80} = \frac{80(12)}{100} = 9.6 \approx 10\text{th item}$$
P30 = 18 (4th item), P80 = 32 (10th item)
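The position rule used in this example can be sketched in Python, assuming (as the worked answers do) that a fractional position is rounded up to the next item and that items are counted from the smallest value. The helper names `position` and `item_at` are illustrative.

```python
def position(k, n, parts):
    """Position k*n/parts among n ordered items, rounded up to a whole item."""
    return -(-k * n // parts)  # integer ceiling division


data = sorted([40, 35, 32, 30, 28, 25, 22, 20, 18, 15, 10, 8])
n = len(data)


def item_at(k, parts):
    """Value at the k-th quantile position, counting from the smallest item."""
    return data[position(k, n, parts) - 1]


q1, q3 = item_at(1, 4), item_at(3, 4)
d5, d7 = item_at(5, 10), item_at(7, 10)
p30, p80 = item_at(30, 100), item_at(80, 100)
print(q1, q3, d5, d7, p30, p80)
```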
Computation of Quantiles for Grouped Data
Quartile:
$$Q_j = L_{Q_j} + \left(\frac{\frac{jn}{4} - F}{f_{Q_j}}\right) i$$
Where:
LQj - lower limit of the class where the (jn/4)th item is found
F - less-than cumulative frequency before the class where the
(jn/4)th item is found
fQj - frequency of the class where the (jn/4)th item is found
i - class size
Decile:
$$D_j = L_{D_j} + \left(\frac{\frac{jn}{10} - F}{f_{D_j}}\right) i$$
Where:
LDj - lower limit of the class where the (jn/10)th item is found
F - less-than cumulative frequency before the class where the
(jn/10)th item is found
fDj - frequency of the class where the (jn/10)th item is found
i - class size
Percentile:
$$P_j = L_{P_j} + \left(\frac{\frac{jn}{100} - F}{f_{P_j}}\right) i$$
Where:
LPj - lower limit of the class where the (jn/100)th item is found
F - less-than cumulative frequency before the class where the
(jn/100)th item is found
fPj - frequency of the class where the (jn/100)th item is found
i - class size
Example
Using the grouped data below, find Q2, D6 and P75
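Class Interval   Frequency (f)   < C.F.
45 - 49          3               40
40 - 44          4               37
35 - 39          5               33
30 - 34          12              28    (Q2 class)
25 - 29          9               16
20 - 24          5               7
15 - 19          2               2
                 n = 40

Location of Q2:
$$\frac{2(40)}{4} = 20\text{th item}$$
$$\begin{aligned}
Q_2 &= 30 + \left(\frac{\frac{2(40)}{4} - 16}{12}\right) 5 \\
&= 30 + \left(\frac{20 - 16}{12}\right) 5 \\
&= 30 + \frac{4}{12}(5) \\
&\approx 30 + 1.67 \\
Q_2 &\approx 31.67
\end{aligned}$$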
Using the same grouped data, the D6 class is 30 - 34 (frequency 12, with a less-than cumulative frequency of 16 before it).
Location of D6:
$$\frac{6(40)}{10} = 24\text{th item}$$
$$\begin{aligned}
D_6 &= 30 + \left(\frac{\frac{6(40)}{10} - 16}{12}\right) 5 \\
&= 30 + \left(\frac{24 - 16}{12}\right) 5 \\
&= 30 + \frac{8}{12}(5) \\
&\approx 30 + 3.3 \\
D_6 &\approx 33.3
\end{aligned}$$
Using the same grouped data, the P75 class is 35 - 39 (frequency 5, with a less-than cumulative frequency of 28 before it).
Location of P75:
$$\frac{75(40)}{100} = 30\text{th item}$$
$$\begin{aligned}
P_{75} &= 35 + \left(\frac{\frac{75(40)}{100} - 28}{5}\right) 5 \\
&= 35 + \left(\frac{30 - 28}{5}\right) 5 \\
&= 35 + 2 \\
P_{75} &= 37
\end{aligned}$$
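The three results can be reproduced with one general function, sketched below. It follows the convention of this example, which measures from the lower class limit (30, 35) rather than the lower class boundary. The function name `grouped_quantile` and the input layout are illustrative assumptions.

```python
def grouped_quantile(classes, width, k, parts):
    """k-th quantile out of `parts` equal parts for grouped data.

    classes: (lower class limit, frequency) pairs in ascending order.
    width:   common class size i.
    parts:   4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, f in classes)
    target = k * n / parts   # location of the quantile, e.g. 2(40)/4 = 20
    cumulative = 0           # F, less-than cumulative frequency so far
    for lower_limit, f in classes:
        if cumulative + f >= target:
            return lower_limit + (target - cumulative) / f * width
        cumulative += f


# Classes from the table above, lowest first.
classes = [(15, 2), (20, 5), (25, 9), (30, 12), (35, 5), (40, 4), (45, 3)]
q2 = grouped_quantile(classes, 5, 2, 4)
d6 = grouped_quantile(classes, 5, 6, 10)
p75 = grouped_quantile(classes, 5, 75, 100)
print(round(q2, 2), round(d6, 2), round(p75, 2))
```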
Standard Score
Formula for the Z-score:
$$Z = \frac{x - \text{mean}}{\text{standard deviation}}$$
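A short sketch of the Z-score formula follows. The slides give no worked example here, so the inputs are illustrative: the quiz scores from the mean example earlier are reused, and the sample standard deviation from Python's `statistics.stdev` is an assumption about which deviation to use.

```python
import statistics


def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above or below the mean."""
    return (x - mean) / standard_deviation


# Illustrative data: the nine quiz scores used in the mean example.
scores = [83, 68, 62, 80, 66, 94, 67, 72, 56]
mean = statistics.mean(scores)
sd = statistics.stdev(scores)  # sample standard deviation (n - 1 divisor)

for score in (56, 72, 94):
    print(score, round(z_score(score, mean, sd), 2))
```

A score equal to the mean has a Z-score of 0; scores above the mean give positive values and scores below it give negative values.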