STATISTICS
Statistics is the science that deals with the collection, organization or presentation, analysis, and interpretation of quantitative data for the decision-making process.

PHASES OF STATISTICS
I. Descriptive Statistics
II. Inferential Statistics

Descriptive statistics
- Is composed of those methods concerning the collection and description of a set of data to yield meaningful information.
- Is a mathematical method used to summarize a set of data.

1. Population refers to the totality of the observations with which we are concerned.
2. Sample is a small part of a population. It could also be referred to as a subgroup, subset, or representative of a population.
3. Parameter is any numerical value describing a characteristic of a population.
4. Statistic is any numerical value describing a characteristic of a sample. It is an estimate of a parameter. It is a value or measurement obtained from a sample.
5. Data are facts or a set of information or observations under consideration, gathered by a researcher from a population or from a sample.
Data may be classified into two:
5.1 Qualitative Data are data which assume values that manifest the concept of attributes. These are also called categorical data. (Qualitative = Quality)
6. Variable is a characteristic of a population or sample which differentiates members from each other.
Variables may be classified into two:
6.1 Discrete Variable is one that can assume specific values only (whole numbers). The values of a discrete variable are obtained through the process of counting.
Example:
number of students present
number of red marbles in a jar
students' grade level
all integers from 1 to 100
6.2 Continuous Variable is one that can assume infinite values within a specific interval. The values of a continuous variable are obtained through measuring.
Example:
height of students in class
weight of students in class
time it takes to get to school
distance traveled between classes
amount of sales of a sari-sari store
area of land
NOTE: Discrete data is counted.
Continuous data is measured.

SCALES OF MEASUREMENT
Measurement is defined as the assignment of symbols or numerals to objects or events according to some rules.
In Statistics, there are four different scales of measurement, namely:
1. Nominal Scale
This is the most primitive level of measurement. This scale is used to distinguish one object from another for identification purposes.
Example:
Gender (Male, Female)
Political Party (Democratic, Republican)
Race
Religion
2. Ordinal Scale
In this scale, data are arranged in some specified order or rank. This measurement allows us to compare objects, but we cannot know the degree of the difference.
Example:
Size (Small, Medium, Large)
The first, third and fifth person in a race.
3. Interval Scale
If data are measured in the interval scale, we can determine the amount of difference between two objects or data. Interval-scale values cannot be meaningfully multiplied or divided. (The scale has unit distance, equality of intervals, and an arbitrary zero point, so only + and - apply.)
Example:
Temperature, in degrees Fahrenheit
Dates (data that has an arbitrary zero)

PARAMETRIC VS NON-PARAMETRIC
I. Interval and ratio data are parametric, and are used with parametric tools in which distributions are predictable (and often Normal).

SUMMATION NOTATION
The summation sign: Σ
I. The summation of the sum of two or more variables is the sum of their summations. Thus,
$$\sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i$$
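The summation rule for sums of variables, together with the rule for a constant factor c, can be checked numerically. The sketch below uses small made-up lists (x, y, z and c are illustrative values, not data from these notes) and simply compares both sides of each rule.

```python
# Numerical check of two summation rules on small illustrative lists.
x = [2, 4, 6]
y = [1, 3, 5]
z = [10, 20, 30]
c = 3  # an arbitrary constant

# Rule I: the sum of (x_i + y_i + z_i) equals sum(x) + sum(y) + sum(z).
lhs_sum_rule = sum(xi + yi + zi for xi, yi, zi in zip(x, y, z))
rhs_sum_rule = sum(x) + sum(y) + sum(z)

# Constant rule: the sum of c * x_i equals c * sum(x).
lhs_const_rule = sum(c * xi for xi in x)
rhs_const_rule = c * sum(x)

print(lhs_sum_rule, rhs_sum_rule)
print(lhs_const_rule, rhs_const_rule)
```

Both printed pairs agree, which is what the rules predict for any lists of equal length.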
II. If c is a constant, then,
$$\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$$

PRESENTATION OF DATA
I. Ungrouped Data
II. Grouped Data
Sometimes, a mass of data is too large to handle, so grouping is necessary to see notable features of the data and obtain appropriate measurements.
Grouped data are data that are organized and arranged into different classes or categories.
Ordered array
- Data arranged from smallest to largest (usually).
Data may be presented in textual form or paragraph form. This involves enumerating significant characteristics and identifying notable features of the data.
1. Below are test scores of 30 students in a math quiz:
25 18 17 12 43 40
33 41 20 35 10 36
28 19 28 42 28 31
40 40 32 26 3 50
26 15 10 35 29 30
NOTE: The highest score obtained in the test is 50 and the lowest score is 3. Furthermore, ten students got scores of 25 and below. Generally, the students' performance is satisfactory with 21 of them or 70% getting scores of 25 and above.
Sometimes, it is quite hard to grasp the data when it is presented in textual form. Hence, we may also present data by using tables. A table has the following parts:
1. Table number - this is for easy reference to the table
2. Table title - briefly explains the content of the table
3. Column header - describes the data in each column
4. Row classifier - shows the classes or categories
5. Body - the main part of the table
6. Source note - placed below the table when the data is obtained from another source.
Arranging the scores of the 30 students from lowest to highest using a stem-and-leaf plot (S = stem, L = leaf):
S L
0 3
1 0025789
2 05668889
3 0123556
4 000123
5 0
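The stem-and-leaf arrangement of the 30 quiz scores can be reproduced with a short script. This is a minimal sketch that assumes two-digit whole-number scores, taking the tens digit as the stem and the units digit as the leaf; the function name stem_and_leaf is illustrative.

```python
from collections import defaultdict

# The 30 quiz scores listed in the notes.
scores = [25, 18, 17, 12, 43, 40,
          33, 41, 20, 35, 10, 36,
          28, 19, 28, 42, 28, 31,
          40, 40, 32, 26, 3, 50,
          26, 15, 10, 35, 29, 30]

def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (units digits)."""
    groups = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        groups[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(groups.items())}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, leaves)
```

Reading the leaves row by row gives the ordered array of the scores, which is why the plot is a convenient way to sort a small data set by hand.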
THE FREQUENCY DISTRIBUTION TABLE
I. Ungrouped data
The frequency distribution table for ungrouped data is the arrangement of values according to magnitude, showing the frequency of occurrence of each value in the data. This is used when the number of items is too large and the range of values is not too wide.
II. Grouped Data
The smallest and the largest values that can fall within the class interval are referred to as the class limits.
A. Upper class limit
- The highest value that can go in a class.
B. Lower class limit
- The smallest value that can go in a class.
Class boundary
- The numbers used to separate classes. The size of the gap between classes is the difference between the upper class limit of one class and the lower class limit of the next class. The boundary lies halfway across this gap (usually 0.5 beyond each limit).
The numerical difference between the upper and lower class boundaries of a class interval (any class) is defined to be the class width or class size.
The midpoint between the upper and lower class boundaries is called the class mark.
The number of observations may be accumulated either from the highest class interval down to the lowest class interval or from the lowest up to the highest. Accumulated from the highest class interval, the values are referred to as the greater than cumulative frequency. Accumulated from the lowest class interval, they are called the less than cumulative frequency.

CLASS MID POINT OR CLASS MARK
The mid value or central value of the class interval is called the mid-point.
$$x_i = \frac{C_l + C_h}{2}$$
Where:
$C_l$ = lower limit of class
$C_h$ = upper limit of class

STURGES FORMULA
A rule for determining the number of classes to use in a histogram or frequency distribution table (approximation).
$$K = 1 + 3.322 \log_{10} n$$
Where:
K = number of classes
n = size of the data

CONSTRUCTION OF FREQUENCY DISTRIBUTION
The following steps are involved in the construction of a frequency distribution.
(1) Find the range of the data: The range is the difference between the largest and the smallest values.
(2) Decide the approximate number of classes into which the data are to be grouped: There are no hard and fast rules for the number of classes. In most cases we have 5 to 20 classes. H.A. Sturges has given a formula for determining the approximate number of classes.
(3) Determine the approximate class interval size: The size of the class interval is obtained by dividing the range of the data by the number of classes and is denoted by h.
NOTE: In case of fractional results, the next higher whole number is taken as the size of the class interval.
(4) Decide the starting point: The lower class limit or class boundary should cover the smallest value in the raw data. It is a multiple of the class interval.
EXAMPLE:
Construct a frequency distribution with a suitable class interval size for the marks obtained by 50 students of a class, given below:
23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15,
21, 51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48,
50, 41, 57, 65, 54, 43, 56, 44, 30, 46, 67, 53
Leaf plot:
S L
1 25
2 1367
3 0345689
4 1233467788
5 0011223445667899
6 02345578
7 257
$$R = H - L = 77 - 12 = 65$$
$$K = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7$$
$$h = \frac{R}{K} = \frac{65}{7} = 9.3 \text{ or } 10$$
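The four construction steps can be sketched in code for the 50 marks of this example. The sketch assumes whole-number marks, rounds Sturges' K to the nearest integer, and takes the starting point as the largest multiple of the class size not exceeding the smallest mark; the variable names are illustrative.

```python
import math

# The 50 marks from the example.
marks = [23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15,
         21, 51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48,
         50, 41, 57, 65, 54, 43, 56, 44, 30, 46, 67, 53]

data_range = max(marks) - min(marks)            # step 1: R = H - L
k = round(1 + 3.322 * math.log10(len(marks)))   # step 2: Sturges' formula, rounded
width = math.ceil(data_range / k)               # step 3: next higher whole number
start = (min(marks) // width) * width           # step 4: multiple of the class size

# Tally each class from its lower limit to its upper limit (inclusive).
table = []
lower = start
while lower <= max(marks):
    upper = lower + width - 1
    freq = sum(lower <= m <= upper for m in marks)
    table.append((lower, upper, freq))
    lower += width

for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

With these choices the script produces seven classes of size 10 starting at 10, and the frequencies add up to the 50 observations.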
Note: To find the class boundaries, we take half of the difference between the lower class limit of the 2nd class and the upper class limit of the 1st class:
$$\frac{20 - 19}{2} = 0.5$$
This value is subtracted from the lower class limit and added to the upper class limit to get the required class boundaries.

Cumulative Frequency Distribution
The total frequency of all classes less than the upper class boundary of a given class is called the cumulative frequency of that class. A table showing the cumulative frequencies is called a cumulative frequency distribution.
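Class boundaries, class marks and both kinds of cumulative frequency can be computed together. The sketch below assumes the classes 10-19 through 70-79 with frequencies tallied from the 50 marks of the example; those class limits and frequencies are derived for illustration rather than quoted from a table in the notes.

```python
from itertools import accumulate

# Assumed classes and frequencies, tallied from the 50 marks of the example.
limits = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
freqs = [2, 4, 7, 10, 16, 8, 3]

# Half the gap between consecutive classes: (20 - 19) / 2 = 0.5.
half_gap = (limits[1][0] - limits[0][1]) / 2

boundaries = [(lo - half_gap, hi + half_gap) for lo, hi in limits]
class_marks = [(lo + hi) / 2 for lo, hi in limits]

less_than_cf = list(accumulate(freqs))                  # from the lowest class up
greater_than_cf = list(accumulate(freqs[::-1]))[::-1]   # from the highest class down

for (lo, hi), (lb, ub), xm, f, lcf, gcf in zip(
        limits, boundaries, class_marks, freqs, less_than_cf, greater_than_cf):
    print(f"{lo}-{hi}  {lb}-{ub}  mark={xm}  f={f}  <cf={lcf}  >cf={gcf}")
```

The last less-than value and the first greater-than value both equal the total number of observations, which is a quick check on the tally.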
B. Horizontal Bar Chart
A line graph is used to show continuing data; how one thing is affected by another. It's clear to see how things are going by the rises and falls a line graph shows. This kind of graph is needed to show the effect of an independent variable on a dependent variable.
A circle or pie graph is used to show how a part of something relates to the whole. This kind of graph is needed to show percentages effectively.
A frequency polygon is a line graph whose bases are the class marks
and whose heights are the frequencies.
An ogive is a line graph where the bases are the class boundaries and the heights are the <cf for the less than ogive and >cf for the greater than ogive.

THE MEAN (AVERAGE VALUE)
Among the measures of central tendency, the mean is the most popular and widely used. It is also called the arithmetic mean.
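$$\bar{x} = \frac{\sum x}{n}$$
Where:
$\bar{x}$ = mean
$\sum x$ = sum of all the values
n = number of observations
Solution:
$$\bar{x} = \frac{18+20+18+20+18+22+24+27+25}{9} = \frac{192}{9} \approx 21.33$$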
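The mean of ungrouped data is a direct translation of the formula. The sketch below applies it to the nine values of the solution and cross-checks the result against the standard library's statistics.fmean.

```python
from statistics import fmean

# The nine values used in the solution.
values = [18, 20, 18, 20, 18, 22, 24, 27, 25]

total = sum(values)   # sum of all the values
n = len(values)       # number of observations
mean = total / n      # mean = sum / n

print(total, n, round(mean, 2))
print(round(fmean(values), 2))   # library cross-check
```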
The formula for finding the mean using the long method is as follows:
$$\bar{x} = \frac{\sum f x_m}{n}$$
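A minimal sketch of the long method follows, reading x_m as the class mark (midpoint) of each class and f as its frequency. The classes and frequencies are the ones tallied from the 50 marks of the earlier frequency distribution example, used here only for illustration.

```python
# (lower limit, upper limit, frequency) per class, tallied from the 50 marks example.
classes = [(10, 19, 2), (20, 29, 4), (30, 39, 7), (40, 49, 10),
           (50, 59, 16), (60, 69, 8), (70, 79, 3)]

n = sum(f for _, _, f in classes)                            # number of observations
sum_fxm = sum(f * (lo + hi) / 2 for lo, hi, f in classes)    # sum of f * x_m
mean_long = sum_fxm / n

print(n, sum_fxm, mean_long)
```

Because every observation is replaced by its class mark, the result is an estimate of the mean of the raw data rather than its exact value.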
$$\bar{x} = \frac{\sum f x_m}{n} = \frac{1950}{60} = 32.50$$
Example 2:
Below is the frequency distribution of the scores of 40 students in a quiz in statistics:
Calculate an estimate of the mean.
Solution:
$$\bar{x} = \frac{\sum f x_m}{n} = \frac{1675}{40} = 41.875$$
This indicates that the mean score of the 40 students in the statistics quiz is 41.875.

B. The Deviation Method
$$\bar{x} = \bar{x}_o + \frac{\sum f d}{n}$$
Where:
$\bar{x}_o$ = assumed mean
f = frequency of values
d = deviation from the assumed mean ($d = x_m - \bar{x}_o$)
n = number of observations
Example 2:
Solution:

C. The Coded Method
$$\bar{x} = \bar{x}_o + \frac{i \sum f u}{n}$$
Where:
$\bar{x}_o$ = assumed mean
f = frequency of values
u = code ($u = \frac{x_m - \bar{x}_o}{i} = \frac{d}{i}$)
i = class size
n = number of observations
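The deviation and coded formulas can be compared on one table. The sketch below again uses the classes tallied from the 50 marks example and picks the class mark with the highest frequency as the assumed mean; any class mark could serve, so that choice is an assumption made for illustration.

```python
class_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5]   # x_m of each class
freqs = [2, 4, 7, 10, 16, 8, 3]                             # f of each class
n = sum(freqs)   # number of observations
i = 10           # class size
x_o = 54.5       # assumed mean (here: class mark of the most frequent class)

# Deviation method: d = x_m - x_o
d = [xm - x_o for xm in class_marks]
sum_fd = sum(f * di for f, di in zip(freqs, d))
mean_deviation = x_o + sum_fd / n

# Coded method: u = d / i
u = [di / i for di in d]
sum_fu = sum(f * ui for f, ui in zip(freqs, u))
mean_coded = x_o + i * sum_fu / n

print(sum_fd, mean_deviation)
print(sum_fu, mean_coded)
```

Both methods return the same value as dividing the sum of f times x_m by n; they only shrink the numbers that have to be multiplied by hand.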
Example 2:
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{2+1+3+0+1+3+6+3+3+5+1+1}{12} = \frac{29}{12} \approx 2.4167$$
2. A family recorded their electrical consumption for the following period.
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{140 + 148 + 165 + 174 + 159 + 143}{6} = \frac{929}{6} = 154.833$$
Solution:
$$\bar{x} = \frac{\sum x}{n} = \frac{101 + 125 + 118 + 128 + 106 + 115 + 99 + 118 + 109}{9} = \frac{1019}{9} = 113.22$$
4. Given the following frequency distribution, compute the mean using a.) the long method b.) the deviation method and c.) the coded method.
Solution:
a. $$\bar{x} = \frac{\sum f x_m}{n} = \frac{6423}{44} = 145.977$$
b. $$\bar{x} = \bar{x}_o + \frac{\sum f d}{n} = 147 + \frac{-45}{44} = 145.977$$

THE MEDIAN
Median is the middle-most value of a given set of values when these values are arranged in an array. An array is an arrangement of values in increasing or decreasing order. In other words, the median is the value that divides the set of values into 2 equal parts.

PROPERTIES OF THE MEDIAN
Below are the properties or characteristics of the median of any distribution:

COMPUTATIONAL PROCEDURE
1. Ungrouped Data
A. Odd number of observations
$$\tilde{x} = \text{value of the } \left(\frac{n+1}{2}\right)\text{th item}$$
Example:
Solution:
Arrange the data according to its magnitude:
25 27 28 30 30 32 33.
$$\tilde{x} = \text{value of the } \left(\frac{7+1}{2}\right)\text{th item} = \text{4th item} = 30$$
B. Even number of observations
$$\tilde{x} = \frac{\left(\frac{n}{2}\right)\text{th item} + \left(\frac{n}{2}+1\right)\text{th item}}{2}$$
Example:
Find the median: 124, 120, 118, 122, 125, 121, 122, 120.
Arrange the data according to its magnitude:
118, 120, 120, 121, 122, 122, 124, 125
Solution:
$$\frac{n}{2} = \frac{8}{2} = 4, \qquad \frac{n}{2} + 1 = \frac{8}{2} + 1 = 5$$
Hence,
$$\tilde{x} = \frac{\text{4th item} + \text{5th item}}{2} = \frac{121 + 122}{2} = 121.5$$

2. Grouped Data
$$\tilde{x} = L_{MD} + \frac{i\left(\frac{n}{2} - {<}cf\right)}{f_{MD}}$$
Where:
$L_{MD}$ = lower class boundary of the median class
i = class size
n = number of observations
<cf = less than cumulative frequency of the class below the median class
$f_{MD}$ = frequency of the median class
STEPS
1. Construct the less than cumulative frequency.
2. Determine the median class. This is the class containing one-half of the total frequency. To do this, compute $\frac{n}{2}$ and look under the less than cumulative frequency column to find the value immediately greater than $\frac{n}{2}$. The class opposite is the median class.
3. Use the formula to find the median.
4. The median is used when the middle value is desired.
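The odd and even rules for ungrouped data given earlier in this section can be sketched as one small function. The function name median_ungrouped is illustrative; the two calls use the data of the two ungrouped examples.

```python
def median_ungrouped(values):
    """Median of ungrouped data using the odd and even item rules."""
    ordered = sorted(values)            # arrange the data according to magnitude
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the ((n + 1) / 2)th item; subtract 1 for 0-based indexing.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average of the (n/2)th and (n/2 + 1)th items.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_case = median_ungrouped([25, 27, 28, 30, 30, 32, 33])
even_case = median_ungrouped([124, 120, 118, 122, 125, 121, 122, 120])

print(odd_case)
print(even_case)
```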
Note:
When the cumulative frequency is exactly equal to $\frac{n}{2}$, the median is the upper boundary of the class interval which takes half of the cases.
Example:
Compute the median of the length of 50 carrots planted in different soil.
$$\frac{n}{2} = \frac{50}{2} = 25, \quad {<}cf = 21, \quad L_{MD} = 170, \quad i = 5, \quad f_{MD} = 9$$
$$\tilde{x} = L_{MD} + \frac{i\left(\frac{n}{2} - {<}cf\right)}{f_{MD}}$$
$$\tilde{x} = 170 + \frac{5(25 - 21)}{9}$$
$$\tilde{x} = 170 + 2.2222$$
$$\tilde{x} = 172.2222$$
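The grouped median formula and the steps for locating the median class can be sketched as follows. The first function applies the formula to the values substituted in the carrot example; the second finds the median class from a frequency table, demonstrated on the classes tallied from the 50 marks example (an illustrative table, since the carrot table itself is not reproduced here). Choosing the first class whose cumulative frequency reaches n/2 is an assumption that also covers the case in the Note, where the cumulative frequency equals n/2 exactly.

```python
from itertools import accumulate

def median_formula(l_md, i, n, cf_before, f_md):
    """Median = L_MD + i * (n/2 - <cf) / f_MD."""
    return l_md + i * (n / 2 - cf_before) / f_md

def grouped_median(lower_boundaries, freqs, width):
    """Locate the median class from the less-than cumulative frequencies, then apply the formula."""
    n = sum(freqs)
    less_than_cf = list(accumulate(freqs))
    # First class whose <cf reaches n/2 (assumed rule; equality gives the upper boundary).
    k = next(j for j, cf in enumerate(less_than_cf) if cf >= n / 2)
    cf_before = less_than_cf[k - 1] if k > 0 else 0
    return median_formula(lower_boundaries[k], width, n, cf_before, freqs[k])

# Values substituted in the carrot example.
carrot_median = median_formula(l_md=170, i=5, n=50, cf_before=21, f_md=9)

# Illustrative table: classes 10-19, ..., 70-79 tallied from the 50 marks.
marks_median = grouped_median(
    lower_boundaries=[9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5],
    freqs=[2, 4, 7, 10, 16, 8, 3],
    width=10,
)

print(round(carrot_median, 4))
print(marks_median)
```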
Example:
Solution:
$$\tilde{x} = 30 + 5.67 = 35.67$$

THE MODE
The mode of any distribution is the value that appears the greatest number of times.
If all the scores in a set of data have equal frequency, then the data has no mode.
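The definition of the mode, including the no-mode case, can be sketched with a frequency count. Returning every value tied for the highest frequency is an assumption of this sketch (the notes do not discuss ties), and the function expects a non-empty list.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or an empty list when all values are equally frequent."""
    counts = Counter(values)
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []   # equal frequencies: the data has no mode
    return sorted(v for v, c in counts.items() if c == highest)

single_mode = modes([18, 20, 18, 20, 18, 22, 24, 27, 25])
no_mode = modes([1, 2, 3])

# The 30 quiz scores from the presentation-of-data example.
quiz_modes = modes([25, 18, 17, 12, 43, 40, 33, 41, 20, 35, 10, 36, 28, 19, 28,
                    42, 28, 31, 40, 40, 32, 26, 3, 50, 26, 15, 10, 35, 29, 30])

print(single_mode, no_mode, quiz_modes)
```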
PROPERTIES OF THE MODE
1. The mode is the most appropriate measure of central tendency when data are nominal in scale.
2. The mode is the least reliable among the measures of central tendency because its value is undefined in some distributions.
3. The mode is used when we want to find the value which occurs most often.

2. Grouped Data
If the data is arranged in a frequency distribution table, the mode may be computed using:
A. The Moment of Force Method
B. The Empirical Method
http://www.dronstudy.com/book/meanmedianmode-of-grouped-
datacumulative-frequency-graph-and-ogive-ex-9d-r-s-aggarwal/
http://www.emathzone.com/tutorials/basic-statistics/example-construction-of-frequency-distribution.html

Definitions
Raw Data
Data collected in original form.
Frequency
The number of times a certain value or class of values occurs.
Frequency Distribution
The organization of raw data in table form with classes and frequencies.
Categorical Frequency Distribution
A frequency distribution in which the data is only nominal or ordinal.
Ungrouped Frequency Distribution
A frequency distribution of numerical data. The raw data is not grouped.
Grouped Frequency Distribution
A frequency distribution where several numbers are grouped into one class.
Class Limits
Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.
Class Boundaries
Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.
Class Width
The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.
Cumulative Frequency
The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.
Relative Frequency
The frequency divided by the total frequency. This gives the percent of values falling in that class.
Cumulative Relative Frequency (Relative Cumulative Frequency)
The running total of the relative frequencies or the cumulative frequency divided by the total frequency. Gives the percent of the values which are less than the upper class boundary.
Stem and Leaf Plot
A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form groups or classes. This is very useful for sorting data quickly.
Histogram
A graph which displays the data by using vertical bars of
various heights to represent frequencies. The horizontal
axis can be either the class boundaries, the class marks, or
the class limits.
Frequency Polygon
A line graph. The frequency is placed along the vertical
axis and the class midpoints are placed along the
horizontal axis. These points are connected with lines.
Ogive
A frequency polygon of the cumulative frequency or the
relative cumulative frequency. The vertical axis is the
cumulative frequency or relative cumulative frequency.
The horizontal axis is the class boundaries. The graph
always starts at zero at the lowest class boundary and will
end up at the total frequency (for a cumulative frequency)
or 1.00 (for a relative cumulative frequency).
Pareto Chart
A bar graph for qualitative data with the bars arranged
according to frequency.
Pie Chart
Graphical depiction of data as slices of a pie. The
frequency determines the size of the slice. The number of
degrees in any slice is the relative frequency times 360
degrees.
Pictograph
A graph that uses pictures to represent data.
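Several of these definitions are simple ratios and can be computed together. The sketch below derives relative frequency, cumulative relative frequency and pie chart angles for the classes tallied from the 50 marks of the frequency distribution example; the labels and frequencies are that illustrative table, not one printed in the notes.

```python
from itertools import accumulate

labels = ["10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
freqs = [2, 4, 7, 10, 16, 8, 3]
total = sum(freqs)

relative = [f / total for f in freqs]                            # frequency / total frequency
cumulative_relative = [cf / total for cf in accumulate(freqs)]   # cumulative frequency / total
degrees = [r * 360 for r in relative]                            # pie slice angle per class

for label, r, cr, deg in zip(labels, relative, cumulative_relative, degrees):
    print(f"{label}: rel={r:.2f}  cum_rel={cr:.2f}  slice={deg:.1f} degrees")
```

The cumulative relative frequency ends at 1.00 and the slice angles add up to 360 degrees, matching the Ogive and Pie Chart definitions.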