Statistics

STATISTICS

Statistics is the science that deals with the collection, organization or presentation, analysis, and interpretation of quantitative data for the decision-making process.

PHASES OF STATISTICS

I. Descriptive Statistics
II. Inferential Statistics

Descriptive Statistics
- Is composed of those methods concerning the collection and description of a set of data to yield meaningful information.
- Is a mathematical method used to summarize a set of data.

Inferential Statistics
- Is composed of those methods concerned with the analysis of a smaller group of data (sample), leading to predictions or inferences about the larger set of data (population) from which the sample is drawn.

TERMINOLOGIES IN STATISTICS

1. Population refers to the totality of the observations with which we are concerned.
2. Sample is a small part of a population. It could also be referred to as a subgroup, subset, or representative of a population.
3. Parameter is any numerical value describing a characteristic of a population.
4. Statistic is any numerical value describing a characteristic of a sample. It is an estimate of a parameter. It is a value or measurement obtained from a sample.
5. Data are facts or a set of information or observations under consideration, gathered by a researcher from a population or from a sample.

Data may be classified into two:

5.1 Qualitative Data are data which assume values that manifest the concept of attributes. These are also called categorical data. (Qualitative = Quality)

Example:
- Color of the skin
- Place of birth
- Civil status
- Color of the sky
- Nationality

5.2 Quantitative Data are data which are numerical in nature. These are obtained from counting or measuring. (Quantitative = Quantity)

Example:
- Age of teachers
- Number of students in a room
- Grade point average
- Speed of a car

NOTE: Quantitative observations are numerical values (can be discrete or continuous variables).
6. Variable is a characteristic of a population or sample which differentiates members from each other.

Variables may be classified into two:

6.1 Discrete Variable is one that can assume specific values only (whole numbers). The values of a discrete variable are obtained through the process of counting.

Example:
- number of students present
- number of red marbles in a jar
- students' grade level
- all integers from 1 to 100

6.2 Continuous Variable is one that can assume infinitely many values within a specific interval. The values of a continuous variable are obtained through measuring.

Example:
- height of students in class
- weight of students in class
- time it takes to get to school
- distance traveled between classes
- amount of sales of a sari-sari store
- area of land

NOTE: Discrete data are counted. Continuous data are measured.

SCALES OF MEASUREMENT

Measurement is defined as the assignment of symbols or numerals to objects or events according to some rules.

In Statistics, there are four different scales of measurement, namely:

1. Nominal Scale
This is the most primitive level of measurement. This scale is used to distinguish one object from another for identification purposes.

Example:
- Gender (Male, Female)
- Political Party (Democratic, Republican)
- Race
- Religion

2. Ordinal Scale
In this scale, data are arranged in some specified order or rank. This measurement allows us to compare objects, but we cannot know the degree of the difference.

Example:
- Size (Small, Medium, Large)
- The first, third and fifth person in a race.
3. Interval Scale
If data are measured in the interval scale, we can determine the amount of difference between two objects or data. Interval-scale values cannot be meaningfully multiplied or divided (the scale has a unit distance and equality of intervals but an arbitrary zero point, so only + and - apply).

Example:
- Temperature, in degrees Fahrenheit
- Dates (data that have an arbitrary zero)

4. Ratio Scale
The ratio level has an absolute or true zero point. The values are quantitative and can be compared as multiples of one another.

Example:
- Weight, Height, Length
- Area

PARAMETRIC VS NON-PARAMETRIC

I. Interval and ratio data are parametric, and are used with parametric tools in which distributions are predictable (and often Normal).
II. Nominal and ordinal data are non-parametric, and do not assume any distribution. They are used with non-parametric tools such as the histogram.

SUMMATION NOTATION

The summation sign: Σ

This appears as the symbol Σ, which is the Greek upper-case letter sigma. The summation sign, Σ, instructs us to sum the elements of a sequence. A typical element of the sequence which is being summed appears to the right of the summation sign.

THEOREMS ON SUMMATION

I. The summation of the sum of two or more variables is the sum of their summations. Thus,

\[ \sum_{i=1}^{n} (x_i + y_i + z_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i + \sum_{i=1}^{n} z_i \]
II. If c is a constant, then

\[ \sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i \]

III. If c is a constant, then

\[ \sum_{i=1}^{n} c = nc \]

PRESENTATION OF DATA

Data must be presented in a systematic and organized manner so that important characteristics can easily be seen. There are two ways of classifying data:

I. Ungrouped Data
II. Grouped Data

Ungrouped data are data that are not systematically organized. If they are arranged, however, the arrangement is only according to magnitude.

Sometimes, a mass of data is too large to handle, so that grouping is necessary to see notable features of the data and obtain appropriate measurements.

Grouped data are data that are organized and arranged into different classes or categories.

STEM AND LEAF PLOT

A stem and leaf plot is a method used to organize statistical data. The greatest common place value of the data is used to form the stem. The next greatest common place value is used to form the leaves.
Ordered array
- Data arranged from smallest to largest (usually).

TEXTUAL PRESENTATION

Data may be presented in textual form or paragraph form. This involves enumerating significant characteristics and identifying notable features of the data.

1. Below are test scores of 30 students in a math quiz:

25 18 17 12 43 40
33 41 20 35 10 36
28 19 28 42 28 31
40 40 32 26  3 50
26 15 10 35 29 30

Arranging the scores from lowest to highest using a stem and leaf plot:

S | L
0 | 3
1 | 0025789
2 | 05668889
3 | 0123556
4 | 000123
5 | 0

NOTE: The highest score obtained in the test is 50 and the lowest score is 3. Furthermore, ten students got scores of 25 and below. Generally, the students' performance is satisfactory, with 21 of them or 70% getting scores of 25 and above.

TABULAR PRESENTATION

Sometimes, it is quite hard to grasp the data when it is presented in textual form. Hence, we may also present data by using tables.

A table has the following parts:
1. Table number - this is for easy reference to the table
2. Table title - briefly explains the content of the table
3. Column header - describes the data in each column
4. Row classifier - shows the classes or categories
5. Body - the main part of the table
6. Source note - placed below the table when the data is obtained from another source.
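The stem and leaf plot of the 30 quiz scores above can be reproduced with a short Python sketch. The function name `stem_and_leaf` is an illustrative choice, and the sketch assumes non-negative whole-number scores so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# The 30 quiz scores from the textual presentation example.
scores = [25, 18, 17, 12, 43, 40,
          33, 41, 20, 35, 10, 36,
          28, 19, 28, 42, 28, 31,
          40, 40, 32, 26, 3, 50,
          26, 15, 10, 35, 29, 30]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (units digits) in ascending order."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in sorted(plot.items())}


plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", leaves)
```

Sorting the values first means each row of leaves is already in order, which is why the plot doubles as an ordered array.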
THE FREQUENCY DISTRIBUTION TABLE

A frequency distribution table is a table which shows the data arranged into different classes and the number of cases which fall into each class.

I. Ungrouped Data

The frequency distribution table for ungrouped data is the arrangement of values according to magnitude, showing the frequency of occurrence of each value in the data. This is used when the number of items is too large and the range of values is not too wide.

Example: [table not reproduced]

II. Grouped Data

A large mass of data can be readily analyzed by grouping the data into different classes and determining the number of observations that fall within each class. Such grouping, in tabular form, is called a frequency distribution.

The smallest and the largest values that can fall within the class interval are referred to as the class limits.

A. Upper class limit
- The highest value that can go in a class.
B. Lower class limit
- The smallest value that can go in a class.

A more precise expression of the class interval is called the class boundaries.

Class boundary
- The numbers used to separate classes. The size of the gap between classes is the difference between the upper class limit of one class and the lower class limit of the next class; each boundary lies halfway across this gap (usually 0.5 from the limit).

The number of observations falling within a specific class is called the frequency.
The numerical difference between the upper and lower class boundaries of a class interval (any class) is defined to be the class width or class size.

The midpoint between the upper and lower class boundaries is called the class mark.

The number of observations may be accumulated either from the highest class interval to the lowest or from the lowest to the highest. When accumulated from the highest class interval downward, the accumulated values are referred to as the greater than cumulative frequency. When accumulated from the lowest class interval upward, they are called the less than cumulative frequency.

MAGNITUDE OF CLASS INTERVAL

The magnitude of the class interval depends on the range and the number of classes. The range is the difference between the highest and lowest values in the data series. A class interval is generally in multiples of 5, 10, 15 and 20.

STURGES FORMULA

A rule for determining the number of classes to use in a histogram or frequency distribution table (approximation).

\[ K = 1 + 3.322 \log_{10} n \]

Where:
K = number of classes
n = size of the data

CLASS MID POINT OR CLASS MARK

The mid value or central value of the class interval is called the mid-point.

\[ x_i = \frac{C_l + C_h}{2} \]

Where:
\(C_l\) = lower limit of the class
\(C_h\) = upper limit of the class

STURGES FORMULA TO FIND SIZE OF CLASS INTERVAL (h)

\[ h = \frac{R}{K} \]

where R is the range and K is the number of classes.

CONSTRUCTION OF FREQUENCY DISTRIBUTION

The following steps are involved in the construction of a frequency distribution.
(1) Find the range of the data: The range is the difference between the largest and the smallest values.

(2) Decide the approximate number of classes into which the data are to be grouped: There are no hard and fast rules for the number of classes. In most cases we have 5 to 20 classes. H.A. Sturges has given a formula for determining the approximate number of classes.

(3) Determine the approximate class interval size: The size of the class interval, denoted by h, is obtained by dividing the range of the data by the number of classes.

NOTE: In case of fractional results, the next higher whole number is taken as the size of the class interval.

(4) Decide the starting point: The lower class limit or class boundary should cover the smallest value in the raw data. It is a multiple of the class interval.

EXAMPLE:

Construct a frequency distribution with a suitable class interval size for the marks obtained by 50 students of a class, given below:

23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15,
21, 51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48,
50, 41, 57, 65, 54, 43, 56, 44, 30, 46, 67, 53

Leaf plot:

S | L
1 | 25
2 | 1367
3 | 0345689
4 | 1233467788
5 | 0011223445667899
6 | 02345578
7 | 257

\[ R = H - L = 77 - 12 = 65 \]

\[ K = 1 + 3.322 \log_{10} n = 1 + 3.322 \log_{10} 50 = 6.64 \approx 7 \]

\[ h = \frac{R}{K} = \frac{65}{7} = 9.3, \text{ taken as } 10 \]
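The four construction steps can be traced on the 50 marks with a minimal Python sketch. It assumes that K is rounded to the nearest whole number (6.64 becomes 7, as in the example), that h is rounded up to the next whole number, and that the first lower class limit is the largest multiple of h not exceeding the smallest mark; the variable names are illustrative.

```python
import math

# Marks obtained by the 50 students in the example.
marks = [23, 50, 38, 42, 63, 75, 12, 33, 26, 39, 35, 47, 43, 52, 56, 59, 64, 77, 15,
         21, 51, 54, 72, 68, 36, 65, 52, 60, 27, 34, 47, 48, 55, 58, 59, 62, 51, 48,
         50, 41, 57, 65, 54, 43, 56, 44, 30, 46, 67, 53]

# Step 1: range R = highest value - lowest value.
data_range = max(marks) - min(marks)

# Step 2: Sturges' formula, rounded to the nearest whole number (assumption).
k = round(1 + 3.322 * math.log10(len(marks)))

# Step 3: class interval size, taking the next higher whole number.
h = math.ceil(data_range / k)

# Step 4: start at a multiple of h that covers the smallest value (assumption).
start = (min(marks) // h) * h

# Build the class limits and tally the frequency of each class.
classes = []
lower = start
while lower <= max(marks):
    classes.append((lower, lower + h - 1))
    lower += h

frequencies = [sum(lo <= m <= hi for m in marks) for lo, hi in classes]

for (lo, hi), f in zip(classes, frequencies):
    print(f"{lo}-{hi}: {f}")
```

With these choices the classes run from 10-19 to 70-79, and the tallies agree with the row lengths of the leaf plot above.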
Cumulative Frequency Distribution

The total frequency of all classes less than the upper class boundary of a given class is called the cumulative frequency of that class. A table showing the cumulative frequencies is called a cumulative frequency distribution. There are two types of cumulative frequency distributions.

Less than cumulative frequency distribution:
It is obtained by adding successively the frequencies of all the previous classes, including the class against which it is written. The cumulation starts from the lowest and proceeds to the highest size.

More than cumulative frequency distribution:
It is obtained by finding the cumulative total of frequencies starting from the highest to the lowest class.

Note: To find the class boundaries, we take half of the difference between the lower class limit of the 2nd class and the upper class limit of the 1st class:

\[ \frac{20 - 19}{2} = 0.5 \]

This value is subtracted from the lower class limit and added to the upper class limit to get the required class boundaries.

Graphical Presentation

Some readers find the graphical presentation easier to comprehend than the tabular or textual presentation; it also adds life and beauty to one's work.

A bar chart is a graph represented by either vertical or horizontal rectangles.

A. Vertical Bar Chart
B. Horizontal Bar Chart

A line graph is used to show continuing data; how one thing is affected by another. It's clear to see how things are going by the rises and falls a line graph shows. This kind of graph is needed to show the effect of an independent variable on a dependent variable.

A circle or pie graph is used to show how a part of something relates to the whole. This kind of graph is needed to show percentages effectively.
A frequency polygon is a line graph whose bases are the class marks and whose heights are the frequencies.

A histogram is a graph represented by vertical rectangles whose bases are the class marks and whose heights are the frequencies.
An ogive is a line graph where the bases are the class boundaries and the heights are the <cf for the less than ogive and the >cf for the greater than ogive.

MEASURES OF CENTRAL TENDENCY

A measure of central tendency is a score that indicates where the center of the distribution tends to be located.

THE MEAN (AVERAGE VALUE)

Among the measures of central tendency, the mean is the most popular and widely used. It is also called the arithmetic mean.

The mean of a set of values or measurements is the sum of all the measurements divided by the number of measurements in the set.

PROPERTIES OF THE MEAN

1. The mean is the most appropriate measure of central tendency when the data are in the interval or ratio scale.
2. The mean lies between the largest and smallest values or measurements.
3. The value of the mean is unique for a given set of data.
4. The mean is easily affected by extreme values since all values contribute to the average.

COMPUTATIONAL PROCEDURE

1. Ungrouped Data

A. The Simple Arithmetic Mean
For ungrouped or raw data, the mean, denoted by \(\bar{x}\), has the following formula:
\[ \bar{x} = \frac{\sum x}{n} \]

Where:
x = the measurements
n = number of observations or measurements

Example:
A researcher wants to determine the average age of working students. A random sample of 10 students working in one branch of Jollibee was asked about their ages. The following information was obtained: 18, 20, 18, 20, 18, 22, 24, 27, 25. Compute the mean age.

Solution (using the nine ages listed):

\[ \bar{x} = \frac{18+20+18+20+18+22+24+27+25}{9} = \frac{192}{9} \approx 21.33 \]

2. Grouped Data

A. The Long Method
The formula for finding the mean using the long method is as follows:

\[ \bar{x} = \frac{\sum f x_m}{n} \]

Where:
f = frequency of values
\(x_m\) = midpoint of each class (class mark)
n = total number of observations
\(\sum f x_m\) = sum of the products of the mid-interval values and their corresponding frequencies

Example 1:
The masses of 60 potatoes are measured. The table shows the results. [table not reproduced]

Calculate an estimate of the mean.

Solution:
\[ \bar{x} = \frac{\sum f x_m}{n} = \frac{1950}{60} = 32.50 \]

Example 2:
Below is the frequency distribution of the scores of 40 students in a quiz in statistics: [table not reproduced]

Calculate an estimate of the mean.

Solution:

\[ \bar{x} = \frac{\sum f x_m}{n} = \frac{1675}{40} = 41.875 \]

This indicates that the mean score in the statistics quiz of the 40 students is 41.875.

B. The Deviation Method
An easy way of finding the mean is the deviation method, which has the following formula:

\[ \bar{x} = \bar{x}_o + \frac{\sum f d}{n} \]

Where:
\(\bar{x}_o\) = assumed mean
f = frequency of values
d = deviation from the assumed mean (\(d = x_m - \bar{x}_o\))
n = number of observations
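As a rough check that the long method and the deviation method give the same estimate, here is a small Python sketch. The frequency table in it is hypothetical (it is not one of the tables from the examples, which are not reproduced here), and the assumed mean is taken as the class mark of the most frequent class purely for convenience.

```python
# Hypothetical grouped data: class limits and their frequencies.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59)]
freqs = [3, 7, 12, 6, 2]

n = sum(freqs)
midpoints = [(lo + hi) / 2 for lo, hi in classes]  # class marks x_m

# Long method: mean = (sum of f * x_m) / n
mean_long = sum(f * xm for f, xm in zip(freqs, midpoints)) / n

# Deviation method: mean = x_o + (sum of f * d) / n, with d = x_m - x_o
assumed_mean = 34.5  # hypothetical choice: class mark of the 30-39 class
deviations = [xm - assumed_mean for xm in midpoints]
mean_deviation = assumed_mean + sum(f * d for f, d in zip(freqs, deviations)) / n

print(mean_long, mean_deviation)
```

Any class mark may serve as the assumed mean; a different choice changes the deviations but not the final result.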
Example 2:
Below is the frequency distribution of the scores of 40 students in a quiz in statistics: (Solve using the deviation method) [table not reproduced]

Solution:

C. The Coded Method
Another procedure for finding the mean of a given set of grouped data is the coded method, which has the following formula:

\[ \bar{x} = \bar{x}_o + \frac{i \sum f u}{n} \]
Where:
\(\bar{x}_o\) = assumed mean
f = frequency of values
u = code (\(u = \frac{x_m - \bar{x}_o}{i} = \frac{d}{i}\))
i = class size
n = number of observations
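The coded method can be sketched in Python as follows. The frequency table is hypothetical and the function name `coded_mean` is an illustrative choice; the last line only re-evaluates the arithmetic quoted in the notes for the case with assumed mean 147, class size 5, a coded sum of -9 and n = 44.

```python
def coded_mean(midpoints, freqs, assumed_mean, class_size):
    """Mean of grouped data by the coded method: x_o + i * sum(f * u) / n."""
    n = sum(freqs)
    codes = [(xm - assumed_mean) / class_size for xm in midpoints]  # u = d / i
    return assumed_mean + class_size * sum(f * u for f, u in zip(freqs, codes)) / n


# Hypothetical grouped data: class marks of classes 10-19, 20-29, ..., 50-59.
midpoints = [14.5, 24.5, 34.5, 44.5, 54.5]
freqs = [3, 7, 12, 6, 2]

mean_coded = coded_mean(midpoints, freqs, assumed_mean=34.5, class_size=10)
print(mean_coded)

# Arithmetic check of the quoted case: 147 + 5(-9)/44
check = 147 + 5 * (-9) / 44
print(round(check, 3))
```

Because the codes are small whole numbers when the classes have equal size, this method keeps the hand computation short.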
Example 2:
Below is the frequency distribution of the scores of 40 students in a quiz in statistics: (Solve using the coded method) [table not reproduced]

1. The number of incorrect answers on a multiple-choice competency exam for a random sample of 12 students are as follows: 2, 1, 3, 0, 1, 3, 6, 3, 3, 5, 2, 1. Find the mean number of mistakes.

Solution:

\[ \bar{x} = \frac{\sum x}{n} = \frac{2+1+3+0+1+3+6+3+3+5+2+1}{12} = \frac{30}{12} = 2.5 \]
2. A family recorded their electrical consumption for the following period. [table not reproduced]

\[ \bar{x} = \frac{\sum x}{n} = \frac{140 + 148 + 165 + 174 + 159 + 143}{6} = 154.833 \]

3. The average IQ of 10 students in a mathematics course is 114. If 9 of the students have IQs of 101, 125, 118, 128, 106, 115, 99, 118, and 109, what must be the other IQ?

The nine known IQs sum to

\[ 101 + 125 + 118 + 128 + 106 + 115 + 99 + 118 + 109 = 1019 \]

so their mean alone is \(1019 / 9 = 113.22\). Since the mean of all 10 students is 114, the total must be \(\sum x = n\bar{x} = 10(114) = 1140\). The other IQ is therefore \(1140 - 1019 = 121\).

4. Given the following frequency distribution, compute the mean using a.) the long method, b.) the deviation method, and c.) the coded method. [table not reproduced]

a. \[ \bar{x} = \frac{\sum f x_m}{n} = \frac{6423}{44} = 145.977 \]
b. \[ \bar{x} = \bar{x}_o + \frac{\sum f d}{n} = 147 + \frac{-45}{44} = 145.977 \]

c. \[ \bar{x} = \bar{x}_o + \frac{i \sum f u}{n} = 147 + \frac{5(-9)}{44} = 145.977 \]

THE MEDIAN

Another measure of central tendency is the median, denoted by \(\tilde{x}\).

The median is the middle-most value of a given set of values when these values are arranged in an array. An array is an arrangement of values in increasing or decreasing order. In other words, the median is the value that divides the set of values into 2 equal parts.

PROPERTIES OF THE MEDIAN

Below are the properties or characteristics of the median of any distribution:

1. The median is the most appropriate measure of central tendency for ordinal data.
2. The median lies between the highest and lowest values.
3. There is only one value for the median for a given set of values.
4. The median is not affected by extreme values.

NOTE: Before doing the computation, the values must first be arranged in an array.

1. Ungrouped Data

A. Odd number of observations

\[ \tilde{x} = \text{value of the } \left( \frac{n+1}{2} \right)^{th} \text{ item} \]

Example:
Find the median of the following set of measurements: 25, 32, 33, 27, 30, 30, 28.
Solution:
Arrange the data according to magnitude: 25, 27, 28, 30, 30, 32, 33.

\[ \frac{n+1}{2} = \frac{7+1}{2} = 4 \]

Therefore, the median is the 4th item. Thus,

\[ \tilde{x} = 30 \]

B. Even number of observations

\[ \tilde{x} = \frac{\left( \frac{n}{2} \right)^{th} \text{ item} + \left( \frac{n}{2} + 1 \right)^{th} \text{ item}}{2} \]

Example:
Find the median: 124, 120, 118, 122, 125, 121, 122, 120.

Solution:
Arrange the data according to magnitude: 118, 120, 120, 121, 122, 122, 124, 125

\[ \frac{n}{2} = \frac{8}{2} = 4, \qquad \frac{n}{2} + 1 = \frac{8}{2} + 1 = 5 \]

Hence,

\[ \tilde{x} = \frac{4^{th} \text{ item} + 5^{th} \text{ item}}{2} = \frac{121 + 122}{2} = 121.5 \]

2. Grouped Data

When data are grouped in a frequency distribution, the median is computed using the following formula:

\[ \tilde{x} = L_{MD} + \frac{i \left( \frac{n}{2} - {<}cf \right)}{f_{MD}} \]

Where:
\(L_{MD}\) = lower class boundary of the median class
i = class size
n = number of observations
<cf = less than cumulative frequency of the class below the median class
\(f_{MD}\) = frequency of the median class

STEPS
1. Construct the less than cumulative frequency.
2. Determine the median class. This is the class containing one-half of the total frequency. To do this, compute \(\frac{n}{2}\) and look under the less than cumulative frequency column to find the value immediately greater than \(\frac{n}{2}\). The class opposite is the median class.
3. Use the formula to find the median.
4. The median is used when the middle value is desired.
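The median rules above can be sketched in Python as follows. The function names are illustrative, the ungrouped calls use the two worked examples, and the grouped call uses a hypothetical frequency table with class boundaries 9.5, 19.5, and so on. The median class is taken as the first class whose less than cumulative frequency reaches n/2, which is an assumption about how ties are handled.

```python
def median_ungrouped(values):
    """Middle item of the array (odd n) or mean of the two middle items (even n)."""
    array = sorted(values)
    n = len(array)
    mid = n // 2
    if n % 2 == 1:
        return array[mid]                     # the ((n + 1) / 2)th item
    return (array[mid - 1] + array[mid]) / 2  # (n/2)th and (n/2 + 1)th items


def median_grouped(lower_boundaries, class_size, freqs):
    """Median = L_MD + i * (n/2 - <cf) / f_MD, with the median class found by scanning."""
    n = sum(freqs)
    cf_below = 0  # less than cumulative frequency of the classes already passed
    for lower, f in zip(lower_boundaries, freqs):
        if cf_below + f >= n / 2:
            return lower + class_size * (n / 2 - cf_below) / f
        cf_below += f
    raise ValueError("frequencies must not be empty")


odd_median = median_ungrouped([25, 32, 33, 27, 30, 30, 28])
even_median = median_ungrouped([124, 120, 118, 122, 125, 121, 122, 120])

# Hypothetical grouped data: classes 10-19, 20-29, 30-39, 40-49, 50-59.
grouped_median = median_grouped([9.5, 19.5, 29.5, 39.5, 49.5], 10, [3, 7, 12, 6, 2])

print(odd_median, even_median, grouped_median)
```

In the hypothetical table n/2 = 15 and the less than cumulative frequencies are 3, 10, 22, so the median class is the third one and <cf = 10.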
Note: When the cumulative frequency is exactly equal to \(\frac{n}{2}\), the median is the upper boundary of the class interval which takes half of the cases.

Example:
Compute the median of the length of 50 carrots planted in different soil. [table not reproduced]

\[ \frac{n}{2} = \frac{50}{2} = 25, \quad {<}cf = 21, \quad L_{MD} = 170, \quad i = 5, \quad f_{MD} = 9 \]

\[ \tilde{x} = L_{MD} + \frac{i \left( \frac{n}{2} - {<}cf \right)}{f_{MD}} = 170 + \frac{5(25 - 21)}{9} = 170 + 2.2222 = 172.2222 \]

Example:
Given below is the age distribution of patients in a hospital who have dengue fever in the town of Santa Rosa. Compute the median of the given data. [table not reproduced]
\[ \frac{n}{2} = \frac{100}{2} = 50, \quad {<}cf = 33, \quad L_{MD} = 30, \quad i = 10, \quad f_{MD} = 30 \]

\[ \tilde{x} = L_{MD} + \frac{i \left( \frac{n}{2} - {<}cf \right)}{f_{MD}} = 30 + \frac{10(50 - 33)}{30} = 30 + 5.67 = 35.67 \]

THE MODE

Another measure of central tendency is the mode, denoted by \(\hat{x}\).

The mode of any distribution is the value that appears the most number of times.

The mode is the least commonly used among the three measures of central tendency. However, it is very useful for measuring popularity. For example, we might be interested to know the most preferred brand of shampoo, the most saleable brand of bag, or the most popular noontime show.

A given distribution may be:

1. Unimodal - if there is only one mode;
2. Bimodal - if there are two modes; or
3. Multimodal - if there are more than two modes.

If all the scores in a set of data have equal frequency, then the data has no mode.
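The classification above can be illustrated with a minimal Python sketch. The function names and the sample lists are hypothetical, and the sketch treats a data set with a single distinct value as unimodal, which is an assumption the notes do not address.

```python
from collections import Counter


def modes(values):
    """Return the sorted list of modes, or an empty list when there is no mode."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    # All distinct values equally frequent: the data has no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)


def describe(values):
    """Classify a data set by its number of modes."""
    labels = {0: "no mode", 1: "unimodal", 2: "bimodal"}
    return labels.get(len(modes(values)), "multimodal")


print(describe([2, 3, 3, 4]))              # one value occurs most often
print(describe([1, 1, 2, 2, 3]))           # two values tie for most frequent
print(describe([1, 1, 2, 2, 3, 3, 4]))     # three values tie
print(describe([1, 2, 3]))                 # every value occurs once
```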
PROPERTIES OF THE MODE

1. The mode is the most appropriate measure of central tendency when data are nominal in scale.
2. The mode is the least reliable among the measures of central tendency because its value is undefined in some distributions.
3. The mode is used when we want to find the value which occurs most often.
4. The mode is a quick approximation of the average. It is sometimes referred to as an inspection average.

1. Ungrouped Data

The mode for ungrouped data is found by inspection.

Example:
Given the ages, arranged in an array, of 10 students who participated in the University Math contest:

15, 16, 17, 17, 17, 17, 18, 18, 18, 19.

By inspection, the modal age is 17, since it is the most frequent or the most common value.

2. Grouped Data

If the data are arranged in a frequency distribution table, the mode may be computed using:

A. The Moment of Force Method
B. The Empirical Method

Note: The results using the above methods may not be the same.

A. The Moment of Force Method
The formula for finding the mode using the moment of force method is:

\[ \hat{x} = L_{Mo} + i \left( \frac{d_1}{d_1 + d_2} \right) \]

Where:
\(L_{Mo}\) = lower boundary of the modal class
i = class size
\(d_1\) = difference between the frequency of the modal class and the class below it
\(d_2\) = difference between the frequency of the modal class and the class above it
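A small Python sketch of both procedures follows. It assumes that "the class below" means the preceding class (lower values) and "the class above" means the following class, and that a missing neighbour counts as frequency 0; the grouped table is hypothetical and the function name `mode_grouped` is illustrative.

```python
from collections import Counter

# Ungrouped data: the mode found "by inspection" is the most frequent value.
ages = [15, 16, 17, 17, 17, 17, 18, 18, 18, 19]
modal_age = Counter(ages).most_common(1)[0][0]


def mode_grouped(lower_boundaries, class_size, freqs):
    """Mode = L_Mo + i * d1 / (d1 + d2), using the first class with the highest frequency."""
    m = max(range(len(freqs)), key=lambda k: freqs[k])   # index of the modal class
    below = freqs[m - 1] if m > 0 else 0                 # preceding class (assumption)
    above = freqs[m + 1] if m < len(freqs) - 1 else 0    # following class (assumption)
    d1 = freqs[m] - below
    d2 = freqs[m] - above
    # d1 + d2 is zero only if both neighbours match the modal frequency;
    # the formula is undefined in that case and this sketch does not handle it.
    return lower_boundaries[m] + class_size * d1 / (d1 + d2)


# Hypothetical grouped data: classes 10-19, 20-29, 30-39, 40-49, 50-59.
grouped_mode = mode_grouped([9.5, 19.5, 29.5, 39.5, 49.5], 10, [3, 7, 12, 6, 2])

print(modal_age, grouped_mode)
```

In the hypothetical table the modal class is 30-39 with d1 = 12 - 7 = 5 and d2 = 12 - 6 = 6, so the estimate lies slightly below the middle of that class.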
http://www.dronstudy.com/book/meanmedianmode-of-grouped-datacumulative-frequency-graph-and-ogive-ex-9d-r-s-aggarwal/

Read more: https://www.emathzone.com/tutorials/basic-statistics/example-construction-of-frequency-distribution.html#ixzz57qcafbTR

Read more: https://www.emathzone.com/tutorials/basic-statistics/cumulative-frequency-distribution.html#ixzz57lY74bMV
http://www.emathzone.com/tutorials/basic-statistics/example-construction-of-frequency-distribution.html

Definitions

Raw Data
Data collected in original form.

Frequency
The number of times a certain value or class of values occurs.

Frequency Distribution
The organization of raw data in table form with classes and frequencies.

Categorical Frequency Distribution
A frequency distribution in which the data is only nominal or ordinal.

Ungrouped Frequency Distribution
A frequency distribution of numerical data. The raw data is not grouped.

Grouped Frequency Distribution
A frequency distribution where several numbers are grouped into one class.

Class Limits
Separate one class in a grouped frequency distribution from another. The limits could actually appear in the data and have gaps between the upper limit of one class and the lower limit of the next.

Class Boundaries
Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting 0.5 units from the lower class limit and the upper class boundary is found by adding 0.5 units to the upper class limit.

Class Width
The difference between the upper and lower boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.

Class Mark (Midpoint)
The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.

Cumulative Frequency
The number of values less than the upper class boundary for the current class. This is a running total of the frequencies.

Relative Frequency
The frequency divided by the total frequency. This gives the proportion (or, multiplied by 100, the percent) of values falling in that class.

Cumulative Relative Frequency (Relative Cumulative Frequency)
The running total of the relative frequencies or the cumulative frequency divided by the total frequency. Gives the proportion (or percent) of the values which are less than the upper class boundary.

Histogram
A graph which displays the data by using vertical bars of various heights to represent frequencies. The horizontal axis can be either the class boundaries, the class marks, or the class limits.

Frequency Polygon
A line graph. The frequency is placed along the vertical axis and the class midpoints are placed along the horizontal axis. These points are connected with lines.

Ogive
A frequency polygon of the cumulative frequency or the relative cumulative frequency. The vertical axis is the cumulative frequency or relative cumulative frequency. The horizontal axis is the class boundaries. The graph always starts at zero at the lowest class boundary and will end up at the total frequency (for a cumulative frequency) or 1.00 (for a relative cumulative frequency).

Pareto Chart
A bar graph for qualitative data with the bars arranged according to frequency.

Pie Chart
Graphical depiction of data as slices of a pie. The frequency determines the size of the slice. The number of degrees in any slice is the relative frequency times 360 degrees.

Pictograph
A graph that uses pictures to represent data.

Stem and Leaf Plot
A data plot which uses part of the data value as the stem and the rest of the data value (the leaf) to form groups or classes. This is very useful for sorting data quickly.
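Several of the definitions above (class boundaries, class width, class mark, cumulative frequency, relative frequency, and pie-chart degrees) can be tied together in one short Python sketch. The classes and frequencies are an assumption: they are what results from tallying the 50 marks of the construction example into classes of size 10 starting at 10, a layout the notes do not show explicitly.

```python
from itertools import accumulate

# Assumed class limits and tallied frequencies for the 50 marks.
classes = [(10, 19), (20, 29), (30, 39), (40, 49), (50, 59), (60, 69), (70, 79)]
frequencies = [2, 4, 7, 10, 16, 8, 3]
total = sum(frequencies)

# Class boundaries: 0.5 below the lower limit and 0.5 above the upper limit.
boundaries = [(lo - 0.5, hi + 0.5) for lo, hi in classes]

# Class width: upper boundary minus lower boundary of any class.
width = boundaries[0][1] - boundaries[0][0]

# Class marks: midpoint of the limits (same as the midpoint of the boundaries).
class_marks = [(lo + hi) / 2 for lo, hi in classes]

# Less than cumulative frequency: running total from the lowest class upward.
less_than_cf = list(accumulate(frequencies))

# More than cumulative frequency: running total from the highest class downward.
more_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Relative frequency and the matching pie-chart angle in degrees.
relative = [f / total for f in frequencies]
pie_degrees = [r * 360 for r in relative]

for row in zip(classes, boundaries, class_marks, frequencies, less_than_cf, more_than_cf):
    print(row)
```

The last less than cumulative frequency and the first more than cumulative frequency both equal the total frequency, which is a quick way to check a hand-built table.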
