
MODULE 1

DATA COLLECTION &


PRESENTATION
Participation in this module constitutes a waiver of rights
conferred by the Data Privacy Law, and participants hereby
consent to the use, posting, publication and/or circulation of this
module in social media and other public spheres, and/or use of
the module by the university in publications, research,
government reporting, commercial and advertising purposes,
and all other purposes in relation to university administrative and
curricular activities. Participants of this module who do not agree
to this waiver and consent are required not to download this
module.

DATA PRIVACY NOTIFICATION


QUICK RECAP
QUICK RECAP: STATISTICS

ENGINEERING METHOD
QUICK RECAP: STATISTICS
Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting numerical data to assist in making more effective decisions.

POPULATION
is a collection of all possible individuals, objects, or measurements of interest.

SAMPLE
is a portion, or part, of the population of interest.

SAMPLE SIZE
is the total number of things in the sample.
QUICK RECAP: STATISTICS

DESCRIPTIVE
STATISTICS
uses the data to provide descriptions of the
population, either through numerical
calculations or graphs or tables.

INFERENTIAL STATISTICS
makes inferences and predictions about a
population based on a sample of data taken from
the population in question.
QUICK RECAP: STATISTICS
Types of Variables

Qualitative/Categorical
Examples: Brand, Marital Status, Color

Quantitative/Numerical
Discrete (countable data)
Examples: Children in a family, Basketball shots, TV sets owned
Continuous (measurable data; intervals between whole numbers)
Examples: Amount of income tax paid, Weight of a student, Yearly rainfall in Cebu
QUICK RECAP: STATISTICS

OTHER TERMINOLOGIES

Univariate data – result when a single variable is measured on an experimental unit.

Bivariate data – result when two variables are measured on a single experimental unit.

Multivariate data – result when more than two variables are measured.
QUICK RECAP: STATISTICS

OBTAINING DATA

RETROSPECTIVE STUDIES

This type of study strictly uses historical data, data taken over a specific period of
time. In most cases, this type of study will be the least expensive. However, there are clear
disadvantages:

(i) Validity and reliability of historical data are often in doubt.
(ii) If time is an important aspect of the structure of the data, there may be data missing.
(iii) There may be errors in collection of the data that are not known.
(iv) There is no control on the ranges of the measured variables (the factors in a study).
Indeed, the ranges found in historical data may not be relevant to current studies.

OBSERVATIONAL STUDIES
The engineer observes the process or the population, disturbing it as little as
possible, and records the quantities of interest.

DESIGNED EXPERIMENTS
The engineer makes deliberate or purposeful changes in the controllable variables of
the system or process, observes the resulting system output data, and then makes
inferences about which variables are responsible for the observed changes in
output performance.
DATA
ORGANIZATION
TOOLS FOR DESCRIBING DATA

Qualitative data are popularly summarized using:

BAR GRAPH PARETO CHART

[Figure: bar graph with categories first year, second year, third year, fourth year, and fifth year; vertical axis from 0 to 35; data labels 31, 24, 18, 14, 13]
TOOLS FOR DESCRIBING DATA

Pareto Analysis is a simple technique for prioritizing possible changes by
identifying the problems that will be resolved by making these changes. By
using this approach, you can prioritize the individual changes that will most
improve a situation.

[Figure: Pareto Chart: Gross Profit $. Bars: Gross Profit $ (0 to 900,000.00) for parts 10-10345, 10-43445, 10-10355, 30-30325, 40-55055; line: Cumulative % (data labels 0.47, 0.65, 0.80, 0.94, 1.00)]

Part 10-10345 constitutes the top 20% of the company's gross profit.
This helps management decide where to allocate most of its resources
so that this part performs well in terms of quality, etc.
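
The cumulative percentage line of a Pareto chart can be sketched as follows. This is a minimal illustration only: the function name `pareto_table` and the profit figures are hypothetical and are not the values behind the chart above.

```python
def pareto_table(amounts):
    """Return (label, amount, cumulative share) rows, largest amount first."""
    total = sum(amounts.values())
    rows = []
    running = 0.0
    # Sort categories from the largest to the smallest contribution.
    for label, amount in sorted(amounts.items(), key=lambda kv: kv[1], reverse=True):
        running += amount
        rows.append((label, amount, running / total))
    return rows

# Hypothetical gross profit per part (illustrative numbers only).
profits = {"part A": 500, "part B": 250, "part C": 150, "part D": 100}
for label, amount, cum_share in pareto_table(profits):
    print(f"{label}: {amount} (cumulative {cum_share:.0%})")
```

The first few rows show which categories account for most of the total, which is the information used to set priorities.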
TOOLS FOR DESCRIBING DATA

Quantitative data are commonly summarized using:

HISTOGRAM DOTPLOTS
TOOLS FOR DESCRIBING DATA

Quantitative data are commonly summarized using:

BOXPLOTS
The advantage of boxplots is that they give the reader several pieces of
information without taking up much space in reports, such as where
the median is located, the interquartile range, outliers, and the
skewness of the distribution.
Read more about boxplots here:
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
https://stattrek.com/statistics/charts/boxplot.aspx
TOOLS FOR DESCRIBING DATA

SHAPES OF DISTRIBUTION

BELL-SHAPED UNIFORM
TOOLS FOR DESCRIBING DATA

SHAPES OF DISTRIBUTION

RIGHT-SKEWED
The majority of the data are located at the left side, with a tail at the right side of the distribution.

LEFT-SKEWED
The majority of the data are located at the right side, with a tail at the left side of the distribution.
TOOLS FOR DESCRIBING DATA

SHAPES OF DISTRIBUTION

BIMODAL U-SHAPED
A bimodal distribution is a continuous probability distribution with two different modes, or two peaks.
A U-shaped distribution can still be categorized as a bimodal distribution.
TOOLS FOR DESCRIBING DATA
Bivariate quantitative data are summarized using

SCATTERPLOTS
Scatterplots graphically display the relationship between two variables.
They show the trends and patterns in the distribution, or the lack thereof.
TOOLS FOR DESCRIBING DATA
Multivariate quantitative data are summarized using:

BUBBLE GRAPH

Aside from GDP and average years in the educational system, you can observe
that the size of the bubble corresponds to the Feel Safe Score and the color
corresponds to the Satisfaction Rate.
TOOLS FOR DESCRIBING DATA
Data collected over time are generally summarized using:

TIME-SERIES PLOTS

[Figure: time-series plot of BOD (mg O2/L), vertical axis from 0 to 900, for influent and effluent over weeks 1 to 16]
DESCRIBING
DATA DISTRIBUTION
DATA DISTRIBUTION

You can describe your data distribution by:


I. Measures of Central Tendency
II. Measures of Variation
III. Measures of Relative Standing
DESCRIBING
DATA DISTRIBUTION:
MEASURES OF CENTRAL TENDENCY
DATA DISTRIBUTION

I. MEASURES OF CENTRAL TENDENCY OF DATA DISTRIBUTION

1. ARITHMETIC MEAN OR AVERAGE

a. Population: $\mu$
b. Sample of n measurements: $\bar{x} = \frac{\sum x_i}{n}$

2. MEDIAN
the data value located exactly at the centermost position when
the data set is arranged in order.
The median may be preferred to the mean if the data are highly
skewed.
DATA DISTRIBUTION

I. MEASURES OF CENTER OF DATA DISTRIBUTION

3. MODE the most frequently occurring data value

a) If all the elements in the data set have the same frequency of occurrence, then
the data set is said to have no mode.

b) If the data set has one value that occurs more frequently than the rest of the values,
then the data set is said to be unimodal.

c) If two elements of the data set are tied for the highest frequency of occurrence,
then the data set is said to be bimodal.
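
The three measures of center can be checked quickly with Python's standard `statistics` module. This is a minimal sketch; the list `data` is a hypothetical sample chosen for illustration.

```python
from statistics import mean, median, multimode

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample

print(mean(data))       # arithmetic mean: sum of the values divided by n
print(median(data))     # middle value of the ordered data
print(multimode(data))  # list of the most frequently occurring value(s)
```

Note that `multimode` returns every value tied for the highest frequency, so a data set in which all values occur equally often comes back as the full list of distinct values; under the convention above, that case is read as "no mode".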
DESCRIBING
DATA DISTRIBUTION:
MEASURES OF VARIABILITY
DATA DISTRIBUTION

II. MEASURES OF VARIABILITY

1. RANGE & INTERQUARTILE RANGE

RANGE = largest value – smallest value

IQR: measures the spread of the middle 50% of an ordered data set.
If you observe the figure, the shaded part of the distribution represents
the middle 50% of the distribution.
DATA DISTRIBUTION

II. MEASURES OF VARIABILITY

2. VARIANCE
a way to measure how far a set of numbers is spread out.

a. Population variance:

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

b. Sample variance (from a sample of n measurements):

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}{n - 1}$
DATA DISTRIBUTION

II. MEASURES OF VARIABILITY

3. STANDARD DEVIATION
A low standard deviation means that most of the numbers are close to the
average. A high standard deviation means that the numbers are more spread out.

a. Population: $\sigma = \sqrt{\sigma^2}$

b. Sample: $s = \sqrt{s^2}$
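
The definition and shortcut forms of the sample variance can be compared numerically. This is a minimal sketch on a hypothetical sample; the variable names are illustrative.

```python
from math import sqrt, isclose
from statistics import pvariance, variance, stdev

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample
n = len(data)
x_bar = sum(data) / n

# Definition form: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - x_bar) ** 2 for x in data) / (n - 1)
# Shortcut form: (sum of x^2 - (sum of x)^2 / n), divided by n - 1.
s2_shortcut = (sum(x * x for x in data) - sum(data) ** 2 / n) / (n - 1)

print(s2_definition, s2_shortcut, variance(data))  # the three values agree
print(sqrt(s2_definition), stdev(data))            # sample standard deviation
print(pvariance(data))                             # population form: divides by N
```

Treating the same numbers as a whole population gives a slightly smaller variance, because the sum of squared deviations is divided by N rather than n - 1.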
DATA DISTRIBUTION

II. MEASURES OF VARIABILITY

VARIANCE VS. STANDARD DEVIATION


In terms of function, the variance and the standard deviation
serve the same goal. For interpretation, however, the variance is
harder to work with because it is expressed in squared units,
whereas the standard deviation is in the same units as the data
and so makes more sense when we interpret the variability of the
distribution.
DATA DISTRIBUTION
SIGNIFICANCE OF STANDARD DEVIATION
The Empirical (Normal) Rule
The Empirical Rule states that approximately 68%, 95%, and “almost all” of
the measurements are within one, two, and three standard deviations of the
mean, respectively.

Take note that this rule applies only to distributions that are bell-shaped,
or at least approximately bell-shaped.
DATA DISTRIBUTION
SIGNIFICANCE OF STANDARD DEVIATION
Chebyshev’s Theorem

Chebyshev’s theorem states that, for any distribution, at least ¾ or 75% of
the data lie within 2 standard deviations of the mean, while at least
88.89% lie within 3 standard deviations of the mean.

Note: Go to notes of this slide for more on this topic
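
The 75% and 88.89% figures come from the general Chebyshev bound: at least 1 - 1/k^2 of the data lie within k standard deviations of the mean, for k > 1. A minimal sketch follows; the function name `chebyshev_lower_bound` is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{chebyshev_lower_bound(k):.2%}")
```

Unlike the Empirical Rule, this bound makes no assumption about the shape of the distribution, which is why it is a minimum rather than an approximate value.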


DESCRIBING
DATA DISTRIBUTION:
MEASURES OF RELATIVE STANDING
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING

1. Z-SCORE
➢ A z-score measures the distance between an observation and the mean,
measured in units of standard deviation.

➢ z-scores between -2 and +2 are highly likely.

➢ z-scores exceeding 3 in absolute value are very unlikely and can be
considered outliers (unusually large or small observations).
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING

1. Z-SCORE
The sample z-score of a value x is a measure of relative standing
defined by

$z = \frac{x - \bar{x}}{s}$
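
The z-score formula can be applied to every observation in a sample as sketched below. The function name `z_scores` and the sample values are hypothetical; the cutoff of 3 follows the guideline stated for outliers.

```python
from statistics import mean, stdev

def z_scores(sample):
    """Return the sample z-score (x - mean) / s for every observation."""
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return [(x - x_bar) / s for x in sample]

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical sample
for x, z in zip(data, z_scores(data)):
    flag = "possible outlier" if abs(z) > 3 else ""
    print(x, round(z, 2), flag)
```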
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING

2. PERCENTILE & QUARTILES
Percentiles divide a data set into 100 equal parts. A percentile is simply a measure that
tells us what percent of the total frequency of a data set was at or below that
measure.

$\text{Percentile Rank of } x = \frac{\#\text{ of values below } x}{n} \times 100\%$

$n\text{th value of } x = \frac{\text{Percentile}}{100}(n + 1)$
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING

2. PERCENTILE & QUARTILES
Example: 2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12

What is the percentile ranking of 10?

$\text{Percentile Rank of } x = \frac{\#\text{ of values below } x}{n} \times 100\%$

What value exists at the percentile ranking of 25%?

$n\text{th value of } x = \frac{\text{Percentile}}{100}(n + 1)$ → Round to nearest integer
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING


2. PERCENTILE & QUARTILES

5 7 9 23 25 29 30 33 34 35 40 41 48 50 53 54 55 58 58 59 61 61 65 65 66 68 70 72 73 74 78 79

What is the percentile ranking of 68?

$\text{Percentile Rank of } x = \frac{25}{32} \times 100\% = 78.13\%$

What value exists at the percentile ranking of 25%?

$n\text{th value of } x = \frac{25}{100}(32 + 1) = 8.25 \approx 8$

The 8th value is 33, so 33 has a percentile rank of 25.
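
Both percentile calculations can be sketched in a few lines. The function names `percentile_rank` and `value_at_percentile` are hypothetical, and the rounding of the position (half rounds up) is an assumption, since the slides only say to round to the nearest integer.

```python
import math

def percentile_rank(values, x):
    """Percent of the values that lie strictly below x."""
    below = sum(1 for v in values if v < x)
    return below * 100 / len(values)

def value_at_percentile(values, percentile):
    """Value at position (percentile / 100) * (n + 1), rounded to the nearest integer."""
    ordered = sorted(values)
    position = percentile / 100 * (len(ordered) + 1)
    index = math.floor(position + 0.5)        # assumed convention: round half up
    index = min(max(index, 1), len(ordered))  # keep the position inside the data
    return ordered[index - 1]                 # positions are counted from 1

scores = [5, 7, 9, 23, 25, 29, 30, 33, 34, 35, 40, 41, 48, 50, 53, 54,
          55, 58, 58, 59, 61, 61, 65, 65, 66, 68, 70, 72, 73, 74, 78, 79]

print(percentile_rank(scores, 68))      # 25 of the 32 values are below 68
print(value_at_percentile(scores, 25))  # value at position 8
```

The same two functions can be used to check the first example with 20 values.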
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING

2. PERCENTILE & QUARTILES
As the name suggests, Quartiles break the data set into 4 equal parts. The first
quartile, Q1, is the 25th percentile. The second quartile, Q2, is the 50th percentile.
The third quartile, Q3, is the 75th percentile. It's important to note that the
median is both the 50th percentile and the second quartile, Q2.
DATA DISTRIBUTION

III. MEASURE OF RELATIVE STANDING

2. PERCENTILE & QUARTILES
Example: 10, 20, 30, 40, 50, 55, 60, 70, 80, 90, 100

What is the first quartile? Answer: 30

The first quartile (Q1) is just the "median" of all the values to the left of the true median.
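
The "median of each half" idea can be sketched as follows. The function name `quartiles` is hypothetical, and excluding the middle value from both halves when n is odd is the convention assumed here; other textbooks and software use slightly different conventions.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 using the 'median of each half' convention."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]      # values to the left of the median
    upper = ordered[n - half:]  # values to the right of the median
    return median(lower), median(ordered), median(upper)

data = [10, 20, 30, 40, 50, 55, 60, 70, 80, 90, 100]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)        # 30 55 80
print("IQR =", q3 - q1)  # IQR = 50
```

The difference Q3 - Q1 is the interquartile range, the spread of the middle 50% of the ordered data.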
FREQUENCY DISTRIBUTION TABLE
FREQUENCY DISTRIBUTION TABLE

FREQUENCY DISTRIBUTION TABLE (FDC)


Sometimes a large data set, such as one with more than 500 observations,
can be overwhelming to the reader.

Frequency distribution tables give the reader a better overview of
the values in the distribution, as well as of the distinct values where
most of the frequencies lie.
FREQUENCY DISTRIBUTION TABLE

TYPES OF FREQUENCY DISTRIBUTION TABLES

1. Ungrouped
2. Relative Frequency
3. Grouped
4. Cumulative Frequency
5. Relative Cumulative Frequency
FREQUENCY DISTRIBUTION TABLE
COMPONENTS OF FDC
Class Interval – these are numbers defining the class consisting of the end numbers called the
class limits (upper limit and lower limit)

Class Frequency (f) – shows the number of observations falling in the class

Class Boundaries – these are the so-called “true class limits” classified as:
- Lower Class Boundary (LCB) – middle value of the lower class limit of the
class and the upper class limit of the preceding class
- Upper Class Boundary (UCB) – middle value between the upper class limit
and the lower limit of the next class

Class Size – the difference between two consecutive upper limits or two consecutive lower limits

Class Marks (CM) – midpoint or the middle value of a class interval


DATA DISTRIBUTION

CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION TABLE

1. Find the range.

$\text{Range} = \text{Highest value} - \text{Lowest value}$

2. Determine the number of classes using Sturges’ Rule

$k = 1 + 3.322 \log_{10} n$ → Round up to nearest integer
where:
$k$ = number of classes
$n$ = sample size
DATA DISTRIBUTION
CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION TABLE
3. Determine the interval size by dividing the range by the desired number of
classes.
$i = \text{Range} / \text{Number of classes}$ → Round up to the nearest integer if not a
whole number
4. Determine the class limits of the class intervals (the 1st lower class limit should
be the lowest data value from the data set).
5. Tally the frequencies for each interval, then get the sum.
6. Graph the frequencies.
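
Steps 1 to 4 can be sketched for integer-valued data as below. The function names `sturges_classes` and `class_limits` and the sample values are hypothetical. The extra widening step is an assumption added here: when the range divides evenly by the number of classes, the rounded-up width alone would leave the largest value outside the last class.

```python
import math

def sturges_classes(n):
    """Step 2: number of classes k = 1 + 3.322 * log10(n), rounded up."""
    return math.ceil(1 + 3.322 * math.log10(n))

def class_limits(values, k):
    """Steps 1, 3 and 4: (lower, upper) class limits for integer-valued data."""
    low, high = min(values), max(values)
    width = math.ceil((high - low) / k)  # range divided by k, rounded up
    if low + k * width - 1 < high:       # assumed safeguard: cover the maximum
        width += 1
    return [(low + c * width, low + (c + 1) * width - 1) for c in range(k)]

# Hypothetical integer sample of 20 observations.
sample = [12, 15, 18, 21, 22, 25, 28, 30, 31, 34,
          37, 40, 41, 45, 48, 52, 55, 58, 60, 63]
k = sturges_classes(len(sample))
for lower, upper in class_limits(sample, k):
    print(f"{lower}-{upper}")
```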
DATA DISTRIBUTION
CONSTRUCTING A FREQUENCY DISTRIBUTION TABLE

Example
The accompanying specific gravity values for various wood types used in
construction appeared in the article “Bolted Connection Design Values
Based on European Yield Model” (J. of Structural Engr., 1993: 2169-2189)

0.31 0.35 0.36 0.36 0.37 0.38 0.4 0.4 0.4


0.41 0.41 0.42 0.42 0.42 0.42 0.42 0.43 0.44
0.45 0.46 0.46 0.47 0.48 0.48 0.48 0.51 0.54
0.54 0.55 0.58 0.62 0.66 0.66 0.66 0.68 0.75
DATA DISTRIBUTION
I. CONSTRUCTING AN UNGROUPED FREQUENCY DISTRIBUTION
TABLE
1. List unique values in ascending/descending order
2. Count the frequency of each value
3. Compute the total frequency

SG Values   Frequency     SG Values   Frequency
0.31        1             0.46        2
0.35        1             0.47        1
0.36        2             0.48        3
0.37        1             0.51        1
0.38        1             0.54        2
0.40        3             0.55        1
0.41        2             0.58        1
0.42        5             0.62        1
0.43        1             0.66        3
0.44        1             0.68        1
0.45        1             0.75        1
Total       19                        17
Grand Total 36
DATA DISTRIBUTION

I. CONSTRUCTING AN UNGROUPED FREQUENCY DISTRIBUTION


TABLE
[Figure: Ungrouped Frequency Distribution of SG Values; bar chart of frequency (vertical axis, 0 to 6) for each distinct SG value from 0.31 to 0.75]
DATA DISTRIBUTION

II. CONSTRUCTING RELATIVE FREQUENCY DISTRIBUTION TABLE


1. List unique values in ascending order
2. Count the frequency of each value
3. Compute the relative frequency by dividing the individual frequency by the total frequency
4. Compute the total frequency

SG Values   Frequency   Relative Frequency     SG Values   Frequency   Relative Frequency
0.31        1           0.03                   0.47        1           0.03
0.35        1           0.03                   0.48        3           0.08
0.36        2           0.06                   0.51        1           0.03
0.37        1           0.03                   0.54        2           0.06
0.38        1           0.03                   0.55        1           0.03
0.40        3           0.08                   0.58        1           0.03
0.41        2           0.06                   0.62        1           0.03
0.42        5           0.14                   0.66        3           0.08
0.43        1           0.03                   0.68        1           0.03
0.44        1           0.03                   0.75        1           0.03
0.45        1           0.03                   Total       36          1
0.46        2           0.06
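
The ungrouped and relative frequency tables can be reproduced from the raw specific gravity values with `collections.Counter`. This is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter

sg_values = [
    0.31, 0.35, 0.36, 0.36, 0.37, 0.38, 0.40, 0.40, 0.40,
    0.41, 0.41, 0.42, 0.42, 0.42, 0.42, 0.42, 0.43, 0.44,
    0.45, 0.46, 0.46, 0.47, 0.48, 0.48, 0.48, 0.51, 0.54,
    0.54, 0.55, 0.58, 0.62, 0.66, 0.66, 0.66, 0.68, 0.75,
]

counts = Counter(sg_values)   # frequency of each distinct value
total = sum(counts.values())  # total frequency

for value in sorted(counts):  # unique values in ascending order
    frequency = counts[value]
    print(f"{value:.2f}  {frequency:2d}  {frequency / total:.2f}")
print("Total", total)
```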
DATA DISTRIBUTION

II. CONSTRUCTING RELATIVE FREQUENCY DISTRIBUTION TABLE

[Figure: Relative Frequency Distribution of SG Values; bar chart of relative frequency (vertical axis, 0.00 to 0.16) for each distinct SG value from 0.31 to 0.75]
DATA DISTRIBUTION

III. CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION


TABLE
1. Find the range.
𝑅𝑎𝑛𝑔𝑒 = 0.75 − 0.31 = 0.44

2. Determine the number of classes using Sturges’ Rule

$k = 1 + 3.322 \log_{10} 36$ → Round up to nearest integer
$k = 6.17 \approx 7$ classes

3. Class Size
$i = 0.44 / 7$
$i \approx 0.063$, rounded up to $0.07$

The class size would normally be rounded up to the nearest integer, but since the data
are all less than 1, an interval of 1 per class would be useless. Instead, the quotient
is rounded up to the next hundredth, 0.07, which is used as the class size.
DATA DISTRIBUTION
III. CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION
TABLE
4. Identify Lower & Upper Class Limits
Class Lower Class Upper Class
1 0.31 0.37
2 0.38 0.44
3 0.45 0.51
4 0.52 0.58
5 0.59 0.65
6 0.66 0.72
7 0.73 0.79
DATA DISTRIBUTION
III. CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION
TABLE
5. Tally the frequencies in each interval and get the sum

Class Class Interval Frequency


1 0.31-0.37 5
2 0.38-0.44 13
3 0.45-0.51 8
4 0.52-0.58 4
5 0.59-0.65 1
6 0.66-0.72 4
7 0.73-0.79 1
Total 36
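
The tally in step 5 can be reproduced as sketched below. Working in integer hundredths is a choice made for this sketch so that values such as 0.58 compare exactly against the class limits; the variable names are illustrative.

```python
sg_values = [
    0.31, 0.35, 0.36, 0.36, 0.37, 0.38, 0.40, 0.40, 0.40,
    0.41, 0.41, 0.42, 0.42, 0.42, 0.42, 0.42, 0.43, 0.44,
    0.45, 0.46, 0.46, 0.47, 0.48, 0.48, 0.48, 0.51, 0.54,
    0.54, 0.55, 0.58, 0.62, 0.66, 0.66, 0.66, 0.68, 0.75,
]

# Convert to integer hundredths to avoid floating-point comparison issues.
hundredths = [round(v * 100) for v in sg_values]

low, width, k = 31, 7, 7  # first lower limit 0.31, class size 0.07, 7 classes
frequencies = [0] * k
for h in hundredths:
    frequencies[(h - low) // width] += 1  # index of the class containing h

for c, f in enumerate(frequencies):
    lower = low + c * width
    upper = lower + width - 1
    print(f"{c + 1}  {lower / 100:.2f}-{upper / 100:.2f}  {f}")
print("Total", sum(frequencies))
```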
DATA DISTRIBUTION

III. CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION


TABLE
6. Graph Distribution

[Figure: Frequency Distribution of SG Values; histogram of frequency over the class intervals 0.31-0.37, 0.38-0.44, 0.45-0.51, 0.52-0.58, 0.59-0.65, 0.66-0.72, 0.73-0.79]

The distribution is skewed to the right.


DATA DISTRIBUTION
IV. CONSTRUCTING A CUMULATIVE FREQUENCY DISTRIBUTION
TABLE
Compute the cumulative frequencies by adding the class interval’s frequency to
the cumulative frequency of the previous class. The cumulative frequency of the
1st class should be equal to its frequency.

Class   Class Interval   Frequency   Cumulative Frequency
1       0.31-0.37        5           5
2       0.38-0.44        13          18
3       0.45-0.51        8           26
4       0.52-0.58        4           30
5       0.59-0.65        1           31
6       0.66-0.72        4           35
7       0.73-0.79        1           36
Total                    36
DATA DISTRIBUTION
V. CONSTRUCTING A CUMULATIVE RELATIVE FREQUENCY
DISTRIBUTION TABLE
Compute the Cumulative Relative Frequency (CRF) through the formula:

CRF = Previous Class' Cumulative Relative Frequency + Current Class' Relative Frequency

where each class' relative frequency is its frequency divided by the total frequency (36).

Class   Class Interval   Frequency   Cumulative Frequency   Relative Frequency   Cumulative Relative Frequency
1       0.31-0.37        5           5                      0.14                 0.14
2       0.38-0.44        13          18                     0.36                 0.50
3       0.45-0.51        8           26                     0.22                 0.72
4       0.52-0.58        4           30                     0.11                 0.83
5       0.59-0.65        1           31                     0.03                 0.86
6       0.66-0.72        4           35                     0.11                 0.97
7       0.73-0.79        1           36                     0.03                 1.00
Total                    36                                 1.00
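
The cumulative and cumulative relative columns can be computed from the class frequencies as sketched below. This assumes relative frequency means each class frequency divided by the total frequency of 36, the definition used for the relative frequency table; the variable names are illustrative.

```python
from itertools import accumulate

frequencies = [5, 13, 8, 4, 1, 4, 1]  # class frequencies of the grouped table
total = sum(frequencies)

cumulative = list(accumulate(frequencies))             # running sum of frequencies
relative = [f / total for f in frequencies]            # share of each class
cumulative_relative = [c / total for c in cumulative]  # running share

for row in zip(frequencies, cumulative, relative, cumulative_relative):
    f, cf, rf, crf = row
    print(f"{f:2d}  {cf:2d}  {rf:.2f}  {crf:.2f}")
```

The last cumulative frequency equals the total frequency and the last cumulative relative frequency equals 1, which is a quick check on the table.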
DATA DISTRIBUTION
V. CONSTRUCTING A CUMULATIVE RELATIVE FREQUENCY
DISTRIBUTION TABLE

[Figure: Ogive Chart of SG Values; cumulative relative frequency (vertical axis, 0.00 to 1.20) plotted against the class intervals 0.31-0.37 through 0.73-0.79. Note on the chart: the 1st marker should start at the origin.]
END OF
PRESENTATION
