ENGINEERING METHOD
QUICK RECAP: STATISTICS
Statistics is the science of collecting, organizing, presenting, analyzing, and
interpreting numerical data to assist in making more effective decisions.
POPULATION
is a collection of all possible individuals, objects, or measurements of interest.
SAMPLE
is a portion, or part, of the population of interest.
SAMPLE SIZE
The total number of things in the sample.
DESCRIPTIVE STATISTICS
uses the data to provide descriptions of the
population, either through numerical
calculations or graphs or tables.
INFERENTIAL STATISTICS
makes inferences and predictions about a
population based on a sample of data taken from
the population in question.
Types of Variables
Qualitative/Categorical Quantitative/Numerical
OTHER TERMINOLOGIES
OBTAINING DATA
RETROSPECTIVE STUDIES
This type of study uses only historical data, that is, data collected over a specific period of
time. In most cases, it is the least expensive type of study. However, there are clear
disadvantages:
OBSERVATIONAL STUDIES
The engineer observes the process or the population, disturbing it as little as possible,
and records the quantities of interest.
DESIGNED EXPERIMENTS
The engineer makes deliberate or purposeful changes in the controllable variables of
the system or process, observes the resulting system output data, and then makes
inferences about which variables are responsible for the observed changes in
output performance.
DATA ORGANIZATION
TOOLS FOR DESCRIBING DATA
[Figure: bar chart with categories first year, second year, third year, fourth year, fifth year]
improve a situation. Part 10-10345 constitutes the top 20% of the company's gross profit.
This helps management decide where to allocate most of its resources,
for example to making sure that this part performs well in terms of quality.
HISTOGRAM DOTPLOTS
BOXPLOTS
The advantage of boxplots is that they give the reader several pieces of
information without taking up much space in a report: where the median is
located, the interquartile range, any outliers, and the skewness of the
distribution.
Read more about Boxplots here:
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
https://stattrek.com/statistics/charts/boxplot.aspx
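As a rough illustration of the quantities a boxplot displays, the sketch below computes the quartiles, the median, the interquartile range, and the outliers for a small data set. The data values, the function name `boxplot_summary`, and the 1.5 x IQR outlier fences are assumptions made for illustration; the fence rule is a common convention and is not stated in these notes.

```python
import statistics


def boxplot_summary(data):
    """Return the quantities a boxplot typically displays."""
    values = sorted(data)
    # Quartiles located with the (n + 1) position method ("exclusive").
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    # Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical data set with one unusually large value.
sample = [10, 20, 30, 40, 50, 55, 60, 70, 80, 90, 300]
summary = boxplot_summary(sample)
print(summary)
```

With this made-up sample the box runs from 30 to 80, the median is 55, and 300 is flagged as an outlier.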
SHAPES OF DISTRIBUTION
BELL-SHAPED UNIFORM
RIGHT-SKEWED
The majority of the data are located at the left side, with a tail at the right side of the distribution.
LEFT-SKEWED
The majority of the data are located at the right side, with a tail at the left side of the distribution.
BIMODAL U-SHAPED
A bimodal distribution is a probability distribution with two different modes, or two peaks.
A U-shaped distribution can still be categorized as a bimodal distribution.
Bivariate quantitative data are summarized using
SCATTERPLOTS
Scatterplots graphically display the relationship between two variables. They show
the trends and patterns in the data, or the lack thereof.
Multivariate quantitative data are summarized using:
BUBBLE GRAPH
TIME-SERIES PLOTS
[Figure: time-series plot of BOD (mg O2/L) for influent and effluent over weeks 1 to 16; vertical axis from 0 to 900]
DESCRIBING DATA DISTRIBUTION
DATA DISTRIBUTION
a) If all the elements in the data set have the same frequency of occurrence, then
the data set is said to have no mode.
b) If the data set has one value that occurs more frequently than the rest of the values,
then the data set is said to be unimodal.
c) If two elements of the data set are tied for the highest frequency of occurrence,
then the data set is said to be bimodal.
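The three cases above can be sketched in code as follows. The function name `classify_modes` is hypothetical, and the extra "multimodal" label for three or more tied values is an assumption that goes beyond the three cases listed here.

```python
from collections import Counter


def classify_modes(data):
    """Classify a data set by how many values share the highest frequency."""
    counts = Counter(data)
    highest = max(counts.values())
    modes = sorted(value for value, count in counts.items() if count == highest)
    if len(modes) == len(counts):
        # Every distinct value occurs equally often: no mode.
        return "no mode", []
    if len(modes) == 1:
        return "unimodal", modes
    if len(modes) == 2:
        return "bimodal", modes
    # Assumed label for three or more tied values (not covered in the notes).
    return "multimodal", modes


print(classify_modes([1, 2, 3, 4]))        # all frequencies equal
print(classify_modes([1, 2, 2, 3]))        # one most frequent value
print(classify_modes([1, 1, 2, 2, 3]))     # two values tied for the highest frequency
```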
1. RANGE & INTERQUARTILE RANGE
2. VARIANCE
a. Population variance:
$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$
b. Sample variance (from a sample of n measurements):
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}}{n - 1}$$
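The population and sample variance formulas can be checked numerically with a short sketch. The data values and function names below are hypothetical; the point is only to show that the definitional form and the shortcut form of the sample variance agree.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_variance_shortcut(values):
    """Shortcut form: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    return (sum(x * x for x in values) - sum(values) ** 2 / n) / (n - 1)


# Hypothetical measurements.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))       # treats the data as the whole population
print(sample_variance(data))           # treats the data as a sample
print(sample_variance_shortcut(data))  # same value as the line above
```

The sample variance is always slightly larger than the population variance computed from the same numbers, because it divides by n - 1 instead of N.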
3. STANDARD DEVIATION
A low standard deviation means that most of the numbers are close to the
average. A high standard deviation means that the numbers are more spread out.
a. Population:
$$\sigma = \sqrt{\sigma^2}$$
b. Sample:
$$s = \sqrt{s^2}$$
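Chebyshev's theorem states that at least ¾ (75%) of the data lie within 2
standard deviations of the mean, and at least 88.89% lie within 3 standard
deviations of the mean. In general, at least 1 − 1/k² of the data lie within
k standard deviations of the mean, for any k > 1.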
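The two percentages quoted for Chebyshev's theorem follow from the general bound 1 - 1/k², which holds for any k > 1. A minimal sketch, with a hypothetical function name:

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of the data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%} of the data")
```

The bound is a guaranteed minimum for any distribution, so real data sets usually have a larger share of values within the same distance of the mean.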
1. Z-SCORE
➢ A z-score measures the distance between an observation and the mean,
measured in units of standard deviation.
The sample z score of a value of x is a measure of relative standing
defined by
$$z = \frac{x - \bar{x}}{s}$$
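A short sketch of the sample z-score calculation, using hypothetical data and a hypothetical function name. It relies on `statistics.stdev`, which uses the n - 1 denominator of the sample standard deviation.

```python
import statistics


def sample_z_score(x, data):
    """Distance of x from the sample mean, in units of sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    return (x - mean) / s


# Hypothetical measurements.
data = [2, 4, 6, 8, 10]
print(sample_z_score(10, data))  # positive: 10 lies above the mean
print(sample_z_score(6, data))   # zero: 6 is the mean
print(sample_z_score(2, data))   # negative: 2 lies below the mean
```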
2. PERCENTILE & QUARTILES
Percentiles divide a data set into 100 equal parts. A percentile is simply a measure that
tells us what percent of the observations in a data set are at or below that
value.
$$\text{Percentile rank of } x = \frac{\text{number of values below } x}{n} \times 100\%$$
$$n\text{th value of } x = \frac{\text{Percentile}}{100}\,(n + 1)$$
Example: 2,2,3,4,5,5,5,6,7,8,8,8,8,8,9,9,10,11,11,12
5 7 9 23 25 29 30 33 34 35 40 41 48 50 53 54 55 58 58 59 61 61 65 65 66 68 70 72 73 74 78 79
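The two percentile formulas can be applied to the first example data set as sketched below. The function names are hypothetical, the choice of which values to look up is illustrative, and the linear interpolation used when the position is fractional is an assumption, since the notes do not say how fractional positions are handled. The rank counts values strictly below x, as in the formula.

```python
def percentile_rank(x, data):
    """Percent of the values that are strictly below x."""
    below = sum(1 for value in data if value < x)
    return below * 100 / len(data)


def value_at_percentile(p, data):
    """Value at the p-th percentile, located at position p / 100 * (n + 1)."""
    values = sorted(data)
    n = len(values)
    position = p / 100 * (n + 1)  # 1-based position, possibly fractional
    lower = int(position)
    fraction = position - lower
    if lower < 1:
        return values[0]
    if lower >= n:
        return values[-1]
    # Assumption: interpolate linearly between the two neighbouring values.
    return values[lower - 1] + fraction * (values[lower] - values[lower - 1])


data = [2, 2, 3, 4, 5, 5, 5, 6, 7, 8, 8, 8, 8, 8, 9, 9, 10, 11, 11, 12]
print(percentile_rank(8, data))       # 9 of the 20 values are below 8
print(value_at_percentile(25, data))  # position 5.25
print(value_at_percentile(50, data))  # position 10.5
```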
As the name suggests, Quartiles break the data set into 4 equal parts. The first
quartile, Q1, is the 25th percentile. The second quartile, Q2, is the 50th percentile.
The third quartile, Q3, is the 75th percentile. It's important to note that the
median is both the 50th percentile and the second quartile, Q2.
Example: 10, 20, 30, 40, 50, 55, 60, 70, 80, 90, 100
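The quartiles of this example can be checked with Python's standard library, as a sketch. `statistics.quantiles` with `method="exclusive"` places quartile i at position i(n + 1)/4, which matches the (n + 1) position rule; other software may use a different rule and give slightly different quartiles.

```python
import statistics

data = [10, 20, 30, 40, 50, 55, 60, 70, 80, 90, 100]

# n = 11, so the quartiles sit at positions 3, 6 and 9 of the sorted data.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
print(q1, q2, q3)  # 30.0 55.0 80.0

# The second quartile is the median.
print(q2 == statistics.median(data))  # True
```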
1. Ungrouped
2. Relative Frequency
3. Grouped
4. Cumulative Frequency
5. Relative Cumulative Frequency
FREQUENCY DISTRIBUTION TABLE
COMPONENTS OF FDT
Class Interval – the numbers defining a class; its end numbers are called the
class limits (lower limit and upper limit)
Class Frequency (f) – the number of observations falling in the class
Class Boundaries – the so-called “true class limits”, classified as:
- Lower Class Boundary (LCB) – midpoint between the lower class limit of the
class and the upper class limit of the preceding class
- Upper Class Boundary (UCB) – midpoint between the upper class limit of the
class and the lower class limit of the next class
Class Size – the difference between two consecutive upper limits or two consecutive lower limits
Example
The accompanying specific gravity values for various wood types used in
construction appeared in the article “Bolted Connection Design Values
Based on European Yield Model” (J. of Structural Engr., 1993: 2169-2189)
0.31 0.35 0.36 0.37 0.38 0.4 0.41 0.42 0.43 0.44 0.45 0.46 0.47 0.48 0.51 0.54 0.55 0.58 0.62 0.66 0.68 0.75
[Figure: relative frequency chart of the specific gravity values; vertical axis from 0.00 to 0.14]
3. Class Size
i = 0.44 / 7
i ≈ 0.06, rounded up to 0.07
The class size would normally be rounded up to the nearest integer, but since the data
are all less than 1, a class size of 1 would be useless. We therefore keep 0.06,
rounded up to 0.07, as the class size.
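The class size calculation can be sketched as follows. The function name and the `decimals` parameter are hypothetical; 0.75 and 0.31 are the largest and smallest values in the example, and 7 is the number of classes used.

```python
import math


def class_size(highest, lowest, num_classes, decimals=2):
    """Range divided by the number of classes, rounded up at the given precision."""
    raw = (highest - lowest) / num_classes
    scale = 10 ** decimals
    # round(..., 9) removes floating-point noise before taking the ceiling.
    return math.ceil(round(raw * scale, 9)) / scale


print(class_size(0.75, 0.31, 7))  # 0.07
```

Rounding up rather than to the nearest value keeps the 7 classes wide enough to cover the whole range.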
III. CONSTRUCTING A GROUPED FREQUENCY DISTRIBUTION TABLE
4. Identify Lower & Upper Class Limits
Class   Lower Class Limit   Upper Class Limit
1       0.31                0.37
2       0.38                0.44
3       0.45                0.51
4       0.52                0.58
5       0.59                0.65
6       0.66                0.72
7       0.73                0.79
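The class limits in this table, together with the corresponding class boundaries, can be generated from the smallest value, the class size, and the number of classes. The sketch below is one way to do it; the function name `build_classes` and the `unit` parameter (the measurement precision, 0.01 here) are assumptions. `Decimal` is used so that values such as 0.37 come out exactly.

```python
from decimal import Decimal


def build_classes(lowest, class_size, num_classes, unit=Decimal("0.01")):
    """Return (lower limit, upper limit, LCB, UCB) for each class."""
    classes = []
    for k in range(num_classes):
        lower = lowest + k * class_size
        upper = lower + class_size - unit  # last value that still belongs to the class
        lcb = lower - unit / 2             # halfway to the previous class's upper limit
        ucb = upper + unit / 2             # halfway to the next class's lower limit
        classes.append((lower, upper, lcb, ucb))
    return classes


classes = build_classes(Decimal("0.31"), Decimal("0.07"), 7)
for number, (lower, upper, lcb, ucb) in enumerate(classes, start=1):
    print(number, lower, upper, lcb, ucb)
```

Each upper class boundary equals the lower class boundary of the next class, so the boundaries leave no gaps between classes.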
5. Tally the frequencies in each interval and get the sum
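The tallying step can be sketched as below. The function name `tally` is hypothetical. Important assumption: the values used are only the distinct specific gravity values listed earlier in this example, each counted once, so the resulting frequencies illustrate the procedure and are not the frequencies of the article's full data set.

```python
def tally(values, class_limits):
    """Count how many values fall in each (lower, upper) class, limits inclusive."""
    counts = [0] * len(class_limits)
    for value in values:
        for i, (lower, upper) in enumerate(class_limits):
            if lower <= value <= upper:
                counts[i] += 1
                break
    return counts


class_limits = [(0.31, 0.37), (0.38, 0.44), (0.45, 0.51), (0.52, 0.58),
                (0.59, 0.65), (0.66, 0.72), (0.73, 0.79)]

# Assumption: distinct values only, each counted once (not the full data set).
values = [0.31, 0.35, 0.36, 0.37, 0.38, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45,
          0.46, 0.47, 0.48, 0.51, 0.54, 0.55, 0.58, 0.62, 0.66, 0.68, 0.75]

frequencies = tally(values, class_limits)
print(frequencies)       # frequency of each class
print(sum(frequencies))  # should equal the number of values tallied
```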
[Figure: frequency chart of the grouped data]
The distribution is skewed to the right.