
What is a Model?

11
Model Definition

A model can be defined as:


“the real or virtual abstraction of the structure,
condition or operation of a system …
… on the basis of past trends of these system
attributes in time and space”.
(Note: “real” means a solid 3-D object that can be touched;
“virtual” means expressed using mathematical and statistical
equations.)

12
Model Definition

a simpler definition …
- A model is the representation
of part of reality!

13
SYSTEMS DESCRIPTION

Model Definition

For any highway asset, models are used to describe the attributes
(physical structure or/and condition, and operational
characteristics) of the asset.

The asset attributes can be described …


… for the past situation
… at the current situation, or
… for the future situation (i.e., prediction).

Model functions: describe, predict
Attributes of the highway asset: physical structure, condition, operation or procedure


14
Model Classification

Models used in asset management may be classified by:

- the attributes of the asset


which asset feature is the model trying to
describe/predict?

- the form of the model


is the model real or virtual, numerical or
analytical

15
Model Classification on the basis of the Asset
Attribute We Seek to Describe
System Physical Models
A model that describes the changes in asset shape over time
E.g., a model that estimates cantilever deflection in response to load

System Condition Models


A model that estimates the surface roughness of a highway pavement
A model that predicts future corrosion levels in structural reinforcement

System Operation Models


A model that predicts the annual number of crashes on State Road 26
A model that predicts travel delay as a function of traffic, nr. of lanes,
time of day
16
Model Classification on the basis of the Model
Form

Real models (3-D, touchable)
- Static models
- Dynamic models

Virtual models (mathematical)
- Static models: numerical or analytical
- Dynamic models: numerical or analytical

Numerical models include simulation models.
17
System Model Forms (continued)
• Real models are those that are physical, 3-D objects that can be
touched. Typically they are 3-D miniatures (or same-size replicas) of an asset.

Real Static Model- Miniature bridge

18
Examples of Virtual (Mathematical) Models

Static mathematical models


- Boussinesq’s soil stress model
- The Moment equation

Dynamic mathematical models


- V = U + a*t (equation of motion under constant acceleration)
- Hydraulic model for percolation of fluids through
porous media

19
Numerical models

• The most common of these are simulation models, that


replicate the behavior or operation of the highway asset.
• Popular in recent times due to computer advances.
• Theory of probability is important for simulation
modeling.
• Examples of simulation models are:
- Videogames
- The Highway Corridor Simulation model (CORSIM).

20
Analytical models

• Statistical/Econometric Models
• Most models in use today are of this type.
• Can be used as inputs for simulation modeling of
asset condition
• Include:
• Linear regression
• Non-linear regression (several different functional forms)
• Survivor curves and hazard functions
• For non-continuous variables, include: logit, probit models, etc.
• Variations for continuous and non-continuous models include
zero-inflated models, etc.
21
Model Classification: Summary

Rows (asset feature, i.e., model function), each split into Static and Dynamic:
- Physical Features of System
- System Condition
- System Operation

Columns (model forms):
- “Real”
- “Virtual” (Mathematical/Statistical): Numeric, Analytic

22
Wise words on modeling

A model that describes (and predicts) system attributes


must be general enough to cover all possible and
practical circumstances of the system.

“A model can never be truly confirmed unless it is made


so broad as to include every possibility. But we may
subject it to ever more rigorous scrutiny and, in the
face of contradictory evidence, refute it.”

[Greene, 2001]

23
Steps for Building Statistical Models
1. Define your Objective
2. Sampling/Collect Data
3. Select dependent variable
4. Selection of independent variables
5. Preliminary Analysis (e.g., Descriptive Statistics)
6. Specify Model Form
7. Final selection of independent variables
8. Separate data into two datasets
- 1st for model calibration
- 2nd for model validation
9. Calibrate your model (using 1st Dataset), with either …
- manual means, or
- software (Excel, MINITAB, SAS or SPSS)
10. Evaluate your model (does it make sense?)
Examine the R2 and t-statistics
11. Validate your model (using 2nd Dataset)

24
Step 1. Definition of Objective
What is the purpose of the model development?

Step 2. Sampling/Data Collection


Select a sample of elements from the population.

The sample should closely resemble the population.

The sample should be both representative and random.

Very often, Steps 1 and 2 are carried out by someone else, and the results
are given to you to carry out the rest of the model development process.
25
Step 3. Specify Dependent Variable
(Also termed “response” variable)

What are we trying to predict/describe?

26
Types of dependent variables

Continuous
Discrete
- Categorical
  - Ordinal
  - Non-ordinal (binary or multinary)
- Quantitative
  - Count
  - All integers

Examples:
… International Roughness Index – continuous
… Number of bridge deck patches per linear ft. – count (Poisson)
… Choice of best asset repair option (reconstruct/rehabilitate/do-nothing) – categorical (logit)
… etc.
27
Note: The type of response variable you have
established for your model helps to determine the
appropriate model specification (at Step 6)
Variable type Recommended Model Specification

Continuous Regression
Count Poisson, Negative Binomial
Non-ordered categorical, binary Binary Logit
Non-ordered categorical, multinary Multinomial Logit
Ordered categorical Probit
Quantitative Integers Probit
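
The table above can be sketched as a simple lookup, which may help when choosing a specification at Step 6. This is only an illustration: the names `MODEL_SPEC` and `recommend_specification` are assumptions made for this sketch, and the entries simply copy the table.

```python
# Response-variable type -> recommended model specification,
# copied from the table on this slide (names are illustrative).
MODEL_SPEC = {
    "continuous": "Regression",
    "count": "Poisson, Negative Binomial",
    "non-ordered categorical, binary": "Binary Logit",
    "non-ordered categorical, multinary": "Multinomial Logit",
    "ordered categorical": "Probit",
    "quantitative integers": "Probit",
}


def recommend_specification(variable_type: str) -> str:
    """Return the recommended specification for a response-variable type."""
    key = variable_type.strip().lower()
    if key not in MODEL_SPEC:
        raise ValueError(f"Unknown variable type: {variable_type!r}")
    return MODEL_SPEC[key]


print(recommend_specification("Count"))
```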

28
Response Variables for Pavement Performance Models

Examples:
Serviceability (PSR, IRI, etc.)
Distress condition (PCR, Cracking Index, % Pothole area, etc)
Skid resistance (friction number);
Structural condition (deflection number)

These mostly are continuous variables


Manual or automated inspections (or surveys ) are
carried out to measure values of the indicators

29
Response Variables for Bridge Performance Models

1. NBI Condition Ratings


Reflects the range of physical condition of each key bridge
element:
deck, superstructure, substructure, culvert, and sub-elements
(piers, abutments, piles, etc.)
Ranges from 0 (poorest condition) to 9 (best possible
condition)

2. Bridge Health Index


A single numerical rating ranging from 0 (worst) to 100
(best), reflects element inspection data in relation to the
asset value of a bridge.
30
Response Variables for Bridge Performance Models

2. Sufficiency Rating
Ranges from 0% (entirely deficient) to 100%
(sufficient in all respects)
Assesses how fit a bridge is to remain in
service on the basis of:
• Structural adequacy and safety
• Serviceability and functional obsolescence
• Lanes, average daily traffic, structure type
• Deck condition, structural evaluation
• Essentiality for public use (detour length, average daily
traffic, highway designation).

31
Response Variables for Bridge Performance Models

3. Geometric Rating /Functional Obsolescence


Bridge geometrics are key determinants of traffic
safety and serviceability

NBI database items that document these parameters


are termed geometric inventory data or appraisal
ratings

Using a 0–9 scale, these ratings represent the extent
to which existing geometric features achieve
minimum/desirable criteria

32
Response Variables for Bridge Performance Models

4. Capacity Rating (or operating rating)

is the absolute maximum permissible load level to which


the structure may be subjected, for the vehicle type used
in the rating

33
Response Variables for Bridge Performance Models

5. Bridge Vulnerability
- Measures the vulnerability of bridges to natural/man-made disaster
- Vulnerability types include:
  - scour
  - fatigue/fracture
  - earthquake
  - flooding
  - collision
  - overload
- For each vulnerability type, a Vulnerability index is calculated using:
  - likelihood of the disaster
  - consequence of the disaster

34
Step 4. Selection of the Independent Variables
- Also called: “explanatory variables”
“explanatory factors”

What are the factors that likely influence the dependent


variable?

Did your data collection (Step 2) exclude some important


variables? If yes, go back and collect additional data

35
Types of independent (explanatory) variables

Continuous
Discrete
- Categorical
  - Ordinal
  - Non-ordinal (binary or multinary)
- Quantitative
  - Count
  - All integers

Examples:
… Climatic zone (0 for freeze, 1 for non-freeze) - binary
… Climatic severity (temperature in degrees, or precipitation) - continuous
… Class of contractor - ordinal
… Truck loading (average annual weights) - continuous
36
Some Independent Variables for Pavement
Performance Models

Stress Factors:
Age
Traffic loading (Nr. of trucks, ESALS, Gross vehicle weight, etc.)
Climatic effects (climatic zone, precipitation, freeze index,
freeze-thaw cycles, Nr. of hot/cold days (>70 or <32 deg), etc.)
Strength Factors:
Pavement thickness
Structural nr., subgrade strength (resilient modulus, CBR, etc.)
Maintenance history
Other Factors
Pavement surface type, Work quality, Contractor class,
Administrative jurisdiction, etc.
37
Some Independent Variables for Bridge Performance
Models

Stress Factors:
Age
Traffic loading (Nr. of trucks, ESALS, Gross vehicle weight, etc.)
Climatic effects (climatic zone, freeze index, freeze-thaw cycles,
Nr. of hot/cold days (>70 or <32 deg), etc.)
Strength Factors:
Material Type
Design Type
Other Factors
Work quality, Contractor class, Administrative jurisdiction, etc.

38
Step 5. Carry out Preliminary Analysis of the Data

Purpose:
- to identify interesting trends or relationships between
dependent and independent variables
- Scatter diagrams (a simple plot of X vs. Y)
- Box plots
- Stem and leaf plots
- Pie charts
- Analysis of Variance (ANOVA)
- Pair-wise t-tests
- etc.
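
As one illustration of a preliminary analysis, the sketch below computes descriptive statistics and a correlation coefficient, which summarizes numerically what a scatter diagram shows. It uses only Python's standard library. The height and weight values are borrowed from the validation example later in this deck, and the helper names `describe` and `pearson_r` are assumptions made for this sketch.

```python
import statistics

# Sample data borrowed from the deck's height/weight validation example.
heights = [60, 63, 62, 49, 49, 78, 75, 69]            # X, inches
weights = [124, 152, 152, 125, 133, 200, 195, 200]    # Y, lb


def describe(values):
    """Basic descriptive statistics for one variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


print(describe(heights))
print(describe(weights))
print(pearson_r(heights, weights))
```

A correlation close to +1 or -1 suggests a strong linear trend worth modeling; a value near 0 suggests the variable may be a candidate for dropping at Step 7.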

39
Step 6. Model Specification: Specify your Model Form
Also called “functional form” or “mathematical Form”

For Discrete Y random variables –
use logit, probit functional forms

For Continuous Y random variables –
use Linear or Non-linear models
- Polynomial (quadratic, cubic, etc.)
- Exponential
- Logarithmic
- Power
- Modified Exponential
- etc.

40
Examples of Continuous models - Polynomial

$Y = a_0 + a_1 X + a_2 X^2 + \dots + a_n X^n$

n = 0 Constant function
n = 1 Linear
n = 2 Quadratic
n = 3 Cubic
n = 4 4th-degree polynomial,
Etc.
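
A minimal sketch of evaluating a polynomial model of any degree n, assuming the coefficients are supplied in the order a0, a1, …, an. The function name `polynomial` is illustrative; Horner's scheme is used here only because it avoids computing each power of X separately.

```python
def polynomial(coefficients, x):
    """Evaluate a0 + a1*x + a2*x**2 + ... + an*x**n.

    `coefficients` is the list [a0, a1, ..., an].
    """
    result = 0.0
    # Horner's scheme: start from the highest-degree coefficient.
    for a in reversed(coefficients):
        result = result * x + a
    return result


# n = 0 (constant), n = 1 (linear), n = 2 (quadratic)
print(polynomial([5], 10))
print(polynomial([-155, 4.75], 60))
print(polynomial([1, 2, 3], 2))
```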
41
Examples of Continuous models - Exponential

$Y = a b^{x}$

Applications:
Population growth
Water demand for a growing city
Sales of a new product (Jason Mraz’s new CD)
Rate of deterioration of common highway asset material (concrete, steel)
Spread of an infectious disease, etc.
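
One common way to calibrate an exponential model is to take logs, which turns Y = a*b^x into the straight line ln(Y) = ln(a) + x*ln(b), and then fit that line by least squares. The sketch below assumes all Y values are positive; the function name `fit_exponential` and the sample data are hypothetical. Note that this minimizes squared errors in ln(Y), not in Y itself, so it is an approximation to a direct non-linear fit.

```python
import math


def fit_exponential(xs, ys):
    """Fit Y = a * b**x via least squares on ln(Y) = ln(a) + x*ln(b).

    Requires every Y to be positive. Returns (a, b).
    """
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    slope = sxl / sxx                      # estimate of ln(b)
    intercept = mean_log - slope * mean_x  # estimate of ln(a)
    return math.exp(intercept), math.exp(slope)


# Hypothetical data generated from Y = 2 * 1.5**x, so the fit should
# recover a close to 2 and b close to 1.5.
xs = [0, 1, 2, 3, 4]
ys = [2.0 * 1.5 ** x for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```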
42
Examples of Continuous models - Modified Exponential

$Y = c + a b^{x}$

Applications:
Similar to those for the Exponential model (see previous slide), except that:
- the starting value is not zero
- there is a maximum threshold value that cannot be exceeded

43
Examples of Continuous models - Logistic

$Y = \dfrac{1}{c + a b^{x}}$

Applications:
Similar to those for the Modified Exponential model (see previous slide),
except where
- the rate of growth or decline is relatively rapid at first, but slows
down in the latter stages
44
Examples of Continuous models - Gompertz

$Y = c\,a^{b^{x}}$

Applications:
Similar to those for the Modified Exponential model (see
previous 2 slides).
Note that taking logs of both sides yields an equation of the
Modified Exponential form in log Y.
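
The log transformation mentioned above can be written out as follows. This is a short worked step, assuming Y, c and a are all positive so that the logarithms exist; the primed symbols are introduced here only for illustration.

```latex
\begin{align*}
Y &= c\,a^{b^{x}} \\
\ln Y &= \ln c + b^{x}\,\ln a \\
Y' &= c' + a'\,b^{x},
\qquad \text{where } Y' = \ln Y,\; c' = \ln c,\; a' = \ln a .
\end{align*}
```

The last line has the same form as the Modified Exponential model, with ln Y in place of Y.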
45
Step 6 (continued)

How do we know which mathematical form to use for


a given set of data?

- Plot the raw data


then determine which model form closely
resembles the plot

- Trial and Error

46
Step 6 (model form specification)

Let’s say we specify the linear functional Form:


Y = b*X + a
a and b are parameters of the model

47
Step 7: Final selection of independent variables

In cases where there are several X variables, it may


be necessary to drop some of the variables to
simplify the analysis.

But which ones to drop?

From the plots in Step 6, it may be found that some


independent variables have absolutely no impact on
the dependent variable. Those may be dropped.

48
Step 8: Separate your dataset into two:

Dataset I (for model calibration, 80-90%)

and

Dataset II (for model validation, 10-20%)
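
A minimal sketch of such a split, assuming the observations are held in a Python list and that a random assignment is acceptable. The function name `split_dataset`, the 80% default share and the fixed seed are choices made for this illustration only.

```python
import random


def split_dataset(rows, calibration_share=0.8, seed=42):
    """Randomly split rows into (calibration, validation) lists."""
    if not 0.0 < calibration_share < 1.0:
        raise ValueError("calibration_share must be between 0 and 1")
    shuffled = list(rows)                      # leave the input untouched
    random.Random(seed).shuffle(shuffled)      # seeded, so repeatable
    cut = round(len(shuffled) * calibration_share)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))                        # stand-in for 100 observations
calibration, validation = split_dataset(rows)
print(len(calibration), len(validation))
```

Shuffling before cutting helps avoid putting, say, all the oldest assets in one dataset; fixing the seed keeps the split reproducible between runs.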

49
Step 9: Model Calibration

For purposes of this course, regression will be used for


calibration.
“Calibration” simply means determining the best line passing
through the points (i.e., determining the values of the parameters
“a” and “b” of the functional form).

3 ways of calibration:
Rough sketch by hand

Using Manual Means (with or without help from Excel)

Using Standard Statistical Software

50
Step 9: Model Calibration (continued)

By simple sketching …

[Figure: scatter plot of weight in lb (y-axis, 0 to 250) against height in inches (x-axis, 0 to 100)]

51
Step 9: Model Calibration (continued)

Mathematically, a “better line” could mean a line that
passes through the points such that …
the sum of the squared vertical deviations of the points
from the regression line is minimized
(squaring keeps positive and negative deviations from cancelling out).

Note: Under the standard regression assumptions, such a line is also the best
unbiased and efficient line that passes through the
points
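
For the linear form Y = b*X + a, ordinary least squares (the line minimizing the sum of squared vertical deviations) has a closed-form solution. The sketch below is one way to compute it by hand; the function names `fit_line` and `sse` and the sample data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for Y = b*X + a. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope
    a = mean_y - b * mean_x     # intercept
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared vertical deviations from the line Y = b*X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


# Hypothetical calibration data.
xs = [1, 2, 3, 4]
ys = [3.1, 4.9, 7.2, 8.8]
a, b = fit_line(xs, ys)
print(a, b, sse(a, b, xs, ys))
```

Any other choice of a or b gives a larger `sse` on the same data, which is what "best line" means here.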

52
Step 10. Model Evaluation

10.1 The Coefficient of Determination (R-square)

How well does the model fit the observed data?

Least possible R-square is 0
Highest possible R-square is 1
As a rule of thumb, R-square values between 0.5 and
1 are often taken to indicate that the model is OK.
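
A minimal sketch of how R-square can be computed from observed and model-predicted values, using the usual definition 1 - (residual sum of squares / total sum of squares). The function name `r_squared` and the sample numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are all identical")
    return 1 - ss_res / ss_tot


observed = [3, 5, 7, 9]
print(r_squared(observed, [3, 5, 7, 9]))   # perfect fit
print(r_squared(observed, [6, 6, 6, 6]))   # predicting the mean everywhere
```

The 0-to-1 range holds for a least-squares line with an intercept evaluated on its own calibration data; on other data the same formula can return negative values.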

53
Step 10. Model Evaluation (continued)

10.2 Significance of Variables in a Regression Line

Confidence Level Critical value of t


95% confidence 1.96

90% confidence 1.64

80% confidence 1.28

54
Step 10. Model Evaluation (continued)

10.2 Significance of Variables in a Regression Line

In the following multiple linear regression lines, determine


which variables are significant at 90% confidence

Y = 12.23*X1 – 13.28*X2 + 0.85*X3 – 4.65


(2.723) (-1.343) (6.534) (-2.065)
significant Not significant significant significant

Cost = 293.6*Time + 932.43*Delay + 5.7654*Age


(0.123) (4.243) (1.067)
Not significant significant Not significant

Flow = 3.2 - 0.9*Viscosity + 0.4*temp + 1.4*press


(2.123) (-1.313) (6.934) (1.324)
significant Not significant significant Not significant
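
The comparison used in these examples can be sketched as a small check: a variable is treated as significant when the absolute value of its t-statistic exceeds the critical value for the chosen confidence level. The critical values are the ones tabulated on the earlier slide; the names `CRITICAL_T` and `is_significant` are assumptions made for this sketch.

```python
# Critical t values from the slide's table (confidence level in %).
CRITICAL_T = {95: 1.96, 90: 1.64, 80: 1.28}


def is_significant(t_stat, confidence=90):
    """True if |t| exceeds the critical value for the confidence level."""
    return abs(t_stat) > CRITICAL_T[confidence]


# t-statistics of the first example equation on this slide.
t_stats = {"X1": 2.723, "X2": -1.343, "X3": 6.534, "constant": -2.065}
for name, t in t_stats.items():
    label = "significant" if is_significant(t) else "not significant"
    print(f"{name}: t = {t} -> {label}")
```

The sign of the t-statistic is ignored: a large negative value is just as significant as a large positive one.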
55
Step 10. Model Evaluation (continued)

10.2 Significance of Variables in a Regression Line

E.g., in the following multiple linear regression model for the condition of
a certain type of highway asset, determine which variables are
significant at 90% confidence.
Asset Condition (Y) = -4.65*T - 0.007*R + 13.24*M - 2.45*G

Variable               Parameter (coefficient)   t-stat   Interpretation
Annual Traffic (T)     -4.65                     -2.97    Significant inverse relationship between T and Y
Annual Rainfall (R)    -0.007                    -1.04    Not significant at 90% confidence (|t| < 1.64)
Material Quality (M)   13.24                     1.93     Significant direct relationship between M and Y
Age (G)                -2.45                     -4.17    Significant inverse relationship between G and Y

56
Step 10. Model Evaluation (continued)

How good is your model?

Examine the coefficient of determination (R2)


If R2 is reasonably high, then there is a close fit of the model
with the observed data.

- Typically, an R2 exceeding 0.5 is considered good


- Choice of an acceptable R2 limit actually depends on several
factors such as type of study, the statistical analyst, data type,
and objective of the investigation.

Note: Like Covariance and Correlation, R2 is a Measure of Association

In this context, it tells us the strength of association (relationship) between the
dependent and independent variables.
57
Step 11. Model Validation

This is the final and the most reliable test of the goodness of the
model.
Procedure:
1. Plug the values of X from the validation dataset into the calibrated
model and determine the corresponding values of Y. This gives us
YE.

2. Compare the values of:


YE (the estimated values of Y using the model) to
YO (the Y values in the validation dataset).

3. Calculate the deviation for each validation observation (each row),


square them, sum them up, divide by n (the number of
observations) and find the square root.
58
Example of Model Validation
W_E: predicted condition (weight) using the developed model, W_E = 4.75*Height - 155
W_O: actual (observed) condition (weight) from the field

Height (x)   W_E (using model)   W_O (observed)   W_E - W_O   (W_E - W_O)^2
60           130                 124              6           36
63           144.25              152              -7.75       60.0625
62           139.5               152              -12.5       156.25
49           77.75               125              -47.25      2232.5625
49           77.75               133              -55.25      3052.5625
78           215.5               200              15.5        240.25
75           201.25              195              6.25        39.0625
69           172.75              200              -27.25      742.5625

SSE is the Sum of Square Errors: 6559.3125

MSE is the Mean Square Error (average of the squared deviations): 819.91

RMSE is the Square Root of the MSE: 28.63
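
The validation procedure can be sketched in a few lines: plug each validation X into the calibrated model, take the deviation from the observed Y, square, average, and take the square root. The sketch below recomputes these statistics directly from the height and observed-weight columns of this example and the model 4.75*Height - 155; the function names are illustrative.

```python
import math

# Validation dataset from this example.
heights = [60, 63, 62, 49, 49, 78, 75, 69]             # X
observed = [124, 152, 152, 125, 133, 200, 195, 200]    # W_O


def predict_weight(height):
    """Calibrated model from the example: W_E = 4.75*Height - 155."""
    return 4.75 * height - 155


def validation_errors(estimated, observed):
    """Return (SSE, MSE, RMSE) for paired estimated/observed values."""
    squared = [(e - o) ** 2 for e, o in zip(estimated, observed)]
    sse = sum(squared)
    mse = sse / len(squared)
    return sse, mse, math.sqrt(mse)


estimated = [predict_weight(h) for h in heights]        # W_E
sse, mse, rmse = validation_errors(estimated, observed)
print(sse, mse, rmse)
```

RMSE is in the same units as Y (here, lb), which makes it easier to interpret than SSE or MSE.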


59
Another way of categorizing Asset
Performance Models

Models can also be classified by the certainty


(or otherwise) associated with their outputs
Deterministic models :
predict a single value for the response variable (in
our case, the asset condition)
Probabilistic models
predict a range of values for the response variable
(in our case, the asset condition)

60
The introduction of stochastic elements into the model …
… changes it from an exact statement to a
probabilistic description about expected outcomes
Implication: only a preponderance of
contradictory evidence can convincingly invalidate
the probabilistic model.
Thus, the probabilistic model is less precise but
more robust.
(Greene, 2001)

61
Another way of classifying Asset
Performance Models …

Purely empirical models


Estimate asset condition at any time on the basis of
extrapolations in space or time.
Purely mechanistic models
Determine asset condition at any time directly using an
instrument. Often for some primary response (of asset
material behavior) parameter such as stress, strain, or
deflection.
Mechanistic-empirical models
Combination of the two. For example, predicting future
deflection on the basis of past deflection
measurements
62
And yet another way of classifying Asset
Performance Models!

Models for predicting asset condition can be classified by
the spatio-temporal nature of the data they use:
Cross-sectional models
Estimate asset condition at any time on the basis of data
on asset features and environment at that same time.
Time-series models
Estimate asset condition at any time on the basis of
asset condition data in previous years.
Panel or longitudinal models
Combination of the above two.
63
Estimating Transportation Demand

[Figure: timeline from t1 to tN. At each time point (t1, t2, t3, t4, …, tN) there is a response value Y and a set of explanatory values X1, X2, X3 observed at that time.]

Cross-sectional models

Estimate asset condition at any time on the basis


of data on asset features and environment at
that same time.
Y2009 = f(X1,2009, X2,2009, …, XN,2009)

Example:
CONDITION2009 = f(TRAFFIC2009, RAINFALL2009, MATERIAL2009)

65
Time-series models

Estimate asset condition at any time on the


basis of asset condition data in previous
years.
Yt = f(Yt-k,Yt-k+1, …, Yt-2,Yt-1)

Example:

COND2009 = f(COND2004,COND2005,COND2006,COND2007,COND2008)

66
Panel or longitudinal models

Involve the use of pooled data, i.e., cross-sectional data across several time periods.

e.g., COND2009 = f(TRAFFIC2002, …, TRAFFIC2009,
CLIMATE2002, …, CLIMATE2009,
MATERIAL2002, …, MATERIAL2009, etc.)

67
Other variations of Spatio-temporal models

Time-lag models

Estimate a response (dependent variable) for a given


point in time, given independent variables that are
associated with a previous point in time.

Generally, $Y_t = f(X_{1,t-1}, X_{2,t-1}, \dots, X_{N,t-1})$

$Y_t = f(X_{j,t-k})$, j = 1, 2, …, N independent variables
k = 1, 2, …, K years
Example:
CONDITION2009 = f(TRAFFIC2008, RAINFALL2008,
MATERIAL2008)
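
Before a time-lag model can be calibrated, each response value has to be paired with the explanatory value from an earlier year. The sketch below shows one way to build those pairs, assuming the data are held in dictionaries keyed by year; the function name `lagged_pairs` and the sample figures are hypothetical.

```python
def lagged_pairs(x_by_year, y_by_year, lag=1):
    """Pair each Y in year t with X in year t - lag.

    Years for which the lagged X is missing are skipped.
    Returns a list of (x_lagged, y) tuples in year order.
    """
    pairs = []
    for year in sorted(y_by_year):
        lagged_year = year - lag
        if lagged_year in x_by_year:
            pairs.append((x_by_year[lagged_year], y_by_year[year]))
    return pairs


# Hypothetical traffic (X) and condition (Y) values by year.
traffic = {2006: 9.5, 2007: 10.0, 2008: 12.0}
condition = {2007: 85, 2008: 80, 2009: 75}

print(lagged_pairs(traffic, condition, lag=1))
```

The resulting pairs can then be passed to whatever calibration routine is used at Step 9.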
68
Other variations of Spatio-temporal models

Adaptive models
Successively estimate a response (dependent
variable) for a given point in time, on the basis of
the previous value of that response variable.

Generally, Yt = f(Yt-1)
Yt-1 = f(Yt-2)

Yt-k = f(Yt-k-1), k = 1, 2, …, K years


Example:
CONDITION2009 = f(CONDITION2008)
CONDITION2008 = f(CONDITION2007), …
… CONDITION2020 = f(CONDITION2019)
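
The successive structure Yt = f(Yt-1) can be sketched as a loop that feeds each estimate back in as the next input. The function name `project_condition` is illustrative, and `hypothetical_step` (a 3% loss of condition per year) is purely an assumption for demonstration, not a calibrated deterioration model.

```python
def project_condition(start_condition, step, years):
    """Apply Y_t = step(Y_{t-1}) repeatedly.

    Returns the list [Y_0, Y_1, ..., Y_years].
    """
    path = [start_condition]
    for _ in range(years):
        path.append(step(path[-1]))
    return path


def hypothetical_step(previous_condition):
    """Assumed rule for illustration: condition drops 3% each year."""
    return 0.97 * previous_condition


print(project_condition(100.0, hypothetical_step, 5))
```

Because every estimate depends on the one before it, any error in f accumulates along the chain, so long projections deserve extra caution.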
69
Other variations of Spatio-temporal models

Simultaneous-equation models
Estimate a response (dependent variable) and an explanatory
variable that influence each other over time: the response depends on
an earlier value of the explanatory variable, which in turn depends on
an earlier value of the response.

Generally, Yt = f(Xt-1)
Xt-1 = f(Yt-2)

Example:
CONDITION2009 = f(MAINT. HISTORY2008)
MAINT. HISTORY2008 = f(CONDITION2007)

70
Typical Sources of Modeling Error:

Errors are due to uncertainties in the asset


management environment, such as …
- Variability in workmanship (Contractors)
- Material imperfections
- Climate/Weather variations
- Economic uncertainties (Inflation,
depression, stock fluctuations, etc.)
- Equipment/Human error or incompetence, etc.
