Predictive Objective:
• Predict the sale price of a house that
is on the market
Data Sample
• XLMiner Defaults
– Observations
randomly assigned
to training and
validation sets
– 60% of data go to
training set, 40% to
validation
• See sheet
STDPartition
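The random 60/40 split described above can be sketched outside XLMiner as follows. This is a minimal illustration, not XLMiner's own procedure: the function name `partition`, the seed value, and the record IDs are all assumptions made for the example.

```python
import random

def partition(rows, train_fraction=0.6, seed=12345):
    """Randomly split rows into a training list and a validation list."""
    rng = random.Random(seed)      # arbitrary fixed seed so the split is reproducible
    shuffled = list(rows)          # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical record IDs standing in for the house listings.
house_ids = list(range(1, 101))
train, valid = partition(house_ids)
print(len(train), len(valid))
```

Fixing the seed matters when models are compared: every candidate model should be trained and validated on the same partition.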
XLMiner > Predict > Linear Regression
Estimate Model (pay attention to variable
selection; why are not all variables chosen?)
Evaluating prediction accuracy
• Is R² useful for evaluating prediction
quality?
$e_i = y_i - \hat{y}_i$
• Mean Absolute Deviation/Error (MAD or MAE)
• Sum Square Error (SSE)
• Mean Square Error (MSE)
• Root Mean Squared Error (RMSE)
• Mean Absolute % Error (MAPE)
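As a rough sketch, the five measures listed above can be computed from the prediction errors as shown below. The function name `accuracy_measures` and the prices are made up for illustration, and MSE is divided by the number of records here; some software divides by the residual degrees of freedom instead.

```python
import math

def accuracy_measures(actual, predicted):
    """Return MAE, SSE, MSE, RMSE and MAPE for paired actual/predicted values."""
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    sse = sum(e * e for e in errors)
    mse = sse / n                      # assumption: divide by n, not by degrees of freedom
    rmse = math.sqrt(mse)
    # MAPE divides each error by the actual value, so every actual value must be nonzero.
    mape = 100 * sum(abs(e / y) for e, y in zip(errors, actual)) / n
    return {"MAE": mae, "SSE": sse, "MSE": mse, "RMSE": rmse, "MAPE": mape}

# Made-up sale prices in $1000s.
actual = [200.0, 250.0, 400.0, 500.0]
predicted = [210.0, 240.0, 380.0, 520.0]
measures = accuracy_measures(actual, predicted)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

MAE and RMSE are in the units of the dependent variable, while MAPE is a percentage.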
If performance on the validation data is much worse than on the training data,
this indicates overfitting.
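The training-versus-validation comparison can be sketched with a one-predictor least-squares fit. Everything here is an assumption for illustration: the helper names `fit_line` and `rmse`, the single predictor, and the made-up data. How large a gap counts as "much worse" remains a judgment call.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rmse(xs, ys, intercept, slope):
    """Root mean squared error of the fitted line on the given records."""
    squared = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))

# Made-up data: living area (100s of sq ft) and sale price ($1000s).
train_x, train_y = [10, 15, 20, 25, 30], [150, 210, 260, 330, 380]
valid_x, valid_y = [12, 18, 28], [170, 250, 355]

# Fit on the training set only, then score both partitions with the same model.
b0, b1 = fit_line(train_x, train_y)
train_rmse = rmse(train_x, train_y, b0, b1)
valid_rmse = rmse(valid_x, valid_y, b0, b1)
print(f"training RMSE {train_rmse:.2f}, validation RMSE {valid_rmse:.2f}")
```

Some increase in error on the validation set is expected, because the model never saw those records; a large jump is the warning sign.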
XLMiner > Predict > Linear Regression
(same model as before)
Predictor Selection using
Stepwise
Metrics for comparing predictive models:
RSS = residual sum of squares (smaller = better)
Mallows's Cp: (Cp should be close to the number of
predictors, counting the intercept if present, and/or
at a minimum)
Probability: higher = better (rule out a subset if < 0.05)
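To make the Cp rule concrete, here is a sketch using the textbook definition Cp = SSE_p / s² − n + 2p, where s² is the residual variance estimated from the full model and p counts coefficients including the intercept. Whether XLMiner computes Cp in exactly this way is an assumption; the function name `mallows_cp` and all numbers are hypothetical.

```python
def mallows_cp(sse_subset, p_subset, sse_full, p_full, n):
    """Mallows's Cp for a subset model, using the full model to estimate error variance."""
    sigma2_hat = sse_full / (n - p_full)   # residual variance of the full model
    return sse_subset / sigma2_hat - n + 2 * p_subset

# Hypothetical results: n records, full model with 6 coefficients (intercept included).
n = 100
sse_full, p_full = 4700.0, 6
candidates = {
    "3 coefficients": (6200.0, 3),
    "4 coefficients": (4800.0, 4),
    "6 coefficients (full)": (4700.0, 6),
}
cp_values = {}
for label, (sse, p) in candidates.items():
    cp_values[label] = mallows_cp(sse, p, sse_full, p_full, n)
    print(f"{label}: RSS = {sse:.0f}, Cp = {cp_values[label]:.1f}")
```

In this made-up case the 4-coefficient subset has Cp equal to its coefficient count and the lowest Cp, so it would be preferred over the 3-coefficient subset, whose Cp is far above 3. The full model always has Cp equal to its own coefficient count under this definition, so that value alone says nothing about its quality.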
Parsimony vs. Predictive accuracy
• Predictive modeling
– Predictors must be available at the time of
prediction
– Evaluate prediction accuracy by
partitioning data into training/validation
sets and calculating accuracy measures
– Automated variable selection
Which of the following are
measures of prediction accuracy?
A. Mean Absolute Error (MAE)
B. Root Mean Squared Error (RMSE)
C. R-Square
D. Adjusted R-Square
E. Mean Absolute Percentage Error
(MAPE)
• Which of the following measures of
prediction accuracy do you think could be
very sensitive to low values of y, the
dependent variable (in fact, undefined
when y equals zero)?