Regression Correlation PDF

Chapter 5

Scatter Diagrams
• Scatter diagrams are used to demonstrate correlation between two quantitative variables.
• Often, this correlation is linear.
• This means that a straight line model can be developed.
[Figure: Regression Plot, Weight vs. Length]

Correlation Classifications
• Correlation can be classified into three basic categories:
  • Linear
  • Nonlinear
  • No correlation
[Figure: Regression Plot, Weight vs. Length]

Correlation Classifications
• Two variables may be correlated but not through a linear model.
• This type of model is called nonlinear.
• The model might be one of a curve.
[Figure: Curvilinear Correlation, result vs. sample]

Correlation Classifications
• Two quantitative variables may not be correlated at all.
[Figure: C4 vs. Animalno]

Linear Correlation
• Variables that are correlated through a linear relationship can display either positive or negative correlation.
• Positively correlated variables vary directly.
[Figure: Regression Plot, Weight vs. Length]

Linear Correlation
• Negatively correlated variables vary as opposites.
• As the value of one variable increases the other decreases.
[Figure: Regression Plot, Student GPA vs. Hours Worked]

Strength of Correlation
• Correlation may be strong, moderate, or weak.
• You can estimate the strength by observing the variation of the points around the line.
• Large variation is weak correlation.
[Figure: Regression Plot, Student GPA vs. Hours Worked]

Strength of Correlation
• When the data is distributed quite close to the line the correlation is said to be strong.
• The correlation type is independent of the strength.
[Figure: Regression Plot, Final Exam Score vs. Midterm Stats Grade]

The Correlation Coefficient
• The strength of a linear relationship is measured by the correlation coefficient.
• The sample correlation coefficient is given the symbol “r”.
• The population correlation coefficient has the symbol “ρ”.

Interpreting r
• The sign of the correlation coefficient tells us the direction of the linear relationship.
– If r is negative (< 0) the correlation is negative. The line slopes down.
– If r is positive (> 0) the correlation is positive. The line slopes up.

Interpreting r
• The size (magnitude) of the correlation coefficient tells us the strength of a linear relationship.
– | r | > 0.90 implies a strong linear association.
– 0.65 < | r | < 0.90 implies a moderate linear association.
– | r | < 0.65 implies a weak linear association.

Cautions
• The correlation coefficient only gives us an indication about the strength of a linear relationship.
• Two variables may have a strong curvilinear relationship, but they could have a “weak” value for r.

Fundamental Rule of Correlation
• Correlation DOES NOT imply causation
– Just because two variables are highly correlated does not mean that the explanatory variable “causes” the response
• Recall the discussion about the correlation between sexual assaults and ice cream cone sales

Setting
• A chemical engineer would like to determine if a relationship exists between the extrusion temperature and the strength of a certain formulation of plastic. She oversees the production of 15 batches of plastic at various temperatures and records the strength results.

The Study Variables
• The two variables of interest in this study are the strength of the plastic and the extrusion temperature.
• The independent variable is extrusion temp. This is the variable over which the experimenter has control. She can set this at whatever level she sees as appropriate.
• The response variable is strength. The value of “strength” is thought to be “dependent on” temperature.

The Experimental Data
Temp (F)   120  125  130  135  140  145  150  155  160  165
Str (psi)   18   22   28   31   36   40   47   50   52   58

The Scatter Plot
• The scatter diagram for the temperature versus strength data allows us to deduce the nature of the relationship between these two variables.
[Figure: Scatter diagram of Strength vs Temperature, Strength (psi) vs. Temperature (F)]
What can we conclude simply from the scatter diagram?

Conclusions by Inspection
• Does there appear to be a relationship between the study variables?
• Classify the relationship as: linear, curvilinear, no relationship
• Classify the correlation as positive, negative, or no correlation
• Classify the strength of the correlation as strong, moderate, weak, or none

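Computing r
$$r = \frac{1}{n-1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$$
Here n − 1 is the df, the first factor is the z-score for the x data, and the second factor is the z-score for the y data.

Computing r
$$r = \frac{\sum \left[ (Z_x)(Z_y) \right]}{n-1}$$
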
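The z-score formula for r can be sketched in a few lines of Python. This is only an illustration, assuming the ten (temperature, strength) pairs listed in the experimental data table; the helper names `sample_sd`, `z_scores`, and `correlation` are made up for this sketch and do not come from the slides or the handout.

```python
from math import sqrt

# Temperature (F) and strength (psi) pairs from the experimental data table.
temp = [120, 125, 130, 135, 140, 145, 150, 155, 160, 165]
strength = [18, 22, 28, 31, 36, 40, 47, 50, 52, 58]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def z_scores(values):
    """Standardize each value: (value - mean) / standard deviation."""
    m, s = mean(values), sample_sd(values)
    return [(v - m) / s for v in values]


def correlation(x, y):
    """r = sum(Z_x * Z_y) / (n - 1), as in the formula above."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


r = correlation(temp, strength)
print(f"r = {r:.3f}")
```

A value this close to 1 would be classified as a strong positive linear correlation under the class criteria.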
Computing r - Example
See example handout for the plastic strength versus extrusion temperature setting

Classifying the strength of linear correlation
• The strength of a linear correlation between the response and the explanatory variable can be assigned based on r
These classifications are discipline dependent

Classifying the strength of linear correlation
For this class the following criteria are adopted:
If |r| > 0.90 then the correlation is strong
If 0.65 < |r| < 0.90 then the correlation is moderate
If |r| < 0.65 then the correlation is weak

Scatter Diagrams and Statistical Modeling and Regression
• We’ve already seen that the best graphic for illustrating the relation between two quantitative variables is a scatter diagram. We’d like to take this concept a step farther and actually develop a mathematical model for the relationship between two quantitative variables.

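The sign rule and the class cutoffs for |r| can be written as a small helper. This is a sketch only: the function name `describe_correlation` is hypothetical, and the handling of the exact boundary values 0.65 and 0.90 (which the criteria leave unassigned) is an assumption made here.

```python
def describe_correlation(r):
    """Return (direction, strength) for a sample correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")

    # The sign of r gives the direction of the linear relationship.
    if r > 0:
        direction = "positive"
    elif r < 0:
        direction = "negative"
    else:
        direction = "none"

    # The magnitude of r gives the strength, using the class cutoffs.
    size = abs(r)
    if size > 0.90:
        strength = "strong"
    elif size < 0.65:
        strength = "weak"
    else:
        # Assumption: the boundary values 0.65 and 0.90 are treated as moderate.
        strength = "moderate"
    return direction, strength


print(describe_correlation(0.921))
print(describe_correlation(-0.5))
```

Because the classifications are discipline dependent, the two cutoffs would need to change for a field that uses different conventions.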
The Line of Best Fit
• Since the data appears to be linearly related we can find a straight line model that fits the data better than all other possible straight line models.
• This is the Line of Best Fit (LOBF)
[Figure: Plot, Strength vs. Temp]

Using the Line of Best Fit to Make Predictions
• Based on this graphical model, what is the predicted strength for plastic that has been extruded at 142 degrees?
[Figure: Strength vs. Temp]

Using the Line of Best Fit to Make Predictions
• Given a value for the predictor variable, determine the corresponding value of the dependent variable graphically.
• Based on this model we would predict a strength of approx. 39 psi for plastic extruded at 142 F

Using the Line of Best Fit to Make Predictions
• Based on this graphical model, at what temperature would I need to extrude the plastic in order to achieve a strength of 45 psi?
[Figure: Strength vs. Temp]

Using the Line of Best Fit to Make Predictions
• Locate 45 on the response axis (y-axis)
• Draw a horizontal line to the LOBF
• Drop a vertical line down to the independent axis
• The intercepted value is the temp. required to achieve a strength of 45 psi

Computing the LSR model
• Given a LSR line for bivariate data, we can use that line to make predictions.
• How do we come up with the best linear model from all possible models?

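The graphical procedure of going from a target strength across to the line and down to the temperature axis amounts to solving the line equation for x. The sketch below assumes the ten data pairs from the experimental data table and fits the line with the usual least-squares formulas in deviation form; the function names `fit_line` and `temperature_for` are hypothetical. A value computed this way need not match a value read off a hand-drawn graph exactly.

```python
# Temperature (F) and strength (psi) pairs from the experimental data table.
temp = [120, 125, 130, 135, 140, 145, 150, 155, 160, 165]
strength = [18, 22, 28, 31, 36, 40, 47, 50, 52, 58]


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def temperature_for(target_strength, b0, b1):
    """Invert y = b0 + b1 * x to get the x that gives the target y."""
    return (target_strength - b0) / b1


b0, b1 = fit_line(temp, strength)
needed = temperature_for(45, b0, b1)
print(f"Temperature for 45 psi: about {needed:.0f} F")
```

The result falls inside the range of temperatures that were actually tested, so reading it off the line is a legitimate use of the model.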
Bivariate data and the sample linear regression model
• For example, look at the fitted line plot of powerboat registrations and the number of manatees killed.
• It appears that a linear model would be a good one.

The straight line model
• Any straight line is completely defined by two parameters:
– The slope – steepness, either positive or negative
– The y-intercept – this is where the graph crosses the vertical axis
$$\hat{y} = b_0 + b_1 x$$

The Parameter Estimators
• In our model “b0” is the estimator for the intercept. The true value for this parameter is β0
• “b1” estimates the slope. The true value for this parameter is β1

Calculating the Parameter Estimators
• The equation for the LOBF is:
$$\hat{y} = b_0 + b_1 x$$

Calculating the Parameter Estimators
• To get the slope estimator we use:
$$b_1 = \frac{n \sum (xy) - \sum x \cdot \sum y}{n \sum x^2 - \left( \sum x \right)^2}$$
or
$$b_1 = r \left( \frac{s_y}{s_x} \right)$$

Computing the Intercept Estimator
• The intercept estimator is computed from the variable means and the slope:
$$b_0 = \bar{y} - b_1 \bar{x}$$
• Realize that both the slope and intercept estimated in these last two slides are really point estimates for the true slope and y-intercept

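As a check that the two slope formulas agree, the sketch below evaluates both on the plastic data (assuming the ten pairs from the experimental data table) and then computes the intercept from the means. Variable names such as `b1_sums` and `b1_r` are chosen for this illustration only.

```python
from math import sqrt

# Temperature (F) and strength (psi) pairs from the experimental data table.
temp = [120, 125, 130, 135, 140, 145, 150, 155, 160, 165]
strength = [18, 22, 28, 31, 36, 40, 47, 50, 52, 58]
n = len(temp)

# Raw sums needed by the first slope formula.
sum_x = sum(temp)
sum_y = sum(strength)
sum_xy = sum(x * y for x, y in zip(temp, strength))
sum_x2 = sum(x * x for x in temp)

# Slope from b1 = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2).
b1_sums = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Slope from b1 = r * (s_y / s_x).
x_bar = sum_x / n
y_bar = sum_y / n
s_x = sqrt(sum((x - x_bar) ** 2 for x in temp) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in strength) / (n - 1))
r = sum((x - x_bar) * (y - y_bar) for x, y in zip(temp, strength)) / (
    (n - 1) * s_x * s_y
)
b1_r = r * s_y / s_x

# Intercept from b0 = y_bar - b1 * x_bar.
b0 = y_bar - b1_sums * x_bar

print(f"b1 (sums form) = {b1_sums:.4f}")
print(f"b1 (r form)    = {b1_r:.4f}")
print(f"b0             = {b0:.2f}")
```

The raw-sums form needs only running totals, while the second form is convenient when r and the two standard deviations are already available from summary output.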
Revisit the manatee example
Look at the summary statistics and correlation coefficient data from the manatee example

Variable   N    Mean    SE Mean   StDev
Boats      10   74.10   2.06      6.51
Deaths     10   55.80   5.08      16.05

Minitab correlation coefficient output
Correlations: Boats, Deaths
Pearson correlation of Boats and Deaths = 0.921
P-Value = 0.000

Computing the estimators
So the slope is:
$$b_1 = r \left( \frac{s_y}{s_x} \right) = 0.921 \left( \frac{16.05}{6.51} \right) = 2.27$$

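Computing the estimators
And the intercept is calculated using the slope information along with the variable means:
$$b_0 = \bar{y} - b_1 \bar{x} = 55.8 - 2.27(74.1) = -112.4$$

Put it together
• In general terms any old linear regression equation is:
response = intercept + slope(predictor)
• Specifically for the manatee example the sample regression equation is:
Deaths = -112.7 + 2.27(boats)
(The intercept of -112.7 is the value used with the Minitab regression output; the hand calculation gives -112.4, presumably because it starts from rounded summary statistics.)
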
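The hand calculation for the manatee example can be reproduced from the summary statistics alone. This is a minimal sketch that takes the rounded values printed in the Minitab output as given and rounds the slope to two decimals before computing the intercept, mirroring the slide; the variable names are illustrative.

```python
# Summary statistics and correlation from the Minitab output.
r = 0.921
mean_boats, sd_boats = 74.10, 6.51      # predictor (x)
mean_deaths, sd_deaths = 55.80, 16.05   # response (y)

# Slope: b1 = r * (s_y / s_x), rounded to two decimals as on the slide.
b1 = round(r * sd_deaths / sd_boats, 2)

# Intercept: b0 = y_bar - b1 * x_bar.
b0 = mean_deaths - b1 * mean_boats

print(f"slope = {b1:.2f}, intercept = {b0:.1f}")
```

Because every input here is already rounded, the intercept can differ in the first decimal place from the one software reports from the raw data.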
The slope estimate
• b1 is the estimated slope of the line
• The interpretation of the slope is, “The amount of change in the response for every one unit change in the independent variable.”

The slope estimate
• In our example the estimated slope is 2.27
• This is interpreted as, “For each additional 10,000 boats registered, an additional 2.27 manatees are killed.”

The intercept estimate
• Recall the sample regression model:
$$\hat{y} = b_0 + b_1 x$$
“b0” is the estimated y-intercept
The interpretation of the y-intercept is, “The value of the response when the control (or independent) variable has a value of 0.”

The intercept estimate
• Sometimes this value is meaningful. For example, resting metabolic rate versus ambient temperature in Centigrade (°C)
• Sometimes it’s not meaningful at all.
• The manatee model is an example where the y-intercept just serves to make the model fit better. There can be no such thing as -112.7 manatees killed.

Regression Output
Use the Minitab regression output for the manatee example to predict the expected number of manatees killed when the number of power boat registrations is 750,000 (x = 75)

Regression Output
• The sample regression equation is:
ManateesKilled = -112.7 + 2.27(boats)
• So:
ManateesKilled = -112.7 + 2.27(75) ≈ 57.6
• This means that we expect between 57 and 58 manatees killed in a year where 750,000 power boats are registered.

Regression Output
Use the Minitab regression output for the manatee example to predict the expected number of manatees killed when the number of power boat registrations is 850,000 (x = 85)

Regression Output
• The sample regression equation is:
ManateesKilled = -112.7 + 2.27(boats)
• So:
ManateesKilled = -112.7 + 2.27(85) = 80.25
• This means that we expect between 80 and 81 manatees killed in a year where 850,000 power boats are registered.

Regression Output
STOP!! YOU HAVE VIOLATED THE CARDINAL RULE OF REGRESSION

Cardinal Rule of Regression
• NEVER NEVER NEVER NEVER NEVER NEVER predict a response value from a predictor value that is outside of the experimental range.
• The only predictions we can make (statistically) are predictions for responses where powerboat registrations are between 670,000 and 840,000.
• This means that our prediction for the year when 850,000 powerboats were registered is garbage

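One way to make the cardinal rule hard to violate is to build the experimental range into the prediction function itself. The sketch below uses the sample regression equation and the 670,000 to 840,000 range stated above, with x measured in units of 10,000 registrations; the function name `predict_deaths` and the choice to raise an error are assumptions of this illustration.

```python
# Coefficients from the sample regression equation.
B0, B1 = -112.7, 2.27

# Experimental range of the predictor, in units of 10,000 registrations
# (670,000 to 840,000 powerboats).
X_MIN, X_MAX = 67, 84


def predict_deaths(boats):
    """Predict manatees killed, refusing to extrapolate beyond the data."""
    if not X_MIN <= boats <= X_MAX:
        raise ValueError(
            f"x = {boats} is outside the experimental range [{X_MIN}, {X_MAX}]"
        )
    return B0 + B1 * boats


print(f"x = 75: {predict_deaths(75):.2f} manatees")

try:
    predict_deaths(85)
except ValueError as err:
    print(f"x = 85: refused ({err})")
```

Raising an error rather than returning a number forces whoever calls the function to notice that 850,000 registrations lies outside the data.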
Regression Estimates
The coefficient of determination
• r² is called the coefficient of determination.
• r² is a proportion, so it is a number between 0 and 1 inclusive.
• r² quantifies the amount of variation in the response that is due to the variability in the predictor variable.
• r² values close to 0 mean that our estimated model is a poor one while values close to 1 imply that our model does a great job explaining the variation

The r² Value
• If r² is, say, 0.857 we can conclude that 85.7% of the variability in the response is explained by the variability in the independent variable.
• This leaves 100 - 85.7 = 14.3% left unexplained. It’s only the unexplained variation that is incorporated into the “uncertainty”

Scatter of Points and r²
[Figure: two scatter plots, one with r² = 0.848 and one with r² = 0.992]

r² and the correlation coefficient
• r² is related to the correlation coefficient
• It’s just the square of r
• The interpretation as the proportion of variation in the response that is explained by the variation in the predictor variable makes it an important statistic
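The two readings of r², as the square of r and as the proportion of explained variation, can be checked against each other numerically. The sketch below does this for the plastic data (assuming the ten pairs from the experimental data table). The identity r² = 1 − SSE/SST used here is the standard one for a least-squares line with a single predictor; it is not written out on the slides, and the variable names are illustrative.

```python
from math import sqrt

# Temperature (F) and strength (psi) pairs from the experimental data table.
temp = [120, 125, 130, 135, 140, 145, 150, 155, 160, 165]
strength = [18, 22, 28, 31, 36, 40, 47, 50, 52, 58]
n = len(temp)

x_bar = sum(temp) / n
y_bar = sum(strength) / n
sxx = sum((x - x_bar) ** 2 for x in temp)
syy = sum((y - y_bar) ** 2 for y in strength)   # total variation (SST)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temp, strength))

# Route 1: square the correlation coefficient.
r = sxy / sqrt(sxx * syy)
r2_from_r = r ** 2

# Route 2: 1 - (unexplained variation / total variation).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temp, strength))
r2_from_fit = 1 - sse / syy

print(f"r^2 from r:   {r2_from_r:.4f}")
print(f"r^2 from fit: {r2_from_fit:.4f}")

# The manatee correlation r = 0.921 squares to about 0.848.
print(f"manatee r^2:  {0.921 ** 2:.3f}")
```

Both routes give the same number, which is why r² can be read directly as the share of the response variability accounted for by the line.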
