Stat 608 Chapter 2

+
Stat 608 Chapter 2 (and 5)
Simple Linear Regression

+Introduction: Simple
Linear Regression Models
+
Simple Linear Regression (SLR)
Models
We always draw a scatterplot first to obtain an
idea of the strength (measured by correlation),
form (e.g., linear, quadratic, exponential), and
direction (positive or negative, in the linear
case) of any relationship that exists between two
variables.
On the next slide we see a strong (correlation
close to 1), positive (as x increases, y increases),
linear (straight-line) relationship between the
percentage of voters in each state who voted for
President Obama in 2008 and in 2012.
+
Models
[Figure: scatterplot of % Obama 2012 (vertical axis) against % Obama 2008 (horizontal axis)]
+
Models
When data are collected in pairs, the
standard notation used to designate this
is:
$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$,
where $x_1$ denotes the first value of the
X-variable and $y_1$ denotes the first value
of the Y-variable. The X-variable is called
the explanatory variable, while the
Y-variable is called the response variable.
+
Models
For example, the first few entries in
the election data set above were
(37.7, 40.8), (38.8, 38.4), (38.8, 36.9),
meaning 37.7% of Alabama voters
voted for President Obama in 2008,
while 40.8% did so in 2012, and so on.
+
Models
We could use a matched pairs t-test to test
whether the same percentage of voters
voted for President Obama in 2008 and
2012 in the 50 states, on average. If the
percentages were indeed equal, what would
that mean for our regression model?
Aside: I wouldn't be able to conclude just from the
average percentages being equal that the percentage of
the country voting for Obama had stayed the same. Why
not?
+
S.L.R. Models
The regression of Y on X is linear if
$$E(Y \mid X = x) = \beta_0 + \beta_1 x,$$
where the unknown parameters $\beta_0$ and $\beta_1$
determine the intercept and the slope,
respectively, of a specific straight line.
Note: the parabolic model
$$E(Y \mid X = x) = \beta_0 + \beta_1 x + \beta_2 x^2$$
is also a linear model, even though it graphs a parabola,
because the mean of Y given X is related to the
parameters via a linear function.
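The note about the parabolic model can be checked numerically. The sketch below uses made-up, noise-free data (the values and the names `x`, `y`, and `fit` are assumptions for illustration, not course data) and fits a parabola with `lm`, which is possible because the model is linear in its parameters.

```r
# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
x <- c(-2, -1, 0, 1, 2, 3)
y <- 1 + 2 * x + 3 * x^2

# I(x^2) adds the squared term as one more column of the design matrix.
fit <- lm(y ~ x + I(x^2))

# The mean function is curved in x but linear in the coefficients,
# so ordinary least squares recovers them.
coefs <- unname(coef(fit))
print(round(coefs, 6))
```

The design matrix here has three columns (ones, x, and x squared), so the same least squares machinery applies unchanged.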
+
S.L.R. Models
+
S.L.R. Models
There will almost certainly be some variation in
Y due strictly to random phenomena that
cannot be measured, predicted, or explained.
This variation is called random error, and is
denoted $e_i$.
In other words, the difference between what
actually occurs and our model is the error. Our
model doesn't explain everything; the amount
of variability in our response variable that
remains unexplained is measured by the error.
+
S.L.R. Models
Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent
realizations of the random variable Y that are
observed at the values $x_1, x_2, \ldots, x_n$ of a
random variable X. If the regression of Y on X
is linear, then for $i = 1, 2, \ldots, n$,
$$Y_i = E(Y \mid X = x_i) + e_i = \beta_0 + \beta_1 x_i + e_i,$$
where $e_i$ is the random error in $Y_i$ and is such
that $E(e_i \mid X) = 0$.
+
Continued
Notice that the random error term does not
depend on x, nor does it contain any
information about Y (otherwise it would be a
systematic error).
For now, we assume the errors have constant
variance:
$$\mathrm{Var}(e_i \mid X) = \sigma^2$$
+
Introduction to Least Squares
Don't get these ideas confused:
+
Introduction to Least Squares
+Deriving And Interpreting
Parameter Estimates
+
Least Squares: Derive
parameter estimates
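Goal: Minimize
$$\mathrm{RSS}(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
with respect to $\beta_0$ and $\beta_1$.
Taking the partial derivative with respect to
each parameter gives two equations
with two unknowns. Set
both equal to 0 and solve.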
+
parameter estimates
First: derivative
with respect to the
intercept. Solve
for the y-intercept.
The least squares regression
line goes through the point $(\bar{x}, \bar{y})$.
+
parameter estimates
Second: Derivative
with respect to the
slope. Solve for
the slope.
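The two steps can be written out as follows. This is a sketch of the standard derivation; the shorthand SXY and SXX is introduced here for convenience and is an assumption about notation.

```latex
\begin{aligned}
\mathrm{RSS}(\beta_0,\beta_1) &= \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\[4pt]
\frac{\partial\,\mathrm{RSS}}{\partial \beta_0}
  &= -2\sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0
  \;\Longrightarrow\; \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \\[4pt]
\frac{\partial\,\mathrm{RSS}}{\partial \beta_1}
  &= -2\sum_{i=1}^{n} x_i\,(y_i - \beta_0 - \beta_1 x_i) = 0 \\[4pt]
\text{Substituting } \hat{\beta}_0:\quad
  &\sum_{i=1}^{n} x_i (y_i - \bar{y}) - \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x}) = 0 \\[4pt]
\hat{\beta}_1
  &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
   = \frac{SXY}{SXX}
\end{aligned}
```

The last step uses $\sum x_i (y_i - \bar{y}) = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $\sum x_i (x_i - \bar{x}) = \sum (x_i - \bar{x})^2$, which hold because deviations from a mean sum to zero.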
+
Example: R Code
election <- read.csv("/Users/Elizabeth//stat608/sp13/pct_obama.csv")
my.lm <- lm(election$Obama_12 ~ election$Obama_08)
my.lm
For more information about this linear model, type
summary(my.lm)
anova(my.lm)
To plot your x and y variables on a scatterplot, use:
plot(x, y)
For more information, syntax, and sometimes examples on a function
(e.g. the lm function), type:
?lm
+
Example: SAS Code
data election;
infile '/Users/Elizabeth//stat608/sp13/pct_obama.csv';
input state $ Pollclose $ Electoral Obama12 Romney12 Pred $ Obama08
Mccain08;
run;
proc reg data=election;
model Obama12 = Obama08;
run;
proc gplot data=election;
plot y*x;
run;
+
Example: Election Data
Explanatory variable: % voting for Obama in 08
Response variable: % voting for Obama in 12
Coefficients:
(Intercept) election$Obama_08
-4.461 1.042
Interpret the slope in context.
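One way to read the slope, offered as a sketch: for each additional percentage point of a state's vote that went to Obama in 2008, the fitted 2012 percentage is higher by about 1.042 percentage points. The snippet below plugs the reported coefficients into the fitted line; the function name `predict_2012` and the input values 48 and 49 are illustrative assumptions.

```r
# Coefficients copied from the output above.
b0 <- -4.461   # intercept
b1 <- 1.042    # slope

# Hypothetical helper: fitted 2012 percentage for a given 2008 percentage.
predict_2012 <- function(pct_2008) b0 + b1 * pct_2008

fit_48 <- predict_2012(48)
fit_49 <- predict_2012(49)

# A one-point increase in the 2008 percentage changes the fit by the slope.
print(c(fit_48 = fit_48, fit_49 = fit_49, difference = fit_49 - fit_48))
```

With these coefficients the fitted value at 48% is about 45.6%, and the difference between the two fitted values equals the slope.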

+
Least Squares: Matrix Setup
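A sketch of the usual matrix form of the simple linear regression model, assuming n observations and the symbols y, X, β, and e for the response vector, design matrix, parameter vector, and error vector:

```latex
\underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}}_{y\;(n \times 1)}
=
\underbrace{\begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}}_{X\;(n \times 2)}
\underbrace{\begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}}_{\beta\;(2 \times 1)}
+
\underbrace{\begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}}_{e\;(n \times 1)},
\qquad\text{i.e.}\qquad y = X\beta + e .
```

Row i of this equation is $y_i = \beta_0 + \beta_1 x_i + e_i$, so the column of ones carries the intercept.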
+
Derivatives of Matrices
Derivatives of matrices are defined as partials
with respect to the variable vector. For example:
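One standard scalar-by-vector case, given here as an illustration and not necessarily the one shown in lecture, is the derivative of a linear form $a^\top x$, where a is a constant vector (an assumed symbol):

```latex
f(x) = a^\top x = \sum_{j=1}^{p} a_j x_j ,
\qquad
\frac{\partial f}{\partial x}
= \begin{pmatrix}
    \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_p
  \end{pmatrix}
= \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix}
= a .
```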
+
Derivatives of Matrices
The previous example was a derivative of a
scalar by a vector, as described on this Wikipedia page:
http://en.wikipedia.org/wiki/Matrix_calculus#Derivatives_with_matrices
It can also be found in the section on first-order
derivatives in the Matrix Cookbook, (61), p. 9
(see eCampus).
+
Derivatives of Quadratic Forms
The gradient and Hessian:
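For reference, the standard results for a quadratic form, assuming A is a constant p × p matrix and x is a p × 1 vector:

```latex
f(x) = x^\top A x
\quad\Longrightarrow\quad
\nabla f(x) = \frac{\partial f}{\partial x} = (A + A^\top)\,x ,
\qquad
\nabla^2 f(x) = \frac{\partial^2 f}{\partial x\,\partial x^\top} = A + A^\top .
```

When A is symmetric these reduce to $\nabla f(x) = 2Ax$ and $\nabla^2 f(x) = 2A$.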

+ Least Squares: Derive
parameter estimates in matrix
notation
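A sketch of the usual matrix derivation, assuming the design matrix X has full column rank so that $X^\top X$ is invertible:

```latex
\begin{aligned}
\mathrm{RSS}(\beta) &= (y - X\beta)^\top (y - X\beta)
  = y^\top y - 2\,\beta^\top X^\top y + \beta^\top X^\top X \beta \\[4pt]
\frac{\partial\,\mathrm{RSS}}{\partial \beta} &= -2\,X^\top y + 2\,X^\top X \beta \\[4pt]
\text{Setting the gradient to } 0:\quad
X^\top X \hat{\beta} &= X^\top y
\quad\Longrightarrow\quad
\hat{\beta} = (X^\top X)^{-1} X^\top y
\end{aligned}
```

The Hessian is $2X^\top X$, which is positive definite under the full-rank assumption, so this stationary point is a minimum.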
+
Linear Models: Inference
+
George Box
"All models are wrong, but some are useful."
+
Inference
Objective: Develop hypothesis tests and
confidence intervals for the slope and
intercept of a least squares regression model
1. Assumptions: Next chapter
2. Bias of Estimates
3. Variability of Estimates
In English: Is the percentage of each state's
voters who voted for Obama in 2008 linearly
related to the percentage in 2012?
+
Review: Expectation and
Variance of Random Variables
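Recall the definition of variance:
$$\mathrm{Var}(Y) = E\left[(Y - E(Y))^2\right] = E(Y^2) - [E(Y)]^2$$
Covariance is similar. It is the numerator of
correlation:
$$\mathrm{Cov}(X, Y) = E\left[(X - E(X))(Y - E(Y))\right], \qquad \mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$$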
+
Expectation and Variance of
Random Variables
Assume X and Y are random variables, and a and
b are constants. Then:
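The standard identities that follow from these definitions are listed below as a reference sketch; the exact list shown in lecture may differ.

```latex
\begin{aligned}
E(aX + b) &= a\,E(X) + b \\
\mathrm{Var}(aX + b) &= a^2\,\mathrm{Var}(X) \\
E(X + Y) &= E(X) + E(Y) \\
\mathrm{Var}(X + Y) &= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{Cov}(X, Y) \\
\mathrm{Cov}(aX, bY) &= ab\,\mathrm{Cov}(X, Y)
\end{aligned}
```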
+
Correlation
Correlation is unit-free (with no measurement error,
changing from meters to feet yields the same
correlation).
Correlation does not change if the explanatory and
response variables are exchanged.
Correlation is always between -1 and 1.
Correlation closer to -1 or +1 indicates a stronger
relationship between the explanatory and response
variables; correlation closer to 0 indicates a weaker
association.
+
Correlation
The sample correlation is
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
When both x and y are above average, or both x and y are
below average, a positive value is contributed to the sum.
When exactly one is above average, a negative value is
contributed to the sum.
When most values are in the bottom left and top right of the
plot, the correlation is positive.
Sample covariance is the numerator of sample correlation.
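The points above can be traced in a few lines of R. The data values and variable names below are made up for illustration; only `mean`, `sd`, and `cor` from base R are used.

```r
# Hypothetical toy data (not the election data).
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Each product is positive when x and y fall on the same side of their means.
products <- (x - mean(x)) * (y - mean(y))

# Sample covariance: the numerator of the sample correlation.
s_xy <- sum(products) / (length(x) - 1)

# Dividing by both sample standard deviations gives the correlation.
r_manual <- s_xy / (sd(x) * sd(y))

# Rescaling x (for example, meters to feet) leaves the correlation unchanged.
r_feet <- cor(x * 3.28084, y)

print(c(manual = r_manual, builtin = cor(x, y), rescaled = r_feet))
```

All three printed values agree, which illustrates both the covariance-over-standard-deviations form and the unit-free property.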

+
Correlation
The correlation between the percentages of each
state's voters voting for Obama in 2008 and 2012 is
r = 0.98.
This indicates a strong positive relationship
between 2008 and 2012.
[Figure: scatterplot of % Obama 2012 (vertical axis) against % Obama 2008 (horizontal axis)]
+
Matrices
Assume x is a vector of random variables. Then:
+
Variance / Covariance Matrix
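As a reference sketch, assuming $x = (x_1, \ldots, x_p)^\top$ with mean vector μ and the shorthand $\sigma_{jk} = \mathrm{Cov}(x_j, x_k)$ (notation introduced here):

```latex
E(x) = \mu = \begin{pmatrix} E(x_1) \\ \vdots \\ E(x_p) \end{pmatrix},
\qquad
\mathrm{Var}(x) = \Sigma = E\!\left[(x - \mu)(x - \mu)^\top\right]
= \begin{pmatrix}
    \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\
    \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\
    \vdots      & \vdots      & \ddots & \vdots      \\
    \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp}
  \end{pmatrix}.
```

The diagonal entries are the variances of the components, and the matrix is symmetric because $\sigma_{jk} = \sigma_{kj}$.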
+
Correlation Matrix
To calculate the correlation matrix, divide by
the appropriate standard deviations
elementwise.
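A minimal sketch of that elementwise division, using a made-up 2 × 2 variance-covariance matrix (the name `Sigma` and its entries are assumptions). Base R's `cov2cor()` performs the same conversion and serves as a check.

```r
# Hypothetical 2 x 2 variance-covariance matrix.
Sigma <- matrix(c(4, 2,
                  2, 9), nrow = 2, byrow = TRUE)

# Standard deviations are the square roots of the diagonal entries.
sds <- sqrt(diag(Sigma))

# Divide entry (j, k) by sd_j * sd_k.
R_manual <- Sigma / outer(sds, sds)

# cov2cor() from the stats package does the same conversion.
R_builtin <- cov2cor(Sigma)

print(R_manual)
```

Here the off-diagonal correlation is 2 / (2 × 3) = 1/3, and the diagonal entries are all 1.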
+
Matrices
Assume X and Y are matrices of random
variables; x and y are vectors of random
variables; and A, B, and C are constant matrices.
Then:
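The standard matrix analogues of the scalar rules are sketched below, assuming all dimensions are conformable; the list shown in lecture may differ.

```latex
\begin{aligned}
E(AXB + C) &= A\,E(X)\,B + C \\
E(X + Y) &= E(X) + E(Y) \\
\mathrm{Var}(Ax) &= A\,\mathrm{Var}(x)\,A^\top \\
\mathrm{Cov}(Ax, By) &= A\,\mathrm{Cov}(x, y)\,B^\top
\end{aligned}
```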
+
Quadratic Form
+
Expectation of Quadratic Form
It can be shown that
$$E(x^\top A x) = \mathrm{tr}(A\Sigma) + \mu^\top A \mu,$$
where $\mu$ and $\Sigma$ are the expected value and
variance-covariance matrix of x, respectively,
and tr denotes the trace of a matrix.
This result only depends on the existence of
$\mu$ and $\Sigma$; in particular, normality of x is not
required.
+
Usual Assumptions
In matrix notation:
$$y = X\beta + e, \qquad E(e \mid X) = 0, \qquad \mathrm{Var}(e \mid X) = \sigma^2 I$$
+
Bias
The bias of a statistic is the difference between its
expected value and the parameter it should estimate:
$\mathrm{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta$.
+
Inferences About the Slope and Intercept
Unbiased Estimates
+
Inferences About the Slope and Intercept
Variance of the estimates
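The standard results for simple linear regression are sketched below, conditional on X and assuming errors with mean 0, constant variance $\sigma^2$, and no correlation between errors; $SXX = \sum (x_i - \bar{x})^2$ is shorthand introduced here.

```latex
\begin{aligned}
E(\hat{\beta} \mid X) &= (X^\top X)^{-1} X^\top E(y \mid X)
  = (X^\top X)^{-1} X^\top X \beta = \beta \\[4pt]
\mathrm{Var}(\hat{\beta} \mid X) &= (X^\top X)^{-1} X^\top (\sigma^2 I)\, X (X^\top X)^{-1}
  = \sigma^2 (X^\top X)^{-1} \\[4pt]
\mathrm{Var}(\hat{\beta}_1 \mid X) &= \frac{\sigma^2}{SXX},
\qquad
\mathrm{Var}(\hat{\beta}_0 \mid X) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{SXX}\right),
\qquad
\mathrm{Cov}(\hat{\beta}_0, \hat{\beta}_1 \mid X) = -\frac{\bar{x}\,\sigma^2}{SXX}
\end{aligned}
```

The first line shows that both estimates are unbiased; the remaining lines give their variability.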
+
Inferences About the Slope and
Intercept
What does it mean to take the expectation of the
sample slope?
+
Inferences About the Slope
Distribution of slope:
Since the errors are normally distributed, $y_i = \beta_0 + \beta_1 x_i + e_i$
is also normally distributed, given x. Since
the sample slope is a linear combination of the $y_i$'s, it
is also normally distributed.
Of course, $\sigma$ is unknown, so we estimate it using the
sample standard deviation. Then our test statistic
has the t-distribution (with $n - k$ degrees of freedom
in general, k being the number of columns of X).
+
Inferences About the Slope
The standard error is the standard deviation of a statistic
(often, the sample standard deviation).
The margin of error is half the width of a confidence interval.
Don't mix up the critical value with the test statistic calculated
from the data! Look up the critical value on a table, or use R.
$\sigma^2$, the variance of the errors, is estimated by MSE. So the
square root of MSE, RMSE, is s in the equation above.
For our election example, R output gives:
Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       -5.13608    1.72705  -2.974  0.00459 **
states50$Obama_08  1.05610    0.03363  31.401  < 2e-16 ***
Confidence Interval for the slope:
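The interval has the form estimate ± (critical value) × (standard error). The sketch below applies that form to the slope row of the coefficient table; the sample size n = 50 is an assumption suggested by the data frame name `states50`, and the 95% level is chosen for illustration.

```r
# Values copied from the slope row of the coefficient table.
slope_hat <- 1.05610
se_slope  <- 0.03363

# Assumption: n = 50 observations, so n - 2 = 48 degrees of freedom.
n  <- 50
df <- n - 2

# Critical value for a 95% interval, from the t-distribution.
t_crit <- qt(0.975, df)

# Margin of error is half the width of the interval.
margin_of_error <- t_crit * se_slope
ci <- c(lower = slope_hat - margin_of_error,
        upper = slope_hat + margin_of_error)

print(round(ci, 3))
```

Under these assumptions the interval runs from roughly 0.99 to 1.12. Note that `t_crit` is the critical value, while 31.401 in the table is the test statistic.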
+
Inferences About the Slope
Hypothesis Test for whether the slope = 0:
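A sketch of the usual two-sided test, using the slope estimate and standard error from the election output and assuming normally distributed errors:

```latex
H_0: \beta_1 = 0 \quad \text{vs.} \quad H_A: \beta_1 \neq 0,
\qquad
T = \frac{\hat{\beta}_1 - 0}{\mathrm{se}(\hat{\beta}_1)} \sim t_{n-2} \ \text{under } H_0,
\qquad
T = \frac{1.05610}{0.03363} \approx 31.40 .
```

This matches the t value reported by R up to rounding, and the p-value below 2e-16 gives strong evidence against a slope of zero.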

+
More Inference
+
Objectives: Regression Line
Develop hypothesis tests and confidence
intervals for the regression line:
1. Confidence intervals for the mean
2. Prediction intervals for individuals
In English: If 48% of a state voted for
Obama in 2008, what percentage voted
for him in 2012?
+
Slope vs. regression line
Confidence interval for slope:
If x increases by 1 unit, by how much does y increase or decrease?
Deals with sampling distribution of sample slope from one sample to
another.
Can use CLT to say the sample slope is approximately normal for large
n.
Confidence interval for the regression line (mean):
For a given value of x, what is the mean value of y?
Deals with sampling distribution of sample mean from one sample to
another.
Can use CLT to say the sample mean is approximately normal for large
n.
Prediction interval for the regression line:
For a given value of x, what are the individual values of y?
Deals with distribution of individuals about the regression line.
The CLT is NOT useful in this case. We need to know the distribution of
the errors.
+
Confidence Intervals for the Population Regression Line
We consider the problem of finding a confidence
interval for the unknown population mean at a
given value of X, which we shall denote by x*.
First, recall that the population mean at X = x*
is given by
$$E(Y \mid X = x^*) = \beta_0 + \beta_1 x^*$$
+
An estimator of this unknown quantity is the
value of the estimated regression equation at
X = x*, namely:
$$\hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$$
+
How many Populations are Sampled?
One for each unique value of X = x*
How many populations were sampled in
the election data?
+
How Many Means and Variances do we Have?
___ means
___ variances
However, we assume that the means all lie on the line
$\beta_0 + \beta_1 x$ and the variances all equal $\sigma^2$.
+
Variance of mean
+
Estimated Variance of mean
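The standard expressions are sketched below, writing $\hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$ for the fitted mean at X = x*, $SXX = \sum (x_i - \bar{x})^2$, and $s^2 = \mathrm{RSS}/(n-2)$ for the estimate of $\sigma^2$ (notation assumed here):

```latex
\mathrm{Var}(\hat{y}^* \mid X)
= \sigma^2\left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right),
\qquad
\widehat{\mathrm{Var}}(\hat{y}^* \mid X)
= s^2\left(\frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right).
```

The variance is smallest at $x^* = \bar{x}$ and grows with the squared distance of x* from the mean of x.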

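+
A $100(1 - \alpha)\%$ confidence interval for
the population mean of Y at X = x* is given by
$$\hat{\beta}_0 + \hat{\beta}_1 x^* \;\pm\; t_{\alpha/2,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}$$
where s is the square root of MSE and $t_{\alpha/2,\,n-2}$ is the $100(1 - \alpha/2)$th quantile of the t-
distribution with n - 2 degrees of freedom.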
+
Does the confidence interval for the mean y-value get
narrower as n gets bigger?
What happens to the width of the confidence interval as
your desired x-value is farther from the mean for x?
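The second question can be explored numerically. The sketch below uses made-up data (the values and names are assumptions, not the election data) and base R's `predict()` with `interval = "confidence"` to compare interval widths at the mean of x and at points farther away.

```r
# Hypothetical toy data (not the election data).
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.3, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.7)
fit <- lm(y ~ x)

# Evaluate the interval at the mean of x and at points farther from it.
x_star <- data.frame(x = c(mean(x), 8, 12))
ci <- predict(fit, newdata = x_star, interval = "confidence", level = 0.95)

# Width of each interval: upper limit minus lower limit.
ci_width <- unname(ci[, "upr"] - ci[, "lwr"])
print(round(ci_width, 3))
```

The widths increase as x* moves away from the mean of x. For the first question, the 1/n term and the usual growth of the sum of squared deviations of x both tend to shrink the interval as n increases, other things being equal.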
+
Prediction Intervals for the Actual Value of Y
+
Prediction Intervals for the Actual Value of Y
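A sketch of the standard result, assuming normally distributed errors and writing $Y^*$ for a new observation at X = x*, $\hat{y}^* = \hat{\beta}_0 + \hat{\beta}_1 x^*$, $SXX = \sum (x_i - \bar{x})^2$, and s for the square root of MSE (notation assumed here):

```latex
\begin{aligned}
Y^* &= \beta_0 + \beta_1 x^* + e^* \\[4pt]
\mathrm{Var}(Y^* - \hat{y}^* \mid X)
  &= \sigma^2\left(1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}\right) \\[4pt]
\text{Prediction interval:}\quad
  &\hat{y}^* \;\pm\; t_{\alpha/2,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{SXX}}
\end{aligned}
```

The extra 1 under the square root is the variance of the new error $e^*$, which is why a prediction interval is wider than the confidence interval for the mean at the same x* and does not shrink to zero width as n grows.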
+Dummy (Indicator)
Variable Regression
+
Dummy Variable Regression
Experiment: Does increasing calcium intake
decrease blood pressure?
For placebo group: $x_i = 0$
For calcium group: $x_i = 1$
+
For placebo group: $E(Y \mid x = 0) = \beta_0$
For calcium group: $E(Y \mid x = 1) = \beta_0 + \beta_1$
+
21 total men; 10 took the calcium supplement, and 11
took the placebo.
+
What if we had the intercept plus two variables, one an
indicator of taking the placebo and one an indicator of
taking the calcium supplement?
The columns of X must be linearly independent; that is,
the matrix X must be of full rank.
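The rank problem can be seen directly. The sketch below builds both candidate design matrices for 10 calcium and 11 placebo subjects (the ordering of subjects and the variable names are assumptions) and compares their ranks with `qr()`.

```r
# Group indicators for 21 men: 10 calcium, 11 placebo.
calcium <- c(rep(1, 10), rep(0, 11))
placebo <- 1 - calcium

# Intercept plus BOTH indicators: the intercept column equals placebo + calcium.
X_both <- cbind(intercept = 1, placebo, calcium)

# Intercept plus ONE indicator: the columns are linearly independent.
X_one <- cbind(intercept = 1, calcium)

rank_both <- qr(X_both)$rank
rank_one  <- qr(X_one)$rank

print(c(columns_both = ncol(X_both), rank_both = rank_both,
        columns_one = ncol(X_one), rank_one = rank_one))
```

With both indicators the matrix has three columns but rank 2, so the matrix product of X transposed with X is singular and the least squares solution is not unique. Dropping one indicator restores full rank.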
+
my.lm <- lm(y ~ x)
summary(my.lm)
Output:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2727     2.2266  -0.122    0.904
x             5.2727     3.2267   1.634    0.119
Calcium group mean = 5, Placebo group mean = -0.27.
Difference between means = 5.27, as expected.
+
Geometry
+
Geometric Interpretation of
Let V be the vector space spanned by the
columns of our design matrix X.
Then $\hat{y} = X\hat{\beta}$ is the projection of the vector y down
into the vector space V.
Is a in V?
+
Projection (Hat) Matrix: $H = X(X^\top X)^{-1}X^\top$, so that $\hat{y} = Hy$
Projection is like a shadow of the real thing
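The projection can be built explicitly for a small example. The data below are made up for illustration, and the hat matrix is formed with `solve()` purely for clarity (in practice `lm()` uses a QR decomposition instead).

```r
# Hypothetical toy data (not course data).
x <- c(1, 2, 3, 4, 5)
y <- c(1.2, 1.9, 3.2, 3.8, 5.1)

# Design matrix: a column of ones and the explanatory variable.
X <- cbind(1, x)

# Hat matrix: X times the inverse of (X transpose X) times X transpose.
H <- X %*% solve(t(X) %*% X) %*% t(X)

y_hat <- drop(H %*% y)   # projection of y onto the column space of X
e_hat <- y - y_hat       # residual vector

# H is symmetric and idempotent; residuals are orthogonal to the columns of X.
print(isTRUE(all.equal(H, t(H))))
print(isTRUE(all.equal(H %*% H, H)))
print(round(drop(t(X) %*% e_hat), 10))
```

Projecting twice changes nothing (H times H equals H), and the fitted values agree with those from `lm(y ~ x)`.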

+
Any vector in V multiplied by the error vector is 0:
$$v^\top \hat{e} = 0 \quad \text{for every } v \in V, \qquad \text{where } \hat{e} = y - \hat{y}$$
+
Is it true that
What is
+
ANOVA Table
+
ANalysis Of VAriance (ANOVA)
The original sample variance of y is the total variance,
measuring the total amount of variability in the data
set.
The ANOVA table breaks that variability into two parts:
one explained by x, and the leftovers.
+
ANalysis Of VAriance (ANOVA)
ANOVA Table
We're considering two sources of variation using this
regression model:
Variation that can be explained by the model
Variation due to randomness or other variables not
considered.
Data = Fit + Residual
$y_i = (\hat{\beta}_0 + \hat{\beta}_1 x_i) + (\hat{e}_i)$
The regression model, whether using dummy variables or not,
is not really about variances; the point in a regression model is
to study how y changes with x.
+
ANOVA Table
SST = RSS + SSReg
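The decomposition can be verified on a small example. The data below are made up for illustration; `residuals()`, `fitted()`, and `summary()` are base R functions.

```r
# Hypothetical toy data (not the election data).
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.1, 2.3, 2.8, 4.5, 4.9, 6.2)
fit <- lm(y ~ x)

sst   <- sum((y - mean(y))^2)            # total sum of squares
rss   <- sum(residuals(fit)^2)           # residual (unexplained) sum of squares
ssreg <- sum((fitted(fit) - mean(y))^2)  # regression (explained) sum of squares

# F-statistic for one predictor: (SSReg / 1) / (RSS / (n - 2)).
f_stat <- (ssreg / 1) / (rss / (length(y) - 2))

print(c(SST = sst, RSS = rss, SSReg = ssreg, F = f_stat))
```

The first three printed values satisfy SST = RSS + SSReg, and the F-statistic agrees with the one reported by `summary(fit)` and `anova(fit)`.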

+
ANOVA Table
All else held constant:
If the strength of the correlation increases,
_____________ decreases.
If the slope of the line gets steeper, _______________
increases.
+
ANOVA Table
Source               Degrees of Freedom   Sums of Squares   Mean Squares   F-Statistic   P-value
Regression (Model)   1
Residual (Error)     n - 2
Total                n - 1
