Linear Regression

ENVR 210: Linear Regression and Correlation Analysis in R
As part of your narrative to each answer, please be sure to copy all
supporting graphs, statistics, and R code into this document.
Question 1: Linear regression analysis without data transformation
Load data
y <- c(5.39, 5.73, 6.18, 6.42, 6.77, 7.11, 7.46, 7.71, 8.15, 8.5)  # response
x <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13)  # predictor
dat <- data.frame(x, y)  # one row per observation
Calculate the linear regression (x is predictor variable, y is response variable)
o What is the equation of the regression line?
- Y = 0.3396*X + 4.055
o What is the null hypothesis? What is the p-value of the
regression? Do you reject or fail to reject the null hypothesis?
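- The null hypothesis is that the slope is zero (no linear relationship between x and y). The p-value is 2.607e-12, far below the 0.05 significance level, so we reject the null hypothesis.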
o What is the r^2? Is it good or bad?
- R^2 is 0.998. This is good: about 99.8% of the variation in y is explained by x, leaving very little residual variation.
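A minimal R sketch of how the quantities in the answers above can be pulled out of an `lm()` fit. The names `model`, `s`, `p_slope`, and `r2_manual` are illustrative choices. The response goes on the left of `~`, so `lm(y ~ x)` treats x as the predictor. The slope's p-value tests the null hypothesis that the slope is zero and is judged against a significance level such as 0.05; the last computation rebuilds r^2 from its definition, 1 - SSE/SST, which shows why a value near 1 means little residual variation.

```r
# Question 1 data: x is the predictor, y is the response
x <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
y <- c(5.39, 5.73, 6.18, 6.42, 6.77, 7.11, 7.46, 7.71, 8.15, 8.5)
dat <- data.frame(x, y)

# Fit y = b0 + b1 * x by ordinary least squares
model <- lm(y ~ x, data = dat)
s <- summary(model)

b <- coef(model)                             # named vector: (Intercept), x
p_slope <- s$coefficients["x", "Pr(>|t|)"]   # p-value for H0: slope = 0
r2 <- s$r.squared                            # multiple r^2
adj_r2 <- s$adj.r.squared                    # adjusted r^2

# r^2 rebuilt from its definition: 1 - SSE / SST
sse <- sum(residuals(model)^2)
sst <- sum((dat$y - mean(dat$y))^2)
r2_manual <- 1 - sse / sst

print(round(b, 4))
print(signif(p_slope, 4))
print(round(c(r2 = r2, adj_r2 = adj_r2, r2_manual = r2_manual), 4))
```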
Plot the best fit line on a scatterplot of the data
o How does this look in relation to the data?
The best fit line passes very close to every data point; the residuals are small, consistent with an R^2 of 0.998.
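A minimal base-R sketch of the scatterplot with the fitted line. The grey segments are an addition for illustration: they draw each residual, so "close to the line" can be read directly as "small residuals".

```r
x <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
y <- c(5.39, 5.73, 6.18, 6.42, 6.77, 7.11, 7.46, 7.71, 8.15, 8.5)
model <- lm(y ~ x)

# Scatterplot of the observations with the least-squares line on top
plot(x, y, pch = 19, xlab = "x", ylab = "y",
     main = "Question 1: data and best fit line")
abline(model, col = "red")

# Grey segments show each residual (vertical gap between point and line)
segments(x, y, x, fitted(model), col = "grey50")

max_abs_resid <- max(abs(residuals(model)))
print(round(max_abs_resid, 3))
```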
Plot the residual charts
o Linearity: Residuals vs Fitted
Do the data appear to be linear?
- Yes. The residuals show no obvious curved pattern around zero, so the relationship between x and y appears linear.
o Independently sampled values, normally distributed errors: Normal Q-Q
Do the points line up on the 45-degree line?
- Yes, the points fall close to the 45-degree line, so the residuals appear to be approximately normally distributed.
o Constant variance: Scale-Location
Is the spread of the points the same?
- Yes, the spread of the points is roughly the same across the fitted values, so the variance appears constant.
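The three residual charts above come from R's `plot()` method for `lm` objects. A minimal sketch, assuming base graphics; the Shapiro-Wilk test at the end is an addition not requested by the assignment, offered as a numeric companion to the Q-Q plot. The Q-Q plot speaks to normality of the errors; independence depends on how the samples were collected.

```r
x <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
y <- c(5.39, 5.73, 6.18, 6.42, 6.77, 7.11, 7.46, 7.71, 8.15, 8.5)
model <- lm(y ~ x)

# Standard lm diagnostics in a 2 x 2 grid:
# 1 = Residuals vs Fitted (linearity)
# 2 = Normal Q-Q (normality of the errors)
# 3 = Scale-Location (constant variance)
# 5 = Residuals vs Leverage (influence, with Cook's distance contours)
op <- par(mfrow = c(2, 2))
plot(model, which = c(1, 2, 3, 5))
par(op)

# Optional numeric check of normality to accompany the Q-Q plot
sw <- shapiro.test(residuals(model))
print(sw)
```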
o Provide 1 statistic and 1 chart that show that transforming the data cannot improve regression performance.
1. Statistic: we still reject the null hypothesis after transforming, but r^2 does not improve; it goes from 0.998 (p = 2.607e-12) for the untransformed data to 0.9882 (p = 3.347e-09) for the transformed data.
2. Chart: the linear regression line already fits the untransformed data well, and the line fitted to the transformed data shows no meaningful improvement.
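The assignment does not record which transformation produced r^2 = 0.9882, so the sketch below assumes, purely for illustration, a base-10 log of y. Its numbers therefore need not match the ones quoted above; the point is the pattern of comparing r^2 and the Residuals vs Fitted chart before and after transforming.

```r
x <- c(4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
y <- c(5.39, 5.73, 6.18, 6.42, 6.77, 7.11, 7.46, 7.71, 8.15, 8.5)

raw_fit <- lm(y ~ x)
log_fit <- lm(log10(y) ~ x)  # assumed transformation, for illustration only

# Statistic: r^2 with and without the transformation
r2 <- c(raw = summary(raw_fit)$r.squared,
        log10_y = summary(log_fit)$r.squared)
print(round(r2, 4))

# Chart: Residuals vs Fitted, side by side
op <- par(mfrow = c(1, 2))
plot(raw_fit, which = 1, main = "Untransformed")
plot(log_fit, which = 1, main = "log10(y)")
par(op)
```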
Question 2: Linear regression analysis with transformation
Load data
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)  # response
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)  # predictor
dat <- data.frame(Number, Distance)  # one row per observation
Calculate the linear regression (Distance is predictor variable, Number is response variable)
o What is the equation of the regression line?
- Y = 0.03522*X + 106.93396, with Y = Distance and X = Number
o What is the null hypothesis? What is the p-value of the
regression? Do you reject or fail to reject the null hypothesis?
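- The null hypothesis is that the slope is zero (no linear relationship between Number and Distance). The p-value is 0.01629, below the 0.05 significance level, so we reject the null hypothesis.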
o What is the r^2? Is it good or bad?
- The adjusted r^2 is 0.2831 (multiple r^2 = 0.328). This is bad: the line explains less than a third of the variation.
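In R, `lm()` puts the response on the left of `~`. A minimal sketch, assuming the data as loaded above: `lm(Number ~ Distance)` matches the roles stated in the question, while swapping the formula to `lm(Distance ~ Number)` gives the line with a slope of about 0.035 and an intercept of about 106.9. The two fits share the same r^2, adjusted r^2, and slope p-value but describe different lines, so the formula order matters when reporting the equation. `report` is a hypothetical helper name.

```r
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)
dat <- data.frame(Number, Distance)

# Distance as predictor, Number as response (the roles given in the question)
fit <- lm(Number ~ Distance, data = dat)
# Roles swapped: Distance as response, Number as predictor
swapped <- lm(Distance ~ Number, data = dat)

# Hypothetical helper collecting the statistics asked for in the question
report <- function(m) {
  s <- summary(m)
  c(intercept = coef(m)[[1]],
    slope = coef(m)[[2]],
    r2 = s$r.squared,
    adj_r2 = s$adj.r.squared,
    p_slope = s$coefficients[2, "Pr(>|t|)"])
}

print(signif(rbind(Number_on_Distance = report(fit),
                   Distance_on_Number = report(swapped)), 4))
```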
Plot the best fit line on a scatterplot of the data
o How does this look in relation to the data?
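One quick way to judge the untransformed fit is to overlay the line and inspect the fitted values. A minimal sketch, assuming the model `lm(Number ~ Distance)` from the question's variable roles: with these data the line predicts negative counts at the shortest distances, which is impossible for a count, and the single very large count (8398) sits far above the line.

```r
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)
fit <- lm(Number ~ Distance)

plot(Distance, Number, pch = 19,
     main = "Question 2: untransformed data and best fit line")
abline(fit, col = "red")
abline(h = 0, lty = 2)  # counts cannot fall below this dashed line

# Which distances get an impossible (negative) predicted count?
negative_fit <- Distance[fitted(fit) < 0]
print(negative_fit)
```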
Plot the residual charts
o Linearity: Residuals vs Fitted
Do the data appear to be linear?
- No, the data are not linear.
o Independently sampled values, normally distributed errors:
Normal Q-Q
- The points do not line up on the 45-degree line, so the residuals do not appear to be normally distributed.
o Constant variance: Scale-Location
- No, the spread of the points is not the same, so the variance is not constant.
What are your conclusions about the regression?
- The equation of the regression line is Y = 0.03522*X + 106.93396.
- We reject the null hypothesis that the slope is zero (p = 0.01629), so the regression is statistically significant; however, it explains little of the variation in the data (adjusted r^2 = 0.2831).
- The fit is poor, and the residual plots, which show violations of the linearity, normality, and constant-variance assumptions, point to room for improvement.
Could you improve the model fit by transforming the data? (Hint: Yes, you can.) If so, answer the following questions based on log-transformed data:
o What is the equation of your regression line?
- Y = 0.3380*X + 1.3022, with Y = log10(Distance) and X = log10(Number)
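o Did the significance of the regression change? Do you reject or fail to reject the null hypothesis?
- Yes, the regression became more significant: the p-value fell from 0.01629 to 0.0002429. Both values are below 0.05, so we still reject the null hypothesis that the slope is zero.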
o Did the r^2 change? Is it good or bad?
- The adjusted r^2 rose from 0.2831 to 0.5773, which is good: the transformed model explains considerably more of the variation.
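A minimal sketch of the log-transformed fit, assuming base-10 logs: with base-10 logs, regressing log10(Distance) on log10(Number) reproduces the slope 0.3380 and intercept 1.3022 quoted above, while the roles stated in the question, `lm(log10(Number) ~ log10(Distance))`, give a different line with the same r^2 and p-value. All counts and distances are positive, so the logs are defined. On the original scale a log-log line is a power law, Number = 10^b0 * Distance^b1; `predict_number` is a hypothetical helper showing that back-transformation.

```r
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)
dat <- data.frame(Number, Distance)

# Roles as stated in the question: log10(Distance) predicts log10(Number)
log_fit <- lm(log10(Number) ~ log10(Distance), data = dat)
# Roles swapped, for comparison
log_swapped <- lm(log10(Distance) ~ log10(Number), data = dat)

s_log <- summary(log_fit)
print(round(coef(log_fit), 4))
print(round(coef(log_swapped), 4))
print(round(c(r2 = s_log$r.squared, adj_r2 = s_log$adj.r.squared), 4))
p_log <- s_log$coefficients["log10(Distance)", "Pr(>|t|)"]
print(signif(p_log, 4))

# Hypothetical helper: back-transform the log-log line to a power law
b <- coef(log_fit)
predict_number <- function(d) 10^b[[1]] * d^b[[2]]
print(round(predict_number(c(10, 100, 300)), 1))
```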
o Plot the best fit line on a scatterplot of the transformed data. How does it look?
- The best fit line is not perfect, but it fits the data better than the line for the untransformed data.
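A minimal sketch of the scatterplot on the log scale, assuming the question's roles (log10(Distance) on the horizontal axis). Labelling each point with its row number, an addition for illustration, makes it easy to match points to the observation numbers that R prints on the diagnostic plots.

```r
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)
log_fit <- lm(log10(Number) ~ log10(Distance))

lx <- log10(Distance)
ly <- log10(Number)
plot(lx, ly, pch = 19, xlab = "log10(Distance)", ylab = "log10(Number)",
     main = "Question 2: log-transformed data and best fit line")
abline(log_fit, col = "red")
# Row numbers next to each point
text(lx, ly, labels = seq_along(ly), pos = 4, cex = 0.7)
```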
Step through examining each of the assumptions of linear regression
o Linearity: Residuals vs Fitted
Do the data appear to be linear?
- The red smoothed line is not straight, so the data do not have a perfectly linear relationship. Observations 2, 12, and 17 have the largest residuals.
o Independently sampled values, normally distributed errors: Normal Q-Q
- The points do not line up on the 45-degree line, so the residuals do not appear to be normally distributed.
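o Constant variance: Scale-Location
- The spread of the points is uneven, so the variance does not appear constant. Some points lie far from the rest, and the highest point falls beyond the Cook's distance contour, which marks it as an influential observation.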
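The assumption checks above can be reproduced with `plot()` on the fitted model; panel 5 (Residuals vs Leverage) is the one that draws Cook's distance contours. A minimal sketch, assuming the question's roles for the log-log model. Which observations R labels depends on which variable is treated as the response, so the labelled points can differ from the ones named above if the formula is swapped. The 4/n cut-off for Cook's distance is a common rule of thumb, not something the assignment specifies.

```r
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)
log_fit <- lm(log10(Number) ~ log10(Distance))

op <- par(mfrow = c(2, 2))
plot(log_fit, which = c(1, 2, 3, 5))
par(op)

# Observations with the three largest absolute residuals (as labelled in panel 1)
top_resid <- order(abs(residuals(log_fit)), decreasing = TRUE)[1:3]
print(top_resid)

# Cook's distance for each observation; 4 / n is a rule-of-thumb cut-off
cd <- cooks.distance(log_fit)
print(round(sort(cd, decreasing = TRUE)[1:3], 3))
print(which(cd > 4 / length(cd)))
```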
Did transforming the data improve the linear fit?
- Yes. Transforming the data improved the linear fit (adjusted r^2 rose from 0.2831 to 0.5773), but not enough to produce a perfectly linear relationship.
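A compact way to back up this answer is to put the untransformed and log-transformed fits side by side. A minimal sketch, assuming the question's variable roles; `fit_stats` is a hypothetical helper name.

```r
Number <- c(8398, 239, 728, 758, 1453, 75, 27, 915, 67, 4, 28, 1, 168, 7, 16, 7, 3)
Distance <- c(364, 357, 343, 251, 216, 133, 115, 90, 88, 58, 54, 54, 53, 47, 25, 16, 8)

# Hypothetical helper: adjusted r^2 and slope p-value of a simple regression
fit_stats <- function(m) {
  s <- summary(m)
  c(adj_r2 = s$adj.r.squared,
    p_slope = s$coefficients[2, "Pr(>|t|)"])
}

comparison <- rbind(
  untransformed = fit_stats(lm(Number ~ Distance)),
  log10_log10 = fit_stats(lm(log10(Number) ~ log10(Distance)))
)
print(signif(comparison, 4))
```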
