
REGRESSION ANALYSIS

A full regression analysis involves several processes which include:


• constructing a scatterplot to investigate the nature of an association
• calculating the correlation coefficient to indicate the strength of the relationship
• determining the equation of the regression line
• interpreting the coefficients, the y-intercept (a) and the slope (b), of the least squares line y = a + bx
• using the coefficient of determination to indicate the predictive power of the association
• using the regression line to make predictions
• calculating the residuals and using a residual plot to test the assumption of linearity
• writing a report on your findings

Example
Life expectancy (years)      66  54  43  42  49  45  64  61  61  66
Birth rate (per thousand)    30  38  38  43  34  42  31  32  26  34

Steps:
1. Construct a SCATTERPLOT on your calculator after identifying the explanatory
variable (EV) and the response variable (RV).

2. Calculate the CORRELATION COEFFICIENT (4dp).

Use your calculator.
What type of relationship do steps 1 and 2 indicate?

Is it appropriate to now fit a least squares line to the data?
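The calculator's value of r can be cross-checked with a short Python sketch of Pearson's formula, r = Sxy / √(Sxx × Syy). It assumes birth rate is the explanatory variable (x) and life expectancy the response variable (y), and uses the ten data pairs from the example table; the function name `pearson_r` is our own.

```python
from math import sqrt

# Example data: birth rate (x, EV) and life expectancy (y, RV)
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

def pearson_r(x, y):
    """Pearson's correlation coefficient r for paired data."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

r = pearson_r(birth_rate, life_expectancy)
print(f"r = {r:.4f}")  # r = -0.8069
```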

3. Determine the EQUATION of the REGRESSION LINE (2dp).

Use your calculator: a = ______ (y-intercept)   b = ______ (gradient)

∴ y = ______
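The least squares coefficients come from b = Sxy / Sxx and a = ȳ − b·x̄. A minimal Python sketch for the example data, assuming the same x (birth rate) and y (life expectancy) as above; `least_squares` is a hypothetical helper name:

```python
# Example data: birth rate (x, EV) and life expectancy (y, RV)
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

def least_squares(x, y):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx          # gradient
    a = mean_y - b * mean_x  # y-intercept; the line passes through (mean_x, mean_y)
    return a, b

a, b = least_squares(birth_rate, life_expectancy)
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 105.37, b = -1.44
```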

4. INTERPRET the COEFFICIENTS of the regression line, i.e. the slope and intercept.
For the regression equation y = a + bx:
• the slope (b) estimates the average change (increase/decrease) in the response
variable (y) for each one-unit increase in the explanatory variable (x).
• the intercept (a) estimates the average value of the response variable (y) when
the explanatory variable (x) equals 0.
Slope –

Intercept –

5. Use the regression line to make PREDICTIONS.


What is the predicted life expectancy of a country with a birth rate of 35 (per 1000 people)?
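Substituting x = 35 into the regression equation gives the prediction. A minimal sketch, assuming the 2dp coefficients a = 105.37 and b = −1.44 found for the example; the function name is our own:

```python
A, B = 105.37, -1.44  # intercept and slope rounded to 2dp

def predict_life_expectancy(birth_rate):
    """Predicted life expectancy (years) for a birth rate per 1000 people."""
    return A + B * birth_rate

print(f"{predict_life_expectancy(35):.2f} years")  # 54.97 years
```

With the unrounded coefficients the prediction is about 54.81 years, so it is worth stating which version of the equation was used.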

When using a regression line to make predictions, we must be aware that, strictly
speaking, the equation we have found applies only to the range of data values used
to derive it.

Predicting within the range of the data is called interpolation.
(Generally, we can expect a reasonably reliable result.)
Predicting outside the range of the data is called extrapolation.
(We have no way of knowing whether the prediction is reliable.)
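The rule can be written as a check on the explanatory variable: a prediction is an interpolation when x lies between the smallest and largest x-values in the data. A sketch with the hypothetical helper `prediction_type`, applied to the example birth rates:

```python
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]  # x-values used to fit the line

def prediction_type(x, x_data):
    """Classify a prediction at x as interpolation or extrapolation."""
    if min(x_data) <= x <= max(x_data):
        return "interpolation"
    return "extrapolation"

print(prediction_type(35, birth_rate))  # interpolation (26 <= 35 <= 43)
print(prediction_type(50, birth_rate))  # extrapolation
```

The birth rate of 35 used in step 5 lies inside the range 26 to 43, so that prediction is an interpolation.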

6. Use the COEFFICIENT OF DETERMINATION to measure the predictive power of the
linear relationship.
r² ≈ ______ (4dp)
Thus we can conclude that:

NB: This is a significant/worthwhile predictive power, as r² is greater than 30%.
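The coefficient of determination is the square of the correlation coefficient. A sketch for the example data, assuming the same x and y as before; the 30% cut-off is the one given in the NB above:

```python
from math import sqrt

# Example data: birth rate (x) and life expectancy (y)
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

n = len(birth_rate)
mean_x = sum(birth_rate) / n
mean_y = sum(life_expectancy) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(birth_rate, life_expectancy))
s_xx = sum((x - mean_x) ** 2 for x in birth_rate)
s_yy = sum((y - mean_y) ** 2 for y in life_expectancy)

r = s_xy / sqrt(s_xx * s_yy)
r_squared = r ** 2
print(f"r^2 = {r_squared:.4f} ({r_squared:.2%})")  # r^2 = 0.6511 (65.11%)
print("worthwhile" if r_squared > 0.30 else "not worthwhile")  # worthwhile
```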

7. RESIDUALS
Residuals (errors of prediction) are the vertical distances between the individual data
points and the regression line. To calculate a residual, use:
Residual = actual y-value − predicted y-value
• Data points above the regression line have a positive residual.
• Data points below the regression line have a negative residual.
• Data points on the regression line have a zero residual.
NB: The residuals always sum to 0 (or very close to 0 after rounding).
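The residual rule and the sum-to-zero property can be checked on the example data. This sketch fits the line with unrounded coefficients; with the 2dp-rounded coefficients the residuals would not sum exactly to zero.

```python
# Example data: birth rate (x) and life expectancy (y)
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

# Least squares line with unrounded coefficients
n = len(birth_rate)
mean_x = sum(birth_rate) / n
mean_y = sum(life_expectancy) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(birth_rate, life_expectancy))
s_xx = sum((x - mean_x) ** 2 for x in birth_rate)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Residual = actual y-value - predicted y-value
residuals = [y - (a + b * x) for x, y in zip(birth_rate, life_expectancy)]

for x, y, e in zip(birth_rate, life_expectancy, residuals):
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x = {x}, y = {y}, residual = {e:+.2f} ({side} the line)")

print(f"sum of residuals = {sum(residuals):.6f}")  # zero apart from floating-point error
```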

Example 1
The equation of a regression line that enables hand span to be predicted from
height is: Hand span = 2.9 + 0.33 × Height
A person is 160 cm tall and has an actual hand span of 58.5 cm.
Using this regression equation, what is their predicted hand span?
What is the residual value for this person?
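A worked version of Example 1, as a sketch: substitute the height into the given equation, then apply residual = actual − predicted. The function name is our own.

```python
def predicted_hand_span(height):
    """Hand span predicted from height using Example 1's equation."""
    return 2.9 + 0.33 * height

height = 160        # cm
actual_span = 58.5  # actual hand span, as given in Example 1
predicted_span = predicted_hand_span(height)
residual = actual_span - predicted_span  # actual - predicted

print(f"predicted = {predicted_span:.1f}, residual = {residual:.1f}")
# predicted = 55.7, residual = 2.8
```

The residual is positive, so this person's data point lies above the regression line.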
Testing the assumption of linearity
A better way to test linearity than inspecting the scatterplot alone is to create a
residual plot. We plot the residual of each data point against the explanatory
variable (on the x-axis). As the mean of the residuals is always zero, the horizontal
zero line helps us orient ourselves; this line corresponds to the regression line.
A residual plot can be done by hand or on your calculator.
Using your calculator:

Interpreting residuals
• No clear pattern indicates that the current (linear) model is most likely appropriate.
• A clear pattern indicates that another model may be more appropriate.
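A calculator draws the residual plot directly. As a rough stand-in that needs only the Python standard library, this sketch prints a rotated text version for the example data: each row is one data point sorted by birth rate, with `*` placed left of the zero line `|` for a negative residual and right for a positive one. The scale (one column per year) and `HALF_WIDTH` are arbitrary choices.

```python
# Example data: birth rate (x) and life expectancy (y)
birth_rate = [30, 38, 38, 43, 34, 42, 31, 32, 26, 34]
life_expectancy = [66, 54, 43, 42, 49, 45, 64, 61, 61, 66]

# Fit the least squares line and compute the residuals
n = len(birth_rate)
mean_x = sum(birth_rate) / n
mean_y = sum(life_expectancy) / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(birth_rate, life_expectancy))
s_xx = sum((x - mean_x) ** 2 for x in birth_rate)
b = s_xy / s_xx
a = mean_y - b * mean_x
residuals = [y - (a + b * x) for x, y in zip(birth_rate, life_expectancy)]

# Rotated text residual plot: x runs down the page, residuals run across
HALF_WIDTH = 12  # columns on each side of the zero line; 1 column = 1 year
for x, e in sorted(zip(birth_rate, residuals)):
    row = [" "] * (2 * HALF_WIDTH + 1)
    row[HALF_WIDTH] = "|"                                  # zero line
    offset = max(-HALF_WIDTH, min(HALF_WIDTH, round(e)))   # clamp to the plot
    row[HALF_WIDTH + offset] = "*"
    print(f"{x:>3} {''.join(row)} {e:+6.2f}")
```

Reading down the rows, random scatter on both sides of `|` supports linearity, while a systematic curve would suggest a non-linear model.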

The residuals are randomly scattered above and below the x-axis. The
original data probably had a linear relationship.

Conclusion
The lack of a clear pattern in the residual plot
confirms the assumption of a linear association.
y = a + bx is an appropriate model.

The residuals show a curved pattern.

Conclusion
The residual plot shows a distinct pattern,
suggesting that a non-linear model could be
more appropriate.

8. Write a REPORT on your findings (combine all of the above information).

From the scatterplot we see that there is a strong, negative, linear relationship
between life expectancy and birth rate, r = −0.8069. There are no obvious outliers.
The equation of the least squares regression line is: Life expectancy = 105.37 − 1.44 ×
Birth rate. The slope predicts that, on average, life expectancy decreases by 1.44
years for each increase in birth rate of one birth per 1000 people. The coefficient of
determination indicates that 65.11% of the variation in life expectancy is explained
by the variation in birth rate. A residual plot shows no clear pattern, which
confirms that a linear equation is an appropriate way to describe the relationship
between life expectancy and birth rate.

Example 2
A student fits a least squares line to a set of bivariate data as
shown in the scatterplot opposite.

The residual plot for this least squares line would look like:
4C & 4D
