Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Regression:
fitting lines to data
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4A Least-squares regression line 133
The aim of linear regression is to model the association between two numerical variables
by using a simple mathematical relation, the straight line. Knowing the equation of this line
gives us a better understanding of the nature of the association. It also enables us to make
predictions from one variable to another – for example, a young boy’s adult height from his
father’s height.
The easiest way to fit a line to bivariate data is to construct a scatterplot and draw the line
‘by eye’. We do this by placing a ruler on the scatterplot so that it seems to follow the
general trend of the data. You can then use the ruler to draw a straight line. Unfortunately,
unless the points are very tightly clustered around a straight line, the results you get by using
this method will differ a lot from person to person.
The most common approach to fitting a straight line to data is to use the least squares
method. This method assumes that the variables are linearly related, and works best when
there are no clear outliers in the data.
Some terminology
To explain the least squares method, we need to define several terms.
The scatterplot shows five data points, (x1 , y1 ),
y
(x2 , y2 ), ( x3 , y3 ), (x4 , y4 ) and (x5 , y5 ).
A regression line (not necessarily the least (x5, y5)
d5
squares line) has also been drawn on the regression line
(x3, y3)
scatterplot.
(x2, y2) d3 d4
The vertical distances d1 , d2 , d3 , d4 and d5 of (x4, y4)
d2
each of the data points from the regression line
are also shown. d1
(x1, y1)
These vertical distances, d, are known as
x
residuals.
The assumptions for fitting a least squares line to data are the same as for using the
correlation coefficient, r. These are that:
the data is numerical the association is linear there are no clear outliers.
∗ In mathematics you are used to writing the equation of a straight line as y = mx + c. However, statisticians
write the equation of a straight line as y = a + bx. This is because statisticians are in the business of building
linear models. Putting the variable term second in the equation allows for additional variable terms to be
added; for example, y = a + bx + cz. While this sort of model is beyond Further Mathematics, we will
continue to use y = a + bx to represent the equation of the regression line because it is common statistical
practice.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4A 4B Determining the equation of the least squares line 135
Exercise 4A
Basic ideas
1 What is a residual?
2 The least-squares regression line is obtained by:
A minimising the residuals
B minimising the sum of the residuals
C minimising the sum of the squares of the residuals
D minimising the square of the sum of the residuals
E maximising the sum of the squares of the residuals.
3 Write down the three assumptions we make about the association we are modelling
when we fit a least squares line to bivariate data.
Warning!
If you do not correctly decide which is the explanatory variable (the x-variable)
and which is the response variable (the y-variable) before you start calculating the
equation of the least squares regression line, you may get the wrong answer.
How to determine the equation of a least squares line using the formula
The heights (x) and weights (y) of 11 people have been recorded, and the values of the
following statistics determined:
x̄ = 173.3 cm s x = 7.444 cm ȳ = 65.45 cm sy = 7.594 cm r = 0.8502
Use the formula to determine the equation of the least squares regression line that enable
weight to be predicted from height. Calculate the slope and intercept correct to two
significant figures.
Steps
1 Identify and write down the EV: height (x)
explanatory variable (EV) and response RV: weight (y)
variable (RV). Label as x and y, respectively.
Note: In saying that we want to predict weight
from height, we are implying that height is the EV.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
136 Core Chapter 4 Regression: fitting lines to data
How to determine and graph the equation of a least squares regression line using
the TI-Nspire CAS
The following data give the height (in cm) and weight (in kg) of 11 people.
Height (x) 177 182 167 178 173 184 162 169 164 170 180
Weight (y) 74 75 62 63 64 74 57 55 56 68 72
Determine and graph the equation of the least squares regression line that will enable
weight to be predicted from height. Write the intercept and slope correct to three
significant figures.
Steps
1 Start a new document by pressing / + N.
2 Select Add Lists & Spreadsheet. Enter the data
into lists named height and weight, as shown.
3 Identify the explanatory variable (EV) and
the response variable (RV).
EV: height
RV: weight
Note: In saying that we want to predict weight from
height, we are implying that height is the EV.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4B Determining the equation of the least squares line 137
How to determine and graph the equation of a least squares regression line using
the ClassPad
The following data give the height (in cm) and weight (in kg) of 11 people.
Height (x) 177 182 167 178 173 184 162 169 164 170 180
Weight (y) 74 75 62 63 64 74 57 55 56 68 72
Determine and graph the equation of the least squares regression line that will enable
weight to be predicted from height. Write the intercept and slope correct to three
significant figures.
Steps
1 Open the Statistics application
and enter the
data into columns labelled height
and weight.
2 Tap to open the Set
StatGraphs dialog box and
complete as shown.
Tap Set to confirm your selections.
3 Tap in the toolbar at the top of
the screen to plot the scatterplot in
the bottom half of the screen.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
138 Core Chapter 4 Regression: fitting lines to data 4B
Exercise 4B
Note: In fitting lines to data, unless otherwise stated, the slope and intercept should be calculated accurate to
a given number of significant figures as required in the Further and General Mathematics study designs.
If you feel that you need some help with significant figures see the video on the topic,
accessed through the Interactive Textbook.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4B 4B Determining the equation of the least squares line 139
2 We wish to find the equation of the least squares regression line that enables life
expectancy in a country to be predicted from birth rate.
a Which is the response variable (RV) and which is the explanatory variable (EV)?
b Use the formula to determine the equation of the least squares regression line that
enables life expectancy (y) to be predicted from birth rate (x), where:
r = −0.810 x̄ = 34.8 s x = 5.41 ȳ = 55.1 sy = 9.99
Write the equation in terms of life expectancy and birth rate with the y-intercept and
slope written correct to two significant figures.
3 We wish to find the equation of the least squares regression line that enables distance
travelled by a car (in 1000s of km) to be predicted from its age (in years).
a Which is the response variable (RV) and which is the explanatory variable (EV)?
b Use the formula to determine the equation of the least squares regression line that
enables distance travelled (y) by a car to be predicted from its age (x), where:
r = 0.947 x̄ = 5.63 s x = 3.64 ȳ = 78.0 sy = 42.6
Write the equation in terms of distance travelled and age with the y-intercept and
slope written correct to two significant figures.
Sit-ups (x) 52 15 22 42 34 37
Push-ups (y) 37 26 23 51 31 45
Let the number of sit-ups be the explanatory (x) variable. Use your calculator to show
that the equation of the least squares regression line is:
push-ups = 16.5 + 0.566 × sit-ups (correct to three significant figures)
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
140 Core Chapter 4 Regression: fitting lines to data 4B
6 The table shows average hours worked and university participation rates (%) in six
countries.
Hours 35.0 43.0 38.2 39.8 35.6 34.8
Rate 26 20 36 25 37 55
Use your calculator to show that the equation of the least squares regression line that
enables participation rates to be predicted from hours worked is:
rate = 130 − 2.6 × hours (correct to two significant figures)
7 The table shows the number of runs scored and balls faced by batsmen in a cricket
match.
Runs (y) 27 8 21 47 3 15 13 2 15 10 2
Balls faced (x) 29 16 19 62 13 40 16 9 28 26 6
a Use your calculator to show that the equation of the least squares regression line
enabling runs scored to be predicted from balls faced is:
y = −2.6 + 0.73x
b Rewrite the regression equation in terms of the variables involved.
8 The table below shows the number of TVs and cars owned (per 1000 people) in six
countries.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4C Performing a regression analysis 141
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
142 Core Chapter 4 Regression: fitting lines to data
Price (dollars)
correlation coefficient is r = −0.964. 20000
15000
We would communicate this information
in a report as follows. 10000
5000
0
0 1 2 3 4 5 6 7 8 9 10
Age (years)
Report
There is a strong negative linear association between the price of these second-hand
cars and their age (r = −0.964).
1 We say ‘on average’ because the line does not pass through every point. Rather, it is a line that is drawn
through the average price of this type of car at each age.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4C Performing a regression analysis 143
Price (dollars)
30000
price = 35 100 − 3940 × age
25000
a Interpret the slope in terms of the 20000
variables price and age. 15000
b Interpret the intercept in terms of the 10000
variables price and age. 5000
0
0 1 2 3 4 5 6 7 8 9 10
Age (years)
Solution
a The slope predicts the average change On average, the price of these
(increase/decrease) in the price for each 1-year cars decreases by $3940 each
increase in the age. Because the slope is year.2
negative, it will be a decrease.
b The intercept predicts the value of the price of On average, the price of these
the car when age equals 0; that is, when the car cars when new was $35 100.
is new.
2 A general model for writing about the slope is that, on average, the RV increases/decreases by the value of
the ‘slope’ for each unit increase in the EV.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
144 Core Chapter 4 Regression: fitting lines to data
Price (dollars)
horizontally across to the price axis as 25000
shown, to get an answer of around $14 000. 20000
A more accurate answer is obtained by 15000 (5.5, 13 430)
substituting age = 5.5 into the equation to 10000
obtain $13 430, as shown below. 5000
Price = 35 100 − 3940 × 5.5 0
0 1 2 3 4 5 6 7 8 9 10
= $13 430 Age (years)
With extrapolation, we have no way of knowing whether our prediction is reliable or not.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4C Performing a regression analysis 145
Price (dollars)
we are making a prediction within the data. 25000
20000
However, using the regression line to predict 15000
the price of a 10-year-old car would be an 10000
example of extrapolation. This is because we 5000
are working outside the data. 0 extrapolation
−5000
The fact that the regression line would predict 0 2 4 6 8 10 12
a negative value for the car highlights the Age (years)
potential problems of making predictions
beyond the data (extrapolating).
Residuals revisited
Calculating residuals
As we saw earlier (Section 4A), the residuals are the vertical distances between the
individual data points and the regression line.
3 A general model for writing about the coefficient of determination is: ‘r2 % of the variation in the RV is
explained by the variation in the EV’.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
146 Core Chapter 4 Regression: fitting lines to data
Residuals
residual value = actual data value − predicted value
Residuals can be positive, negative or zero.
Data points above the regression line have a positive residual.
Data points below the regression line have a negative residual.
Data points on the line have a zero residual.
Price (dollars)
30000
that the residual value for the 6-year-old car is
25000
around –$5000. That is, the actual price of the 20000
car is approximately $5000 less than we would 15000
predict. 10000
To determine the value of a residual more 5000
precisely, we need to do a calculation. 0
0 1 2 3 4 5 6 7 8 9 10
Age (years)
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4C Performing a regression analysis 147
One way of testing this assumption is to plot the regression line on the scatterplot and see
how well a straight line fits the data. However, a better way is to use a residual plot, as this
plot will show even very small departures from linearity.
A residual plot is a plot of the residual value 4000
for each data value against the independent 2000
Residual
variable (in this case, age). Because the mean 0
of the residuals is always zero, the horizontal −2000
zero line (red) helps us to orient ourselves. This −4000
From the residual plot, we see that there is no clear pattern4 in the residuals. Essentially
they are randomly scattered around the zero regression line.
Thus, from this residual plot we can report as below.
Report
The lack of a clear pattern in the residual plot confirms the assumption of a linear
association between the price of a second-hand car.
Report
From the scatterplot we see that there is a strong negative, linear association between
the price of a second hand car and its age, r = −0.964. There are no obvious outliers.
The equation of the least squares regression line is: price = 35 100 − 3940 × age.
The slope of the regression line predicts that, on average, the price of these
second-hand cars decreased by $3940 each year.
The intercept predicts that, on average, the price of these cars when new was $35 100.
The coefficient of determination indicates that 93% of the variation in the price of
these second-hand cars is explained by the variation in their age.
The lack of a clear pattern in the residual plot confirms the assumption of a linear
association between the price and the age of these second-hand cars.
4 From a visual inspection, it is difficult to say with certainty that a residual plot is random. It is easier to see
when it is not random, as you will see in the next chapter. For present purposes, it is sufficient to say that a
clear lack of pattern in a residual plot indicates randomness.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
148 Core Chapter 4 Regression: fitting lines to data 4C
Exercise 4C
Mark
number and the slope correct to one decimal
40
place.
20
0
0 1 2 3 4 5 6 7 8
Days absent
3 For a 100 km trip, the equation of a regression line that enables fuel consumption of a
car (in litres) to be predicted from its weight (kg) is:
fuel consumption = −0.1 + 0.01 × weight
Complete the following sentences:
a The response variable is .
b The slope is and the intercept is .
c A car weighs 980 kg. The regression line predicts a fuel consumption of
litres.
d This car has an actual fuel consumption of 8.9 litres.
The error of prediction (residual value) is litres.
5 In an investigation of the relationship between the success rate (%) of sinking a putt
and the distance from the hole (in cm) of amateur golfers, the least squares regression
line was found to be:
success rate = 98.5 − 0.278 × distance r2 = 0.497
a Write down the slope of this regression equation and interpret.
b Use the equation to predict the success rate when a golfer is 90 cm from the hole.
c At what distance (in metres) from the hole does the regression equation predict an
amateur golfer to have a 0% success rate of sinking the putt?
d Calculate the value of r, correct to three decimal places.
e Write down the value of the coefficient of determination in percentage terms and
interpret.
B C
3.0 3
1.5
residual
residual
0
0.0
−1.5 −3
−3.0
0 2 4 6 8 10 12 0 2 4 6 8 10 12 14
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
150 Core Chapter 4 Regression: fitting lines to data 4C
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4C 4C Performing a regression analysis 151
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
152 Core Chapter 4 Regression: fitting lines to data 4C
residual
Regression equation: y = a + bx 0
a = 55.8947
b = −9.30612 −6
r2 = 0.901028
r = −0.949225 0.0 1.0 2.0 3.0 4.0 5.0 6.0
drug dose
Use this information to complete the following report. Call the two variables drug dose
and response time. In this analysis drug dose is the explanatory variable.
Report
From the scatterplot we see that there is a strong relationship between
response time and :r= . There are no obvious outliers.
The equation of the least squares regression line is:
response time = + × drug dose
The slope of the regression line predicts that, on average, response time
increases/decreases by minutes for a 1-milligram increase in drug
dose.
The y-intercept of the regression line predicts that, on average, the response
time when no drug is administered is minutes.
The coefficient of determination indicates that, on average, % of the
variation in is explained by the variation in .
The residual plot shows a , calling into question the use of a linear
equation to describe the relationship between response time and drug dose.
0.15
residual
Use the format of the report given in the previous question to summarise findings of
this investigation. Call the two variables femur length and radius length.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4D Conducting a regression analysis using raw data 153
This analysis is concerned with investigating the association between life expectancy (in
years) and birth rate (in births per 1000 people) in 10 countries.
Birth rate 30 38 38 43 34 42 31 32 26 34
Life expectancy (years) 66 54 43 42 49 45 64 61 61 66
Steps
1 Write down the explanatory variable EV: birth
(EV) and response variable (RV). Use RV: life
the variable names birth and life.
2 Start a new document by pressing
/ + N.
Select Add Lists & Spreadsheet.
Enter the data into the lists named birth
and life, as shown.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
154 Core Chapter 4 Regression: fitting lines to data
Steps
1 Write down the explanatory variable EV: birth
(EV) and response variable (RV). Use RV: life
the variable names birth and life.
2 Enter the data into lists as shown.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4D Conducting a regression analysis using raw data 155
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
156 Core Chapter 4 Regression: fitting lines to data 4D
Inspect the plot and write your The random residual plot suggests
conclusion. linearity.
Note: When you performed a regression analysis earlier, the residuals were calculated automatically
and stored in list3. The residual plot is a scatterplot with list3 on the vertical axis and birth on the
horizontal axis.
Exercise 4D
1 The table below shows the scores obtained by nine students on two tests. We want to
be able to predict test B scores from test A scores.
Use your calculator to perform each of the following steps of a regression analysis.
a Construct a scatterplot. Name the variables test a and test b.
b Determine the equation of the least squares line along with the values of r and r2 .
c Display the regression line on the scatterplot. d Obtain a residual plot.
2 The table below shows the number of careless errors made on a test by nine students.
Also given are their test scores. We want to be able to predict test score from the
number of careless errors made.
Test score 18 15 9 12 11 19 11 14 16
Careless errors 0 2 5 6 4 1 8 3 1
Use your calculator to perform each of the following steps of a regression analysis.
a Construct a scatterplot. Name the variables score and errors.
b Determine the equation of the least squares line along with the values of r and r2 .
Write answers correct to three significant figures.
c Display the regression line on the scatterplot. d Obtain a residual plot.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
4D 4D Conducting a regression analysis using raw data 157
3 How well can we predict an adult’s weight from their birth weight? The weights of 12
adults were recorded, along with their birth weights. The results are shown.
Birth weight (kg) 1.9 2.4 2.6 2.7 2.9 3.2 3.4 3.4 3.6 3.7 3.8 4.1
Adult weight (kg) 47.6 53.1 52.2 56.2 57.6 59.9 55.3 58.5 56.7 59.9 63.5 61.2
a In this investigation, which would be the RV and which would be the EV?
b Construct a scatterplot.
c Use the scatterplot to:
i comment on the relationship between adult weight and birth weight in terms of
direction, outliers, form and strength
ii estimate the value of Pearson’s correlation coefficient, r.
d Determine the equation of the least squares regression line, the coefficient of
determination and the value of Pearson’s correlation coefficient, r. Write answers
correct to three significant figures.
e Interpret the coefficient of determination in terms of adult weight and birth weight.
f Interpret the slope in terms of adult weight and birth weight.
g Use the regression equation to predict the weight of an adult with a birth weight of:
i 3.0 kg ii 2.5 kg iii 3.9 kg.
Give answers correct to one decimal place.
h It is generally considered that birth weight is a ‘good’ predictor of adult weight. Do
you think the data support this contention? Explain.
i Construct a residual plot and use it to comment on the appropriateness of assuming
that adult weight and birth weight are linearly associated.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
158 Core Chapter 4 Regression: fitting lines to data
Review
Bivariate data Bivariate data are data in which each observation involves recording
information about two variables for the same person or thing. An
example would be the heights and weights of the children in a
preschool.
Linear regression The process of fitting a line to data is known as linear regression.
Residuals The vertical distance from a data point to the straight line is called a
residual: residual value = data value − predicted value.
Least squares The least squares method is one way of finding the equation of a
method regression line. It minimises the sum of the squares of the residuals. It
works best when there are no outliers.
The equation of the least squares regression line is given by y = a+ bx,
where a represents the y-intercept of the line and b the slope.
Using the The regression line y = a+ bx enables the value of y to be determined
regression line for a given value of x.
For example, the regression line
cost = 1.20 + 0.06 × number o f pages
predicts that the cost of a 100-page book is:
cost = 1.20 + 0.06 × 100 = $7.20
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
Chapter 4 review 159
Review
Residual plots Residual plots can be used to test the linearity assumption by plotting
the residuals against the EV.
A residual plot that appears to be a random collection of points
clustered around zero supports the linearity assumption.
A residual plot that shows a clear pattern indicates that the association is
not linear.
Skills check
Multiple-choice questions
1 When using a least squares line to model a relationship displayed in a scatterplot, one
key assumption is that:
A there are two variables B the variables are related
C the variables are linearly related D r2 > 0.5
E the correlation coefficient is positive
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
160 Core Chapter 4 Regression: fitting lines to data
Review
4 The least squares regression line y = 8 − 9x predicts that, when x = 5, the value of y is:
A −45 B −37 C 37 D 45 E 53
7 Given that r = 0.733, s x = 1.871 and sy = 3.391, the slope of the least squares
regression line is closest to:
A 0.41 B 0.45 C 1.33 D 1.87 E 2.49
8 Using a least squares regression line, the predicted value of a data value is 78.6. The
residual value is –5.4. The actual data value is:
A 73.2 B 84.0 C 88.6 D 94.6 E 424.4
9 The equation of the least squares line plotted on y
the scatterplot opposite is closest to: 10
9
A y = 8.7 − 0.9x 8
B y = 8.7 + 0.9x 7
6
C y = 0.9 − 8.7x 5
4
D y = 0.9 + 8.7x 3
E y = 8.7 − 0.1x 2
1
0 x
0 1 2 3 4 5 6 7 8 9 10
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
Chapter 4 review 161
Review
The following information relates to Questions 11 to 14.
Weight (in kg) can be predicted from height (in cm) from the regression line:
weight = −96 + 0.95 × height, with r = 0.79
13 Noting that the value of the correlation coefficient is r = 0.79, we can say that:
A 62% of the variation in weight can be explained by the variation in height
B 79% of the variation in weight can be explained by the variation in height
C 88% of the variation in weight can be explained by the variation in height
D 79% of the variation in height can be explained by the variation in weight
E 95% of the variation in height can be explained by the variation in weight
14 A person of height 179 cm weighs 82 kg. If the regression equation is used to predict
their weight, then the residual will be closest to:
A –8 kg B 3 kg C 8 kg D 9 kg E 74 kg
15 The coefficient of determination for the data 100
displayed in the scatterplot opposite is close to 90
80
r2 = 0.5. 70
The correlation coefficient is closest to: 60
Mark
50
A −0.7 B −0.25 40
C 0.25 D 0.5 30
20
E 0.7
10
0
0 1 2 3 4 5 6 7 8
Days absent
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
162 Core Chapter 4 Regression: fitting lines to data
Review
Extended-response questions
1 In an investigation of the relationship between the hours of sunshine (per year) and
days of rain (per year) for 25 cities, the least squares regression line was found to be:
hours o f sunshine = 2850 − 6.88 × days o f rain, with r2 = 0.484
Use this information to complete the following sentences.
a In this regression equation, the explanatory variable is .
b The slope is and the intercept is .
c The regression equation predicts that a city that has 120 days of rain per year will
have hours of sunshine per year.
d The slope of the regression line predicts that the hours of sunshine per year will
by hours for each additional day of rain.
e r= , correct to three significant figures.
f % of the variation in sunshine hours can be explained by the variation in
.
g One of the cities used to determine the regression equation had 142 days of rain and
1390 hours of sunshine.
i The regression equation predicts that it has hours of sunshine.
ii The residual value for this city is hours.
h Using a regression line to make predictions within the range of data used to
determine the regression equation is called .
2 The cost of preparing meals in a school canteen is linearly related to the number of
meals prepared. To help the caterers predict the costs, data were collected on the cost
of preparing meals for different levels of demands. The data are shown below.
Number of meals 30 35 40 45 50 55 60 65 70 75 80
Cost (dollars) 138 154 159 182 198 198 214 208 238 234 244
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
Chapter 4 review 163
Review
3 In the scatterplot opposite, average 40 000
annual female income, in dollars, is
plotted against average annual male
25 000
20 000
30 000 40 000 50 000 60 000 70 000
Male income (dollars)
The equation of the least squares regression line for predicting female income from
male income is female income = 13 000 + 0.35 × male income.
a What is the explanatory variable?
b Complete the following statement by filling in the missing information.
From the least squares regression line equation it can be concluded that, for these
countries, on average, female income increases by $ __________ for each $1000
increase in male income.
c i Use the least squares regression line equation to predict the average annual
female income (in dollars) in a country where the average annual male income
is $15 000.
ii The prediction made in part c i is not likely to be reliable. Explain why.
©VCAA (2000)
4 We wish to find the equation of the least squares regression line that will enable height
(in cm) to be predicted from femur (thigh bone) length (in cm).
a Which is the RV and which is the EV?
b Use the following summary statistics to determine the equation of the least squares
regression line that will enable height (y) to be predicted from femur length (x).
r = 0.9939 x̄ = 24.246 s x = 1.873 ȳ = 166.092 sy = 10.086
Write the equation in terms of height and femur length. Give the slope and intercept
accurate to three significant figures.
c Interpret the slope of the regression equation in terms of height and femur length.
d Determine the value of the coefficient of determination and interpret in terms of
height and femur length.
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018
164 Core Chapter 4 Regression: fitting lines to data
Review
5 The data below show the height (in cm) of a group of 10 children aged 2 to 11 years.
Height (cm) 86.5 95.5 103.0 109.8 116.4 122.4 128.2 133.8 139.6 145.0
Age (years) 2 3 4 5 6 7 8 9 10 11
The task is to determine the equation of a least squares regression line that can be used
to predict height from age.
a In this analysis, which would be the RV and which would be the EV?
b Use your calculator to confirm that the equation of the least squares regression
line is: height = 76.64 + 6.366 × age and r = 0.9973.
c Use the regression line to predict the height of a 1-year-old child. Give the answer
correct to the nearest cm. In making this prediction are you extrapolating or
interpolating?
d What is the slope of the regression line and what does it tell you in terms of the
variables involved?
e Calculate the value of the coefficient of determination and interpret in terms of the
relationship between age and height.
f Use the least squares regression equation to:
i predict the height of the 10-year-old child in this sample
ii determine the residual value for this child.
g i Confirm that the residual plot for this
0.15
analysis is shown opposite.
residual
0.00
ii Explain why this residual plot suggests that
0.15
a linear equation is not the most appropriate
−0.30
model for this relationship.
2 3 4 5 6 7 8 9 1011
age
Cambridge Senior Maths AC/VCE ISBN 978-1-316-61622-2 © Jones et al. 2016 Cambridge University Press
Further Mathematics 3&4 Photocopying is restricted under law and this material must not be transferred to another party. Updated January 2018