2.1 Correlation
Clive is still trying to work out what happened to him. Why was he refused
in his first application? Why him? What were the reasons for this
temporary reverse? Remember that his cousin Tommy was accepted first
time.
In this section, you will learn how to determine if a link may exist
between two characteristics of a situation (two variables). This idea is
now studied at the high school level, whereas before, it was only
encountered at CEGEP.
In this section, you will have to construct a distribution table for two
variables, construct a scattergram, find and interpret the coefficient
of correlation, and draw a line of regression and find its equation.
Distribution Tables
Section 2 2.1
Mathematics 536 Section 2
In order to see the overall picture, Clive made a distribution table (or a
table of values).
You can now construct the graph associated with this situation.
Figure 2.3
Figure 2.4 [scattergram: mark (%) on the vertical axis, from 45 to 90, against study time (h) on the horizontal axis, from 1 to 10]
ON ENTER ENTER
L1 ENTER L2
ENTER ENTER
(You have just determined the type of graph, the parameters determining the axes and
the form of the points; the choices are square, cross or dot.)
Determine the size of the window. For this, it is necessary to remember the
examination results. To define the window range for each parameter, you must
consider the highest and lowest values for each variable. Furthermore, the scale
chosen depends on the number of “steps” you want on each axis - preferably between
5 and 10 - in passing from lowest to highest.
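The window choice described above can be sketched in Python. This is only an illustration under assumptions: the function name `window`, the limit of 10 steps and the sample marks are hypothetical and do not come from the course or from the calculator.

```python
import math

def window(values, max_steps=10):
    """Return (low, high, step) for one axis of the viewing window.

    Tries "round" scales (1, 2, 5, 10, 20, ...) from finest to coarsest
    and keeps the first one that needs at most max_steps steps to go
    from the lowest to the highest value.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("the values must not all be equal")
    exponent = math.floor(math.log10(hi - lo)) - 1
    while True:
        for base in (1, 2, 5):
            step = base * 10 ** exponent
            low = math.floor(lo / step) * step    # round the minimum down
            high = math.ceil(hi / step) * step    # round the maximum up
            if round((high - low) / step) <= max_steps:
                return low, high, step
        exponent += 1

# Hypothetical examination marks, used only to illustrate the idea.
marks = [45, 52, 60, 68, 75, 81, 90]
print(window(marks))  # (45, 90, 5)
```

With these sample marks the axis would run from 45 to 90 in steps of 5, which gives 9 steps.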
Figure 2.5
Looking at the scattergram, can you already say whether study time has influenced
the results in the math test a little, somewhat or a lot?____________
_____________________________________________________________
In fact, you still don’t have sufficient tools to be able to determine if such a relation
exists. There is, however, a measure which would allow you to evaluate precisely such
a relation. It is called the coefficient of correlation. Its usefulness in a wide range
of areas should convince you of its value and that its place in this course is justified.
Coefficient of correlation
Looking at the scattergram, you probably notice that the points follow a certain pattern.
They are grouped, to a certain extent, around an imaginary line.
From left to right, are the points in the scattergram ascending or descending?
__________________________________________
Yes, they ascend from left to right. Consequently, we say that the correlation is
positive.
Correlation represents the relation which exists between two variables. In fact, the
correlation measures the strength of the linear relation. It can be strongly or
weakly positive, strongly or weakly negative, or simply non-existent.
In fact, the greater the time spent studying, the higher the mark achieved. You can see
that the two variables go in the same direction. In such a case, we say that the
correlation is positive. As well as determining the sign (+ or -) of the correlation, it is
possible to evaluate its strength. To do this, we draw an ellipse (a type of curve
somewhat in the shape of an egg, but symmetrical in both directions - an oval).
What is the length of the major axis (L) of the ellipse in figure
2.4?_________________
The coefficient of correlation (r) can be quickly calculated using the formula
$r = \pm\left(1 - \frac{l}{L}\right)$, where L and l are the lengths of the major and minor axes of the
ellipse which surrounds, as closely as possible, the set of points in the scattergram.
The coefficient of correlation is a measure of the degree of dependence between two
variables of a statistical or probabilistic nature.
Figure 2.6
Look carefully at the preceding formula for the coefficient of correlation and answer the
following questions.
What does $\frac{l}{L}$ represent?___________________________________
______________________________________________________
In fact, $\frac{l}{L}$ represents the ratio between the lengths of the minor and major axes. It tells
you what fraction of the major axis the minor axis represents. For example, if
$\frac{l}{L} = \frac{1}{4}$, then L is four times l, or l is one quarter of L.
What is the largest possible value of $\frac{l}{L}$? _____________________
Justify your answer.______________________________________________
_____________________________________________________________
Did you find that the largest possible value of $\frac{l}{L}$ is 1? Well done! This happens when
the major axis has the same length as the minor axis. The ellipse, in this case, would
take the form of a circle. The coefficient of correlation would be $r = \pm(1 - 1) = 0$.
Regardless of the sign, 0 is the smallest possible absolute value of r. In this case, we would
say that there is no correlation between the two variables being considered.
What is the smallest possible value of $\frac{l}{L}$? ___________________
Justify your answer._______________________________________________
_____________________________________________________
The smallest possible value for $\frac{l}{L}$ is 0. This happens when l = 0.
In such a case, the ellipse is so “flattened out” that it becomes a straight line. There is
then said to be a perfect linear correlation, and $r = \pm(1 - 0) = \pm 1$.
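The ellipse estimate can be sketched as a small Python function. The function name `ellipse_r` and the `ascending` flag are assumptions introduced for this illustration; the sign follows the rule given earlier (positive when the points ascend from left to right, negative when they descend).

```python
def ellipse_r(major, minor, ascending=True):
    """Estimate r from the axes of the ellipse around the points.

    major, minor: measured lengths L and l (same unit, e.g. cm).
    ascending: True if the points rise from left to right.
    """
    if major <= 0 or not 0 <= minor <= major:
        raise ValueError("need 0 <= minor <= major and major > 0")
    magnitude = 1 - minor / major
    return magnitude if ascending else -magnitude

print(ellipse_r(4, 1))   # l/L = 1/4, so r = 0.75
print(ellipse_r(5, 5))   # circle: no correlation, r = 0.0
print(ellipse_r(5, 0))   # flattened to a line: perfect correlation, r = 1.0
```

Because L and l are measured by hand on a drawing, the value returned is only as precise as those measurements.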
No correlation
Can you now say, taking into account what you have learned, that the length of
time spent studying has a small, medium or large effect, or has no effect at all,
on the results in the mathematics examination?
_____________________________________________________________
__________________________________________________________________
__________________________________________________________________
We can say that, since the correlation coefficient r ≈ 0.6, the results are strongly
influenced by the amount of time spent studying. In such a case, we can talk about a
cause and effect relationship. However, it can happen that two variables having a
strong linear correlation are not linked by a causal relationship. This is a question of
judgement. For example, if, in a survey, we find that there is a strong linear correlation
(r ≈ 0.85) between wearing long pants and the ability to read well, could we
conclude that it is enough to wear long pants to be able to read well? You must rely on
the important law of common sense to avoid falling into such a trap. There is always a
danger in statistical studies of misinterpreting the results of an analysis. Even
professionals are not protected from the risk of making a false interpretation, and this
can result in very unfortunate decisions.
Further, the opposite effect can happen: we can decide that there is no
correlation when, in fact, there is a correlation, but one that you have not considered
(or, in the case of this class, that we have not studied). Read the following box to learn
of some examples of these phenomena.
In certain cases, there may be no linear correlation, but there may be another sort of
relation. These situations, which are just as realistic, are not studied in this course.
Here are a couple of examples.
No correlation (r ≈ 0)
• The points in the scattergram do not follow a line
• The variables are completely independent of one another
Check if you have completely mastered the idea of the coefficient of correlation.
Figure 2.11
Heart rate at rest and training time
[scattergram: heart rate at rest on the vertical axis, from 40 to 85, against weekly training time (h) on the horizontal axis, from 0 to 35]
Describe in your own words the relation between the two variables.
To do this, find the value of the coefficient and interpret the result.
______________________________________________________
______________________________________________________________
______________________________________________________________
You should have found r ≈ -0.71 because $r = -\left(1 - \frac{3.4\ \text{cm}}{11.8\ \text{cm}}\right) \approx -0.71$.
Note the use of the symbol ≈ for an approximation.
Since the two characteristics are related in an inverse variation (negative correlation),
you can conclude that an increase in training time is associated with a strong
reduction in heartbeat rate at rest.
You have seen how to measure the strength of the interdependence between two
variables linked by a linear relation (a straight line). Can you now use this line to
predict other values? For example, if a person who wants to remain in good health is
advised to maintain his heartbeat at 60 per minute, how many hours should he train
each week? To reply to this kind of question, you need one last idea: the line of
regression.
LINE OF REGRESSION
Reconsider Clive’s scattergram (test results vs. study time).
[scattergram: mark (%) on the vertical axis, from 45 to 90, against study time (hr) on the horizontal axis, from 1 to 10]
These points represent the “ideal” points for this situation, or those which best
represent the relation between the two variables. So, with the help of this line of
regression, you can predict the mark from the number of hours studied, and vice
versa.
So you have found that “if the trend continues”, the first student would have scored
about 89% by studying 8.5 hours and the second must have studied about 5 hours to
obtain a mark of 75%. Don’t forget that these answers are approximate since
drawing and reading the line is somewhat imprecise. There is a way to increase the
precision of the results: finding the equation of the line.
How can you find the equation of a straight line given its graph?____________
_____________________________________________________________
_____________________________________________________________
Since you studied this last year and reviewed it this year, you should have
remembered that the equation of a line is y = ax + b where a and b are the parameters
(they are not the variables) which determine the line. The equation can be determined
from two points, from the rate of change and one point or from the rate of change and
the initial value.
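Figure 2.13 [graph of a line through the points P1 and P2, showing the differences y2 - y1 and x2 - x1 and the point (0, b); the vertical axis y is the dependent variable]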
[scattergram: mark (%) on the vertical axis, from 45 to 90, against study time (hr) on the horizontal axis, from 1 to 10]
On the graph above, draw the line of regression, choose two points on the line
and use them to find the values of a and b and to give the equation of the line.
Show your work, then compare to the answer given below.
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
The line passes through the points (1, 58) and (11, 98) which are the vertices of the
ellipse surrounding the scatter of points.
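Slope, using (1, 58) and (11, 98): $a = \frac{y_2 - y_1}{x_2 - x_1} = \frac{98 - 58}{11 - 1} = 4$
Initial value, using a = 4 and (1, 58): $y = ax + b \Rightarrow 58 = 4(1) + b \Rightarrow b = 54$
The equation of the line is therefore y = 4x + 54.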
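The same calculation, from two points to the parameters a and b, can be checked with a short Python sketch. The function name `line_through` is an assumption made for this example.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = ax + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no equation of the form y = ax + b")
    a = (y2 - y1) / (x2 - x1)   # rate of change (slope)
    b = y1 - a * x1             # initial value, from y1 = a*x1 + b
    return a, b

a, b = line_through((1, 58), (11, 98))
print(f"y = {a:g}x + {b:g}")  # y = 4x + 54
```

Choosing two other points on your own hand-drawn line will give slightly different values of a and b, which is expected with this graphical method.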
In the relation between the number of hours of study and the mark obtained,
what is the interpretation of the numbers 4 and 54 in the equation?
_____________________________________________________________
_____________________________________________________________
So far, you have learned how to calculate the value of the coefficient of correlation and
to find the equation of the line of regression. Note, however, that these values are
approximate and depend on the accuracy with which they are calculated. The
graphics calculator allows you to calculate these same values, but in less time and
with more accuracy.
- clear the lists you intend to use (e.g. L1 and L2) and enter the values
- press STAT, then CALC, then LINREG (aX + b) and ENTER, followed by L1 , L2 and ENTER.
N.B. The values shown are: a for the rate of change, b for the initial value
and r for the coefficient of correlation
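For readers who want to see what such a calculation involves, here is a minimal Python sketch of a least-squares line fit. It assumes the calculator's linear regression uses the usual least-squares method; the function name `linreg` and the sample data are hypothetical and are not Clive's actual results.

```python
def linreg(xs, ys):
    """Least-squares fit of y = ax + b; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # rate of change
    b = (sum_y - a * sum_x) / n                                   # initial value
    return a, b

# Hypothetical points lying exactly on y = 4x + 54, used as a sanity check.
hours = [1, 3, 5, 7]
marks = [58, 66, 74, 82]
print(linreg(hours, marks))  # (4.0, 54.0)
```

With real data the points do not lie exactly on a line, so the fitted a and b will generally differ a little from the values read off a hand-drawn line.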
Virginia is in the same class as Clive, but was absent for the test. She wants to obtain
a mark of exactly 82%.
Calculate the exact number of hours she must spend studying in order to be
fairly certain to obtain a mark of exactly 82%.
_____________________________________________________________
_____________________________________________________________
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
__________________________________________________________________
If you cannot work out how to use the graphics calculator, consult the User’s Guide,
ask a fellow-student or see your teacher. Now that you have these tools to measure
the strength of the relation between variables, you must be careful in interpreting these
values. The two most common errors are in ignoring the effect of a third variable when
this has an effect on the first two, and in drawing conclusions from a sample which is
too small.
By the way, did you find that Virginia needs to study 7 hours in order to obtain a mark
of 82%?
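This answer can be verified by solving y = ax + b for x, using the equation y = 4x + 54 found earlier. The function name `hours_for_mark` is an assumption made for this short Python check.

```python
def hours_for_mark(mark, a=4, b=54):
    """Solve mark = a*hours + b for hours (line of regression y = 4x + 54)."""
    return (mark - b) / a

print(hours_for_mark(82))  # 7.0
```

The result is a prediction from the trend line, not a guarantee of the mark.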
_____________________________________________________________________________
The calculator also allows you to find other correlations which are not linear. By
comparing the coefficients of correlation, you can determine which correlation is the
most appropriate.
2.2 Practise
1. For each of the scattergrams in figure 2.16 (on the next page), state whether the
linear correlation is positive, negative or non-existent. If there is a linear
correlation, indicate if it is strong. If there is a non-linear correlation, state what
kind.
Figure 2.15
Figure 2.16 [twelve scattergrams, labelled a) to l), each with axes x and y]
iv) find the value of the linear coefficient of correlation in two different ways
vi) find the equation of the line of regression in two different ways and
compare the answers
vii) calculate an ordered pair which fits the equation and interpret the
meaning of the pair
a) Number of hours per week spent playing a sport and watching television.
Figure 2.17
Number of hours spent playing a sport (per week)         0   10   15   20    5    8   35   12
Number of hours spent watching television (per week)    25   20   10    2   15   14    3    8
i) independent variable___________________
dependent variable____________________
x
ii) ________________________________________________________________
iv) a)______________________________________________________________
b)______________________________________________________________
v) _________________________________________________________________
vi) a)______________________________________________________________
b)______________________________________________________________
vii) ________________________________________________________________
__________________________________________________________________
Figure 2.18
Engine capacity (cm³)         1900   1700   2500   1500   1800   2200   3300
Gas consumption (L/100 km)     8.2    7.4   10.1    7.2    8.3    9.5   14.5
i) independent variable___________________
dependent variable____________________
x
ii) ________________________________________________________________
iv) a)______________________________________________________________
b)______________________________________________________________
v) _________________________________________________________________
vi) a)______________________________________________________________
b)______________________________________________________________
vii) ________________________________________________________________
__________________________________________________________________
c) The height from which a basketball is dropped and the number of bounces.
Figure 2.19
Number of bounces                     10   13   15   16   17   17   18   18   19
Height (number of concrete blocks)     2    4    6    8   10   12   14   16   18
i) independent variable___________________
dependent variable____________________
x
ii) ________________________________________________________________
iv) a)______________________________________________________________
b)______________________________________________________________
v) _________________________________________________________________
vi) a)______________________________________________________________
b)______________________________________________________________
vii) ________________________________________________________________
__________________________________________________________________
d) Number of days without electricity and the number of candles burnt during the
ice storm.
Figure 2.20
Number of candles burnt               2    5   10    1   17    8    5    4   10    4    3    8
Number of days without electricity    1    3    4    2    7    3    2    1    5    6   11    7
i) independent variable___________________
dependent variable____________________
x
ii) ________________________________________________________________
iv) a)______________________________________________________________
b)______________________________________________________________
v) _________________________________________________________________
vi) a)______________________________________________________________
b)______________________________________________________________
vii) ________________________________________________________________
________________________________________________________________
Figure 2.21
Age of Toyota Tercel (years)      10      8       2     15       1       3      4
Price ($)                       4500   7000   14000   2500   15000   10500   8000
i) independent variable___________________
dependent variable____________________
x
ii) ________________________________________________________________
iv) a)______________________________________________________________
b)______________________________________________________________
v) _________________________________________________________________
vi) a)______________________________________________________________
b)______________________________________________________________
vii) ________________________________________________________________
__________________________________________________________________
Figure 2.22
b) The level of education achieved by an adult and the marks he/she obtained
when in Grade 7. ________________________________________________
_____________________________________________________________
c) The number of apples on an apple tree during a summer and the number of
days of sunshine. _______________________________________________
_____________________________________________________________
_____________________________________________________________
Figure 2.23
Figure 2.24
Conclusion Figures
1. No apparent correlation or relation between the
variables. Variables are independent
6. If the coefficient of correlation is zero, does this necessarily mean that the two
variables are independent of each other? _____________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
7. What does the line of regression mean in a situation with two variables? _____
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
There is a formula for calculating the linear dependence between the variables x and
y. Effectively, this is an algebraic method for calculating r, the coefficient of correlation,
without recourse to a scattergram. Further, by this formula, the sign (+/-) of r is
automatically determined.
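$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
where
n: number of data points
$\sum xy$: the sum of the products of x and y for each pair
$\sum x$: the sum of the values of the independent variable x
$\sum y$: the sum of the values of the dependent variable y
$\sum x^2$: the sum of the squares of the variable x
$\sum y^2$: the sum of the squares of the variable y
$\left(\sum x\right)^2$: the square of the sum of the values of the variable x
$\left(\sum y\right)^2$: the square of the sum of the values of the variable y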
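The algebraic method can be sketched in Python using exactly the sums listed above. This assumes the intended formula is the standard product-moment one, stated in the code comment; the function name `correlation` and the sample data are hypothetical.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r computed from sums only.

    r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx**2) * (n*Sy2 - Sy**2))
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 points")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator  # the sign of r comes out automatically

# Hypothetical data lying exactly on a rising line, then on a falling line.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
print(correlation([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```

Filling in a table with columns x, y, x², y² and xy by hand produces the same sums that this function computes.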
x       y       x²      y²      xy
1       3
11      18
3       6
9       15
5       9
7       12
Σx =    Σy =    Σx² =   Σy² =   Σxy =
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
2. The table below shows the number of years of education (x) and the salaries
(y), in thousands of dollars, of 8 randomly chosen women from the Eastern
Townships.
Figure 2.26
x       y       x²      y²      xy
12      16
16      20
19      25
10      12
13      9
16      30
12      9
12      8
Σx =    Σy =
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________
_____________________________________________________________