
32 pages, Mar 05, 2008

© Attribution Non-Commercial (BY-NC)


2.1 Correlation

Clive is still trying to work out what happened to him. Why was he refused

in his first application? Why him? What were the reasons for this

temporary reverse? Remember that his cousin Tommy was accepted first

time.

previous years”, he thought, “but the teachers seem to be going nuts this

year - tests, assignments, presentations, homework - it never stops.”

Still, two questions keep coming back to him: Could I have done better? Is

there a direct link between the number of hours one studies and academic

results? He had always believed that the quality of one’s studies

(concentration while studying, for example) was a more important factor

than the quantity. He always seemed to be on top of things, even if he

recognised that he could sometimes push himself a bit harder.

In this section, you will learn how to determine if a link may exist

between two characteristics of a situation (two variables). This idea is

now studied at the high school level, whereas before, it was only

encountered at CEGEP.

In this section, you will have to construct a distribution table for two

variables, construct a scattergram, find and interpret the coefficient

of correlation, and draw a line of regression and find its equation.

Distribution Tables

Clive asked all the students in his mathematics class about the number of hours they each spent studying for the mid-semester exam and the results they obtained.

Mathematics 536 Section 2

Michel (7, 81) Roxanne (5, 75) Talia (6, 81) Jen (2, 53)

Dave (5, 73) Mary (3, 76) Martin (5, 78) Jocelyn (4, 78)

Andrée (3, 61) John (4, 65) Clive (4, 72) Sylvia (10, 88)

Sophie (6, 76) Mario (5, 78) Andrew (8, 75) Peter (4, 75)

Danny (10, 96) Pascal (3, 63) Yan (5, 68) Alan (5, 83)

Karen (4, 74) Philip (2, 70) Felix (5, 63) Audrey (9, 92)

Melissa (8, 84) Isabelle (4, 89) Marie (3, 53) Charlie (2, 55)

Renée (10, 96) Marc (8, 99) Paul (3, 55) Alex (10, 90)

In each pair, the first number represents the number of hours spent studying. The second represents the exam result.

A distribution with two variables has, for each element of the sample, two pieces of data, which we record in the form of an ordered pair (x, y). Here x represents the independent variable and y the dependent variable, so the order of x and y must not be reversed. It is sometimes difficult to determine the dependence between two variables. In such a case, we try to decide which quantity affects the value of the other.

In order to see the overall picture, Clive made a Distribution table (or a

table of values).
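As an illustration, the class list above can be turned into a distribution table with a few lines of Python. This is only a sketch: the names `results` and `distribution_table` are chosen here for the example and do not come from the course material, and grouping the marks by study time is one possible layout for the table.

```python
from collections import defaultdict

# (hours studied, mark in %) for each student, copied from the class list
results = {
    "Michel": (7, 81), "Roxanne": (5, 75), "Talia": (6, 81), "Jen": (2, 53),
    "Dave": (5, 73), "Mary": (3, 76), "Martin": (5, 78), "Jocelyn": (4, 78),
    "Andrée": (3, 61), "John": (4, 65), "Clive": (4, 72), "Sylvia": (10, 88),
    "Sophie": (6, 76), "Mario": (5, 78), "Andrew": (8, 75), "Peter": (4, 75),
    "Danny": (10, 96), "Pascal": (3, 63), "Yan": (5, 68), "Alan": (5, 83),
    "Karen": (4, 74), "Philip": (2, 70), "Felix": (5, 63), "Audrey": (9, 92),
    "Melissa": (8, 84), "Isabelle": (4, 89), "Marie": (3, 53), "Charlie": (2, 55),
    "Renée": (10, 96), "Marc": (8, 99), "Paul": (3, 55), "Alex": (10, 90),
}


def distribution_table(pairs):
    """Group the marks (y) by study time (x), in increasing order of x."""
    table = defaultdict(list)
    for hours, mark in pairs:
        table[hours].append(mark)
    return {hours: sorted(table[hours]) for hours in sorted(table)}


table = distribution_table(results.values())
for hours, marks in table.items():
    print(f"{hours:>2} h: {marks}")
```

Each printed row lists every mark obtained for a given study time, which makes it easier to see the overall picture before drawing the graph.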

Time (h) | 7 | ...
Mark (%) | 81 | ...

You can now construct the graph associated with this situation.

Figure 2.3

Check that you have:

- given your graph an explanatory title
- labelled both axes
- put a scale on both axes
- entered a point for each ordered pair

Here is what your graph should look like.

Figure 2.4 [scattergram: Mark (%), scaled from 45 to 90, against Time (h), scaled from 1 to 10]

This type of graph is called a scattergram or dispersion diagram. It represents the set of values from a population or a sample. Such a representation is possible if the characteristics (variables) under consideration are quantitative (discrete or continuous) or, at least, “orderable” (for example, school marks which are given as letter grades: A, B, C, D, etc.).

A quantitative variable is:

- discrete, if its possible values are isolated values, most often whole numbers. Example: the number of windows in a house.
- continuous, if the possible values can be any of the values in a given interval. Example: the time taken to get home need not be exactly 14 minutes or 15 minutes; it could be 14 minutes and 37 seconds.

_________________________________________________________

ON ENTER ENTER

L1 ENTER L2

ENTER ENTER

(You have just determined the type of graph, the parameters determining the axes and the form of the points; the choices are square, cross or dot.)

Determine the size of the window. For this, it is necessary to refer back to the examination results. To define the window range for each parameter, you must consider the highest and lowest values for each variable. Furthermore, the scale chosen depends on the number of “steps” you want on each axis, preferably between 5 and 10, in passing from lowest to highest.

For y: min = 53 and max = 99, so range = 99 − 53 = 46; a scale of 5 gives about 9 steps, so use scale = 5.
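The choice of scale can be sketched in Python as follows. The function name `axis_scale` and the list of candidate scales are assumptions made for this example; the only rule taken from the text is that the range divided by the scale should give between 5 and 10 steps.

```python
def axis_scale(lowest, highest, candidates=(1, 2, 5, 10, 20, 25, 50)):
    """Return (scale, steps) for the first candidate giving 5 to 10 steps."""
    value_range = highest - lowest
    for scale in candidates:
        steps = value_range / scale
        if 5 <= steps <= 10:
            return scale, steps
    raise ValueError("no candidate scale gives between 5 and 10 steps")


# Marks run from 53 to 99, study times from 2 to 10 hours
print(axis_scale(53, 99))
print(axis_scale(2, 10))
```

With the class data, this sketch selects a scale of 5 for the marks and a scale of 1 for the study times.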


Figure 2.5

Looking at the scattergram, can you already say whether the study time has influenced the results in the math test a little, somewhat or a lot? ____________

_____________________________________________________________

In fact, you still don’t have sufficient tools to be able to determine if such a relation

exists. There is, however, a measure which would allow you to evaluate precisely such

a relation. It is called the coefficient of correlation. Its usefulness in a wide range

of areas should convince you of its value and that its place in this course is justified.

Coefficient of correlation

Looking at the scattergram, you probably notice that the points follow a certain pattern.

They are grouped, to a certain extent, around an imaginary line.

From left to right, are the points in the scattergram ascending or descending?

__________________________________________

Yes, they ascend from left to right. Consequently, we say that the correlation is

positive.

Correlation represents the relation which exists between two variables. In fact, the

correlation measures the strength of the linear relation. It can be strongly or

weakly positive, strongly or weakly negative, or simply non-existent.

_____________________________________________________________

_____________________________________________________________


In fact, the greater the time spent studying, the higher the mark achieved. You can see

that the two variables go in the same direction. In such a case, we say that the

correlation is positive. As well as determining the sign (+ or -) of the correlation, it is

possible to evaluate its strength. To do this, we draw an ellipse (a type of curve

somewhat in the shape of an egg, but symmetrical in both directions - an oval).

What is the length of the major axis (L) of the ellipse in figure 2.4? _________________

$r = 1 - \frac{l}{L}$ = 1 − ____ = ______

For the moment, this number doesn’t mean very much to you, but continue and you will

see its application!

The coefficient of correlation (r) can be quickly estimated using the formula $r = \pm\left(1 - \frac{l}{L}\right)$, where L and l are the lengths of the major and minor axes of the ellipse which surrounds, as closely as possible, the set of points in the scattergram. The sign is positive when the points ascend from left to right and negative when they descend.

The coefficient of correlation is a measure of the degree of dependence between two variables of a statistical or probabilistic nature.
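The ellipse estimate r = ±(1 − l/L) is simple enough to sketch in Python. The function name `ellipse_correlation` and its `ascending` flag are assumptions introduced for this example; the measurements 3.4 cm and 11.8 cm are the ones used later in this section for the heart-rate scattergram.

```python
def ellipse_correlation(minor_axis, major_axis, ascending=True):
    """Estimate r = ±(1 - l/L) from the axes of the surrounding ellipse."""
    if major_axis <= 0 or not 0 <= minor_axis <= major_axis:
        raise ValueError("need major_axis > 0 and 0 <= minor_axis <= major_axis")
    magnitude = 1 - minor_axis / major_axis
    # Positive if the points climb from left to right, negative if they descend
    return magnitude if ascending else -magnitude


print(round(ellipse_correlation(3.4, 11.8, ascending=False), 2))  # descending points
print(ellipse_correlation(3.0, 3.0))  # circle: minor axis equals major axis
print(ellipse_correlation(0.0, 4.0))  # ellipse flattened into a line
```

The last two calls correspond to the two extreme cases: a circle and a straight line.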

Figure 2.6


Look carefully at the preceding formula for the coefficient of correlation and answer the following questions.

What does $\frac{l}{L}$ represent? ___________________________________

______________________________________________________

In fact, $\frac{l}{L}$ represents the ratio between the lengths of the minor and major axes. It tells you what fraction of the major axis the minor axis represents. For example, if $\frac{l}{L} = \frac{1}{4}$, then L is four times l, or l is one quarter of L.

What is the largest possible value of $\frac{l}{L}$? _____________________

Justify your answer. ______________________________________________

_____________________________________________________________

In this case, what is the value of the coefficient of correlation (r)? ________________________

Did you find that the largest possible value of $\frac{l}{L}$ is 1? Well done! This happens when the major axis has the same length as the minor axis. The ellipse, in this case, would take the form of a circle. The coefficient of correlation would be r = ±(1 − 1) = 0. Regardless of the sign, 0 is the smallest possible absolute value of r. In this case, we would say that there is no correlation between the two variables being considered.

What is the smallest possible value of $\frac{l}{L}$? ___________________

Justify your answer. _______________________________________________

_____________________________________________________

In this case, what is the value of the coefficient of correlation (r)? ________________________

The smallest possible value for $\frac{l}{L}$ is 0. This happens when l = 0. In such a case, the ellipse is so “flattened out” that it becomes a straight line, and r = ±(1 − 0) = ±1. There is then said to be a perfect linear correlation.

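Figure 2.7 [scale of the coefficient of correlation r, from −1 (maximum) through −0.4, 0 (none: no correlation) and 0.4 to 1 (maximum); the two ends of the scale are marked strong]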

Can you now say, taking into account what you have learned, that the length of

time spent studying has a small, medium or large effect, or has no effect at all,

on the results in the mathematics examination?

_____________________________________________________________

__________________________________________________________________

__________________________________________________________________

We can say that, since the correlation coefficient r ≈ 0.6, the results are strongly influenced by the amount of time spent studying. In such a case, we can talk about a cause and effect relationship. However, it can happen that two variables having a strong linear correlation are not linked by a causal relationship. This is a question of judgement. For example, if, in a survey, we find that there is a strong linear correlation (r ≈ 0.85) between wearing long pants and the ability to read well, could we conclude that it is enough to wear long pants to be able to read well? You must rely on the important law of common sense to avoid falling into such a trap. There is always a danger in statistical studies of misinterpreting the results of an analysis. Even professionals are not protected from the risk of making a false interpretation, and this can result in very unfortunate decisions.


Further, the opposite can happen: we can decide that there is no correlation when, in fact, there is one, but of a kind that you have not considered (or, in the case of this class, that we have not studied). Read the following box to learn of some examples of these phenomena.

In certain cases, there may be no linear correlation, but there may be another sort of

relation. These situations, which are just as realistic, are not studied in this course.

Here are a couple of examples.
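A small numerical illustration, offered only as a sketch: the points below lie exactly on the parabola y = x², so y is completely determined by x, yet the linear coefficient of correlation is zero. The data are invented for this example, and `pearson_r` is a name chosen here for the standard product-moment formula for r.

```python
import math


def pearson_r(xs, ys):
    """Linear coefficient of correlation from the standard sums formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Invented data: points placed symmetrically on the parabola y = x^2
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]
print(pearson_r(xs, ys))
```

The rising right half of the parabola cancels the falling left half, so a value of r near zero says nothing about whether a non-linear relation exists.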

r = −1
• Strongly negative correlation
• The points in the scattergram are perfectly aligned and descending from left to right
• No “spread” of the points
• Inverse dependence

−1 < r < 0 (for example, r = −0.4 or r = −0.7)
• The points in the scattergram, which descend from left to right, may be considerably spread around an imaginary line (r = −0.4), but tend to be less dispersed as the coefficient approaches r = −0.9.
• To a greater or lesser degree, there is an inverse dependence

r = 0
• No correlation
• The points in the scattergram do not follow a line
• There is no linear dependence between the variables

0 < r < 1 (for example, r = 0.5 or r = 0.7)
• The points in the scattergram, which climb from left to right, may be considerably spread around an imaginary line (r = 0.4), but tend to be less dispersed as the coefficient approaches r = 0.9.
• Dependence in the same direction between the variables

r = 1
• The points in the scattergram are perfectly aligned and ascending from left to right
• No “spread” of the points
• Dependence in the same direction

Check if you have completely mastered the idea of the coefficient of correlation.

Figure 2.11: Heart rate at rest and training time [scattergram: heart rate, scaled from 40 to 85, against weekly training time (h), scaled from 0 to 35]

Describe in your own words the relation between the two variables.

To do this, find the value of the coefficient and interpret the result.

______________________________________________________

______________________________________________________________

______________________________________________________________

You should have found r ≈ −0.71 because $r = -\left(1 - \frac{3.4\ \text{cm}}{11.8\ \text{cm}}\right) \approx -0.71$.

Note the use of the symbol ≈ for an approximation.

Since the two characteristics are related in an inverse variation (negative correlation),

you can conclude that an increase in training time is associated with a strong

reduction in heartbeat rate at rest.

You have seen how to measure the strength of the interdependence between two variables linked by a linear relation (a straight line). Can you now use this line to predict other values? For example, if a person who wants to remain in good health is advised to maintain a resting heart rate of 60 beats per minute, how many hours should he train each week? To answer this kind of question, you need one last idea: the line of regression.

LINE OF REGRESSION

Reconsider Clive’s scattergram (test results vs. study time).

[Scattergram: Mark (%) against Time (hr)]

_____________________________________________________________

_____________________________________________________________

These points represent the “ideal” points for this situation, or those which best

represent the relation between the two variables. So, with the help of this line of

regression, you can predict the mark from the number of hours studied, and vice

versa.

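Use the line of regression to estimate:

a) the mark for a student who has studied for 8.5 hours ______

b) the number of hours studied by a student who obtained a mark of 75% ______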

So you have found that, “if the trend continues”, the first student would have scored about 89% by studying 8.5 hours and the second must have studied about 5 hours to obtain a mark of 75%. Don’t forget that these answers are approximate, since drawing and reading the line is somewhat imprecise. You can increase the precision of the results by finding the equation of the line.

How can you find the equation of a straight line given its graph?____________

_____________________________________________________________

_____________________________________________________________

Since you studied this last year and reviewed it this year, you should have

remembered that the equation of a line is y = ax + b where a and b are the parameters

(they are not the variables) which determine the line. The equation can be determined

from two points, from the rate of change and one point or from the rate of change and

the initial value.


describe the relation between the two variables. From

these, you can determine the rate of change. The equation of

the line of regression is that of a direct or partial linear

variation: y = ax + b where a is the rate of change and b is

the initial value (the y-value at the point of intersection

with the vertical axis).

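Figure 2.13 [graph of a line through the points P1 and P2, showing the differences y2 − y1 and x2 − x1, with the dependent variable y on the vertical axis, which the line meets at (0, b)]

To find the equation of the line of regression, you must:

1. Find the rate of change (slope) using the formula $a = \frac{y_2 - y_1}{x_2 - x_1}$.

2. Find the initial value b by substituting the value of a found in 1) and x and y (from a point on the line) in the equation y = ax + b.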

[Scattergram: Mark (%) against Time (hr)]

On the graph above, draw the line of regression, choose two points on the line

and use them to find the values of a and b and to give the equation of the line.

Show your work, then compare to the answer given below.

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

The line passes through the points (1, 58) and (11, 98), which are the vertices of the ellipse surrounding the scatter of points.

Slope: using (1, 58) and (11, 98), $a = \frac{y_2 - y_1}{x_2 - x_1} = \frac{98 - 58}{11 - 1} = 4$

Initial value: a = 4 and (1, 58), so y = ax + b ⇒ 58 = 4(1) + b ⇒ b = 54

The equation of the line of regression is therefore y = 4x + 54.
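The same two steps (rate of change, then initial value) can be sketched in Python. The function names `line_through`, `predict_mark` and `hours_for_mark` are chosen for this example only; the two points are the ones used in the answer above.

```python
def line_through(p1, p2):
    """Return (a, b) for the line y = ax + b through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must have different x values")
    a = (y2 - y1) / (x2 - x1)  # step 1: rate of change (slope)
    b = y1 - a * x1            # step 2: substitute one point into y = ax + b
    return a, b


a, b = line_through((1, 58), (11, 98))
print(a, b)  # 4.0 54.0


def predict_mark(hours):
    """Mark predicted by the line for a given study time."""
    return a * hours + b


def hours_for_mark(mark):
    """Study time for which the line predicts the given mark."""
    return (mark - b) / a


print(predict_mark(8.5))   # 88.0
print(hours_for_mark(75))  # 5.25
```

The values 88 and 5.25 are close to the “about 89%” and “about 5 hours” read from the graph, which shows how the equation removes the imprecision of reading the line.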


In the relation between the number of hours of study and the mark obtained,

what is the interpretation of the numbers 4 and 54 in the equation?

_____________________________________________________________

_____________________________________________________________

So far, you have learned how to calculate the value of the coefficient of correlation and

to find the equation of the line of regression. Note, however, that these values are

approximate and depend on the accuracy with which they are calculated. The

graphics calculator allows you to calculate these same values, but in less time and

with more accuracy.

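To find the coefficient of correlation and the equation of the line of regression using the graphics calculator, you must:

- clear the lists you intend to use (e.g. L1 and L2) and enter the values
- press STAT, select CALC, choose LINREG (aX + b) and press ENTER, then enter L1 , L2 and press ENTER.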

N.B. The values shown are: a for the rate of change, b for the initial value

and r for the coefficient of correlation
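For readers without a graphics calculator, the same three values can be approximated with a short Python sketch. It assumes that the calculator reports the usual least-squares line; the function name `linear_regression` is chosen for this example, and the two lists simply copy the study times and marks of Clive's class in the order of the class list.

```python
import math

# Study times (h) and marks (%) of the 32 students, in class-list order
hours = [7, 5, 6, 2, 5, 3, 5, 4, 3, 4, 4, 10, 6, 5, 8, 4,
         10, 3, 5, 5, 4, 2, 5, 9, 8, 4, 3, 2, 10, 8, 3, 10]
marks = [81, 75, 81, 53, 73, 76, 78, 78, 61, 65, 72, 88, 76, 78, 75, 75,
         96, 63, 68, 83, 74, 70, 63, 92, 84, 89, 53, 55, 96, 99, 55, 90]


def linear_regression(xs, ys):
    """Return (a, b, r) for the least-squares line y = ax + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sxy = n * sum_xy - sum_x * sum_y
    sxx = n * sum_x2 - sum_x ** 2
    syy = n * sum_y2 - sum_y ** 2
    a = sxy / sxx                   # rate of change
    b = (sum_y - a * sum_x) / n     # initial value
    r = sxy / math.sqrt(sxx * syy)  # coefficient of correlation
    return a, b, r


a, b, r = linear_regression(hours, marks)
print(f"a = {a:.2f}, b = {b:.2f}, r = {r:.2f}")
```

The values of a and b obtained this way should come out close to the graphical estimate y = 4x + 54, while r need not match the rougher ellipse estimate.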

Virginia is in the same class as Clive, but was absent for the test. She wants to obtain

a mark of exactly 82%.

Calculate the exact number of hours she must spend studying in order to be

fairly certain to obtain a mark of exactly 82%.

_____________________________________________________________

_____________________________________________________________

__________________________________________________________________

__________________________________________________________________

__________________________________________________________________

__________________________________________________________________

__________________________________________________________________


If you cannot work out how to use the graphics calculator, consult the User’s Guide,

ask a fellow-student or see your teacher. Now that you have these tools to measure

the strength of the relation between variables, you must be careful in interpreting these

values. The two most common errors are in ignoring the effect of a third variable when

this has an effect on the first two, and in drawing conclusions from a sample which is

too small.

By the way, did you find that Virginia needs to study 7 hours in order to obtain a mark

of 82%?

_____________________________________________________________________________

The calculator also allows you to find other correlations which are not linear. By

comparing the coefficients of correlation, you can determine which correlation is the

most appropriate.

_________________________________________________________________


2.2 Practise

1. For each of the scattergrams in figure 2.16 (on the next page), state whether the

linear correlation is positive, negative or non-existent. If there is a linear

correlation, indicate if it is strong. If there is a non-linear correlation, state what

kind.

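Figure 2.15

Situation | positive | negative | non-existent | strong correlation
a | | | |
b | | | |
c | | | |
d | | | |
e | | | |
f | | | |
g | | | |
h | | | |
i | | | |
j | | | |
k | | | |
l | | | |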

Figure 2.16 [twelve scattergrams, labelled a) to l), each drawn on x and y axes]

iv) find the value of the linear coefficient of correlation in two different ways

vi) find the equation of the line of regression in two different ways and

compare the answers

vii) calculate an ordered pair which satisfies the equation and interpret the meaning of the pair


a) Number of hours per week spent playing a sport and watching television.

Figure 2.17

Number of hours spent playing a sport (per week) | 0 | 10 | 15 | 20 | 5 | 8 | 35 | 12
Number of hours spent watching television (per week) | 25 | 20 | 10 | 2 | 15 | 14 | 3 | 8

i) independent variable___________________

dependent variable____________________

x

ii) ________________________________________________________________

iv) a)______________________________________________________________

b)______________________________________________________________

v) _________________________________________________________________

vi) a)______________________________________________________________

b)______________________________________________________________

vii) ________________________________________________________________

__________________________________________________________________

Figure 2.18

Engine capacity (cm³) | 1900 | 1700 | 2500 | 1500 | 1800 | 2200 | 3300
Gas consumption (L/100 km) | 8.2 | 7.4 | 10.1 | 7.2 | 8.3 | 9.5 | 14.5

i) independent variable___________________

dependent variable____________________

x

ii) ________________________________________________________________

iv) a)______________________________________________________________

b)______________________________________________________________

v) _________________________________________________________________

vi) a)______________________________________________________________

b)______________________________________________________________

vii) ________________________________________________________________

__________________________________________________________________


c) The height from which a basketball is dropped and the number of bounces.

Figure 2.19

Number of bounces | 10 | 13 | 15 | 16 | 17 | 17 | 18 | 18 | 19
Height (number of concrete blocks) | 2 | 4 | 6 | 8 | 10 | 12 | 14 | 16 | 18

i) independent variable___________________

dependent variable____________________

x

ii) ________________________________________________________________

iv) a)______________________________________________________________

b)______________________________________________________________

v) _________________________________________________________________

vi) a)______________________________________________________________

b)______________________________________________________________

vii) ________________________________________________________________

__________________________________________________________________


d) Number of days without electricity and the number of candles burnt during the

ice storm.

Figure 2.20

Number of candles burnt | 2 | 5 | 10 | 1 | 17 | 8 | 5 | 4 | 10 | 4 | 3 | 8
Number of days without electricity | 1 | 3 | 4 | 2 | 7 | 3 | 2 | 1 | 5 | 6 | 11 | 7

i) independent variable___________________

dependent variable____________________

x

ii) ________________________________________________________________

iv) a)______________________________________________________________

b)______________________________________________________________

v) _________________________________________________________________

vi) a)______________________________________________________________

b)______________________________________________________________

vii) ________________________________________________________________

________________________________________________________________

Figure 2.21

Age of Toyota Tercel (years) | 10 | 8 | 2 | 15 | 1 | 3 | 4
Price ($) | 4500 | 7000 | 14000 | 2500 | 15000 | 10500 | 8000

i) independent variable___________________

dependent variable____________________

x

ii) ________________________________________________________________

iv) a)______________________________________________________________

b)______________________________________________________________

v) _________________________________________________________________

vi) a)______________________________________________________________

b)______________________________________________________________

vii) ________________________________________________________________

__________________________________________________________________


The following exercises cover the basic material of section 2. They should give you an

indication of your grasp of the idea of linear dependence (correlation).

correlation (r) is positive, negative or zero.

Figure 2.22 [five scattergrams]

positive, strongly or weakly negative or zero.

b) The level of education achieved by an adult and the marks he/she obtained

when in Grade 7. ________________________________________________

_____________________________________________________________

c) The number of apples on an apple tree during a summer and the number of

days of sunshine. _______________________________________________

_____________________________________________________________


_____________________________________________________________

month. ________________________________________________________

_____________________________________________________________

the next page to indicate the possible conclusions to be drawn from each figure.

Figure 2.23 [six scattergrams]

Figure 2.24

Conclusion Figures

1. No apparent correlation or relation between the

variables. Variables are independent

circular form

4. Marked positive correlation

parabolic form

6. Weak linear correlation. The relation is more

probably exponential

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

6. If the coefficient of correlation is zero, does this necessarily mean that the two

variables are independent of each other? _____________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

7. What does the line of regression mean in a situation with two variables? _____

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

correlation between two variables? __________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________


In the first part of this section, you learnt how to calculate approximately the

coefficient of correlation. The method used required that a scattergram be drawn.

For this reason, it is called a graphical method. Consequently, the value obtained for r

is not necessarily exact.

There is a formula for calculating the linear dependence between the variables x and

y. Effectively, this is an algebraic method for calculating r, the coefficient of correlation,

without recourse to a scattergram. Further, by this formula, the sign (+/-) of r is

automatically determined.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$$

where

n: number of data points

$\sum xy$: the sum of the products of x and y for each pair

$\sum x$: the sum of the values of the independent variable x

$\sum y$: the sum of the values of the dependent variable y

$\sum x^2$: the sum of the squares of the variable x

$\sum y^2$: the sum of the squares of the variable y

$\left(\sum x\right)^2$: the square of the sum of the values of the variable x

$\left(\sum y\right)^2$: the square of the sum of the values of the variable y
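The algebraic method can be sketched in Python by building the same five columns (x, y, x², y², xy) and substituting their sums into the formula for r. The data in `sample` are hypothetical, invented for this illustration only, and the function name `correlation_from_sums` is likewise an assumption of the example.

```python
import math


def correlation_from_sums(pairs):
    """Coefficient of correlation r computed from the five column sums."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, invented for this illustration only
sample = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]

print(" x  y  x²  y²  xy")
for x, y in sample:
    print(f"{x:>2} {y:>2} {x * x:>3} {y * y:>3} {x * y:>3}")

r = correlation_from_sums(sample)
print(f"r ≈ {r:.2f}")
```

For this sample the sums are Σx = 15, Σy = 20, Σx² = 55, Σy² = 86 and Σxy = 66, which gives r ≈ 0.77; the sign comes out of the formula without any reference to a scattergram.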

x | y | x² | y² | xy
1 | 3 | | |
11 | 18 | | |
3 | 6 | | |
9 | 15 | | |
5 | 9 | | |
7 | 12 | | |
Σx = | Σy = | Σx² = | Σy² = | Σxy =

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________


_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

2. The table below shows the number of years of education (x) and the salaries

(y), in thousands of dollars, of 8 randomly chosen women from the Eastern

Townships.

Figure 2.26

x | y | x² | y² | xy
12 | 16 | | |
16 | 20 | | |
19 | 25 | | |
10 | 12 | | |
13 | 9 | | |
16 | 30 | | |
12 | 9 | | |
12 | 8 | | |
Σx = | Σy = | | |

b) Use the formula to calculate the coefficient of correlation r:

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

r =

___________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________

_____________________________________________________________
