
ETC1000/ETX9000 Business and Economic Statistics

Demonstration Lecture Week 5: The Multiple Linear Regression Model



This lecture provides examples of the material taught in this week's lectures, to help
you see its potential for real-world application and to reinforce the ideas being
communicated.


Case Study: The Effect of TV on Children


Background

Children in high-income countries like Australia are said to spend the second largest
chunk of their waking time watching television. That is, the most time-consuming
activity after attending school is watching TV. Despite the widespread use of
computers and the Internet, TV remains the dominant form of media in children's
lives.

Not surprisingly, the public and parents are concerned about potentially detrimental
effects of TV on child cognitive development. Some argue, however, that TV can be
beneficial to children, in that it provides exposure to language.

Whichever direction the effect runs, TV watching in early childhood is likely to have
lingering long-term consequences for cognitive development, which may be crucial to
human capital formation and, ultimately, labour market outcomes later in life.

So just how detrimental is television to a young child's cognitive development? We
can answer this question using the linear regression model if we have suitable data.


Data

The National Longitudinal Survey of Youth offers rich information about the
demographic, cognitive, socio-emotional and physiological characteristics of children
and their parents. More specifically, mothers of school-aged children were
interviewed about how many hours of television their child watched in a typical week.
Both mother and child were also tested on their reading ability.

We will use this data for 8-year-old children.





A Simple Linear Regression Model for Child Reading Score

We can use this data to estimate a simple linear regression model for Child's Reading
Score as a function of hours of TV watched:

Y = child's reading score (for age) out of 100
X = hours of TV watched in an average week

Here's the Excel output we obtain:





What does this output tell us?

First, note:
- $R^2$ is very poor
- A standard error of 22.9 points out of 100 is quite high

Intercept: $b_0 = 39.963$
The average reading score amongst children who watch no TV during a typical week
is estimated to be 39.963 out of 100.

Hours TV: $b_1 = 0.260$
$b_1$ tells us the estimated effect on y when x is one unit higher. That is, take 2
children, the first of whom watches 1 more hour of TV per week than the second. The
model predicts the first child to have a reading score 0.26 points (out of 100) higher
than the second child, who watches less TV. This is an interesting result, as it suggests
that watching TV may help children in their reading. Note, however, that practically
speaking this effect seems quite small.
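
To make the interpretation of the two coefficients concrete, here is a minimal Python sketch that plugs hours of TV into the fitted line. The coefficients are the ones reported above. The function name and the two example children (10 and 11 hours per week) are illustrative assumptions, not part of the original analysis.

```python
# Estimated coefficients reported in the Excel output above.
B0 = 39.963  # intercept: predicted score for a child who watches no TV
B1 = 0.260   # slope: change in predicted score per extra hour of TV per week


def predict_reading_score(hours_tv):
    """Predicted reading score (out of 100) from the fitted simple regression."""
    return B0 + B1 * hours_tv


# Two hypothetical children: one watches 10 hours a week, the other 11.
score_10 = predict_reading_score(10)
score_11 = predict_reading_score(11)

print(f"Predicted score at 10 hours: {score_10:.3f}")
print(f"Predicted score at 11 hours: {score_11:.3f}")
print(f"Difference for one extra hour: {score_11 - score_10:.3f}")
```

The difference between the two predictions is the slope itself, whichever pair of children one hour apart is chosen.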

But is the effect statistically significant? Let's perform a hypothesis test.

Hypothesis test

1. Formulate Null and Alternative Hypotheses
$H_0: \beta_1 = 0$ (Watching TV has no impact on Child's Reading Score)
$H_1: \beta_1 \neq 0$ (Watching TV does have an impact on Child's Reading Score)

2. Decide a Significance Level
Test at the 5% level of significance, i.e. $\alpha = 0.05$

3. Calculate the p-value
p-value = $2.32 \times 10^{-9}$

4. Make a Decision

The decision rule is to reject $H_0: \beta_1 = 0$ if the p-value < $\alpha$.

Since $2.32 \times 10^{-9} < 0.05$, we reject $H_0$ and conclude that watching TV does
have an impact on Child's Reading Score.
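
The decision rule in step 4 can be sketched in a few lines of Python. The significance level and the p-value are the ones stated in steps 2 and 3. The function name `reject_null` is an illustrative assumption.

```python
ALPHA = 0.05        # significance level chosen in step 2
P_VALUE = 2.32e-9   # p-value for Hours TV reported in step 3


def reject_null(p_value, alpha):
    """Return True when the p-value is below the significance level."""
    return p_value < alpha


decision = reject_null(P_VALUE, ALPHA)
print("Reject H0" if decision else "Do not reject H0")
```

The same function applies unchanged to any other coefficient's p-value, which is why the rule is stated in terms of the p-value and the significance level rather than the coefficient itself.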



So, even though the coefficient on Hours TV is small, it is still statistically significant.
How can it be that the effect of TV is practically unimportant, yet statistically
highly significant?
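
One way to see how this can happen: the test compares the coefficient with its standard error, and the standard error shrinks as the sample grows. The sketch below is a rough illustration only. It uses the slope (0.260) and regression standard error (22.9) reported above, but the spread of TV hours (`SD_HOURS_TV`) and the sample sizes are assumptions made up for the example, and the p-value uses a large-sample normal approximation rather than the exact t distribution.

```python
import math

SLOPE = 0.260        # estimated coefficient on Hours TV (from the output above)
RESIDUAL_SD = 22.9   # regression standard error (from the output above)
SD_HOURS_TV = 15.0   # ASSUMPTION: spread of weekly TV hours, not from the source


def approx_two_sided_p(n):
    """Approximate two-sided p-value for H0: slope = 0 with n observations."""
    # Approximate standard error of the slope: it falls with the square root of n.
    se_slope = RESIDUAL_SD / (SD_HOURS_TV * math.sqrt(n))
    z = SLOPE / se_slope
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


for n in (100, 1000, 5000):  # ASSUMPTION: illustrative sample sizes
    print(f"n = {n:5d}  approx p-value = {approx_two_sided_p(n):.2e}")
```

Under these assumptions the same small slope is not significant at the 5% level with 100 children, but is highly significant with a few thousand. Statistical significance measures how precisely the effect is estimated, not how large it is.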


A Multiple Linear Regression Model for Child Reading Score

Now suppose we add another explanatory variable into the model:

Y = child's reading score (for age) out of 100
$X_1$ = hours of TV watched in an average week
$X_2$ = mother's reading score out of 100

And we want to estimate the following model:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + e_i$


Here's the output:





Some things to note from this output:
- Both explanatory variables are highly significant (p-values very small),
  meaning they each help to explain child's reading score
- $R^2$ is better
- A standard error of 21.4 points out of 100 is still quite high

How about our interpretations?


Intercept: $b_0 = 17.932$
The average reading score amongst children who watch no TV during a typical week,
and whose mother scored 0 in the reading test is estimated to be 17.932 out of 100.


Hours TV: $b_1 = -0.364$
$b_1$ tells us the estimated effect on y when $x_1$ is one unit higher, holding all other
x variables constant. That is, take 2 children whose mothers have the same reading
score, the first of whom watches 1 more hour of TV per week than the second. The
model predicts the first child to have a reading score 0.364 points lower than the second
child. Practically speaking, this is still quite small, but of the opposite sign!


Mother's Reading Score: $b_2 = 0.795$
$b_2$ tells us the estimated effect on y when $x_2$ is one unit higher, holding all other
x variables constant. That is, take 2 children who watch the same amount of TV per
week, but the mother of the first scored 1 point higher on the reading test than the
mother of the second. The model predicts the first child to have a reading score 0.795
points higher than the second child, on average. The effect in this case is quite large:
for every extra point the mother scores on the reading test, the child scores almost as
much. Genetics appears to be a highly influential factor!
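
The "holding all other x variables constant" interpretation can be checked numerically. This minimal Python sketch uses the three coefficients listed above ($b_0 = 17.932$, $b_1 = -0.364$, $b_2 = 0.795$). The function name and the example values (10 or 11 hours of TV, a mother's score of 50 or 51) are illustrative assumptions.

```python
# Estimated coefficients from the multiple regression output above.
B0 = 17.932   # intercept
B1 = -0.364   # coefficient on hours of TV per week
B2 = 0.795    # coefficient on mother's reading score


def predict_reading_score(hours_tv, mother_score):
    """Predicted child reading score (out of 100) from the fitted model."""
    return B0 + B1 * hours_tv + B2 * mother_score


# Hypothetical baseline child: 10 hours of TV, mother scored 50.
base = predict_reading_score(10, 50)
# Same mother's score, one more hour of TV.
more_tv = predict_reading_score(11, 50)
# Same TV hours, mother scored one point higher.
higher_mother = predict_reading_score(10, 51)

print(f"Baseline prediction:         {base:.3f}")
print(f"Effect of one extra TV hour: {more_tv - base:.3f}")
print(f"Effect of one extra point:   {higher_mother - base:.3f}")
```

Changing one input while fixing the other recovers exactly that input's coefficient, which is what each partial slope measures.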

Note that the simple regression results suggested a positive effect of TV on reading
scores, but the multiple regression suggests a negative result. Why does the data tell a
different story when we include an additional explanatory variable?

Multiple regression allows us to look at the effect of TV on reading scores, once
genetics are taken into account. The coefficient on Hours TV can change direction when
some of what Hours TV is capturing in the simple regression case is really due to
mother's reading score.
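
The sign change can be reproduced on a tiny made-up data set. The sketch below is not the survey data: the six observations are invented so that TV has a negative effect (-0.4) and mother's score a positive effect (+0.8), while mothers with higher scores also happen to have children who watch more TV. All names and numbers are assumptions chosen for illustration. The least-squares formulas are the standard ones for one and two explanatory variables.

```python
def mean(values):
    return sum(values) / len(values)


def cross(a, b):
    """Sum of products of deviations from the mean."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def simple_slope(x, y):
    """Least-squares slope from regressing y on x alone."""
    return cross(x, y) / cross(x, x)


def two_regressor_slopes(x1, x2, y):
    """Least-squares slopes from regressing y on x1 and x2 together."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# MADE-UP data (not from the survey): mother's score rises with TV hours.
tv_hours = [0, 10, 20, 30, 40, 50]
mother_score = [35, 35, 50, 60, 65, 85]
# Child's score built with a NEGATIVE TV effect and a POSITIVE mother effect.
child_score = [20 - 0.4 * t + 0.8 * m for t, m in zip(tv_hours, mother_score)]

b_simple = simple_slope(tv_hours, child_score)
b1, b2 = two_regressor_slopes(tv_hours, mother_score, child_score)
delta = simple_slope(tv_hours, mother_score)  # how mother's score moves with TV

print(f"Simple regression slope on TV:   {b_simple:+.3f}")
print(f"Multiple regression slope on TV: {b1:+.3f}")
print(f"Multiple regression slope on mother's score: {b2:+.3f}")
print(f"b1 + b2 * delta = {b1 + b2 * delta:+.3f}")
```

In this example the simple regression slope on TV is positive (about +0.4) even though the effect built into the data is negative (-0.4). The simple slope equals $b_1 + b_2 \delta$, where $\delta$ is the slope from regressing mother's score on TV hours, so it absorbs part of the mother's-score effect whenever the two explanatory variables move together.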
