GV207 Political Analysis, Week 10

Department of Government, University of Essex

Multiple Regression

Introduction:
Last week, we used bivariate regression to assess the effect of one independent variable (X) on a
dependent variable (Y). However, in the real world Y is typically influenced by many different
independent variables, which are likely to be correlated. Multiple regression allows us to assess the
effects of many different independent variables (X1, X2, X3 etc.) on the same dependent variable (Y)
at the same time.
Important note: Remember from last week that the dependent variable (Y) of a regression must be
interval level. The independent variables (Xs) can be interval or dummy.
Why do we need to control for other variables?
Why do we need to include all independent variables in the same multiple regression model? Why
don't we just run a series of bivariate regressions instead?
The reason is that including additional independent variables in our regression allows us to control
for the effects of those variables. This is important, since our independent variables are typically
correlated with each other. If we are interested in the effect of X1 on Y, then we also need to include
X2 and X3 if they influence the dependent variable and are correlated with X1. If we left them out,
X1 would also capture part of the effects of X2 and X3, and the coefficient estimate of X1 would be
biased. The bias that results from omitting a relevant variable from our model is called omitted
variable bias.
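The mechanism can be illustrated with a small sketch. It uses made-up numbers rather than the course data, and the helper names (`cross`, `bivariate_slope`, `two_variable_slopes`) are chosen here purely for illustration. Y is constructed so that the true effects of x1 and x2 are 2 and 3, and x2 is positively correlated with x1.

```python
# Toy illustration of omitted variable bias (made-up numbers, not the course data).

def mean(values):
    return sum(values) / len(values)

def cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

def bivariate_slope(x, y):
    """OLS slope from regressing y on x alone."""
    return cross(x, y) / cross(x, x)

def two_variable_slopes(x1, x2, y):
    """OLS slopes from regressing y on x1 and x2 together."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]                          # positively correlated with x1
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # true effects: 2 and 3

biased = bivariate_slope(x1, y)
b1, b2 = two_variable_slopes(x1, x2, y)

print(f"bivariate slope of x1:      {biased:.3f}")
print(f"multiple regression slopes: x1 = {b1:.3f}, x2 = {b2:.3f}")
```

When x2 is left out, the slope of x1 absorbs part of the effect of x2 and comes out larger than 2. Once both variables are included, the two slopes return to the values used to construct the data.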
Let's see what happens if we don't include all relevant independent variables in our regression model.
In the following, we run three regressions with the share of women in parliament as the dependent
variable. The first uses GDP per capita to explain the share of women in parliament (you may
recognise this model from last week). The second uses the democracy dummy variable to explain
the share of women in parliament. The third is a multiple regression, which uses GDP per
capita, the democracy dummy variable, and the share of students enrolled in education to explain the
share of women in parliament.
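. regress women2000 gdppc2000

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  1,   139) =   42.63
       Model |  2627.15635     1  2627.15635           Prob > F      =  0.0000
    Residual |  8565.94689   139  61.6255172           R-squared     =  0.2347
-------------+------------------------------           Adj R-squared =  0.2292
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.8502

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gdppc2000 |   .0004655   .0000713     6.53   0.000     .0003246    .0006065
       _cons |   8.393485   .9063686     9.26   0.000     6.601433    10.18554
------------------------------------------------------------------------------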

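. regress women2000 aclp_democ2000

        Source |       SS       df       MS              Number of obs =     189
---------------+------------------------------           F(  1,   187) =    8.89
         Model |  693.492558     1  693.492558           Prob > F      =  0.0033
      Residual |  14594.2217   187  78.0439662           R-squared     =  0.0454
---------------+------------------------------           Adj R-squared =  0.0403
         Total |  15287.7142   188  81.3176289           Root MSE      =  8.8342

--------------------------------------------------------------------------------
     women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
aclp_democ2000 |   3.915333   1.313462     2.98   0.003     1.324226    6.506441
         _cons |      8.768   1.020091     8.60   0.000     6.755634    10.78037
--------------------------------------------------------------------------------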

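. regress women2000 gdppc2000 aclp_democ2000 educ2001

        Source |       SS       df       MS              Number of obs =     141
---------------+------------------------------           F(  3,   137) =   17.45
         Model |  3095.09093     3  1031.69698           Prob > F      =  0.0000
      Residual |  8098.01231   137  59.1095789           R-squared     =  0.2765
---------------+------------------------------           Adj R-squared =  0.2607
         Total |  11193.1032   140  79.9507374           Root MSE      =  7.6883

--------------------------------------------------------------------------------
     women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
     gdppc2000 |   .0002838   .0000953     2.98   0.003     .0000954    .0004722
aclp_democ2000 |   .5907561   1.483168     0.40   0.691    -2.342106    3.523618
      educ2001 |   .1158674   .0433338     2.67   0.008     .0301779     .201557
         _cons |   1.612113   2.568476     0.63   0.531    -3.466871    6.691097
--------------------------------------------------------------------------------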

Question: Compare the effects of GDP per capita and the democracy dummy variable in the bivariate
regression models and in the multiple regression model. What has happened?

We see that by just looking at the bivariate regressions, we would get biased coefficient estimates for
GDP per capita and the democracy variable, which would strongly overestimate the effects of these
two variables. Thus, it is crucial that we control for other relevant independent variables that influence
Y and are correlated with GDP per capita or the level of democracy.
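To put rough numbers on this, the following sketch compares the coefficients reported in the three regression outputs. The values are copied from the output; the dictionary names are chosen here for illustration only. Note also that the bivariate democracy model is estimated on 189 observations while the other two use 141, so part of the difference for the democracy variable may reflect the different sample.

```python
# Coefficients copied from the bivariate and multiple regression outputs.
bivariate = {"gdppc2000": 0.0004655, "aclp_democ2000": 3.915333}
multiple = {"gdppc2000": 0.0002838, "aclp_democ2000": 0.5907561}

# Share by which each coefficient shrinks once the other variables are controlled for.
shrinkage = {}
for name in bivariate:
    shrinkage[name] = 1 - multiple[name] / bivariate[name]
    print(f"{name}: {bivariate[name]} -> {multiple[name]} "
          f"({shrinkage[name]:.0%} smaller)")
```

The GDP per capita coefficient falls by roughly two fifths, and the democracy coefficient loses most of its size and is no longer significant (p = 0.691).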
Note that by including additional variables in our regression model the interpretation of the
coefficients of the independent variables also slightly changes. They now indicate the change in the
dependent variable for a one unit increase in the independent variable, while holding all other
independent variables constant. Thus the coefficients now report the effects of the independent
variables while controlling for all the other variables we have included in our model. The p-values tell
us whether the respective variables have a significant effect on the dependent variable, while
controlling for all other independent variables.
Also note that the value of R-squared now reports the share of the variance in the dependent variable
that can be explained by all the independent variables together.
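As a quick check, R-squared can be recomputed from the sums of squares in the multiple regression output. The sketch below uses the values printed for the model with GDP per capita, democracy and education. The adjusted R-squared formula is the standard textbook one and is not derived in this handout.

```python
# Values copied from the multiple regression output
# (women2000 on gdppc2000, aclp_democ2000 and educ2001).
model_ss = 3095.09093    # "Model" sum of squares
total_ss = 11193.1032    # "Total" sum of squares
n_obs = 141              # Number of obs
n_predictors = 3         # independent variables in the model

# Share of the total variation explained by all predictors together.
r_squared = model_ss / total_ss

# Adjusted R-squared penalises the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
```

Both figures should agree with the R-squared and Adj R-squared lines of the Stata output up to rounding.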
The multiple regression model:
Although the last example made use of just three independent variables, we can easily include any
number of additional independent variables in our multiple regression model. As in the bivariate
regression case, we can express the dependent variable of our multiple regression as a linear function
of the independent variables:
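$$Y = a + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$$

Here a is the constant (reported as _cons in the Stata output), b_1 to b_k are the coefficients of the
independent variables X_1 to X_k, and e is the error term.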

Using this equation we can again use the coefficient estimates combined with a set of hypothetical
values of our independent variables to make predictions about the expected value of Y for a given
scenario of X values.
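A prediction of this kind is just the constant plus each coefficient multiplied by the chosen value of its variable. The sketch below shows the arithmetic with a hypothetical two-variable model; the coefficients, the variable names `x1` and `x2`, and the `predict` helper are invented for illustration and are not taken from the Stata output in this handout.

```python
def predict(intercept, coefficients, values):
    """Return intercept + sum of coefficient * value over all predictors."""
    return intercept + sum(coefficients[name] * values[name] for name in coefficients)

# Hypothetical model, NOT taken from the Stata output in this handout.
intercept = 2.0
coefficients = {"x1": 0.5, "x2": 3.0}   # x1 is interval level, x2 is a dummy

scenario = {"x1": 10, "x2": 1}   # dummy switched on
baseline = {"x1": 10, "x2": 0}   # same x1, dummy switched off

y_scenario = predict(intercept, coefficients, scenario)
y_baseline = predict(intercept, coefficients, baseline)

print("predicted Y, dummy = 1:", y_scenario)
print("predicted Y, dummy = 0:", y_baseline)
print("difference:", y_scenario - y_baseline)
```

Holding x1 fixed, the difference between the two predictions equals the coefficient of the dummy variable, which is the "holding all other independent variables constant" interpretation in action.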
Question: Write down the regression equation for the multiple regression model that used GDP per
capita, democracy, and the share of students enrolled in education to explain the share of women in
parliament.

Question: Given this equation,

- by how much will the % of women in parliament change if GDPPC increases by $10000?
- by how much will the % of women in parliament change if a country is democratic instead of
  autocratic?
- what % of women in parliament would we expect in a country that is autocratic and has a
  GDPPC of $0 and a share of students enrolled in education of 0%?
- what % of women in parliament would we expect in a country that is democratic with a GDPPC
  of $10000 and a share of students enrolled in education of 70%?

Multiple regression with dummy variables:


As we have seen above for the example of the democracy variable, we can easily incorporate dummy
variables into our regression models. The coefficient of the dummy variable tells us by how much the
dependent variable changes if the dummy variable takes on a value of 1 instead of a value of 0.
In addition, dummy variables can be used to include nominal and ordinal variables with more than
two categories in our regression models (think about why we can't include these variables in their
original form). We can transform any nominal or ordinal variable into a set of dummy variables.
Let's use the Freedom House 3-category variable (fhcat2000) to see how this works. We can use the
tab command together with its generate option to transform the original fhcat2000 variable into three
dummy variables with the names fhdum1, fhdum2 and fhdum3. (Note that we get three dummy
variables because the original fhcat2000 has three categories.)
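. tab fhcat2000, generate(fhdum)

    freedom |
      house |
category of |
  democracy |
       2000 |      Freq.     Percent        Cum.
------------+-----------------------------------
       free |         86       45.03       45.03
partly free |         52       27.23       72.25
   not free |         53       27.75      100.00
------------+-----------------------------------
      Total |        191      100.00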

Notice that Stata has created three new dummy variables: fhdum1, fhdum2 and fhdum3. fhdum1
takes on a value of 1 if fhcat2000 is equal to "free" and a value of 0 otherwise; fhdum2 takes on a
value of 1 if fhcat2000 is equal to "partly free" and a value of 0 otherwise; and fhdum3 takes on a
value of 1 if fhcat2000 is equal to "not free" and a value of 0 otherwise.
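. list country fhcat2000 fhdum1 fhdum2 fhdum3

     +---------------------------------------------------+
     |     country   fhc~2000   fhdum1   fhdum2   fhdum3 |
     |---------------------------------------------------|
  1. | Afghanistan   not free        0        0        1 |
  2. |     Albania   partly f        0        1        0 |
  3. |     Algeria   not free        0        0        1 |
  4. |     Andorra       free        1        0        0 |
  5. |      Angola   not free        0        0        1 |
     +---------------------------------------------------+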
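The logic behind the three dummy variables can be sketched outside Stata as well. The snippet below rebuilds the dummies for the five countries in the listing above. It only illustrates the idea and is not how Stata implements the generate option; the `make_dummies` helper is a name chosen here.

```python
# The five countries shown in the listing, with their fhcat2000 category.
countries = {
    "Afghanistan": "not free",
    "Albania": "partly free",
    "Algeria": "not free",
    "Andorra": "free",
    "Angola": "not free",
}

# Category order follows the tab output: free, partly free, not free.
categories = ["free", "partly free", "not free"]

def make_dummies(value):
    """Return one 0/1 indicator per category, in the order given above."""
    return [1 if value == category else 0 for category in categories]

dummies = {name: make_dummies(status) for name, status in countries.items()}

for name, (fhdum1, fhdum2, fhdum3) in dummies.items():
    print(f"{name:<12} {fhdum1} {fhdum2} {fhdum3}")
    # Exactly one dummy equals 1 in every row, so the three always sum to 1.
    assert fhdum1 + fhdum2 + fhdum3 == 1
```

Because the three dummies always sum to 1, any one of them can be worked out from the other two. This is why a regression can include at most two of them alongside the constant.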

We can now add these dummy variables to our multiple regression in order to see whether the level of
freedom has an effect on the share of women in parliament when controlling for GDP per capita and the
share of students enrolled in education. (We exclude the aclp_democ2000 variable from the model,
since it measures roughly the same thing as the fhcat2000 variable.)
To assess the effect of fhcat2000 on the share of women in parliament, we can either include just one
of the newly created dummy variables or we can include all of them except for one. Note that we
cannot include all dummy variables, since one of them has to serve as the reference category against
which we interpret the coefficients of the other dummy variables.
In the following regression, we only include the fhdum1 dummy variable.
Question: How do you interpret the coefficient of the fhdum1 variable?

. regress women2000 gdppc2000 educ2001 fhdum1

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =   19.30
       Model |  3325.33302     3  1108.44434           Prob > F      =  0.0000
    Residual |  7867.77022   137  57.4289797           R-squared     =  0.2971
-------------+------------------------------           Adj R-squared =  0.2817
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.5782

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gdppc2000 |   .0002246   .0000981     2.29   0.024     .0000306    .0004186
    educ2001 |   .0933386   .0438816     2.13   0.035     .0065657    .1801115
      fhdum1 |   3.325281   1.627918     2.04   0.043     .1061845    6.544378
       _cons |       2.49   2.534783     0.98   0.328     -2.52236     7.50236
------------------------------------------------------------------------------

In the next regression, we include both the fhdum1 and the fhdum3 variable.
Question: How do you interpret the coefficients of the two dummy variables?

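. regress women2000 gdppc2000 educ2001 fhdum1 fhdum3

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  4,   136) =   14.66
       Model |  3372.69196     4  843.172989           Prob > F      =  0.0000
    Residual |  7820.41129   136  57.5030242           R-squared     =  0.3013
-------------+------------------------------           Adj R-squared =  0.2808
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.5831

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gdppc2000 |   .0002249   .0000982     2.29   0.023     .0000308    .0004191
    educ2001 |   .0941024    .043918     2.14   0.034     .0072519    .1809528
      fhdum1 |   3.983734   1.783245     2.23   0.027      .457259    7.510209
      fhdum3 |   1.621946   1.787232     0.91   0.366    -1.912413    5.156306
       _cons |   1.765097    2.65922     0.66   0.508    -3.493672    7.023866
------------------------------------------------------------------------------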

Comparing the strength of the effects of different variables:


The coefficient estimates provide us with information about the form of the relationships between our
independent variables and the dependent variable, while the p-values tell us whether the effects of our
independent variables are significant. However, we may also be interested in the strength of the
effects of our independent variables. Unfortunately, we normally cannot look at the size of the
coefficient estimates to see which of the independent variables has the strongest effect because our
independent variables are typically measured in different units.
However, we can use so-called standardised coefficients (betas) to compare the effects of our
variables. The standardised coefficients are measured in the same units, i.e. standard deviations. Thus,
they tell us by how many standard deviations the dependent variable will change if we increase the
independent variable by 1 standard deviation. To get standardised coefficients, we simply have to add
the beta option to our regress command:
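. regress women2000 gdppc2000 educ2001 fhdum1, beta

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =   19.30
       Model |  3325.33302     3  1108.44434           Prob > F      =  0.0000
    Residual |  7867.77022   137  57.4289797           R-squared     =  0.2971
-------------+------------------------------           Adj R-squared =  0.2817
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.5782

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
   gdppc2000 |   .0002246   .0000981     2.29   0.024                 .2337447
    educ2001 |   .0933386   .0438816     2.13   0.035                 .2143155
      fhdum1 |   3.325281   1.627918     2.04   0.043                 .1863789
       _cons |       2.49   2.534783     0.98   0.328                        .
------------------------------------------------------------------------------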
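A standardised coefficient is the ordinary coefficient rescaled by the ratio of the two standard deviations: the coefficient multiplied by the standard deviation of the independent variable and divided by the standard deviation of the dependent variable. The sketch below shows this with a made-up bivariate example; the data and the `ols_slope` helper are invented for illustration and have nothing to do with the course data set.

```python
from statistics import stdev

def ols_slope(x, y):
    """Unstandardised OLS slope from regressing y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Made-up data, used only to show the rescaling.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b = ols_slope(x, y)               # change in y per one-unit change in x
beta = b * stdev(x) / stdev(y)    # change in y (in SDs) per one-SD change in x

print(f"unstandardised slope: {b:.3f}")
print(f"standardised slope:   {beta:.3f}")
```

The same rescaling applies to each coefficient of a multiple regression, which is why standardised coefficients of variables measured in different units can be compared with each other.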

Question: Interpret the standardised coefficients of the three variables. Which of them has the
strongest effect on the share of women in parliament?

Stata exercise:
As in the last few weeks we will be using the data set Democracy small.dta.
1. Run a bivariate regression with the share of students enrolled in education (educ2001) as the
dependent variable and government spending (cengov2000) as the independent variable. Note
that this is the same regression you ran last week.
2. Interpret the coefficient of the cengov2000 variable. Is it significant?
3. Now run a multiple regression with the share of students enrolled in education (educ2001) as the
dependent variable and government spending (cengov2000), GDP per capita (gdppc2000) and
democracy (aclp_democ2000) as the independent variables.
4. Again interpret the coefficient of the cengov2000 variable. Is it still significant? What has
happened and why?
5. By how much does the share of students enrolled in education change if GDP per capita increases
by $1000? By how much does the share of students enrolled in education change if a country is
democratic instead of autocratic?
6. Compare the R-squared of the two regression models. Which model explains a larger share of the
variance in the share of students enrolled in education?
7. Let's say we are interested in whether the share of students enrolled in education is significantly
different in Africa, after controlling for government spending, GDP per capita and democracy.
Create a set of region dummy variables using the region variable. Include the dummy variable for
Africa in your model. By how much is the share of students enrolled in education lower or higher
in Africa compared to other regions?
