Multiple Regression: Department of Government, University of Essex GV207 - Political Analysis, Week 10

GV207 Political Analysis, Week 10
Department of Government, University of Essex
Multiple Regression
Introduction:
Last week, we used bivariate regression to assess the effect of one independent variable (X) on a
dependent variable (Y). However, in the real world Y is typically influenced by many different
independent variables, which are likely to be correlated. Multiple regression allows us to assess the
effects of many different independent variables (X1, X2, X3 etc.) on the same dependent variable (Y)
at the same time.
Important note: Remember from last week that the dependent variable (Y) of a regression must be
interval level. The independent variables (Xs) can be interval or dummy.
Why do we need to control for other variables?
Why do we need to include all independent variables in the same multiple regression model? Why
don't we just run a series of bivariate regressions instead?
The reason is that including additional independent variables in our regression allows us to control
for the effects of those variables. This is important, since our independent variables are typically
correlated with each other. If we are interested in the effect of X1 on Y, then we also need to include
X2 and X3 if they also influence the dependent variable and are correlated with X1. Not including
them would cause X1 to also pick up part of the effects of X2 and X3. In this case, the coefficient
estimate of X1 would be biased. This bias, which results from omitting a relevant variable from our
model, is called omitted variable bias.
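The mechanism can be seen in a small simulation. This is a minimal sketch on made-up data, not the course data set: the hypothetical variables x1 and x2 are correlated, and y is generated from both with true slopes 2 and 3. A bivariate regression of y on x1 alone absorbs part of the x2 effect, while a regression on both recovers the true slopes. The OLS slopes are computed from sample covariances using only the Python standard library.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((u - ma) * (w - mb) for u, w in zip(a, b)) / (len(a) - 1)

def slope_one(x, y):
    """OLS slope of y on a single regressor x (intercept included)."""
    return cov(x, y) / cov(x, x)

def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly (intercept included),
    solving the 2x2 normal equations written in covariances."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

random.seed(1)
n = 5000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.5 * a + random.gauss(0, 1) for a in x1]  # x2 is correlated with x1
# hypothetical "true" model: y = 1 + 2*x1 + 3*x2 + noise
y = [1 + 2 * a + 3 * b + random.gauss(0, 1) for a, b in zip(x1, x2)]

short_b1 = slope_one(x1, y)               # x2 omitted
long_b1, long_b2 = slopes_two(x1, x2, y)  # x2 controlled for
print(f"bivariate: b1 = {short_b1:.2f}")
print(f"multiple:  b1 = {long_b1:.2f}, b2 = {long_b2:.2f}")
```

Because x2 rises by 0.5 on average for every unit of x1, the bivariate slope should land near 2 + 3 × 0.5 = 3.5 instead of the true 2, which is exactly the omitted variable bias described above.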
Let's see what happens if we don't include all relevant independent variables in our regression model.
In the following, we run three regressions with the share of women in parliament as the dependent
variable. The first one uses GDP per capita to explain the share of women in parliament (you may
recognise this model from last week). The second uses the democracy dummy variable to explain
the share of women in parliament. And the third one is a multiple regression, which uses GDP per
capita, the democracy dummy variable, and the share of students enrolled in education to explain the
share of women in parliament.
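. regress women2000 gdppc2000

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  1,   139) =   42.63
       Model |  2627.15635     1  2627.15635           Prob > F      =  0.0000
    Residual |  8565.94689   139  61.6255172           R-squared     =  0.2347
-------------+------------------------------           Adj R-squared =  0.2292
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.8502

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gdppc2000 |   .0004655   .0000713     6.53   0.000     .0003246    .0006065
       _cons |   8.393485   .9063686     9.26   0.000     6.601433    10.18554
------------------------------------------------------------------------------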
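. regress women2000 aclp_democ2000

        Source |       SS       df       MS              Number of obs =     189
---------------+------------------------------           F(  1,   187) =    8.89
         Model |  693.492558     1  693.492558           Prob > F      =  0.0033
      Residual |  14594.2217   187  78.0439662           R-squared     =  0.0454
---------------+------------------------------           Adj R-squared =  0.0403
         Total |  15287.7142   188  81.3176289           Root MSE      =  8.8342

--------------------------------------------------------------------------------
     women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
aclp_democ2000 |   3.915333   1.313462     2.98   0.003     1.324226    6.506441
         _cons |      8.768   1.020091     8.60   0.000     6.755634    10.78037
--------------------------------------------------------------------------------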
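. regress women2000 gdppc2000 aclp_democ2000 educ2001

        Source |       SS       df       MS              Number of obs =     141
---------------+------------------------------           F(  3,   137) =   17.45
         Model |  3095.09093     3  1031.69698           Prob > F      =  0.0000
      Residual |  8098.01231   137  59.1095789           R-squared     =  0.2765
---------------+------------------------------           Adj R-squared =  0.2607
         Total |  11193.1032   140  79.9507374           Root MSE      =  7.6883

--------------------------------------------------------------------------------
     women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
     gdppc2000 |   .0002838   .0000953     2.98   0.003     .0000954    .0004722
aclp_democ2000 |   .5907561   1.483168     0.40   0.691    -2.342106    3.523618
      educ2001 |   .1158674   .0433338     2.67   0.008     .0301779     .201557
         _cons |   1.612113   2.568476     0.63   0.531    -3.466871    6.691097
--------------------------------------------------------------------------------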
Question: Compare the effects of GDP per capita and the democracy dummy variable in the bivariate
regression models and in the multiple regression model. What has happened?
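We see that by just looking at the bivariate regressions, we would get biased coefficient estimates for
GDP per capita and the democracy variable, which would strongly overestimate the effects of these
two variables. Once we control for the other variables, the coefficient of GDP per capita falls from
.0004655 to .0002838, and the coefficient of the democracy dummy falls from 3.92 (p = 0.003) to 0.59
(p = 0.691), so that democracy no longer has a significant effect. Thus, it is crucial that we control for
other relevant independent variables that influence Y and are correlated with GDP per capita or the
level of democracy.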
Note that by including additional variables in our regression model the interpretation of the
coefficients of the independent variables also slightly changes. They now indicate the change in the
dependent variable for a one unit increase in the independent variable, while holding all other
independent variables constant. Thus the coefficients now report the effects of the independent
variables while controlling for all the other variables we have included in our model. The p-values tell
us whether the respective variables have a significant effect on the dependent variable, while
controlling for all other independent variables.
Also note that the value of R² now reports the share of the variance in the dependent variable that can
be explained by all the independent variables together.
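The fit statistics on the right of the Stata output can be recomputed from the Source/SS/df/MS block on the left. The sketch below does this for the third regression above (GDP per capita, democracy and education), using the standard textbook formulas; the variable names are our own and only the numbers are copied from the output.

```python
import math

# Values copied from the ANOVA block of:
# regress women2000 gdppc2000 aclp_democ2000 educ2001
ss_model, df_model = 3095.09093, 3
ss_resid, df_resid = 8098.01231, 137
ss_total, df_total = 11193.1032, 140

# Share of the variance in women2000 explained by all Xs together
r_squared = ss_model / ss_total
# Adjusted R-squared penalises the model for each extra independent variable
adj_r_squared = 1 - (ss_resid / df_resid) / (ss_total / df_total)
# F test of the joint hypothesis that all slope coefficients are zero
f_stat = (ss_model / df_model) / (ss_resid / df_resid)
# Typical size of a residual, in percentage points of women in parliament
root_mse = math.sqrt(ss_resid / df_resid)

print(f"R-squared     = {r_squared:.4f}")      # 0.2765
print(f"Adj R-squared = {adj_r_squared:.4f}")  # 0.2607
print(f"F(3, 137)     = {f_stat:.2f}")         # 17.45
print(f"Root MSE      = {root_mse:.4f}")       # 7.6883
```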
The multiple regression model:
Although the last example made use of just three independent variables, we can easily include any
number of additional independent variables in our multiple regression model. As in the bivariate
regression case, we can express the dependent variable of our multiple regression as a linear function
of the independent variables:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \dots + \beta_k X_k + \varepsilon$$
Using this equation, we can again combine the coefficient estimates with a set of hypothetical values
of our independent variables to predict the expected value of Y for a given scenario of X values.
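As an illustration, the sketch below plugs one hypothetical scenario into the fitted equation of the third regression above. The scenario (a democracy with a GDP per capita of $5,000 and 50% of students enrolled) is invented for the example, and predict_women is a hypothetical helper name. The second part shows what "holding all other variables constant" means in practice: changing only GDP per capita moves the prediction by exactly the GDP coefficient times the change.

```python
# Coefficients copied from: regress women2000 gdppc2000 aclp_democ2000 educ2001
COEF = {
    "_cons": 1.612113,
    "gdppc2000": 0.0002838,
    "aclp_democ2000": 0.5907561,
    "educ2001": 0.1158674,
}

def predict_women(gdppc, democ, educ):
    """Expected % of women in parliament for one scenario of X values."""
    return (COEF["_cons"]
            + COEF["gdppc2000"] * gdppc
            + COEF["aclp_democ2000"] * democ
            + COEF["educ2001"] * educ)

# Hypothetical scenario: democracy, GDP per capita $5,000, 50% enrolled
base = predict_women(gdppc=5000, democ=1, educ=50)
print(f"predicted % women: {base:.2f}")   # 9.42

# Raise GDP per capita by $1,000, holding democracy and education fixed
richer = predict_women(gdppc=6000, democ=1, educ=50)
print(f"change: {richer - base:.4f}")     # 0.2838
```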
Question: Write down the regression equation for the multiple regression model that used GDP per
capita, democracy, and the share of students enrolled in education to explain the share of women in
parliament.
Question: Given this equation,
- by how much will the % of women in parliament change if GDPPC increases by $10000?
- by how much will the % of women in parliament change if a country is democratic instead of
  autocratic?
- what % of women in parliament would we expect in a country that is autocratic and has a
  GDPPC of $0 and a share of students enrolled in education of 0%?
- what % of women in parliament would we expect in a country that is democratic with a GDPPC
  of $10000 and a share of students enrolled in education of 70%?
Multiple regression with dummy variables:

As we have seen above for the example of the democracy variable, we can easily incorporate dummy
variables into our regression models. The coefficient of the dummy variable tells us by how much the
dependent variable changes if the dummy variable takes on a value of 1 instead of a value of 0.
In addition, dummy variables can be used to include nominal and ordinal variables with more than
two categories in our regression models (think about why we can't include these variables in their
original form). We can transform any nominal or ordinal variable into a set of dummy variables.
Let's use the Freedom House 3-category variable (fhcat2000) to see how this works. We can use the
tab command together with its generate option to transform the original fhcat2000 variable into three
dummy variables with the names fhdum1, fhdum2 and fhdum3. (Note that we get three dummy
variables because the original fhcat2000 has three categories.)
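. tab fhcat2000, generate(fhdum)

    freedom |
      house |
category of |
  democracy |
       2000 |      Freq.     Percent        Cum.
------------+-----------------------------------
       free |         86       45.03       45.03
partly free |         52       27.23       72.25
   not free |         53       27.75      100.00
------------+-----------------------------------
      Total |        191      100.00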
Notice that Stata has created three new dummy variables: fhdum1, fhdum2 and fhdum3. fhdum1
takes on a value of 1 if fhcat2000 is equal to free and a value of 0 otherwise; fhdum2 takes on a
value of 1 if fhcat2000 is equal to partly free and a value of 0 otherwise; and fhdum3 takes on a
value of 1 if fhcat2000 is equal to not free and a value of 0 otherwise.
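. list country fhcat2000 fhdum1 fhdum2 fhdum3

     +---------------------------------------------------+
     |     country   fhc~2000   fhdum1   fhdum2   fhdum3 |
     |---------------------------------------------------|
  1. | Afghanistan   not free        0        0        1 |
  2. |     Albania   partly f        0        1        0 |
  3. |     Algeria   not free        0        0        1 |
  4. |     Andorra       free        1        0        0 |
  5. |      Angola   not free        0        0        1 |
     +---------------------------------------------------+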
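The logic of tab ..., generate() can be sketched in a few lines of Python. make_dummies is a hypothetical helper, not part of Stata: it builds one 0/1 list per category and numbers the dummies in the order the categories are passed, which here matches the order of the tab output above. The input is the five countries from the listing.

```python
def make_dummies(values, categories, prefix):
    """Return {prefix1: [...], prefix2: [...], ...}: one 0/1 list per category,
    equal to 1 where the observation falls into that category and 0 otherwise."""
    return {
        f"{prefix}{i}": [1 if v == cat else 0 for v in values]
        for i, cat in enumerate(categories, start=1)
    }

# fhcat2000 for Afghanistan, Albania, Algeria, Andorra, Angola
fhcat2000 = ["not free", "partly free", "not free", "free", "not free"]
dummies = make_dummies(fhcat2000, ["free", "partly free", "not free"], "fhdum")
for name, column in dummies.items():
    print(name, column)
# fhdum1 [0, 0, 0, 1, 0]
# fhdum2 [0, 1, 0, 0, 0]
# fhdum3 [1, 0, 1, 0, 1]
```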
We can now add these dummy variables to our multiple regression in order to see whether the level of
freedom has an effect on the share of women in parliament when controlling for GDP per capita and the
share of students enrolled in education. (We exclude the aclp_democ2000 variable from the model,
since it measures roughly the same thing as the fhcat2000 variable.)
To assess the effect of fhcat2000 on the share of women in parliament, we can either include just one
of the newly created dummy variables (in which case the two remaining categories together form the
reference category) or we can include all of them except for one. Note that we cannot include all
dummy variables, since one of them has to serve as the reference category against which we interpret
the coefficients of the other dummy variables. Moreover, because every country falls into exactly one
category, the three dummies always add up to 1 and would therefore be perfectly collinear with the
constant.
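Why one dummy must be left out can be checked on the five countries listed earlier. In the minimal sketch below, the three dummy columns add up to the column of ones behind the constant, so two quite different sets of made-up coefficients produce identical fitted values. The data alone cannot decide between them, which is why one category has to be dropped and used as the reference. predict is a hypothetical helper for illustration only.

```python
# The five countries from the listing (Afghanistan ... Angola)
fhdum1 = [0, 0, 0, 1, 0]   # free
fhdum2 = [0, 1, 0, 0, 0]   # partly free
fhdum3 = [1, 0, 1, 0, 1]   # not free
cons = [1] * 5             # the column of ones behind _cons

# Exactly one dummy equals 1 in every row, so the dummies sum to the constant
row_sums = [a + b + c for a, b, c in zip(fhdum1, fhdum2, fhdum3)]
print(row_sums == cons)    # True

def predict(b0, b1, b2, b3):
    """Fitted values of a hypothetical model y = b0 + b1*fhdum1 + b2*fhdum2 + b3*fhdum3."""
    return [b0 + b1 * d1 + b2 * d2 + b3 * d3
            for d1, d2, d3 in zip(fhdum1, fhdum2, fhdum3)]

# Made-up coefficients: shift the constant up by 5 and every dummy down by 5
fit_a = predict(10, 4, 1, 0)
fit_b = predict(15, -1, -4, -5)
print(fit_a == fit_b)      # True: the two coefficient sets cannot be told apart
```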
In the following regression, we only include the fhdum1 dummy variable.
Question: How do you interpret the coefficient of the fhdum1 variable?
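. regress women2000 gdppc2000 educ2001 fhdum1

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =   19.30
       Model |  3325.33302     3  1108.44434           Prob > F      =  0.0000
    Residual |  7867.77022   137  57.4289797           R-squared     =  0.2971
-------------+------------------------------           Adj R-squared =  0.2817
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.5782

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gdppc2000 |   .0002246   .0000981     2.29   0.024     .0000306    .0004186
    educ2001 |   .0933386   .0438816     2.13   0.035     .0065657    .1801115
      fhdum1 |   3.325281   1.627918     2.04   0.043     .1061845    6.544378
       _cons |       2.49   2.534783     0.98   0.328     -2.52236     7.50236
------------------------------------------------------------------------------

In the next regression, we include both the fhdum1 and the fhdum3 dummy variables, so that fhdum2
(partly free) becomes the reference category.
Question: How do you interpret the coefficients of the two dummy variables?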
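. regress women2000 gdppc2000 educ2001 fhdum1 fhdum3

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  4,   136) =   14.66
       Model |  3372.69196     4  843.172989           Prob > F      =  0.0000
    Residual |  7820.41129   136  57.5030242           R-squared     =  0.3013
-------------+------------------------------           Adj R-squared =  0.2808
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.5831

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   gdppc2000 |   .0002249   .0000982     2.29   0.023     .0000308    .0004191
    educ2001 |   .0941024    .043918     2.14   0.034     .0072519    .1809528
      fhdum1 |   3.983734   1.783245     2.23   0.027      .457259    7.510209
      fhdum3 |   1.621946   1.787232     0.91   0.366    -1.912413    5.156306
       _cons |   1.765097    2.65922     0.66   0.508    -3.493672    7.023866
------------------------------------------------------------------------------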
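In the model with fhdum1 and fhdum3, each dummy coefficient is a difference from the omitted "partly free" category. The comparison between free and not free countries is not printed directly, but its point estimate is simply the difference of the two coefficients. The sketch below does this arithmetic; effects and gap are hypothetical names. It gives point estimates only: whether the free vs not free gap is significant would need a separate test (in Stata, for instance, test fhdum1 = fhdum3 after the regression).

```python
# Dummy coefficients copied from: regress women2000 gdppc2000 educ2001 fhdum1 fhdum3
# fhdum2 ("partly free") is omitted, so its effect is 0 by construction.
effects = {
    "partly free": 0.0,     # reference category
    "free": 3.983734,       # coefficient of fhdum1
    "not free": 1.621946,   # coefficient of fhdum3
}

def gap(a, b):
    """Expected difference in % women in parliament between categories a and b,
    holding gdppc2000 and educ2001 constant (point estimate only)."""
    return effects[a] - effects[b]

for a, b in [("free", "partly free"), ("not free", "partly free"), ("free", "not free")]:
    print(f"{a} vs {b}: {gap(a, b):+.3f}")
# free vs partly free: +3.984
# not free vs partly free: +1.622
# free vs not free: +2.362
```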
Comparing the strength of the effects of different variables:

The coefficient estimates provide us with information about the form of the relationships between our
independent variables and the dependent variable, while the p-values tell us whether the effects of our
independent variables are significant. However, we may also be interested in the strength of the
effects of our independent variables. Unfortunately, we normally cannot look at the size of the
coefficient estimates to see which of the independent variables has the strongest effect because our
independent variables are typically measured in different units.
However, we can use so-called standardised coefficients (betas) to compare the effects of our
variables. The standardised coefficients are measured in the same units, i.e. standard deviations. Thus,
they tell us by how many standard deviations the dependent variable will change if we increase the
independent variable by 1 standard deviation. To get standardised coefficients, we simply have to add
the beta option to our regress command:
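. regress women2000 gdppc2000 educ2001 fhdum1, beta

      Source |       SS       df       MS              Number of obs =     141
-------------+------------------------------           F(  3,   137) =   19.30
       Model |  3325.33302     3  1108.44434           Prob > F      =  0.0000
    Residual |  7867.77022   137  57.4289797           R-squared     =  0.2971
-------------+------------------------------           Adj R-squared =  0.2817
       Total |  11193.1032   140  79.9507374           Root MSE      =  7.5782

------------------------------------------------------------------------------
   women2000 |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
   gdppc2000 |   .0002246   .0000981     2.29   0.024                 .2337447
    educ2001 |   .0933386   .0438816     2.13   0.035                 .2143155
      fhdum1 |   3.325281   1.627918     2.04   0.043                 .1863789
       _cons |       2.49   2.534783     0.98   0.328                        .
------------------------------------------------------------------------------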
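A standardised coefficient is usually defined as the ordinary coefficient multiplied by sd(X)/sd(Y). The minimal sketch below, on simulated data with hypothetical variable names (income in dollars, school in percent), checks that this rescaling gives the same numbers as running the regression on z-scored variables. That is why betas are comparable across variables measured in different units. It only illustrates the definition; it is not a reimplementation of Stata's beta option.

```python
import random
import statistics

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (len(a) - 1)

def ols_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 (intercept included), via the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def zscore(v):
    """Rescale a list to mean 0 and standard deviation 1."""
    m, s = statistics.mean(v), statistics.stdev(v)
    return [(u - m) / s for u in v]

random.seed(2)
n = 500
income = [random.gauss(10000, 8000) for _ in range(n)]  # simulated, in dollars
school = [random.gauss(60, 20) for _ in range(n)]       # simulated, in percent
y = [2 + 0.0003 * a + 0.1 * b + random.gauss(0, 7) for a, b in zip(income, school)]

# Raw coefficients, then betas = b * sd(x) / sd(y)
b1, b2 = ols_two(income, school, y)
sd_y = statistics.stdev(y)
beta1 = b1 * statistics.stdev(income) / sd_y
beta2 = b2 * statistics.stdev(school) / sd_y

# The same betas from a regression on z-scored variables
z1, z2 = ols_two(zscore(income), zscore(school), zscore(y))
print(f"raw coefficients: {b1:.6f}, {b2:.4f}")
print(f"betas:            {beta1:.3f}, {beta2:.3f}")
print(f"z-score slopes:   {z1:.3f}, {z2:.3f}")
```

The raw coefficients differ by orders of magnitude only because income and school are measured in different units; the betas put both on the common scale of standard deviations.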
Question: Interpret the standardised coefficients of the three variables. Which of them has the
strongest effect on the share of women in parliament?
Stata exercise:
As in the last few weeks, we will be using the data set Democracy small.dta.
1. Run a bivariate regression with the share of students enrolled in education (educ2001) as the
dependent variable and government spending (cengov2000) as the independent variable. Note
that this is the same regression you ran last week.
2. Interpret the coefficient of the cengov2000 variable. Is it significant?
3. Now run a multiple regression with the share of students enrolled in education (educ2001) as the
dependent variable and government spending (cengov2000), GDP per capita (gdppc2000) and
democracy (aclp_democ2000) as the independent variables.
4. Again interpret the coefficient of the cengov2000 variable. Is it still significant? What has
happened and why?
5. By how much does the share of students enrolled in education change if GDP per capita increases
by $1000? By how much does the share of students enrolled in education change if a country is
democratic instead of autocratic?
6. Compare the R² of the two regression models. Which model explains a larger share of the
variance in the share of students enrolled in education?
7. Let's say we are interested in whether the share of students enrolled in education is significantly
different in Africa, after controlling for government spending, GDP per capita and democracy.
Create a set of region dummy variables using the region variable. Include the dummy variable for
Africa in your model. By how much is the share of students enrolled in education lower or higher
in Africa compared to other regions?
