Jane Yang
Matthew Eckel
Analysis of Political Data
30 April 2018
The Power of Ideas: Does Ideology Affect the Percentage of Women in State Legislatures?
In this paper, I theorize that people who embody more liberal ideologies tend to vote
more women into office. More specifically, I posit that U.S. states with a more liberal
constituency on average tend to have a higher percentage of women in their state legislatures.
This is due to the fact that “Ideas about women’s role and position in society can enhance or
constrain women’s ability to seek political power.”1 In other words, “despite the presence of
favorable political systems or an adequate supply of female candidates, […] ideologies and
arguments against women’s right[s] to participate in politics have created substantial barriers to
women’s political participation for many years.”2 Put simply, our ideologies dictate how we
perceive women’s rights, women’s interests, and women’s roles; these perceptions then play a
role in whether or not we vote for a woman. Of course, my theory can only apply under certain
conditions. For one, because the theory specifies U.S. states, it can only be applied to U.S. states.
Moreover, my theory is contingent on the assumption that the data are coming from strong
democracies, where constituents are voting in free and fair elections and have complete agency
over their voting decisions. In other words, voters are not being coerced to vote for specific
candidates, to the extent that they might even be compromising their own ideological stances.
To test this theory, I draw from the “Correlates of State Policy Project” (CSPP) dataset,
compiled by the Institute for Public Policy and Social Research at Michigan State University,
and compare ADA/COPE measures of citizen ideology to the percentage of women in each
1
Pamela Marie Paxton and Sheri Kunovich. “Women’s Political Representation: The Importance of Ideology,” Social Forces 82:1 (September
2003): 90. http://muse.jhu.edu/article/47842/pdf
2
Paxton and Kunovich, “Women’s Political Representation,” 90-91.
Yang 2
state’s legislature.3 The CSPP includes “more than nine-hundred variables, with observations
across the U.S. 50 states and time (1900 – 2016). These variables represent policy outputs or
political, social, or economic factors that may influence policy differences across the states.”4 I
then proceed as follows. First, I consider other possible theories about women’s political
representation. Then, I describe in detail the independent variable, dependent variable, units of
observation, and possible control variables. After that, I run a preliminary bivariate linear
visualizations and interpretations. Then, I assess the validity of the model itself. Finally, I
conclude with an overall analysis of my theory given my tests, a consideration for other possible
threats to my tests (namely, endogeneity and a lack of clarity on the ideology variable), as well
Competing Theories
There are many other theories on what factors affect women’s representation in politics.
Broadly, these theories can be categorized as “supply” and “demand” factors; the idea is that “the
‘supply’ of female candidates and ‘demand’ for female candidates” affect the number of women
in legislatures.5 The “supply” factor posits that “Political elites are pulled disproportionately
from the highly educated and from certain professions, such as law,” thus, “if women do not
have access to educational and professional opportunities, they will not have the human and financial
capital necessary to run for office.”6 The demand factor suggests that “institutional differences in
political systems may manifest a different ‘demand’ for women, irrespective of the available
3
Correlates of State Policy Web site. Michigan State University, Institute for Public Policy and Social Research (IPPSR).
http://ippsr.msu.edu/public-policy/correlates-state-policy. This paper will elaborate on the ADA/COPE measure of ideology and the
percentage of women in each state’s legislature in subsequent sections.
4
Correlates of State Policy Web site.
5
Paxton and Kunovich, “Women’s Political Representation,” 89.
6
Paxton and Kunovich, “Women’s Political Representation,” 89.
supply,” such as political parties and electoral systems, which “can be crucial factors in allowing
women access in equal numbers.”7 My theory on ideology counts as a “demand” factor, as I posit
that people with more liberal ideologies have a higher “demand” for female representatives.
factors. For instance, “Studies on corruption, such as those initiated by researchers at the World
Bank, find evidence of a relationship between the number of women in parliament and the level
of corruption;” however, the causality is unclear.8 That is, national parliaments with a lower level
of corruption are correlated with higher numbers of women, but the causal mechanism is
all increase female political representation due to the fact that these phenomena empower
women, such that “the political interests of working women are changed enough to create an
ideological gender gap.”10 My test takes these theories into consideration by controlling for
education and income levels, thus accounting for socioeconomic factors which may empower
women and increase the “supply” of eligible female candidates, as opposed to the “demand.”
This variable is based on Berry, Ringquist, Fording and Hanson’s aggregation of ADA
(Americans for Democratic Action) and COPE (Committee on Political Education) scores to
measure ideology, with 0 being the most conservative and 100 being the most liberal.11 As we
7
Paxton and Kunovich, “Women’s Political Representation,” 90.
8 Lena Wängnerud, “Women in Parliaments: Descriptive and Substantive Representation,” Annual Review of Political Science 12 (2009): 58.
https://www.annualreviews.org/doi/pdf/10.1146/annurev.polisci.11.053106.123839
9
Wängnerud, “Women in Parliaments,” 58.
10
Wängnerud, “Women in Parliaments,” 56-58.
11
William D. Berry, Evan J. Ringquist, Richard C. Fording, and Russell I. Hanson, “Measuring Citizen and Government Ideology in the
American States, 1960-93,” American Journal of Political Science 42:1 (January 1998): 327 – 348.
https://www.jstor.org/stable/pdf/2991759.pdf?refreqid=excelsior:6047b68e3bbde6c3420943cee0689d9c
can see in the following histogram, the distribution of the scores is fairly normal, with the lowest
score being 0.963, the highest score being 95.972, and the mean score being 47.838.
It is important to note here that the conceptual definition of this variable is somewhat
misleading. Firstly, the Citizen Ideology Score is computed based on the ideology of the
representative of each district, which is then used to compute an average for the state as a whole.
As such, the Citizen Ideology Score is technically based on an aggregation of the ideology of
the representatives—not each individual citizen. Moreover, whereas social scientists generally
agree that ideological factors differ from political-institutional factors, the Citizen Ideology
Measure variable conflates ideology with political preferences.12 Both ADA and COPE measure
ideology based on how representatives vote on specific political issues. For instance, ADA uses
the Liberal Quotient, which “combin[es] 20 key votes on a wide range of social and economic
issues, both domestic and international…[to provide] a basic overall picture of an elected
official’s political position.”13 Meanwhile, COPE measures ideology based on voting records
12
Paxton and Kunovich, “Women’s Political Representation,” 89-91.
13
“ADA Voting Records.” Americans for Democratic Action. https://adaction.org/ada-voting-records/. Emphasis added.
tracked by the American Federation of Labor and Congress of Industrial Organizations (AFL-
CIO): the more a representative votes to strengthen “Social Security and Medicare, freedom to
join a union, workplace safety,” the more “liberal” they are, and the higher their score.14 These
phenomena may compromise the validity of my tests, which I will expand upon later.
Dependent Variable: Percentage of state legislators who are women, by state (1975 – 2016)
legislators who are women by state, in a given year, between the years of 1975 and 2016. As
seen in the following histogram, the distribution of these percentages is also fairly normal, with
the lowest percentage at 0.70%, the highest percentage at 42%, and the mean at 19.49%.
The CSPP has over 700 variables “with observations across the U.S. 50 states and time
(1900 – 2016).”15 In other words, I am working with time-series cross-sectional (panel) data, where the unit of
14
“Legislative Scorecard.” American Federation of Labor and Congress of Industrial Organizations (AFL-CIO). https://aflcio.org/what-unions-
do/social-economic-justice/advocacy/scorecard
15
Jordan, Marty P. and Matt Grossmann. 2016. The Correlates of State Policy Project v1.14. East Lansing, MI: Institute for Public Policy and
Social Research (IPPSR). http://ippsr.msu.edu/sites/default/files/CorrelatesCodebook.pdf
observation is not just fixed on states, but rather, a hybrid of state-years. This is important to note
because it raises the possibility of running into autocorrelation errors in my regression models.
Indeed, it would be naïve to assume that the citizen ideology score of a state in one year has no
effect on the citizen ideology score of that same state in the following year. The same goes for
percentage of female state legislators in a given year. Moreover, because the unit of observation
is state-year, not country-year, the results can only be applied in the context of U.S. state
legislatures, not for U.S. national legislatures or for legislatures of other countries.
Control Variables: Income per Capita, Education Level, Lagged IV, Lagged DV
factor. In my multivariate regression test, I include variables for the average income per capita
(total personal income in that state divided by total midyear population), percentage of
respondents with a high school diploma or higher, the citizen ideology score for a state the year
before (in other words, a one-year lagged citizen ideology variable), and a one-year lagged
percentage of women in legislature variable. The first two variables are meant to account for
socioeconomic factors. Indeed, we can assume that the higher the average income per capita in a
state, and the larger the percentage of residents with a high school diploma or higher, the better
the socioeconomic circumstances. The idea here is that better socioeconomic circumstances lead
to more female empowerment, which may increase the “supply” of eligible female candidates.16
The lag variables are then meant to mitigate possible autocorrelation errors due to the time-series
nature of both the IV and the DV. The distributions of the control variables are shown in the
following histograms.
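The one-year lagged variables described above can be built by shifting each state's series forward by one year. The following is a minimal sketch on toy data, not the code used for this paper: the column names `state` and `year` and the helper `lag_one` are assumptions introduced for illustration, and the approach assumes each state has one row per consecutive year.

```r
# Toy state-year panel; `state` and `year` are assumed column names,
# and the ideology values are made up for illustration.
panel <- data.frame(
  state = rep(c("A", "B"), each = 3),
  year = rep(2000:2002, times = 2),
  citizenideology = c(40, 42, 45, 60, 58, 61)
)

# Sort so that each state's rows are in chronological order.
panel <- panel[order(panel$state, panel$year), ]

# Hypothetical helper: shift a vector down by one position, padding with NA.
lag_one <- function(x) c(NA, head(x, -1))

# Apply the shift within each state so values never cross state boundaries.
panel$lagcitizenideology <- ave(panel$citizenideology, panel$state, FUN = lag_one)

print(panel)
```

The first year of each state has no prior observation, so its lagged value is missing and that row drops out of any regression that uses the lag.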
16
For more on this theory, see the section in this paper on “Competing Theories.”
H0: Citizen Ideology Score does not have an effect on the Percentage of Female Legislators
Ha: Citizen Ideology Score has an effect on the Percentage of Female Legislators. As ideology
As we can see from our regression table, the beta coefficient for Citizen Ideology Score is
0.176. This means that as Citizen Ideology Score increases by 1 point, the estimated Percentage of Female
Legislators increases by 0.176 percentage points. Meanwhile, because the beta coefficient of the constant is
10.370, the estimated Percentage of Female Legislators when the Citizen Ideology Score is 0 is
10.370%. Finally, because the p-value is less than 0.01 for both Citizen
Ideology Score and the constant, we can assume that both of these numbers are statistically
significant. As such, we can reject the null hypothesis in favor of the alternative hypothesis.
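As a worked illustration of the bivariate coefficients, the fitted line can be written out and evaluated at the mean Citizen Ideology Score of 47.838. This is a sketch using the rounded estimates from the regression table; the symbols \hat{y} (estimated Percentage of Female Legislators) and x (Citizen Ideology Score) are shorthand introduced here.

```latex
\begin{align*}
\hat{y} &= 10.370 + 0.176\,x \\
\hat{y}\,\big|_{x = 47.838} &= 10.370 + 0.176 \times 47.838 \approx 18.79
\end{align*}
```

Under this fitted line, a 10-point difference in Citizen Ideology Score corresponds to an estimated difference of about 1.76 percentage points in the share of female legislators.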
explained by Citizen Ideology Score alone. In order to assess this, we look to the Adjusted R-
squared. At 0.105, the Adjusted R-squared tells us that 10.5% of the variation in the Percentage
of Female Legislators can be explained by the variation in the Citizen Ideology Score. This is a
fairly low percentage; thus, we can assume that Citizen Ideology Score has a positive effect on
the Percentage of Female Legislators, but also a weak one. The low Adjusted R-squared is most
likely due to omitted variable bias; that is, there are many other causal factors which affect the
test and include control variables in order to mitigate omitted variable bias.
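For reference, the Adjusted R-squared of the bivariate model can be reproduced from the values in its regression table (R-squared of 0.106, 1,747 observations, one predictor). This sketch assumes the usual adjustment formula, with n the number of observations and k the number of predictors; both symbols are introduced here.

```latex
\begin{align*}
\bar{R}^2 &= 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1} \\
          &= 1 - (1 - 0.106)\,\frac{1747 - 1}{1747 - 1 - 1} \approx 0.105
\end{align*}
```

With a single predictor and this many observations, the adjustment barely changes the raw R-squared.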
H0: Controlling for average state Income-per-Capita, Percentage of Residents with a High School
Diploma or Higher, and Citizen Ideology Score and Percentage of Female Legislators the year
before, Citizen Ideology Score does not have an effect on the Percentage of Female Legislators.
H1: Accounting for the control variables, Citizen Ideology Score has an effect on Percentage of
From our regression table, we can see that, when controlling for average state Income-
per-Capita, Percentage of Residents with a High School Diploma or Higher, and lagged variables
for both the IV and the DV, the beta coefficient for Citizen Ideology Score decreases from 0.176
to 0.070. Although the inclusion of control variables has caused the effect of Citizen Ideology
Score itself to decrease, the effect of Citizen Ideology Score in estimating Percentage of Female
Legislators is still statistically significant at p < 0.01. Thus, we can reject the null hypothesis
At the same time, we consider the effects of the average state Income-per-Capita,
Percentage of Residents with a High School Diploma or Higher, and the lagged variables for
Citizen Ideology Score and Percentage of Female Legislators. Because the lagged variable for
Citizen Ideology Score is not statistically significant (p = 0.192), we cannot conclude that it has an effect
on the Percentage of Female Legislators.17 The beta coefficients for average state Income-per-
Capita, Percentage of Residents with a High School Diploma or Higher, and lagged Percentage
of Female Legislators are 0.0003, 0.503, and 0.095, respectively, all with p-values of less than 0.01.
Not only does each of these variables have a positive effect on the Percentage of Female
Legislators, but they are all statistically significant. That is, as each of these variables increases,
so too does the Percentage of Female Legislators. The strength of this model is also much higher,
as the Adjusted R-squared increased from 0.105 to 0.476. This Adjusted R-squared suggests that
47.6% of the variation in the Percentage of Female Legislators is explained by the aggregate
Diploma or Higher, and lagged variables for the IV and DV. This is a moderately high Adjusted
R-squared, which means that my multivariate model fits the data moderately well overall.
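The contrast between Citizen Ideology Score and its lagged counterpart can be checked directly from the estimates and standard errors in the multivariate regression output in Appendix 2. The sketch below is a normal-approximation interval (multiplier 1.96); the object names are illustrative.

```r
# Estimates and standard errors copied from the multivariate regression output.
est <- c(citizenideology = 7.004e-02, lagcitizenideology = 1.486e-02)
se  <- c(citizenideology = 1.102e-02, lagcitizenideology = 1.139e-02)

# Approximate 95% confidence intervals: estimate +/- 1.96 standard errors.
z <- 1.96
ci <- cbind(lower = est - z * se, upper = est + z * se)

# An interval that straddles 0 is consistent with no effect.
covers_zero <- ci[, "lower"] < 0 & ci[, "upper"] > 0

print(round(ci, 4))
print(covers_zero)
```

The interval for Citizen Ideology Score lies entirely above 0, while the interval for the lagged score includes 0, which matches the significance results reported above.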
What is rather confusing is the beta coefficient for the constant, -32.995. Not only is this
beta coefficient statistically significant at a p-value of less than 0.01, but it is also negative. As
17
We can also tell from our visualization of the predicted values that Lagged Citizen Ideology is not statistically significant because the 95%
confidence interval includes the value 0, which means that a possible beta coefficient for Lagged Citizen Ideology is 0, i.e., no effect on the
Percentage of Female Legislators.
such, the interpretation would be: when Citizen Ideology Score, average state Income-per-
Capita, Percentage of Residents with a High School Diploma or Higher, and the lagged variables
for the IV and DV are all 0, the estimated Percentage of Female Legislators is -32.995%. This,
however, does not make logical sense, as the lowest percentage of female legislators possible is
0%. Unfortunately, due to a lack of statistical knowledge and resources, I cannot determine why
exactly this is the case. However, it is certainly important to note, and perhaps a point of
Diagnostics
When using linear regression models to analyze data, the model must meet several basic
assumptions in order to be considered a good “fit” for the data. However, these assumptions are
sometimes violated. Thus, before I conclude with my overall analysis, I check for some of these
assumptions—as well as other potential factors which may threaten the validity of my tests—by
The first assumption of multivariate linear regression models is that there is a linear
relationship between our predictor variables and the dependent variable. To check for this, I
plotted Component-Residual plots for each individual predictor variable and an aggregate
Residuals vs. Fitted plot. In general, a major difference between the residual lines and the
component lines indicates that the predictor variables do not have a linear relationship with the
dependent variable. As we can see from both our aggregate Residuals vs. Fitted plot and the
individual Component-Residual plots, the component lines are quite closely matched to the
18
See Appendix 1 for the relevant tables and graphs for these tests.
residual lines, and the data are quite evenly distributed along the residual lines. Thus, we can
The second assumption is that the residuals are normally distributed. In order to check
this, I plotted a Normal Quantile-Quantile plot (Q-Q plot). This is essentially a scatterplot which
plots the quantiles of the model’s standardized residuals against the quantiles of a theoretical normal distribution.19 If the
residuals are normally distributed, then the points in the scatterplot should form a roughly straight line.20 As we can
see from the Normal Q-Q plot, the points form a fairly straight line and match the reference line very well.
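To make the construction of a Normal Q-Q plot concrete, the following sketch computes its two axes by hand on simulated data (not the CSPP data): sorted standardized residuals against theoretical normal quantiles. The variable names and the simulated model are assumptions for illustration only.

```r
# Simulated data with normally distributed errors (illustrative only).
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Vertical axis of the Q-Q plot: ordered standardized residuals.
std_res <- sort(rstandard(fit))

# Horizontal axis: quantiles of a standard normal at matching probabilities.
theo <- qnorm(ppoints(n))

# If the residuals are close to normal, the two sets of quantiles line up,
# so their correlation is close to 1.
qq_cor <- cor(theo, std_res)
print(qq_cor)
```

In base R, `plot(fit, which = 2)` draws the same comparison as a chart. A numeric summary such as the correlation above is one way to make the visual judgment a little less subjective.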
residuals are the same across all predictor variables, or that there is equal variance.21 If the
residuals are not spread equally across the predictors, then the data is not homoscedastic.22 To
test this, I created a Scale-Location plot, which is a scatterplot of the fitted values against the
square root of the absolute standardized residuals. If there is a distinct pattern in the data, then it is not homoscedastic.23 As
a general rule, a straight line with randomly distributed points is consistent with the assumption
of homoscedasticity.24 In the case of my Scale-Location plot, the line is somewhat straight, but
also somewhat parabolic. That said, the points seem to be randomly scattered, and the line is
more horizontal than it is parabolic; thus, we assume that the error terms are homoscedastic.
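Because reading a Scale-Location plot is a judgment call, it may help to see what the plotted quantities look like when the equal-variance assumption does and does not hold. The sketch below uses simulated data (not the CSPP data); the helper `scale_loc_trend` is a hypothetical name, and the correlation it returns is only a rough summary of the trend, not a formal test.

```r
# Two simulated outcomes: one with constant error variance, one where the
# error spread grows with x (illustrative data only).
set.seed(7)
n <- 300
x <- runif(n, 0, 10)
y_const  <- 1 + 0.5 * x + rnorm(n, sd = 1)
y_spread <- 1 + 0.5 * x + rnorm(n, sd = 0.2 + 0.3 * x)

# Hypothetical helper: correlation between fitted values and the
# Scale-Location quantity sqrt(|standardized residual|).
scale_loc_trend <- function(fit) {
  s <- sqrt(abs(rstandard(fit)))
  cor(fitted(fit), s)
}

trend_const  <- scale_loc_trend(lm(y_const ~ x))
trend_spread <- scale_loc_trend(lm(y_spread ~ x))

print(c(constant = trend_const, growing = trend_spread))
```

A trend near zero is consistent with a flat Scale-Location line, while a clearly positive trend corresponds to a line that rises with the fitted values.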
That is, the predictor variables in my model are not so highly correlated to each other that they
also predict each other to some degree. This is especially important to note because it is possible
19
“Understanding Q-Q Plots.” University of Virginia Library. http://data.library.virginia.edu/understanding-q-q-plots/
20
“Understanding Q-Q Plots.”
21
“Diagnostic Plots.” University of Virginia Library. http://data.library.virginia.edu/diagnostic-plots/
22
“Diagnostic Plots.”
23
“Diagnostic Plots.”
24
“Diagnostic Plots.”
that the average state Income-per-Capita is linearly related to the Percentage of Residents with a
High School Diploma or Higher; after all, the higher the income, the easier it is to pay for higher
education. Even more likely is a linear correlation between the lagged IV and the lagged DV. We
do not want these relationships to affect the variance of the overall model. Thus, to check for
multicollinearity, I generated variance inflation factors (VIFs) for each variable, which estimate
how much the variance of each estimated coefficient is inflated because of collinearity with the other predictors. As a general rule
of thumb, any VIF greater than 10 suggests that there may be multicollinearity. In my model, the
highest VIF is for average state Income-per-Capita at 2.249091. As such, it is safe for us to
Finally, it is important to check for influential points; as in, observations which are
outliers and have high leverage, and thus are influential enough to distort the coefficients in our
linear model. To check this, I plotted a Residuals vs. Leverage chart and checked for any points
that fall in the top right or bottom right corner beyond the Cook’s distance line; any points that
fall in those regions are said to have “high leverage or potential for influencing [the] model.”25 In
my Residuals vs. Leverage plot, none of the observations fall in the top right or bottom right
corner outside of the Cook’s distance line. Thus, we can assume that there are no observations
Overall, my model seems to pass the respective tests for checking linearity, normality of
assumptions hold for my model. Similarly, my regression results do not seem to be distorted by
any influential points. That said, it is important to remember that other than the multicollinearity
25
“R Tutorial: How to use Diagnostic Plots for Regression Models.” http://analyticspro.org/2016/03/07/r-tutorial-how-to-use-diagnostic-plots-
for-regression-models/. Cook’s distance (or Cook’s D) is a “measure that combines the information of leverage and residual of the
observation,” and can be used to determine influential points. See also: “Robust Regression: R Data Analysis Examples.” Institute for
Digital Research and Education, University of California – Los Angeles, https://stats.idre.ucla.edu/r/dae/robust-regression/
and influential points tests, all of the tests are based purely on visual interpretation and
determining patterns. That is, interpreting these graphs can be quite subjective—as in the case of
our homoscedasticity test.26 Ultimately, however, the diagnostics tests seem to show strong
Conclusion
What does all of this suggest? Is there a relationship between Citizen Ideology Score and
Percentage of Female Legislators? Does the average ideology of the people in a given state affect
the percentage of females in their state legislature? Even more broadly, does our ideology have
an effect on what genders we vote for? Are there any potential problems with the model and/or
potential threats to causal validity (outside of the diagnostics tests) that I could not eliminate?
Are there any areas that require further exploration? And finally, why does this matter?
Score has a positive effect on the Percentage of Female Legislators to a statistically significant
degree (p < 0.01). Indeed, even when accounting for other statistically significant causal factors,
such as per capita income, education level, and the Percentage of Female Legislators one year
prior, the estimating effect of Citizen Ideology Score on Percentage of Female Legislators is still
positive and statistically significant, albeit a small effect. Thus, we can say that U.S. states with a
more liberal constituency on average tend to vote more women into their state legislatures, even
when accounting for other causal factors like socioeconomic status and the percentage of female
However, as mentioned briefly earlier in this paper, my analysis runs into two major
threats to causal validity: endogeneity and a lack of distinction between politics and ideology. Of
26
The influential points test is also based on visual interpretation; however, the Cook’s distance line makes it easier to have a more objective,
universal interpretation.
course, since less than 50% of the variation in Percentage of Female Legislators can be explained
possibility that my model still suffers from omitted variable bias. Even more important is the
cyclical logic of my model. That is, I argue that more liberal ideologies lead to more positive
perceptions of women, which then affects the percentage of female legislators, yet at the same
obviously, female legislators will have a higher opinion of women, which will lead to a more
liberal ideology score for a state. Thus, this logic is circular. Of course, the only way to
really avoid this endogeneity problem would be to measure the ideology of each individual
citizen in a state and then average that out, instead of using the representatives as proxy
The other threat to the validity of my test is the fact that my ideology variable conflates
political preference with ideology. That is, the Citizen Ideology Score assumes that people who
considers themselves a Liberal on the political spectrum—is thus liberal overall. In the context of
my theory, this would suggest that anyone who considers themselves a Conservative is thus less
inclined to vote women into office; and yet, there are plenty of women in politics who identify as
Conservatives, and many Conservatives who believe in female representation in politics. The
ultimate problem here is that there is no concrete and specific definition for ideology, so it is
incredibly difficult to measure. Thus, coming up with a more concrete and specific way to define
One final thing to consider would be differences across time periods. Although my model
includes lagged variables for the independent variable and the dependent variable one year prior,
it still does not account for the fact that these observations fluctuate over time, as opposed to
increasing or decreasing in a steady linear fashion over time. In other words, it does not account
for the possibility that some citizen ideology scores and some percentages of female legislators
might be higher in, say, 2006 than in 2010. Thus, we cannot say that: “as a U.S. state’s average
citizen ideology becomes more liberal, the percentage of women in its state legislature
increases.” However, it would certainly be interesting to consider how time might affect the
theory; that is, whether or not certain time periods experienced higher citizen ideology scores
All things considered, however, and especially given the results of my regression models
and diagnostics tests, we can say with confidence that higher citizen ideology scores cause
higher percentages of female legislators, even when controlling for socioeconomic factors like
income and education level. In other words, U.S. states with a more liberal constituency are more
likely to vote more women into their state legislature. This is important to note because it
suggests that our ideals, our values, the ways we think, the ways we were raised even, affect the
representation in politics in the future, then we should certainly consider ideological factors, not
27
All of the relevant coding for this project can be seen in Appendix 2.
BIBLIOGRAPHY
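“ADA Voting Records.” Americans for Democratic Action. https://adaction.org/ada-voting-records/
“Diagnostic Plots.” University of Virginia Library. http://data.library.virginia.edu/diagnostic-plots/
“Legislative Scorecard.” American Federation of Labor and Congress of Industrial Organizations (AFL-CIO). https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard
“R Tutorial: How to use Diagnostic Plots for Regression Models.” http://analyticspro.org/2016/03/07/r-tutorial-how-to-use-diagnostic-plots-for-regression-models/
“Robust Regression: R Data Analysis Examples.” Institute for Digital Research and Education, University of California – Los Angeles. https://stats.idre.ucla.edu/r/dae/robust-regression/
“Understanding Q-Q Plots.” University of Virginia Library. http://data.library.virginia.edu/understanding-q-q-plots/
Berry, William D., Evan J. Ringquist, Richard C. Fording, and Russell I. Hanson. “Measuring Citizen and Government Ideology in the American States, 1960-93,” American Journal of Political Science 42:1 (January 1998): 327 – 348. https://www.jstor.org/stable/pdf/2991759.pdf?refreqid=excelsior:6047b68e3bbde6c3420943cee0689d9c
Correlates of State Policy. Michigan State University, Institute for Public Policy and Social Research (IPPSR). http://ippsr.msu.edu/public-policy/correlates-state-policy
Jordan, Marty P. and Matt Grossmann. 2016. The Correlates of State Policy Project v1.14. East Lansing, MI: Institute for Public Policy and Social Research (IPPSR). http://ippsr.msu.edu/sites/default/files/CorrelatesCodebook.pdf
Paxton, Pamela Marie, and Sheri Kunovich. “Women’s Political Representation: The Importance of Ideology,” Social Forces 82:1 (September 2003). http://muse.jhu.edu/article/47842/pdf
Wängnerud, Lena. “Women in Parliaments: Descriptive and Substantive Representation,” Annual Review of Political Science 12 (2009). https://www.annualreviews.org/doi/pdf/10.1146/annurev.polisci.11.053106.123839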
Linearity of Relationship
Residuals vs. Fitted Plot (Aggregate of Predictor Variables)
Multicollinearity
Variance Inflation Factors for Each Predictor Variable
citizenideology  incomepcap  hsdiploma  lagcitizenideology  lagpctfemaleleg
       1.065776    2.249091   1.486476            1.097590         1.726666
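To read these values, it helps to recall how a VIF relates to the overlap between predictors. Assuming the factors above follow the standard definition, the VIF for predictor j depends on R_j^2, the R-squared from regressing predictor j on all of the other predictors (a symbol introduced here). Inverting the formula for the largest VIF, incomepcap, gives:

```latex
\begin{align*}
\mathrm{VIF}_j &= \frac{1}{1 - R_j^2} \\
R_{\text{incomepcap}}^2 &= 1 - \frac{1}{2.249091} \approx 0.555
\end{align*}
```

In other words, roughly 56% of the variation in average Income-per-Capita is shared with the other predictors. The rule-of-thumb threshold of 10 corresponds to R_j^2 = 0.9.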
Influential Points
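For the influential-points check, the quantities behind a Residuals vs. Leverage plot can also be computed directly. The following is a minimal sketch on simulated data (not the CSPP data) in which one observation is deliberately given both an extreme predictor value and an outcome far from the trend; the cutoff of 1 mirrors the outer Cook's distance contour that base R draws by default.

```r
# Thirty well-behaved observations (illustrative data only).
set.seed(1)
x_typical <- rnorm(30)
y_typical <- 2 + 0.5 * x_typical + rnorm(30, sd = 0.3)

# Add one observation with an extreme x (high leverage) and a y value
# far from the trend (large residual).
x <- c(x_typical, 8)
y <- c(y_typical, -5)

fit <- lm(y ~ x)

# Cook's distance combines leverage and residual size for each observation.
cooks_d <- cooks.distance(fit)
leverage <- hatvalues(fit)

# Observations beyond the chosen cutoff are candidates for closer inspection.
flagged <- which(cooks_d > 1)
print(flagged)
```

In base R, `plot(fit, which = 5)` shows the same information graphically. Listing the flagged observations makes the check less dependent on reading the chart.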
APPENDIX 2: CODING
Call:
lm(formula = pctfemaleleg ~ citizenideology, data = cspdataset)
Residuals:
Min 1Q Median 3Q Max
-19.9834 -5.9796 -0.1783 5.7894 23.5963
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.37029 0.63464 16.34 <2e-16 ***
citizenideology 0.17627 0.01227 14.37 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
=======================================================
Dependent variable:
--------------------------------
Percentage of Female Legislators
-------------------------------------------------------
Citizen Ideology Score 0.176***
(0.012)
Constant 10.370***
(0.635)
-------------------------------------------------------
Observations 1,747
R2 0.106
Adjusted R2 0.105
Residual Std. Error 7.967 (df = 1745)
F Statistic 206.364*** (df = 1; 1745)
=======================================================
Note: *p<0.1; **p<0.05; ***p<0.01
> plot(citizenideology, pctfemaleleg, main = "Percentage of Female
Legislators by Citizen Ideology Score", xlab = "Citizen Ideology Score", ylab
= "Percentage of Female Legislators")
> abline(lm(pctfemaleleg ~ citizenideology))
>
> #Running a multivariate linear regression on the percentage of female legislators by citizen ideology,
> #controlling for average income per capita, percent of respondents with a high school diploma or higher,
> #and lagged IV and DV
> reg2 <- lm(pctfemaleleg ~ citizenideology + incomepcap + hsdiploma +
lagcitizenideology + lagpctfemaleleg, data = cspdataset)
> summary(reg2)
Call:
Residuals:
Min 1Q Median 3Q Max
-14.8428 -4.0661 -0.6256 3.7721 19.9827
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.300e+01 2.238e+00 -14.743 < 2e-16 ***
citizenideology 7.004e-02 1.102e-02 6.355 2.84e-10 ***
incomepcap 2.689e-04 2.772e-05 9.699 < 2e-16 ***
hsdiploma 5.029e-01 3.037e-02 16.556 < 2e-16 ***
lagcitizenideology 1.486e-02 1.139e-02 1.304 0.192297
lagpctfemaleleg 9.506e-02 2.584e-02 3.679 0.000244 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
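One possible reading of the negative intercept is that it describes a state-year in which every predictor equals 0, including income and the share of residents with a high school diploma, which is presumably far outside the observed data. The sketch below evaluates the fitted equation from the coefficient table above at hypothetical covariate values; these values are made up for illustration and are not taken from the CSPP data.

```r
# Coefficients copied from the multivariate regression output above.
b <- c(
  intercept = -3.300e+01,
  citizenideology = 7.004e-02,
  incomepcap = 2.689e-04,
  hsdiploma = 5.029e-01,
  lagcitizenideology = 1.486e-02,
  lagpctfemaleleg = 9.506e-02
)

# Hypothetical state-year (values are assumptions, not observed data).
hypothetical <- c(
  intercept = 1,
  citizenideology = 50,
  incomepcap = 30000,
  hsdiploma = 85,
  lagcitizenideology = 50,
  lagpctfemaleleg = 20
)

# Fitted value: sum of coefficient times covariate value.
fitted_value <- sum(b * hypothetical[names(b)])

# Fitted value when every predictor is 0 is just the intercept.
at_zero <- unname(b["intercept"])

print(c(hypothetical_state = fitted_value, all_zero = at_zero))
```

At covariate values of this kind the fitted percentage is positive, which suggests the negative constant reflects extrapolation to an all-zero case rather than a prediction for any realistic state.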
>
> #Creating confidence intervals for multivariate beta coefficients,
visualizing predicted values
> ci_citizenideology <- coef(reg2)[2] + c(-1, 1) * se.coef(reg2)[2] * 1.96
> ci_citizenideology_dataframe <- data.frame(est = coef(reg2)[2],
+ lb = ci_citizenideology[1],
+ ub = ci_citizenideology[2],
+ model = "Citizen Ideology Score")
> ci_incomepcap <- coef(reg2)[3] + c(-1, 1) * se.coef(reg2)[3] * 1.96
> ci_incomepcap_dataframe <- data.frame(est = coef(reg2)[3],
+ lb = ci_incomepcap[1],
+ ub = ci_incomepcap[2],
+ model = "Income-per-Capita")
> ci_hsdiploma <- coef(reg2)[4] + c(-1, 1) * se.coef(reg2)[4] * 1.96
> ci_hsdiploma_dataframe <- data.frame(est = coef(reg2)[4],
+ lb = ci_hsdiploma[1],
+ ub = ci_hsdiploma[2],
+ model = "High School Diploma or
Higher")
> ci_lagcitizenideology <- coef(reg2)[5] + c(-1, 1) * se.coef(reg2)[5] * 1.96
> ci_lagcitizenideology_dataframe <- data.frame(est = coef(reg2)[5],
+ lb = ci_lagcitizenideology[1],
+ ub = ci_lagcitizenideology[2],
+ model = "Lag Citizen Ideology")
> ci_lagpctfemaleleg <- coef(reg2)[6] + c(-1, 1) * se.coef(reg2)[6] * 1.96
> ci_lagpctfemaleleg_dataframe <- data.frame(est = coef(reg2)[6],
+ lb = ci_lagpctfemaleleg[1],
+ ub = ci_lagpctfemaleleg[2],
+ model = "Lag Percent Female
Legislators")
> est <- rbind(ci_citizenideology_dataframe, ci_incomepcap_dataframe,
ci_hsdiploma_dataframe, ci_lagcitizenideology_dataframe,
ci_lagpctfemaleleg_dataframe)
> ggplot(est, aes(x = model, y = est)) +
+ geom_point() +
+ geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
+ geom_hline(yintercept = 0, lty = 2, color = "red") +
+ labs(title = "Predicted Values for Multivariate Regression Beta
Coefficients (95% Confidence Interval)", x = "", y = "Predicted Values")
>
> #Creating a multivariate regression table
> stargazer(reg2, type="text", dep.var.labels=c("Percentage of Female
Legislators"), covariate.labels=c("Citizen Ideology Score", "Income per
Capita", "Percentage of Residents with a High School Diploma or Higher",
"Lagged Citizen Ideology Score", "Lagged Percentage of Female Legislators"),
out="models.txt")
>
> #Visualizing the distribution of each predictor variable in multivariate
regression
> ggplot(data = cspdataset, aes(x = incomepcap)) + geom_histogram(binwidth =
300) + labs(title = "Distribution of Average Income per Capita", x = "Average
Income per Capita", y = "Number of Observations")
Warning message:
Removed 1878 rows containing non-finite values (stat_bin).
> ggplot(data = cspdataset, aes(x = hsdiploma)) + geom_histogram(binwidth =
1) + labs(title = "Distribution of Percentage of Residents with a High School
Diploma or Higher", x = "Percentage of Residents with a High School Diploma
or Higher", y = "Number of Observations")
Warning message:
Removed 4406 rows containing non-finite values (stat_bin).
> ggplot(data = cspdataset, aes(x = lagcitizenideology)) +
geom_histogram(binwidth = 1) + labs(title = "Distribution of Lagged Citizen
Ideology", x = "Lagged Citizen Ideology", y = "Number of Observations")
Warning message:
Removed 3318 rows containing non-finite values (stat_bin).
> plot(reg2)
> gvlmareg2 <- gvlma(reg2)
> summary(gvlmareg2)
> #Not exactly sure what the results of this gvlma mean--it seems that most of the assumptions are NOT met,
> #which is the opposite of my results from the other tests. Since I don't want to get ahead of myself,
> #I'm going to stick with my crPlots, vif, and plot diagnostic tests. However, I do want to note that this
> #inconsistency is both interesting and confusing...
>
> #Running a multivariate linear regression on the percentage of female
legislators by citizen ideology,
> #controlling for average income per capita and percent of respondents with
a high school diploma or higher,
> #with interactions with female population
> pctfemaleleg_by_citizenideology_incomeppcap_hsdiploma_withinteraction <-
+ lm(pctfemaleleg ~ citizenideology + incomepcap + hsdiploma + popfemale +
incomepcap*popfemale + hsdiploma*popfemale, data = cspdataset)
>
summary(pctfemaleleg_by_citizenideology_incomeppcap_hsdiploma_withinteraction
)
Call:
lm(formula = pctfemaleleg ~ citizenideology + incomepcap + hsdiploma +
popfemale + incomepcap * popfemale + hsdiploma * popfemale,
data = cspdataset)
Residuals:
Min 1Q Median 3Q Max
-14.3383 -4.3692 0.1633 4.2516 17.8847
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.569e+01 4.958e+00 -5.182 3.01e-07 ***
citizenideology 1.477e-01 1.934e-02 7.637 8.98e-14 ***
incomepcap -1.580e-04 7.663e-05 -2.062 0.0396 *
hsdiploma 5.527e-01 7.109e-02 7.774 3.39e-14 ***
popfemale 1.074e-07 1.681e-06 0.064 0.9491
incomepcap:popfemale 5.552e-12 1.848e-11 0.300 0.7640
hsdiploma:popfemale -2.473e-09 2.507e-08 -0.099 0.9215
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1