The Power of Ideas: Does Ideology Affect the Percentage of Women in State Legislatures?

Yang 1
Jane Yang
Matthew Eckel
Analysis of Political Data
30 April 2018
In this paper, I theorize that people who hold more liberal ideologies tend to vote
more women into office. More specifically, I posit that U.S. states with a more liberal
constituency tend, on average, to have a higher percentage of women in their state legislatures.
This is because “[i]deas about women’s role and position in society can enhance or
constrain women’s ability to seek political power.”1 In other words, “despite the presence of
favorable political systems or an adequate supply of female candidates, […] ideologies and
arguments against women’s right[s] to participate in politics have created substantial barriers to
women’s political participation for many years.”2 Put simply, our ideologies dictate how we
perceive women’s rights, women’s interests, and women’s roles; these perceptions then play a
role in whether or not we vote for a woman. Of course, my theory applies only under certain
conditions. First, because both the theory and the data concern U.S. states, its conclusions extend only to U.S. state legislatures.
Second, my theory assumes that the data come from strong
democracies, where constituents vote in free and fair elections and have complete agency
over their voting decisions. In other words, voters are not coerced into supporting specific
candidates at the expense of their own ideological stances.
To test this theory, I draw from the “Correlates of State Policy Project” (CSPP) dataset,
compiled by the Institute for Public Policy and Social Research at Michigan State University,
and compare ADA/COPE measures of citizen ideology to the percentage of women in each
1
Pamela Marie Paxton and Sheri Kunovich. “Women’s Political Representation: The Importance of Ideology,” Social Forces 82:1 (September
2003): 90. http://muse.jhu.edu/article/47842/pdf
2
Paxton and Kunovich, “Women’s Political Representation,” 90-91.
state’s legislature.3 The CSPP includes “more than nine-hundred variables, with observations
across the U.S. 50 states and time (1900 – 2016). These variables represent policy outputs or
political, social, or economic factors that may influence policy differences across the states.”4 I
then proceed as follows. First, I consider other possible theories about women’s political
representation. Then, I describe in detail the independent variable, dependent variable, units of
observation, and possible control variables. After that, I run a preliminary bivariate linear
regression, as well as a more comprehensive multivariate linear regression, both with
visualizations and interpretations. Then, I assess the validity of the model itself. Finally, I
conclude with an overall analysis of my theory given my tests, a consideration of other possible
threats to my tests (namely, endogeneity and a lack of clarity in the ideology variable), as well
as ideas for future tests or areas of improvement.
Competing Theories
There are many other theories on what factors affect women’s representation in politics.
Broadly, these theories can be categorized as “supply” and “demand” factors; the idea is that “the
‘supply’ of female candidates and ‘demand’ for female candidates” affect the number of women
in legislatures.5 The “supply” factor posits that “Political elites are pulled disproportionately
from the highly educated and from certain professions, such as law,” thus, “if women do not
have access to educational and professional opportunities, they will not have the human financial
capital necessary to run for office.”6 The demand factor suggests that “institutional differences in
political systems may manifest a different ‘demand’ for women, irrespective of the available
3
Correlates of State Policy Web site. Michigan State University, Institute for Public Policy and Social Research (IPPSR).
http://ippsr.msu.edu/public-policy/correlates-state-policy. This paper will elaborate on the ADA/COPE measure of ideology and the
percentage of women in each state’s legislature in subsequent sections.
4
Correlates of State Policy Web site.
5
Paxton and Kunovich, “Women’s Political Representation,” 89.
6
supply,” such as political parties and electoral systems, which “can be crucial factors in allowing
women access in equal numbers.”7 My theory on ideology counts as a “demand” factor, as I posit
that people with more liberal ideologies have a higher “demand” for female representatives.
Other competing theories include those on government corruption and socioeconomic
factors. For instance, “Studies on corruption, such as those initiated by researchers at the World
Bank, find evidence of a relationship between the number of women in parliament and the level
of corruption;” however, the causality is unclear.8 That is, lower levels of corruption in national
parliaments are correlated with higher numbers of women, but the causal mechanism is
unknown.9 Meanwhile, socioeconomic factors such as “women’s share in professional
occupations,” “welfare state policies,” and “increases in government (non-military) expenditure”
all increase female political representation because these phenomena empower
women, such that “the political interests of working women are changed enough to create an
ideological gender gap.”10 My test takes these theories into consideration by controlling for
education and income levels, thus accounting for socioeconomic factors which may empower
women and increase the “supply” of eligible female candidates, as opposed to the “demand.”
Independent Variable: Citizen Ideology Measure (1960 – 2013)
This variable is based on Berry, Ringquist, Fording and Hanson’s aggregation of ADA
(Americans for Democratic Action) and COPE (Committee on Political Education) scores to
measure ideology, with 0 being the most conservative and 100 being the most liberal.11 As we
7
8 Lena Wängnerud, “Women in Parliaments: Descriptive and Substantive Representation,” Annual Review of Political Science 12 (2009): 58.
https://www.annualreviews.org/doi/pdf/10.1146/annurev.polisci.11.053106.123839
9
Wängnerud, “Women in Parliaments,” 58.
10
Wängnerud, “Women in Parliaments,” 56-58.
11
William D. Berry, Evan J. Ringquist, Richard C. Fording, and Russell I. Hanson, “Measuring Citizen and Government Ideology in the
American States, 1960-93,” American Journal of Political Science 42:1 (January 1998): 327 – 348.
https://www.jstor.org/stable/pdf/2991759.pdf?refreqid=excelsior:6047b68e3bbde6c3420943cee0689d9c
can see in the following histogram, the distribution of the scores is approximately normal, with the lowest
score being 0.963, the highest score being 95.972, and the mean score being 47.838.
It is important to note here that the conceptual definition of this variable is somewhat
misleading. First, the Citizen Ideology Score is computed based on the ideology of the
representative of each district, which is then used to compute an average for the state as a whole.
As such, the Citizen Ideology Score is technically based on an aggregation of the ideology of
the representatives, not of each individual citizen. Moreover, whereas social scientists generally
agree that ideological factors differ from political-institutional factors, the Citizen Ideology
Measure variable conflates ideology with political preferences.12 Both ADA and COPE measure
ideology based on how representatives vote on specific political issues. For instance, ADA uses
the Liberal Quotient, which “combin[es] 20 key votes on a wide range of social and economic
issues, both domestic and international…[to provide] a basic overall picture of an elected
official’s political position.”13 Meanwhile, COPE measures ideology based on voting records
12
Paxton and Kunovich, “Women’s Political Representation,” 89-91.
13
“ADA Voting Records.” Americans for Democratic Action. https://adaction.org/ada-voting-records/. Emphasis added.
tracked by the American Federation of Labor and Congress of Industrial Organizations
(AFL-CIO): the more a representative votes to strengthen “Social Security and Medicare, freedom to
join a union, workplace safety,” the more “liberal” they are, and the higher their score.14 These
phenomena may compromise the validity of my tests, which I will expand upon later.
Dependent Variable: Percentage of state legislators who are women, by state (1975 – 2016)
This variable measures the percentage of state
legislators who are women by state, in a given year, between 1975 and 2016. As
seen in the following histogram, the distribution of these percentages is also approximately normal, with
the lowest percentage at 0.70%, the highest percentage at 42%, and the mean at 19.49%.
Units of Observation: state-year
The CSPP has over 700 variables “with observations across the U.S. 50 states and time
(1900 – 2016).”15 In other words, I am working with time-series cross-sectional (panel) data, where the unit of
14
“Legislative Scorecard.” American Federation of Labor and Congress of Industrial Organizations (AFL-CIO). https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard
15
Jordan, Marty P. and Matt Grossmann. 2016. The Correlates of State Policy Project v1.14. East Lansing, MI: Institute for Public Policy and
Social Research (IPPSR). http://ippsr.msu.edu/sites/default/files/CorrelatesCodebook.pdf
observation is not the state alone but the state-year. This is important to note
because it raises the possibility of autocorrelated errors in my regression models.
Indeed, it would be naïve to assume that the citizen ideology score of a state in one year is
unrelated to the citizen ideology score of that same state in the following year. The same goes for
the percentage of female state legislators in a given year. Moreover, because the unit of observation
is state-year, not country-year, the results can only be applied in the context of U.S. state
legislatures, not to the U.S. Congress or to the legislatures of other countries.
Control Variables: Income per Capita, Education Level, Lagged IV, Lagged DV
As mentioned, while ideological factors constitute one possible explanation for
women’s representation in legislatures, other explanations point to socioeconomic
factors. In my multivariate regression test, I include variables for the average income per capita
(total personal income in that state divided by total midyear population), percentage of
respondents with a high school diploma or higher, the citizen ideology score for a state the year
before (in other words, a one-year lagged citizen ideology variable), and a one-year lagged
percentage of women in legislature variable. The first two variables are meant to account for
socioeconomic factors. Indeed, we can assume that the higher the average income per capita in a
state, and the larger the percentage of residents with a high school diploma or higher, the better
the socioeconomic circumstances. The idea here is that better socioeconomic circumstances lead
to more female empowerment, which may increase the “supply” of eligible female candidates.16
The lag variables are then meant to mitigate possible autocorrelated errors due to the time-series
nature of both the IV and the DV. The distributions of the control variables are shown in the
following histograms.
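The one-year lags described above can be sketched as follows. This is only a minimal illustration on a toy panel, not the project's actual code: the data frame `panel`, its values, and the helper `lag_within()` are hypothetical, and the sketch assumes one row per state-year with no gaps between years. Because base R's `stats::lag()` shifts the time index of a time-series object rather than the values of a plain vector, the shift is written out by hand and applied within each state.

```r
# Toy state-year panel (hypothetical values, for illustration only)
panel <- data.frame(
  state = rep(c("A", "B"), each = 3),
  year = rep(2001:2003, times = 2),
  pctfemaleleg = c(10, 12, 15, 20, 21, 25)
)

# Sort so that each state's years are consecutive rows
panel <- panel[order(panel$state, panel$year), ]

# Hypothetical helper: shift a vector down by one position, padding with NA
lag_within <- function(x) c(NA, head(x, -1))

# Apply the shift separately within each state
panel$lagpctfemaleleg <- ave(panel$pctfemaleleg, panel$state, FUN = lag_within)

print(panel)
```

Sorting by state and year before shifting, and shifting within each state, keeps the last year of one state from becoming the lagged value of the next state's first year.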
16
For more on this theory, see the section in this paper on “Competing Theories.”
Bivariate Hypothesis Test
H0: Citizen Ideology Score does not have an effect on the Percentage of Female Legislators
Ha: Citizen Ideology Score has an effect on the Percentage of Female Legislators. As ideology
scores increase, percentage of female legislators either increases or decreases.

As we can see from our regression table, the beta coefficient for Citizen Ideology Score is
0.176. This means that as Citizen Ideology Score increases by 1 point, the estimated Percentage of Female
Legislators increases by 0.176 percentage points. Meanwhile, because the beta coefficient of the constant is
10.370, the estimated Percentage of Female Legislators when the Citizen Ideology Score is 0 is
10.370. Finally, because the p-value is less than 0.01 for both Citizen
Ideology Score and the constant, both estimates are statistically
significant. As such, we can reject the null hypothesis in favor of the alternative hypothesis.
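To make the interpretation of these coefficients concrete, the fitted line can be evaluated at a few ideology scores. The sketch below simply plugs the rounded estimates from the regression table (10.370 and 0.176) and the reported mean ideology score (47.838) into the linear equation; the function name `predict_pct()` is hypothetical.

```r
# Rounded estimates taken from the bivariate regression table
b0 <- 10.370  # constant
b1 <- 0.176   # Citizen Ideology Score

# Hypothetical helper: predicted percentage of female legislators
predict_pct <- function(score) b0 + b1 * score

at_zero <- predict_pct(0)                         # the constant itself
at_mean <- predict_pct(47.838)                    # at the mean ideology score
ten_point_gap <- predict_pct(60) - predict_pct(50)  # effect of a 10-point increase

print(c(at_zero = at_zero, at_mean = at_mean, ten_point_gap = ten_point_gap))
```

Under these estimates, a 10-point increase in Citizen Ideology Score corresponds to an increase of about 1.76 percentage points in the predicted share of female legislators.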
It is also important to consider how much of the variation in the Percentage of Female Legislators is
explained by Citizen Ideology Score alone. In order to assess this, we look to the Adjusted
R-squared. At 0.105, the Adjusted R-squared tells us that 10.5% of the variation in the Percentage
of Female Legislators can be explained by the variation in the Citizen Ideology Score. This is a
fairly low percentage; thus, we can assume that Citizen Ideology Score has a positive effect on
the Percentage of Female Legislators, but also a weak one. The low Adjusted R-squared most
likely reflects omitted variables; that is, there are many other causal factors which affect the
Percentage of Female Legislators. In the following section, I conduct a multivariate hypothesis
test and include control variables in order to mitigate omitted variable bias.
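For reference, the Adjusted R-squared can be recovered from the Multiple R-squared with the standard adjustment formula. The sketch below uses the rounded figures printed in the regression output in Appendix 2 (Multiple R-squared of 0.1058 and 1745 residual degrees of freedom, which with one predictor implies 1747 observations); the function name `adj_r2()` is hypothetical, and because the inputs are rounded the result matches the reported value only to about three decimal places.

```r
# Hypothetical helper: standard adjustment of R-squared for k predictors and n observations
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Rounded inputs from the bivariate regression output
bivariate_adj_r2 <- adj_r2(r2 = 0.1058, n = 1747, k = 1)
print(round(bivariate_adj_r2, 3))
```

With a single predictor and well over a thousand observations, the adjustment is tiny, which is why the Multiple and Adjusted R-squared values are nearly identical here.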
Multivariate Hypothesis Test
H0: Controlling for average state Income-per-Capita, Percentage of Residents with a High School
Diploma or Higher, and Citizen Ideology Score and Percentage of Female Legislators the year
before, Citizen Ideology Score does not have an effect on the Percentage of Female Legislators.
Ha: Accounting for the control variables, Citizen Ideology Score has an effect on Percentage of
Female Legislators. As Citizen Ideology Score increases, Percentage of Female Legislators
either increases or decreases.

From our regression table, we can see that, when controlling for average state
Income-per-Capita, Percentage of Residents with a High School Diploma or Higher, and lagged variables
for both the IV and the DV, the beta coefficient for Citizen Ideology Score decreases from 0.176
to 0.070. Although the inclusion of control variables has reduced the estimated effect of Citizen Ideology
Score, its effect in estimating Percentage of Female
Legislators is still statistically significant at p < 0.01. Thus, we can reject the null hypothesis
again in favor of the alternative hypothesis.
At the same time, we consider the effects of the average state Income-per-Capita,
Percentage of Residents with a High School Diploma or Higher, and the lagged variables for
Citizen Ideology Score and Percentage of Female Legislators. Because the lagged variable for
Citizen Ideology Score is not statistically significant (p > 0.05), we cannot conclude that it has an effect
on the Percentage of Female Legislators.17 The beta coefficients for average state
Income-per-Capita, Percentage of Residents with a High School Diploma or Higher, and lagged Percentage
of Female Legislators respectively are 0.0003, 0.503, and 0.095, all at a p-value of less than 0.01.
Not only does each of these variables have a positive effect on the Percentage of Female
Legislators, but all three are statistically significant. That is, as each of these variables increases,
so too does the Percentage of Female Legislators. The strength of this model is also much higher,
as the Adjusted R-squared increased from 0.105 to 0.476. This Adjusted R-squared suggests that
47.6% of the variation in the Percentage of Female Legislators is explained jointly by
Citizen Ideology Score, average state Income-per-Capita, Percentage of Residents with a High School
Diploma or Higher, and lagged variables for the IV and DV. This is a moderately high Adjusted
R-squared, which means that my multivariate model fits the data moderately well overall.
What is rather confusing is the beta coefficient for the constant, -32.995. Not only is this
beta coefficient statistically significant at a p-value of less than 0.01, but it is also negative. As
17
We can also tell from our visualization of the predicted values that Lagged Citizen Ideology is not statistically significant because the 95%
confidence interval includes the value 0, which means that a possible beta coefficient for Lagged Citizen Ideology is 0, that is, no effect on the
Percentage of Female Legislators.
such, the interpretation would be: when Citizen Ideology Score, average state Income-per-
Capita, Percentage of Residents with a High School Diploma or Higher, and the lagged variables
for the IV and DV are all 0, the estimated Percentage of Female Legislators is -32.995%. This,
however, does not make logical sense, as the lowest percentage of female legislators possible is
0%. Unfortunately, due to a lack of statistical knowledge and resources, I cannot determine why
exactly this is the case. However, it is certainly important to note, and perhaps a point of
exploration for future studies.
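One possible reading of the negative constant, offered here only as a sketch, is that it is an extrapolation: the constant is the prediction at the point where every predictor equals 0, and if no observation lies anywhere near that point, the value there need not be plausible. The toy example below (hypothetical data and variable names, not the CSPP data) shows a model whose predictor ranges from 60 to 90. The raw constant is negative, while re-centering the predictor on its mean yields a constant equal to the average outcome and leaves the slope unchanged.

```r
# Toy data (hypothetical): the predictor never comes close to 0
x <- seq(60, 90, by = 2)
y <- -30 + 0.6 * x + rep(c(-1, 1), length.out = length(x))

# Raw model: the constant is the prediction at x = 0, far outside the data
fit_raw <- lm(y ~ x)

# Centered model: the constant is the prediction at the mean of x
x_centered <- x - mean(x)
fit_centered <- lm(y ~ x_centered)

print(coef(fit_raw))
print(coef(fit_centered))
```

Whether this explains the constant of -32.995 in the multivariate model is an assumption that would need to be checked against the observed ranges of the predictors; centering is shown only as one common way to make a constant interpretable.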
Diagnostics
When using linear regression models to analyze data, the model must meet several basic
assumptions in order to be considered a good “fit” for the data. However, these assumptions are
sometimes violated. Thus, before I conclude with my overall analysis, I check for some of these
assumptions—as well as other potential factors which may threaten the validity of my tests—by
looking at the following: linearity of relationship, normality of residuals, homoscedasticity vs.
heteroscedasticity of error variance, multicollinearity, and influential points.18
The first assumption of multivariate linear regression models is that there is a linear
relationship between our predictor variables and the dependent variable. To check for this, I
plotted Component-Residual plots for each individual predictor variable and an aggregate
Residuals vs. Fitted plot. In general, a major difference between the residual lines and the
component lines indicates that the predictor variables do not have a linear relationship with the
dependent variable. As we can see from both our aggregate Residuals vs. Fitted plot and the
individual Component-Residual plots, the component lines are quite closely matched to the
18
See Appendix 1 for the relevant tables and graphs for these tests.
residual lines, and the data are quite evenly distributed along the residual lines. Thus, we can
assume that a linear model fits our data well.
The second assumption is that the residuals are normally distributed. In order to check
this, I plotted a Normal Quantile-Quantile plot (Q-Q plot). This is essentially a scatterplot which
plots the quantiles of the standardized residuals against the theoretical quantiles of a normal distribution.19 If the
residuals are normal, then the points in the scatterplot should form a straight line.20 As we can
see from the Normal Q-Q plot, the points fall close to the straight reference line.
Thus, we can assume that our residuals are approximately normally distributed.
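The quantities behind a Normal Q-Q plot can also be computed directly, which gives a rough numeric companion to the visual check. The sketch below uses simulated toy data (hypothetical, not the CSPP data) and base R's `rstandard()` and `qqnorm()`; the correlation between theoretical and sample quantiles is only an informal summary, not a formal test of normality.

```r
set.seed(1)  # toy data only, for reproducibility

# Hypothetical data with normally distributed errors
x <- runif(200, 0, 100)
y <- 5 + 0.2 * x + rnorm(200, sd = 3)
fit <- lm(y ~ x)

# Standardized residuals and their theoretical normal quantiles
std_resid <- rstandard(fit)
qq <- qqnorm(std_resid, plot.it = FALSE)  # qq$x: theoretical, qq$y: sample quantiles

# A correlation near 1 means the points lie close to a straight line
qq_cor <- cor(qq$x, qq$y)
print(qq_cor)

# To draw the plot itself: qqnorm(std_resid); qqline(std_resid)
```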
A third assumption is homoscedasticity of residual error terms: we assume that the
variance of the residuals is the same across all values of the predictor variables.21 If the
residuals are not spread equally across the predictors, then the data are not homoscedastic.22 To
test this, I created a Scale-Location plot, which is a scatterplot of the fitted values against the
square root of the absolute standardized residuals. If there is a distinct pattern in the data, then it is not homoscedastic.23 As
a general rule, a roughly horizontal line with randomly distributed points is consistent with the assumption
of homoscedasticity.24 In the case of my Scale-Location plot, the line is somewhat straight, but
also somewhat parabolic. That said, the points seem to be randomly scattered, and the line is
more horizontal than it is parabolic; thus, we assume that the error terms are homoscedastic.
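Because reading a Scale-Location plot involves judgment, a numeric check can complement it. The sketch below is not part of the original analysis: it builds the Scale-Location quantities by hand and then runs a Breusch-Pagan-style check (the studentized version, computed as the number of observations times the R-squared from regressing the squared residuals on the predictor). The data are a hypothetical toy example constructed so that the error spread grows with the predictor.

```r
# Toy data (hypothetical): the error spread grows with x, so errors are heteroscedastic
x <- rep(1:10, each = 10)
direction <- rep(c(-1, 1), times = 50)
y <- 2 + 0.5 * x + 0.5 * x * direction
fit <- lm(y ~ x)

# Quantities plotted in a Scale-Location plot
scale_loc <- data.frame(
  fitted = fitted(fit),
  sqrt_abs_std_resid = sqrt(abs(rstandard(fit)))
)

# Breusch-Pagan-style check: do the squared residuals depend on x?
aux <- lm(resid(fit)^2 ~ x)
lm_stat <- length(x) * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

print(c(lm_stat = lm_stat, p_value = p_value))
```

A small p-value, as in this constructed example, points toward heteroscedasticity; a large one is consistent with equal error variance.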
The final assumption of multivariate linear regression is that there is no multicollinearity.
That is, the predictor variables in my model are not so highly correlated to each other that they
also predict each other to some degree. This is especially important to note because it is possible
19
“Understanding Q-Q Plots.” University of Virginia Library. http://data.library.virginia.edu/understanding-q-q-plots/
20
“Understanding Q-Q Plots.”
21
“Diagnostic Plots.” University of Virginia Library. http://data.library.virginia.edu/diagnostic-plots/
22
“Diagnostic Plots.”
23
24
that the average state Income-per-Capita is linearly related to the Percentage of Residents with a
High School Diploma or Higher; after all, the higher the income, the easier it is to pay for higher
education. Even more likely is a linear correlation between the lagged IV and the lagged DV. We
do not want these relationships to inflate the variance of the coefficient estimates. Thus, to check for
multicollinearity, I generated variance inflation factors (VIFs) for each variable, which estimate
how much the variance of each coefficient estimate is increased because of collinearity. As a general rule
of thumb, any VIF greater than 10 suggests that there may be multicollinearity. In my model, the
highest VIF is for average state Income-per-Capita at 2.249091. As such, it is safe for us to
assume that there is no problematic multicollinearity in our model.
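A VIF has a simple definition that can be computed by hand: regress one predictor on all the others and take 1 / (1 - R-squared) of that auxiliary regression. The sketch below illustrates this on a hypothetical toy data frame of mutually uncorrelated predictors (so each VIF is 1) and then inverts the formula for the largest VIF reported above. The function `vif_manual()` and the data are illustrative assumptions, not the routine used to produce the values in Appendix 1.

```r
# Hypothetical helper: VIF of one predictor from its auxiliary regression on the others
vif_manual <- function(data, target) {
  others <- setdiff(names(data), target)
  aux <- lm(reformulate(others, response = target), data = data)
  1 / (1 - summary(aux)$r.squared)
}

# Toy predictors (hypothetical) that are mutually uncorrelated
toy <- data.frame(
  x1 = rep(c(-1, 1, -1, 1), times = 2),
  x2 = rep(c(-1, -1, 1, 1), times = 2),
  x3 = c(-1, -1, -1, -1, 1, 1, 1, 1)
)
vif_x1 <- vif_manual(toy, "x1")

# Inverting the formula: share of variance in Income-per-Capita explained by the other predictors
implied_r2 <- 1 - 1 / 2.249091

print(c(vif_x1 = vif_x1, implied_r2 = implied_r2))
```

Read this way, a VIF of 2.249091 means that roughly 55.5% of the variation in Income-per-Capita is shared with the other predictors, well short of the 90% that a VIF of 10 would imply.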
Finally, it is important to check for influential points; that is, observations which are
outliers and have high leverage, and thus are influential enough to distort the coefficients in our
linear model. To check this, I plotted a Residuals vs. Leverage chart and checked for any points
that fall in the top right or bottom right corner beyond the Cook’s distance line; any points that
fall in those regions are said to have “high leverage or potential for influencing [the] model.”25 In
my Residuals vs. Leverage plot, none of the observations fall in the top right or bottom right
corner outside of the Cook’s distance line. Thus, we can assume that there are no observations
which disproportionately influence the regression results.
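The same check can be made numerically with base R's `cooks.distance()` and `hatvalues()`. The sketch below uses a hypothetical toy data set with one planted point that is both an outlier and far from the other predictor values; the cutoff of 1 mirrors the outer Cook's distance contour drawn on R's default Residuals vs. Leverage plot and is a convention rather than a strict rule.

```r
# Toy data (hypothetical): ten well-behaved points plus one planted influential point
x <- c(1:10, 30)
y <- c(1:10, 0)
fit <- lm(y ~ x)

cooks_d <- cooks.distance(fit)  # combines residual size and leverage
leverage <- hatvalues(fit)      # how unusual each observation's x value is

# Observations beyond the conventional Cook's distance cutoff of 1
flagged <- which(cooks_d > 1)

print(round(cooks_d, 3))
print(flagged)
```

In this constructed example only the planted eleventh observation is flagged, which is the numeric counterpart of a point falling outside the Cook's distance line on the plot.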
Overall, my model seems to pass the respective tests for checking linearity, normality of
residuals, homoscedasticity of error variance, and multicollinearity; thus, all of these
assumptions appear to hold for my model. Similarly, my regression results do not seem to be distorted by
any influential points. That said, it is important to remember that other than the multicollinearity
25
“R Tutorial: How to use Diagnostic Plots for Regression Models.” http://analyticspro.org/2016/03/07/r-tutorial-how-to-use-diagnostic-plots-for-regression-models/.
Cook’s distance (or Cook’s D) is a “measure that combines the information of leverage and residual of the
observation,” and can be used to determine influential points. See also: “Robust Regression: R Data Analysis Examples.” Institute for
Digital Research and Education, University of California – Los Angeles, https://stats.idre.ucla.edu/r/dae/robust-regression/
and influential points tests, all of the tests are based purely on visual interpretation and
pattern recognition. That is, interpreting these graphs can be quite subjective, as in the case of
our homoscedasticity test.26 Ultimately, however, the diagnostic tests suggest that a linear
regression model fits the data reasonably well.
Conclusion
What does all of this suggest? Is there a relationship between Citizen Ideology Score and
Percentage of Female Legislators? Does the average ideology of the people in a given state affect
the percentage of females in their state legislature? Even more broadly, does our ideology have
an effect on what genders we vote for? Are there any potential problems with the model and/or
potential threats to causal validity (outside of the diagnostics tests) that I could not eliminate?
Are there any areas that require further exploration? And finally, why does this matter?
According to my bivariate and multivariate linear regression models, Citizen Ideology
Score has a positive effect on the Percentage of Female Legislators to a statistically significant
degree (p < 0.01). Indeed, even when accounting for other statistically significant predictors,
such as per capita income, education level, and the Percentage of Female Legislators one year
prior, the estimated effect of Citizen Ideology Score on Percentage of Female Legislators is still
positive and statistically significant, albeit small. Thus, we can say that U.S. states with a
more liberal constituency on average tend to vote more women into their state legislatures, even
when accounting for other causal factors like socioeconomic status and the percentage of female
legislators in the previous year.
However, as mentioned briefly earlier in this paper, my analysis runs into two major
threats to causal validity: endogeneity and a lack of distinction between politics and ideology. Of
26
The influential points test is also based on visual interpretation; however, the Cook’s distance line makes it easier to have a more objective,
universal interpretation.
course, since less than 50% of the variation in Percentage of Female Legislators can be explained
by the variation in the predictor variables in my multivariate regression model, there is a
possibility that my model still suffers from omitted variable bias. Even more important is the
circular logic of my model. That is, I argue that more liberal ideologies lead to more positive
perceptions of women, which then affect the percentage of female legislators; yet at the same
time, my ideology variable is measured based on the ideology of the representatives, and
female legislators may well hold a higher opinion of women, which would in turn lead to a more
liberal ideology score for a state. Thus, the causal arrow may run in both directions. Of course, the only way to
really avoid this endogeneity problem would be to measure the ideology of each individual
citizen in a state and then average that out, instead of using the representatives as proxy
measures. This is a possible area of future study.
The other threat to the validity of my test is the fact that my ideology variable conflates
political preference with ideology. That is, the Citizen Ideology Score assumes that people who
are pro-Social Security, pro-Medicare, pro-gun control, pro-abortion (essentially anyone who
considers themselves a Liberal on the political spectrum) are thus liberal overall. In the context of
my theory, this would suggest that anyone who considers themselves a Conservative is thus less
inclined to vote women into office; and yet, there are plenty of women in politics who identify as
Conservatives, and many Conservatives who believe in female representation in politics. The
ultimate problem here is that there is no concrete and specific definition for ideology, so it is
incredibly difficult to measure. Thus, coming up with a more concrete and specific way to define
and operationalize ideology is another area of further exploration.
One final thing to consider would be differences across time periods. Although my model
includes lagged variables for the independent variable and the dependent variable one year prior,
it still does not account for the fact that these observations fluctuate over time, as opposed to
increasing or decreasing in a steady linear fashion over time. In other words, it does not account
for the possibility that some citizen ideology scores and some percentages of female legislators
might be higher in, say, 2006 than in 2010. Thus, we cannot say that: “as a U.S. state’s average
citizen ideology becomes more liberal, the percentage of women in its state legislature
increases.” However, it would certainly be interesting to consider how time might affect the
theory; that is, whether or not certain time periods experienced higher citizen ideology scores
and higher percentages of women in state legislatures, and why.
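One way a future test might account for differences across time periods is to add a separate indicator for each year (often called year fixed effects), so that the ideology coefficient is estimated from comparisons among states within the same year. This technique is not used in the models above; the sketch below only illustrates the idea on a hypothetical toy panel in which both ideology and the outcome drift upward over time, with made-up column names and values.

```r
# Toy panel (hypothetical): 3 states observed over 4 years
panel <- expand.grid(state = c("A", "B", "C"), year = 2001:2004)
panel$ideology <- c(40, 50, 60, 42, 52, 62, 44, 54, 64, 46, 56, 66)

# A common shock in each year, unrelated to any one state's ideology
year_shift <- c(`2001` = 0, `2002` = 2, `2003` = 4, `2004` = 6)
panel$pct_women <- unname(1 + 0.5 * panel$ideology + year_shift[as.character(panel$year)])

# Pooled model: the ideology slope also absorbs the upward time trend
pooled <- lm(pct_women ~ ideology, data = panel)

# Model with year indicators: the slope reflects within-year differences only
with_year <- lm(pct_women ~ ideology + factor(year), data = panel)

print(coef(pooled)[["ideology"]])
print(coef(with_year)[["ideology"]])
```

In this constructed example the within-year slope is 0.5 by design, while the pooled slope is larger because it mixes the ideology effect with the shared time trend.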
All things considered, however, and especially given the results of my regression models
and diagnostic tests, we can say that higher citizen ideology scores are associated with
higher percentages of female legislators, even when controlling for socioeconomic factors like
income and education level. In other words, U.S. states with a more liberal constituency are more
likely to vote more women into their state legislatures. This is important to note because it
suggests that our ideals, our values, the ways we think, the ways we were raised even, affect the
level of female representation in politics. If we wish to see a greater level of female
representation in politics in the future, then we should certainly consider ideological factors, not
just socioeconomic ones.27
27
All of the relevant coding for this project can be seen in Appendix 2.
BIBLIOGRAPHY
“ADA Voting Records.” Americans for Democratic Action. https://adaction.org/ada-voting-records/
“Diagnostic Plots.” University of Virginia Library. http://data.library.virginia.edu/diagnostic-plots/
“Legislative Scorecard.” American Federation of Labor and Congress of Industrial Organizations
(AFL-CIO). https://aflcio.org/what-unions-do/social-economic-justice/advocacy/scorecard
“R Tutorial: How to use Diagnostic Plots for Regression Models.”
http://analyticspro.org/2016/03/07/r-tutorial-how-to-use-diagnostic-plots-for-regression-models/
“Robust Regression: R Data Analysis Examples.” Institute for Digital Research and Education,
University of California – Los Angeles, https://stats.idre.ucla.edu/r/dae/robust-regression/
“Understanding Q-Q Plots.” University of Virginia Library.
http://data.library.virginia.edu/understanding-q-q-plots/
Berry, William D., Evan J. Ringquist, Richard C. Fording, and Russell I. Hanson. “Measuring
Citizen and Government Ideology in the American States, 1960-93,” American Journal
of Political Science 42:1 (January 1998): 327 – 348.
https://www.jstor.org/stable/pdf/2991759.pdf?refreqid=excelsior:6047b68e3bbde6c3420943cee0689d9c
Correlates of State Policy. Michigan State University, Institute for Public Policy and Social
Research (IPPSR). http://ippsr.msu.edu/public-policy/correlates-state-policy
Jordan, Marty P. and Matt Grossmann. 2016. The Correlates of State Policy Project v1.14. East
Lansing, MI: Institute for Public Policy and Social Research (IPPSR).
http://ippsr.msu.edu/sites/default/files/CorrelatesCodebook.pdf
Paxton, Pamela Marie, and Sheri Kunovich. “Women’s Political Representation: The Importance
of Ideology,” Social Forces 82:1 (September 2003): 87 – 113.
http://muse.jhu.edu/article/47842/pdf
Wängnerud, Lena. “Women in Parliaments: Descriptive and Substantive Representation,”
Annual Review of Political Science 12 (2009): 51 – 69.
https://www.annualreviews.org/doi/pdf/10.1146/annurev.polisci.11.053106.123839
APPENDIX 1: DIAGNOSTIC TESTS
Linearity of Relationship
Residuals vs. Fitted Plot (Aggregate of Predictor Variables)
Component-Residual Plots (Linearity of Individual Predictor Variables)

Normality of Residuals (Normal Quantile-Quantile Plot)
Homoscedasticity vs. Heteroscedasticity

Multicollinearity
Variance Inflation Factors for Each Predictor Variable
citizenideology  incomepcap  hsdiploma  lagcitizenideology  lagpctfemaleleg
       1.065776    2.249091   1.486476            1.097590         1.726666
Influential Points
APPENDIX 2: CODING
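Coding (with Outputs)

> #Loading the packages used below (ggplot2 for ggplot(), stargazer for stargazer())
> library(ggplot2)
> library(stargazer)
> #Renaming and calling the dataset
> cspdataset <- read.csv("C:/Users/janieyangbang/Desktop/College/Senior Year/~Senior Spring~ LIT AF/GOVT 201 Analysis of Political Data I/correlatesofstatepolicyprojectv1_14.csv")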
>
> #Renaming and summarizing DV (percentage of female legislators)
> pctfemaleleg <- cspdataset$pctfemaleleg
> lagpctfemaleleg <- lag(pctfemaleleg, k = 1)
> summary(pctfemaleleg)
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    0.70   13.20   19.00   19.49   25.60   42.00    4121
> var(pctfemaleleg, na.rm = TRUE)
[1] 71.22083
> sd(pctfemaleleg, na.rm = TRUE)
[1] 8.439243
> IQR(pctfemaleleg, na.rm = TRUE)
[1] 12.4
> ggplot(data = cspdataset, aes(x = pctfemaleleg)) + geom_histogram(binwidth
= 3) + labs(title = "Distribution of Percentage of Female State Legislators",
x = "Percentage", y = "Number of Observations")
Warning message:
Removed 4121 rows containing non-finite values (stat_bin).
>
> #Renaming and summarizing IV (measure of citizen ideology)
> citizenideology <- cspdataset$citi6013
> lagcitizenideology <- lag(citizenideology, k = 1)
> summary(citizenideology)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
  0.963  37.026  47.891  47.838  58.682  95.972    3318
> var(citizenideology, na.rm = TRUE)
[1] 274.4515
> sd(citizenideology, na.rm = TRUE)
[1] 16.56658
> IQR(citizenideology, na.rm = TRUE)
[1] 21.65585
> ggplot(data = cspdataset, aes(x = citizenideology)) +
geom_histogram(binwidth = 3) + labs(title = "Distribution of Citizen Ideology
Scores", x = "Score", y = "Number of Observations")
Warning message:
>
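A note on the lagged variables: whether lag(x, k = 1) actually shifts values depends on which lag() is on the search path. Base R's stats::lag() only changes the time attributes of a plain vector and leaves the values in place. If the intent is "the previous year's value within the same state," one possible base R approach is sketched below. The toy panel and its column names state and year are assumptions for illustration, and lag_one is a hypothetical helper.

```r
# stats::lag() on a plain vector keeps the values where they are
unshifted <- as.numeric(stats::lag(c(10, 12, 15), k = 1))
print(unshifted)

# Toy state-year panel (assumed layout, not the CSPP data)
panel <- data.frame(
  state        = c("A", "A", "A", "B", "B", "B"),
  year         = c(2001, 2002, 2003, 2001, 2002, 2003),
  pctfemaleleg = c(10, 12, 15, 20, 21, 25)
)

# Sort so that each state's years are consecutive and ascending
panel <- panel[order(panel$state, panel$year), ]

# Hypothetical helper: shift a vector down by one position
lag_one <- function(v) c(NA, head(v, -1))

# Apply the shift separately within each state
panel$lagpctfemaleleg <- ave(panel$pctfemaleleg, panel$state, FUN = lag_one)
print(panel)
```

Shifting within state keeps the first year of one state from inheriting the last year of the state listed before it.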
> #Running a bivariate linear regression on the percentage of female
> #legislators by citizen ideology
> reg1 <- lm(pctfemaleleg ~ citizenideology, data = cspdataset)
> summary(reg1)
Call:
lm(formula = pctfemaleleg ~ citizenideology, data = cspdataset)

Residuals:
     Min       1Q   Median       3Q      Max
-19.9834  -5.9796  -0.1783   5.7894  23.5963

Coefficients:
                Estimate Std. Error t value Pr(>|t|)
(Intercept)     10.37029    0.63464   16.34   <2e-16 ***
citizenideology  0.17627    0.01227   14.37   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.967 on 1745 degrees of freedom
(4271 observations deleted due to missingness)
Multiple R-squared: 0.1058, Adjusted R-squared: 0.1052
F-statistic: 206.4 on 1 and 1745 DF, p-value: < 2.2e-16
> stargazer(reg1, type="text", dep.var.labels=c("Percentage of Female
Legislators"), covariate.labels=c("Citizen Ideology Score"),
out="models.txt")
=======================================================
                            Dependent variable:
                      --------------------------------
                      Percentage of Female Legislators
-------------------------------------------------------
Citizen Ideology Score            0.176***
                                  (0.012)
Constant                         10.370***
                                  (0.635)
-------------------------------------------------------
Observations                       1,747
R2                                 0.106
Adjusted R2                        0.105
Residual Std. Error          7.967 (df = 1745)
F Statistic              206.364*** (df = 1; 1745)
=======================================================
Note:                    *p<0.1; **p<0.05; ***p<0.01
> plot(citizenideology, pctfemaleleg, main = "Percentage of Female
Legislators by Citizen Ideology Score", xlab = "Citizen Ideology Score", ylab
= "Percentage of Female Legislators")
> abline(lm(pctfemaleleg ~ citizenideology))
>
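To make the size of the bivariate slope concrete, the short sketch below plugs the coefficients reported in summary(reg1) into the fitted line. The ideology scores of 25, 50, and 75 are illustrative choices, and predict_pct is a hypothetical helper, not part of the original analysis.

```r
# Coefficients as reported in summary(reg1) above
intercept <- 10.37029
slope     <- 0.17627

# Hypothetical helper: fitted percentage of female legislators
predict_pct <- function(ideology) intercept + slope * ideology

scores     <- c(25, 50, 75)   # illustrative ideology scores
fitted_pct <- predict_pct(scores)
print(data.frame(score = scores, fitted_pct = round(fitted_pct, 2)))

# Change in the fitted value for a 10-point rise in the ideology score
ten_point_change <- slope * 10
print(ten_point_change)
```

Under this model, a 10-point rise in the citizen ideology score is associated with a rise of about 1.76 percentage points in the fitted share of female legislators.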
> #Running a multivariate linear regression on the percentage of female
> #legislators by citizen ideology,
> #controlling for average income per capita, percent of respondents with a
> #high school diploma or higher,
> #and lagged IV and DV
> reg2 <- lm(pctfemaleleg ~ citizenideology + incomepcap + hsdiploma +
lagcitizenideology + lagpctfemaleleg, data = cspdataset)
> summary(reg2)
Call:
lm(formula = pctfemaleleg ~ citizenideology + incomepcap + hsdiploma +
    lagcitizenideology + lagpctfemaleleg, data = cspdataset)

Residuals:
     Min       1Q   Median       3Q      Max
-14.8428  -4.0661  -0.6256   3.7721  19.9827

Coefficients:
                     Estimate Std. Error t value Pr(>|t|)
(Intercept)        -3.300e+01  2.238e+00 -14.743  < 2e-16 ***
citizenideology     7.004e-02  1.102e-02   6.355 2.84e-10 ***
incomepcap          2.689e-04  2.772e-05   9.699  < 2e-16 ***
hsdiploma           5.029e-01  3.037e-02  16.556  < 2e-16 ***
lagcitizenideology  1.486e-02  1.139e-02   1.304 0.192297
lagpctfemaleleg     9.506e-02  2.584e-02   3.679 0.000244 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.064 on 1358 degrees of freedom

>
> #Creating confidence intervals for multivariate beta coefficients,
> #visualizing predicted values
> ci_citizenideology <- coef(reg2)[2] + c(-1, 1) * se.coef(reg2)[2] * 1.96
> ci_citizenideology_dataframe <- data.frame(est = coef(reg2)[2],
+ lb = ci_citizenideology[1],
+ ub = ci_citizenideology[2],
+ model = "Citizen Ideology Score")
> ci_incomepcap <- coef(reg2)[3] + c(-1, 1) * se.coef(reg2)[3] * 1.96
> ci_incomepcap_dataframe <- data.frame(est = coef(reg2)[3],
+ lb = ci_incomepcap[1],
+ ub = ci_incomepcap[2],
+ model = "Income-per-Capita")
> ci_hsdiploma <- coef(reg2)[4] + c(-1, 1) * se.coef(reg2)[4] * 1.96
> ci_hsdiploma_dataframe <- data.frame(est = coef(reg2)[4],
+ lb = ci_hsdiploma[1],
+ ub = ci_hsdiploma[2],
+ model = "High School Diploma or Higher")
> ci_lagcitizenideology <- coef(reg2)[5] + c(-1, 1) * se.coef(reg2)[5] * 1.96
> ci_lagcitizenideology_dataframe <- data.frame(est = coef(reg2)[5],
+ lb = ci_lagcitizenideology[1],
+ ub = ci_lagcitizenideology[2],
+ model = "Lag Citizen Ideology")
> ci_lagpctfemaleleg <- coef(reg2)[6] + c(-1, 1) * se.coef(reg2)[6] * 1.96
> ci_lagpctfemaleleg_dataframe <- data.frame(est = coef(reg2)[6],
+ lb = ci_lagpctfemaleleg[1],
+ ub = ci_lagpctfemaleleg[2],
+ model = "Lag Percent Female Legislators")
> est <- rbind(ci_citizenideology_dataframe, ci_incomepcap_dataframe,
ci_hsdiploma_dataframe, ci_lagcitizenideology_dataframe,
ci_lagpctfemaleleg_dataframe)
> ggplot(est, aes(x = model, y = est)) +
+ geom_point() +
+ geom_errorbar(aes(ymin = lb, ymax = ub), width = 0.1) +
+ geom_hline(yintercept = 0, lty = 2, color = "red") +
+ labs(title = "Predicted Values for Multivariate Regression Beta
Coefficients (95% Confidence Interval)", x = "", y = "Predicted Values")
>
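The five intervals above can also be computed in one vectorized step. The sketch below uses the estimates and standard errors as reported in summary(reg2), so it runs without the dataset. The object names est, se, and ci are illustrative, and the normal multiplier mirrors the 1.96 used in the transcript.

```r
# Estimates and standard errors as reported in summary(reg2) above
est <- c(citizenideology    = 7.004e-02,
         incomepcap         = 2.689e-04,
         hsdiploma          = 5.029e-01,
         lagcitizenideology = 1.486e-02,
         lagpctfemaleleg    = 9.506e-02)
se  <- c(1.102e-02, 2.772e-05, 3.037e-02, 1.139e-02, 2.584e-02)

z  <- qnorm(0.975)   # about 1.96, the multiplier used above
ci <- data.frame(est = est,
                 lb  = est - z * se,
                 ub  = est + z * se,
                 row.names = names(est))

# Flag intervals that do not contain zero
ci$excludes_zero <- ci$lb > 0 | ci$ub < 0
print(ci)
```

With the fitted model in memory, base R's confint(reg2) returns comparable intervals; it uses t quantiles rather than 1.96, which makes little difference at this sample size.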
> #Creating a multivariate regression table
> stargazer(reg2, type="text", dep.var.labels=c("Percentage of Female
Legislators"), covariate.labels=c("Citizen Ideology Score", "Income per
Capita", "Percentage of Residents with a High School Diploma or Higher",
"Lagged Citizen Ideology Score", "Lagged Percentage of Female Legislators"),
out="models.txt")
>
> #Visualizing the distribution of each predictor variable in multivariate
> #regression
> ggplot(data = cspdataset, aes(x = incomepcap)) + geom_histogram(binwidth =
300) + labs(title = "Distribution of Average Income per Capita", x = "Average
Income per Capita", y = "Number of Observations")
Warning message:
> ggplot(data = cspdataset, aes(x = hsdiploma)) + geom_histogram(binwidth =
1) + labs(title = "Distribution of Percentage of Residents with a High School
Diploma or Higher", x = "Percentage of Residents with a High School Diploma
or Higher", y = "Number of Observations")
Warning message:
> ggplot(data = cspdataset, aes(x = lagcitizenideology)) +
geom_histogram(binwidth = 1) + labs(title = "Distribution of Lagged Citizen
Ideology", x = "Lagged Citizen Ideology", y = "Number of Observations")
Warning message:
> ggplot(data = cspdataset, aes(x = lagpctfemaleleg)) +
geom_histogram(binwidth = 1) + labs(title = "Distribution of Lagged
Percentage of Female Legislators", x = "Lagged Percentage of Female
Legislators", y = "Number of Observations")
Warning message:
>
> #Diagnostics tests: Component-Residual, Multicollinearity, Linearity,
> #Normality, Homoscedasticity, Influential Points
> crPlots(reg2)
> vif(reg2)
> plot(reg2)
> gvlmareg2 <- gvlma(reg2)
> summary(gvlmareg2)
> #Not exactly sure what the results of this gvlma mean--it seems that most
> #of the assumptions are NOT met, which is the opposite of my results from
> #the other tests. Since I don't want to get ahead of myself, I'm going to
> #stick with my crPlots, vif, and plot diagnostic tests. However, I do want
> #to note that this inconsistency is both interesting and confusing...
>
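In an interactive session, plot() on an lm object pauses between its plots with a "Hit <Return> to see next plot:" prompt, and lines pasted at that prompt can be consumed instead of run. One way around this is to draw all four default plots on a single page. The sketch below shows the idea on a stand-in model fitted to the built-in mtcars data (not reg2), and adds a Shapiro-Wilk test as a numeric complement to the Q-Q plot.

```r
# Stand-in model; substitute the fitted model of interest
fit <- lm(mpg ~ wt + hp, data = mtcars)

pdf(file = NULL)                  # null device; omit to draw on screen
old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid, so all four plots fit on one page
plot(fit)                         # residuals vs fitted, Q-Q, scale-location, leverage
par(old_par)                      # restore the previous layout
dev.off()

# Numeric complement to the Q-Q plot: Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(fit))
print(sw$p.value)
```

A small p-value from the Shapiro-Wilk test would suggest non-normal residuals. In large samples the test can reject for minor departures, so it is best read alongside the plot rather than in place of it.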
> #Running a multivariate linear regression on the percentage of female
> #legislators by citizen ideology,
> #controlling for average income per capita and percent of respondents with
> #a high school diploma or higher,
> #with interactions with female population
> pctfemaleleg_by_citizenideology_incomeppcap_hsdiploma_withinteraction <-
+ lm(pctfemaleleg ~ citizenideology + incomepcap + hsdiploma + popfemale +
+ incomepcap*popfemale + hsdiploma*popfemale, data = cspdataset)
> summary(pctfemaleleg_by_citizenideology_incomeppcap_hsdiploma_withinteraction)
Call:
lm(formula = pctfemaleleg ~ citizenideology + incomepcap + hsdiploma +
    popfemale + incomepcap * popfemale + hsdiploma * popfemale,
    data = cspdataset)

Residuals:
     Min       1Q   Median       3Q      Max
-14.3383  -4.3692   0.1633   4.2516  17.8847

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
(Intercept)          -2.569e+01  4.958e+00  -5.182 3.01e-07 ***
citizenideology       1.477e-01  1.934e-02   7.637 8.98e-14 ***
incomepcap           -1.580e-04  7.663e-05  -2.062   0.0396 *
hsdiploma             5.527e-01  7.109e-02   7.774 3.39e-14 ***
popfemale             1.074e-07  1.681e-06   0.064   0.9491
incomepcap:popfemale  5.552e-12  1.848e-11   0.300   0.7640
hsdiploma:popfemale  -2.473e-09  2.507e-08  -0.099   0.9215
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.211 on 591 degrees of freedom

> #Interactions are NOT statistically significant! Don't need to include in
> #paper
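The check above reads each interaction term's own t-test. A complementary check is a joint F-test comparing the model with and without the interaction terms. The sketch below illustrates the technique on the built-in mtcars data as a stand-in (the CSPP variables are not used); the model names reduced and full are illustrative.

```r
# Stand-in data: wt, hp, and qsec play the roles of the predictors
reduced <- lm(mpg ~ wt + hp + qsec, data = mtcars)
full    <- lm(mpg ~ wt + hp + qsec + wt:qsec + hp:qsec, data = mtcars)

# Nested-model comparison; H0: both interaction coefficients are zero
comparison <- anova(reduced, full)
print(comparison)

# p-value for the joint test (second row of the comparison table)
p_joint <- comparison[["Pr(>F)"]][2]
print(p_joint)
```

In an R formula, a*b expands to a + b + a:b, so the full model here is written with explicit a:b terms only to make the added terms easy to see. A large joint p-value would support leaving the interactions out of the final model.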

