
ECON 601 Module 7 Problem Set

Fall 2019

Your solutions should be typed and well organized. You need to explain / show all of the steps you used to
arrive at your answer. Submit your work through Blackboard as a Word or PDF file.

1. Indicate whether the following statements are true or false, along with a brief explanation.

a. A dummy variable refers to an explanatory variable that is not of economic importance
despite being statistically significant.

b. The statistical significance of an indicator variable in a regression can be affected based
on how the outcomes are coded (e.g., “yes” coded as 1 and “no” coded as 0, versus
“yes” coded as 0 and “no” coded as 1).

c. A dummy variable trap occurs when you include a nonsensical variable in your
regression and it appears to be statistically significant.

d. Multicollinearity can be a problem when using interaction variables in a regression.

e. Seasonality is an important issue when working with yearly time series data.

Solution:

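(a) False: the term “dummy variable” is synonymous with “indicator variable”. It refers to a binary
(0/1) variable.
(b) False: the coding of a variable is an arbitrary choice made by the researcher. Reversing the
coding flips the sign of the dummy’s coefficient and shifts the intercept, but the standard error,
the absolute value of the t statistic, and the p-value are unchanged.
(c) False: a dummy variable trap occurs when a researcher includes m dummies together with an
intercept in a regression when there are m groups being studied, which creates perfect
multicollinearity.
(d) True: an interaction term is often highly correlated with the variables used to construct it, so
multicollinearity can be a problem when specifying interaction terms in a regression.
(e) False: by definition, seasonality refers to regular movements in a variable that repeat every
year (e.g., monthly or quarterly patterns). Annual data cannot show such within-year patterns, so
seasonality is not an issue.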
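The answers to (b) and (c) can be checked directly in Stata. The sketch below is only an illustration: it uses Stata’s built-in 1978 automobile data (sysuse auto) rather than a course data file, and the variable domestic is a hypothetical name created here by reversing the coding of foreign.

```stata
* Illustration only, using Stata's built-in auto data
sysuse auto, clear

* Original coding: foreign = 1 for foreign cars, 0 for domestic cars
regress mpg weight foreign

* Reversed coding (domestic is a hypothetical variable made for this example)
generate domestic = 1 - foreign
regress mpg weight domestic
* The dummy's coefficient changes sign and the constant shifts, but the
* standard error, |t|, and p-value match the first regression.

* Dummy variable trap: two dummies for two groups plus an intercept
regress mpg weight foreign domestic
* Stata omits one of the two dummies because of perfect collinearity.
```

Comparing the first two outputs side by side shows that only the sign of the dummy’s coefficient and the constant differ.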
2. Use the data file BOOKCOST posted on Blackboard under Assignments. A major publishing
company would like to develop an equation that will help it in determining the cost of books
that it publishes. It has a sample of 200 books that have been published recently. Of the 200
books in the sample, 80 are hardcover and 120 are softcover. Hardcover books are priced at a
premium, so some adjustment for this will need to be made. The variables in the data are as
follows:
cost         the cost of producing the book
pages        the number of pages in the book
softcover    a dummy variable coded as: 0 = hardcover; 1 = softcover
a. Regress cost on pages and softcover. Show the regression output.
i. How much does an additional page cost the publisher? Does this cost depend on
whether softcover or hardcover is being produced? If so, how?
ii. How much more/less, on average, does it cost to publish a softcover than a
hardcover?
b. Regress cost on pages, softcover, and an interaction term (pages × softcover). Show the
regression output.
i. How much does an additional page cost the publisher? Does this cost depend on
whether softcover or hardcover is being produced? If so, how?

Solution:

(a) The regression command is: regress cost pages softcover.


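      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(2, 197)     =  178.05
       Model |  6350.09001     2    3175.045           Prob > F      =  0.0000
    Residual |  3512.93136   197  17.8321389           R-squared     =  0.6438
-------------+------------------------------           Adj R-squared =  0.6402
       Total |  9863.02137   199  49.5629214           Root MSE      =  4.2228

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pages |   .0142253   .0019076     7.46   0.000     .0104634    .0179872
   softcover |  -10.60341   .6095313   -17.40   0.000    -11.80545   -9.401362
       _cons |   19.80456   .6944762    28.52   0.000       18.435    21.17412
------------------------------------------------------------------------------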

The estimated regression equation is:

$\widehat{cost}_i = 19.80 + .014\,pages_i - 10.60\,softcover_i$

Moreover, we could write:
If softcover = 1, then: $\widehat{cost}_i = (19.80 - 10.60) + .014\,pages_i = 9.20 + .014\,pages_i$
If softcover = 0, then: $\widehat{cost}_i = 19.80 + .014\,pages_i$

(i) Each additional page costs the publisher about 1.4 cents. The cost of each additional page
is the same for hardcover books as it is for softcover.
(ii) A softcover book costs $10.60 less, on average, than a hardcover book.
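As a rough illustration of what the two parallel lines imply, the sketch below asks Stata for the predicted cost of a book at a single page count for each cover type. The value of 300 pages is an arbitrary choice made for this example, and the code assumes the BOOKCOST data are already loaded.

```stata
* Assumes the BOOKCOST data are already loaded in memory
regress cost pages softcover

* Predicted cost of a 300-page book (300 is an arbitrary illustrative value)
* for hardcover (softcover = 0) and softcover (softcover = 1)
margins, at(pages=300 softcover=(0 1))

* The softcover prediction computed by hand from the stored coefficients
display _b[_cons] + _b[pages]*300 + _b[softcover]
```

With the coefficients reported above, the two predictions work out to roughly 24.07 (hardcover) and 13.47 (softcover). Their difference equals the softcover coefficient at any page count, because this model forces the per-page cost to be the same for both cover types.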
(b) First, the interaction term needs to be created in Stata (generate abc = pages*softcover).
Then the regression command is: regress cost pages softcover abc.
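      Source |       SS       df       MS              Number of obs =     200
-------------+------------------------------           F(3, 196)     =  126.01
       Model |  6495.29131     3   2165.0971           Prob > F      =  0.0000
    Residual |  3367.73006   196  17.1822962           R-squared     =  0.6585
-------------+------------------------------           Adj R-squared =  0.6533
       Total |  9863.02137   199  49.5629214           Root MSE      =  4.1452

------------------------------------------------------------------------------
        cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       pages |   .0191729   .0025304     7.58   0.000     .0141826    .0241632
   softcover |  -7.275719   1.291652    -5.63   0.000    -9.823039     -4.7284
         abc |  -.0109364   .0037621    -2.91   0.004    -.0183558    -.003517
       _cons |    18.3063   .8546083    21.42   0.000      16.6209    19.99171
------------------------------------------------------------------------------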

The estimated regression equation is:

$\widehat{cost}_i = 18.31 + .019\,pages_i - 7.28\,softcover_i - .011\,(pages_i \times softcover_i)$

Moreover, we could write:

If softcover = 1, then: $\widehat{cost}_i = (18.31 - 7.28) + (.019 - .011)\,pages_i = 11.03 + .008\,pages_i$

If softcover = 0, then: $\widehat{cost}_i = 18.31 + .019\,pages_i$

(i) The cost depends on which type of cover is being produced. For a softcover, each additional page
costs the publisher about 0.8 cents. For a hardcover, each additional page costs the publisher about
1.9 cents.
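An alternative to creating the interaction variable by hand is Stata’s factor-variable notation. The sketch below is one possible way to get the two per-page costs directly; it assumes the BOOKCOST data are loaded and should reproduce the same fit as the regression with the hand-made interaction term.

```stata
* Assumes the BOOKCOST data are already loaded in memory
* c.pages##i.softcover expands to pages, softcover, and their interaction
regress cost c.pages##i.softcover

* Slope on pages for each cover type (softcover = 0 and softcover = 1)
margins softcover, dydx(pages)

* Softcover slope as a linear combination of the estimated coefficients
lincom pages + 1.softcover#c.pages
```

The margins and lincom commands also report standard errors and confidence intervals for each slope, which the hand calculation does not provide.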
3. Use the data file BEERPROD posted on Blackboard under Assignments. This file contains monthly
U.S. beer production in millions of barrels for January 1983 through December 1991. The
objective in this problem is to develop an extrapolative model for beer production.
a. Construct a model which uses a linear trend, monthly dummies, and a lagged dependent
variable (see footnote 1). Show Stata’s regression output in your solutions.
b. Discuss the results of this model. Be sure to explain if and how beer production is
trending. Also, explain which months are associated with the highest and lowest beer
production, and by how much.
c. What does your model predict beer production will be for January 1992? Remember to
show me your work.
Solution:

(a) The regression output from regressing beer production on a linear time trend, monthly
dummies, and a lagged dependent variable is shown below:
      Source |       SS       df       MS              Number of obs =     107
-------------+------------------------------           F(13, 93)     =   94.02
       Model |  333.215024    13  25.6319249           Prob > F      =  0.0000
    Residual |  25.3532024    93   .27261508           R-squared     =  0.9293
-------------+------------------------------           Adj R-squared =  0.9194
       Total |  358.568226   106  3.38271912           Root MSE      =  .52213

------------------------------------------------------------------------------
    beerprod |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        time |   .0077967   .0019484     4.00   0.000     .0039277    .0116658
             |
       month |
          2  |  -.9342516   .3416837    -2.73   0.007    -1.612768   -.2557355
          3  |    1.08746   .3161917     3.44   0.001      .459566    1.715354
          4  |   .7974535    .460326     1.73   0.087    -.1166628     1.71157
          5  |   1.866525   .4717713     3.96   0.000     .9296807    2.803369
          6  |   1.847002   .5685113     3.25   0.002      .718051    2.975952
          7  |   1.698118   .5886596     2.88   0.005     .5291566    2.867079
          8  |   1.143453   .5795242     1.97   0.051    -.0073668    2.294274
          9  |  -1.236346   .5277144    -2.34   0.021    -2.284283   -.1884103
         10  |  -.5326708    .328952    -1.62   0.109    -1.185904    .1205626
         11  |  -2.287266   .3389528    -6.75   0.000     -2.96036   -1.614173
         12  |  -2.440852   .2588506    -9.43   0.000    -2.954878   -1.926826
             |
    beerprod |
         L1. |   .2199934   .1010703     2.18   0.032     .0192879     .420699
             |
       _cons |   10.15921   1.156014     8.79   0.000     7.863598    12.45483
------------------------------------------------------------------------------

(b) The model “explains” almost 93% of the variation in beer production. All of the explanatory
variables are statistically significant at the 5% level except for the dummies for April (p = 0.087),
August (p = 0.051), and October (p = 0.109). Beer production is trending upward: the linear trend
is statistically significant, but its coefficient is quite small (about 0.008 million barrels per month,
or roughly 0.09 million barrels per year), suggesting that the trend is not of much economic
importance. The regression model uses January as the base category. Thus, holding the trend and
the previous month’s production constant, beer production is highest in May (month 5), with an
additional 1.87 million barrels of beer produced in this month relative to January. Beer
production is lowest in December (month 12), where production is about 2.44 million barrels less
than production in January.

Footnote 1. This is time series data, so you first need to format the data as such: (i) generate time =
tm(1983m1)+_n-1; (ii) format time %tm; and (iii) tsset time, monthly. Next, create a variable “month” which
contains the values 1, 2, 3, …, 12 and then repeats this sequence: egen month = fill(1 2 3 4 5 6 7 8 9 10
11 12 1 2 3 4 5 6 7 8 9 10 11 12). Now you can execute the regression command using Stata’s factor-variable
notation as a shortcut for the monthly dummies: regress beerprod time i.month L.beerprod.
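The setup steps in footnote 1 can be collected into a short do-file. The version below is a sketch that assumes the BEERPROD data are loaded with one row per month in chronological order starting in January 1983. Building month from the date with month(dofm(time)) is an alternative to the egen fill() approach, and the joint test at the end is an optional extra that the problem does not ask for.

```stata
* Assumes the BEERPROD data are loaded, one row per month in
* chronological order starting in January 1983
generate time = tm(1983m1) + _n - 1
format time %tm
tsset time, monthly

* Alternative to egen ... fill(): extract the calendar month from the date
generate month = month(dofm(time))

* Linear trend, monthly dummies (January is the base), lagged dependent variable
regress beerprod time i.month L.beerprod

* Optional: joint test that all eleven monthly dummies are zero
testparm i.month
```

Because the lagged value is missing for the first month, the regression uses 107 of the 108 observations.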
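(c) For January 1992, all of the monthly seasonal dummies equal zero since January is the base
category, and the value of beer production in the previous month (December 1991) is 13.64.
Numbering the months 1, 2, …, 108 over the sample, the trend equals 109 in January 1992. The
predicted beer production is then about 14 million barrels:
$\widehat{bp}_t = 10.159 + .0078 \times trend_t + .22 \times bp_{t-1}$
$\widehat{bp}_t = 10.159 + .0078 \times 109 + .22 \times 13.64 = 14.01$
The trend value must be on the same scale as the time variable used in the regression. If time is
built exactly as in footnote 1 (time = tm(1983m1)+_n-1, which starts at 276 in January 1983), then
January 1992 corresponds to time = 384 and the same equation gives
10.159 + .0078 × 384 + .22 × 13.64 ≈ 16.15 million barrels.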
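The hand calculation can be cross-checked by letting Stata compute the one-step-ahead prediction, which automatically uses whatever scale the time variable has. This is a sketch under the assumption that the data are already tsset and that month exists as a numeric variable, as in footnote 1; bphat is a hypothetical variable name chosen for this example.

```stata
* Assumes the data are tsset on time and the model is estimated as in footnote 1
regress beerprod time i.month L.beerprod

* Add one empty observation for January 1992
tsappend, add(1)

* Fill in the calendar month for the new row (month is missing there)
replace month = month(dofm(time)) if missing(month)

* One-step-ahead fitted value; bphat is a hypothetical variable name
predict bphat, xb
list time beerprod bphat if time == tm(1992m1)
```

The lagged value used for the new row is the observed December 1991 production, so no further input is needed.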
4. The purpose of this question is to help prepare you for the research project. Please read the
section “Literature review” below before answering (a). Similarly, please read the section
“Formatted regression results” before answering (b).
a. Consider the abstract to Webber and Ehrenberg (2010). Use a few sentences to
summarize what you learned about the relationship between university expenditures
and graduation rates.
b. Open Stata and load the practice dataset on 1978 automobiles (command: sysuse
auto.dta). Use Stata to construct a formatted regression table that contains the
following regressions:
regress mpg weight
regress mpg weight foreign
regress mpg weight foreign headroom
regress mpg weight foreign headroom trunk
Paste your formatted regression table into your solutions. What is the relationship
between weight and mpg, if any? How does this relationship change as you add control
variables to the model?

*READ THIS BEFORE ANSWERING PART (A)*


Literature review.
Research, at its most fundamental level, means contributing to a body of knowledge. Thus, for a given
topic it is crucial for a researcher to understand what previous studies have found. Any research paper
will contain a section where previous studies are summarized—typically this is called a “literature
review”. Collecting and processing this information can be a very time-consuming process.
Suppose you are interested in studying the relationship between a university’s expenditures and its
graduation rate (see footnote 2).
Google Scholar (https://scholar.google.com/) is a useful tool to find previous studies on your research
topic. Search “university expenditures graduation rate”. The first hit that appears should be a paper by
Webber and Ehrenberg (see footnote 3) published in 2010. Select the link to Webber and Ehrenberg’s published paper
and you can read the abstract; however, if you choose “Get Access” you will be prompted to pay for this
journal… don’t do this. FHSU’s library participates in the Interlibrary Loan Program (or “ILL”) where you
can fill out an online form to request an article (see: https://www.fhsu.edu/library/collections/ill). But let’s
not use ILL for Webber and Ehrenberg’s paper.
Sometimes journals will allow earlier versions of a published research paper (called a working paper) to
be available on the internet. Go back to Google Scholar’s search results from “university expenditures
graduation rate” and you should see a hyperlink “[PDF] cornell.edu” to the right of the link for Webber
and Ehrenberg’s published paper. Select this link and their working paper opens up in the form of a pdf.
Another important feature worth noting is that Google Scholar shows how many papers have cited a
given paper (and provides links to these papers). This can be a useful tool when building your literature
review.

Footnote 2. In the parlance of regression analysis, your Y variable is a university’s graduation rate and the X variable is
university expenditures (or a type thereof).
Footnote 3. Webber, DA, and RG Ehrenberg (2010). “Do expenditures other than instructional expenditures affect graduation
rates and persistence rates in American higher education?” Economics of Education Review, 29:6, 947-58.
*READ THIS BEFORE ANSWERING PART (B)*
Formatted regression results.
It is considered sloppy, and thus a big no-no, to merely copy/paste regression results from Stata (or
whatever program) directly into your research paper. Instead, regression results are usually formatted in
columns: the left-most column contains the names of the explanatory variables, and the adjoining column(s)
contain (i) the estimated coefficients, (ii) the standard errors in parentheses, and (iii) statistical
significance at the 10%, 5%, or 1% level indicated via the use of asterisks.
For example, open Webber and Ehrenberg’s working paper and go to Table 2 (on p. 31). This table
summarizes the results from 3 regressions all of which use graduation rate as the dependent variable.
The left-most column contains the explanatory variables (STUDENT, ACADEMIC, etc.). Column (1)
contains the regression results when graduation rate (Y) is regressed on 5 explanatory variables
(STUDENT, ACADEMIC, RESEARCH, INSTRUCTION, and PELL) plus the intercept/constant. Specifically,
column (1) shows the estimated coefficient and the standard error in parentheses. Asterisks are used to
show statistical significance at the 10% level (*); 5% level (**); and 1% level (***). If there is not an
asterisk next to an estimated coefficient, then it is not statistically significant (i.e., the t statistic is so
small that we cannot reject 𝐻0 : 𝛽 = 0).
Column (2) shows a similar regression except that some additional explanatory variables are used in the
regression (HBCU, HISPANIC, etc). And then column (3) regresses graduation rate on all of the
explanatory variables shown. What is the purpose of this? Remember that when you add an explanatory
variable you are now controlling for it.
For example, the variable STUDENT represents student service expenditures at a university (e.g., student
organizations, student health services). Column (1) shows that STUDENT is estimated to be positively
related with graduation rates (i.e., the coefficient is 0.263), and this is statistically significant at the 1%
level. However, columns (2) and (3) show us how this relationship between STUDENT and graduation
rates changes as more and more variables are controlled for. In this case, STUDENT remains statistically
significant (albeit at the 5% level in column (3)), but the size of the estimated coefficient decreases,
which is expected since more and more variables are being controlled for.
Creating a regression table analogous to Table 2 in Webber and Ehrenberg’s paper takes time.
Fortunately, Stata has a command, called outreg2, which will do much of the formatting for you.
i. First, the outreg2 command needs to be installed on your computer (you need only do
this once on a computer). Run the command: ssc install outreg2
ii. Run your first regression. Afterwards, run the command:
outreg2 using Stata_outreg2.doc, replace
iii. Run subsequent regressions. After each regression, run the command:
outreg2 using Stata_outreg2.doc, append

After you run all of the regression and outreg2 commands, look in the results window in Stata. You
should see “Stata_outreg2.doc” in blue font. Select this, and a Word document opens up containing a
formatted regression table.
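Put together, steps i through iii applied to the four regressions listed in part (b) might look like the do-file sketched below. It uses only the commands already introduced above and Stata’s built-in auto data; the file name Stata_outreg2.doc is the one used in the instructions.

```stata
* One-time installation of outreg2 (requires an internet connection)
ssc install outreg2

* Load the practice dataset on 1978 automobiles
sysuse auto, clear

* First regression starts a new table (replace overwrites any old file)
regress mpg weight
outreg2 using Stata_outreg2.doc, replace

* Each later regression adds one more column to the same table
regress mpg weight foreign
outreg2 using Stata_outreg2.doc, append

regress mpg weight foreign headroom
outreg2 using Stata_outreg2.doc, append

regress mpg weight foreign headroom trunk
outreg2 using Stata_outreg2.doc, append
```

Using replace only on the first outreg2 call matters: repeating it later would discard the columns already written.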
Solution:

(a) SUMMARY: Webber and Ehrenberg (2010) test whether different types of university spending
(i.e., instruction, academic support, student services, and research) are related to a
university’s graduation rate. The study finds that spending on student services is positively
related to graduation rates, especially for universities with low graduation rates to begin with.

(b) The formatted regression table is shown below. In this table the controls are added in the
order headroom, foreign, turn (the question lists foreign, headroom, trunk). The estimated
relationship between mpg and weight is negative and statistically significant at the 1% level in
all regressions. Regression (1) shows that, without any controls, a 1,000-pound increase in a car’s
weight is associated with 6.01 fewer miles per gallon. The magnitude of this relationship changes
only slightly as control variables are added: it weakens in regressions (2) and (4) and
strengthens in regression (3). In regression (4), which has the full set of controls, a
1,000-pound increase in a car’s weight is associated with 5.5 fewer miles per gallon.

                   (1)           (2)           (3)           (4)
VARIABLES          mpg           mpg           mpg           mpg

weight         -0.00601***   -0.00590***   -0.00647***   -0.00550***
               (0.000518)    (0.000595)    (0.000700)    (0.00104)
headroom                     -0.210        -0.219        -0.205
                             (0.547)       (0.542)       (0.539)
foreign                                    -1.655        -2.077*
                                           (1.082)       (1.128)
turn                                                     -0.234
                                                         (0.185)
Constant       39.44***      39.74***      41.99***      48.39***
               (1.614)       (1.796)       (2.312)       (5.556)

Observations   74            74            74            74
R-squared      0.652         0.652         0.663         0.671

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
