Yale SOM
MGT 403: Statistics
Practice Problem Set - P1-2
The Internet portal Yahoo may allow its members to customize their start pages
(homepages). As part of a short survey regarding likes and dislikes, users were asked
about their interests in options such as QuickTime movie clips with daily news and
sports events on their pages. Yahoo hopes that QuickTime will entice users to follow
a larger number of hyperlinks so that it can attract more advertisers.
The newly customized page option was made available to 100 Internet users who
were randomly sampled from the target population. The benchmark for Yahoo is 6
non-Yahoo content links clicked by all its customers on average prior to the availability of the QuickTime option (during any one-week period).
After one week of access to the customized homepage option, Yahoo observes
the (average) number of non-Yahoo links for each customer. For the sample of 100
customers, the average is 7.8 links and the standard deviation is 9.5 links.
1. Test the two-tailed null hypothesis that the customization with QuickTime
does not alter the true average (benchmark) number of links at the 5% level
(critical value is 1.96). Specify the null and alternative hypotheses, compute the
value of the test statistic, and state whether or not you can reject the null
hypothesis and why.
2. Construct a 95 percent confidence interval for the true but unknown population
parameter. Interpret the resulting interval statistically and managerially.
3. Which of these two procedures is more informative, the test of the null hypothesis or the confidence interval? Explain.
Yale SOM
MGT 403: Statistics
Practice Problem Set - P1-1
Introduction. You have been hired to study the evolution of executive compensation over time.
Specifically, how CEOs' salaries vary between different sectors and how they are related to a company's sales in the early 1990s. You receive data on a random sample of CEOs which is contained
in ceosalary1.dta. Type describe to see the contents of this data set.
Question 1
(a) There are two hypotheses concerning CEO compensation in the early 1990s. One is that
average CEO salaries were at most $1,000,000. Another concerns the default belief that
average CEO salaries were actually $1,200,000. You want to test these two hypotheses. Note
that the data to test them is contained in the variable salary (which measures CEO salary
in $1000). Can you reject the null hypotheses, at the 5% level, implied by these tests?
To answer this question, write down the following steps for each test:
1. The null hypothesis
2. The alternative hypothesis
3. The formula for the realization of the test statistic
4. The rejection region: for which values of the test statistic you reject the null hypothesis
Now use the data to carry out the two tests.
First, do it manually by typing summarize salary or using the User Menu, Summarize
and Describe Data, Simple Summary Statistics (summarize) to input this command. Use
the result to calculate the realization (or outcome) for the test statistics.
Can you reject the null hypotheses? Why or why not?
Second, check your answer by conducting a test of means in Stata. You can use Simple
Test of Association, Test of Means in the Stata User menu.
What are the p-values for each of these two tests? Based on the p-value can you reject
the null hypothesis at the 5% level for each test? Explain why.
(b) Compute the 95% confidence interval for the unknown population mean of CEO salaries by
- writing down the formula for the confidence interval, and
- using the results from the command summarize salary to compute the confidence interval.
Does 1200 fall in the confidence interval?
Yale SOM
MGT 403: Statistics
Practice Problem Set P1-1 Answers
Question 1
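(a) To test the research hypothesis that the mean of salary is at most (less than or equal to) 1000,
we have
1. The null hypothesis: \(H_0: \mu > 1000\)
2. The alternative hypothesis: \(H_a: \mu \le 1000\)
3. The formula for the realization of the test statistic:
\[ t = \frac{\bar{x} - 1000}{\hat{\sigma}/\sqrt{N}} \]
4. The rejection region: reject if \(t < -1.65\).
To test the hypothesis that the mean of salary is equal to 1200, we have
1. The null hypothesis: \(H_0: \mu = 1200\)
2. The alternative hypothesis: \(H_a: \mu \ne 1200\)
3. The formula for the realization of the test statistic:
\[ t = \frac{\bar{x} - 1200}{\hat{\sigma}/\sqrt{N}} \]
4. The rejection region: reject if \(|t| > 1.96\) (this is the same as saying that the rejection
region is \(t < -1.96\) or \(t > 1.96\)).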
For the manual calculation of the realization of the test statistic we need the mean of
salaries in the sample, the standard deviation, and the number of observations. We get
all of these from Stata's summarize command.
. summarize salary

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |       206    1141.063     611.193        223       4143
Hence the realization or value for the test statistic for the first test equals
\[ t = \frac{1141.063 - 1000}{611.193/\sqrt{206}} = 3.313. \]
The realization or value for the test statistic for the second test equals
\[ t = \frac{1141.063 - 1200}{611.193/\sqrt{206}} = -1.384. \]
For the first test, since the value for t is 3.3, which is not smaller than -1.65, we cannot
reject the null hypothesis in favor of the alternative that the mean of salaries in the
population of CEOs is at most $1,000,000 at the 5% level. For the second test, since the
value for the test statistic t of -1.384 is not in the rejection region of t < -1.96 or t > 1.96,
we also cannot reject the null hypothesis that the mean of salaries in the population of
CEOs is equal to $1,200,000 at the 5% level.
We get the same results using the ttest command in Stata. Note that when Stata sets
"95% confidence level" as a default, it is just asking you if you would like to see the 95%
confidence interval for the unknown population mean of CEO salaries together with the
value for t.
. ttest salary == 1000

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  salary |     206    1141.063    42.58383     611.193    1057.105    1225.022
------------------------------------------------------------------------------
    mean = mean(salary)                                           t =   3.3126
Ho: mean = 1000                                  degrees of freedom =      205

   Ha: mean < 1000
 Pr(T < t) = 0.9995
The p-value for the first test is 0.9995. Since the p-value is greater than 5% we cannot
reject the null hypothesis in favor of the alternative that average salaries in the
population of CEOs are, at most, $1,000,000 at the 5% level. The p-value for the second
test is 0.1679. Since the p-value is greater than 5% we also cannot reject the null
hypothesis that average salaries in the population of CEOs are $1,200,000, at the 5%
level.
(b)
The formula for the 95% confidence interval for the mean of salary in our CEO population
is
\[ \left[\, \bar{x} - 1.96\,\frac{\hat{\sigma}}{\sqrt{N}},\;\; \bar{x} + 1.96\,\frac{\hat{\sigma}}{\sqrt{N}} \,\right]. \]
With the results from summarize salary above we get
\[ \left[\, 1141.063 - 1.96\,\frac{611.193}{\sqrt{206}},\;\; 1141.063 + 1.96\,\frac{611.193}{\sqrt{206}} \,\right] = [1057.6,\ 1224.5]. \]
Thus we are 95% confident that the true mean of CEO salaries (in $1000) lies in [1057.6, 1224.5].
Note that the interval contains 1200 and that it is almost equal to the interval given to us
in the results for the command ttest salary == 1200. The small remaining difference arises
because Stata uses the critical value of the t distribution with 205 degrees of freedom
(about 1.97) rather than 1.96.
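The manual calculations in this answer can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original Stata workflow; the function names are hypothetical, and the inputs are the mean, standard deviation, and sample size reported by summarize salary above. It uses the normal critical value 1.96, as the answer does.

```python
import math

def one_sample_t(mean, sd, n, mu0):
    """t statistic for the null hypothesis that the population mean equals mu0."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error

def confidence_interval(mean, sd, n, critical_value=1.96):
    """Interval mean +/- critical_value * sd / sqrt(n)."""
    half_width = critical_value * sd / math.sqrt(n)
    return mean - half_width, mean + half_width

# Summary statistics copied from the summarize output (salary is in $1000).
mean, sd, n = 1141.063, 611.193, 206

t_1000 = one_sample_t(mean, sd, n, 1000)   # first test (at most 1000)
t_1200 = one_sample_t(mean, sd, n, 1200)   # second test (equal to 1200)
low, high = confidence_interval(mean, sd, n)

print(f"t against 1000: {t_1000:.3f}")
print(f"t against 1200: {t_1200:.3f}")
print(f"95% interval: [{low:.1f}, {high:.1f}]")
print("1200 inside interval:", low <= 1200 <= high)
```

Passing a critical value taken from the t distribution with N - 1 degrees of freedom instead of 1.96 should give an interval closer to the one ttest reports.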
Yale SOM
MGT 403: Statistics
Practice Problem Set - P1-2- Answers
1.
Null hypothesis: \(H_0: \mu = 6\)
Alternative hypothesis: \(H_a: \mu \ne 6\)
Computing the value for the test statistic:
\[ t = \frac{\bar{x} - \mu_0}{\sqrt{\hat{\sigma}^2/N}} = \frac{7.8 - 6}{\sqrt{9.5^2/100}} = 1.89 \]
Given that t = 1.89 is not in the rejection region for a two-sided test at
the 5% level (t > 1.96 or t < -1.96), we cannot reject the null at
the 5% level.
2. The 95% confidence interval for the true but unknown mean of the number of
links is:
\[ 7.8 \pm 1.96 \times \frac{9.5}{10} \]
Therefore, we can be quite sure or confident (95 percent) that the true but
unknown population mean is between 5.9 and 9.7 links.
3. Neither is more informative than the other; it depends on the type of question
one is asking. The confidence interval gives us a range into which we are 95%
confident that the population mean falls. It is good when we want to get
a sense of what the population mean could be. A hypothesis test, in contrast,
allows us to answer a different question: whether a specific hypothesis about
the population is true (supported by the data) or not.
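The link between the two procedures can be illustrated numerically. The sketch below is a hedged Python illustration (variable names are hypothetical; the numbers are those given in the problem): a two-sided test at the 5% level rejects exactly when the benchmark value falls outside the 95% confidence interval built with the same critical value.

```python
import math

# Figures taken from the problem statement.
n = 100            # sample size
sample_mean = 7.8  # average number of non-Yahoo links in the sample
sample_sd = 9.5    # sample standard deviation
benchmark = 6.0    # value of the mean under the null hypothesis
critical = 1.96    # two-sided 5% critical value given in the problem

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - benchmark) / standard_error

# Decision by the test: reject when |t| exceeds the critical value.
reject_by_test = abs(t_stat) > critical

# Decision by the interval: reject when the benchmark lies outside it.
ci_low = sample_mean - critical * standard_error
ci_high = sample_mean + critical * standard_error
reject_by_interval = not (ci_low <= benchmark <= ci_high)

print(f"t = {t_stat:.2f}, reject: {reject_by_test}")
print(f"95% interval: [{ci_low:.2f}, {ci_high:.2f}], reject: {reject_by_interval}")
```

Both decisions agree by construction; the interval additionally shows which other benchmark values would not be rejected.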
Yale SOM
MGT 403: Statistics
Practice Problem Set P2-1
Introduction You have been asked to analyze the relationship between research and development
(R&D) spending and sales of firms in the chemical and telecommunications industries. You receive
data on a random sample of firms contained in the data set rd.dta. Type describe to see the
contents of the data set. The binary (dummy) variable chem is equal to one if the firm is in the
chemical industry and equal to zero if the firm is in the telecommunications industry.
Question 1
1. Run the regression of sales as a function of R&D.
2. Does the estimated coefficient suggest that sales and R&D spending are positively or negatively correlated?
3. By how much do sales increase or decrease on average when R&D spending increases by one
million dollars?
4. Is this effect significantly different from zero at the 5% level and why?
5. What is the interpretation of the estimate for the intercept parameter \(\beta_0\)?
6. How much does the variation in R&D spending explain the variation in sales?
Question 2
After analyzing the relationship between prices you are asked how the returns of the DJIA and
GE are related: when one increases, does the other decrease or vice-versa? Or when one increases
does the other also increase? To investigate this question, you first have to generate the returns
using the User menu command Manipulate Variables and Obs, Generate New Variable and the
formula
\[ \text{return}_t = 100 \times \frac{\text{price}_t - \text{price}_{t-1}}{\text{price}_{t-1}}. \]
Since each observation in our data set represents one date and the observations are chronologically
sorted, we can implement this formula in Stata by 100 * (close_DJIA - close_DJIA[_n-1]) /
close_DJIA[_n-1], for example, for DJIA. Here [_n-1] means that we are taking the observation
from the previous period. Do the same for GE returns.
1. Plot the relationship between the return for the GE stock and the return for DJIA. Is the
relationship increasing or decreasing?
2. We can define the beta of a given stock as
\[ \beta_a = \frac{\operatorname{cov}(\text{return}_a, \text{return}_p)}{\operatorname{var}(\text{return}_p)} \]
where \(\text{return}_a\) and \(\text{return}_p\) are the returns of the stock in question and the
stock market index, respectively, and the risk free rate is constant over time. Given the
previous plot, should the beta of the GE stock be positive or negative? Explain why.
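The return formula and the definition of beta can be sketched outside Stata as well. The Python code below is an illustrative sketch only: the function names and the toy numbers are made up for the example and are not taken from the DJIA or GE data. It computes percentage returns from a chronologically sorted price list and beta as the sample covariance divided by the sample variance.

```python
def pct_returns(prices):
    """Percentage returns 100 * (p_t - p_{t-1}) / p_{t-1} for t = 2, ..., T."""
    return [100 * (cur - prev) / prev for prev, cur in zip(prices, prices[1:])]

def mean(values):
    return sum(values) / len(values)

def beta(stock_returns, index_returns):
    """cov(stock, index) / var(index), both with the n - 1 denominator."""
    n = len(index_returns)
    stock_mean, index_mean = mean(stock_returns), mean(index_returns)
    cov = sum((s - stock_mean) * (m - index_mean)
              for s, m in zip(stock_returns, index_returns)) / (n - 1)
    var = sum((m - index_mean) ** 2 for m in index_returns) / (n - 1)
    return cov / var

# Hypothetical prices: the first observation has no return, as with [_n-1] in Stata.
example_returns = pct_returns([100.0, 110.0, 99.0])

# Hypothetical returns where the stock always moves twice as much as the index.
index_r = [1.0, -2.0, 3.0]
stock_r = [2.0, -4.0, 6.0]
example_beta = beta(stock_r, index_r)

print(example_returns)
print(example_beta)
```

A scatter plot that slopes upward corresponds to a positive covariance and therefore a positive beta, since the variance in the denominator is always positive.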
Yale SOM
MGT 403: Statistics
Practice Problem set P2-1 Answers
Question 1
1. The Stata command for the regression is regress sales rd, robust, which yields the following output:

Linear regression                               Number of obs     =         61
                                                F(1, 59)          =      42.28
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7971
                                                Root MSE          =       3542

------------------------------------------------------------------------------
             |               Robust
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          rd |   18.00799   2.769413     6.50   0.000     12.46641    23.54957
       _cons |   1040.966    397.866     2.62   0.011     244.8379    1837.094
------------------------------------------------------------------------------

2. The estimated parameter \(\hat{\beta}_1\) equals about 18, a positive number, which suggests that R&D
spending and sales are positively correlated.
3. When R&D spending increases by one million dollars, predicted sales increase by 18 million
dollars.
4. This effect is significantly different from zero at the 5% level since the value of the t-statistic
reported in the regression output is equal to 6.5, which is in the rejection region of t < -1.96
or t > 1.96. Accordingly, the p-value is lower than 5%, at nearly zero.
5. The estimate for the intercept, \(\hat{\beta}_0\), is 1041, which implies that predicted sales are 1041 million
dollars when R&D spending equals zero.
6. The R-squared value tells us that variation in R&D spending explains about 80% of the
variation in sales. The fit is quite good.
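The numbers used in answers 3 to 5 can be reproduced from the reported coefficients. The following Python sketch is an illustration under the assumption that the coefficients are typed in by hand from the output above; the constant and function names are hypothetical.

```python
# Values copied from the regress output (sales and rd in millions of dollars).
INTERCEPT = 1040.966   # _cons
SLOPE = 18.00799       # coefficient on rd
SLOPE_SE = 2.769413    # robust standard error of the coefficient on rd

def predicted_sales(rd):
    """Predicted sales for a given level of R&D spending."""
    return INTERCEPT + SLOPE * rd

# t statistic for the null hypothesis that the slope is zero.
t_stat = SLOPE / SLOPE_SE
significant_at_5pct = abs(t_stat) > 1.96

# Effect of one extra million of R&D spending on predicted sales.
marginal_effect = predicted_sales(11) - predicted_sales(10)

print(f"t = {t_stat:.2f}, significant: {significant_at_5pct}")
print(f"marginal effect: {marginal_effect:.2f}")
print(f"predicted sales at rd = 0: {predicted_sales(0):.0f}")
```

Because the model is linear, the marginal effect is the slope itself regardless of the starting level of R&D spending.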
Question 2
First generate the returns for DJIA and GE using the following commands:
generate return_DJIA = 100 * (close_DJIA - close_DJIA[_n-1]) / close_DJIA[_n-1]
generate return_GE = 100 * (close_GE - close_GE[_n-1]) / close_GE[_n-1]
1. We plot the returns of GE against the returns of DJIA using the Stata command twoway
(scatter return_GE return_DJIA). See Figure 3.
Yale SOM
MGT 403: Statistics
Practice Problem Set P2-2
A consulting firm wants to get a better understanding of its cost structure based
on data on costs incurred for projects in the past so as to improve its bidding process
for projects. Experience suggests that there are two main components of costs in a
project: (1) variable costs that are directly related to the size of the project, which
is reasonably proxied by the number of person-hours for the project, and (2) fixed
costs, which are incurred irrespective of the size of the project.
A regression of the total costs (in $) against the number of person-hours based
on data on 42 projects gave the following results:
Linear regression                               Number of obs     =         42
                                                F(1, 40)          =      157.8
                                                Prob > F          =      0.000
                                                R-squared         =       0.87
                                                Root MSE          =       2979

------------------------------------------------------------------------------
             |               Robust
   totalcost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Person-hours |     372.15     29.629     12.6   0.000        311.0       433.3
       _cons |    3209.76   1387.962     2.31   0.030        345.1      6074.4
------------------------------------------------------------------------------
1. Test the null hypothesis that the slope parameter is zero. State the hypotheses
in appropriate symbols, state the p-value, and interpret the result.
2. Define the 95 percent confidence interval for the true slope parameter and
interpret this interval.
3. Assuming that the equation is a reasonable approximation of the nature of
project costs, interpret the slope coefficient precisely in a manner understandable by a layperson.
4. What is the best estimate of fixed costs?
5. What is the predicted total cost for a project that will employ 1,000 person-hours?
Yale SOM
MGT 403: Statistics
Practice Problem Set P2-2-Answers
1.
\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \ne 0\)
Yale SOM
MGT 403: Statistics
Practice Problem Set P3-1
Jim Douglas, the manager of Colonial Furniture, has been reviewing weekly advertising expenditures. All of his advertising thus far has been focused on radio.
He is interested in learning how the effect of advertising might differ across different
media. He recorded the following variables:
Sales: Number of customers in each week (individuals visiting an outlet)
# Ads: The number of ads in the week
Medium (1=radio, 2=television).
1. Jim recalled from a class he had taken that regression analysis could be used to
estimate the effects of the different media. He proposed the following regression
model:
\[ \text{Sales} = \beta_0 + \beta_1\,\text{Ads} + \beta_2\,\text{Medium} \]
and he seeks your advice. Would you propose an alternative model? If so, explain the problem with Jim's model. Write out your proposed model explicitly
in the form of an equation.
2. Jim then created one indicator variable: Radio (1 if radio, 0 otherwise, that is,
television). He then ran a regression of Sales against #Ads and Radio. The
results are reported below:
Linear regression                               Number of obs     =         52
                                                F(2, 49)          =      14.91
                                                Prob > F          =     0.0000
                                                R-squared         =       0.69
                                                Root MSE          =      44.87

------------------------------------------------------------------------------
             |               Robust
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ads |         25       3.98     6.34    0.00        17.23       33.23
       Radio |        -47      16.44    -2.83    0.01       -79.64      -13.53
       _cons |        283      17.46    16.19    0.00       247.50      317.69
------------------------------------------------------------------------------
Yale SOM
MGT 403: Statistics
Practice Problem Set P3-1-Answers
1. Medium is a categorical variable (with values 1 or 2), so I would not include
it directly. I would create a dummy variable for the Medium category, for
example Radio (1 if radio; 0 otherwise, that is, TV).
I would then estimate the model (treating TV as the base category):
\[ \text{Sales} = \beta_0 + \beta_1\,\text{Ads} + \beta_2\,\text{Radio} \]
2. For any given level of advertising (number of ads), radio ads are expected to
produce 47 fewer customers relative to ads shown on TV.
3. An increase in the number of ads per week by 1 is expected to increase the
number of customers per week by 25, holding constant the medium through which
the ads are transmitted.
4. To answer this question, we first need to compute the predicted sales value
when airing 50 ads on television.
Sales = 283 + 25 × 50 + (-47) × 0 = 1,533 customers
Then the 95% prediction interval for sales is:
1,533 ± 1.96 × 44.87 = (1,445.1; 1,620.9)
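The role of the dummy variable and the prediction interval in answer 4 can be sketched in code. This is a minimal Python illustration with hypothetical names; the coefficients and root MSE are copied from the regression output in the problem, and the interval uses the same approximation as the answer (point prediction plus or minus 1.96 times the root MSE).

```python
# Coefficients copied from the regression of sales on Ads and Radio.
COEF = {"_cons": 283, "ads": 25, "radio": -47}
ROOT_MSE = 44.87

def predict_customers(ads, radio):
    """Predicted weekly customers; radio is 1 for radio ads and 0 for TV ads."""
    return COEF["_cons"] + COEF["ads"] * ads + COEF["radio"] * radio

def prediction_interval(point, root_mse=ROOT_MSE, critical_value=1.96):
    """Approximate 95% prediction interval: point +/- critical_value * root MSE."""
    half_width = critical_value * root_mse
    return point - half_width, point + half_width

tv_prediction = predict_customers(ads=50, radio=0)
radio_prediction = predict_customers(ads=50, radio=1)
low, high = prediction_interval(tv_prediction)

print("TV, 50 ads:", tv_prediction)
print("Radio minus TV at the same number of ads:", radio_prediction - tv_prediction)
print(f"Approximate 95% prediction interval: ({low:.1f}, {high:.1f})")
```

Switching the dummy from 0 to 1 shifts the prediction by the Radio coefficient and leaves the effect of an additional ad unchanged, which is why TV acts as the base category.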
Yale SOM
MGT 403: Statistics
Practice Problem Set P3-2
Data Set and Questions
You have been hired to investigate the relationship between individuals' physical
attractiveness and their wage. You receive the data set beauty3.dta, which contains
data on the wage and other characteristics, such as education and years of experience,
for a random sample of individuals.
The data set also contains the variable looks, which measures a given individual's
subjective physical attractiveness. The variable looks encompasses five categories,
where 5 denotes the highest level of attractiveness and 1 denotes the lowest level of
attractiveness. The binary zero-one variable belavg is derived from looks: belavg
is equal to 1 if looks equals 1 or 2 and 0 otherwise.
1. By how much more/less do individuals with below average looks earn per hour,
on average, relative to individuals with average/above average looks? Run the
appropriate regression and answer the question.
2. Is the above estimate significant at the 5% level?
3. Does experience attenuate the "looks" advantage? Run the appropriate regression and answer the question.
4. How much does the variation in looks and experience explain the variation in
wages?
Yale SOM
MGT 403: Statistics
Statistics Practice PS 3-2 Answers
1. Regression: The Stata command is regress wage belavg, robust, which
yields the following output:
Linear regression                               Number of obs     =       1259
                                                F(1, 1257)        =      12.84
                                                Prob > F          =     0.0004
                                                R-squared         =     0.0076
                                                Root MSE          =     4.1905

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      belavg |  -1.118143   .3120741    -3.58   0.000    -1.730386   -.5058995
       _cons |   6.387627    .128631    49.66   0.000     6.135272    6.639982
------------------------------------------------------------------------------
Linear regression                               Number of obs     =       1259
                                                F(2, 1256)        =      59.86
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0831
                                                Root MSE          =     4.0297

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      belavg |  -1.270895     .29984    -4.24   0.000    -1.859137   -.6826524
       exper |   .0966187   .0093563    10.33   0.000      .078263    .1149743
       _cons |   4.646653   .1693257    27.44   0.000     4.314461    4.978845
------------------------------------------------------------------------------

It seems that experience does not attenuate the advantage of looks. Holding
experience constant, those with below-average looks earn 1.27 dollars per hour
less than those with average or above-average looks. Further, this coefficient is statistically
significant at the 5% level as the p-value of 0.000 is less than 5%.
4. The variation in looks and experience explains only about 8% of the variation in wages.
Yale SOM
MGT 403: Statistics
Sample Exam Questions
Administrative Details
This final is open book. You can consult your class notes, problem set solutions
and other materials. However, you cannot discuss the exam with anyone; doing so
constitutes a violation of the honor code. Show all your work, including all the Stata
output relevant to answering the questions.
Linear regression                               Number of obs     =         64
                                                F(1, 62)          =       1.89
                                                Prob > F          =       0.17
                                                R-squared         =       0.18
                                                Root MSE          =       1.53

------------------------------------------------------------------------------
             |               Robust
     ratings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       After |      0.551      0.401    1.375   0.175       -0.251       1.355
       _cons |      2.068      0.283    7.293   0.000        1.500       2.637
------------------------------------------------------------------------------

1. What is the average preference in the sample for brand X before the TV programs?
2. What is the average preference in the sample for brand X after the TV programs?
3. Based on the above regression, do the ads for brand X have a statistically
significant effect on the average preference for the brand in the target market?
Be precise and show relevant numbers.
4. Your exposure to regression analysis suggests that it may be useful to include
other variables so as to improve the understanding of effects of interest. So
you decide to add two independent variables: PPur = 1, if the consumer has
purchased the product in the past, = 0 if not; Male = 1 if male, = 0 if female.
Linear regression                               Number of obs     =         64
                                                F(1, 60)          =       35.5
                                                Prob > F          =       0.00
                                                R-squared         =       0.81
                                                Root MSE          =       0.92

------------------------------------------------------------------------------
             |               Robust
     ratings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       After |      0.521      0.240    2.170   0.030        0.051       0.991
        PPur |      2.452      0.254    9.649   0.000        1.942       2.961
        Male |     -0.047      0.201   -0.234   0.815       -0.441       0.347
       _cons |      0.742      0.237    3.126   0.002        0.266       1.218
------------------------------------------------------------------------------

Based on the above analysis, do the ads have a statistically significant effect
on the average preference for the brand in the target market? Be precise and
show relevant numbers.
5. Is your conclusion in (4) different from your conclusion in (3)? Explain the
difference, if any. Relate the idea of controlling for other variables (that is,
adding more relevant variables to the model or holding constant these other
relevant variables) to the difference between the test you did in (3) and in (4).
Linear regression                               Number of obs     =         36
                                                F(1, 34)          =       19.9
                                                Prob > F          =      0.000
                                                R-squared         =      0.604
                                                Root MSE          =      12286

------------------------------------------------------------------------------
             |               Robust
   M&AVolume |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       Deals |    269.660     60.512    4.456   0.000      138.932     400.389
       _cons |   1461.941   5737.309    0.254   0.802     -10932.8   13856.640
------------------------------------------------------------------------------
2. How much of the variation in M&A volume across these firms is explained by
the number of deals that each firm handles?
3. What is the marginal increase in M&A volume attributable to an additional
deal that a firm makes (or what is the predicted difference in M&A volume
between firm B and firm A, if B has one more deal than A)? Be precise and
state the units (e.g. billions? millions?).
4. Your firm wants to be among the top players in this industry next year with
100 deals. Assuming that the estimated relationship applies to next year, what
is your best estimate of M&A volume for your firm if it achieved its goal next
year? Again, be sure to state the units (e.g. millions or billions) Note: The
question asks for your best estimate of the predicted value; so whether or not
something is statistically significant is irrelevant to this question.
Yale SOM
MGT 403: Statistics
Sample Exam Questions-Answers
Administrative Details
This final is open book. You can consult your class notes, problem set solutions
and other materials. However, you cannot discuss the exam with anyone; doing so
constitutes a violation of the honor code. Show all your work, including all the Stata
output relevant to answering the questions. (Though all the sample questions already
show the Stata output, you will have to create your own Stata output when answering
the questions in the exam.)
5. Yes, it is different. The reason is that there are other characteristics, such as
past purchase, that explain a difference in preferences between consumers. By
including such variables in a regression, we have a better chance of learning
the impact of ads on brand preference.
Linear regression                               Number of obs     =        944
                                                F(2, 941)         =      65.80
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3153
                                                Root MSE          =      48604

------------------------------------------------------------------------------
             |               Robust
   boxoffice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      budget |   1.032913   .1143477     9.03   0.000     .8085068    1.257319
         usa |   23697.24   4742.385     5.00   0.000     14390.36    33004.11
       _cons |  -10530.44   4557.711    -2.31   0.021     -19474.9   -1585.987
------------------------------------------------------------------------------
4. The predicted box office for a movie with the same budget produced outside
the US is
\[ -10{,}530.4 + 1.03 \times 50{,}000 + 23{,}697.2 \times 0 = 40{,}969.6 \text{ thousand.} \]
Question 1
The Internet portal Yahoo is considering allowing its members to customize their start
pages (homepages). As part of a short survey regarding likes and dislikes, users were asked
about their interests in options such as QuickTime movie clips with daily news and sports
events on their pages. Yahoo hopes that QuickTime will entice users to follow a larger
number of hyperlinks so that it can attract more advertisers.
The newly customized page option with QuickTime links was made available to 100 Internet users who were randomly sampled from the target population. The prior benchmark
for Yahoo has been 6 non-Yahoo content links clicked on average by its members per visit.
It collected data on the 100 users over 1 week to see if the availability of the QuickTime
link options significantly changes the average non-Yahoo links clicked per visit.
Findings: After one week of access to the new customized homepage option with QuickTime links, Yahoo observes that the average number of non-Yahoo links for each customer
in the sample per visit is 7.8 links and the standard deviation is 9.5 links.
Answer the following:
(i). Draw a graph and test the Null Hypothesis that the customization with QuickTime
does NOT alter the average number of non-Yahoo links clicked. Use the customary
95% Confidence Interval (t critical value is 1.96). State whether you reject the null
hypothesis or not. Also compute the t statistic.
(ii). Draw a new graph and show the 95% Confidence Interval for the estimated mean
number of non-Yahoo links clicked in the sample of 100 customers. Be precise as
to where, numerically, the boundaries of the Confidence Interval lie. Does the
Confidence Interval include the previous average of 6 or not? How does your
answer to this last question relate to your answer to (i)?
Question 2
You have been hired to study executive compensation patterns. Your current project examines CEO salaries in the 1990s. You are curious whether some of the popular statements
about high CEO salaries during this time period are correct. You have collected data on
CEO salaries in the 90s - the data is in the STATA dataset ceosalary.dta (available on the
class website on Canvas).
A widely read commentator of the time is known to have stated that average CEO compensation in the 90s (your sample period) was 1.2 million. You want to test this hypothesis.
(i). Carry out the appropriate t test in STATA, just as we did in class and you did in
Problem Set 1, Question 2. What is the t value? Is the Null Hypothesis rejected or
not? What is the p value?
(ii). You can also carry out this kind of test manually in STATA. To do this, run the
command summarize salary from the command line. This will show you the mean
of salary as well as its standard deviation. To proceed, assume the distribution
for salary is a Normal distribution. Now compute the standard deviation of the
Test Statistic, which is the average over the observations. To do this, recall that the
standard deviation for the Test Statistic is:
\[ \sigma_{\bar{x}} = \frac{\hat{\sigma}}{\sqrt{N}} \]
where \(\hat{\sigma}\) is the estimated standard deviation of the underlying variable, and N is the size
of the sample. Once you compute this, go out \(1.96\,\sigma_{\bar{x}}\) in either direction to construct
the Confidence Interval. Then check whether the Null Hypothesis value lies inside the
Confidence Interval or not.
Question 3
Consider the VERY small dataset that consists of 3 datapoints:
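X1 = 1.0 Y1 = 200.0
X2 = 2.0 Y2 = 145.0
X3 = 3.0 Y3 = 20.0
Use the formulas:
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}, \]
\[ \bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i, \qquad \bar{Y} = \frac{1}{N}\sum_{i=1}^{N} Y_i \]
to compute \(\hat{\beta}_0\) and \(\hat{\beta}_1\). Then compute \(\text{resid}_i\) for each datapoint and finally \(R^2\).
After you have done this calculation manually, enter these 3 datapoints into STATA (or
use the dataset Q3-practice, in which this has already been done for you, on the Canvas website under
STATS/FEINSTEIN/STATA Datasets). Run the regress command and check your work.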
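As a cross-check on the manual calculation, the steps can be sketched in Python using the standard least-squares formulas for one regressor. This is an illustrative sketch rather than the course's own solution; it assumes the three datapoints are (1, 200), (2, 145), and (3, 20), with the third Y value read as 20.0, and the variable names are hypothetical.

```python
# The three datapoints, with the third Y value assumed to be 20.0.
xs = [1.0, 2.0, 3.0]
ys = [200.0, 145.0, 20.0]
n = len(xs)

# Sample means of X and Y.
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: sum of cross-deviations divided by sum of squared X deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
beta1 = s_xy / s_xx

# Intercept: the fitted line passes through the point of means.
beta0 = y_bar - beta1 * x_bar

# Residuals: actual minus fitted values.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

# R-squared: one minus residual sum of squares over total sum of squares.
ss_resid = sum(e ** 2 for e in residuals)
ss_total = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_resid / ss_total

print(f"beta0 = {beta0:.3f}, beta1 = {beta1:.3f}")
print("residuals:", [round(e, 3) for e in residuals])
print(f"R-squared = {r_squared:.4f}")
```

The coefficient estimates and R-squared should agree with the regress output; standard errors are not computed here.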