
Yale SOM

MGT 403: Statistics


Practice Problem Set - P1-1
Introduction. You have been hired to study the evolution of executive compensation over time.
Specifically, you will examine how CEOs' salaries vary between different sectors and how they are related to a company's sales in the early 1990s. You receive data on a random sample of CEOs, which is contained
in ceosalary1.dta. Type describe to see the contents of this data set.
Question 1
(a) There are two hypotheses concerning CEO compensation in the early 1990s. One is that
average CEO salaries were at most $1,000,000. Another concerns the default belief that
average CEO salaries were actually $1,200,000. You want to test these two hypotheses. Note
that the data to test them is contained in the variable salary (which measures CEO salary
in $1000). Can you reject the null hypotheses, at the 5% level, implied by these tests?
To answer this question, write down the following steps for each test:
1. The null hypothesis
2. The alternative hypothesis
3. The formula for the realization of the test statistic
4. The rejection region: for which values of the test statistic you reject the null hypothesis
Now use the data to carry out the two tests.
First, do it manually by typing summarize salary or using the User Menu, Summarize
and Describe Data, Simple Summary Statistics (summarize) to input this command. Use
the result to calculate the realization (or outcome) for the test statistics.
Can you reject the null hypotheses? Why or why not?

Second, check your answer by conducting a test of means in Stata. You can use Simple
Test of Association, Test of Means in the Stata User menu.
What are the p-values for each of these two tests? Based on the p-value can you reject
the null hypothesis at the 5% level for each test? Explain why.
(b) Compute the 95% confidence interval for the unknown population mean of CEO salaries by
writing down the formula for the confidence interval and then
using the results from the command summarize salary to compute the confidence interval.
Does 1200 fall in the confidence interval?

Practice Problem Set - P1-2
The Internet portal Yahoo may allow its members to customize their start pages
(homepages). As part of a short survey regarding likes and dislikes, users were asked
about their interests in options such as QuickTime movie clips with daily news and
sports events on their pages. Yahoo hopes that QuickTime will entice users to follow
a larger number of hyperlinks so that it can attract more advertisers.
The newly customized page option was made available to 100 Internet users who
were randomly sampled from the target population. The benchmark for Yahoo is 6
non-Yahoo content links clicked by all its customers on average prior to the availability of the QuickTime option (during any one-week period).
After one week of access to the customized homepage option, Yahoo observes
the (average) number of non-Yahoo links for each customer. For the sample of 100
customers, the average is 7.8 links and the standard deviation is 9.5 links.
1. Test the two-tailed null hypothesis that the customization with QuickTime
does not alter the true average (benchmark) number of links at the 5% level
(critical value is 1.96). Specify the null and alternative hypotheses, compute the
value of the test statistic, and state whether or not you can reject the null
hypothesis and why.
2. Construct a 95 percent confidence interval for the true but unknown population
parameter. Interpret the resulting interval statistically and managerially.
3. Which of these two procedures is more informative, the test of the null hypothesis or the confidence interval? Explain.

Practice Problem Set P1-1 Answers
Question 1
(a) To test the research hypothesis that the mean of salary is at most (less than or equal to) 1000,
we have
1. The null hypothesis: \( H_0: \mu > 1000 \)
2. The alternative hypothesis: \( H_a: \mu \le 1000 \)
3. The formula for the realization of the test statistic: \( t = \dfrac{\bar{x} - 1000}{\sigma / \sqrt{N}} \)
4. The rejection region: reject if \( t < -1.65 \).

To test if the mean of salary is equal to 1200:
1. The null hypothesis: \( H_0: \mu = 1200 \)
2. The alternative hypothesis: \( H_a: \mu \neq 1200 \)
3. The formula for the realization of the test statistic: \( t = \dfrac{\bar{x} - 1200}{\sigma / \sqrt{N}} \)
4. The rejection region: reject if \( |t| > 1.96 \) (this is the same as saying that the rejection
region is \( t < -1.96 \) or \( t > 1.96 \)).
For the manual calculation of the realization of the test statistic we need the mean of
salaries in the sample, the standard deviation, and the number of observations. We get
all of these from Stata's summarize command.

. summarize salary

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
      salary |       206    1141.063     611.193        223       4143

Hence the realization or value for the test statistic for the first test equals
\[ t = \frac{1141.063 - 1000}{611.193 / \sqrt{206}} = 3.313. \]
The realization or value for the test statistic for the second test equals
\[ t = \frac{1141.063 - 1200}{611.193 / \sqrt{206}} = -1.384. \]

For the first test, since the value for t is 3.3, which is not smaller than -1.65, we cannot
reject the null hypothesis in favor of the alternative that the mean of salaries in the
population of CEOs is at most $1,000,000 at the 5% level. For the second test, since the
value for the test statistic t of -1.384 is not in the rejection region of t < -1.96 or t > 1.96,
we also cannot reject the null hypothesis that the mean of salaries in the population of
CEOs is equal to $1,200,000 at the 5% level.
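The manual calculation can also be reproduced outside Stata. The following is a minimal Python sketch, assuming only the summary statistics reported by summarize salary and the large-sample critical values (1.65 and 1.96) used in the text. The function name one_sample_t is introduced here for illustration and is not part of the problem set.

```python
import math


def one_sample_t(mean, sd, n, mu0):
    """t statistic for a hypothesized population mean mu0, from summary statistics."""
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error


# Summary statistics reported by `summarize salary` (salary is in $1000).
t_1000 = one_sample_t(1141.063, 611.193, 206, 1000)
t_1200 = one_sample_t(1141.063, 611.193, 206, 1200)

# Large-sample critical values used in the text.
reject_first = t_1000 < -1.65        # one-sided test, rejection region in the left tail
reject_second = abs(t_1200) > 1.96   # two-sided test

print(round(t_1000, 3), reject_first)
print(round(t_1200, 3), reject_second)
```

Neither statistic falls in its rejection region, which matches the conclusion reached by hand.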
We get the same results using the ttest command in Stata. Note that when Stata sets
"95% confidence level" as a default, it is just asking you if you would like to see the 95%
confidence interval for the unknown population mean of CEO salaries together with the
value for t.
. ttest salary == 1000

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  salary |     206    1141.063    42.58383     611.193    1057.105    1225.022
------------------------------------------------------------------------------
    mean = mean(salary)                                           t =   3.3126
Ho: mean = 1000                                  degrees of freedom =      205

    Ha: mean < 1000             Ha: mean != 1000               Ha: mean > 1000
 Pr(T < t) = 0.9995         Pr(|T| > |t|) = 0.0011          Pr(T > t) = 0.0005

. ttest salary == 1200

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  salary |     206    1141.063    42.58383     611.193    1057.105    1225.022
------------------------------------------------------------------------------
    mean = mean(salary)                                           t =  -1.3840
Ho: mean = 1200                                  degrees of freedom =      205

    Ha: mean < 1200             Ha: mean != 1200               Ha: mean > 1200
 Pr(T < t) = 0.0839         Pr(|T| > |t|) = 0.1679          Pr(T > t) = 0.9161

The p-value for the first test is 0.9995. Since the p-value is greater than 5% we cannot
reject the null hypothesis in favor of the alternative that average salaries in the
population of CEOs are, at most, $1,000,000 at the 5% level. The p-value for the second
test is 0.1679. Since the p-value is greater than 5% we also cannot reject the null
hypothesis that average salaries in the population of CEOs are $1,200,000, at the 5%
level.
(b) The formula for the 95% confidence interval for the mean of salary in our CEO population
is
\[ \left[ \bar{x} - 1.96 \frac{\sigma}{\sqrt{N}},\; \bar{x} + 1.96 \frac{\sigma}{\sqrt{N}} \right]. \]
With the results from summarize salary above we get
\[ \left[ 1141.063 - 1.96 \frac{611.193}{\sqrt{206}},\; 1141.063 + 1.96 \frac{611.193}{\sqrt{206}} \right] = [1057.6, 1224.5]. \]
Thus we are 95% confident that the true mean of CEO salaries lies in the interval [1057.6, 1224.5].
Note that the interval contains 1200 and is almost equal to the interval given to us
in the results for the command ttest salary == 1200. We would have obtained the
same values if we had done no rounding.
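As a cross-check of the interval arithmetic, here is a short Python sketch under the same assumptions as the formula above (the 1.96 large-sample critical value and the summary statistics from summarize salary). The helper name mean_confidence_interval is hypothetical.

```python
import math


def mean_confidence_interval(mean, sd, n, z=1.96):
    """Large-sample confidence interval for a population mean from summary statistics."""
    half_width = z * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# Summary statistics from `summarize salary` (salary is in $1000).
lower, upper = mean_confidence_interval(1141.063, 611.193, 206)
contains_1200 = lower <= 1200 <= upper

print(round(lower, 1), round(upper, 1), contains_1200)
```

The ttest output reports a slightly wider interval (1057.105 to 1225.022), consistent with Stata using a critical value a little larger than 1.96.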

Practice Problem Set - P1-2- Answers
1. Null hypothesis: \( H_0: \mu = 6 \)
Alternative hypothesis: \( H_a: \mu \neq 6 \)
Computing the value for the test statistic:
\[ t = \frac{\bar{x} - \mu}{\sqrt{\sigma^2 / N}} = \frac{7.8 - 6}{\sqrt{9.5^2 / 100}} = 1.89 \]
Given that t = 1.89 is not in the rejection region for a two-sided test at
the 5% level (t > 1.96 or t < -1.96), we cannot reject the null at
the 5% level.

2. The 95% confidence interval for the true but unknown mean of the number of
links is:
\[ 7.8 \pm 1.96 \cdot \frac{9.5}{10} = 7.8 \pm 1.9 = (5.9, 9.7) \]
Therefore, we can be quite sure or confident (95 percent) that the true but
unknown population mean is between 5.9 and 9.7 links.
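The test in part 1 and the interval in part 2 are built from the same ingredients, so they always agree: the two-sided test rejects the benchmark exactly when the benchmark lies outside the interval. The Python sketch below illustrates this with the numbers given in the problem; the function name z_test_and_interval is hypothetical, and 1.96 is the critical value stated in the question.

```python
import math


def z_test_and_interval(mean, sd, n, mu0, z=1.96):
    """Return the test statistic for H0: mu = mu0 and the matching confidence interval."""
    standard_error = sd / math.sqrt(n)
    t = (mean - mu0) / standard_error
    interval = (mean - z * standard_error, mean + z * standard_error)
    return t, interval


# Sample of 100 users: mean 7.8 links, standard deviation 9.5, benchmark 6.
t, (low, high) = z_test_and_interval(7.8, 9.5, 100, 6)

reject = abs(t) > 1.96
benchmark_outside = not (low <= 6 <= high)

print(round(t, 2), round(low, 1), round(high, 1))
print(reject, benchmark_outside)
```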
3. Neither is more informative than the other; it depends on the type of question
one is asking. The confidence interval gives us a range in which we are 95%
confident that the population mean falls. It is useful when we want to get
a sense of what the population mean could be. A hypothesis test, in contrast,
allows us to answer a different question: whether a specific hypothesis about
the population is true (supported by the data) or not.

Practice Problem Set P2-1
Introduction You have been asked to analyze the relationship between research and development
(R&D) spending and sales of firms in the chemical and telecommunications industries. You receive
data on a random sample of firms contained in the data set rd.dta. Type describe to see the
contents of the data set. The binary (dummy) variable chem is equal to one if the firm is in the
chemical industry and equal to zero if the firm is in the telecommunications industry.

Question 1
1. Run the regression of sales as a function of R&D.
2. Does the estimated coefficient suggest that sales and R&D spending are positively or negatively correlated?
3. By how much do sales increase or decrease on average when R&D spending increases by one
million dollars?
4. Is this effect significantly different from zero at the 5% level and why?
5. What is the interpretation of the estimate for the intercept parameter \( \beta_0 \)?

6. How much does the variation in R&D spending explain the variation in sales?

Question 2
After analyzing the relationship between prices you are asked how the returns of the DJIA and
GE are related: when one increases, does the other decrease or vice-versa? Or when one increases
does the other also increase? To investigate this question, you first have to generate the returns
using the User menu command Manipulate Variables and Obs, Generate New Variable and the
formula
\[ \text{return}_t = 100 \cdot \frac{\text{price}_t - \text{price}_{t-1}}{\text{price}_{t-1}}. \]
Since each observation in our data set represents one date and the observations are chronologically
sorted, we can implement this formula in Stata by 100 * (close_DJIA - close_DJIA[_n-1]) /
close_DJIA[_n-1], for example, for DJIA. Here [_n-1] means that we are taking the observation
from the previous period. Do the same for GE returns.
1. Plot the relationship between the return for the GE stock and the return for DJIA. Is the
relationship increasing or decreasing?
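2. We can define the beta of a given stock as \( \beta_a = \dfrac{\operatorname{cov}(\text{return}_a, \text{return}_p)}{\operatorname{var}(\text{return}_p)} \), where \( \text{return}_a \) and \( \text{return}_p \)
are the returns of the stock in question and the stock market index, respectively, and the
risk-free rate is constant over time. Given the previous plot, should the beta of the GE stock be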
free rate is constant over time. Given the previous plot, should the beta of the GE stock be
positive or negative? Explain why.

Practice Problem set P2-1 Answers
Question 1
1. The Stata command for the regression is regress sales rd, robust, which yields the following output:

Linear regression                               Number of obs     =         61
                                                F(1, 59)          =      42.28
                                                Prob > F          =     0.0000
                                                R-squared         =     0.7971
                                                Root MSE          =       3542

------------------------------------------------------------------------------
             |               Robust
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          rd |   18.00799   2.769413     6.50   0.000     12.46641    23.54957
       _cons |   1040.966    397.866     2.62   0.011     244.8379    1837.094
------------------------------------------------------------------------------

2. The estimated parameter \( \beta_1 \) equals about 18, a positive number, which suggests that R&D
spending and sales are positively correlated.
3. When R&D spending increases by one million dollars, predicted sales increase by 18 million
dollars.
4. This effect is significantly different from zero at the 5% level since the value of the t-statistic
reported in the regression output is equal to 6.5, which is in the rejection region of t < -1.96
or t > 1.96. Accordingly, the p-value is lower than 5% at nearly zero.
5. The estimate for the intercept, \( \beta_0 \), is 1041, which implies that predicted sales are 1041 million
dollars when R&D spending equals zero.
6. The R-squared value tells us that variation in R&D spending explains about 80% of the
variation in sales. The fit is pretty high.
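The t column in the regression output is the coefficient divided by its standard error. A minimal Python sketch, assuming the coefficient and robust standard error values printed in the output above (the function name coefficient_t is introduced here for illustration):

```python
def coefficient_t(coef, std_err):
    """t statistic for the null hypothesis that a regression coefficient equals zero."""
    return coef / std_err


# Values copied from the `regress sales rd, robust` output.
t_rd = coefficient_t(18.00799, 2.769413)
t_cons = coefficient_t(1040.966, 397.866)

# Large-sample two-sided test at the 5% level.
significant_rd = abs(t_rd) > 1.96

print(round(t_rd, 2), round(t_cons, 2), significant_rd)
```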

Question 2
First generate the returns for DJIA and GE using the following commands:
generate return_DJIA = 100 * (close_DJIA - close_DJIA[_n-1]) / close_DJIA[_n-1]
generate return_GE = 100 * (close_GE - close_GE[_n-1]) / close_GE[_n-1]
1. We plot the returns of GE against the returns of DJIA using the Stata command twoway
(scatter return_GE return_DJIA). See Figure 3.

Figure 1: Question I.4


2. From the graph we see that the relationship between GE returns and DJIA returns is positive: higher DJIA returns are associated with higher GE returns. Thus the covariance in this
sample between GE and DJIA returns is positive. Thus the numerator for beta, measuring
the covariance between GE returns and DJIA returns is positive. A variance is never negative
(recall that a variance involves sums of squared terms, and squared terms are always nonnegative), so the denominator for beta is positive. This implies that the beta of the GE stock in
this sample is positive.
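The sign argument can be illustrated numerically. The Python sketch below computes percentage returns and a sample beta as covariance over variance. The prices are hypothetical values invented for illustration, not taken from the course data set, and the function names are assumptions. Unlike the Stata generate command, which leaves the first return missing, the list here simply has one fewer element than the price series.

```python
def percent_returns(prices):
    """Percentage change between consecutive prices, as in the return formula."""
    return [100 * (cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def beta(stock_returns, market_returns):
    """Sample covariance of the two return series divided by the market variance."""
    n = len(market_returns)
    mean_stock = sum(stock_returns) / n
    mean_market = sum(market_returns) / n
    cov = sum((s - mean_stock) * (m - mean_market)
              for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var = sum((m - mean_market) ** 2 for m in market_returns) / (n - 1)
    return cov / var


# Hypothetical closing prices, invented for illustration (not from the data set).
close_market = [100.0, 102.0, 101.0, 104.0]
close_stock = [50.0, 52.0, 51.0, 54.0]

return_market = percent_returns(close_market)
return_stock = percent_returns(close_stock)
beta_stock = beta(return_stock, return_market)

# The two series move together, so the covariance and hence beta are positive.
print(beta_stock > 0)
```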

Practice Problem Set P2-2
A consulting firm wants to get a better understanding of its cost structure based
on data on costs incurred for projects in the past so as to improve its bidding process
for projects. Experience suggests that there are two main components of costs in a
project: (1) variable costs that are directly related to the size of the project, which
is reasonably proxied by the number of person-hours for the project, and (2) fixed
costs, which are incurred irrespective of the size of the project.
A regression of the total costs (in $) against the number of person-hours based
on data on 42 projects gave the following results:

Linear regression                               Number of obs     =         42
                                                F(1, 40)          =      157.8
                                                Prob > F          =      0.000
                                                R-squared         =       0.87
                                                Root MSE          =       2979

------------------------------------------------------------------------------
             |               Robust
   totalcost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Person-hours |     372.15     29.629     12.6   0.000        311.0       433.3
       _cons |    3209.76   1387.962     2.31   0.030        345.1      6074.4
------------------------------------------------------------------------------

1. Test the null hypothesis that the slope parameter is zero. State the hypotheses
in appropriate symbols, state the p-value, and interpret the result.
2. Define the 95 percent confidence interval for the true slope parameter and
interpret this interval.
3. Assuming that the equation is a reasonable approximation of the nature of
project costs, interpret the slope coefficient precisely in a manner understandable by a layperson.
4. What is the best estimate of fixed costs?
5. What is the predicted total cost for a project that will employ 1,000 person-hours?

Practice Problem Set P2-2-Answers
1. \( H_0: \beta_1 = 0 \)
\( H_a: \beta_1 \neq 0 \)
The p-value for the test is 0.000.
Since the p-value is less than 5%, we can reject the null hypothesis at the
5% level.
We can therefore state that there is a relationship between the total cost
of a project and the number of person-hours required to complete the
project in our sample, and that this relationship is highly significant (given
our data, it is very unlikely that the relationship does not exist).
2. The 95% confidence interval for the slope parameter is (311.0, 433.3). We
can state with 95% confidence that each additional person-hour raises the
total cost of a project by somewhere between $311.0
and $433.3.
3. If the number of person-hours for a project increases by one hour, the total cost
of the project is expected to increase by $372.
4. The best estimate of fixed costs is given by the intercept: the cost of the
project when the number of person-hours is zero. This cost is $3,210.
5. The predicted total cost for a project that will employ 1,000 person-hours is
3,210 + 372 × 1,000 = $375,210.
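The prediction is the fitted line evaluated at 1,000 person-hours. The Python sketch below, with a hypothetical helper named predicted_cost, shows the calculation with the rounded coefficients used in the answer and, for comparison, with the unrounded coefficients from the regression output.

```python
def predicted_cost(person_hours, intercept, slope):
    """Predicted total cost in dollars from the fitted regression line."""
    return intercept + slope * person_hours


# Rounded coefficients, as used in the answer above.
rounded = predicted_cost(1000, intercept=3210, slope=372)

# Unrounded coefficients, as printed in the regression output.
unrounded = predicted_cost(1000, intercept=3209.76, slope=372.15)

print(rounded, round(unrounded, 2))
```

Rounding the slope to $372 before multiplying by 1,000 shifts the prediction by about $150, so the unrounded coefficients are preferable when precision matters.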

Practice Problem Set P3-1
Jim Douglas, the manager of Colonial Furniture, has been reviewing weekly advertising expenditures. All of his advertising thus far has been focused on radio.
He is interested in learning how the effect of advertising might differ across different
media. He recorded the following variables:
Sales: Number of customers in each week (individuals visiting an outlet)
# Ads: The number of ads in the week
Medium (1=radio, 2=television).
1. Jim recalled from a class he had taken that regression analysis could be used to
estimate the effects of the different media. He proposed the following regression
model:
\( \text{Sales} = \beta_0 + \beta_1 \text{Ads} + \beta_2 \text{Medium} \)
and he seeks your advice. Would you propose an alternative model? If so, explain the problem with Jim's model. Write out your proposed model explicitly
in the form of an equation.
2. Jim then created one indicator variable: Radio (1 if radio, 0 otherwise, that is,
television). He then ran a regression of Sales against #Ads and Radio. The
results are reported below:

Linear regression                               Number of obs     =         52
                                                F(2, 49)          =      14.91
                                                Prob > F          =     0.0000
                                                R-squared         =       0.69
                                                Root MSE          =      44.87

------------------------------------------------------------------------------
             |               Robust
       sales |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         Ads |         25       3.98     6.34    0.00        17.23       33.23
       Radio |        -47      16.44    -2.83    0.01       -79.64      -13.53
       _cons |        283      17.46    16.19    0.00       247.50      317.69
------------------------------------------------------------------------------

Interpret the effect for Radio precisely.
3. What is the effect of a 1 unit increase in the number of ads? State this result
precisely.
4. What is the 95% prediction interval for sales when the company airs 50 ads on
television?

Practice Problem Set P3-1-Answers
1. Medium is a categorical variable (with values 1 or 2), so I would not include
it directly. I would create a dummy variable for the Medium category, for
example, Radio (1 if radio; 0 otherwise, that is, TV).
I would then estimate the model (treating TV as the base category):
\( \text{Sales} = \beta_0 + \beta_1 \text{Ads} + \beta_2 \text{Radio} \)
2. For any given level of advertising (number of ads), radio ads are expected to
produce 47 fewer customers relative to ads shown on TV.
3. An increase in the number of ads per week by 1 is expected to increase the
number of customers per week by 25, holding constant the medium through which
the ads are transmitted.
4. To answer this question, we first need to compute the predicted sales value
when airing 50 ads on television.
Sales = 283 + 25*50 + (-47)*0 = 1,533 customers
Then the 95% prediction interval for sales is:
1,533 ± 1.96 × 44.87 = (1,445.1; 1,620.9)
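The two steps in answer 4 can be sketched in Python as follows. This assumes the coefficients and Root MSE reported in the regression output and the same approximation as the formula above (prediction plus or minus 1.96 times the Root MSE, which ignores the estimation uncertainty in the coefficients themselves). The function names are hypothetical.

```python
def predict_sales(ads, radio):
    """Predicted weekly customers; radio is 1 for radio and 0 for television."""
    return 283 + 25 * ads - 47 * radio


def prediction_interval(prediction, root_mse, z=1.96):
    """Approximate 95% prediction interval using the regression's Root MSE."""
    half_width = z * root_mse
    return prediction - half_width, prediction + half_width


tv_sales = predict_sales(50, radio=0)
low, high = prediction_interval(tv_sales, 44.87)

print(tv_sales, round(low, 1), round(high, 1))

# The same number of ads on radio is predicted to bring in 47 fewer customers.
print(predict_sales(50, radio=1))
```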

Practice Problem Set P3-2
Data Set and Questions
You have been hired to investigate the relationship between individuals' physical
attractiveness and their wage. You receive the data set beauty3.dta, which contains
data on the wage and other characteristics, such as education and years of experience,
for a random sample of individuals.
The data set also contains the variable looks, which measures a given individual's
subjective physical attractiveness. The variable looks encompasses five categories,
where 5 denotes the highest level of attractiveness and 1 denotes the lowest level of
attractiveness. The binary zero-one variable belavg is derived from looks: belavg
is equal to 1 if looks equals 1 or 2 and 0 otherwise.
1. By how much more/less do individuals with below average looks earn per hour,
on average, relative to individuals with average/above average looks? Run the
appropriate regression and answer the question.
2. Is the above estimate significant at the 5% level?
3. Does experience attenuate the "looks" advantage? Run the appropriate regression and answer the question.
4. How much does the variation in looks and experience explain the variation in
wages?

Statistics Practice PS 3-2 Answers
1. Regression: The Stata command is regress wage belavg, robust, which
yields the following output:

Linear regression                               Number of obs     =       1259
                                                F(1, 1257)        =      12.84
                                                Prob > F          =     0.0004
                                                R-squared         =     0.0076
                                                Root MSE          =     4.1905

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      belavg |  -1.118143   .3120741    -3.58   0.000    -1.730386   -.5058995
       _cons |   6.387627    .128631    49.66   0.000     6.135272    6.639982
------------------------------------------------------------------------------

Individuals with below-average looks earn, on average, $1.12 less per hour than individuals with average or above-average looks.


2. The coefficient on belavg of -1.12 is significant at the 5% level since the value
of the t-statistic, -3.58, is lower than the critical value of -1.96, which implies
that one can reject the null hypothesis at the 5% level. Recall that the rejection
region for a large-sample two-sided test that each of the regression coefficients
is equal to zero, at the 5% level, is t < -1.96 or t > 1.96.

3. regress wage belavg exper, robust

Linear regression                               Number of obs     =       1259
                                                F(2, 1256)        =      59.86
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0831
                                                Root MSE          =     4.0297

------------------------------------------------------------------------------
             |               Robust
        wage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      belavg |  -1.270895     .29984    -4.24   0.000    -1.859137   -.6826524
       exper |   .0966187   .0093563    10.33   0.000      .078263    .1149743
       _cons |   4.646653   .1693257    27.44   0.000     4.314461    4.978845
------------------------------------------------------------------------------

It seems that experience does not attenuate the advantage of looks. Holding
experience constant, those with below-average looks earn 1.27 dollars less per hour
than those with average or above-average looks. Further, this coefficient is statistically
significant at the 5% level as the p-value of 0.000 is less than 5%.
4. The variation in looks and experience explains only about 8% of the variation in wages.

Sample Exam Questions
Administrative Details
This final is open book. You can consult your class notes, problem set solutions
and other materials. But you cannot discuss the exam with anyone; doing so
constitutes a violation of the honor code. Show all your work, including all the Stata
output relevant to answering the questions. (Though all the sample questions already show the Stata output, you will have to create your
own Stata output when answering the questions in the exam.)

Sample Exam Question 1

The Nielsen Media organization conducts tests of commercials in its laboratories.
The firm regularly invites members of identified target markets to its premises. Attendees are shown one or more television programs in which commercials are embedded, and asked questions about products and other aspects both before and after
they view programs.
Each study is typically sponsored by a single company such as Procter & Gamble
(P&G). On November 29, 2002, Nielsen Media Research did a study on a brand
that was not performing well in the market. P&G was interested in whether new
commercials it proposes to air might change target members' preferences for the
brand.
A total of 32 consumers participated in the study. They first provided preference
and perception data on multiple brands. Then they watched two TV programs
with a standard number of commercials. Thereafter they provided preference and
perception data on some of the same brands and other brands. (Researchers also
obtained brain scanner analyses based on principles of neuromarketing but those
data are ignored here.)
The data of interest pertain to brand X on which consumer preferences were
obtained both before and after the TV programs (with relevant commercials on brand
X as part of the TV program). The rating was on a 5 point scale, where 5=great
and 1=lousy. The sample data about the preferences of the brand are summarized
below:
1. You have taken a regression course and want to use this "promising analytical
technique". You create a dependent variable with 32 "before" and 32 "after"
preference scores for brand X. The regression includes only one dummy (indicator) variable, AFTER, to distinguish the two categories of observations:
AFTER = 1 if after, 0 if before.

Linear regression                               Number of obs     =         64
                                                F(1, 62)          =       1.89
                                                Prob > F          =       0.17
                                                R-squared         =       0.18
                                                Root MSE          =       1.53

------------------------------------------------------------------------------
             |               Robust
     ratings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       After |      0.551      0.401    1.375   0.175       -0.251       1.355
       _cons |      2.068      0.283    7.293   0.000        1.500       2.637
------------------------------------------------------------------------------

What is the average preference in the sample for brand X before the TV programs?
2. What is the average preference in the sample for brand X after the TV programs?
3. Based on the above regression, do the ads for brand X have a statistically
significant effect on the average preference for the brand in the target market?
Be precise and show relevant numbers.
4. Your exposure to regression analysis suggests that it may be useful to include
other variables so as to improve the understanding of effects of interest. So
you decide to add two independent variables: PPur = 1, if the consumer has
purchased the product in the past, = 0 if not; Male = 1 if male, = 0 if female.

Linear regression                               Number of obs =      64
                                                F(  1,    60) =    35.5
                                                Prob > F      =    0.00
                                                R-squared     =    0.81
                                                Root MSE      =    0.92

------------------------------------------------------------------------------
           |               Robust
   ratings |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------+------------------------------------------------------------------
     After |      0.521      0.240    2.170   0.030       0.051       0.991
      PPur |      2.452      0.254    9.649   0.000       1.942       2.961
      Male |     -0.047      0.201   -0.234   0.815      -0.441       0.347
     _cons |      0.742      0.237    3.126   0.002       0.266       1.218
------------------------------------------------------------------------------

Based on the above analysis, do the ads have a statistically significant effect
on the average preference for the brand in the target market? Be precise and
show relevant numbers.
5. Is your conclusion in (4) different from your conclusion in (3)? Explain the
difference, if any. Relate the idea of controlling for other variables (that is,
adding more relevant variables to the model or holding constant these other
relevant variables) to the difference between the test you did in (3) and in (4).

Sample Exam Question 2


Investment Bankers earn large fees for making arrangements and giving advice relating to mergers and acquisitions (M&A) when one firm joins with or purchases
another.
Consider the following regression on the total dollar amount of M&A activity
against the number of deals of the top 15 major firms in this industry.
Dependent Variable: Total M&A Volume (in millions of dollars) for a firm
Independent Variable: Number of Deals for the corresponding firm
Below is the regression output:

Linear regression                               Number of obs =      36
                                                F(  1,    34) =    19.9
                                                Prob > F      =   0.000
                                                R-squared     =   0.604
                                                Root MSE      =   12286

------------------------------------------------------------------------------
           |               Robust
 M&AVolume |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------+------------------------------------------------------------------
     Deals |    269.660     60.512    4.456   0.000     138.932     400.389
     _cons |   1461.941   5737.309    0.254   0.802    -10932.8   13856.640
------------------------------------------------------------------------------

For all questions, assume that the linearity assumption holds.


1. Does the regression equation have significant explanatory power? Be precise
(use a specific result to explain).

2. How much of the variation in M&A volume across these firms is explained by
the number of deals that each firm handles?
3. What is the marginal increase in M&A volume attributable to an additional
deal that a firm makes (or what is the predicted difference in M&A volume
between firm B and firm A, if B has one more deal than A)? Be precise and
state the units (e.g. billions? millions?).
4. Your firm wants to be among the top players in this industry next year with
100 deals. Assuming that the estimated relationship applies to next year, what
is your best estimate of M&A volume for your firm if it achieved its goal next
year? Again, be sure to state the units (e.g. millions or billions). Note: The
question asks for your best estimate of the predicted value; so whether or not
something is statistically significant is irrelevant to this question.

Sample Exam Question 3


The movie-v2.dta dataset contains a sample of movies shown on U.S. movie screens
between 1985 and 2001. It contains the title of the movie, the year of its premiere,
the number of screens per week, the movie's total U.S. box office (revenue from ticket
sales), a binary variable indicating whether the movie was produced in the U.S. or
not, and the movie's production budget, among other information.
Type describe to see the contents of the data set.
1. Run the regression of box office onto budget and whether the movie was produced in the U.S.
2. There is a claim that movies with larger budgets generate bigger box office
revenues. Holding the location of production constant, how does an increase
of one thousand dollars in budget change the movie's box office?
3. What is the predicted box office for a movie with a 50 million dollar budget
produced in the US?
4. What is the predicted box office for a movie with the same budget produced
outside the US?
5. How much of the variation in box office is explained by the production budget
and whether a movie is produced in the US?

Yale SOM
MGT 403: Statistics
Sample Exam Questions-Answers
Administrative Details
This final is open book. You can consult your class notes, problem set solutions
and other materials. But you cannot discuss the exam with anyone; doing so
constitutes a violation of the honor code. Show all your work, including all the Stata
output relevant to answering the questions.¹

Sample Exam Question 1


1. The average preference for brand X before TV programs=2.07 (intercept)
2. The average preference for brand X after TV programs=2.07+0.55=2.62
3. No, the p-value of the difference is 0.18, which is greater than 0.05. Therefore
we cannot reject the null that the difference of 0.55 (After-Before Advertising)
is equal to zero.
4. Yes, now they do.
The null is $H_0: \beta_{After} = 0$
The alternative is $H_a: \beta_{After} \neq 0$
The p-value is 0.03 < 0.05, so we can reject $H_0$. That is, we can reject the null
that the ads do not affect the average preference for the brand.
5. Yes, it is different. The reason is that there are other characteristics, such as
past purchase, that explain a difference in preferences between consumers. By
including such variables in a regression, we have a better chance of learning
the impact of ads on brand preference.
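
The arithmetic behind answers 1 to 4 can be checked with a short sketch. Python is used here only as a calculator (the course itself uses Stata), the variable names are illustrative, and the coefficients and robust standard errors are copied from the two regression outputs in the question.

```python
# Coefficients and robust standard errors copied from the Stata output
# shown in Sample Exam Question 1 (variable names here are illustrative).
cons_simple, after_simple, se_after_simple = 2.068, 0.551, 0.401
after_ctrl, se_after_ctrl = 0.521, 0.240

# With a single dummy regressor, the intercept is the "before" sample mean
# and intercept + slope is the "after" sample mean.
mean_before = cons_simple
mean_after = cons_simple + after_simple

# t statistic = coefficient / standard error, for each specification.
t_simple = after_simple / se_after_simple
t_ctrl = after_ctrl / se_after_ctrl

print(f"mean before = {mean_before:.2f}, mean after = {mean_after:.2f}")
print(f"t without controls = {t_simple:.2f}, with controls = {t_ctrl:.2f}")
```

The estimated effect of the ads barely moves between the two specifications (0.551 versus 0.521). What changes is the standard error, which falls from 0.401 to 0.240 once past purchase and gender are held constant, and that is what pushes the t statistic past the 5% threshold.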

Sample Exam Question 2


1. The p-value of the F-statistic is 0.000 (Stata rounded it to zero), which is less
than 0.01 or any reasonable type I error probability. Hence the regression is
highly significant and has significant explanatory power.
2. 60.4% of the variation in M&A Volume is explained by the number of deals (as
seen in the R-square).
3. The slope coefficient is 269.66. Therefore, the marginal increase in M&A volume attributable to an additional deal is $269.66 million.
4. 1,462+269.66*100=$28,428 million or $28.428 billion.
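
As a cross-check on answers 1, 3 and 4, the sketch below recomputes the prediction from the reported coefficients. It is only a calculator-style illustration in Python, not part of the original solution. The function name `predicted_volume` is hypothetical, and the inputs are copied from the Stata output in the question.

```python
# Values copied from the Stata output for the M&A regression.
# M&A volume is measured in millions of dollars.
CONS = 1461.941
SLOPE = 269.660
SE_SLOPE = 60.512


def predicted_volume(deals: float) -> float:
    """Fitted M&A volume (in $ millions) for a given number of deals."""
    return CONS + SLOPE * deals


volume_millions = predicted_volume(100)
volume_billions = volume_millions / 1000

# t statistic on the slope; with a single regressor, the reported
# F statistic equals the square of this t statistic.
t_slope = SLOPE / SE_SLOPE
f_from_t = t_slope ** 2

print(f"{volume_millions:,.0f} million = {volume_billions:.3f} billion")
print(f"t = {t_slope:.3f}, t squared = {f_from_t:.1f}")
```

Keeping the units in one place (millions) and converting only at the end avoids the most common mistake on this kind of question, which is mixing millions and billions inside the formula.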

Sample Exam Question 3


1. The regression is implemented as follows:
. regress boxoffice budget usa, robust

Linear regression                               Number of obs     =        944
                                                F(2, 941)         =      65.80
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3153
                                                Root MSE          =      48604

------------------------------------------------------------------------------
             |               Robust
   boxoffice |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      budget |   1.032913   .1143477     9.03   0.000     .8085068    1.257319
         usa |   23697.24   4742.385     5.00   0.000     14390.36    33004.11
       _cons |  -10530.44   4557.711    -2.31   0.021     -19474.9   -1585.987
------------------------------------------------------------------------------

2. Holding the location of production constant, an increase of one thousand dollars
in budget increases the movie's box office by 1.03 thousand dollars.
3. The predicted box office for a movie with a budget of 50 million dollars
produced in the US is:
$-10{,}530.4 + 1.03 \times 50{,}000 + 23{,}697.2 \times 1 = 64{,}666.8$ thousand.
That is 64.6668 million dollars.

4. The predicted box office for a movie with the same budget produced outside
the US is:
$-10{,}530.4 + 1.03 \times 50{,}000 + 23{,}697.2 \times 0 = 40{,}969.6$ thousand.
That is 40.9696 million dollars.

5. We can see by the R-squared that 31.5% of the variation in box office is explained by the production budget and whether a movie is produced in the
US.
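
The two predictions in answers 3 and 4 differ only in the value of the usa dummy, which a small sketch makes explicit. This is a hedged illustration in Python rather than the course's Stata workflow. The function name `predicted_box_office` is hypothetical, the coefficients are the rounded values used in the written answers, and all amounts are in thousands of dollars as in the answers.

```python
# Coefficients rounded as in the written answers; all money amounts are
# in thousands of dollars, matching the units used in the answers.
CONS = -10_530.4
B_BUDGET = 1.03
B_USA = 23_697.2


def predicted_box_office(budget_thousands: float, usa: int) -> float:
    """Fitted box office (in $ thousands); usa is 1 for US-produced, else 0."""
    return CONS + B_BUDGET * budget_thousands + B_USA * usa


us_movie = predicted_box_office(50_000, usa=1)
foreign_movie = predicted_box_office(50_000, usa=0)

print(f"US: {us_movie:,.1f} thousand; non-US: {foreign_movie:,.1f} thousand")
print(f"gap: {us_movie - foreign_movie:,.1f} thousand")
```

The gap between the two predictions equals the coefficient on usa, whatever the budget, because the model has no interaction between budget and production location.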

MGT 403 Statistics PRACTICE PROBLEMS

MGT 403: Probability Modeling and Statistics


STATISTICS: PRACTICE PROBLEMS
This is a PRACTICE PROBLEM SET. You do NOT need to turn it in. It is optional,
for students who would like a little more experience solving problems. Solutions will be
posted.
There are 3 QUESTIONS.

Question 1
The Internet portal Yahoo is considering allowing its members to customize their start
pages (homepages). As part of a short survey regarding likes and dislikes, users were asked
about their interests in options such as QuickTime movie clips with daily news and sports
events on their pages. Yahoo hopes that QuickTime will entice users to follow a larger
number of hyperlinks so that it can attract more advertisers.
The newly customized page option with QuickTime links was made available to 100 Internet users who were randomly sampled from the target population. The prior benchmark
for Yahoo has been 6 non-Yahoo content links clicked on average by its members per visit.
It collected data on the 100 users over 1 week to see if the availability of the QuickTime
link options significantly changes the average non-Yahoo links clicked per visit.
Findings: After one week of access to the new customized homepage option with QuickTime links, Yahoo observes that the average number of non-Yahoo links for each customer
in the sample per visit is 7.8 links and the standard deviation is 9.5 links.
Answer the following:
(i). Draw a graph and test the Null Hypothesis that the customization with QuickTime
does NOT alter the average number of non-Yahoo links clicked. Use the customary
95% Confidence Interval (t critical value is 1.96). State whether you reject the null
hypothesis or not. Also compute the t statistic.
(ii). Draw a new graph and show the 95% Confidence Interval for the estimated mean
number of non-Yahoo links clicked in the sample of 100 customers. Be precise about
where, numerically, the boundaries of the Confidence Interval lie. Does the
Confidence Interval include the previous benchmark average of 6 or not? How does your
answer to this last question relate to your answer to (i)?
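
The numerical part of (i) and (ii) can be sketched as follows. This is an illustrative calculation in Python under the question's own assumptions (critical value 1.96, null value equal to the benchmark of 6). The variable names are not from the original, and the graphs still have to be drawn by hand.

```python
import math

n = 100            # sample size
sample_mean = 7.8  # average non-Yahoo links clicked per visit
sample_sd = 9.5    # sample standard deviation
benchmark = 6.0    # null-hypothesis value (the prior benchmark)
critical = 1.96    # critical value stated in the question

# Standard error of the sample mean, then the t statistic against the null.
std_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - benchmark) / std_error

# 95% confidence interval centred on the sample mean.
ci_low = sample_mean - critical * std_error
ci_high = sample_mean + critical * std_error

reject_null = abs(t_stat) > critical
benchmark_inside = ci_low <= benchmark <= ci_high

print(f"t = {t_stat:.3f}, CI = [{ci_low:.3f}, {ci_high:.3f}]")
print(f"reject null: {reject_null}, benchmark inside CI: {benchmark_inside}")
```

The two checks always agree: the null is rejected at the 5% level exactly when the benchmark falls outside the 95% interval, which is the link between (i) and (ii) that the question asks about.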


Question 2
You have been hired to study executive compensation patterns. Your current project examines CEO salaries in the 1990s. You are curious whether some of the popular statements
about high CEO salaries during this time period are correct. You have collected data on
CEO salaries in the 90s - the data is in the STATA dataset ceosalary.dta (available on the
class website on Canvas).
A widely read commentator of the time is known to have stated that average CEO compensation in the 90s (your sample period) was $1.2 million. You want to test this hypothesis.
(i). Carry out the appropriate t test in STATA, just as we did in class and you did in
Problem Set 1, Question 2. What is the t value? Is the Null Hypothesis rejected or
not? What is the p value?

(ii). You can also carry out this kind of test manually in STATA. To do this, run the
command summarize salary from the command line. This will show you the mean
of salary as well as its standard deviation. To proceed, assume the distribution
for salary is a Normal distribution. Now compute the standard deviation of the
Test Statistic, which is the average over the observations. To do this, recall that the
standard deviation for the Test Statistic is:

$$\hat{\sigma}_{\bar{X}} = \frac{\hat{\sigma}}{\sqrt{N}}$$

where $\hat{\sigma}$ is the estimated standard deviation of the underlying variable, and N is the size
of the sample. Once you compute this, go out $1.96\,\hat{\sigma}_{\bar{X}}$ in either direction from the
sample mean to construct the Confidence Interval. Then check whether the Null
Hypothesis value lies inside the Confidence Interval or not.
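
The manual procedure in (ii) can be sketched from raw observations as below. This is a minimal illustration in Python, not the Stata workflow the question asks for. The helper name `mean_confidence_interval` is hypothetical, and the salary values are invented for illustration only; they are not taken from ceosalary.dta, so the printed numbers say nothing about the real answer.

```python
import math
import statistics


def mean_confidence_interval(values, critical=1.96):
    """Return (mean, standard error, lower bound, upper bound)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD with an n - 1 denominator
    se = sd / math.sqrt(n)         # standard deviation of the sample mean
    return mean, se, mean - critical * se, mean + critical * se


# Hypothetical salaries in $1000s, invented for illustration only;
# they are NOT taken from ceosalary.dta.
hypothetical_salaries = [900, 1100, 1250, 1400, 1350]
null_value = 1200  # the commentator's claim, assuming salary is in $1000s

mean, se, low, high = mean_confidence_interval(hypothetical_salaries)
inside = low <= null_value <= high

print(f"mean = {mean:.1f}, se = {se:.1f}, CI = [{low:.1f}, {high:.1f}]")
print(f"null value inside CI: {inside}")
```

With the real data, the mean and standard deviation come straight from summarize salary, and only the last three steps (standard error, interval, comparison with the null value) need to be done by hand.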


Question 2 - Continued - Regression


Next we will run a regression to explore factors that may influence CEO Salaries. Run
a regression in STATA in which salary is the dependent variable and the independent
variables are: sales - sales of the company in the preceding few years; roe - return on
equity for the company in the preceding few years; indus - a dummy variable set to 1
if the company is in an industrial sector; finance - a dummy variable set to 1 if the
company is in the financial sector; and utility - a dummy variable set to 1 if the company
is a utility (often regulated). Interpret your results. Then run the model without sales.
What is strange (or hard to interpret) about these results compared to the results when
sales is included? How can we interpret this?


Question 3
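Consider the VERY small dataset that consists of 3 datapoints:
X1 = 1.0   Y1 = 200.0
X2 = 2.0   Y2 = 145.0
X3 = 3.0   Y3 = 20.0
Use the formulas:

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{N} (X_i - \bar{X})^2}$$

$$\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$$

$$\bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i \qquad \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$$

to compute $\hat{\beta}_0$ and $\hat{\beta}_1$. Then compute $resid_i$ for each datapoint and finally $R^2$.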
After you have done this calculation manually, enter these 3 datapoints into STATA (or
use the dataset Q3-practice, in which this has already been done for you, on the Canvas
website under STATS/FEINSTEIN/STATA Datasets). Run the regress command and check your work.
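
The hand calculation can also be checked with a short sketch that applies the least-squares formulas step by step. It is written in Python purely as a calculator and is not a replacement for the regress check. It assumes the three datapoints are (1, 200), (2, 145) and (3, 20), reading the third Y value from the question as 20.0, and the variable names are illustrative.

```python
# The three datapoints, assuming the third Y value is 20.0.
xs = [1.0, 2.0, 3.0]
ys = [200.0, 145.0, 20.0]
n = len(xs)

# Sample means of X and Y.
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Slope: sum of cross-deviations over sum of squared X deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
beta1 = s_xy / s_xx

# Intercept: the fitted line passes through the point of means.
beta0 = y_bar - beta1 * x_bar

# Residuals, then R-squared as 1 - SSR / SST.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]
ss_resid = sum(r ** 2 for r in residuals)
ss_total = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_resid / ss_total

print(f"beta1 = {beta1:.4f}, beta0 = {beta0:.4f}")
print("residuals:", [round(r, 4) for r in residuals])
print(f"R-squared = {r_squared:.4f}")
```

A quick sanity check on any hand calculation of this kind is that the residuals sum to zero, which always holds when the regression includes an intercept.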
