
Assignment

Eudora
February 9, 2016
Reading the CSV file. For a large metropolitan newspaper, information was obtained from a sample of
34 newspapers concerning their daily and Sunday circulations (in thousands).
NewsCirculation <- read.csv("NewsData.csv")
NewsCirculation
##                         Newspaper    Daily   Sunday
##  1                  Baltimore Sun  391.952  488.506
##  2                   Boston Globe  516.981  798.298
##  3                  Boston Herald  355.628  235.084
##  4             Charlotte Observer  238.555  299.451
##  5              Chicago Sun Times  537.780  559.093
##  6                Chicago Tribune  733.775 1133.249
##  7            Cincinnati Enquirer  198.832  348.744
##  8                    Denver Post  252.624  417.779
##  9            Des Moines Register  206.204  344.522
## 10               Hartford Courant  231.177  323.084
## 11              Houston Chronicle  449.755  620.752
## 12               Kansas City Star  288.571  423.305
## 13         Los Angeles Daily News  185.736  202.614
## 14              Los Angeles Times 1164.388 1531.527
## 15                   Miami Herald  444.581  553.479
## 16       Minneapolis Star Tribune  412.871  685.975
## 17     New Orleans Times-Picayune  272.280  324.241
## 18            New York Daily News  781.796  983.240
## 19                 New York Times 1209.225 1762.015
## 20                        Newsday  825.512  960.308
## 21             Omaha World Herald  223.748  284.611
## 22         Orange County Register  354.843  407.760
## 23          Philadelphia Inquirer  515.523  982.663
## 24               Pittsburgh Press  220.465  557.000
## 25             Portland Oregonian  337.672  440.923
## 26    Providence Journal-Bulletin  197.120  268.060
## 27 Rochester Democrat & Chronicle  133.239  262.048
## 28            Rocky Mountain News  374.009  432.502
## 29                 Sacramento Bee  273.844  338.355
## 30        San Francisco Chronicle  570.364  704.322
## 31        St. Louis Post-Dispatch  391.286  585.681
## 32         St. Paul Pioneer Press  201.860  267.781
## 33                  Tampa Tribune  321.626  408.343
## 34                Washington Post  838.902 1165.567

Question a:
Draw a scatter plot of Sunday circulation (vertical axis) versus Daily circulation (horizontal axis). Does the
plot indicate that there is a linear relationship between Daily and Sunday circulation? Why might you have
thought that there could be such a relationship before looking at the data?

plot(NewsCirculation$Daily, NewsCirculation$Sunday)

[Figure: scatter plot of NewsCirculation$Sunday (vertical axis) versus NewsCirculation$Daily (horizontal axis)]
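The plot suggests a strong positive linear relationship between Daily and Sunday circulation: the points lie
close to a straight line, and newspapers with larger daily circulations also have larger Sunday circulations.
This could have been expected before looking at the data, because a reader who buys or subscribes to a
newspaper during the week is also likely to buy or subscribe to its Sunday edition.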
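To put a number on how linear the scatter plot looks, a minimal sketch (not part of the original assignment code) computes the Pearson correlation with base R's `cor()`. It assumes NewsData.csv is not at hand, so the hypothetical vectors `daily` and `sunday` are typed in from the printed table above (in thousands).

```r
# Hypothetical vectors typed in from the printed table (circulation in thousands),
# so the check runs without NewsData.csv
daily <- c(391.952, 516.981, 355.628, 238.555, 537.780, 733.775, 198.832,
           252.624, 206.204, 231.177, 449.755, 288.571, 185.736, 1164.388,
           444.581, 412.871, 272.280, 781.796, 1209.225, 825.512, 223.748,
           354.843, 515.523, 220.465, 337.672, 197.120, 133.239, 374.009,
           273.844, 570.364, 391.286, 201.860, 321.626, 838.902)
sunday <- c(488.506, 798.298, 235.084, 299.451, 559.093, 1133.249, 348.744,
            417.779, 344.522, 323.084, 620.752, 423.305, 202.614, 1531.527,
            553.479, 685.975, 324.241, 983.240, 1762.015, 960.308, 284.611,
            407.760, 982.663, 557.000, 440.923, 268.060, 262.048, 432.502,
            338.355, 704.322, 585.681, 267.781, 408.343, 1165.567)

# Pearson correlation: values near +1 indicate a strong positive linear association
r <- cor(daily, sunday)
print(round(r, 3))
```

A value close to +1 supports the linear reading of the plot; for a simple regression its square equals the R-squared of 0.9181 reported in question e.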

Question b:
Fit a regression predicting Sunday circulation from Daily circulation.
NewsCirculation.RegressionModel <- lm(Sunday ~ Daily, data = NewsCirculation)
NewsCirculation.RegressionModel
##
## Call:
## lm(formula = Sunday ~ Daily, data = NewsCirculation)
##
## Coefficients:
## (Intercept)        Daily
##       13.84         1.34

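From the result, the fitted line is Sunday = 13.84 + 1.34 × Daily. Since both variables are in thousands, each
additional 1,000 copies of daily circulation is associated with about 1,340 additional copies of Sunday
circulation on average.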
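The slope and intercept reported by `lm()` can be cross-checked against the closed-form least-squares formulas b1 = Sxy / Sxx and b0 = mean(Sunday) - b1 × mean(Daily). The sketch below is an illustration only: the vectors `daily` and `sunday` and the helper names (`x_bar`, `s_xy`, `s_xx`) are hypothetical, with values copied from the printed table.

```r
# Hypothetical vectors typed in from the printed table (thousands)
daily <- c(391.952, 516.981, 355.628, 238.555, 537.780, 733.775, 198.832,
           252.624, 206.204, 231.177, 449.755, 288.571, 185.736, 1164.388,
           444.581, 412.871, 272.280, 781.796, 1209.225, 825.512, 223.748,
           354.843, 515.523, 220.465, 337.672, 197.120, 133.239, 374.009,
           273.844, 570.364, 391.286, 201.860, 321.626, 838.902)
sunday <- c(488.506, 798.298, 235.084, 299.451, 559.093, 1133.249, 348.744,
            417.779, 344.522, 323.084, 620.752, 423.305, 202.614, 1531.527,
            553.479, 685.975, 324.241, 983.240, 1762.015, 960.308, 284.611,
            407.760, 982.663, 557.000, 440.923, 268.060, 262.048, 432.502,
            338.355, 704.322, 585.681, 267.781, 408.343, 1165.567)

x_bar <- mean(daily)
y_bar <- mean(sunday)
s_xy <- sum((daily - x_bar) * (sunday - y_bar))  # cross-deviation sum
s_xx <- sum((daily - x_bar)^2)                   # squared-deviation sum of Daily

b1 <- s_xy / s_xx         # slope: Sunday thousands per extra Daily thousand
b0 <- y_bar - b1 * x_bar  # intercept: fitted Sunday value at Daily = 0

print(round(c(intercept = b0, slope = b1), 5))

# Cross-check against lm() on the same vectors
fit <- lm(sunday ~ daily)
print(all.equal(unname(coef(fit)), c(b0, b1)))
```

The intercept (a fitted 13.84 thousand at a Daily circulation of 0) lies well outside the observed range of Daily circulation, so only the slope has a practical reading here.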

Question c:
Obtain 95% confidence intervals for β0 and β1.
confint(NewsCirculation.RegressionModel, level = 0.95)
##                  2.5 %    97.5 %
## (Intercept) -59.094743 86.766003
## Daily         1.195594  1.483836
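As a sketch of where the `confint()` numbers come from, each interval is estimate ± t(0.975, 32) × standard error, with the estimates and standard errors taken from the coefficient table printed by `summary()` in question d. The variable names are illustrative, and small differences in the last digits come from using the rounded printed values.

```r
n <- 34
# Estimate and Std. Error as printed by summary() (rounded)
estimate <- c(intercept = 13.83563, daily = 1.33971)
std_error <- c(intercept = 35.80401, daily = 0.07075)

t_crit <- qt(0.975, df = n - 2)  # two-sided 95% critical value on 32 df
lower <- estimate - t_crit * std_error
upper <- estimate + t_crit * std_error
print(cbind(lower, upper))
```

The interval for β1 excludes 0 while the interval for β0 includes 0, which matches the significance results in question d.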

Question d:
Is Daily circulation significantly related to Sunday circulation? Carefully state the hypotheses you are testing
here, and the test you are using.
summary(NewsCirculation.RegressionModel)
##
## Call:
## lm(formula = Sunday ~ Daily, data = NewsCirculation)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -255.19  -55.57  -20.89   62.73  278.17
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.83563   35.80401   0.386    0.702
## Daily        1.33971    0.07075  18.935   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109.4 on 32 degrees of freedom
## Multiple R-squared:  0.9181, Adjusted R-squared:  0.9155
## F-statistic: 358.5 on 1 and 32 DF,  p-value: < 2.2e-16

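To test whether Daily circulation is significantly related to Sunday circulation, the hypotheses are
H0: β1 = 0 (no linear relationship) versus H1: β1 ≠ 0. The t test for the slope gives t = 18.935 on 32
degrees of freedom with p < 2e-16, so H0 is rejected at any conventional significance level: Daily circulation
is significantly related to Sunday circulation. The F test leads to the same conclusion (F = 358.5 on 1 and
32 DF, p < 2.2e-16); with a single predictor it is equivalent to the t test, since F = t². The intercept β0, by
contrast, is not statistically significant (t = 0.386, p = 0.702).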
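The test statistics in the summary can be reproduced by hand. A minimal sketch, assuming the rounded Estimate and Std. Error values printed above: t = estimate / SE is compared with a t distribution on n - 2 = 32 degrees of freedom using base R's `pt()`, and with a single predictor the F statistic is t².

```r
n <- 34
df_resid <- n - 2

# Slope test, H0: beta1 = 0 vs H1: beta1 != 0
t_slope <- 1.33971 / 0.07075                 # t = estimate / SE
p_slope <- 2 * pt(-abs(t_slope), df_resid)   # two-sided p-value

# Same test for the intercept, H0: beta0 = 0
t_int <- 13.83563 / 35.80401
p_int <- 2 * pt(-abs(t_int), df_resid)

# With one predictor, the overall F statistic equals the slope's t squared
f_stat <- t_slope^2

print(c(t_slope = t_slope, p_slope = p_slope, t_int = t_int,
        p_int = p_int, F = f_stat))
```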

Question e:
What proportion of the variability in Sunday circulation is accounted for by Daily circulation?
From the summary in question d, the multiple R-squared is 0.9181, so Daily circulation accounts for about
91.8% of the variability in Sunday circulation.
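For a regression with one predictor, the F statistic and R-squared are linked by F = R² / ((1 - R²) / (n - 2)), so R-squared (and the adjusted R-squared) can be recovered from the F value in question d. This is only a consistency check on the printed output, using the rounded F of 358.5; the variable names are illustrative.

```r
n <- 34
f_stat <- 358.5                                # F-statistic printed by summary()
r2 <- f_stat / (f_stat + (n - 2))              # solve F = R2 / ((1 - R2) / (n - 2)) for R2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - 2)     # adjusted for the single predictor
print(round(c(r2 = r2, adj_r2 = adj_r2), 4))
```

The remaining share, about 8.2% of the variability in Sunday circulation, is not explained by Daily circulation.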

Question f:
Provide an interval estimate (based on a 95% level) for the true average Sunday circulation of newspapers
with a Daily circulation of 600,000.

DialyCirculation_Assume <- data.frame(Daily = c(600))


predict(NewsCirculation.RegressionModel, DialyCirculation_Assume, interval = "confidence", level = 0.95)
##        fit      lwr      upr
## 1 817.6645 772.3369 862.9921
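With a daily circulation of 600,000, the estimated mean Sunday circulation is about 817,665, and we are 95%
confident that the true average Sunday circulation of newspapers with this daily circulation lies between
about 772,000 and 863,000.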
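The confidence interval for a mean response at x0 follows fit ± t(0.975, n - 2) × s × sqrt(1/n + (x0 - x̄)² / Sxx). The sketch below assumes the rounded values printed earlier: s = 109.4 and SE(b1) = 0.07075 from the summary in question d (which give Sxx through SE(b1) = s / sqrt(Sxx)), and x̄ computed as the mean of the 34 Daily values in the table. It is a hand check, not a replacement for `predict()`, and rounding moves the last digits slightly.

```r
n <- 34
s <- 109.4                      # residual standard error from summary()
se_b1 <- 0.07075                # Std. Error of the Daily coefficient
s_xx <- (s / se_b1)^2           # Sxx recovered from SE(b1) = s / sqrt(Sxx)
x_bar <- 14652.724 / n          # sum of the 34 Daily values divided by n
x0 <- 600                       # Daily circulation of interest, in thousands
fit <- 13.83563 + 1.33971 * x0  # fitted mean Sunday circulation

se_mean <- s * sqrt(1 / n + (x0 - x_bar)^2 / s_xx)  # standard error of the mean response
t_crit <- qt(0.975, df = n - 2)
ci <- c(lwr = fit - t_crit * se_mean, upr = fit + t_crit * se_mean)
print(round(ci, 2))
```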

Question g:
The particular newspaper that is being considered as a candidate for a Sunday edition has a Daily circulation
of 600,000. Provide an interval estimate for the predicted Sunday circulation of this newspaper. How does
this interval compare to the interval in part (f)? From an intuitive point of view, why is this what you would
expect?
DialyCirculation_Assume <- data.frame(Daily = c(600))
predict(NewsCirculation.RegressionModel, DialyCirculation_Assume, interval = "predict", level = 0.95)
##        fit      lwr     upr
## 1 817.6645 590.2185 1045.11
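An interval estimate for the predicted Sunday circulation of this newspaper is between about 590,000 and
1,045,000.
This prediction interval for an individual newspaper with a Daily circulation of 600,000 is much wider than the
confidence interval in part (f) for the average of all newspapers with that Daily circulation (about 455,000
wide versus about 91,000), although both are centered on the same fitted value of about 817,665. This is what
you would expect: the interval in part (f) only reflects the uncertainty in estimating the mean, whereas a
single newspaper also varies around that mean, so its residual variability (a residual standard error of about
109,400) is added on top.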
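The difference between the two intervals is the extra 1 under the square root, which adds the residual variance of a single newspaper around the mean: the prediction half-width is t × s × sqrt(1 + 1/n + (x0 - x̄)² / Sxx). A minimal sketch, assuming s = 109.4 and SE(b1) = 0.07075 from the summary in question d and x̄ as the mean of the 34 Daily values in the table; variable names are illustrative.

```r
n <- 34
s <- 109.4                       # residual standard error from summary()
s_xx <- (s / 0.07075)^2          # Sxx recovered from SE(b1) = s / sqrt(Sxx)
x_bar <- 14652.724 / n           # mean of the 34 Daily values
x0 <- 600
fit <- 13.83563 + 1.33971 * x0
t_crit <- qt(0.975, df = n - 2)

half_mean <- t_crit * s * sqrt(1 / n + (x0 - x_bar)^2 / s_xx)      # mean response, part (f)
half_pred <- t_crit * s * sqrt(1 + 1 / n + (x0 - x_bar)^2 / s_xx)  # one newspaper, part (g)
pred_int <- c(lwr = fit - half_pred, upr = fit + half_pred)
print(round(pred_int, 1))
print(half_pred / half_mean)     # how many times wider the prediction interval is
```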

Question h:
Another newspaper being considered as a candidate for a Sunday edition has a Daily circulation of 2,500,000.
Provide an interval estimate for the predicted Sunday circulation of this newspaper. How does this interval
compare to the interval in part (g)? Why is this what you would expect?

NewNewsPaper_DailyCirculation <- data.frame(Daily = c(2500))


predict(NewsCirculation.RegressionModel, NewNewsPaper_DailyCirculation, interval = "predict", level = 0.95)
##        fit      lwr      upr
## 1 3363.123 2988.881 3737.364
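The predicted Sunday circulation of this new newspaper is between about 2,989,000 and 3,737,000.
This interval is wider than the one in part (g) for the single paper with a Daily circulation of 600,000 (about
748,000 wide versus about 455,000). This is expected because the width of a prediction interval grows with the
distance between the new Daily circulation and the sample mean Daily circulation (about 431,000), and
2,500,000 is much farther from that mean than 600,000 is. In addition, 2,500,000 is more than twice the highest
Daily circulation in the data set (1,209,225, the New York Times), so this is an extrapolation: the data cannot
confirm that the linear relationship still holds at such levels, and even this wider interval may understate the
true uncertainty.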
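The widening can be traced to the (x0 - x̄)² / Sxx term in the prediction half-width t × s × sqrt(1 + 1/n + (x0 - x̄)² / Sxx). A sketch, assuming the rounded summary values s = 109.4 and SE(b1) = 0.07075 and x̄ as the mean of the 34 Daily values in the table; `half_width()` is a hypothetical helper.

```r
n <- 34
s <- 109.4                       # residual standard error from summary()
s_xx <- (s / 0.07075)^2          # Sxx recovered from SE(b1) = s / sqrt(Sxx)
x_bar <- 14652.724 / n           # mean Daily circulation, about 431 thousand
t_crit <- qt(0.975, df = n - 2)

# Half-width of a 95% prediction interval at Daily circulation x0 (thousands)
half_width <- function(x0) t_crit * s * sqrt(1 + 1 / n + (x0 - x_bar)^2 / s_xx)

widths <- c(daily_600 = half_width(600), daily_2500 = half_width(2500))
print(round(widths, 1))

# Distance term on its own: small at 600, larger than the "1" term at 2,500
distance_term <- (c(600, 2500) - x_bar)^2 / s_xx
print(round(distance_term, 3))
```

The distance term is about 0.012 at 600 but about 1.79 at 2,500, so it dominates the width there. The formula still assumes the straight line holds at 2,500, which the sample cannot confirm.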