The assumption that there are no outliers is violated. The graph shows one observation
whose GDP is far higher than that of the others.
b. Construct the scatterplot and rerun the regression with the United States omitted.
Construct the residual plots for this new regression. Do the results appear any different
from the original regression results? If so, how do they differ? Do you prefer the original
results or the results with the United States omitted? On what do you base your choice?
Do there still appear to be problems with this regression?
Compared with the previous scatterplot, this graph shows no outliers. I prefer the original
regression: the coefficient of GDP is 0.10567 in the original regression but changes to 0.0763
when the United States is removed from the dataset, so this observation is influential and we
cannot simply remove it.
The plot above also suggests that heteroscedasticity may be present in this dataset.
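The effect of a single extreme observation on a fitted slope can be illustrated with a minimal sketch. The data below are synthetic and purely illustrative (they are not the GDP data from the assignment), and the helper name `ols_fit` is an assumption introduced here.

```python
import numpy as np

def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[0], beta[1]

# Synthetic data (assumption): 30 ordinary points with true slope 0.08.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=30)
y = 2.0 + 0.08 * x + rng.normal(0, 0.05, size=30)

# Add one observation far to the right that follows a steeper line.
x_all = np.append(x, 100.0)
y_all = np.append(y, 2.0 + 0.12 * 100.0)

b0_all, b1_all = ols_fit(x_all, y_all)   # fit with the extreme point
b0_sub, b1_sub = ols_fit(x, y)           # fit with the extreme point omitted

print(f"slope with extreme point:    {b1_all:.4f}")
print(f"slope without extreme point: {b1_sub:.4f}")
```

A large shift in the slope between the two fits is the kind of evidence used above to argue that the observation is influential.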
c. Try to develop a curvilinear model using the original data (with the United States included)
that provides improved results over the linear model. Be sure to examine the residual plots from
the curvilinear model to see if any regression assumptions are violated for this model.
We apply a log transformation to y and rerun the regression. The output below shows that
both coefficients are significant, whereas in the original regression the constant is
insignificant, with a p-value of 0.258. The S value also falls from 87.0031 in the original
model to 1.91 in the new model, although the two values are on different scales (original
units versus log units) and so are not directly comparable.
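As a rough sketch of the comparison, the code below fits a linear model and a log-response model to synthetic data (an assumption, not the assignment data) and computes the residual standard error S for each. Because S from the log model is in log units, the sketch also back-transforms the fitted values so that the two models can be compared in the original units of y. The helper name `fit_with_s` is hypothetical.

```python
import numpy as np

def fit_with_s(X, y):
    """Least-squares fit; returns coefficients and residual standard error S."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    return beta, float(np.sqrt(resid @ resid / dof))

# Synthetic data (assumption): y grows exponentially in x.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 40)
y = np.exp(0.5 + 0.4 * x + rng.normal(0, 0.1, size=x.size))
X = np.column_stack([np.ones_like(x), x])

beta_lin, s_lin = fit_with_s(X, y)           # S in original units of y
beta_log, s_log = fit_with_s(X, np.log(y))   # S in log units

# Back-transform the log-model fit to compare on the original scale.
resid_back = y - np.exp(X @ beta_log)
s_back = float(np.sqrt(resid_back @ resid_back / (x.size - 2)))

print(f"S, linear model (original units):      {s_lin:.3f}")
print(f"S, log model (log units):              {s_log:.3f}")
print(f"S, log model back-transformed (units): {s_back:.3f}")
```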
We regress days on hours, materials, and three indicator variables for building type, with
building 4 as the reference category.
The p-values of the coefficients show that only hours is significant; all the other variables
are insignificant. Each additional hour of labor charged is associated with an increase of
0.81 days in the average time to completion of routine work orders, holding the other
variables fixed. There is insufficient evidence of any difference among the building types.
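The indicator coding with a reference category can be sketched as follows. The data are synthetic (an assumption), and the helper `dummy_columns` is a hypothetical name introduced for illustration; building 4 gets no column of its own, so each building coefficient is read as a difference from building 4.

```python
import numpy as np

def dummy_columns(categories, reference):
    """One 0/1 column per level except the reference level."""
    levels = sorted(set(categories) - {reference})
    cols = np.column_stack(
        [[1.0 if c == lev else 0.0 for c in categories] for lev in levels]
    )
    return levels, cols

# Synthetic data (assumption): only hours truly affects days.
rng = np.random.default_rng(2)
n = 60
building = np.tile([1, 2, 3, 4], n // 4)      # building types 1..4
hours = rng.uniform(1, 20, size=n)
materials = rng.uniform(0, 500, size=n)
days = 1.0 + 0.8 * hours + rng.normal(0, 1.0, size=n)

levels, dummies = dummy_columns(building.tolist(), reference=4)
X = np.column_stack([np.ones(n), hours, materials, dummies])
beta, *_ = np.linalg.lstsq(X, days, rcond=None)

names = ["const", "hours", "materials"] + [f"building_{lev}" for lev in levels]
for name, b in zip(names, beta):
    print(f"{name:>12}: {b: .4f}")
```

Rows for building 4 have zeros in all three indicator columns, which is what makes it the baseline.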
Question 3:
Techcore is a high-tech company located in Fort Worth, Texas. The company produces a part
called a fiber-optic connector (FOC) and wants to generate forecasts of the sales of FOCs over
time. The company has weekly sales data for the past 265 weeks. The data, in the file FOC8, are
in the following columns: SALES, MONTH (numbers the month in which the observations were
taken), FOV, COMPOSITE, INDUSTRIAL, TRANS, UTILITY, FINANCE, PROD, and
HOUSE. (The SALES and FOV data have been disguised to provide confidentiality.) The
variables are defined as follows:
FOV: Sales of a complementary product; sales of FOV are much easier to forecast than FOC
sales
COMPOSITE: Friday close of the NYSE Composite Index
INDUSTRIAL: Friday close of the NYSE Industrial Stocks
TRANS: Friday close of the NYSE Transportation Stocks
UTILITY: Friday close of the NYSE Utility Stocks
FINANCE: Friday close of the NYSE Financials Stocks
PROD: Industrial production (computers, communications equipment, and semiconductors), not
seasonally adjusted
HOUSE: Monthly housing permits in thousands, seasonally adjusted rates
You have been hired as a consultant to Techcore to help build the forecasting model. Create a
two-part report for Techcore that includes an executive summary with the essential nontechnical
results of your study and a technical report that contains the details of your model building
process.
We therefore include lag 1 and lag 5 of FOV in the model and regress FOV sales on PCS1,
FOVlag1, and FOVlag5, which gives the following regression model. All VIFs are less than 10,
indicating no serious multicollinearity.
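A minimal sketch of building lagged predictors and checking variance inflation factors is given below. The series are synthetic stand-ins (an assumption), the array `pcs1` is only a placeholder for the PCS1 variable, and the helpers `lagged` and `vif` are hypothetical names. Each VIF is computed as 1 / (1 - R²) from regressing one predictor on the others.

```python
import numpy as np

def lagged(series, k):
    """Series shifted by k >= 1 periods, with NaN in the first k positions."""
    out = np.full(series.shape, np.nan)
    out[k:] = series[:-k]
    return out

def vif(X):
    """VIF for each column of X (predictors only, no constant column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Synthetic weekly series (assumption), 265 observations as in the problem.
rng = np.random.default_rng(3)
fov = rng.normal(100, 10, size=265)
pcs1 = rng.normal(0, 1, size=265)     # placeholder for PCS1

fov_lag1 = lagged(fov, 1)
fov_lag5 = lagged(fov, 5)

# Drop the first 5 rows, where the lag-5 column is undefined.
X = np.column_stack([pcs1, fov_lag1, fov_lag5])[5:]
vifs = vif(X)
for name, v in zip(["PCS1", "FOVlag1", "FOVlag5"], vifs):
    print(f"{name:>8}: VIF = {v:.3f}")
```

The cutoff of 10 used above is a common rule of thumb rather than a formal test.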
The partial autocorrelation of PCS1 at lag 1 is almost 1, so we take the first difference of
PCS1 and store the result as PCS1df.
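The differencing step can be sketched as follows, using a synthetic random walk as a stand-in for PCS1 (an assumption). The lag-1 partial autocorrelation equals the lag-1 autocorrelation, so the hypothetical helper `lag1_autocorr` is enough to show the value near 1 before differencing and near 0 afterward.

```python
import numpy as np

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1 (equal to the lag-1 partial autocorrelation)."""
    d = x - x.mean()
    return float(d[1:] @ d[:-1] / (d @ d))

# Synthetic random walk (assumption) standing in for PCS1.
rng = np.random.default_rng(4)
pcs1 = np.cumsum(rng.normal(size=265))

# First difference: pcs1[t] - pcs1[t-1]; one observation is lost.
pcs1_diff = np.diff(pcs1)

print(f"lag-1 autocorrelation, level:      {lag1_autocorr(pcs1):.3f}")
print(f"lag-1 autocorrelation, difference: {lag1_autocorr(pcs1_diff):.3f}")
```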