
MAS 766: Linear Models: Regression

Spring 2020 Team Assignment #3 Due: Wed. Apr. 8, 2020

1. Start each problem on a separate page.
2. You should show complete work to receive full credit.
3. For computer output, cut and paste only the necessary details.

Question 1:
The gross domestic product (GDP) and imports (IMPORTS) for 25 countries are available in a
file named IMPORTS6. Construct the scatterplot of IMPORTS versus GDP. Run the regression
using IMPORTS as the dependent variable and GDP as the explanatory variable. Plot the
standardized residuals versus the fitted values and the explanatory variable. Use the results to
help answer the following questions:
a. Do any of the assumptions of the regression model appear to be violated? If so, which
one (or ones)? Justify your answer.

The assumption that there are no outliers appears to be violated. The scatterplot shows one
observation (the United States) whose GDP is far higher than that of any other country.
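An observation whose GDP lies far from the others is a high-leverage point. As a minimal sketch (the GDP values below are made up for illustration and are not the IMPORTS6 data), leverage in simple regression is h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)², and a common rule of thumb flags h_i > 2(k+1)/n, where k is the number of explanatory variables.

```python
def leverages(x):
    """Leverage h_i for a simple linear regression with an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]


# Hypothetical GDP values (illustrative only): one country is far larger.
gdp = [120, 95, 310, 150, 80, 200, 175, 9000, 240, 130]
h = leverages(gdp)

k = 1  # number of explanatory variables
cutoff = 2 * (k + 1) / len(gdp)  # rule-of-thumb threshold
flagged = [i for i, hi in enumerate(h) if hi > cutoff]
print(flagged)  # prints: [7]
```

The leverages always sum to k + 1, so a single value close to 1 means that one point largely determines its own fitted value.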
b. Construct the scatterplot and rerun the regression with the United States omitted.
Construct the residual plots for this new regression. Do the results appear any different
from the original regression results? If so, how do they differ? Do you prefer the original
results or the results with the United States omitted? On what do you base your choice?
Do there still appear to be problems with this regression?

Compared with the original scatterplot, this one shows no outliers. We prefer the original results:
the coefficient of GDP is 0.10567 in the original regression but drops to 0.0763 when the United
States is removed, so this observation strongly influences the estimates and we should not simply
discard it. The residual plots for the new regression suggest that heteroscedasticity may be present.
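The comparison in part b amounts to refitting without one observation and checking how far the slope moves. A minimal sketch with hypothetical numbers (not the IMPORTS6 data), in which the last point has a much larger x than the rest:

```python
def ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


# Hypothetical data: the last observation plays the role of the large country.
x = [1, 2, 3, 4, 20]
y = [2, 4, 6, 8, 60]

_, slope_all = ols(x, y)
_, slope_without_last = ols(x[:-1], y[:-1])
print(slope_all, slope_without_last)  # prints: 3.12 2.0
```

A large change in the slope when a single point is dropped is what marks that point as influential rather than merely unusual.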

c. Try to develop a curvilinear model using the original data (with the United States included)
that provides improved results over the linear model. Be sure to examine the residual plots from
the curvilinear model to see if any regression assumptions are violated for this model.

We apply a log transformation to y (IMPORTS) and rerun the regression. In the new model both
coefficients are significant, whereas in the original regression the constant is insignificant
(p-value 0.258). The standard error of the regression, S, is 87.0031 in the original model and only
1.91 in the new model; note, however, that S for the log model is measured in log units, so the two
values are not directly comparable.

The residual plots for the log model do not suggest that any regression assumptions are violated.
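One way to put the linear and log models on a common footing is to back-transform the log model's fitted values and compute S on the original scale of y for both. A minimal sketch, assuming a natural log and using made-up exponentially growing data (not IMPORTS6); note that exp of a fitted log value estimates the median rather than the mean of y.

```python
import math


def ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def s_original_units(y, fitted, n_params=2):
    """Standard error of the regression, sqrt(SSE / (n - p)), on y's own scale."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return math.sqrt(sse / (len(y) - n_params))


# Hypothetical data that grow exponentially (illustrative only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(0.5 + 0.3 * xi) for xi in x]

# Linear model: y = b0 + b1 * x.
b0_lin, b1_lin = ols(x, y)
s_lin = s_original_units(y, [b0_lin + b1_lin * xi for xi in x])

# Log model: ln(y) = b0 + b1 * x, back-transformed with exp() before comparing.
b0_log, b1_log = ols(x, [math.log(yi) for yi in y])
s_log = s_original_units(y, [math.exp(b0_log + b1_log * xi) for xi in x])

print(round(b1_log, 4), s_lin > s_log)  # prints: 0.3 True
```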


Question 2:
Work-Order Closing. Management at the Texas Christian University (TCU) Physical Plant is
interested in reducing the average time to completion of routine work orders. The time to
completion is defined as the difference between the date of receipt of a work order and the date
closing information is entered. The number of labor hours charged to each work order and the
cost of materials are two variables believed to be related to the time to completion of the work
order. Management wants to know if there is any difference in the time to completion of work
orders, on average, for different types of buildings. Buildings are classified into four types on
the TCU campus: residence halls, athletic, academic, and administrative. In answering the
question, take into account the possible effect of labor hours charged and materials cost. The data
for a random sample of 72 work orders (chosen from a population of 11,720) are available in a
file named WORKORD7. The variables are as follows:
Y = DAYS = number of days to complete each work order
x_1 = HOURS = number of hours of labor charged to each work order
x_2 = MATERIAL = cost of materials charged to each work order
x_3 = BUILDING = 1 for residence halls
                 2 for athletic buildings
                 3 for academic buildings
                 4 for administrative buildings
Develop a regression model to answer the question that the management is interested in.

We regress DAYS on HOURS, MATERIAL, and three indicator variables for building types 1, 2, and 3,
using building type 4 (administrative) as the reference category.
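The indicator coding can be sketched as follows. This is a minimal pure-Python least-squares fit via the normal equations, with hypothetical work orders (not the WORKORD7 data) whose DAYS values are generated from known coefficients so the fit can be checked; the helper names `solve`, `ols`, and `design_row` are illustrative.

```python
def solve(a, b):
    """Solve the linear system a @ beta = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def design_row(hours, material, building):
    """Intercept, HOURS, MATERIAL, and indicators for building types 1-3.

    Building type 4 (administrative) gets all-zero indicators, so it is the
    reference category and each indicator coefficient is a difference from it.
    """
    return [1.0, hours, material] + [1.0 if building == k else 0.0 for k in (1, 2, 3)]


# Hypothetical work orders (illustrative only, not the WORKORD7 data).
hours = [1, 3, 2, 5, 4, 6, 2, 7, 3, 5, 8, 1]
material = [10, 40, 20, 0, 60, 15, 30, 5, 50, 20, 35, 45]
building = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

# Generate DAYS from known coefficients so the fit can be checked.
true_beta = [2.0, 0.8, 0.01, 1.0, -0.5, 0.0]
rows = [design_row(h, m, b) for h, m, b in zip(hours, material, building)]
days = [sum(c * v for c, v in zip(true_beta, r)) for r in rows]

beta = ols(rows, days)
print([round(b, 4) for b in beta])
```

Using three indicators for four categories avoids perfect collinearity with the intercept.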

Judging by the p-value of each coefficient, only HOURS is significant; MATERIAL and the three
building indicators are not. Holding the other variables constant, each additional hour of labor
charged is associated with an increase of 0.81 days in the average time to complete a routine
work order. There is not sufficient evidence that the average time to completion differs across
building types.
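The individual t tests compare each building type with the reference; whether building type matters at all can also be checked jointly with a partial F test, F = ((SSE_R - SSE_F)/q) / (SSE_F/(n - p_F)), where the reduced model drops the q = 3 indicators. A minimal sketch with hypothetical data (not WORKORD7) in which building type has no effect; the resulting F would be compared with an F(3, n - 6) critical value from a table or statistical package.

```python
def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def sse(rows, y):
    """Sum of squared residuals from the least-squares fit of y on rows."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical data (illustrative only): DAYS depends on HOURS and MATERIAL only.
hours = [1, 3, 2, 5, 4, 6, 2, 7, 3, 5, 8, 1]
material = [10, 40, 20, 0, 60, 15, 30, 5, 50, 20, 35, 45]
building = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4]
noise = [0.3, -0.2, 0.5, -0.4, 0.1, -0.3, 0.2, 0.4, -0.5, 0.0, -0.1, 0.25]
days = [2 + 0.8 * h + 0.01 * m + e for h, m, e in zip(hours, material, noise)]

# Full model: intercept, HOURS, MATERIAL, indicators for types 1-3 (type 4 = reference).
full = [[1.0, h, m] + [1.0 if b == k else 0.0 for k in (1, 2, 3)]
        for h, m, b in zip(hours, material, building)]
reduced = [r[:3] for r in full]  # same model without the building indicators

n, p_full, q = len(days), 6, 3
sse_full, sse_reduced = sse(full, days), sse(reduced, days)
f_stat = ((sse_reduced - sse_full) / q) / (sse_full / (n - p_full))
print(round(f_stat, 3))
```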
Question 3:
Techcore is a high-tech company located in Fort Worth, Texas. The company produces a part
called a fiber-optic connector (FOC) and wants to generate forecasts of the sales of FOCs over
time. The company has weekly sales data for the past 265 weeks. The data in the file FOC8 are
in the following columns: SALES, MONTH (numbers the month in which the observations were
taken), FOV, COMPOSITE, INDUSTRIAL, TRANS, UTILITY, FINANCE, PROD, and
HOUSE. (The SALES and FOV data have been disguised to provide confidentiality.) The
variables are defined as follows:

FOV: Sales of a complementary product; sales of FOV are much easier to forecast than FOC sales
COMPOSITE: Friday close of the NYSE Composite Index
INDUSTRIAL: Friday close of the NYSE Industrial Stocks
TRANS: Friday close of the NYSE Transportation Stocks
UTILITY: Friday close of the NYSE Utility Stocks
FINANCE: Friday close of the NYSE Financials Stocks
PROD: Industrial production of computers, communications equipment, and semiconductors, not
seasonally adjusted
HOUSE: Monthly housing permits in thousands, seasonally adjusted rates

You have been hired as a consultant to Techcore to help build the forecasting model. Create a
two-part report for Techcore that includes an executive summary with the essential nontechnical
results of your study and a technical report that contains the details of your model building
process.
We therefore include lag 1 and lag 5 of FOV in the model and regress FOV sales on PCS1, FOVlag1,
and FOVlag5. In the resulting regression all VIFs are less than 10, indicating that multicollinearity
is not a serious problem.
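Building the lagged regressors and computing VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors, can be sketched as below. The FOV series and the stand-in for PCS1 are hypothetical, not the FOC8 data, and the helper names are illustrative.

```python
import math


def solve(a, b):
    """Solve a @ beta = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(rows, y):
    """R^2 of the least-squares fit of y on rows (rows include an intercept)."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2), regressing each predictor on all the others."""
    out = []
    for j, target in enumerate(columns):
        others = [c for i, c in enumerate(columns) if i != j]
        rows = [[1.0] + [c[t] for c in others] for t in range(len(target))]
        out.append(1 / (1 - r_squared(rows, target)))
    return out


# Hypothetical weekly series (illustrative only, not the FOC8 data).
fov = [100 + 10 * math.sin(0.4 * t) + (7 * t) % 5 for t in range(40)]
pcs1 = [math.sin(t) for t in range(40)]  # hypothetical stand-in for PCS1

# Align lags: week t uses FOV from weeks t-1 and t-5, so start at t = 5.
weeks = range(5, len(fov))
fov_lag1 = [fov[t - 1] for t in weeks]
fov_lag5 = [fov[t - 5] for t in weeks]
pcs1_aligned = [pcs1[t] for t in weeks]

vif_values = vifs([pcs1_aligned, fov_lag1, fov_lag5])
print([round(v, 2) for v in vif_values])
```

A VIF of 1 means a predictor is uncorrelated with the others; values above 10 are commonly read as a sign of severe multicollinearity.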

The partial autocorrelation function of PCS1 shows a lag-1 value close to 1, which suggests that the
series is nonstationary, so we take the first difference of PCS1 and store it as PCS1df.
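At lag 1 the sample partial autocorrelation equals the sample autocorrelation, so the check and the differencing step can be sketched as follows. The series below is a hypothetical trending stand-in for PCS1, not the FOC8 data.

```python
def lag1_autocorr(series):
    """Sample lag-1 autocorrelation (equal to the sample PACF at lag 1)."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, n))
    den = sum(d * d for d in dev)
    return num / den


def first_difference(series):
    """x_t - x_{t-1}; the result is one element shorter than the input."""
    return [series[t] - series[t - 1] for t in range(1, len(series))]


pcs1 = [t + (t % 3) for t in range(50)]  # hypothetical trending series
pcs1_df = first_difference(pcs1)

r_level = lag1_autocorr(pcs1)
r_diff = lag1_autocorr(pcs1_df)
print(round(r_level, 2), round(r_diff, 2))
```

The trending level series has a lag-1 autocorrelation near 1, while the differenced series does not, which is the pattern that motivates working with PCS1df.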
