
DS Assignment

Question 1: Regression model to predict the price of the “Bet Performer”

Compass Maritime Services (CMS) is a sale-and-purchase (S&P) broker that specializes in ships and offshore vessels. To conduct its day-to-day business, CMS needs a good, accurate model for calculating the market value of ships and offshore vessels. There are multiple approaches to estimating the price of a ship:
• Market approach: the estimated value equals the market price of a recently completed sale of a comparable ship between a willing and knowledgeable buyer and seller.
• Income approach: the price is calculated as the net present value of the future cash flows that the ship could generate. A major determinant of these cash flows is the daily charter rate.
• Cost approach: the value of the ship equals the cost of replacing the ship and its functionality. This approach is generally used to value ships with unique functionalities or customized features.
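To make the income approach concrete, here is a minimal Python sketch of a net-present-value valuation. It assumes a constant annual net cash flow (a daily charter rate times operating days, minus operating costs), a fixed discount rate, and a residual value received at the end of the horizon. The function names and every input figure are hypothetical and do not come from the case.

```python
def npv(cash_flows, rate):
    """Net present value of year-end cash flows received at t = 1, 2, ...,
    discounted at `rate` per year."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def income_approach_value(daily_charter_rate, operating_days, annual_opex,
                          years, discount_rate, residual_value):
    """Hypothetical income-approach valuation (years must be >= 1):
    a constant annual net cash flow plus a residual value in the final year."""
    annual_net = daily_charter_rate * operating_days - annual_opex
    flows = [annual_net] * years
    flows[-1] += residual_value  # assumed resale or scrap value at the end
    return npv(flows, discount_rate)


if __name__ == "__main__":
    # All inputs are illustrative assumptions, in US dollars.
    value = income_approach_value(daily_charter_rate=50_000, operating_days=350,
                                  annual_opex=3_000_000, years=10,
                                  discount_rate=0.08, residual_value=10_000_000)
    print(f"Estimated value: ${value / 1e6:.1f} million")
```

Raising the assumed daily charter rate raises every cash flow. This is why the charter rate is a major driver of value under this approach.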

As part of the case, we are required to predict the market price of a Capesize bulk carrier named “Bet Performer”, which had been sold two years earlier for $70 million. We will use the market approach to estimate the vessel’s market price. The analysis relies on the data in Appendix 1, which lists Capesize vessels of similar specifications to the Bet Performer that were sold in the last year. We will use a multiple linear regression model to estimate the selling price of the Bet Performer.

Before building the model, we plotted all available variables pairwise (a scatter-plot matrix) to get a rough understanding of the relationships that could exist between them. The first row of plots is the most important to us, because it relates the sales price, the variable we want to predict, to each of the other variables. These graphs reveal the following insights:
• Sales price seems to be positively correlated with year of build, being higher for more recently built ships.
• Sales price seems to be negatively correlated with the age of the ship at the time of sale. There also seems to be collinearity between Year of Build and Age at Sale. This is intuitive: ships built earlier are older when sold, so the relationship with age should be the opposite of the relationship with year of build.
• Sales price seems to be highest for a particular range of ship size, measured in deadweight tons (DWT). The graph peaks within that range and the price falls again for higher values.
• The relationship between sales price and the Average Monthly Baltic Capesize Index (AMBCI) is not very evident from the graph.
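The collinearity between Year of Build and Age at Sale follows from the identity age = year of sale - year of build. When all sales happen within a short window, the two variables are almost perfectly negatively correlated. The following minimal sketch uses hypothetical data, not the Appendix 1 sales, to illustrate the effect.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical sales (not the Appendix 1 data): year built and year of sale.
year_built = [1990, 1995, 2000, 2005, 2007]
year_sold = [2007, 2008, 2007, 2008, 2008]
age_at_sale = [s - b for s, b in zip(year_sold, year_built)]

print(round(pearson(year_built, age_at_sale), 3))  # prints -0.998
```

A correlation this close to -1 means the two variables carry almost the same information, so a regression needs only one of them.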

We ran the data through the linear regression procedure in SPSS using the stepwise method. The output confirmed our initial insight about year of build and age. SPSS removed Year of build in the final step (Model 5), because it did not add significant explanatory power to a model that already contained age at sale. Below are the statistics generated by the regression run.
Model Summary (f)

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .808a   .653       .646                20.1722
2       .950b   .903       .898                10.8035
3       .956c   .915       .909                10.2302
4       .961d   .924       .917                9.7476
5       .959e   .920       .915                9.8818

a. Predictors: (Constant), Yearbuilt
b. Predictors: (Constant), Yearbuilt, AMBCI
c. Predictors: (Constant), Yearbuilt, AMBCI, DWT
d. Predictors: (Constant), Yearbuilt, AMBCI, DWT, Ageatsale
e. Predictors: (Constant), AMBCI, DWT, Ageatsale
f. Dependent Variable: Salesprice

The final model is Model 5, with an R Square of 0.920. This value is very high and suggests that the model explains most of the variability in the sales price.

The table below lists the coefficients of the independent variables at each step of the stepwise procedure. At each step, one variable was added to the model, or, in the last step, removed from it.
Coefficients (a)

Model  Predictor    B           Std. Error   Beta     t         Sig.
1      (Constant)   -8551.875   926.281               -9.232    .000
       Yearbuilt    4.328       .465         .808     9.311     .000
2      (Constant)   -9538.366   504.514               -18.906   .000
       Yearbuilt    4.796       .253         .896     18.979    .000
       AMBCI        .007        .001         .507     10.741    .000
3      (Constant)   -8965.863   530.303               -16.907   .000
       Yearbuilt    4.491       .269         .839     16.698    .000
       AMBCI        .007        .001         .492     10.905    .000
       DWT          .237        .095         .123     2.487     .017
4      (Constant)   15926.473   10660.796             1.494     .142
       Yearbuilt    -7.917      5.314        -1.479   -1.490    .144
       AMBCI        .008        .001         .599     9.540     .000
       DWT          .260        .091         .136     2.854     .007
       Ageatsale    -12.503     5.349        -2.335   -2.338    .024
5      (Constant)   44.222      16.384                2.699     .010
       AMBCI        .007        .001         .531     12.050    .000
       DWT          .242        .092         .126     2.643     .011
       Ageatsale    -4.544      .261         -.849    -17.377   .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.
a. Dependent Variable: Salesprice
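In SPSS’s stepwise method, the most significant remaining predictor enters the model at each step. Any predictor whose significance has fallen past the removal criterion is then dropped. The removal check at step 4 can be sketched from the Sig. column above. We assume SPSS’s default removal threshold of 0.10 was used, since the criterion for this run is not stated. For a single variable, the p-value of its F-to-remove equals that of its t-test.

```python
# p-values ("Sig.") of the predictors in step 4 of the SPSS output above.
step4_pvalues = {"Yearbuilt": 0.144, "AMBCI": 0.000, "DWT": 0.007, "Ageatsale": 0.024}

P_OUT = 0.10  # assumed removal threshold (SPSS default; not stated in the source)


def variable_to_remove(pvalues, p_out=P_OUT):
    """Return the predictor with the largest p-value if it exceeds p_out, else None."""
    name, p = max(pvalues.items(), key=lambda item: item[1])
    return name if p > p_out else None


print(variable_to_remove(step4_pvalues))  # prints Yearbuilt
```

Applied to the step 5 p-values (.000, .011, .000), the same rule removes nothing, so the procedure stops at Model 5.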

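From these values, the resulting linear regression model (Model 5) is:

$$\widehat{\text{Sales price}} = 44.222 + 0.007 \cdot \text{AMBCI} + 0.242 \cdot \text{DWT} - 4.544 \cdot \text{Age at sale}$$

where AMBCI is the Average Monthly Baltic Capesize Index and DWT is the deadweight tonnage.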
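The fitted equation can be applied directly to a vessel’s characteristics. The following minimal sketch uses the rounded Model 5 coefficients as printed by SPSS. The function name and the example inputs are hypothetical, and the output is in whatever units the Appendix 1 data uses.

```python
# Coefficients of Model 5 as printed by SPSS (rounded to three decimals).
INTERCEPT = 44.222
B_AMBCI = 0.007
B_DWT = 0.242
B_AGE = -4.544


def predict_sales_price(ambci, dwt, age_at_sale):
    """Point prediction from the fitted equation (hypothetical helper)."""
    return INTERCEPT + B_AMBCI * ambci + B_DWT * dwt + B_AGE * age_at_sale


# Hypothetical inputs, for illustration only.
print(round(predict_sales_price(ambci=10_000, dwt=170, age_at_sale=10), 3))  # prints 109.922
```

AMBCI’s small coefficient suggests that it takes large values. If so, rounding that coefficient to .007 can shift a prediction noticeably. An actual estimate for the Bet Performer should therefore use the unrounded coefficients from the SPSS output.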

Another possible independent variable is the make of the ship’s engine. The Bet Performer has a Burmeister & Wain 6S70MC engine, and details such as the capacity and brand of the engine could play a role in determining a vessel’s sales price. However, we do not have sufficient data on the engines of the other ships, so we cannot build a model to test this hypothesis.
Question 2: Diagnostic checks of the model’s validity

Below are the model summary and the ANOVA table for the final model with the three independent variables.

Model Summary

R      R Square   Adjusted R Square   Std. Error of the Estimate
.959   .920       .915                9.8818

The R Square value of 0.920 means that the model explains about 92% of the variation in the sales price of the ships in the sample. This suggests that the model fits the data well.

ANOVA

Model        Sum of Squares   df   Mean Square   F         Sig.
Regression   49701.552        3    16567.184     169.660   .000
Residual     4296.566         44   97.649
Total        53998.118        47
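The entries of the Model Summary can be recovered from the ANOVA table. The following short sketch uses the sums of squares and degrees of freedom reported above. Since the total df is 47, the sample contains n = 48 sales.

```python
from math import sqrt

# Values from the ANOVA table of the final model.
ss_regression, ss_residual, ss_total = 49701.552, 4296.566, 53998.118
df_regression, df_residual = 3, 44
df_total = df_regression + df_residual  # 47, so n = 48 sales

ms_regression = ss_regression / df_regression  # MSR
ms_residual = ss_residual / df_residual        # MSE
f_stat = ms_regression / ms_residual           # F = MSR / MSE
r_sq = ss_regression / ss_total                # R Square = SSR / SST
adj_r_sq = 1 - ms_residual / (ss_total / df_total)
std_error = sqrt(ms_residual)                  # Std. Error of the Estimate

print(f"F = {f_stat:.3f}, R^2 = {r_sq:.3f}, "
      f"adj. R^2 = {adj_r_sq:.3f}, SE = {std_error:.4f}")
```

The results match the reported F of 169.660, R Square of .920, adjusted R Square of .915 and standard error of 9.8818.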

The ANOVA table reports the F test for the regression model, which can be used to test the validity of the model as a whole through the following hypotheses:

$H_0: \beta_1 = \beta_2 = \beta_3 = 0$
$H_a$: at least one of the coefficients is non-zero (the model has some explanatory power)

Here $\beta_1$, $\beta_2$ and $\beta_3$ are the coefficients of the three independent variables AMBCI, DWT and Age at sale. The intercept $\beta_0$ is not part of the test.

The F test statistic is calculated as $F = \text{MSR}/\text{MSE} = 16567.184/97.649 \approx 169.660$. The p-value is reported as .000 (that is, below 0.0005), so the null hypothesis is rejected. Not all of the coefficients are zero, and the model as a whole is statistically significant. Note that this tests the overall model, not the individual coefficients.

To test each coefficient of the independent variables individually, we use t-tests with hypotheses of the form

$H_0: \beta_i = 0$
$H_a: \beta_i \neq 0$

For each t-test, the test statistic is $t = b_i / s_{b_i}$, where $b_i$ is the estimate of the ith coefficient and $s_{b_i}$ is its standard error. Below is the table generated by SPSS for the coefficient tests.
Coefficients

Model  Predictor    B           Std. Error   Beta     t         Sig.
1      (Constant)   44.222      16.384                2.699     .010
       AMBCI        .007        .001         .531     12.050    .000
       DWT          .242        .092         .126     2.643     .011
       Ageatsale    -4.544      .261         -.849    -17.377   .000

B and Std. Error are the unstandardized coefficients; Beta is the standardized coefficient.

The table above shows that, at the 5% level of significance, all the coefficients $\beta_i$ (including the intercept) are significant. Every p-value is below 0.05, so none of the coefficients can be considered zero. At the 1% level of significance, however, DWT (p = .011) would not be significant, and we would have had to remove it from the regression model for predicting the sales price.
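The t-test decisions above can be checked with a short sketch. Python’s standard library has no Student’s t quantile function. The sketch therefore hard-codes the two-sided critical values for 44 degrees of freedom, about 2.015 at 5% and 2.692 at 1%, taken from standard t tables; this is an assumption of the sketch. The decisions use the t values reported by SPSS, because recomputing b/se from the rounded table entries is imprecise. For AMBCI, for example, .007/.001 gives 7.0 instead of 12.050.

```python
# Final-model coefficients from the SPSS table: (B, Std. Error, t as reported).
COEFFICIENTS = {
    "(Constant)": (44.222, 16.384, 2.699),
    "AMBCI": (0.007, 0.001, 12.050),
    "DWT": (0.242, 0.092, 2.643),
    "Ageatsale": (-4.544, 0.261, -17.377),
}

# Two-sided critical values of Student's t with 44 df (approximate table values,
# hard-coded because the standard library has no t quantile function).
T_CRITICAL = {0.05: 2.015, 0.01: 2.692}


def t_statistic(b, se):
    """t = b_i / s_{b_i}."""
    return b / se


def is_significant(t, alpha):
    """Reject H0: beta_i = 0 when |t| exceeds the two-sided critical value."""
    return abs(t) > T_CRITICAL[alpha]


for name, (b, se, t_spss) in COEFFICIENTS.items():
    print(f"{name:10s} b/se = {t_statistic(b, se):8.3f}  SPSS t = {t_spss:8.3f}  "
          f"5%: {is_significant(t_spss, 0.05)}  1%: {is_significant(t_spss, 0.01)}")
```

DWT passes at 5% (2.643 > 2.015) but not at 1% (2.643 < 2.692), which is consistent with its p-value of .011.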

Question 3: Verify whether any transformation of the variables can improve the model’s performance

Model 1: No variable transformation


The model used above, without any variable transformations, has the following performance metrics:

R Square = 0.920: this high value implies that the model explains a large share of the variation of the sales price around the average sales price of all the available samples.

Plot of residuals:

The residuals show no visible pattern and appear randomly distributed, which further supports the validity of this model.
P-P plot of expected vs observed: the expected cumulative probabilities closely follow the observed ones, which suggests that the residuals are approximately normally distributed.
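A normal P-P plot compares the cumulative probabilities of the standardized residuals with those expected under a normal distribution. The following minimal sketch shows how the plotted points can be computed. The residuals are hypothetical. The plotting position (i - 0.5)/n is one common choice and may differ from the formula SPSS uses.

```python
from statistics import NormalDist, mean, stdev


def pp_points(residuals):
    """(observed, expected) cumulative probabilities for a normal P-P plot.

    Observed: plotting position (i - 0.5) / n of the i-th smallest residual
    (an assumed convention). Expected: standard normal CDF of the
    standardized residual."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    z_sorted = sorted((r - m) / s for r in residuals)
    std_normal = NormalDist()
    return [((i - 0.5) / n, std_normal.cdf(z)) for i, z in enumerate(z_sorted, start=1)]


def max_pp_deviation(residuals):
    """Largest vertical gap between the P-P points and the 45-degree line."""
    return max(abs(obs - exp) for obs, exp in pp_points(residuals))


# Hypothetical residuals, for illustration only.
example = [-2.0, -1.0, 0.0, 1.0, 2.0]
for obs, exp in pp_points(example):
    print(f"{obs:.2f}  {exp:.3f}")
print(f"max deviation: {max_pp_deviation(example):.3f}")
```

Points lying close to the 45-degree line, as judged visually for this model, indicate residuals that are close to normal.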

Model 2: Regression with AMBCI (Capesize index) transformed to ln(AMBCI)

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .959    .920       .914                9.9240

With this transformation, the R Square of the model does not improve. It remains at 0.920, and the adjusted R Square falls slightly from .915 to .914. So, judged by R Square, this transformation brings no improvement.
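One way to judge whether a log transform of a predictor helps is to compare fits with and without it. The following simplified sketch uses a single predictor, whereas the models above use three. The data are hypothetical and constructed so that the price is exactly linear in ln(index).

```python
from math import log


def r_squared_simple(xs, ys):
    """R^2 of a simple (one-predictor) least-squares line of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical index values and prices; price is exactly linear in ln(index).
index = [4_000, 6_000, 9_000, 12_000, 16_000]
price = [20 * log(v) - 100 for v in index]

print(f"R^2 with index:     {r_squared_simple(index, price):.4f}")
print(f"R^2 with ln(index): {r_squared_simple([log(v) for v in index], price):.4f}")
```

The transform helps only when the relationship is genuinely curved in the original variable. In the three-predictor models above, it left R Square unchanged at .920.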

Plot of residuals from this model: the spread of the residuals looks the same as in the previous model and is randomly distributed, so there is no improvement on this front either.
P-P plot of observed vs expected: there seems to be slightly less deviation from the normal probabilities in the lower half than in the previous model. At the same time, the deviation has increased at the upper end.

Model 3: Regression with Salesprice transformed to Sqrt(Salesprice)

Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .968    .937       .931                .51216

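With this transformation, the R Square of the model rises to 0.937, which seems to be a slight improvement over the previous two models. However, because the dependent variable has been transformed, this R Square (like the standard error of .51216) is measured on the square-root scale and is not strictly comparable with the values of the previous models.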
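Model 3 predicts sqrt(Salesprice), so its fit statistics refer to that scale rather than to the price itself. One way to compare it with the untransformed model is to square the fitted values and recompute R Square on the price scale. The following minimal sketch uses hypothetical prices and fitted values. Simply squaring the fitted values ignores retransformation bias and slightly understates the expected price.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SSE/SST, computed on the scale of `observed`."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst


# Hypothetical sales prices and fitted values from a model of sqrt(price).
prices = [4.0, 9.0, 16.0, 25.0]
fitted_sqrt = [2.1, 2.9, 4.0, 5.0]

# Square the fitted values to return to the price scale before comparing.
fitted_price = [f ** 2 for f in fitted_sqrt]
print(round(r_squared(prices, fitted_price), 4))  # prints 0.9979
```

An R Square computed this way can be compared directly with the 0.920 of the untransformed model.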

Plot of residuals: the spread seems similar to that of the previous model, but the largest residual has grown to about -4. This is a disadvantage of this model compared with the previous two.
P-P plot of observed vs expected: the P-P plot also shows greater deviation from the normal distribution with this model. On this front too, this model performs worse than the previous two.

Thus, after comparing the three models, two of which involve variable transformations, we decided to keep the original model without any variable transformation. Neither transformed model offers a significant improvement over the original without causing equally negative effects elsewhere.
