Submitted to
Submitted by
Section 2
October 2, 2020
Biplob Kumar Nandi
Lecturer
Department of Economics
East West University
Subject: Letter of transmittal for the term paper.
Dear Sir,
It is our pleasure to submit a report based on “Apple sales Data”, as a requirement for the course
curriculum. The purpose of this report is to find out and analyze how Average Monthly Sale
depends on Average Monthly Advertising Cost, Average Monthly Visit by People and
Number of Products Under Discount. Writing this report was a great experience for us. Our
effort will be rewarded only if it adds value to the research literature.
Thank you for your guidance and supervision in completing this report.
Sincerely Yours,
Name ID
Nafiz Ahmed Nabil 2019-1-10-253
Sabbir Hasan Emon 2018-1-10-277
Md. Ardia Rabbi 2016-3-10-141
G M Toufiqur Rahman 2017-1-10-199
Jannatul Tanha Mim 2019-2-30-036
Acknowledgement
At the very beginning, we would like to express our deepest gratitude to Almighty ALLAH for
giving us the strength & the composure to complete this challenging analytical report. Words will
never be enough to express how grateful we are, but nevertheless we will try our level best to do so.
The report might never have been completed without the assistance of numerous articles, websites and,
of course, the assistance we received throughout the semester from our honorable instructor Biplob
Kumar Nandi.
We would like to express our special gratitude to our honorable faculty Biplob Kumar Nandi sir
for his supervision, guidance, co-operation and advice. We will always be indebted to him for the
valuable suggestions and the time he spent guiding us throughout this report.
TABLE OF CONTENTS
Letter of Transmittal
Acknowledgement
Executive summary
Introduction
  1.1 Model Building
  1.2 Research Objective and Relevance
  1.3 Interpretation of Mean, Standard Deviation, Skewness and Sample Size
    Mean
    Standard Deviation
    Skewness
    Sample Size
Analysis
  2.1 Estimate Regression Equation
    Expected sign
    Interpretation
    Regression Statistics (goodness of fit / model fit)
    Graphical proof of model fitness
  2.2 Test of Significance
    T test: Individual test of significance for 𝑩𝟏𝒙𝟏
    T test: Individual test of significance for 𝑩𝟐𝒙𝟐
    T test: Individual test of significance for 𝑩𝟑𝒙𝟑
    F test: Overall test of significance for 𝑩𝟏𝒙𝟏, 𝑩𝟐𝒙𝟐, 𝑩𝟑𝒙𝟑
  2.3 Diagnosis of Multicollinearity
    Checking Multicollinearity
    The Remedy
  2.4 Residual Analysis
    Normality check
    Graphical proof
  2.5 Heteroscedasticity
Conclusion
References
Table of Figures
Figure 1: This graph shows actual & predicted 𝒀 is close or almost fitted so it also proves that it is indeed a good fit model.
Figure 2: This graph shows actual & predicted 𝒀 is close or almost fitted so it also proves that it is indeed a good fit model.
Figure 3: This graph shows actual & predicted 𝒀 is close or almost fitted so it also proves that it is indeed a good fit model.
Figure 4: Normality plot
Figure 5: Residual plot between apple outlet's sales & Average monthly advertising cost
Figure 6: Residual plot between apple outlet's sales & Average Monthly visits by people
Figure 7: Residual plot between apple outlet's sales & Number of Products under discount
Executive summary
In this report we analyze how Average Monthly Sale depends on Average Monthly Advertising
Cost, Average Monthly Visit by People and Number of Products Under Discount. To do so, we
first produced the summary data in Excel and then ran a regression analysis. We examined the
correlations, the goodness of fit, the residual plots and the heteroscedasticity problem. After
thoroughly analyzing all the data, we conclude by recommending that Apple Inc. open another
outlet in Dhaka city.
Introduction
There are 50 Apple outlets in Dhaka City, and the company is going to open a new outlet. We have
done a market analysis for the company with average monthly sale as the dependent variable and
average monthly advertising cost, average monthly visits by people and number of products under
discount as the independent variables.
1.1 Model Building
The model is built on a research hypothesis, and the research hypothesis comes from the data. Four
kinds of data were given: the dependent variable, average monthly sale, and three independent
variables, average monthly advertising cost, average monthly visits by people and number of
products under discount.
Summary Data
[Table: summary statistics (statistic and value) for Average Monthly Sale (in thousands), Average Monthly Advertising Cost (in thousands), Average Monthly Visits by People, and Number of Products under Discount]
There are four variables:
Average Monthly Sale (dependent, 𝑌)
Average Monthly Advertising Cost (independent, 𝑥1, with coefficient 𝑏1)
Average Monthly Visit by People (independent, 𝑥2, with coefficient 𝑏2)
Number of Products Under Discount (independent, 𝑥3, with coefficient 𝑏3)
Standard Deviation:
Standard deviation shows how much a variable fluctuates around its mean: the higher the standard
deviation, the more fluctuation is to be expected. Large fluctuations can put the constant-variance
(homoscedasticity) assumption at risk, which is examined in Section 2.5.
Here, Average Monthly Visit by People has the highest standard deviation and is expected
to fluctuate the most.
Skewness:
Here, the skewness of every variable is positive and less than one (1). Positive skewness
indicates that the mean is greater than the median, so all the variables are slightly to
moderately right-skewed.
Sample Size:
The sample size for all the variables is 50.
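The three summary measures discussed above can be reproduced outside Excel. The following is a minimal sketch, assuming the skewness reported in the summary data is the adjusted sample skewness that Excel's SKEW function returns; the `sales` list is hypothetical illustration data, not the report's data set.

```python
import math


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def sample_skewness(values):
    """Adjusted Fisher-Pearson skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s)^3)."""
    n = len(values)
    m = mean(values)
    s = sample_std(values)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in values)


# Hypothetical monthly sales (in thousands) for five outlets, for illustration only.
sales = [220.0, 250.0, 260.0, 300.0, 420.0]
print("mean:", round(mean(sales), 2))
print("standard deviation:", round(sample_std(sales), 2))
print("skewness:", round(sample_skewness(sales), 2))
```

A positive skewness value means the long tail is on the right, which is why the mean is pulled above the median.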
Analysis
2.1 Estimate Regression Equation
Coefficients
Intercept (𝒃𝟎) -224.60
𝒀̂ = 𝒃𝟎 + 𝒃𝟏𝒙𝟏 + 𝒃𝟐𝒙𝟐 + 𝒃𝟑𝒙𝟑
Expected sign:
We expect a positive relation between 𝑥1 (average monthly advertising cost) & 𝑌 (average
monthly sales), and the estimated coefficient of 𝑥1 is positive.
We expect a positive relation between 𝑥2 (average monthly visits by people) & 𝑌 (average
monthly sales), and the estimated coefficient of 𝑥2 is positive.
We expect a positive relation between 𝑥3 (number of products under discount) & 𝑌 (average
monthly sales), and the estimated coefficient of 𝑥3 is positive.
Interpretation:
𝑏1 = 17.14 indicates that when the advertising cost increases by 1 thousand, the monthly
sales will increase by 17.14 thousand, holding other factors (average monthly visits by
people, number of products under discount) constant.
𝑏2 = 0.03 indicates that when the average monthly visits increase by 1 unit, the monthly
sales will increase by 0.03 thousand, holding other factors (average monthly advertising
cost, number of products under discount) constant.
𝑏3 = 1.16 indicates that when the number of products under discount increases by 1 unit,
the monthly sales will increase by 1.16 thousand, holding other factors (average monthly
visits by people, average monthly advertising cost) constant.
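To make the interpretation concrete, the sketch below plugs the reported coefficients (intercept -224.60, 17.14, 0.03 and 1.16) into the estimated equation. The function name `predict_sale` and the outlet values in the example are hypothetical; the coefficients are the rounded figures quoted in this section, so results may differ slightly from Excel's unrounded output.

```python
# Rounded coefficients as reported in this section.
B0, B1, B2, B3 = -224.60, 17.14, 0.03, 1.16


def predict_sale(ad_cost, visits, discounted_products):
    """Predicted average monthly sale (in thousands).

    ad_cost is in thousands; visits and discounted_products are counts.
    """
    return B0 + B1 * ad_cost + B2 * visits + B3 * discounted_products


# Hypothetical outlet, for illustration only.
base = predict_sale(ad_cost=25, visits=5000, discounted_products=20)
more_ads = predict_sale(ad_cost=26, visits=5000, discounted_products=20)

print("predicted sale:", round(base, 2))
# Raising advertising by 1 thousand with the other inputs fixed changes the prediction by b1.
print("effect of +1 thousand advertising:", round(more_ads - base, 2))
```

The second print illustrates the "holding other factors constant" reading of 𝑏1: only the advertising input changes between the two calls.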
Regression Statistics (goodness of fit / model fit)
From the Excel output we got 𝑅² = 0.97, meaning that 97% of the variation in sales of apple outlets
is explained by average monthly advertising cost, average monthly visits by people & number
of products under discount. So the model is a good fit, as the 𝑅² value is close to 1.
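As a hedged illustration of what 𝑅² measures, the sketch below computes it as 1 - SSE/SST from actual and predicted values. The four data points are taken from observations 35 to 38 of the residual output in Section 2.4 (actual sale = predicted sale + residual), so the printed value describes only those four rows, not the full 50-outlet model.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SSE / SST."""
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    sst = sum((a - mean_y) ** 2 for a in actual)                # total variation
    return 1 - sse / sst


# Observations 35-38 from the residual output: predicted values and residuals.
predicted = [437.12, 405.25, 436.76, 245.10]
residuals = [15.88, -43.25, -4.76, 18.90]
actual = [p + r for p, r in zip(predicted, residuals)]

print("R^2 on these four rows:", round(r_squared(actual, predicted), 3))
```

A value near 1 means the residuals are small relative to the overall spread of sales; a value near 0 means the model does no better than predicting the mean.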
Graphical proof of model fitness: All of these graphs show that actual & predicted 𝑌 are close,
almost fitted, which also indicates that the model is a good fit.
Figure 1: This graph shows that actual & predicted 𝒀 are close or almost fitted, which also supports that it is indeed a good fit model.
Figure 2: This graph shows that actual & predicted 𝒀 are close or almost fitted, which also supports that it is indeed a good fit model.
Figure 3: Average Monthly visits by people Line Fit Plot (y-axis: Average Monthly Sale, in thousands). This graph shows that actual & predicted 𝒀 are close or almost fitted, which also supports that it is indeed a good fit model.
2.2 Test of Significance
First of all, we want to test whether the positive relationship between Average monthly advertising
cost & sales of apple outlets is statistically significant.
Hypothesis:
𝑯𝟎 : 𝑩𝟏 = 0 [there is no relation between Average monthly advertising cost & sales of
apple outlets]
𝑯𝒂 : 𝑩𝟏 ≠ 0 [there is a relation between Average monthly advertising cost & sales of
apple outlets; given the positive estimate 𝑏1, we expect it to be positive]
T test:
From the Excel output we got:
Probability value approach: the t probability value is 0.00 < 0.05 [alpha], so we reject the
null; the coefficient is statistically significant at the 1% & 5% levels.
Critical value approach: the t statistic [which is 4.28] > t α/2, (n-4) [about 2.01 at the 5%
level with 46 degrees of freedom, and also above the stricter 3.515], so we reject the null;
the coefficient is statistically significant.
So, we can say the positive relationship between Average monthly advertising cost & sales is
statistically significant. We can further use the model for prediction purposes.
Then, we want to test if the positive relationship between Average Monthly visits by people &
sales of apple outlets is statistically significant or not.
Hypothesis:
𝑯𝟎 : 𝑩𝟐 = 0 [there is no relation between Average Monthly visits by people & sales of
apple outlets]
𝑯𝒂 : 𝑩𝟐 ≠ 0 [there is a relation between Average Monthly visits by people & sales of
apple outlets; given the positive estimate 𝑏2, we expect it to be positive]
T test:
From the Excel output we got:
Probability value approach: the t probability value is 0.24 > 0.05 [alpha], so we can't reject
the null; the coefficient is not statistically significant.
Critical value approach: the t statistic [which is 1.19] < t α/2, (n-4) [about 2.01 at the 5%
level with 46 degrees of freedom], so we can't reject the null; the coefficient is not
significant at the 1% or 5% level.
So we can say the positive relationship between Average monthly visits by people & sales is not
statistically significant. We should not rely on this coefficient for prediction without first
fixing the underlying problem.
Finally, we want to test if the positive relationship between Number of Products under discount &
sales of apple outlets is statistically significant or not.
Hypothesis:
𝑯𝟎 : 𝑩𝟑 = 0 [there is no relation between Number of Products under discount & sales of
apple outlets]
𝑯𝒂 : 𝑩𝟑 ≠ 0 [there is a relation between Number of Products under discount & sales of
apple outlets; given the positive estimate 𝑏3, we expect it to be positive]
T test:
From the Excel output we got:
Probability value approach: the t probability value is 0.62 > 0.05 [alpha], so we can't reject
the null; the coefficient is not statistically significant at the 5% level.
Critical value approach: the t statistic [which is 0.50] < t α/2, (n-4) [about 2.01 at the 5%
level with 46 degrees of freedom], so we can't reject the null.
So we can say the positive relationship between Number of Products under discount & sales is not
statistically significant. We should not rely on this coefficient for prediction without first
fixing the underlying problem.
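The critical value approach used in the three t tests can be summarized in a few lines. This is a minimal sketch: the helper `is_significant` is a hypothetical name, the t statistics are the ones reported above, and the critical value of 2.01 is an assumption (the approximate two-tailed 5% value for 46 degrees of freedom). Using the stricter 3.515 leads to the same three decisions.

```python
def is_significant(t_stat, t_crit):
    """Two-tailed decision rule: reject H0 (B = 0) when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit


# Assumption: approximate two-tailed 5% critical value for n - 4 = 46 degrees of freedom.
T_CRIT_5PCT = 2.01

# t statistics reported in the Excel output for the three coefficients.
t_stats = {
    "advertising cost (x1)": 4.28,
    "visits by people (x2)": 1.19,
    "products under discount (x3)": 0.50,
}

decisions = {name: is_significant(t, T_CRIT_5PCT) for name, t in t_stats.items()}
for name, significant in decisions.items():
    print(name, "->", "reject H0" if significant else "cannot reject H0")
```

Only the advertising-cost coefficient clears the threshold, which matches the conclusions drawn from the probability value approach.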
Hypothesis:
𝑯𝟎 : 𝑩𝟏 = 𝑩𝟐 = 𝑩𝟑 = 0 [Meaning, sales of apple outlets are not jointly explained by Average
Monthly Advertising Cost, Average Monthly Visits by people & number of products under
discount]
𝑯𝒂 : at least one of 𝑩𝟏, 𝑩𝟐, 𝑩𝟑 ≠ 0 [Meaning, sales of apple outlets are jointly explained by
Average Monthly Advertising Cost, Average Monthly Visits by people & number of products
under discount]
F stat:
From the Excel output we got:
Probability value approach: the F probability value is 0.00 < 0.05 [alpha], so we reject the null.
Critical value approach: the F statistic [which is 580.54] > F α, (3, n-4) [which is 2.81], so
the model is statistically significant at the 5% level.
So the null hypothesis is rejected: Average monthly advertising cost, average monthly visits by
people & number of products under discount are jointly significant. This model is significant
overall.
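The overall F statistic is tied to 𝑅² through the standard identity F = (R²/k) / ((1 - R²)/(n - k - 1)), where k is the number of independent variables. The sketch below is an illustration of that identity, not a reproduction of the Excel output: with the rounded 𝑅² = 0.97 it gives a smaller F than the reported 580.54, because Excel works with the unrounded 𝑅². Inverting the identity with F = 580.54 suggests an unrounded 𝑅² of about 0.974. The function names are hypothetical.

```python
def f_statistic(r2, n, k):
    """Overall F statistic from R^2 with n observations and k independent variables."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


def r2_from_f(f_stat, n, k):
    """Invert the identity to recover R^2 from the overall F statistic."""
    return f_stat * k / (f_stat * k + (n - k - 1))


N_OUTLETS, N_REGRESSORS = 50, 3

f_from_rounded_r2 = f_statistic(0.97, N_OUTLETS, N_REGRESSORS)
implied_r2 = r2_from_f(580.54, N_OUTLETS, N_REGRESSORS)

print("F from rounded R^2 = 0.97:", round(f_from_rounded_r2, 2))
print("R^2 implied by F = 580.54:", round(implied_r2, 4))
```

Either way the statistic is far above the 5% critical value of 2.81, so the decision to reject the null does not depend on the rounding.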
Moreover, the combination of a very large F value [which is 580.54], a high 𝑅² value & SSR with
insignificant individual t tests for x2 & x3 is a typical symptom of multicollinearity: the
independent variables move together, so their separate effects are estimated imprecisely.
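2.3 Diagnosis of Multicollinearity
Multicollinearity means that the independent variables are highly correlated with each
other & if the problem exists then using this model we cannot determine the separate effect of any
particular independent variable on the dependent variable (average monthly sales).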
Checking Multicollinearity:
From the correlation result, we have found that the correlation coefficients among the
independent variables are all greater than 0.7 (0.98, 0.97, 0.99 > 0.70), so all of the
independent variables are highly correlated with each other.
[Table: correlation matrix of Average Monthly Advertising Cost, Average Monthly Visits by People and Number of Products under Discount]
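The pairwise check described above can be sketched as follows: compute the Pearson correlation for every pair of independent variables and flag those above the 0.7 threshold used in this report. The three short data series are hypothetical illustration values, not the report's data, and the helper names are assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five outlets, for illustration only.
regressors = {
    "advertising cost": [10, 15, 20, 25, 30],
    "visits by people": [2100, 3000, 4200, 4900, 6100],
    "products under discount": [8, 12, 15, 21, 24],
}

THRESHOLD = 0.7  # rule of thumb used in this report

high_pairs = []
for (name_a, a), (name_b, b) in combinations(regressors.items(), 2):
    r = pearson(a, b)
    if abs(r) > THRESHOLD:
        high_pairs.append((name_a, name_b, round(r, 2)))

for pair in high_pairs:
    print(pair)
```

When every pair is flagged, as in the report's own correlation result, the regression cannot cleanly separate the effect of one variable from the others.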
The Remedy:
We can't drop the variables x2 (average monthly visits by people) & x3 (number of products
under discount) even though they are insignificant, because these two variables are
inseparable parts of the sales of apple outlets. In-store sales of apple products are not
possible unless people visit the stores. Also, products under discount cause a major shift
in customers' buying behavior, as they rush into the store to grab the discounted items, &
in the case of apple products the impact of a discount is much higher. Also, we would
lose goodness of fit, as the value of 𝑅² would be reduced.
We can increase the sample size, but only if data on more apple outlets are available.
If there are only 50 apple outlets in Dhaka City, there is no way to get more samples
unless we consider outlets outside Dhaka City, but in that case the focus of our analysis
would be diverted, as we only want to know about Dhaka City at the moment.
Finally, the only option left to us is to use proxy variables. We can add variables like
the average number of monthly apple website visits by local people, the average number of
apple products added to cart each month by website visitors, or even dummy variables for
outlets located in Gulshan or Baridhara DOHS (areas where inhabitants' average monthly
incomes are high). This is our preferred solution.
2.4 Residual Analysis
Normality check
Observation   Predicted sale   Residual   Standard residual
35            437.12            15.88      1.01
36            405.25           -43.25     -2.74
37            436.76            -4.76     -0.30
38            245.10            18.90      1.20
39            234.67             6.33      0.40
40            315.48            -3.48     -0.22
41            414.65           -31.65     -2.00
42            229.77            -6.77     -0.43
43            307.33             1.67      0.11
44            414.76           -10.76     -0.68
45            284.79             9.21      0.58
46            499.37             7.63      0.48
47            471.24             8.76      0.55
48            349.30            -2.30     -0.15
49            409.29           -35.29     -2.23
50            282.85             2.15      0.14
Table 7: Normality checking
We found that more than 85% of the standard residuals lie between -2 and +2, so we can say
that the residuals of this model are approximately normally distributed.
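The share of standard residuals inside the -2 to +2 band can be counted directly. The sketch below uses only the 16 standard residuals listed above (observations 35 to 50), so its result describes that subset rather than all 50 outlets; treating a value of exactly -2.00 as outside the band is an assumption of this sketch.

```python
# Standard residuals for observations 35-50, copied from the residual output.
standard_residuals = [
    1.01, -2.74, -0.30, 1.20, 0.40, -0.22, -2.00, -0.43,
    0.11, -0.68, 0.58, 0.48, 0.55, -0.15, -2.23, 0.14,
]


def share_within(values, limit=2.0):
    """Fraction of values strictly inside (-limit, +limit)."""
    inside = sum(1 for v in values if abs(v) < limit)
    return inside / len(values)


inside_count = sum(1 for v in standard_residuals if abs(v) < 2.0)
print("inside the band:", inside_count, "of", len(standard_residuals))
print("share:", share_within(standard_residuals))
```

For comparison, a normal distribution places roughly 95% of values within two standard deviations, so the closer the full-sample share is to that figure, the stronger the support for normality.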
Figure 4: Normality plot (Normal Probability Plot; y-axis: Average monthly Sale (in thousands); x-axis: Sample Percentile)
Graphical proof: We have found from the normality plot that apple outlets' sales follow a
steadily increasing pattern across the sample percentiles. We can say that our probability plot
satisfies the normality assumption, which is consistent with a bell-shaped distribution.
2.5 Heteroscedasticity
Figure 5: Residual plot between apple outlet's sales & Average monthly advertising cost (y-axis: Residual; x-axis: Average monthly advertising cost)
Analyzing the Figure 5 plot, we found that the variance of the residuals is roughly the same
for all values of 𝑥1 (Average monthly advertising cost), so the assumed regression model is
an adequate representation of the relationship between the variables: the residual plot
gives an overall impression of a horizontal band of points. The band is not perfect, but it
is close to a horizontal pattern.
Figure 6: Residual plot between apple outlet's sales & Average Monthly visits by people (y-axis: Residual; x-axis: Average Monthly visits by people)
Next, analyzing the Figure 6 plot, we found that the variance of the residuals is roughly the
same for all values of 𝑥2 (Average Monthly visits by people), so the assumed regression
model is an adequate representation of the relationship between the variables: the residual
plot gives an overall impression of a horizontal band of points, which is closer to a
horizontal form than in the previous plot.
Figure 7: Residual plot between apple outlet's sales & Number of Products under discount (y-axis: Residual; x-axis: Number of Products under discount)
Finally, analyzing the Figure 7 plot, we found that the variance of the residuals is roughly
the same for all values of 𝑥3 (Number of Products under discount), so the assumed
regression model is an adequate representation of the relationship between the variables:
the residual plot gives a good overall impression of a horizontal band of points, which is
closer to a horizontal form than in the last two plots. (Williams, December 1, 2001)
Conclusion
From our data we have learned that as the average monthly visits increase, the monthly sales also
increase. We have also found that an increase in advertising cost increases average monthly
sales. Along with that, when the number of products under discount increases, the monthly
sales increase too.
Although the individual t tests found the coefficients on visits and products under discount
insignificant, the F test lets us conclude that average monthly advertising cost, average monthly
visits by people & number of products under discount are jointly significant, and all three
estimated coefficients are positive. There was still a multicollinearity problem. To fix it we
propose using proxy variables like the average number of monthly apple website visits by local
people, the average number of apple products added to cart each month by website visitors, or
even dummy variables for outlets in areas where inhabitants' average monthly incomes are high.
After that, using the normality plot, we found that sales follow an increasing pattern across the
sample percentiles, which is consistent with the normality assumption.
So finally, after thoroughly analyzing all the data, we recommend that Apple Inc. open
another outlet in the heart of Dhaka city.
References
Anderson, D. R., Sweeney, D. J., & Williams, T. A. (December 1, 2001). Statistics for Business and
Economics. South-Western Pub.