
Cold Storage Case Study

Data Analysis Report

Descriptive Data Analysis:


 Setting Working Directory.

setwd ("E:/BABI Study Materials/Project1-Cold-Storage-CaseStudy")

 Reading Cold_Storage_Temp_Data.csv file in R.

colddata = read.csv("Cold_Storage_Temp_Data.csv")

 External libraries used:


1. ggplot2
2. dplyr

Load them into the R session with the library() command, for example library(ggplot2) and library(dplyr).

 Exploring the column names of the colddata dataset

names(colddata) ## There are four columns in the dataset


 Explore data Structure.

str(colddata)

1. colddata is a data frame with 365 observations and 4 variables.


2. The variables Season and Month are factors (categorical variables).
3. The variable Date is an integer and Temperature is numeric.

 Verify Data summary

summary(colddata$Month)

summary(colddata)

dim(colddata)

1. There are three seasons, Rainy, Summer and Winter, with 122, 120 and 123 days respectively.
2. Date ranges from 1 to 31, as expected for a day of the month. Temperature varies
between 1.700 and 5.000 with mean 2.963 and median 2.900; the closeness of mean and median suggests a roughly symmetric, approximately normal distribution.
3. The data frame has 365 rows and 4 columns.

 Checking for null/missing values.

anyNA(colddata)
1. There are no missing (NA) values in the dataset.
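
anyNA() only reports whether any value is missing anywhere in the data frame. As a minimal sketch of a per-column check, assuming a small made-up data frame (the name demo and its values are hypothetical and not taken from the case study data), the missing values can also be counted column by column:

```r
# Hypothetical toy data frame; NOT the case study data
demo <- data.frame(
  Season = c("Rainy", "Summer", NA, "Winter"),
  Temperature = c(3.1, NA, 2.8, NA)
)

any_missing <- anyNA(demo)              # TRUE if at least one NA exists anywhere
na_per_column <- colSums(is.na(demo))   # number of NA values in each column

print(any_missing)
print(na_per_column)
```

On the real dataset the same two calls would be applied to colddata; a vector of zeros from colSums(is.na(colddata)) would agree with anyNA(colddata) returning FALSE.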

 Box Plot of Temperature To check for outliers

ggplot(colddata, aes(x = 1, y = Temperature)) + geom_boxplot(fill = "orange", col = "blue") +

ylab("Temperature") + ggtitle("Boxplot of Temperature with outlier values") +

## Label each outlier with its value; fun replaces the deprecated fun.y (ggplot2 >= 3.3.0)
stat_summary(aes(label = round(after_stat(y), 1)), color = "red", geom = "text", fun = function(y) {

o <- boxplot.stats(y)$out; if (length(o) == 0) NA else o

}, hjust = -1) + coord_flip()

boxplot(colddata$Temperature, plot = FALSE)$out ## Outlier values without drawing the plot

1. The Temperature column has two outliers (both equal to 5).
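
The outliers reported by boxplot() come from the usual 1.5 x IQR rule applied to the hinges. The sketch below reproduces that rule by hand on a small made-up vector (the values in x are hypothetical, not the cold storage data) and compares the result with boxplot.stats():

```r
# Hypothetical temperatures; NOT the case study data
x <- c(2.4, 2.6, 2.8, 2.9, 3.0, 3.1, 3.3, 3.4, 5.0)

hinges <- fivenum(x)[c(2, 4)]            # lower and upper hinge (approx. Q1 and Q3)
spread <- diff(hinges)                   # hinge spread, i.e. the IQR used by boxplot
lower_fence <- hinges[1] - 1.5 * spread  # values below this are flagged
upper_fence <- hinges[2] + 1.5 * spread  # values above this are flagged

manual_out <- x[x < lower_fence | x > upper_fence]
builtin_out <- boxplot.stats(x)$out

print(manual_out)
print(builtin_out)
```

Both approaches flag the same points, which makes it easier to see why a value of 5 stands out when most temperatures sit close to 3.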

 Boxplot of Temperature at Season Level

ggplot(colddata, aes(x = Season, y = Temperature)) + geom_boxplot(fill = c("blue", "brown", "grey")) +

xlab("Season") + ylab("Temperature") + ggtitle("Comparative Boxplot Season vs Temperature") +

stat_summary(aes(label = round(after_stat(y), 1)), color = "red", geom = "text", fun = function(y) {

o <- boxplot.stats(y)$out; if (length(o) == 0) NA else o

}, hjust = -1)

boxplot(Temperature ~ Season, data = colddata)$out ## Outlier values per season

1. This shows that there are outliers at the season level as well.


2. When Temperature is grouped by Season, there are 5 outliers in total.
3. Temperature in the rainy season is widely spread. Temperatures in winter and summer are lower
and higher respectively, as expected.

 The bar plot of Season shows that the three seasons are almost equally distributed throughout the year.

ggplot(colddata, aes(x = Season, fill = Season)) + geom_bar(stat = "count") +

xlab("Season") + ylab("Number of Days") + ggtitle("Different seasons with number of days") +

geom_text(aes(label = after_stat(count)), stat = "count", vjust = 1.6, color = "white", size = 5.5)

 Density plot of Temperature

ggplot(colddata, aes(x = Temperature)) + geom_density(fill = "lightblue") +

ggtitle("Density plot of Temperature") + xlab("Temperature")

1. The density plot shows that the temperature distribution is slightly right-skewed.

1. Problem 1.

1.1. Question 1. Find mean cold storage temperature for summer, winter and Rainy
Season?

We can use group_by() and summarise() from dplyr to compute the mean for all three seasons. Alternatively, the subset()
function can be used to create one subset per season and calculate each mean separately.

library(dplyr) ## Loading dplyr package into the R session

colddata %>% group_by(Season) %>%
summarise(Mean = format(mean(Temperature), digits = 4))

Mean temperature: Rainy = 3.039, Summer = 3.153, Winter = 2.701 (rounded to 3 decimal places)
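
The subset() route mentioned above can be sketched in base R as follows. The data frame demo and its values are hypothetical stand-ins for colddata; tapply() is shown alongside as a base R equivalent of the grouped mean:

```r
# Hypothetical toy data; NOT the case study data
demo <- data.frame(
  Season = c("Rainy", "Rainy", "Summer", "Summer", "Winter", "Winter"),
  Temperature = c(3.0, 3.2, 3.1, 3.3, 2.6, 2.8)
)

# Mean per season in one call (base R counterpart of group_by + summarise)
season_means <- tapply(demo$Temperature, demo$Season, mean)

# Mean for a single season via subset()
summer_mean <- mean(subset(demo, Season == "Summer")$Temperature)

print(season_means)
print(summer_mean)
```

Either route should give the same figure for a given season, so the choice is mostly one of convenience.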

1.2. Question 2. Find overall mean for the full year


MeanTemp = mean(colddata$Temperature)

MeanTemp

Overall mean for the full year is 2.96274

1.3. Question 3. Find Standard Deviation for the full year


SDTemp = sd(colddata$Temperature)

SDTemp

Standard Deviation for the full year is 0.509


1.4. Question 4. Assume Normal distribution, what is the probability of temperature
having fallen below 2 degree C?

poftemlt2c = pnorm(2, mean = MeanTemp, sd = SDTemp, lower.tail = TRUE)

poftemlt2c

Probability of temperature having fallen below 2 degree is 0.0291.
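
As a cross-check of this figure, the same probability can be obtained by standardising the threshold first. The sketch below plugs in the mean and standard deviation reported in this document (2.96274 and 0.508589) as constants; the variable names are illustrative only:

```r
mean_temp <- 2.96274   # overall mean reported in this document
sd_temp <- 0.508589    # full-year standard deviation reported in this document

# Direct call with mean and sd
p_direct <- pnorm(2, mean = mean_temp, sd = sd_temp)

# Equivalent: convert 2 degree C to a z score, then use the standard normal
z <- (2 - mean_temp) / sd_temp
p_standardised <- pnorm(z)

print(z)
print(p_direct)
print(p_standardised)
```

The threshold of 2 degree C lies a little under two standard deviations below the mean, which is why the lower-tail probability is only about 3%.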

1.5. Question 5. Assume Normal distribution, what is the probability of temperature having
gone above 4 degree C?

poftemgt4c = pnorm(4, mean = MeanTemp, sd = SDTemp, lower.tail = FALSE)

poftemgt4c

Probability of temperature having gone above 4 degree C is 0.0207

1.6. Question 6. What will be the penalty for the AMC Company?

From the previous two questions, under the normal assumption the probability of the temperature going below 2 degree C
or above 4 degree C is the sum of the two tail probabilities, P(Temperature < 2) + P(Temperature > 4).

TotalPofPenalty = poftemlt2c + poftemgt4c

TotalPofPenalty

Since the total probability, 0.0499, lies between 2.5% and 5%, the penalty for the AMC Company
would be 10%.
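
The penalty check can be sketched end to end with the reported mean and standard deviation entered as constants. Only the 2.5% to 5% band quoted above is encoded; treating the upper bound as inclusive is an assumption, and the variable names are illustrative:

```r
mean_temp <- 2.96274   # overall mean reported in this document
sd_temp <- 0.508589    # full-year standard deviation reported in this document

p_below <- pnorm(2, mean = mean_temp, sd = sd_temp)                      # P(Temperature < 2)
p_above <- pnorm(4, mean = mean_temp, sd = sd_temp, lower.tail = FALSE)  # P(Temperature > 4)
p_total <- p_below + p_above

# Same quantity computed as the complement of staying inside [2, 4]
p_total_alt <- 1 - (pnorm(4, mean_temp, sd_temp) - pnorm(2, mean_temp, sd_temp))

# Band quoted in the report: between 2.5% and 5% (upper bound inclusive by assumption)
in_ten_percent_band <- p_total > 0.025 && p_total <= 0.05

print(p_total)
print(in_ten_percent_band)
```

The total sits just under the 5% boundary, so the conclusion is sensitive to small changes in the estimated mean or standard deviation.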

2. Problem 2.

Reading the dataset Cold_Storage_Mar2018.csv

coldmarch = read.csv("Cold_Storage_Mar2018.csv")

Basic Descriptive Analysis of Cold_Storage_Mar2018.csv dataset.

str(coldmarch)

summary(coldmarch)

anyNA(coldmarch)

1. Four variables and 35 observations, with the same columns as the previous dataset.


2. The data covers two months and one season.
3. There are no outliers.

2.1. Question 1. State the Hypothesis, do the calculation using z test.


Hypothesis:

H0: Mu <= 3.9 -> No changes required in the cold storage plant (Null Hypothesis)

H1: Mu > 3.9 -> Problem in the cold storage plant, correction needed (Alternative Hypothesis)

Calculation using z test (Population Mean: SD (Standard Deviation) Known)


Smean = mean(coldmarch$Temperature) ## Sample Mean

Here the sample mean is Smean = 3.974286, and the standard deviation is taken from the previous dataset, SDTemp = 0.508589.

The hypothesized population mean is POPMean = 3.9, Alpha = 0.1 and the sample size is 35.

POPMean = 3.9 ## Given in the problem

Alpha = 0.1 ## Given in the problem

PopSize = 35 ## Sample size: number of rows in Cold_Storage_Mar2018.csv

SDTemp ## Standard deviation calculated from the previous dataset.

Calculating z value:

Zval = (Smean - POPMean) / (SDTemp/sqrt(35))

Calculating the z critical value: since this is a right-tailed test, we take the 1 - Alpha quantile of the standard normal distribution

Zcritical = qnorm(1 - Alpha)

Z value approach: we can see that Zval < Zcritical

Since the z value is less than the z critical value, we cannot reject the null hypothesis.

P value approach: calculating the p value from the z value

Pval = 1 - pnorm(Zval)

Since Pval > Alpha, we cannot reject the null hypothesis (we fail to reject the null hypothesis).
At the 10% significance level the z test gives no evidence that the mean temperature exceeds 3.9 degree C, so no correction is indicated at the cold
storage plant.
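
The z test steps can be collected into one small helper. This is a sketch only: the function name z_test_greater and its arguments are hypothetical, and the inputs are the figures reported in this document (sample mean 3.974286, SD 0.508589, n = 35, Alpha = 0.1):

```r
# One-sample, right-tailed z test with the population SD treated as known
z_test_greater <- function(sample_mean, mu0, sigma, n, alpha) {
  z <- (sample_mean - mu0) / (sigma / sqrt(n))  # test statistic
  z_critical <- qnorm(1 - alpha)                # right-tail critical value
  p_value <- pnorm(z, lower.tail = FALSE)       # P(Z > z)
  list(z = z, z_critical = z_critical, p_value = p_value,
       reject = p_value < alpha)
}

res <- z_test_greater(sample_mean = 3.974286, mu0 = 3.9,
                      sigma = 0.508589, n = 35, alpha = 0.1)
print(res)
```

The critical-value rule (z > z_critical) and the p-value rule (p_value < alpha) always lead to the same decision, so checking one of them is sufficient.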

2.2. Question 2. State the Hypothesis, do the calculation using t-test.


Hypothesis:
H0: Mu <= 3.9 -> No need for changes in the cold storage plant (Null Hypothesis)

H1: Mu > 3.9 -> Problem in the cold storage plant, correction needed (Alternative Hypothesis)

Calculation using t test (Population Mean: SD (Standard Deviation) Unknown)

POPMean ## Given upper limit (used as the hypothesized population mean)

Alpha = 0.1 ## Given in the problem

PopSize = 35 ## Sample size: number of rows in Cold_Storage_Mar2018.csv

Calculate the standard error of the sample mean, SamSD (sample standard deviation divided by the square root of the sample size)

SamSD = sd(coldmarch$Temperature) / sqrt(PopSize)

Using t test:

t.test(coldmarch$Temperature, mu = POPMean, alternative = "greater",conf.level = 0.90)

t.test(coldmarch$Temperature, mu = POPMean, alternative = "greater",conf.level = 0.99)


From the test summary we can see that the p value, 0.004711, is very small and less than

Alpha = 0.1, so we reject the null hypothesis.

From this test we conclude that there is a problem in the cold storage plant and corrective measures are needed.

Although the given confidence level is 90%, the p value is also below 0.01, so the null hypothesis can be rejected at the 99%
confidence level as well.
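
To make explicit what t.test() computes, the sketch below derives the t statistic and the right-tailed p value by hand and compares them with the function's output. The sample x is simulated and purely hypothetical; it is not the March 2018 data:

```r
set.seed(1)
x <- rnorm(35, mean = 4, sd = 0.2)  # hypothetical sample; NOT the case study data
mu0 <- 3.9                          # hypothesized mean, as in the report
n <- length(x)

standard_error <- sd(x) / sqrt(n)                  # standard error of the mean
t_manual <- (mean(x) - mu0) / standard_error       # t statistic
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)  # right-tailed p value

fit <- t.test(x, mu = mu0, alternative = "greater")

print(c(manual = t_manual, builtin = unname(fit$statistic)))
print(c(manual = p_manual, builtin = fit$p.value))
```

Note that conf.level in t.test() changes only the reported confidence interval, not the p value, so the two calls with conf.level = 0.90 and 0.99 return the same p value.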

2.3. Question 3. Give your inference after doing both the tests.

1. The two tests gave two completely different results.


2. The reason is that the two tests used different standard deviations: the z test used the SD from the 2016 dataset
and the t test used the SD of the 2018 sample.
3. Since the issue was reported in 2018, the t test is the more relevant one because it uses the
recent sample.
4. The summary showed that all 35 days in the given sample belong to summer. That
could be another reason for the temperature going above 3.9.
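
The effect described in point 2 can be illustrated numerically: for a fixed difference between sample mean and hypothesized mean, the test statistic grows as the standard deviation shrinks. In the sketch below only the first SD (0.508589) comes from this document; the other two values are hypothetical and chosen purely for illustration:

```r
mean_diff <- 3.974286 - 3.9   # sample mean minus hypothesized mean, from the report
n <- 35                       # sample size, from the report

# Test statistic as a function of the standard deviation plugged in
test_stat <- function(s) mean_diff / (s / sqrt(n))

# First value is the full-year SD used in the z test; the rest are hypothetical
sds <- c(0.508589, 0.3, 0.15)
stats <- test_stat(sds)

print(data.frame(sd = sds, statistic = stats))
```

A smaller standard deviation gives a smaller standard error, so the same difference of about 0.074 degree C looks much more significant when the sample's own, tighter spread is used.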
