
PROJECT – TIME SERIES FORECASTING

Australian Monthly Gas Production - Report


TABLE OF CONTENTS

1. Project Objective
2. Data Assumptions
3. Steps for ARIMA & Auto ARIMA Analysis
   i. Load the data & visualization
   ii. Preprocessing the data
   iii. Check/make series stationary
   iv. Determine d value
   v. Determine the p and q values
   vi. Fit ARIMA model / calculate MAPE/RMSE
   vii. Compare models using accuracy measures
   viii. Make prediction
   ix. Predict values on validation set
   x. Auto ARIMA model
4. Appendix A – Source Code


I. Project Objective

Forecast the Australian Gas Production over the next 12 periods.

The objective of this report is to analyze Australian gas production (1956-1995) and, after analyzing and modeling the time series data, to forecast gas production over the next 12 periods (1 year).

This exploration report will consist of the following:

• Import the time series dataset into R
• Plot, examine, and prepare the series for modeling
• Understand the components of the time series
• Explore the data graphically
• Extract the seasonal component from the time series
• Test for stationarity and apply appropriate transformations
• Choose the order of an ARIMA model
• Forecast using ARIMA and Auto ARIMA models
• Establish the accuracy of the model

II. Data Assumptions

• The Australian gas production time series data was obtained from the 'forecast' package in R.
• The components of the time series are not known.
• The stationarity of the time series is not known.
• The seasonality of the time series is not known.

III. Steps for ARIMA & Auto ARIMA Analysis
1. Load the data & Visualization

2. Preprocessing the data

3. Check/Make series stationary

Do a formal hypothesis test (Augmented Dickey-Fuller test, adf.test in R, with Ha: the series is stationary). If the series is non-stationary, stationarize it by taking the difference of consecutive terms in the series (diff(dataset) in R).

4. Determine d value

5. Determine the p and q values

Create ACF and PACF plots to explore the autocorrelations and partial autocorrelations of the differenced series; the PACF suggests the AR order (p) and the ACF suggests the MA order (q).
ARIMA(p,d,q) identifies a non-seasonal model which needs to be differenced d times to make it stationary, and which contains p AR terms and q MA terms.
6. Fit ARIMA Model

ARIMA controls – (p,d,q), e.g. (0,1,2). Adjust the values of p, d, q until the residuals are uncorrelated.
Add a seasonal component if required: ARIMA(p,d,q)(P,D,Q)[frequency] (see the sketch at the end of this section).

7. Compare models using accuracy measures

After forecasting, run accuracy measures, then check the status of the residuals with diagnostics and a hypothesis test (histogram, acf, and Box.test with the Ljung-Box method).

8. Make prediction

9. Predict values on validation set

10. Calculate MAPE/RMSE

Auto ARIMA Model

Auto ARIMA involves the same steps as building an ARIMA model, except steps 3 to 5: the differencing order and the p and q values are selected automatically, hence the name Auto ARIMA.
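A minimal end-to-end sketch of steps 6-10 and the Auto ARIMA shortcut, using the forecast package loaded for this project (the orders shown are placeholders, not the final choices):

library(forecast)
fit <- arima(gas, order = c(0, 1, 2))                  # step 6: fit a candidate ARIMA(p,d,q)
accuracy(fit)                                          # steps 7 & 10: ME, RMSE, MAE, MPE, MAPE, MASE on the training set
Box.test(fit$residuals, lag = 30, type = "Ljung-Box")  # residual check; H0: residuals are independent
fct <- forecast(fit, h = 12)                           # steps 8 & 9: forecast the next 12 months
fit.auto <- auto.arima(gas, seasonal = TRUE)           # Auto ARIMA: d, p, q (and seasonal orders) chosen automatically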

1. Load the data & Visualization

setwd("E:/P5")
getwd()
## [1] "E:/P5"
library(tseries)
library(timeSeries)
library(forecast)
library(zoo)
#Loading the data
data(gas, package = "forecast")

#Plot
plot(gas, main = "Plot of Australian Gas Production")

The production of gas in Australia has increased significantly over a long period of time (40 years). There is a significant upward trend, and there seems to be some seasonality, but extremely high variance can also be observed in the plot. Since the timeline spans 40 years, it remains to be seen how significant the early historical data is.

Histogram

hist(gas, col = "blue", main = "Histogram of gas")

A large number of lower values (<10000), i.e. depicting lower gas production, come from the early years, prior to the 1970s. Whether these lower values will aid in accurately forecasting production in 1996 remains to be seen.

summary(gas)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1646 2675 16788 21415 38629 66600
head(gas)
## Jan Feb Mar Apr May Jun
## 1956 1709 1646 1794 1878 2173 2321
tail(gas)
## Mar Apr May Jun Jul Aug
## 1995 46287 49013 56624 61739 66600 60054

The lowest gas production was recorded in February 1956, in the early years of production, whereas the highest monthly production was recorded in July 1995. There is therefore a huge gap in production over the years, with current levels far above the mean (21415) and the median (16788).

monthplot(gas, main = "Monthly plot of Australian Gas production")

The monthplot for the Australian gas production data shows a clear increase in production within each month from 1956-1995. There is a clear upward trend, with the visible fluctuations/variations occurring mainly during the last 5-10 years.

Frequency

A time series with one observation each month has a monthly sampling frequency, or monthly periodicity, and so is called a monthly time series. Data periodicity is described by specifying the periodic time intervals into which the dates of the observations fall. Using the frequency() function we can determine the periodicity of the time series.

> frequency(gas)
[1] 12

2. Preprocessing the data
Visual Analysis

Visual inspection of the plot helps us understand that there is an upward trend with a semi-annual seasonality observed throughout the time series.

The seasonal component at the beginning of the series is smaller than the seasonal component later in the series. To account for this, we log-transform the data as follows:

Log transformation

Plot a graph of the data against time. If it looks like the variation increases with the level of
the series, take logs. Otherwise model the original data.

#Log transformation
loggas <- log(gas)
plot(loggas, main = "Plot of log(gas)")

Compared to the plot of the original time series data, we can observe that once we have done
the Log transformation the variation is less skewed and quite uniform throughout.

Decomposition

Now we have to decide whether an additive or a multiplicative model would describe the data appropriately. Since the size of the seasonal and random fluctuations increases over time, an additive model is NOT appropriate; the data is better described by a multiplicative model. However, because we have already applied a log transformation, we can use an additive decomposition on the transformed series.

At first glance, the time series includes a seasonal (semi-annual) component, a trend (upward) component, and a residual or error component. The extent of each component can be deduced by decomposing the data using the stl() function.

Decomposing also allows us to remove seasonal trends from our data. To illustrate why this might be useful:

loggasdec <- stl(loggas, s.window = "p")
plot(loggasdec)

From the decomposed plot we can observe that there is definitely a trend, as noted in our visual inspection, along with a semi-annual seasonal component and residuals (white noise). The trend component is the most significant.

Seasonal plot

As observed earlier during our visual inspection, there is semi-annual seasonality present in the time series data along with an upward trend. Deseasonalization involves removing the seasonal component from the time series, which helps us understand the effect of the other components.

The deseasonalized plot is then compared to the original plot to better understand the seasonal component's impact.

gas.sa <- seasadj(loggasdec) # assumed definition (forecast::seasadj on the stl fit); not shown in the original extract
plot(gas.sa, type="l", main= "Seasonal Adjusted") # seasonally adjusted series
seasonplot(gas.sa, 12, col=rainbow(12), year.labels=TRUE, main="Seasonal plot: Australian Gas Production") # seasonal frequency set as 12 for monthly data

#Deseasonalize
Deseasonloggas <- (loggasdec$time.series[,2]+loggasdec$time.series[,3])
ts.plot(Deseasonloggas, loggas, col=c("red","blue"), main = "Comparison of loggas and Deseasonalized loggas")

#Plotting actual values with Exponentiation
Deseasongas <- exp(Deseasonloggas) # back-transform the trend+remainder sum; exponentiating the components separately would distort the level
ts.plot(Deseasongas, gas, col=c("red","blue"), main = "Comparison of gas and Deseasonalized gas")

#Plotting seasonality only, took first 12 months data
logseason = loggasdec$time.series[1:12,1]
plot(logseason, type="l")

#Exponentiate to get actual value
Gasseason <- exp(loggasdec$time.series[1:12,1])
plot(Gasseason, type="l")

3. Check/Make series stationary

Stationarity

Fitting an ARIMA model requires the series to be stationary. A series is said to be stationary when its mean, variance, and autocovariance are time invariant.
So, the first thing to do is to determine whether our time series is stationary (i.e., whether the mean is generally constant throughout the time series, as opposed to going up or down over time). First, we'll do this with a visual inspection.

OK, this doesn't look stationary at all, as the mean tends to go up over time. We can do a formal test to determine stationarity (or lack thereof) in a more empirical way.

adf.test

For this, we can use the augmented Dickey-Fuller (ADF) test, which tests the null hypothesis that the series is non-stationary. This is included in the "tseries" package.

Hypothesis
H0 – Non-stationary
Ha – Stationary
If the p-value is above 0.05, we fail to reject the null hypothesis (H0) and conclude that the data is non-stationary; if it is below 0.05, we reject H0 in favour of the alternative (Ha) and conclude that the data is stationary.

Check for stationarity of data
adf.test(gas)
##
## Augmented Dickey-Fuller Test
##
## data: gas
## Dickey-Fuller = -2.7131, Lag order = 7, p-value = 0.2764
## alternative hypothesis: stationary
#Non-Stationary

Our p-value (0.2764) is above 0.05, meaning the data is indeed non-stationary: the p-value is not significant, so we fail to reject the null (H0) hypothesis of non-stationarity. This confirms the results of our visual inspection.

Stationarize – Differencing

Given that we have non-stationary data, we will need to “difference” the data until we obtain a
stationary time series. We can do this with the “diff” function in R.

diff1 <- diff(gas) # difference of consecutive terms
adf.test(diff1)
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -19.321, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

4. Determine d value

After differencing, we ran adf.test again to check for stationarity. The p-value (0.01) is below 0.05, so it is significant: we reject the null (H0) hypothesis in favour of the alternative (Ha), and the data is now indeed stationary.
Given that we had to difference the data once, the d value for our ARIMA model is 1.
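As a cross-check on the visual and ADF-based choice, the forecast package also provides ndiffs() and nsdiffs(), which estimate the required differencing orders directly (a quick sketch; these calls were not part of the original run):

ndiffs(gas)   # estimated number of ordinary differences needed; expected to agree with d = 1
nsdiffs(gas)  # number of seasonal differences suggested for the monthly series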

5. Determine p and q values

#Autocorrelation of lag 50
acf(diff1, lag=50, main= "Auto Correlation(q)")

pacf(diff1, lag=50, main = "Partial Auto correlation (p)")

From the above ACF (q) and PACF (p) plots, we observe that a large amount of correlation exists. Looking at the ACF plot, we can also see a seasonal pattern.

The p and q values read from the PACF and ACF plots would be 2 and 2 respectively.
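Since reading p and q off the plots is somewhat subjective, a small AIC comparison over nearby candidate orders can back up the choice (a hedged sketch; this search was not part of the original analysis):

# Compare AIC across candidate (p, q) orders with d = 1
for (p in 0:2) {
  for (q in 0:2) {
    fit <- try(arima(gas, order = c(p, 1, q)), silent = TRUE)  # some orders may fail to converge
    if (!inherits(fit, "try-error"))
      cat(sprintf("ARIMA(%d,1,%d): AIC = %.1f\n", p, q, AIC(fit)))
  }
}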

6. Fit ARIMA Model / Calculate MAPE/RMSE / Compare models using accuracy measures

#ARIMA (p,d,q)
gas.arima.fit<-arima(gas, c(2,1,2)) #With AR & MA, with differencing
summary(gas.arima.fit)
## Call:
## arima(x = gas, order = c(2, 1, 2))
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.1355 0.0005 0.1261 0.2753
## s.e. 0.4295 0.1481 0.4269 0.0752
## sigma^2 estimated as 6801981: log likelihood = -4410.63, aic = 8831.27
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 70.11654 2605.32 1526.492 0.3193329 6.992583 0.8840275
## ACF1
## Training set 9.96889e-05
hist(gas.arima.fit$residuals, col = "blue")

#Testing the fit with original series
ts.plot(gas, fitted(gas.arima.fit), col=c("blue", "red"))

#Test auto correlation in residuals to check the fit
acf(gas.arima.fit$residuals)

#Portmanteau test : Ljung Box method used :H0 : residuals are independent
Box.test(gas.arima.fit$residuals,lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit$residuals
## X-squared = 661.61, df = 30, p-value < 2.2e-16

The 1st ARIMA model run above with (p,d,q) = (2,1,2) gave the following results:
• MAPE – 6.992583 (less than 10 is great)
• Histogram – Normally distributed
• Compared to original plot – Good
• Auto correlation in residuals – Correlation exists from lag 4 onward at various lags
• Box-Ljung test – The p-value is significantly less than 0.05, so the residuals are dependent

#Adding seasonal component if required
gas.arima.fit.s <- arima(gas, c(2,1,2), seasonal = list(order=c(1,1,2), period=12))
gas.arima.fit.s
##
## Call:
## arima(x = gas, order = c(2, 1, 2), seasonal = list(order = c(1, 1, 2), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1 sma2
## 0.229 0.2133 -0.7067 -0.134 -0.4313 -0.1708 -0.3013
## s.e. 0.315 0.1176 0.3157 0.242 1.4959 1.4856 0.9190
##
## sigma^2 estimated as 2559509: log likelihood = -4076.14, aic = 8168.27
summary(gas.arima.fit.s)
##
## Call:
## arima(x = gas, order = c(2, 1, 2), seasonal = list(order = c(1, 1, 2), period = 12))
##

## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1 sma2
## 0.229 0.2133 -0.7067 -0.134 -0.4313 -0.1708 -0.3013
## s.e. 0.315 0.1176 0.3157 0.242 1.4959 1.4856 0.9190
##
## sigma^2 estimated as 2559509: log likelihood = -4076.14, aic = 8168.27
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 27.4472 1577.849 894.845 0.2790015 3.9086 0.5182258
## ACF1
## Training set -0.000577524
hist(gas.arima.fit.s$residuals, col = "blue")

ts.plot(gas, fitted(gas.arima.fit.s), col=c("blue", "red"))

acf(gas.arima.fit.s$residuals)

Box.test(gas.arima.fit.s$residuals, lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit.s$residuals
## X-squared = 74.75, df = 30, p-value = 1.092e-05

The 2nd model run above is a SARIMA model with (p,d,q)(P,D,Q) = (2,1,2)(1,1,2), since a seasonal component exists. It gave the following results:
• MAPE – 3.9086 (less than 10 is great)
• Histogram – Normally distributed
• Compared to original plot – Good
• Auto correlation in residuals – Correlation exists from lag 4 onward, but at fewer lags than the previous model. A better outcome than the previous model.
• Box-Ljung test – The p-value is significantly less than 0.05, so the residuals are dependent. The p-value is slightly better than the previous model's.

#auto-arima
fitauto=auto.arima(gas, seasonal = TRUE, trace = T)
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2)(1,1,1)[12] : 7975.595
## ARIMA(0,1,0)(0,1,0)[12] : 8195.028
## ARIMA(1,1,0)(1,1,0)[12] : 8058.801
## ARIMA(0,1,1)(0,1,1)[12] : 7967.259
## ARIMA(0,1,1)(0,1,0)[12] : 8099.389
## ARIMA(0,1,1)(1,1,1)[12] : 7981.315
## ARIMA(0,1,1)(0,1,2)[12] : 7969.124
## ARIMA(0,1,1)(1,1,0)[12] : 8022.501
## ARIMA(0,1,1)(1,1,2)[12] : 7983.475
## ARIMA(0,1,0)(0,1,1)[12] : 8046.576
## ARIMA(1,1,1)(0,1,1)[12] : 7962.588
## ARIMA(1,1,1)(0,1,0)[12] : 8099.616
## ARIMA(1,1,1)(1,1,1)[12] : 7976.808

## ARIMA(1,1,1)(0,1,2)[12] : 7964.626
## ARIMA(1,1,1)(1,1,0)[12] : 8022.009
## ARIMA(1,1,1)(1,1,2)[12] : 7978.712
## ARIMA(1,1,0)(0,1,1)[12] : 7989.065
## ARIMA(2,1,1)(0,1,1)[12] : 7959.768
## ARIMA(2,1,1)(0,1,0)[12] : 8084.882
## ARIMA(2,1,1)(1,1,1)[12] : 7973.938
## ARIMA(2,1,1)(0,1,2)[12] : 7961.533
## ARIMA(2,1,1)(1,1,0)[12] : 8023.31
## ARIMA(2,1,1)(1,1,2)[12] : Inf
## ARIMA(2,1,0)(0,1,1)[12] : 7984.927
## ARIMA(3,1,1)(0,1,1)[12] : 7962.327
## ARIMA(2,1,2)(0,1,1)[12] : 7961.417
## ARIMA(1,1,2)(0,1,1)[12] : 7960.058
## ARIMA(3,1,0)(0,1,1)[12] : 7977.814
## ARIMA(3,1,2)(0,1,1)[12] : 7964.384
## Now re-fitting the best model(s) without approximations...
## ARIMA(2,1,1)(0,1,1)[12] : 8163.291
##
## Best model: ARIMA(2,1,1)(0,1,1)[12]
summary(fitauto)
## Series: gas
## ARIMA(2,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sma1
## 0.3756 0.1457 -0.8620 -0.6216
## s.e. 0.0780 0.0621 0.0571 0.0376
## sigma^2 estimated as 2587081: log likelihood=-4076.58
## AIC=8163.16 AICc=8163.29 BIC=8183.85
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 27.72266 1579.457 893.4504 0.272275 3.900233 0.4789312

## ACF1
## Training set 0.002271673
hist(fitauto$residuals, col="green")

ts.plot(gas,fitted(fitauto), col=c("green","blue"))

acf(fitauto$residuals)

Box.test(fitauto$residuals, lag=30, type = "Ljung-Box")
##
## Box-Ljung test
## data: fitauto$residuals
## X-squared = 77.964, df = 30, p-value = 3.86e-06
checkresiduals(fitauto)

After two unsatisfactory attempts, we ran the auto.arima function to derive the 3rd model. It selected the best model ARIMA(2,1,1)(0,1,1)[12], a SARIMA model, which gave the following results:
• MAPE – 3.900233 (less than 10 is great)
• Histogram – Normally distributed
• Compared to original plot – Good
• Auto correlation in residuals – Very similar to the previous SARIMA model
• Box-Ljung test – The p-value is significantly less than 0.05, so the residuals are dependent. The p-value is still no better than the previous models'.

#Box Cox
gas1=BoxCox(gas,lambda = BoxCox.lambda(gas))

summary(gas1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.21 11.13 14.94 14.25 16.86 18.20
tsdisplay(gas1,lag.max = 150, plot.type = c("histogram"))

fitauto1<-auto.arima(gas1, seasonal = TRUE, trace = T)
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2)(1,1,1)[12] : -633.6623
## ARIMA(0,1,0)(0,1,0)[12] : -399.6777
## ARIMA(1,1,0)(1,1,0)[12] : -536.1276
## ARIMA(0,1,1)(0,1,1)[12] : -657.6893
## ARIMA(0,1,1)(0,1,0)[12] : -471.4437
## ARIMA(0,1,1)(1,1,1)[12] : -643.515
## ARIMA(0,1,1)(0,1,2)[12] : -661.887
## ARIMA(0,1,1)(1,1,2)[12] : -647.0125
## ARIMA(0,1,0)(0,1,2)[12] : Inf
## ARIMA(1,1,1)(0,1,2)[12] : -658.9943
## ARIMA(0,1,2)(0,1,2)[12] : -659.9056
## ARIMA(1,1,0)(0,1,2)[12] : Inf
## ARIMA(1,1,2)(0,1,2)[12] : -656.8534
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(0,1,1)(0,1,2)[12] : Inf
## ARIMA(0,1,2)(0,1,2)[12] : Inf
## ARIMA(1,1,1)(0,1,2)[12] : Inf
## ARIMA(0,1,1)(0,1,1)[12] : -700.5158
##
## Best model: ARIMA(0,1,1)(0,1,1)[12]
summary(fitauto1)
## Series: gas1
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.3755 -0.8586
## s.e. 0.0450 0.0450

##
## sigma^2 estimated as 0.01235: log likelihood=353.28
## AIC=-700.57 AICc=-700.52 BIC=-688.15
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.001163337 0.1093533 0.0766936 0.01790397 0.5343536
## MASE ACF1
## Training set 0.3532688 0.002647187
hist(fitauto1$residuals, col="green")

ts.plot(gas1,fitted(fitauto1), col=c("green","blue"))

acf(fitauto1$residuals)

Box.test(fitauto1$residuals, lag=30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: fitauto1$residuals
## X-squared = 62.654, df = 30, p-value = 0.0004342
checkresiduals(fitauto)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(0,1,1)[12]
## Q* = 64.344, df = 20, p-value = 1.484e-06
##
## Model df: 4. Total lags used: 24

Since Auto ARIMA also failed to give us a model with independent residuals, we applied a Box-Cox transformation (best model: ARIMA(0,1,1)(0,1,1)[12]), which gave the following results:
• MAPE – 0.5343536 (lowest, though computed on the Box-Cox-transformed scale, so not directly comparable)
• Histogram – Normally distributed
• Compared to original plot – Good
• Auto correlation in residuals – Very similar to the previous Auto ARIMA model
• Box-Ljung test – The p-value is significantly less than 0.05, so the residuals are dependent. The p-value is still no better than the previous models'.
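Note that fitauto1 was fit on the Box-Cox-transformed series gas1, so its forecasts live on the transformed scale. A minimal sketch of back-transforming them with the forecast package's InvBoxCox() (not part of the original run):

lambda <- BoxCox.lambda(gas)          # the same lambda used to build gas1
fct.bc <- forecast(fitauto1, h = 12)  # forecasts on the transformed scale
InvBoxCox(fct.bc$mean, lambda)        # point forecasts back on the original production scale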

#Subset of dataset
gassub<-window(gas,start=c(1990,1))
plot(gassub)

#Check for stationarity of data
adf.test(gassub)
##
## Augmented Dickey-Fuller Test
##
## data: gassub

## Dickey-Fuller = -6.1377, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Stationary
#Auto correlation of lag 30
acf((gassub), lag=30)

pacf((gassub), lag=30)

plot(gassub)

#arima (p,d,q)
gas.arima.fit<-arima(gassub, c(0,0,2))

summary(gas.arima.fit)
##
## Call:
## arima(x = gassub, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## 0.7535 0.6040 47943.139
## s.e. 0.2255 0.1659 1368.765
##
## sigma^2 estimated as 23411514: log likelihood = -674, aic = 1356.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 69.1481 4838.545 3996.105 -1.304102 8.588294 1.015244
## ACF1
## Training set 0.3029303
hist(gas.arima.fit$residuals, col = "blue")

#Testing the fit with original series


ts.plot(gassub, fitted(gas.arima.fit), col=c("blue", "red"))

fitted(gas.arima.fit)
## Jan Feb Mar Apr May Jun Jul
## 1990 45842.57 43030.72 43553.54 46669.04 45350.29 51170.59 56281.25
## 1991 39165.67 38971.49 43302.54 43362.56 44659.88 51285.83 49773.07
## 1992 41886.55 41223.01 44268.62 43040.01 44866.15 51621.74 56061.28
## 1993 44069.48 40954.01 41454.90 38291.14 44231.68 54631.62 53415.67
## 1994 45211.37 42063.41 43585.63 49479.11 47431.35 51641.71 57160.95
## 1995 40847.17 41773.77 48169.73 46341.62 48818.80 55437.92 57405.53
## Aug Sep Oct Nov Dec
## 1990 51617.45 53347.22 46193.00 42863.00 46416.83
## 1991 50628.70 54939.46 46557.13 42842.83 46176.57
## 1992 54135.53 53712.14 52842.70 45157.23 42431.61
## 1993 51915.17 52040.22 49078.11 46779.15 46938.07
## 1994 55647.59 57353.02 53250.14 48189.04 49562.87
## 1995 58677.16
#Test auto correlation in residuals to check the fit
acf(gas.arima.fit$residuals)

#Portmanteau test : Ljung Box method used :H0 : residuals are independent
Box.test(gas.arima.fit$residuals,lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit$residuals
## X-squared = 213.87, df = 30, p-value < 2.2e-16

We eventually decided to work with a subset of the original time series, using only data from 1990 onward, since we had encountered many dependent residuals and inaccurate models with both ARIMA and Auto ARIMA on the full series. We therefore follow the same steps on the subset to build a more robust model and make a forecast with higher confidence. The model ARIMA(0,0,2) gave the following results:
• MAPE – 8.588294 (less than 10 is great)
• Histogram – Fairly normally distributed
• Compared to original plot – Good
• Auto correlation in residuals – Significant correlations are observed; no improvement over the previous Auto ARIMA model
• Box-Ljung test – The p-value is significantly less than 0.05, so the residuals are dependent. The p-value is still no better than the previous models'.

#Adding seasonal component if required
gas.arima.fit.s <- arima(gassub, c(0,1,1), seasonal = list(order=c(1,1,0), period=12))
gas.arima.fit.s
##
## Call:
## arima(x = gassub, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))
##
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11007538: log likelihood = -526.11, aic = 1058.21
summary(gas.arima.fit.s)
##
## Call:
## arima(x = gassub, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))
##
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11007538: log likelihood = -526.11, aic = 1058.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 448.4043 2983.901 1910.476 0.6358374 4.139159 0.4853723
## ACF1
## Training set -0.01701014

hist(gas.arima.fit.s$residuals, col = "blue")

ts.plot(gassub, fitted(gas.arima.fit.s), col=c("blue", "red"))

acf(gas.arima.fit.s$residuals)

Box.test(gas.arima.fit.s$residuals, lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit.s$residuals
## X-squared = 28.547, df = 30, p-value = 0.5415

Model – Subset of data including a seasonal component, SARIMA(0,1,1)(1,1,0), which gave the following results:
• MAPE – 4.139159 (less than 10 is great and better than the previous result)
• Histogram – Normally distributed
• Compared to original plot – Very good
• Auto correlation in residuals – No correlation observed; best fit so far
• Box-Ljung test – The p-value is well above 0.05, hence the residuals are independent. The p-value is the best so far.

#auto-arima
fitauto=auto.arima(gassub, seasonal = TRUE, trace = T)
##

## ARIMA(2,1,2)(1,1,1)[12] : Inf
## ARIMA(0,1,0)(0,1,0)[12] : 1081.687
## ARIMA(1,1,0)(1,1,0)[12] : 1065.345
## ARIMA(0,1,1)(0,1,1)[12] : Inf
## ARIMA(1,1,0)(0,1,0)[12] : 1074.33
## ARIMA(1,1,0)(1,1,1)[12] : Inf
## ARIMA(1,1,0)(0,1,1)[12] : Inf
## ARIMA(0,1,0)(1,1,0)[12] : 1075.677
## ARIMA(2,1,0)(1,1,0)[12] : 1065.765
## ARIMA(1,1,1)(1,1,0)[12] : 1060.953
## ARIMA(1,1,1)(0,1,0)[12] : 1072.152
## ARIMA(1,1,1)(1,1,1)[12] : Inf
## ARIMA(1,1,1)(0,1,1)[12] : Inf
## ARIMA(0,1,1)(1,1,0)[12] : 1058.684
## ARIMA(0,1,1)(0,1,0)[12] : 1070.051
## ARIMA(0,1,1)(1,1,1)[12] : Inf
## ARIMA(0,1,2)(1,1,0)[12] : 1060.965
## ARIMA(1,1,2)(1,1,0)[12] : 1063.06
##
## Best model: ARIMA(0,1,1)(1,1,0)[12]
summary(fitauto)
## Series: gassub
## ARIMA(0,1,1)(1,1,0)[12]
##
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11423571: log likelihood=-526.11
## AIC=1058.21 AICc=1058.68 BIC=1064.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE

## Training set 448.4043 2983.901 1910.476 0.6358374 4.139159 0.5551405
## ACF1
## Training set -0.01701014
hist(fitauto$residuals, col="green")

acf(fitauto$residuals)

Box.test(fitauto$residuals, lag=30, type = "Ljung-Box")
## Box-Ljung test
## data: fitauto$residuals
## X-squared = 28.547, df = 30, p-value = 0.5415
checkresiduals(fitauto)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(1,1,0)[12]
## Q* = 11.922, df = 12, p-value = 0.452
##
## Model df: 2. Total lags used: 14
ts.plot(gassub,fitted(fitauto), col=c("green","blue"))

Final Model – We ran auto.arima on the subset of the data to find the best model, which turns out to be identical to the previous SARIMA model, (0,1,1)(1,1,0), and gave the exact same results:
• MAPE – 4.139159 (less than 10 is great and better than the previous result)
• Histogram – Normally distributed
• Compared to original plot – Very good
• Auto correlation in residuals – No correlation observed; best fit so far
• Box-Ljung test – The p-value is well above 0.05, hence the residuals are independent. The p-value is the best so far.

Now we can run a forecast on this model.

7. Make prediction
## Forecast
#After ensuring model is stable and accurate, forecast for next 12 intervals
fct1=forecast(gas.arima.fit.s, h=12)

fct1$mean
## Jan Feb Mar Apr May Jun Jul
## 1995
## 1996 44630.39 44826.00 50464.32 51405.98 59660.56 63580.35 68333.48

## Aug Sep Oct Nov Dec
## 1995 58353.09 54446.75 52111.62 45010.61
## 1996 65892.39
plot(forecast(gas.arima.fit.s, h = 12)) # h belongs inside forecast(); passing it to plot() triggers "not a graphical parameter" warnings
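Besides the point forecasts in fct1$mean, the forecast object also carries the interval forecasts drawn as bands in the plot (a quick look; this output was not shown in the original run):

fct1$lower  # 80% and 95% lower forecast bounds for the 12 months
fct1$upper  # corresponding upper bounds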

summary(gas.arima.fit.s)
##
## Call:
## arima(x = gassub, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))
##
## Coefficients:
## ma1 sar1

## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11007538: log likelihood = -526.11, aic = 1058.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 448.4043 2983.901 1910.476 0.6358374 4.139159 0.4853723
## ACF1
## Training set -0.01701014
#######################################################################

8. Predict values on validation set

## 1996 44630.39 44826.00 50464.32 51405.98 59660.56 63580.35 68333.48
## Aug Sep Oct Nov Dec
## 1995 58353.09 54446.75 52111.62 45010.61

So finally we have arrived at the best model. We used a subset of the original data because correlated residuals, white noise, and inaccurate models showed that the 40 years of high-variance historical data were hurting the accuracy of our earlier models. Since we only had to forecast the next 12 months, the data from the last five years was fairly stationary and sufficient to build the model, predict the values, and plot the forecast.
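The values above are the same point forecasts as in the previous section rather than scores on a held-out sample. A minimal sketch of a true hold-out validation on the subset (the split dates are assumed; this was not part of the original run):

train <- window(gas, start = c(1990, 1), end = c(1994, 8))  # training window
valid <- window(gas, start = c(1994, 9))                    # last 12 observations held out
fit.v <- arima(train, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))
accuracy(forecast(fit.v, h = length(valid)), valid)         # test-set MAPE/RMSE against the hold-out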

IV. Appendix A – Source Code

P5.R

Akshay
2019-08-02

setwd("E:/P5")
getwd()
## [1] "E:/P5"
library(tseries)
## Warning: package 'tseries' was built under R version 3.5.3
library(timeSeries)
## Warning: package 'timeSeries' was built under R version 3.5.3
## Loading required package: timeDate
## Warning: package 'timeDate' was built under R version 3.5.3
library(forecast)
## Warning: package 'forecast' was built under R version 3.5.3
library(zoo)
## Warning: package 'zoo' was built under R version 3.5.3
##
## Attaching package: 'zoo'
## The following object is masked from 'package:timeSeries':
##
## time<-
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
#Loading the data
data(gas, package = "forecast")

#Plot
plot(gas, main = "Plot of Australian Gas Production")

hist(gas, col = "blue", main = "Histogram of gas")

summary(gas)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1646 2675 16788 21415 38629 66600
head(gas)
## Jan Feb Mar Apr May Jun
## 1956 1709 1646 1794 1878 2173 2321
tail(gas)
## Mar Apr May Jun Jul Aug
## 1995 46287 49013 56624 61739 66600 60054
monthplot(gas, main = "Monthly plot of Australian Gas production")

frequency(gas)
## [1] 12
#Log transformation
loggas <- log(gas)
plot(loggas, main = "Plot of log(gas)")

loggasdec <- stl(loggas, s.window= "p")

plot(loggasdec)

loggasdec
## Call:
## stl(x = loggas, s.window = "p")
##
##

#Deseasonalize
Deseasonloggas <- (loggasdec$time.series[,2]+loggasdec$time.series[,3])
ts.plot(Deseasonloggas, loggas, col=c("red","blue"), main = "Comparison of loggas and Deseasonalized loggas")

#Plotting actual values with Exponentiation
Deseasongas <- exp(Deseasonloggas) # back-transform the trend+remainder sum; exponentiating the components separately would distort the level
ts.plot(Deseasongas, gas, col=c("red","blue"), main = "Comparison of gas and Deseasonalized gas")

#Plotting seasonality only, took first 12 months data
logseason=loggasdec$time.series[1:12,1]
plot(logseason,type="l")

#Exponentiate to get actual value
Gasseason<-exp(loggasdec$time.series[1:12,1])
plot(Gasseason, type="l")

##############################################
#Check for stationarity of data
adf.test(gas)
##
## Augmented Dickey-Fuller Test
##
## data: gas
## Dickey-Fuller = -2.7131, Lag order = 7, p-value = 0.2764
## alternative hypothesis: stationary
#Non-Stationary

#Stationarize the series

diff1<-diff(gas)
plot(diff1, main = "Differenced Plot of Gas")

adf.test(diff1)
## Warning in adf.test(diff1): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: diff1
## Dickey-Fuller = -19.321, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
#Autocorrelation of lag 50
acf(diff1, lag=50, main= "Auto Correlation(q)")

pacf(diff1, lag=50, main = "Partial Auto correlation (p)")

#######################################################

#ARIMA (p,d,q)
gas.arima.fit<-arima(gas, c(2,1,2)) #With AR & MA, with differencing
#gas.arima.fit<-arima(gas, c(2,1,1)) #With AR & MA, with differencing

summary(gas.arima.fit)
##
## Call:
## arima(x = gas, order = c(2, 1, 2))
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.1355 0.0005 0.1261 0.2753
## s.e. 0.4295 0.1481 0.4269 0.0752
##
## sigma^2 estimated as 6801981: log likelihood = -4410.63, aic = 8831.27
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 70.11654 2605.32 1526.492 0.3193329 6.992583 0.8840275
## ACF1
## Training set 9.96889e-05
hist(gas.arima.fit$residuals, col = "blue")

#Testing the fit with original series

ts.plot(gas, fitted(gas.arima.fit), col=c("blue", "red"))

#Test auto correlation in residuals to check the fit
acf(gas.arima.fit$residuals)

#Portmanteau test : Ljung Box method used :H0 : residuals are independent
Box.test(gas.arima.fit$residuals,lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit$residuals
## X-squared = 661.61, df = 30, p-value < 2.2e-16
#Adding seasonal component if required
gas.arima.fit.s <- arima(gas, c(2,1,2), seasonal = list(order=c(1,1,2), period=12))
gas.arima.fit.s
##
## Call:

## arima(x = gas, order = c(2, 1, 2), seasonal = list(order = c(1, 1, 2), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1 sma2
## 0.229 0.2133 -0.7067 -0.134 -0.4313 -0.1708 -0.3013
## s.e. 0.315 0.1176 0.3157 0.242 1.4959 1.4856 0.9190
##
## sigma^2 estimated as 2559509: log likelihood = -4076.14, aic = 8168.27
summary(gas.arima.fit.s)
##
## Call:
## arima(x = gas, order = c(2, 1, 2), seasonal = list(order = c(1, 1, 2), period = 12))
##
## Coefficients:
## ar1 ar2 ma1 ma2 sar1 sma1 sma2
## 0.229 0.2133 -0.7067 -0.134 -0.4313 -0.1708 -0.3013
## s.e. 0.315 0.1176 0.3157 0.242 1.4959 1.4856 0.9190
##
## sigma^2 estimated as 2559509: log likelihood = -4076.14, aic = 8168.27
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 27.4472 1577.849 894.845 0.2790015 3.9086 0.5182258
## ACF1
## Training set -0.000577524
hist(gas.arima.fit.s$residuals, col = "blue")

ts.plot(gas, fitted(gas.arima.fit.s), col=c("blue", "red"))

acf(gas.arima.fit.s$residuals)

Box.test(gas.arima.fit.s$residuals, lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit.s$residuals
## X-squared = 74.75, df = 30, p-value = 1.092e-05
#plot(forecast(gas.arima.fit.s, h=12))

#gas.arima.fit.s1<-arima(gas, c(0,1,2), seasonal = list(order=c(0,1,2), period=12))
#gas.arima.fit.s1
#summary(gas.arima.fit.s1)
#hist(gas.arima.fit.s1$residuals, col = "blue")
#ts.plot(gas, fitted(gas.arima.fit.s1), col=c("blue", "red"))

#acf(gas.arima.fit.s1$residuals)
#Box.test(gas.arima.fit.s1$residuals, lag = 30, type = "Ljung-Box")
#plot(forecast(gas.arima.fit.s1, h=12))

#auto-arima
fitauto=auto.arima(gas, seasonal = TRUE, trace = T)
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2)(1,1,1)[12] : 7975.595
## ARIMA(0,1,0)(0,1,0)[12] : 8195.028
## ARIMA(1,1,0)(1,1,0)[12] : 8058.801
## ARIMA(0,1,1)(0,1,1)[12] : 7967.259
## ARIMA(0,1,1)(0,1,0)[12] : 8099.389
## ARIMA(0,1,1)(1,1,1)[12] : 7981.315
## ARIMA(0,1,1)(0,1,2)[12] : 7969.124
## ARIMA(0,1,1)(1,1,0)[12] : 8022.501
## ARIMA(0,1,1)(1,1,2)[12] : 7983.475
## ARIMA(0,1,0)(0,1,1)[12] : 8046.576
## ARIMA(1,1,1)(0,1,1)[12] : 7962.588
## ARIMA(1,1,1)(0,1,0)[12] : 8099.616
## ARIMA(1,1,1)(1,1,1)[12] : 7976.808
## ARIMA(1,1,1)(0,1,2)[12] : 7964.626
## ARIMA(1,1,1)(1,1,0)[12] : 8022.009
## ARIMA(1,1,1)(1,1,2)[12] : 7978.712
## ARIMA(1,1,0)(0,1,1)[12] : 7989.065
## ARIMA(2,1,1)(0,1,1)[12] : 7959.768
## ARIMA(2,1,1)(0,1,0)[12] : 8084.882
## ARIMA(2,1,1)(1,1,1)[12] : 7973.938
## ARIMA(2,1,1)(0,1,2)[12] : 7961.533
## ARIMA(2,1,1)(1,1,0)[12] : 8023.31
## ARIMA(2,1,1)(1,1,2)[12] : Inf
## ARIMA(2,1,0)(0,1,1)[12] : 7984.927
## ARIMA(3,1,1)(0,1,1)[12] : 7962.327

## ARIMA(2,1,2)(0,1,1)[12] : 7961.417
## ARIMA(1,1,2)(0,1,1)[12] : 7960.058
## ARIMA(3,1,0)(0,1,1)[12] : 7977.814
## ARIMA(3,1,2)(0,1,1)[12] : 7964.384
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(2,1,1)(0,1,1)[12] : 8163.291
##
## Best model: ARIMA(2,1,1)(0,1,1)[12]
summary(fitauto)
## Series: gas
## ARIMA(2,1,1)(0,1,1)[12]
##
## Coefficients:
## ar1 ar2 ma1 sma1
## 0.3756 0.1457 -0.8620 -0.6216
## s.e. 0.0780 0.0621 0.0571 0.0376
##
## sigma^2 estimated as 2587081: log likelihood=-4076.58
## AIC=8163.16 AICc=8163.29 BIC=8183.85
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 27.72266 1579.457 893.4504 0.272275 3.900233 0.4789312
## ACF1
## Training set 0.002271673
hist(fitauto$residuals, col="green")

ts.plot(gas,fitted(fitauto), col=c("green","blue"))

acf(fitauto$residuals)

Box.test(fitauto$residuals, lag=30, type = "Ljung-Box")


##
## Box-Ljung test
##
## data: fitauto$residuals
## X-squared = 77.964, df = 30, p-value = 3.86e-06
checkresiduals(fitauto)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(0,1,1)[12]
## Q* = 64.344, df = 20, p-value = 1.484e-06
##
## Model df: 4. Total lags used: 24
#plot(forecast(fitauto, h=12))
#forecastfit = forecast(fitauto, h=12)
#automean=forecastfit$mean
#automean
#forecastfit

#Box Cox
gas1=BoxCox(gas,lambda = BoxCox.lambda(gas))

summary(gas1)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.21 11.13 14.94 14.25 16.86 18.20
tsdisplay(gas1,lag.max = 150, plot.type = c("histogram"))

fitauto1<-auto.arima(gas1, seasonal = TRUE, trace = T)


##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,1,2)(1,1,1)[12] : -633.6623
## ARIMA(0,1,0)(0,1,0)[12] : -399.6777
## ARIMA(1,1,0)(1,1,0)[12] : -536.1276
## ARIMA(0,1,1)(0,1,1)[12] : -657.6893
## ARIMA(0,1,1)(0,1,0)[12] : -471.4437
## ARIMA(0,1,1)(1,1,1)[12] : -643.515

## ARIMA(0,1,1)(0,1,2)[12] : -661.887
## ARIMA(0,1,1)(1,1,2)[12] : -647.0125
## ARIMA(0,1,0)(0,1,2)[12] : Inf
## ARIMA(1,1,1)(0,1,2)[12] : -658.9943
## ARIMA(0,1,2)(0,1,2)[12] : -659.9056
## ARIMA(1,1,0)(0,1,2)[12] : Inf
## ARIMA(1,1,2)(0,1,2)[12] : -656.8534
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(0,1,1)(0,1,2)[12] : Inf
## ARIMA(0,1,2)(0,1,2)[12] : Inf
## ARIMA(1,1,1)(0,1,2)[12] : Inf
## ARIMA(0,1,1)(0,1,1)[12] : -700.5158
##
## Best model: ARIMA(0,1,1)(0,1,1)[12]
summary(fitauto1)
## Series: gas1
## ARIMA(0,1,1)(0,1,1)[12]
##
## Coefficients:
## ma1 sma1
## -0.3755 -0.8586
## s.e. 0.0450 0.0450
##
## sigma^2 estimated as 0.01235: log likelihood=353.28
## AIC=-700.57 AICc=-700.52 BIC=-688.15
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set 0.001163337 0.1093533 0.0766936 0.01790397 0.5343536
## MASE ACF1
## Training set 0.3532688 0.002647187

hist(fitauto1$residuals, col="green")

ts.plot(gas1,fitted(fitauto1), col=c("green","blue"))

acf(fitauto1$residuals)

Box.test(fitauto1$residuals, lag=30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: fitauto1$residuals
## X-squared = 62.654, df = 30, p-value = 0.0004342
checkresiduals(fitauto)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,1)(0,1,1)[12]
## Q* = 64.344, df = 20, p-value = 1.484e-06
##
## Model df: 4. Total lags used: 24
##############################################################################

#Subset of dataset
gassub<-window(gas,start=c(1990,1))
plot(gassub)

#Check for stationarity of data
adf.test(gassub)
## Warning in adf.test(gassub): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: gassub
## Dickey-Fuller = -6.1377, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Stationary

#Auto correlation of lag 30

acf((gassub), lag=30)

pacf((gassub), lag=30)

plot(gassub)

#arima (p,d,q)
gas.arima.fit<-arima(gassub, c(0,0,2)) #With AR & MA, without differencing
#gas.arima.fit<-arima(gassub, c(2,1,1)) #With AR & MA, with differencing
#gas.arima.fit<-arima(gassub, c(2,1,2)) #With AR & MA, with differencing

summary(gas.arima.fit)
##
## Call:
## arima(x = gassub, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## 0.7535 0.6040 47943.139
## s.e. 0.2255 0.1659 1368.765

##
## sigma^2 estimated as 23411514: log likelihood = -674, aic = 1356.01
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 69.1481 4838.545 3996.105 -1.304102 8.588294 1.015244
## ACF1
## Training set 0.3029303
hist(gas.arima.fit$residuals, col = "blue")

#Testing the fit with original series

ts.plot(gassub, fitted(gas.arima.fit), col=c("blue", "red"))

fitted(gas.arima.fit)
## Jan Feb Mar Apr May Jun Jul
## 1990 45842.57 43030.72 43553.54 46669.04 45350.29 51170.59 56281.25
## 1991 39165.67 38971.49 43302.54 43362.56 44659.88 51285.83 49773.07
## 1992 41886.55 41223.01 44268.62 43040.01 44866.15 51621.74 56061.28
## 1993 44069.48 40954.01 41454.90 38291.14 44231.68 54631.62 53415.67
## 1994 45211.37 42063.41 43585.63 49479.11 47431.35 51641.71 57160.95
## 1995 40847.17 41773.77 48169.73 46341.62 48818.80 55437.92 57405.53
## Aug Sep Oct Nov Dec
## 1990 51617.45 53347.22 46193.00 42863.00 46416.83
## 1991 50628.70 54939.46 46557.13 42842.83 46176.57
## 1992 54135.53 53712.14 52842.70 45157.23 42431.61
## 1993 51915.17 52040.22 49078.11 46779.15 46938.07
## 1994 55647.59 57353.02 53250.14 48189.04 49562.87
## 1995 58677.16
#Test auto correlation in residuals to check the fit

acf(gas.arima.fit$residuals)

#Portmanteau test : Ljung Box method used :H0 : residuals are independent
Box.test(gas.arima.fit$residuals,lag = 30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: gas.arima.fit$residuals
## X-squared = 213.87, df = 30, p-value < 2.2e-16
#Adding seasonal component if required
gas.arima.fit.s <- arima(gassub, c(0,1,1), seasonal = list(order=c(1,1,0), period=12))
gas.arima.fit.s
##
## Call:
## arima(x = gassub, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))

##
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11007538: log likelihood = -526.11, aic = 1058.21
summary(gas.arima.fit.s)
##
## Call:
## arima(x = gassub, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))
##
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11007538: log likelihood = -526.11, aic = 1058.21
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 448.4043 2983.901 1910.476 0.6358374 4.139159 0.4853723
## ACF1
## Training set -0.01701014
hist(gas.arima.fit.s$residuals, col = "blue")

ts.plot(gassub, fitted(gas.arima.fit.s), col=c("blue", "red"))

acf(gas.arima.fit.s$residuals)

Box.test(gas.arima.fit.s$residuals, lag = 30, type = "Ljung-Box")


##
## Box-Ljung test
##
## data: gas.arima.fit.s$residuals
## X-squared = 28.547, df = 30, p-value = 0.5415
#auto-arima
fitauto=auto.arima(gassub, seasonal = TRUE, trace = T)
##
## ARIMA(2,1,2)(1,1,1)[12] : Inf
## ARIMA(0,1,0)(0,1,0)[12] : 1081.687
## ARIMA(1,1,0)(1,1,0)[12] : 1065.345
## ARIMA(0,1,1)(0,1,1)[12] : Inf
## ARIMA(1,1,0)(0,1,0)[12] : 1074.33

## ARIMA(1,1,0)(1,1,1)[12] : Inf
## ARIMA(1,1,0)(0,1,1)[12] : Inf
## ARIMA(0,1,0)(1,1,0)[12] : 1075.677
## ARIMA(2,1,0)(1,1,0)[12] : 1065.765
## ARIMA(1,1,1)(1,1,0)[12] : 1060.953
## ARIMA(1,1,1)(0,1,0)[12] : 1072.152
## ARIMA(1,1,1)(1,1,1)[12] : Inf
## ARIMA(1,1,1)(0,1,1)[12] : Inf
## ARIMA(0,1,1)(1,1,0)[12] : 1058.684
## ARIMA(0,1,1)(0,1,0)[12] : 1070.051
## ARIMA(0,1,1)(1,1,1)[12] : Inf
## ARIMA(0,1,2)(1,1,0)[12] : 1060.965
## ARIMA(1,1,2)(1,1,0)[12] : 1063.06
##
## Best model: ARIMA(0,1,1)(1,1,0)[12]
summary(fitauto)
## Series: gassub
## ARIMA(0,1,1)(1,1,0)[12]
##
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
##
## sigma^2 estimated as 11423571: log likelihood=-526.11
## AIC=1058.21 AICc=1058.68 BIC=1064.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 448.4043 2983.901 1910.476 0.6358374 4.139159 0.5551405
## ACF1
## Training set -0.01701014
hist(fitauto$residuals, col="green")

acf(fitauto$residuals)

Box.test(fitauto$residuals, lag=30, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: fitauto$residuals
## X-squared = 28.547, df = 30, p-value = 0.5415
checkresiduals(fitauto)

##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)(1,1,0)[12]
## Q* = 11.922, df = 12, p-value = 0.452
##
## Model df: 2. Total lags used: 14
ts.plot(gassub,fitted(fitauto), col=c("green","blue"))

## Forecast
#After ensuring model is stable and accurate, forecast for next 12 intervals
fct1=forecast(gas.arima.fit.s, h=12)

fct1$mean
## Jan Feb Mar Apr May Jun Jul
## 1995
## 1996 44630.39 44826.00 50464.32 51405.98 59660.56 63580.35 68333.48
## Aug Sep Oct Nov Dec
## 1995 58353.09 54446.75 52111.62 45010.61
## 1996 65892.39
plot(forecast(gas.arima.fit.s, h = 12)) # h belongs inside forecast(); passing it to plot() triggers "not a graphical parameter" warnings

summary(gas.arima.fit.s)
## Call:
## arima(x = gassub, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 0), period = 12))
## Coefficients:
## ma1 sar1
## -0.6974 -0.5136
## s.e. 0.1167 0.1208
## sigma^2 estimated as 11007538: log likelihood = -526.11, aic = 1058.21
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 448.4043 2983.901 1910.476 0.6358374 4.139159 0.4853723
## ACF1
## Training set -0.01701014
#######################################################################

