
Business Analytics

ARIMA Modeling

© EduPristine – www.edupristine.com/ca
© EduPristine | Business Analytics
Agenda

▪ Introduction
▪ Data
▪ Basic Statistics
▪ Predictive modeling using Linear Regression
▪ Logistic Regression
▪ Forecasting using Time Series Techniques
▪ ARIMA Modeling
▪ Linear Regression – Revise
▪ Scorecard Development
▪ Churn Analytics
▪ Market Basket Analysis
▪ Clustering
▪ Decision Trees



Concepts

▪ Time Series
▪ AR, MA, and ARMA models
▪ I in ARIMA
• Stationarity
• Detrending
• Differencing
• Seasonality
▪ ACF and PACF
▪ Dickey – Fuller Test



Time Series

▪ Time Series – A sequence of data points, measured over equal intervals of time
▪ Forecasting – predicting the future
▪ Time Series Analysis
• Linear regression assumes that each observation is independent of the others, so a value does not depend on its own previous values
• In a time series, there are times when today’s value depends on yesterday’s value ($y_t$ depends on $y_{t-1}$)



AR, MA and ARMA models – White Noise

▪ White Noise
• White noise is a series in which each data point is random, with mean equal to zero, constant variance,
and no correlation with the other points

▪ In regression, the residuals are assumed to be white noise

▪ When the white noise assumption is violated, AR (Autoregressive) and MA (Moving Average)
models correct for this violation
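
The white noise property can be checked numerically. The sketch below is only an illustration on simulated data: the seed, the sample size, and the choice of a Ljung-Box test with 10 lags are assumptions, not part of the slides.

```r
# Simulate white noise: independent draws with mean 0 and constant variance
set.seed(42)                       # arbitrary seed, for reproducibility
e <- rnorm(500, mean = 0, sd = 1)

mean(e)                            # should be close to 0
var(e)                             # should be close to 1

# Ljung-Box test: the null hypothesis is "no autocorrelation up to lag 10"
lb_white <- Box.test(e, lag = 10, type = "Ljung-Box")

# For contrast, a random walk (running sum of the same shocks) is strongly autocorrelated
rw <- cumsum(e)
lb_walk <- Box.test(rw, lag = 10, type = "Ljung-Box")

lb_white$p.value                   # a large p-value means no evidence against white noise
lb_walk$p.value                    # effectively zero: clearly not white noise
```

If the residuals of a regression behave like `rw` rather than like `e`, the white noise assumption is violated and an AR or MA structure is worth considering.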



Autoregressive (AR) Models

▪ An autoregressive model of order “p”, AR(p), where p is the number of lags:

$X_t = \beta_1 X_{t-1} + \beta_2 X_{t-2} + \dots + \beta_p X_{t-p} + e_t$

▪ The current value of $X_t$ depends on its own past values, plus a random shock $e_t$
▪ Like a multiple regression model, but $X_t$ is regressed on past values of $X_t$

▪ The AR(1) Model

A simple way to model dependence over time is with the “autoregressive model of order 1”.
This is an OLS model of $X_t$ regressed on the lagged value $X_{t-1}$:

$X_t = \beta_0 + \beta_1 X_{t-1} + e_t$

What does the model say for the t+1 observation?

$X_{t+1} = \beta_0 + \beta_1 X_t + e_{t+1}$

▪ The AR(1) model expresses what we don’t know in terms of what we do know at time t
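
The "OLS on the lagged value" idea can be sketched in a few lines of R. The coefficient values, the seed, and the variable names (`b0`, `b1`, `x_next`) are assumptions chosen for illustration only.

```r
set.seed(1)
n  <- 2000
b0 <- 1.0      # intercept (assumed value)
b1 <- 0.7      # AR(1) coefficient (assumed value)

# Generate X_t = b0 + b1 * X_{t-1} + e_t step by step
e <- rnorm(n)
x <- numeric(n)
x[1] <- b0 / (1 - b1)              # start at the long-run mean
for (t in 2:n) x[t] <- b0 + b1 * x[t - 1] + e[t]

# Regress X_t (x[-1]) on X_{t-1} (x[-n]) by ordinary least squares
fit <- lm(x[-1] ~ x[-n])
coef(fit)                          # estimates of b0 and b1

# One-step-ahead forecast: the unknown shock e_{t+1} is replaced by its mean, 0
x_next <- unname(coef(fit)[1] + coef(fit)[2] * x[n])
x_next
```

The forecast uses only quantities known at time t (the fitted coefficients and the last observed value), which is the sense in which the model expresses the unknown in terms of the known.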



AR Models (contd.)

▪ The AR(1) model can also be expressed with the lag operator L, where $L X_t = X_{t-1}$:

$X_t = \beta_0 + \beta_1 X_{t-1} + e_t$
$\Rightarrow X_t = \beta_0 + \beta_1 (L X_t) + e_t$
$\Rightarrow (1 - \beta_1 L) X_t = \beta_0 + e_t$

[Figure: AR(1) series plotted over 22 time points, one panel for Beta = 0.7 and one for Beta = -0.7]
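
How the sign of the AR(1) coefficient changes the behaviour of the series can be checked by simulation. This is a rough sketch using `arima.sim()` from base R; the seed and series length are arbitrary assumptions.

```r
set.seed(7)
pos <- arima.sim(model = list(ar =  0.7), n = 1000)   # positive coefficient
neg <- arima.sim(model = list(ar = -0.7), n = 1000)   # negative coefficient

# Lag-1 sample autocorrelation (element 1 of $acf is lag 0, element 2 is lag 1)
r_pos <- acf(pos, plot = FALSE)$acf[2]
r_neg <- acf(neg, plot = FALSE)$acf[2]

r_pos   # positive: neighbouring values tend to sit on the same side of the mean
r_neg   # negative: the series tends to flip sign from one period to the next
```

A positive coefficient gives a smooth, slowly wandering series, while a negative one gives a series that oscillates around its mean.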



Moving – Average (MA) Models

▪ A moving-average model of order “q”, MA(q), where q is the number of lags:

$X_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$

▪ The current value of $X_t$ can be found from past shocks/errors (e), plus a new shock/error ($e_t$)
▪ The time series is regarded as a moving average (unevenly weighted, because of the different coefficients)
of a random shock series $e_t$
▪ The MA(1) model: a first-order moving average model looks like this:

$X_t = e_t + \theta_1 e_{t-1}$

▪ If $\theta_1$ is zero, X depends purely on the error or shock (e) at the current time, and there is no temporal
dependence

▪ If $\theta_1$ is large, previous errors influence the value of $X_t$

▪ If our model successfully captures the dependence structure in the data, then the residuals should look
random
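
A minimal sketch of an MA(1) series and of the "residuals should look random" check follows. The coefficient value 0.7, the seed, and the Ljung-Box settings are assumptions for illustration.

```r
set.seed(3)
theta <- 0.7                        # MA(1) coefficient (assumed value)
n <- 2000
e <- rnorm(n + 1)                   # one extra shock so every X_t has a predecessor

# Build X_t = e_t + theta * e_{t-1} directly from the shocks
x <- e[-1] + theta * e[-(n + 1)]

# For an MA(1), the theoretical lag-1 autocorrelation is theta / (1 + theta^2)
rho1_theory <- theta / (1 + theta^2)
rho1_sample <- acf(x, plot = FALSE)$acf[2]

# Fit an MA(1) with arima() and test the residuals for leftover autocorrelation
fit <- arima(x, order = c(0, 0, 1))
lb  <- Box.test(residuals(fit), lag = 10, type = "Ljung-Box", fitdf = 1)

coef(fit)["ma1"]                    # estimate of theta
lb$p.value                          # a large p-value is consistent with random residuals
```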



MA Models

[Figure: MA series plotted over 22 time points, one panel labelled Beta = 0.7 and one labelled Beta = -0.7]



ARMA models

▪ ARMA models are only suited to time series that are stationary in mean and variance
▪ A mixture of these two types of model is referred to as an autoregressive moving average model,
ARMA(p, q), where p is the order of the autoregressive part and q is the order of the moving average
part
▪ Mixed ARMA models
An ARMA process of order (p, q):

$X_t = \beta_0 + \beta_1 X_{t-1} + \dots + \beta_p X_{t-p} + e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2} + \dots + \theta_q e_{t-q}$

▪ Just a combination of MA and AR terms
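
As a rough illustration of combining AR and MA terms, the sketch below simulates an ARMA(1,1) series and fits it back with `arima()`. The coefficient values, seed, and series length are assumptions.

```r
set.seed(11)
# ARMA(1,1): one autoregressive term and one moving-average term
x <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 3000)

# order = c(p, d, q) with d = 0, i.e. no differencing
fit <- arima(x, order = c(1, 0, 1))
coef(fit)    # ar1, ma1, and "intercept" (which arima() uses for the series mean)
```

The estimates of `ar1` and `ma1` should land near the values used in the simulation, which is a quick sanity check that the model form matches the data.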

What if there is trend and seasonality? Then the series is not stationary.



Stationarity

▪ ARMA(p, q) models are only suitable for stationary data series

▪ A stationary data series has a constant mean and variance: they do not change with time, and
the data has no trend

$X_t = \beta_0 + \beta_1 X_{t-1} + e_t$

▪ The above process is stationary when $|\beta_1| < 1$ and $e_t$ is white noise
▪ Is the following data stationary or not?
[Figure: “Original Data”, a series plotted over 22 time points on a vertical axis from 0 to 12]
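
Why the AR(1) coefficient must be smaller than 1 in absolute value can be seen from a short derivation. It is a sketch that assumes the shocks $e_t$ have mean zero and variance $\sigma^2$ and are uncorrelated with $X_{t-1}$; the symbols $\mu$, $\gamma_0$ and $\sigma^2$ are introduced here and do not appear in the slides.

```latex
\begin{align*}
\text{Mean: } \quad & \mu = E[X_t] = \beta_0 + \beta_1 E[X_{t-1}] = \beta_0 + \beta_1 \mu
  \;\Rightarrow\; \mu = \frac{\beta_0}{1 - \beta_1} \\
\text{Variance: } \quad & \gamma_0 = \operatorname{Var}(X_t)
  = \beta_1^2 \operatorname{Var}(X_{t-1}) + \sigma^2
  = \beta_1^2 \gamma_0 + \sigma^2
  \;\Rightarrow\; \gamma_0 = \frac{\sigma^2}{1 - \beta_1^2}
\end{align*}
```

Both expressions are constant in t, and the variance is finite and positive only when $|\beta_1| < 1$. For $\beta_1 = 1$ (a random walk) no constant solution exists, because the variance keeps growing with t.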



Detrending

▪ When there is a time trend, we can obtain stationarity by detrending the variable
▪ A variable can be detrended by fitting a model and keeping the residuals

$X_t = \beta_0 + \beta_1 X_{t-1} + e_t$

▪ Detrending:

$e_t = X_t - (\beta_0 + \beta_1 X_{t-1})$

[Figure: “Original Data” (left) and the detrended series, “Detrending” (right), each plotted over 22 time points]
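
Keeping the residuals of a fitted model can be sketched as follows. Note the assumption: this example uses the common variant that regresses the series on a time index, rather than on the lagged value written above, and the trend and noise values are made up for illustration.

```r
set.seed(5)
tt <- 1:200                                    # time index
x  <- 2 + 0.05 * tt + rnorm(200, sd = 0.5)     # linear trend plus noise (made-up numbers)

# Fit the trend by OLS and keep what the trend does not explain
trend_fit <- lm(x ~ tt)
detrended <- residuals(trend_fit)

cor(tt, x)           # strongly positive: the raw series drifts upward with time
cor(tt, detrended)   # essentially zero: OLS residuals are uncorrelated with the regressor
mean(detrended)      # essentially zero as well
```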



ARIMA – Differencing

▪ ARIMA – a type of ARMA model that can be used with some kinds of non-stationary data
▪ When the data is non-stationary, we can use the differenced series

$\Delta X_t = X_t - X_{t-1}$

▪ This process is called first-order or “simple” differencing
▪ Series with trends should be differenced first, and then an ARMA model applied to the differenced series
▪ The “I” in ARIMA stands for integrated, which basically means you’re differencing
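
A small worked example of differencing with base R's `diff()`; the five values are made up so the arithmetic can be followed by hand.

```r
x   <- c(10, 12, 15, 19, 24)          # made-up values with an upward trend
dx  <- diff(x)                        # first-order differences: 2 3 4 5
d2x <- diff(x, differences = 2)       # second-order differences: 1 1 1

# Differencing is reversible once the starting value is known
x_back <- diffinv(dx, xi = x[1])      # 10 12 15 19 24
```

Each round of differencing shortens the series by one observation, and here the second difference is constant, so no trend remains in it.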



ARIMA – Differencing

▪ The ARIMA model is typically written as ARIMA(p, d, q), where:

• p is the number of autoregressive terms
• d is the order of differencing
• q is the number of moving average terms
• e.g. ARIMA(1,1,0) is a first-order AR model with one order of differencing

[Figure: “Original Data” (left, 22 time points) and the first-differenced series, “Differencing” (right, 21 time points)]
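
The reading of ARIMA(1,1,0) as "an AR(1) on the once-differenced series" can be checked with a short sketch. The simulated series, the coefficient 0.5, and the seed are assumptions for illustration.

```r
set.seed(9)
# Simulate an ARIMA(1,1,0) series: AR(1) increments, cumulated once
x <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 2000)

# Option 1: let arima() do the differencing (d = 1)
fit_arima <- arima(x, order = c(1, 1, 0))

# Option 2: difference by hand, then fit an AR(1) without a mean term
fit_arma <- arima(diff(x), order = c(1, 0, 0), include.mean = FALSE)

coef(fit_arima)["ar1"]
coef(fit_arma)["ar1"]     # should be nearly identical to the value above
```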



Figure 1 – ARIMA Forecasting Procedure



Autocorrelation functions (ACFs) and
Partial-autocorrelation functions (PACFs)

The autocorrelation function (ACF) is the set of correlation coefficients between the series and lags of
itself over time.

The partial autocorrelation function (PACF) is the set of partial correlation coefficients between the series
and lags of itself over time.
▪ The partial autocorrelation is the amount of correlation between a variable and a lag of itself that is
not explained by correlations at all lower-order lags
• Correlation at lag 1 “propagates” to lag 2 and presumably to higher-order lags
• The partial autocorrelation at lag 2 is the difference between the actual correlation at lag 2 and the
expected correlation due to propagation of the correlation at lag 1
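
The propagation idea can be made concrete with `acf()` and `pacf()` on a simulated AR(1) series. The coefficient 0.7, the seed, and the name `p2_manual` are assumptions for illustration; the hand formula is the standard lag-2 partial autocorrelation, which rescales the "actual minus propagated" difference.

```r
set.seed(21)
x <- arima.sim(model = list(ar = 0.7), n = 5000)

a <- acf(x,  lag.max = 5, plot = FALSE)$acf    # a[1] is lag 0, a[2] is lag 1, ...
p <- pacf(x, lag.max = 5, plot = FALSE)$acf    # p[1] is lag 1, p[2] is lag 2, ...

a[2]    # lag-1 autocorrelation, near 0.7
a[3]    # lag-2 autocorrelation, near 0.7^2 = 0.49: propagated from lag 1
p[2]    # lag-2 partial autocorrelation, near 0: nothing beyond the propagation

# Lag-2 partial autocorrelation by hand:
# (actual lag-2 correlation - correlation propagated from lag 1), rescaled
p2_manual <- (a[3] - a[2]^2) / (1 - a[2]^2)
```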





Identifying an AR process

The autocorrelations of a pure AR(p) process decay gradually as the lag length increases. Hence, using
an autocorrelogram alone, it is not possible to differentiate between a pure AR(3) model and a pure AR(4) model.
However, the partial autocorrelations of a pure AR(p) process do display a distinctive feature: the partial
autocorrelogram should ‘die out’ after p lags. Thus, the partial autocorrelogram of a pure AR(3) process
should die out after 3 lags, whereas that of a pure AR(4) process would die out after 4 lags.
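
The cut-off in the partial autocorrelogram can be checked on simulated data. This sketch uses an AR(2) process with assumed coefficients 0.5 and 0.3; the band of 1.96 / sqrt(n) is the usual approximate 95% limit around zero.

```r
set.seed(33)
x <- arima.sim(model = list(ar = c(0.5, 0.3)), n = 5000)

p    <- pacf(x, lag.max = 6, plot = FALSE)$acf   # p[1] is lag 1
band <- 1.96 / sqrt(length(x))                   # approximate 95% band around zero

round(p[1:6], 2)              # lags 1 and 2 are clearly non-zero, later lags are small
which(abs(p[1:6]) > band)     # lags whose partial autocorrelation lies outside the band
```

For an AR(2) process the partial autocorrelations beyond lag 2 should stay close to zero, so the last clearly significant lag suggests the order p.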

ACF and PACF for an AR(1) process


ACF and PACF for an AR(2) process





Identifying an MA process

The behaviour of autocorrelograms and partial autocorrelograms for pure MA(q) processes is the reverse of
that for pure AR processes. The autocorrelogram of a pure MA(q) process should ‘die out’ after q lags. The
partial autocorrelogram of a pure MA process, on the other hand, only decays slowly over time (similar to
the behaviour of the autocorrelogram of a pure AR process). Thus, it should be impossible to distinguish
between the PACFs of an MA(3) and an MA(4) process, whereas the ACF of the MA(3) process should drop to
zero after 3 lags and that of the MA(4) process after 4 lags.
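
The mirror-image behaviour for MA processes can be sketched in the same way. The MA(2) coefficients 0.6 and 0.4 and the seed are assumptions; `ARMAacf()` from base R supplies the theoretical autocorrelations for comparison.

```r
set.seed(44)
x <- arima.sim(model = list(ma = c(0.6, 0.4)), n = 5000)

# Sample and theoretical autocorrelations, both with lag 0 dropped
a      <- acf(x, lag.max = 6, plot = FALSE)$acf[-1]
theory <- ARMAacf(ma = c(0.6, 0.4), lag.max = 6)[-1]

round(a, 2)          # lags 1 and 2 are clearly non-zero, later lags are small
round(theory, 2)     # exactly zero beyond lag 2
```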

ACF and PACF for an MA(1) process


Identifying an MA process (contd.)

ACF and PACF for an MA(2) process


Case Study



ARIMA – Case Study

You are the sales manager of a leading automobile manufacturer, and you have been asked to
forecast sales for the next 8 months. You have been given historical sales data for the automobiles,
and you have decided to apply an ARIMA model in R.

# Set the working directory
setwd("F:/Courses/Final/International Content/BA/Abroad - BA/Day - 4/Practice/ARIMA")

# Read the sales file
sa <- read.csv("sales.csv")

# Check the data: print all of it, then only the first and last few rows
sa
head(sa)
tail(sa)



Create Time Series

# Create a monthly time series from the first column of the sales data, starting in January 2000
sa <- ts(sa[, 1], start = 2000, frequency = 12)

# Plot the data to judge visually whether it is stationary
plot(sa)



Differencing

# There seems to be an upward trend; let's see whether differencing removes it
d.sa <- diff(sa)

# Look at the resulting series
d.sa

# Let's plot this
plot(d.sa)
# The trend seems to be gone after first-order differencing; we will check for stationarity later



Second Order Differencing

# Second-order differencing: difference the already-differenced series
d2.sa <- diff(d.sa)
plot(d2.sa)
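
The remaining steps (inspecting the correlograms, fitting a model, forecasting 8 months ahead) might look roughly like the sketch below. Since sales.csv is not reproduced here, it runs on a simulated stand-in series; `sa_demo`, `fit` and `fc` are hypothetical names, and the order (1,1,0) is only an illustrative choice, not the model selected in the course.

```r
# Stand-in for the sales series: 120 simulated monthly values starting January 2000
set.seed(2000)
sim     <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 119)
sa_demo <- ts(100 + as.numeric(sim), start = c(2000, 1), frequency = 12)

# Correlograms of the first-differenced series help choose p and q
d_acf  <- acf(diff(sa_demo),  plot = FALSE)
d_pacf <- pacf(diff(sa_demo), plot = FALSE)

# Fit a candidate model; d = 1 tells arima() to difference once internally
fit <- arima(sa_demo, order = c(1, 1, 0))

# Forecast the next 8 months, with standard errors
fc <- predict(fit, n.ahead = 8)
fc$pred     # point forecasts, starting the month after the data ends
fc$se       # standard errors, which widen as the horizon grows
```

With the real data, `sa_demo` would be replaced by `sa`, and the order would be chosen from the ACF and PACF of `d.sa`.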

Continued in the following slides:



Thank You!

help@edupristine.com
www.edupristine.com/ca

