
An Attempt to Model Quarterly GDP Data from the Agriculture Sector

The first step is to inspect the raw data for any trend and/or seasonality and to judge visually whether the series looks stationary. Time series plot of the raw data:

The plot shows a visible seasonality of period 4 (quarterly) and a visible trend in the raw data. Just by visualizing the data, one can therefore expect that the series will have to be differenced once at lag 4 to remove the seasonality.

Phillips-Perron test for the original raw data:
Null Hypothesis: a unit root is present in the AR polynomial of the ARMA process
Alternative Hypothesis: no unit root is present in the lag polynomial

The data are split into four parts, one per quarter, named Q1, Q2, Q3 and Q4, and each is subjected to the PP test:
p-value (Q1) = 0.99        p-value (Q2) = 0.99
p-value (Q3) = 0.9368846   p-value (Q4) = 0.9559262
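
A minimal R sketch of this step is given below. The data file name, the construction of the series and the object names are assumptions; gdpagr denotes the raw quarterly series, as in the model fits quoted later in this report, and pp.test comes from the tseries package.

library(tseries)                                          # for pp.test

gdpagr <- ts(scan("gdp_agriculture.txt"), frequency = 4)  # hypothetical data file
plot(gdpagr, main = "Quarterly GDP from agriculture")     # raw data plot

q_sub <- split(as.numeric(gdpagr), cycle(gdpagr))         # Q1..Q4 sub-series
names(q_sub) <- c("Q1", "Q2", "Q3", "Q4")
sapply(q_sub, function(x) pp.test(x)$p.value)             # PP p-value per quarter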

Since the p-values are all high, we cannot reject the null hypothesis that there is a unit root in the lag polynomial. We now plot the series after a seasonal first difference (at lag 4).
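
A short sketch of this step (d4 is an illustrative name for the seasonally differenced series):

d4 <- diff(gdpagr, lag = 4)                               # seasonal first difference
plot(d4, ylab = "diff(gdpagr, lag = 4)",
     main = "Seasonally differenced series")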


We see that there is still some seasonality left in the data. This differenced series (at lag 4) is then subjected to the Augmented Dickey-Fuller (ADF) test and the Phillips-Perron (PP) test.

ADF test for the differenced series:
Null Hypothesis: a unit root is present in the lag polynomial
Alternative Hypothesis: no unit root is present in the lag polynomial
Dickey-Fuller p-value = 0.01

PP test for the differenced series:
Null Hypothesis: a unit root is present in the AR polynomial of the ARMA process
Alternative Hypothesis: no unit root is present in the lag polynomial
p-value = 0.02386142

Since the PP p-value is not sufficiently low, we cannot confidently reject the null hypothesis that a unit root remains in the lag polynomial. Consequently, we will difference the series again. This is further confirmed by the ACF and PACF of the resultant series.
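
These checks can be reproduced along the following lines (adf.test and pp.test are from the tseries package; d4 is the seasonally differenced series from the previous sketch):

library(tseries)

adf.test(d4)                                              # Augmented Dickey-Fuller test
pp.test(d4)                                               # Phillips-Perron test
acf(d4,  main = "ACF of seasonally differenced series")
pacf(d4, main = "PACF of seasonally differenced series")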

Quite a few of the ACF and PACF values fall outside the confidence limits. We now difference the series once more, this time a regular (lag-1) difference for the trend, and the resulting plot appears stationary.

This doubly differenced series (at lag 4 for seasonality and at lag 1 for trend) is then subjected to the Augmented Dickey-Fuller test and the Phillips-Perron test.

ADF test for the differenced series:
Null Hypothesis: a unit root is present in the lag polynomial
Alternative Hypothesis: no unit root is present in the lag polynomial
Dickey-Fuller p-value = 0.01

PP test for the differenced series:
Null Hypothesis: a unit root is present in the AR polynomial of the ARMA process
Alternative Hypothesis: no unit root is present in the lag polynomial
p-value = 0.01

In both tests the p-value is very small, so there is strong evidence to reject the null hypothesis. We conclude that no unit root is present in the lag polynomial and that a seasonally integrated process of order one, together with one regular difference, is appropriate.
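
A sketch of the additional differencing and the repeated unit-root tests (d41 is an illustrative name for the series after both differences):

d41 <- diff(d4)                                           # regular (lag-1) difference on top of the seasonal one
plot(d41, main = "Series after seasonal and regular differencing")
adf.test(d41)
pp.test(d41)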


We also check the ACF and PACF plots of this resultant series. Almost all the values, except the one at lag 7, lie within the confidence limits; we keep this in mind during model construction. We now check the periodogram of the resultant series. The periodogram is an estimate of the spectral density of a signal, which in turn is the Fourier transform of the autocovariance function; it helps explain how much of the variability in the data is due to the various frequencies present.
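
One way to obtain such a periodogram in R is spec.pgram from the stats package; the taper and detrend settings below are illustrative choices, not necessarily those used for the figure in the report.

spec.pgram(d41, taper = 0, detrend = FALSE, log = "no",
           main = "Periodogram of the differenced series")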

The above diagram shows the periodogram of the data set after it has been differenced once for trend and once for seasonality. We are interested in the frequencies at which the two peaks occur, since these correspond to the greatest variability in the data. From the diagram, the two peaks occur at approximately the frequencies 0.15 and 0.38, which give roughly 7 and 3 observations per cycle respectively (the reciprocals of these frequencies).

Having accounted for both trend and seasonality, we move on to model construction.

Model Construction: We now fit a SARIMA model with d = 1 and D = 1, and choose appropriate values of p, q, P and Q by comparing AIC values. We end up with the following model:

arima(x = gdpagr, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 2), period = 4))
Coefficients:
          ma1     sar1    sma1    sma2
      -0.6370  -0.4176  0.0627  0.6845
s.e.   0.1605   0.4393  0.5082  0.4535
sigma^2 estimated as 34937370: log likelihood = -357.11, aic = 724.22

This model contains an insignificant term (sma1), so we fix that coefficient at 0 and refit:

arima(x = gdpagr, order = c(0, 1, 1), seasonal = list(order = c(1, 1, 2), period = 4), fixed = c(NA, NA, 0, NA))
Coefficients:
          ma1     sar1   sma1    sma2
      -0.6439  -0.3662      0  0.7268
s.e.   0.1489   0.1478      0  0.3395
sigma^2 estimated as 34286125: log likelihood = -357.12, aic = 722.23
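
A sketch of how such an AIC-based search and the constrained refit can be carried out. The search ranges are assumptions, not necessarily those used in the report; the second call reproduces the fixed-coefficient fit quoted above, with fit2 as an illustrative name.

best <- NULL
for (p in 0:2) for (q in 0:2) for (P in 0:2) for (Q in 0:2) {
  fit <- try(arima(gdpagr, order = c(p, 1, q),
                   seasonal = list(order = c(P, 1, Q), period = 4)),
             silent = TRUE)
  if (!inherits(fit, "try-error") && (is.null(best) || fit$aic < best$aic))
    best <- fit                                           # keep the fit with the lowest AIC
}
best

fit2 <- arima(gdpagr, order = c(0, 1, 1),                 # refit with sma1 fixed at 0
              seasonal = list(order = c(1, 1, 2), period = 4),
              fixed = c(NA, NA, 0, NA))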

All the remaining terms are now significant, so we finally arrive at a SARIMA(0,1,1)(1,1,2)[4] model for the given agriculture data set. Our next job is to check whether the residuals of the model are white noise (WN).

Residual plot: The residual plot suggests that the residuals are not homoscedastic, because of spikes in some time periods. It can be argued that the variance of the process is quite high, reflecting high volatility in the GDP of the agriculture sector.

Note: In a residual analysis we check two important things, a whiteness test and an independence test. Whiteness test: a good model should have the residual autocorrelation function inside the confidence interval of the corresponding estimates, indicating that the residuals are uncorrelated. Independence test: a good model should have residuals uncorrelated with past inputs. The ACF and PACF plots of the residuals are shown below:
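
A sketch of how these plots can be produced (fit2 is the constrained fit from the sketch above; res is an illustrative name for its residuals):

res <- residuals(fit2)                                    # model residuals
plot(res, main = "Residuals of the SARIMA(0,1,1)(1,1,2)[4] model")
acf(res,  main = "ACF of residuals")                      # dashed bands are +/- 1.96/sqrt(n)
pacf(res, main = "PACF of residuals")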

ACF and PACF plots: For white noise WN(0, σ²), the ACF and PACF are zero at all non-zero lags. Hence the residuals, which are estimates of the error terms ε_t in the model, should show no significant autocorrelation at higher lags. Also, for white noise it can be shown that the PACF estimate is asymptotically normal with mean 0 and variance 1/n. The PACF plot indicates that the values at all lags are within the 95% confidence limits, concurring with this theory, and the same holds for the ACF plot, where all the values lie within the limits. Next we draw the normal quantile-quantile plot to check whether the residuals are normal.

The above plot indicates that the residuals may not be normal, as the points do not fall on a straight line.

Time series diagnostic plots: We now look at the standardized residual plot and the Ljung-Box p-values at various lags. The standardized residual plot shows spikes in some time periods, suggesting departures from normality and the presence of heteroscedasticity. The ACF of the residuals shows no significant autocorrelation at higher lags, which indicates that the residuals are, to a reasonable extent, uncorrelated.
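
These diagnostics can be reproduced along the following lines; tsdiag produces the standardized residual, residual ACF and Ljung-Box p-value panels discussed here.

qqnorm(res); qqline(res)                                  # normal Q-Q plot of the residuals
tsdiag(fit2, gof.lag = 20)                                # standard time series diagnostic plots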


The Ljung-Box p-values are high up to approximately 10 lags, indicating that there is no serious problem of autocorrelation and suggesting that white noise has been reached.

Next we look at the periodogram plots of the residuals to check whether they look like white noise; the first diagram uses L = 2 and the second L = 4. We then carry out a Kolmogorov-Smirnov test to test whether the residuals are white noise.

One-sample Kolmogorov-Smirnov test
data: u
D = 0.0955, p-value = 0.9847
alternative hypothesis: two-sided

Since the p-value is high, we cannot reject the null hypothesis, indicating that the residuals are white noise. This is further supported by the Box-Ljung test.

Box-Ljung test:
Null Hypothesis: the data are independently distributed
Alternative Hypothesis: the data are not independently distributed
X-squared = 14.7266, df = 20, p-value = 0.7918 (values for lag = 20)

Since the p-value is sufficiently high, we cannot reject the null hypothesis; the residuals are consistent with white noise.

Normality checks for residuals: We also run several normality tests on the residuals.

  Method                                             P.Value
1 Shapiro-Wilk normality test                        0.2622724
2 Anderson-Darling normality test                    0.2030070
3 Cramer-von Mises normality test                    0.1784992
4 Lilliefors (Kolmogorov-Smirnov) normality test     0.3726499
5 Shapiro-Francia normality test                     0.1433496
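
A hedged sketch of these checks follows. The report does not show how the variable u for the Kolmogorov-Smirnov test was built; one common construction, assumed here, is the scaled cumulative periodogram of the residuals, which should be approximately uniform under white noise. The nortest package supplies the Anderson-Darling, Cramer-von Mises, Lilliefors and Shapiro-Francia tests.

sp <- spec.pgram(res, taper = 0, plot = FALSE)            # raw periodogram of the residuals
u  <- cumsum(sp$spec) / sum(sp$spec)                      # scaled cumulative periodogram (assumed "u")
ks.test(u, "punif")                                       # one-sample Kolmogorov-Smirnov test

Box.test(res, lag = 20, type = "Ljung-Box")               # Box-Ljung test at lag 20

library(nortest)
shapiro.test(res)                                         # Shapiro-Wilk
ad.test(res)                                              # Anderson-Darling
cvm.test(res)                                             # Cramer-von Mises
lillie.test(res)                                          # Lilliefors (Kolmogorov-Smirnov)
sf.test(res)                                              # Shapiro-Francia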

Since the p-values are all high, we cannot reject the null hypothesis of normality; the residuals can be treated as normally distributed. With white noise and approximately normal residuals established by the tests above, the model building exercise is complete.

Step 2: Prediction. We now move on to the most important part of this exercise, which is to forecast the agriculture GDP for the coming quarters.
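
A sketch of the forecasting step; n.ahead = 4 is chosen to match the table below, and the 95% limits are taken as the point forecast plus or minus two standard errors, which agrees with the reported intervals.

fc <- predict(fit2, n.ahead = 4)                          # point forecasts and standard errors
cbind(Predicted = fc$pred,
      SE        = fc$se,
      Lower     = fc$pred - 2 * fc$se,                    # approximate 95% lower limit
      Upper     = fc$pred + 2 * fc$se)                    # approximate 95% upper limit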

Period   Predicted   SE         Lower Limit   Upper Limit   Real Value
1        222721.4    5926.786   210867.9      234575.0      213342
2        184255.5    6289.458   171676.6      196834.4      179767
3        310437.6    6632.327   297172.9      323702.3      -
4        241714.2    6958.322   227797.6      255630.8      -

CONCLUSION: We have predicted the agriculture GDP for the next two periods with 95% confidence limits and compared the forecasts with the real values for the corresponding periods. The real values fall within the predicted confidence limits for both periods, indicating that the model performs very well.

