Continuing from Part 1.
Compared with the ARIMA(0,1,2) model, the ARIMA(1,1,1) model has a smaller AIC and a smaller residual variance. In the ARIMA(0,1,2) model, the MA(2) parameter is not significant, and all estimates are highly correlated. This suggests that the ARIMA(1,1,1) model is preferable.
To check the need for additional parameters, we fitted both models with extra parameters, with and without a constant. Results are summarised in the following table:
In models with p + q = 3, the estimates were very highly correlated. Consequently, a number of such models had identical AIC and residual variance. This suggests that three parameters are more than the data need, so we prefer models with p + q not exceeding 2. Among these, the ARIMA(1,1,1) without constant has the smallest AIC and residual variance.
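As a reminder of how the AIC penalises extra parameters, here is a minimal sketch using the textbook definition AIC = 2k - 2 ln L. The log-likelihood values are hypothetical and chosen only for illustration; they are not the fitted values from this analysis, and the package used here may count parameters or scale the criterion differently.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Textbook Akaike information criterion; smaller is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical log-likelihoods, for illustration only (NOT from the fitted models).
candidates = {
    "ARIMA(1,1,1)": aic(log_likelihood=-120.0, n_params=2),
    "ARIMA(2,1,1)": aic(log_likelihood=-119.8, n_params=3),
}

# The model with the smallest AIC is preferred.
best = min(candidates, key=candidates.get)
print(best, candidates)
```

Under this definition an extra parameter lowers the AIC only if it raises the log-likelihood by more than 1, which is why a barely improved fit does not justify a larger model.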
Details of the ARIMA(1,1,1) model with no constant are given below.
Note: Both parameters are statistically significant, and they are only moderately correlated.
A time plot of the residuals does not show any pattern, and a plot of the residuals against the fitted values looks like a random scatter (below).
The autocorrelogram of residuals is shown below.
The results are consistent with a white noise process. The Box-Ljung statistics are not significant at any lag. A histogram of the residuals appears to approximate a normal curve (as shown below).
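For readers who want to see what the Box-Ljung statistic measures, the following is a minimal pure-Python sketch of the standard formula Q(h) = n(n+2) Σ r_k² / (n - k), summed over lags k = 1..h, where r_k is the lag-k sample autocorrelation. The function names are my own, and the input series is a toy example, not the residuals from this model.

```python
def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return ck / c0


def ljung_box(x, max_lag):
    """Box-Ljung Q statistic using lags 1..max_lag."""
    n = len(x)
    return n * (n + 2) * sum(
        autocorr(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Toy series with strong negative lag-1 autocorrelation (not the model residuals).
toy = [1.0, -1.0] * 10
print(ljung_box(toy, max_lag=1))
```

For residuals from an ARMA(p, q) fit, Q(h) is usually compared with a chi-square distribution on h - p - q degrees of freedom; small values of Q, as reported above, are what white noise would produce.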
The histogram shows a good approximation to a normal distribution. The normal Q-Q plot and the normality tests are also consistent with the residuals being normally distributed.
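The post does not say which normality tests were used. As one illustration of the idea, here is a sketch of the Jarque-Bera statistic, which combines sample skewness and kurtosis; this is a substitute chosen because it is easy to compute by hand, not necessarily the test behind the results above. The data are a toy sample, not the model residuals.

```python
import math


def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # The chi-square(2) survival function has the closed form exp(-x / 2).
    p_value = math.exp(-jb / 2)
    return jb, p_value


# Toy symmetric sample (not the model residuals).
jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(jb, p)
```

A large p-value means the sample skewness and kurtosis are compatible with normality. The chi-square approximation is asymptotic, so it should be read cautiously for short series.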
Next, as an overfitting diagnostic, we fit an ARIMA(2,1,1) or ARIMA(1,1,2) model to verify that ARIMA(1,1,1) is a good choice. As seen below, in the ARIMA(2,1,1) model the second AR coefficient is not significant, with a p-value of approximately 28.5%.
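To put that p-value in context, the sketch below converts a coefficient-to-standard-error ratio into a two-sided p-value under a normal approximation. The ratios used are illustrative, not the estimates from the fitted model, and the software may use a t distribution instead. Under this approximation, a ratio of roughly 1.07 in absolute value corresponds to a p-value near 0.285, well short of the usual 1.96 threshold.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-ratio under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Illustrative ratios only (NOT taken from the fitted ARIMA(2,1,1) output).
for z in (1.07, 1.96):
    print(z, round(two_sided_p(z), 3))
```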
This summarises the goodness of fit of the ARIMA(1,1,1) model without constant.
As shown in the above plot, the fitted values on average underestimate the original values, so ARIMA(1,1,1) without constant is a good model but not the best one. According to the text (Shumway and Stoffer, 2000, page 170), one should consider long memory ARMA models for this dataset (i.e. fractional differencing with d = 0.384), but such models are beyond the level of this course.
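Although long memory models are outside the scope here, a small sketch may help show what a non-integer d means. The operator (1 - B)^d expands into an infinite series of weights, computed by the recursion w_0 = 1, w_k = w_{k-1} (k - 1 - d) / k. The function name is my own; only the value d = 0.384 comes from the text.

```python
def frac_diff_weights(d: float, n_weights: int):
    """First n_weights coefficients of (1 - B)^d, starting at lag 0."""
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(weights[-1] * (k - 1 - d) / k)
    return weights


# With d = 1 the weights reduce to ordinary differencing: 1, -1, 0, 0, ...
print(frac_diff_weights(1.0, 4))

# With d = 0.384 the weights decay slowly instead of cutting off.
print(frac_diff_weights(0.384, 5))
```

With d = 1 only the current and previous observation enter, whereas with a fractional d every past observation carries some weight, which is the sense in which such a series has long memory.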
To be Continued in Part 3.