How to Make Out-of-Sample Forecasts with ARIMA in Python

Last Updated on August 28, 2019

Making out-of-sample forecasts can be confusing when getting started with time series data.

The statsmodels Python API provides functions for performing one-step and multi-step out-of-sample forecasts.

In this tutorial, you will clear up any confusion you have about making out-of-sample forecasts with time series data in Python.

After completing this tutorial, you will know:

  • How to make a one-step out-of-sample forecast.
  • How to make a multi-step out-of-sample forecast.
  • The difference between the forecast() and predict() functions.

Let’s get started.

  • Updated Apr/2019: Updated the link to dataset.
  • Updated Aug/2019: Updated data loading to use new API.

Tutorial Overview

This tutorial is broken down into the following 5 steps:

  1. Dataset Description
  2. Split Dataset
  3. Develop Model
  4. One-Step Out-of-Sample Forecast
  5. Multi-Step Out-of-Sample Forecast

1. Minimum Daily Temperatures Dataset

This dataset describes the minimum daily temperatures over 10 years (1981-1990) in the city of Melbourne, Australia.

The units are in degrees Celsius and there are 3,650 observations. The source of the data is credited as the Australian Bureau of Meteorology.

Download the Minimum Daily Temperatures dataset to your current working directory with the filename “daily-minimum-temperatures.csv”.

The example below loads the dataset with Pandas, using the date column as the index.

# line plot of time series
from pandas import read_csv
from matplotlib import pyplot
# load dataset, using the first column (the date) as the index
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0)
# display first few rows
print(series.head(20))
# line plot of dataset
series.plot()
pyplot.show()

Running the example prints the first 20 rows of the loaded dataset.

Date
1981-01-01 20.7
1981-01-02 17.9
1981-01-03 18.8
1981-01-04 14.6
1981-01-05 15.8
1981-01-06 15.8
1981-01-07 15.8
1981-01-08 17.4
1981-01-09 21.8
1981-01-10 20.0
1981-01-11 16.2
1981-01-12 13.3
1981-01-13 16.7
1981-01-14 21.5
1981-01-15 25.0
1981-01-16 20.7
1981-01-17 20.6
1981-01-18 24.8
1981-01-19 17.7
1981-01-20 15.5

A line plot of the time series is also created.

Minimum Daily Temperatures Dataset Line Plot

2. Split Dataset

We can split the dataset into two parts.

The first part is the training dataset that we will use to prepare an ARIMA model. The second part is the test dataset that we will pretend is not available. It is these time steps that we will treat as out of sample.

The dataset contains data from January 1st 1981 to December 31st 1990.

We will hold back the last 7 days of the dataset from December 1990 as the test dataset and treat those time steps as out of sample.

Specifically 1990-12-25 to 1990-12-31:

1990-12-25,12.9
1990-12-26,14.6
1990-12-27,14.0
1990-12-28,13.6
1990-12-29,13.5
1990-12-30,15.7
1990-12-31,13.0

The code below will load the dataset, split it into the training and validation datasets, and save them to files dataset.csv and validation.csv respectively.

# split the dataset
from pandas import read_csv
series = read_csv('daily-minimum-temperatures.csv', header=0, index_col=0)
# hold back the last 7 observations as the validation set
split_point = len(series) - 7
dataset, validation = series[0:split_point], series[split_point:]
print('Dataset %d, Validation %d' % (len(dataset), len(validation)))
# write values only (no index, no header row) so the files can be read back with header=None
dataset.to_csv('dataset.csv', index=False, header=False)
validation.to_csv('validation.csv', index=False, header=False)

Run the example and you should now have two files to work with.

The last observation in dataset.csv is Christmas Eve 1990 (1990-12-24).

That means Christmas Day 1990 and onwards are out-of-sample time steps for a model trained on dataset.csv.
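As a quick sanity check on the split, the sketch below reproduces the arithmetic with placeholder values instead of the real file. The counts (3,650 observations, 7 held back) come from the text above; the variable names are illustrative only.

```python
from datetime import date, timedelta

n_obs = 3650   # observations in the full dataset, as stated above
n_test = 7     # days held back as out of sample

# placeholder values standing in for the temperature readings
values = list(range(n_obs))
split_point = len(values) - n_test
train, test = values[:split_point], values[split_point:]

# dates covered by the held-back week, counting back from the last day of the data
last_day = date(1990, 12, 31)
test_dates = [last_day - timedelta(days=n_test - 1 - i) for i in range(n_test)]

print(len(train), len(test))
print(test_dates[0], test_dates[-1])
```

The first held-back date comes out as 1990-12-25, which matches the validation rows listed earlier.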

3. Develop Model

In this section, we are going to make the data stationary and develop a simple ARIMA model.

The data has a strong seasonal component. We can neutralize this and make the data stationary by taking the seasonal difference. That is, we can take the observation for a day and subtract the observation from the same day one year ago.

This will result in a stationary dataset from which we can fit a model.

import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        # subtract the observation from `interval` steps earlier
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

We can invert this operation by adding the value of the observation one year ago. We will need to do this to any forecasts made by a model trained on the seasonally adjusted data.

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]
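To see that the two functions really are inverses of each other, here is a small round-trip sketch on made-up data with a seasonal period of 4 rather than 365. It uses plain lists instead of NumPy arrays to stay self-contained; the numbers are purely illustrative.

```python
def difference(dataset, interval=1):
    """Return dataset[i] - dataset[i - interval] for every valid i."""
    return [dataset[i] - dataset[i - interval] for i in range(interval, len(dataset))]

def inverse_difference(history, yhat, interval=1):
    """Add back the observation `interval` steps before the end of history."""
    return yhat + history[-interval]

# two "years" of a made-up seasonal pattern with period 4
series = [10, 20, 30, 20, 12, 23, 31, 22]
interval = 4
diffs = difference(series, interval)

# rebuild the second "year" one step at a time, starting from the first
history = series[:interval]
for d in diffs:
    history.append(inverse_difference(history, d, interval))

print(diffs)
print(history == series)
```

Note that the differenced series is shorter than the original by one full interval, which is why the model later sees 365 fewer observations than dataset.csv contains.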

We can fit an ARIMA model.

Fitting a strong ARIMA model to the data is not the focus of this post, so rather than going through the analysis of the problem or grid searching parameters, I will choose a simple ARIMA(7,0,1) configuration.

We can put all of this together as follows:

from pandas import read_csv
# legacy ARIMA class; it was removed from this module in statsmodels 0.13
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# load dataset
series = read_csv('dataset.csv', header=None)
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# print summary of fit model
print(model_fit.summary())

Running the example loads the dataset, takes the seasonal difference, then fits an ARIMA(7,0,1) model and prints the summary of the fit model.

                              ARMA Model Results
==============================================================================
Dep. Variable:                      y   No. Observations:                 3278
Model:                     ARMA(7, 1)   Log Likelihood               -8673.748
Method:                       css-mle   S.D. of innovations              3.411
Date:                Mon, 20 Feb 2017   AIC                          17367.497
Time:                        10:28:38   BIC                          17428.447
Sample:                             0   HQIC                         17389.322

==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0132      0.132      0.100      0.921      -0.246       0.273
ar.L1.y        1.1424      0.287      3.976      0.000       0.579       1.706
ar.L2.y       -0.4346      0.154     -2.829      0.005      -0.736      -0.133
ar.L3.y        0.0961      0.042      2.289      0.022       0.014       0.178
ar.L4.y        0.0125      0.029      0.434      0.664      -0.044       0.069
ar.L5.y       -0.0101      0.029     -0.343      0.732      -0.068       0.047
ar.L6.y        0.0119      0.027      0.448      0.654      -0.040       0.064
ar.L7.y        0.0089      0.024      0.368      0.713      -0.038       0.056
ma.L1.y       -0.6157      0.287     -2.146      0.032      -1.178      -0.053
                                    Roots
=============================================================================
                 Real           Imaginary           Modulus         Frequency
-----------------------------------------------------------------------------
AR.1            1.2234           -0.0000j            1.2234           -0.0000
AR.2            1.2561           -1.0676j            1.6485           -0.1121
AR.3            1.2561           +1.0676j            1.6485            0.1121
AR.4            0.0349           -2.0160j            2.0163           -0.2472
AR.5            0.0349           +2.0160j            2.0163            0.2472
AR.6           -2.5770           -1.3110j            2.8913           -0.4251
AR.7           -2.5770           +1.3110j            2.8913            0.4251
MA.1            1.6242           +0.0000j            1.6242            0.0000
-----------------------------------------------------------------------------
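Two numbers in this summary can be checked by hand. The sketch below is only a reading aid, with values copied from the output above: it recomputes the number of observations the model was fit on, and the modulus of each AR root. A modulus greater than 1 means the root lies outside the unit circle, which is the usual condition for the AR part of the model to be stationary.

```python
# counts taken from earlier sections: full dataset, held-back days, seasonal lag
n_total, n_holdout, interval = 3650, 7, 365
# observations left after the split and the seasonal differencing
n_fit = n_total - n_holdout - interval

# AR roots copied from the Roots table (real and imaginary parts)
ar_roots = [
    complex(1.2234, 0.0),
    complex(1.2561, -1.0676), complex(1.2561, 1.0676),
    complex(0.0349, -2.0160), complex(0.0349, 2.0160),
    complex(-2.5770, -1.3110), complex(-2.5770, 1.3110),
]
# the modulus is the distance of each root from the origin
moduli = [abs(r) for r in ar_roots]

print(n_fit)
print([round(m, 4) for m in moduli])
print(all(m > 1 for m in moduli))
```

The observation count agrees with the "No. Observations" field, and the recomputed moduli agree with the Modulus column.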

We are now ready to explore making out-of-sample forecasts with the model.

4. One-Step Out-of-Sample Forecast

ARIMA models are great for one-step forecasts.

A one-step forecast is a forecast of the very next time step in the sequence from the available data used to fit the model.

In this case, we are interested in a one-step forecast of Christmas Day 1990 (1990-12-25).

Forecast Function

The statsmodels ARIMAResults object provides a forecast() function for making predictions.

By default, this function makes a single-step out-of-sample forecast, so we can call it directly with no arguments. It returns the forecast values, the standard errors of the forecast, and the confidence intervals. Here, we are only interested in the first element, the forecast itself, as follows.

# one-step out-of sample forecast
forecast = model_fit.forecast()[0]

Once made, we can invert the seasonal difference and convert the value back into the original scale.

# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)

The complete example is listed below:

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None)
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# one-step out-of-sample forecast
forecast = model_fit.forecast()[0]
# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
print('Forecast: %f' % forecast)

Running the example prints a forecast of about 14.86 degrees, roughly 2 degrees above the observed 12.9 degrees in the validation.csv file.

Predict Function

The statsmodels ARIMAResults object also provides a predict() function for making forecasts.

The predict() function can be used to predict arbitrary in-sample and out-of-sample time steps, including the next out-of-sample forecast time step.

The predict() function requires a start and an end to be specified. These can be the indexes of the time steps relative to the beginning of the training data used to fit the model, for example:

# one-step out of sample forecast
start_index = len(differenced)
end_index = len(differenced)
forecast = model_fit.predict(start=start_index, end=end_index)

The start and end can also be a datetime string or a “datetime” type; for example:

start_index = '1990-12-25'
end_index = '1990-12-25'
forecast = model_fit.predict(start=start_index, end=end_index)

and

from datetime import datetime
start_index = datetime(1990, 12, 25)
end_index = datetime(1990, 12, 25)
forecast = model_fit.predict(start=start_index, end=end_index)

Using anything other than the time step indexes results in an error on my system, as follows:

AttributeError: 'NoneType' object has no attribute 'get_loc'

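This is most likely because the model here was fit on a NumPy array, which carries no date index for the date values to be matched against. Perhaps you will have more luck; for now, I am sticking with the time step indexes.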

The complete example is listed below:

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None)
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# one-step out-of-sample forecast: the first index after the training data
start_index = len(differenced)
end_index = len(differenced)
forecast = model_fit.predict(start=start_index, end=end_index)
# invert the differenced forecast to something usable
forecast = inverse_difference(X, forecast, days_in_year)
print('Forecast: %f' % forecast)

Running the example prints the same forecast as above when using the forecast() function.

You can see that the predict function is more flexible. You can specify any point or contiguous forecast interval in or out of sample.

Now that we know how to make a one-step forecast, we can make some multi-step forecasts.

5. Multi-Step Out-of-Sample Forecast

We can also make multi-step forecasts using the forecast() and predict() functions.

It is common with weather data to make one week (7-day) forecasts, so in this section we will look at predicting the minimum daily temperature for the next 7 out-of-sample time steps.

Forecast Function

The forecast() function has an argument called steps that allows you to specify the number of time steps to forecast.

By default, this argument is set to 1 for a one-step out-of-sample forecast. We can set it to 7 to get a forecast for the next 7 days.

# multi-step out-of-sample forecast
forecast = model_fit.forecast(steps=7)[0]

We can then invert each forecasted time step, one at a time, and print the values. Note that inverse_difference() looks up the observation one year back from the end of the data it is given, so the data must grow by one value for each step we invert. Here, we add each inverted forecast to the end of a list called history for use when calling inverse_difference().

# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1
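The role of the growing history list is easier to see on a toy example. The sketch below uses made-up numbers with a seasonal period of 4 and compares the loop above with a direct lookup into the observed data. The direct form is my own illustration, not part of the tutorial code, and it only applies while the forecast horizon is no longer than the seasonal interval.

```python
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# made-up data: 6 observed values, seasonal period of 4,
# and a 3-step forecast on the differenced scale
observed = [10.0, 20.0, 30.0, 20.0, 12.0, 23.0]
interval = 4
forecast_diff = [1.0, -2.0, 0.5]

# loop version: history grows by one value per step,
# so history[-interval] advances by one observation each time
history = list(observed)
looped = []
for yhat in forecast_diff:
    inverted = inverse_difference(history, yhat, interval)
    looped.append(inverted)
    history.append(inverted)

# direct version: step k is paired with the observation (interval - k) from the end
direct = [yhat + observed[len(observed) - interval + k]
          for k, yhat in enumerate(forecast_diff)]

print(looped)
print(direct)
```

Both versions give the same values, which shows that for a 7-day horizon each forecast is added to an observed value from one year earlier, not to a previous forecast.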

The complete example is listed below:

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None)
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# multi-step out-of-sample forecast
forecast = model_fit.forecast(steps=7)[0]
# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1

Running the example prints the forecast for the next 7 days.

Day 1: 14.861669
Day 2: 15.628784
Day 3: 13.331349
Day 4: 11.722413
Day 5: 10.421523
Day 6: 14.415549
Day 7: 12.674711

Predict Function

The predict() function can also forecast the next 7 out-of-sample time steps.

Using time step indexes, we can specify the end index as 6 more time steps in the future; for example:

# multi-step out-of-sample forecast
start_index = len(differenced)
end_index = start_index + 6
forecast = model_fit.predict(start=start_index, end=end_index)
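The index arithmetic generalizes to any horizon. The helper below is hypothetical (it is not part of statsmodels) and simply encodes the rule used above: the first out-of-sample index equals the number of training observations, and the end index is inclusive.

```python
def out_of_sample_range(n_train, steps):
    """Hypothetical helper: start and end indexes for a forecast of `steps` time steps."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    start = n_train           # first index after the training data (0-based)
    end = start + steps - 1   # the end index is inclusive
    return start, end

# 3278 is the number of differenced observations reported in the model summary
print(out_of_sample_range(3278, 1))
print(out_of_sample_range(3278, 7))
```

With 7 steps the end index is 6 past the start, matching start_index + 6 in the snippet above.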

The complete example is listed below.

from pandas import read_csv
from statsmodels.tsa.arima_model import ARIMA
import numpy

# create a differenced series
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return numpy.array(diff)

# invert differenced value
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

# load dataset
series = read_csv('dataset.csv', header=None)
# seasonal difference
X = series.values
days_in_year = 365
differenced = difference(X, days_in_year)
# fit model
model = ARIMA(differenced, order=(7,0,1))
model_fit = model.fit(disp=0)
# multi-step out-of-sample forecast
start_index = len(differenced)
end_index = start_index + 6
forecast = model_fit.predict(start=start_index, end=end_index)
# invert the differenced forecast to something usable
history = [x for x in X]
day = 1
for yhat in forecast:
    inverted = inverse_difference(history, yhat, days_in_year)
    print('Day %d: %f' % (day, inverted))
    history.append(inverted)
    day += 1

Running the example produces the same results as calling the forecast() function in the previous section, as you would expect.

Day 1: 14.861669
Day 2: 15.628784
Day 3: 13.331349
Day 4: 11.722413
Day 5: 10.421523
Day 6: 14.415549
Day 7: 12.674711
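Because the last 7 days were held back, these forecasts can be compared against the real observations. The sketch below copies the forecast values printed above and the validation values listed in the Split Dataset section, then computes the per-day error and the root mean squared error (RMSE). RMSE is my choice of summary here; the tutorial itself does not score the forecasts.

```python
from math import sqrt

# forecasts printed above and the held-back observations from validation.csv
forecast = [14.861669, 15.628784, 13.331349, 11.722413, 10.421523, 14.415549, 12.674711]
actual = [12.9, 14.6, 14.0, 13.6, 13.5, 15.7, 13.0]

# signed error per day: positive means the forecast was too warm
errors = [f - a for f, a in zip(forecast, actual)]
rmse = sqrt(sum(e * e for e in errors) / len(errors))

for day, (f, a, e) in enumerate(zip(forecast, actual, errors), start=1):
    print('Day %d: forecast=%.1f, observed=%.1f, error=%+.1f' % (day, f, a, e))
print('RMSE: %.3f' % rmse)
```

The RMSE comes out at roughly 1.7 degrees, with the largest miss on day 5.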

Summary

In this tutorial, you discovered how to make out-of-sample forecasts in Python using statsmodels.

Specifically, you learned:

  • How to make a one-step out-of-sample forecast.
  • How to make a 7-day multi-step out-of-sample forecast.
  • How to use both the forecast() and predict() functions when forecasting.

Do you have any questions about out-of-sample forecasts, or about this post? Ask your questions in the comments and I will do my best to answer.
