Last Updated on August 21, 2019
Your time series dataset may contain a trend.
A trend is a continued increase or decrease in the series over time. There can be benefit in identifying, modeling, and even removing trend information from your time series dataset.
In this tutorial, you will discover how to model and remove trend information from time series data in Python.
After completing this tutorial, you will know:
- The importance and types of trends that may exist in time series and how to identify them.
- How to use a simple differencing method to remove a trend.
- How to model a linear trend and remove it from a sales time series dataset.
Let’s get started.
- Updated Apr/2019: Updated the link to dataset.
What You Will Learn
Trends in Time Series
A trend is a long-term increase or decrease in the level of the time series.
In general, a systematic change in a time series that does not appear to be periodic is known as a trend.
— Page 5, Introductory Time Series with R
Identifying and understanding trend information can aid in improving model performance; below are a few reasons:
- Faster Modeling: Perhaps the knowledge of a trend or lack of a trend can suggest methods and make model selection and evaluation more efficient.
- Simpler Problem: Perhaps we can correct or remove the trend to simplify modeling and improve model performance.
- More Data: Perhaps we can use trend information, directly or as a summary, to provide additional information to the model and improve model performance.
Types of Trends
There are all kinds of trends.
Two general classes that we may think about are:
- Deterministic Trends: These are trends that consistently increase or decrease.
- Stochastic Trends: These are trends that increase and decrease inconsistently.
In general, deterministic trends are easier to identify and remove, but the methods discussed in this tutorial can still be useful for stochastic trends.
We can think about trends in terms of their scope of observations.
- Global Trends: These are trends that apply to the whole time series.
- Local Trends: These are trends that apply to parts or subsequences of a time series.
Generally, global trends are easier to identify and address.
Identifying a Trend
You can plot time series data to see if a trend is obvious or not.
The difficulty is that in practice, identifying a trend in a time series can be a subjective process. As such, extracting or removing it from the time series can be just as subjective.
Create line plots of your data and inspect the plots for obvious trends.
Add linear and nonlinear trend lines to your plots and see if a trend is obvious.
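As a rough sketch of the trend-line idea, the snippet below fits a straight line and a quadratic curve to a series and compares how much error each one leaves behind. The series is synthetic and the helper name fit_trend is made up for illustration. It assumes NumPy is available and uses numpy.polyfit for the least-squares fit.

```python
import numpy as np

def fit_trend(y, degree):
    """Fit a polynomial of the given degree to y against its integer time index.

    Hypothetical helper for illustration; returns the fitted trend values.
    """
    t = np.arange(len(y))
    coefs = np.polyfit(t, y, degree)  # least-squares polynomial fit
    return np.polyval(coefs, t)

# synthetic series: a curved (quadratic) trend plus a small fixed wiggle
t = np.arange(36)
y = 0.5 * t ** 2 + 3.0 * t + 10.0 + np.sin(t)

linear_trend = fit_trend(y, 1)
quadratic_trend = fit_trend(y, 2)

# root mean squared error left over after removing each candidate trend
rmse_linear = float(np.sqrt(np.mean((y - linear_trend) ** 2)))
rmse_quadratic = float(np.sqrt(np.mean((y - quadratic_trend) ** 2)))
print('linear RMSE: %.3f, quadratic RMSE: %.3f' % (rmse_linear, rmse_quadratic))
```

Plotting each returned trend over the data with pyplot.plot() gives the visual check. A much lower error for the curved fit hints that the trend is not linear, although the judgment remains partly subjective.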
Removing a Trend
A time series with a trend is called non-stationary.
An identified trend can be modeled. Once modeled, it can be removed from the time series dataset. This is called detrending the time series.
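If a dataset does not have a trend, or we successfully remove the trend, the trend is no longer a source of non-stationarity. A series that is stationary once a deterministic trend has been removed is said to be trend stationary.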
Using Time Series Trends in Machine Learning
From a machine learning perspective, a trend in your data represents two opportunities:
- Remove Information: To remove systematic information that distorts the relationship between input and output variables.
- Add Information: To add systematic information to improve the relationship between input and output variables.
Specifically, a trend can be removed from your time series data (and data in the future) as a data preparation and cleaning exercise. This is common when using statistical methods for time series forecasting, but does not always improve results when using machine learning models.
Alternately, a trend can be added, either directly or as a summary, as a new input variable to the supervised learning problem to predict the output variable.
One or both approaches may be relevant for your time series forecasting problem and may be worth investigating.
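To make the "add information" option a little more concrete, here is a minimal sketch of one possible way to frame it. The function name make_supervised, the window size and the sales values are all assumptions chosen for illustration. The time index is used as a direct trend input and a mean of recent observations as a summary.

```python
def make_supervised(series, window=3):
    """Build (inputs, target) rows from a list of observations.

    Inputs for predicting series[t] are: the time index t (a direct trend
    feature), the previous observation, and the mean of the previous
    `window` observations (a summary of the local level).
    """
    rows = []
    for t in range(window, len(series)):
        lag1 = series[t - 1]
        rolling_mean = sum(series[t - window:t]) / window
        rows.append(((t, lag1, rolling_mean), series[t]))
    return rows

sales = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]  # made-up values
for inputs, target in make_supervised(sales):
    print(inputs, target)
```

Whether features like these help depends on the model and the data, so they are worth evaluating rather than assuming.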
Next, let’s take a look at a dataset that has a trend.
Shampoo Sales Dataset
This dataset describes the monthly number of sales of shampoo over a 3 year period.
The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).
Below is a sample of the first 5 rows of data, including the header row.
"Month","Sales"
"1-01",266.0
"1-02",145.9
"1-03",183.1
"1-04",119.3
"1-05",180.3
A line plot of the entire dataset shows a clear increasing trend.
Load the Shampoo Sales Dataset
Download the dataset and place it in the current working directory with the filename "shampoo-sales.csv".
The dataset can be loaded with a custom date parsing routine as follows:
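from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot
def parser(x):
    # months are stored as "<year>-<month>" with a one-digit year, e.g. "1-01"
    return datetime.strptime('190' + x, '%Y-%m')
# load the Sales column as a Series indexed by month
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
# convert the month labels in the index to dates
series.index = [parser(x) for x in series.index]
series.plot()
pyplot.show()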
Running the example loads the dataset and creates a plot.
Detrend by Differencing
Perhaps the simplest method to detrend a time series is by differencing.
Specifically, a new series is constructed where the value at the current time step is calculated as the difference between the original observation and the observation at the previous time step.
value(t) = observation(t) - observation(t-1)
This has the effect of removing a trend from a time series dataset.
We can create a new difference dataset in Python by implementing this directly. A new list of observations can be created.
Below is an example that creates the difference detrended version of the Shampoo Sales dataset.
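from datetime import datetime
from pandas import read_csv
from matplotlib import pyplot
def parser(x):
    # months are stored as "<year>-<month>" with a one-digit year, e.g. "1-01"
    return datetime.strptime('190' + x, '%Y-%m')
# load the Sales column as a Series indexed by month
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = [parser(x) for x in series.index]
X = series.values
# subtract the previous observation from each observation
diff = list()
for i in range(1, len(X)):
    value = X[i] - X[i - 1]
    diff.append(value)
pyplot.plot(diff)
pyplot.show()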
Running the example creates the new detrended dataset and then plots the time series.
Because no difference value can be created for the first observation (there is no earlier observation to subtract from it), the new dataset contains one fewer record. We can see that the trend does indeed appear to have been removed.
This approach works well for data with a linear trend. If the trend is quadratic (the change in the trend also increases or decreases), then a difference of the already-differenced dataset can be taken, a second level of differencing. This process can be further repeated if needed.
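A small sketch of second-order differencing, using a synthetic series with a purely quadratic trend rather than the shampoo data. The difference() helper is a name chosen here for illustration.

```python
def difference(values):
    """Return the first-order difference of a list of observations."""
    return [values[i] - values[i - 1] for i in range(1, len(values))]

# synthetic series with a purely quadratic trend: 0, 1, 4, 9, ...
quadratic = [t * t for t in range(8)]

first = difference(quadratic)  # still trending upward
second = difference(first)     # constant, so the trend is gone
print(first)
print(second)
```

Each round of differencing drops one more observation from the start of the series, so the twice-differenced series here is two values shorter than the original.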
Because differencing only requires the observation at the previous time step, it can easily be applied to unseen out-of-sample data to either preprocess or provide an additional input for supervised learning.
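As a sketch of how this might look for a newly arrived observation, the two helpers below (hypothetical names) difference a new value against the last known observation and then add the difference back to return to the original scale. The values are taken from the sample rows of the dataset shown earlier.

```python
def difference_new(previous_obs, new_obs):
    """Difference a newly arrived observation against the last known one."""
    return new_obs - previous_obs

def invert_difference(previous_obs, diff_value):
    """Recover a value on the original scale from a differenced value."""
    return previous_obs + diff_value

history = [266.0, 145.9, 183.1]  # first three Sales values from the sample
new_obs = 119.3                  # the next value in the sample

d = difference_new(history[-1], new_obs)
restored = invert_difference(history[-1], d)
print(d, restored)
```

The inverse step matters when a model is trained on the differenced series, because its forecasts are then differences and must be added back to the last observation to be read as sales.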
Next, we will look at fitting a model to describe the trend.
Detrend by Model Fitting
A trend is often easily visualized as a line through the observations.
Linear trends can be summarized by a linear model, and nonlinear trends may be best summarized using a polynomial or other curve-fitting method.
Because of the subjective and domain-specific nature of identifying trends, this approach can help to identify whether a trend is present. Even fitting a linear model to a trend that is clearly super-linear or exponential can be helpful.
In addition to being used as a trend identification tool, these fit models can also be used to detrend a time series.
For example, a linear model can be fit on the time index to predict the observation. This dataset would look as follows:
X, y
1, obs1
2, obs2
3, obs3
4, obs4
5, obs5
The predictions from this model will form a straight line that can be taken as the trend line for the dataset. These predictions can also be subtracted from the original time series to provide a detrended version of the dataset.
value(t) = observation(t) - prediction(t)
The residuals from the fit of the model are a detrended form of the dataset. Polynomial curve fitting and other nonlinear models can also be used.
We can implement this in Python by training a scikit-learn LinearRegression model on the data.
from datetime import datetime
from pandas import read_csv
from sklearn.linear_model import LinearRegression
from matplotlib import pyplot
import numpy
def parser(x):
    # months are stored as "<year>-<month>" with a one-digit year, e.g. "1-01"
    return datetime.strptime('190' + x, '%Y-%m')
# load the Sales column as a Series indexed by month
series = read_csv('shampoo-sales.csv', header=0, index_col=0).squeeze('columns')
series.index = [parser(x) for x in series.index]
# fit linear model
X = [i for i in range(0, len(series))]
X = numpy.reshape(X, (len(X), 1))
y = series.values
model = LinearRegression()
model.fit(X, y)
# calculate trend
trend = model.predict(X)
# plot trend
pyplot.plot(y)
pyplot.plot(trend)
pyplot.show()
# detrend
detrended = [y[i] - trend[i] for i in range(0, len(series))]
# plot detrended
pyplot.plot(detrended)
pyplot.show()
Running the example first fits the linear model to the integer-indexed observations and plots the trend line over the original dataset.
Next, the trend is subtracted from the original dataset and the resulting detrended dataset is plotted.
Again, we can see that this approach has effectively detrended the dataset. There may be a parabola in the residuals, suggesting that perhaps a polynomial fit may have done a better job.
Because the trend model takes only the integer index of the observation as input, it can be used on new data to either detrend or provide a new input variable for the model.
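The sketch below illustrates that last point with made-up observations. To keep it self-contained it fits the line with a small hand-written least-squares routine (fit_line, a name chosen for illustration) instead of scikit-learn. With a fitted LinearRegression model, calling predict on the new index would play the same role.

```python
def fit_line(y):
    """Least-squares fit of y = intercept + slope * t for t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

train = [3.0, 5.5, 6.5, 9.0, 11.0]  # made-up observations at t = 0..4
intercept, slope = fit_line(train)

# a new observation arrives at the next time index
t_new = len(train)
obs_new = 13.5
trend_new = intercept + slope * t_new  # usable as a new input variable
detrended_new = obs_new - trend_new    # or subtracted to detrend the value
print(trend_new, detrended_new)
```

This only makes sense while the trend estimated on the training data continues to hold; if the trend changes, the model would need to be refit.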
Summary
In this tutorial, you discovered trends in time series data and how to remove them with Python.
Specifically, you learned:
- About the importance of trend information in time series and how you may be able to use it in machine learning.
- How to use differencing to remove a trend from time series data.
- How to model a linear trend and remove it from time series data.