Last Updated on January 8, 2020
The encoder-decoder model provides a pattern for using recurrent neural networks to address challenging sequence-to-sequence prediction problems such as machine translation.
Encoder-decoder models can be developed in the Keras Python deep learning library and an example of a neural machine translation system developed with this model has been described on the Keras blog, with sample code distributed with the Keras project.
This example can provide the basis for developing encoder-decoder LSTM models for your own sequence-to-sequence prediction problems.
In this tutorial, you will discover how to develop a sophisticated encoder-decoder recurrent neural network for sequence-to-sequence prediction problems with Keras.
After completing this tutorial, you will know:
- How to correctly define a sophisticated encoder-decoder model in Keras for sequence-to-sequence prediction.
- How to define a contrived yet scalable sequence-to-sequence prediction problem that you can use to evaluate the encoder-decoder LSTM model.
- How to apply the encoder-decoder LSTM model in Keras to address the scalable integer sequence-to-sequence prediction problem.
Discover how to develop LSTMs such as stacked, bidirectional, CNN-LSTM, Encoder-Decoder seq2seq and more in my new book, with 14 step-by-step tutorials and full code.
Let’s get started.
- Update Jan/2020: Updated API for Keras 2.3 and TensorFlow 2.0.
What You Will Learn
Tutorial Overview
This tutorial is divided into 3 parts; they are:
- Encoder-Decoder Model in Keras
- Scalable Sequence-to-Sequence Problem
- Encoder-Decoder LSTM for Sequence Prediction
Python Environment
This tutorial assumes you have a Python SciPy environment installed. You can use either Python 2 or 3 with this tutorial.
You must have Keras (2.0 or higher) installed with either the TensorFlow or Theano backend.
The tutorial also assumes you have scikit-learn, Pandas, NumPy, and Matplotlib installed.
If you need help with your environment, see this post:
Encoder-Decoder Model in Keras
The encoder-decoder model is a way of organizing recurrent neural networks for sequence-to-sequence prediction problems.
It was originally developed for machine translation problems, although it has proven successful at related sequence-to-sequence prediction problems such as text summarization and question answering.
The approach involves two recurrent neural networks, one to encode the source sequence, called the encoder, and a second to decode the encoded source sequence into the target sequence, called the decoder.
The Keras deep learning Python library provides an example of how to implement the encoder-decoder model for machine translation (lstm_seq2seq.py) described by the libraries creator in the post: “A ten-minute introduction to sequence-to-sequence learning in Keras.”
For a detailed breakdown of this model see the post:
For more information on the use of return_state, which might be new to you, see the post:
For more help getting started with the Keras Functional API, see the post:
Using the code in that example as a starting point, we can develop a generic function to define an encoder-decoder recurrent neural network. Below is this function named define_models().
# returns train, inference_encoder and inference_decoder models
def define_models(n_input, n_output, n_units):
# define training encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = LSTM(n_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
# define training decoder
decoder_inputs = Input(shape=(None, n_output))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation=’softmax’)
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define inference decoder
decoder_state_input_h = Input(shape=(n_units,))
decoder_state_input_c = Input(shape=(n_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
# return all models
return model, encoder_model, decoder_model
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
# returns train, inference_encoder and inference_decoder models
def define_models(n_input, n_output, n_units):
# define training encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = LSTM(n_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
# define training decoder
decoder_inputs = Input(shape=(None, n_output))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation=’softmax’)
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define inference decoder
decoder_state_input_h = Input(shape=(n_units,))
decoder_state_input_c = Input(shape=(n_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
# return all models
return model, encoder_model, decoder_model
The function takes 3 arguments, as follows:
- n_input: The cardinality of the input sequence, e.g. number of features, words, or characters for each time step.
- n_output: The cardinality of the output sequence, e.g. number of features, words, or characters for each time step.
- n_units: The number of cells to create in the encoder and decoder models, e.g. 128 or 256.
The function then creates and returns 3 models, as follows:
- train: Model that can be trained given source, target, and shifted target sequences.
- inference_encoder: Encoder model used when making a prediction for a new source sequence.
- inference_decoder Decoder model use when making a prediction for a new source sequence.
The model is trained given source and target sequences where the model takes both the source and a shifted version of the target sequence as input and predicts the whole target sequence.
For example, one source sequence may be [1,2,3] and the target sequence [4,5,6]. The inputs and outputs to the model during training would be:
Input1: [‘1’, ‘2’, ‘3’]
Input2: [‘_’, ‘4’, ‘5’]
Output: [‘4’, ‘5’, ‘6’]
Input1: [‘1’, ‘2’, ‘3’]
Input2: [‘_’, ‘4’, ‘5’]
Output: [‘4’, ‘5’, ‘6’]
The model is intended to be called recursively when generating target sequences for new source sequences.
The source sequence is encoded and the target sequence is generated one element at a time, using a “start of sequence” character such as ‘_’ to start the process. Therefore, in the above case, the following input-output pairs would occur during training:
t, Input1, Input2, Output
1, [‘1’, ‘2’, ‘3’], ‘_’, ‘4’
2, [‘1’, ‘2’, ‘3’], ‘4’, ‘5’
3, [‘1’, ‘2’, ‘3’], ‘5’, ‘6’
t, Input1, Input2, Output
1, [‘1’, ‘2’, ‘3’], ‘_’, ‘4’
2, [‘1’, ‘2’, ‘3’], ‘4’, ‘5’
3, [‘1’, ‘2’, ‘3’], ‘5’, ‘6’
Here you can see how the recursive use of the model can be used to build up output sequences.
During prediction, the inference_encoder model is used to encode the input sequence once which returns states that are used to initialize the inference_decoder model. From that point, the inference_decoder model is used to generate predictions step by step.
The function below named predict_sequence() can be used after the model is trained to generate a target sequence given a source sequence.
# generate target given source sequence
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
# encode
state = infenc.predict(source)
# start of sequence input
target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
# collect predictions
output = list()
for t in range(n_steps):
# predict next char
yhat, h, c = infdec.predict([target_seq] + state)
# store prediction
output.append(yhat[0,0,:])
# update state
state = [h, c]
# update target sequence
target_seq = yhat
return array(output)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
# generate target given source sequence
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
# encode
state = infenc.predict(source)
# start of sequence input
target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
# collect predictions
output = list()
for t in range(n_steps):
# predict next char
yhat, h, c = infdec.predict([target_seq] + state)
# store prediction
output.append(yhat[0,0,:])
# update state
state = [h, c]
# update target sequence
target_seq = yhat
return array(output)
This function takes 5 arguments as follows:
- infenc: Encoder model used when making a prediction for a new source sequence.
- infdec: Decoder model use when making a prediction for a new source sequence.
- source:Encoded source sequence.
- n_steps: Number of time steps in the target sequence.
- cardinality: The cardinality of the output sequence, e.g. the number of features, words, or characters for each time step.
The function then returns a list containing the target sequence.
Scalable Sequence-to-Sequence Problem
In this section, we define a contrived and scalable sequence-to-sequence prediction problem.
The source sequence is a series of randomly generated integer values, such as [20, 36, 40, 10, 34, 28], and the target sequence is a reversed pre-defined subset of the input sequence, such as the first 3 elements in reverse order [40, 36, 20].
The length of the source sequence is configurable; so is the cardinality of the input and output sequence and the length of the target sequence.
We will use source sequences of 6 elements, a cardinality of 50, and target sequences of 3 elements.
Below are some more examples to make this concrete.
Source, Target
[13, 28, 18, 7, 9, 5] [18, 28, 13]
[29, 44, 38, 15, 26, 22] [38, 44, 29]
[27, 40, 31, 29, 32, 1] [31, 40, 27]
…
Source, Target
[13, 28, 18, 7, 9, 5] [18, 28, 13]
[29, 44, 38, 15, 26, 22] [38, 44, 29]
[27, 40, 31, 29, 32, 1] [31, 40, 27]
…
You are encouraged to explore larger and more complex variations. Post your findings in the comments below.
Let’s start off by defining a function to generate a sequence of random integers.
We will use the value of 0 as the padding or start of sequence character, therefore it is reserved and we cannot use it in our source sequences. To achieve this, we will add 1 to our configured cardinality to ensure the one-hot encoding is large enough (e.g. a value of 1 maps to a ‘1’ value in index 1).
For example:
We can use the randint() python function to generate random integers in a range between 1 and 1-minus the size of the problem’s cardinality. The generate_sequence() below generates a sequence of random integers.
# generate a sequence of random integers
def generate_sequence(length, n_unique):
return [randint(1, n_unique-1) for _ in range(length)]
# generate a sequence of random integers
def generate_sequence(length, n_unique):
return [randint(1, n_unique-1) for _ in range(length)]
Next, we need to create the corresponding output sequence given the source sequence.
To keep thing simple, we will select the first n elements of the source sequence as the target sequence and reverse them.
# define target sequence
target = source[:n_out]
target.reverse()
# define target sequence
target = source[:n_out]
target.reverse()
We also need a version of the output sequence shifted forward by one time step that we can use as the mock target generated so far, including the start of sequence value in the first time step. We can create this from the target sequence directly.
# create padded input target sequence
target_in = [0] + target[:-1]
# create padded input target sequence
target_in = [0] + target[:-1]
Now that all of the sequences have been defined, we can one-hot encode them, i.e. transform them into sequences of binary vectors. We can use the Keras built in to_categorical() function to achieve this.
We can put all of this into a function named get_dataset() that will generate a specific number of sequences that we can use to train a model.
# prepare data for the LSTM
def get_dataset(n_in, n_out, cardinality, n_samples):
X1, X2, y = list(), list(), list()
for _ in range(n_samples):
# generate source sequence
source = generate_sequence(n_in, cardinality)
# define target sequence
target = source[:n_out]
target.reverse()
# create padded input target sequence
target_in = [0] + target[:-1]
# encode
src_encoded = to_categorical([source], num_classes=cardinality)
tar_encoded = to_categorical([target], num_classes=cardinality)
tar2_encoded = to_categorical([target_in], num_classes=cardinality)
# store
X1.append(src_encoded)
X2.append(tar2_encoded)
y.append(tar_encoded)
return array(X1), array(X2), array(y)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
# prepare data for the LSTM
def get_dataset(n_in, n_out, cardinality, n_samples):
X1, X2, y = list(), list(), list()
for _ in range(n_samples):
# generate source sequence
source = generate_sequence(n_in, cardinality)
# define target sequence
target = source[:n_out]
target.reverse()
# create padded input target sequence
target_in = [0] + target[:-1]
# encode
src_encoded = to_categorical([source], num_classes=cardinality)
tar_encoded = to_categorical([target], num_classes=cardinality)
tar2_encoded = to_categorical([target_in], num_classes=cardinality)
# store
X1.append(src_encoded)
X2.append(tar2_encoded)
y.append(tar_encoded)
return array(X1), array(X2), array(y)
Finally, we need to be able to decode a one-hot encoded sequence to make it readable again.
This is needed for both printing the generated target sequences but also for easily comparing whether the full predicted target sequence matches the expected target sequence. The one_hot_decode() function will decode an encoded sequence.
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
return [argmax(vector) for vector in encoded_seq]
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
return [argmax(vector) for vector in encoded_seq]
We can tie all of this together and test these functions.
A complete worked example is listed below.
from random import randint
from numpy import array
from numpy import argmax
from keras.utils import to_categorical
# generate a sequence of random integers
def generate_sequence(length, n_unique):
return [randint(1, n_unique-1) for _ in range(length)]
# prepare data for the LSTM
def get_dataset(n_in, n_out, cardinality, n_samples):
X1, X2, y = list(), list(), list()
for _ in range(n_samples):
# generate source sequence
source = generate_sequence(n_in, cardinality)
# define target sequence
target = source[:n_out]
target.reverse()
# create padded input target sequence
target_in = [0] + target[:-1]
# encode
src_encoded = to_categorical([source], num_classes=cardinality)
tar_encoded = to_categorical([target], num_classes=cardinality)
tar2_encoded = to_categorical([target_in], num_classes=cardinality)
# store
X1.append(src_encoded)
X2.append(tar2_encoded)
y.append(tar_encoded)
return array(X1), array(X2), array(y)
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
return [argmax(vector) for vector in encoded_seq]
# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# generate a single source and target sequence
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
print(X1.shape, X2.shape, y.shape)
print(‘X1=%s, X2=%s, y=%s’ % (one_hot_decode(X1[0]), one_hot_decode(X2[0]), one_hot_decode(y[0])))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
from random import randint
from numpy import array
from numpy import argmax
from keras.utils import to_categorical
# generate a sequence of random integers
def generate_sequence(length, n_unique):
return [randint(1, n_unique-1) for _ in range(length)]
# prepare data for the LSTM
def get_dataset(n_in, n_out, cardinality, n_samples):
X1, X2, y = list(), list(), list()
for _ in range(n_samples):
# generate source sequence
source = generate_sequence(n_in, cardinality)
# define target sequence
target = source[:n_out]
target.reverse()
# create padded input target sequence
target_in = [0] + target[:-1]
# encode
src_encoded = to_categorical([source], num_classes=cardinality)
tar_encoded = to_categorical([target], num_classes=cardinality)
tar2_encoded = to_categorical([target_in], num_classes=cardinality)
# store
X1.append(src_encoded)
X2.append(tar2_encoded)
y.append(tar_encoded)
return array(X1), array(X2), array(y)
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
return [argmax(vector) for vector in encoded_seq]
# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# generate a single source and target sequence
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
print(X1.shape, X2.shape, y.shape)
print(‘X1=%s, X2=%s, y=%s’ % (one_hot_decode(X1[0]), one_hot_decode(X2[0]), one_hot_decode(y[0])))
Running the example first prints the shape of the generated dataset, ensuring the 3D shape required to train the model matches our expectations.
The generated sequence is then decoded and printed to screen demonstrating both that the preparation of source and target sequences matches our intention and that the decode operation is working.
(1, 6, 51) (1, 3, 51) (1, 3, 51)
X1=[32, 16, 12, 34, 25, 24], X2=[0, 12, 16], y=[12, 16, 32]
(1, 6, 51) (1, 3, 51) (1, 3, 51)
X1=[32, 16, 12, 34, 25, 24], X2=[0, 12, 16], y=[12, 16, 32]
We are now ready to develop a model for this sequence-to-sequence prediction problem.
Encoder-Decoder LSTM for Sequence Prediction
In this section, we will apply the encoder-decoder LSTM model developed in the first section to the sequence-to-sequence prediction problem developed in the second section.
The first step is to configure the problem.
# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
Next, we must define the models and compile the training model.
# define model
train, infenc, infdec = define_models(n_features, n_features, 128)
train.compile(optimizer=’adam’, loss=’categorical_crossentropy’, metrics=[‘accuracy’])
# define model
train, infenc, infdec = define_models(n_features, n_features, 128)
train.compile(optimizer=’adam’, loss=’categorical_crossentropy’, metrics=[‘accuracy’])
Next, we can generate a training dataset of 100,000 examples and train the model.
# generate training dataset
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 100000)
print(X1.shape,X2.shape,y.shape)
# train model
train.fit([X1, X2], y, epochs=1)
# generate training dataset
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 100000)
print(X1.shape,X2.shape,y.shape)
# train model
train.fit([X1, X2], y, epochs=1)
Once the model is trained, we can evaluate it. We will do this by making predictions for 100 source sequences and counting the number of target sequences that were predicted correctly. We will use the numpy array_equal() function on the decoded sequences to check for equality.
# evaluate LSTM
total, correct = 100, 0
for _ in range(total):
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
if array_equal(one_hot_decode(y[0]), one_hot_decode(target)):
correct += 1
print(‘Accuracy: %.2f%%’ % (float(correct)/float(total)*100.0))
# evaluate LSTM
total, correct = 100, 0
for _ in range(total):
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
if array_equal(one_hot_decode(y[0]), one_hot_decode(target)):
correct += 1
print(‘Accuracy: %.2f%%’ % (float(correct)/float(total)*100.0))
Finally, we will generate some predictions and print the decoded source, target, and predicted target sequences to get an idea of whether the model is working as expected.
Putting all of these elements together, the complete code example is listed below.
from random import randint
from numpy import array
from numpy import argmax
from numpy import array_equal
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
# generate a sequence of random integers
def generate_sequence(length, n_unique):
return [randint(1, n_unique-1) for _ in range(length)]
# prepare data for the LSTM
def get_dataset(n_in, n_out, cardinality, n_samples):
X1, X2, y = list(), list(), list()
for _ in range(n_samples):
# generate source sequence
source = generate_sequence(n_in, cardinality)
# define padded target sequence
target = source[:n_out]
target.reverse()
# create padded input target sequence
target_in = [0] + target[:-1]
# encode
src_encoded = to_categorical([source], num_classes=cardinality)
tar_encoded = to_categorical([target], num_classes=cardinality)
tar2_encoded = to_categorical([target_in], num_classes=cardinality)
# store
X1.append(src_encoded)
X2.append(tar2_encoded)
y.append(tar_encoded)
return array(X1), array(X2), array(y)
# returns train, inference_encoder and inference_decoder models
def define_models(n_input, n_output, n_units):
# define training encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = LSTM(n_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
# define training decoder
decoder_inputs = Input(shape=(None, n_output))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation=’softmax’)
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define inference decoder
decoder_state_input_h = Input(shape=(n_units,))
decoder_state_input_c = Input(shape=(n_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
# return all models
return model, encoder_model, decoder_model
# generate target given source sequence
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
# encode
state = infenc.predict(source)
# start of sequence input
target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
# collect predictions
output = list()
for t in range(n_steps):
# predict next char
yhat, h, c = infdec.predict([target_seq] + state)
# store prediction
output.append(yhat[0,0,:])
# update state
state = [h, c]
# update target sequence
target_seq = yhat
return array(output)
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
return [argmax(vector) for vector in encoded_seq]
# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# define model
train, infenc, infdec = define_models(n_features, n_features, 128)
train.compile(optimizer=’adam’, loss=’categorical_crossentropy’, metrics=[‘accuracy’])
# generate training dataset
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 100000)
print(X1.shape,X2.shape,y.shape)
# train model
train.fit([X1, X2], y, epochs=1)
# evaluate LSTM
total, correct = 100, 0
for _ in range(total):
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
if array_equal(one_hot_decode(y[0]), one_hot_decode(target)):
correct += 1
print(‘Accuracy: %.2f%%’ % (float(correct)/float(total)*100.0))
# spot check some examples
for _ in range(10):
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
print(‘X=%s y=%s, yhat=%s’ % (one_hot_decode(X1[0]), one_hot_decode(y[0]), one_hot_decode(target)))
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
from random import randint
from numpy import array
from numpy import argmax
from numpy import array_equal
from keras.utils import to_categorical
from keras.models import Model
from keras.layers import Input
from keras.layers import LSTM
from keras.layers import Dense
# generate a sequence of random integers
def generate_sequence(length, n_unique):
return [randint(1, n_unique-1) for _ in range(length)]
# prepare data for the LSTM
def get_dataset(n_in, n_out, cardinality, n_samples):
X1, X2, y = list(), list(), list()
for _ in range(n_samples):
# generate source sequence
source = generate_sequence(n_in, cardinality)
# define padded target sequence
target = source[:n_out]
target.reverse()
# create padded input target sequence
target_in = [0] + target[:-1]
# encode
src_encoded = to_categorical([source], num_classes=cardinality)
tar_encoded = to_categorical([target], num_classes=cardinality)
tar2_encoded = to_categorical([target_in], num_classes=cardinality)
# store
X1.append(src_encoded)
X2.append(tar2_encoded)
y.append(tar_encoded)
return array(X1), array(X2), array(y)
# returns train, inference_encoder and inference_decoder models
def define_models(n_input, n_output, n_units):
# define training encoder
encoder_inputs = Input(shape=(None, n_input))
encoder = LSTM(n_units, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
encoder_states = [state_h, state_c]
# define training decoder
decoder_inputs = Input(shape=(None, n_output))
decoder_lstm = LSTM(n_units, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = Dense(n_output, activation=’softmax’)
decoder_outputs = decoder_dense(decoder_outputs)
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
# define inference encoder
encoder_model = Model(encoder_inputs, encoder_states)
# define inference decoder
decoder_state_input_h = Input(shape=(n_units,))
decoder_state_input_c = Input(shape=(n_units,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
decoder_outputs, state_h, state_c = decoder_lstm(decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_model = Model([decoder_inputs] + decoder_states_inputs, [decoder_outputs] + decoder_states)
# return all models
return model, encoder_model, decoder_model
# generate target given source sequence
def predict_sequence(infenc, infdec, source, n_steps, cardinality):
# encode
state = infenc.predict(source)
# start of sequence input
target_seq = array([0.0 for _ in range(cardinality)]).reshape(1, 1, cardinality)
# collect predictions
output = list()
for t in range(n_steps):
# predict next char
yhat, h, c = infdec.predict([target_seq] + state)
# store prediction
output.append(yhat[0,0,:])
# update state
state = [h, c]
# update target sequence
target_seq = yhat
return array(output)
# decode a one hot encoded string
def one_hot_decode(encoded_seq):
return [argmax(vector) for vector in encoded_seq]
# configure problem
n_features = 50 + 1
n_steps_in = 6
n_steps_out = 3
# define model
train, infenc, infdec = define_models(n_features, n_features, 128)
train.compile(optimizer=’adam’, loss=’categorical_crossentropy’, metrics=[‘accuracy’])
# generate training dataset
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 100000)
print(X1.shape,X2.shape,y.shape)
# train model
train.fit([X1, X2], y, epochs=1)
# evaluate LSTM
total, correct = 100, 0
for _ in range(total):
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
if array_equal(one_hot_decode(y[0]), one_hot_decode(target)):
correct += 1
print(‘Accuracy: %.2f%%’ % (float(correct)/float(total)*100.0))
# spot check some examples
for _ in range(10):
X1, X2, y = get_dataset(n_steps_in, n_steps_out, n_features, 1)
target = predict_sequence(infenc, infdec, X1, n_steps_out, n_features)
print(‘X=%s y=%s, yhat=%s’ % (one_hot_decode(X1[0]), one_hot_decode(y[0]), one_hot_decode(target)))
Running the example first prints the shape of the prepared dataset.
(100000, 6, 51) (100000, 3, 51) (100000, 3, 51)
(100000, 6, 51) (100000, 3, 51) (100000, 3, 51)
Next, the model is fit. You should see a progress bar and the run should take less than one minute on a modern multi-core CPU.
100000/100000 [==============================] – 50s – loss: 0.6344 – acc: 0.7968
100000/100000 [==============================] – 50s – loss: 0.6344 – acc: 0.7968
Next, the model is evaluated and the accuracy printed. We can see that the model achieves 100% accuracy on new randomly generated examples.
Finally, 10 new examples are generated and target sequences are predicted. Again, we can see that the model correctly predicts the output sequence in each case and the expected value matches the reversed first 3 elements of the source sequences.
X=[22, 17, 23, 5, 29, 11] y=[23, 17, 22], yhat=[23, 17, 22]
X=[28, 2, 46, 12, 21, 6] y=[46, 2, 28], yhat=[46, 2, 28]
X=[12, 20, 45, 28, 18, 42] y=[45, 20, 12], yhat=[45, 20, 12]
X=[3, 43, 45, 4, 33, 27] y=[45, 43, 3], yhat=[45, 43, 3]
X=[34, 50, 21, 20, 11, 6] y=[21, 50, 34], yhat=[21, 50, 34]
X=[47, 42, 14, 2, 31, 6] y=[14, 42, 47], yhat=[14, 42, 47]
X=[20, 24, 34, 31, 37, 25] y=[34, 24, 20], yhat=[34, 24, 20]
X=[4, 35, 15, 14, 47, 33] y=[15, 35, 4], yhat=[15, 35, 4]
X=[20, 28, 21, 39, 5, 25] y=[21, 28, 20], yhat=[21, 28, 20]
X=[50, 38, 17, 25, 31, 48] y=[17, 38, 50], yhat=[17, 38, 50]
X=[22, 17, 23, 5, 29, 11] y=[23, 17, 22], yhat=[23, 17, 22]
X=[28, 2, 46, 12, 21, 6] y=[46, 2, 28], yhat=[46, 2, 28]
X=[12, 20, 45, 28, 18, 42] y=[45, 20, 12], yhat=[45, 20, 12]
X=[3, 43, 45, 4, 33, 27] y=[45, 43, 3], yhat=[45, 43, 3]
X=[34, 50, 21, 20, 11, 6] y=[21, 50, 34], yhat=[21, 50, 34]
X=[47, 42, 14, 2, 31, 6] y=[14, 42, 47], yhat=[14, 42, 47]
X=[20, 24, 34, 31, 37, 25] y=[34, 24, 20], yhat=[34, 24, 20]
X=[4, 35, 15, 14, 47, 33] y=[15, 35, 4], yhat=[15, 35, 4]
X=[20, 28, 21, 39, 5, 25] y=[21, 28, 20], yhat=[21, 28, 20]
X=[50, 38, 17, 25, 31, 48] y=[17, 38, 50], yhat=[17, 38, 50]
You now have a template for an encoder-decoder LSTM model that you can apply to your own sequence-to-sequence prediction problems.
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Related Posts
Keras Resources
Summary
In this tutorial, you discovered how to develop an encoder-decoder recurrent neural network for sequence-to-sequence prediction problems with Keras.
Specifically, you learned:
- How to correctly define a sophisticated encoder-decoder model in Keras for sequence-to-sequence prediction.
- How to define a contrived yet scalable sequence-to-sequence prediction problem that you can use to evaluate the encoder-decoder LSTM model.
- How to apply the encoder-decoder LSTM model in Keras to address the scalable integer sequence-to-sequence prediction problem.
Do you have any questions?
Ask your questions in the comments below and I will do my best to answer.
Develop LSTMs for Sequence Prediction Today!
Develop Your Own LSTM models in Minutes
…with just a few lines of python code
Discover how in my new Ebook:
Long Short-Term Memory Networks with Python
It provides self-study tutorials on topics like:
CNN LSTMs, Encoder-Decoder LSTMs, generative models, data preparation, making predictions and much more…
Finally Bring LSTM Recurrent Neural Networks to
Your Sequence Predictions Projects
Skip the Academics. Just Results.
See What’s Inside