Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras

Last Updated on September 13, 2019

A popular demonstration of the capability of deep learning techniques is object recognition in image data.

The “hello world” of object recognition for machine learning and deep learning is the MNIST dataset for handwritten digit recognition.

In this post you will discover how to develop a deep learning model to achieve near state of the art performance on the MNIST handwritten digit recognition task in Python using the Keras deep learning library.

After completing this tutorial, you will know:

  • How to load the MNIST dataset in Keras.
  • How to develop and evaluate a baseline neural network model for the MNIST problem.
  • How to implement and evaluate a simple Convolutional Neural Network for MNIST.
  • How to implement a close to state-of-the-art deep learning model for MNIST.

Discover how to develop deep learning models for a range of predictive modeling problems with just a few lines of code in my new book, with 18 step-by-step tutorials and 9 projects.

Let’s get started.

  • Update Oct/2016: Updated for Keras 1.1.0, TensorFlow 0.10.0 and scikit-learn v0.18.
  • Update Mar/2017: Updated for Keras 2.0.2, TensorFlow 1.0.1 and Theano 0.9.0.
  • Update Sep/2019: Updated for Keras 2.2.5 API.

Note, for an extended version of this tutorial see:

Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras

Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras
Photo by Jamie, some rights reserved.

Description of the MNIST Handwritten Digit Recognition Problem

The MNIST dataset was developed by Yann LeCun, Corinna Cortes and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem.

The dataset was constructed from a number of scanned document datasets available from the National Institute of Standards and Technology (NIST). This is where the name for the dataset comes from: the Modified NIST, or MNIST, dataset.

Images of digits were taken from a variety of scanned documents, normalized in size and centered. This makes it an excellent dataset for evaluating models, allowing the developer to focus on the machine learning with very little data cleaning or preparation required.

Each image is a 28 by 28 pixel square (784 pixels total). A standard split of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images are used to test it.

It is a digit recognition task, so there are 10 digits (0 to 9), or 10 classes, to predict. Results are reported using the prediction error, which is nothing more than the classification accuracy inverted (i.e., 100% minus the classification accuracy).

Excellent results achieve a prediction error of less than 1%. State-of-the-art prediction error of approximately 0.2% can be achieved with large Convolutional Neural Networks. There is a listing of the state-of-the-art results and links to the relevant papers on the MNIST and other datasets on Rodrigo Benenson’s webpage.

Need help with Deep Learning in Python?

Take my free 2-week email course and discover MLPs, CNNs and LSTMs (with code).

Click to sign-up now and also get a free PDF Ebook version of the course.

Start Your FREE Mini-Course Now!

Loading the MNIST dataset in Keras

The Keras deep learning library provides a convenience method for loading the MNIST dataset.

The dataset is downloaded automatically the first time this function is called and is cached in your home directory under ~/.keras/datasets/ as a roughly 15MB file (named mnist.npz in recent versions of Keras).

This is very handy for developing and testing deep learning models.

To demonstrate how easy it is to load the MNIST dataset, we will first write a little script to download and visualize the first 4 images in the training dataset.

# Plot ad hoc mnist instances
from keras.datasets import mnist
import matplotlib.pyplot as plt
# load (downloaded if needed) the MNIST dataset
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# plot 4 images as gray scale
plt.subplot(221)
plt.imshow(X_train[0], cmap=plt.get_cmap('gray'))
plt.subplot(222)
plt.imshow(X_train[1], cmap=plt.get_cmap('gray'))
plt.subplot(223)
plt.imshow(X_train[2], cmap=plt.get_cmap('gray'))
plt.subplot(224)
plt.imshow(X_train[3], cmap=plt.get_cmap('gray'))
# show the plot
plt.show()

You can see that downloading and loading the MNIST dataset is as easy as calling the mnist.load_data() function. Running the above example, you should see the image below.

Examples from the MNIST dataset
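
As a quick sanity check (an optional aside, not part of the original script), you can also print the shapes of the arrays returned by mnist.load_data():

# Optional: inspect the arrays returned by mnist.load_data()
print(X_train.shape, y_train.shape)  # expected: (60000, 28, 28) (60000,)
print(X_test.shape, y_test.shape)    # expected: (10000, 28, 28) (10000,)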

Baseline Model with Multi-Layer Perceptrons

Do we really need a complex model like a convolutional neural network to get the best results with MNIST?

You can get very good results using a very simple neural network model with a single hidden layer. In this section we will create a simple multi-layer perceptron model that achieves an error rate of about 1.7%. We will use this as a baseline for comparing more complex convolutional neural network models.

Let’s start off by importing the classes and functions we will need.

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils

Now we can load the MNIST dataset using the Keras helper function.


# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()

The training dataset is structured as a 3-dimensional array of instance, image width and image height. For a multi-layer perceptron model we must reduce the images down into a vector of pixels. In this case the 28×28 sized images will be 784 pixel input values.

We can do this transform easily using the reshape() function on the NumPy array. We can also reduce our memory requirements by forcing the precision of the pixel values to be 32 bit, the default precision used by Keras anyway.


# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')

The pixel values are gray scale values between 0 and 255. It is almost always a good idea to perform some scaling of input values when using neural network models. Because the scale is well known and well behaved, we can very quickly normalize the pixel values to the range 0 to 1 by dividing each value by the maximum of 255.


# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

Finally, the output variable is an integer from 0 to 9. This is a multi-class classification problem. As such, it is good practice to use a one hot encoding of the class values, transforming the vector of class integers into a binary matrix.

We can easily do this using the built-in np_utils.to_categorical() helper function in Keras.


# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
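
As a quick check (an optional aside, not part of the original listing), each integer label now becomes a row of 10 values with a 1 in the position of the digit it represents:

# Optional: verify the one hot encoding
print(y_train.shape)  # expected: (60000, 10)
print(y_train[0])     # e.g. [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.] because the first training digit is a 5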

We are now ready to create our simple neural network model. We will define our model in a function. This is handy if you want to extend the example later and try to get a better score.


# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

The model is a simple neural network with one hidden layer with the same number of neurons as there are inputs (784). A rectifier activation function is used for the neurons in the hidden layer.

A softmax activation function is used on the output layer to turn the outputs into probability-like values and allow one class of the 10 to be selected as the model’s output prediction. Logarithmic loss is used as the loss function (called categorical_crossentropy in Keras) and the efficient ADAM gradient descent algorithm is used to learn the weights.
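
If you want to double-check the architecture (an optional aside, not part of the original listing), Keras can print a per-layer summary of output shapes and parameter counts:

# Optional: summarize the layer shapes and parameter counts of the baseline model
model = baseline_model()
model.summary()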

We can now fit and evaluate the model. The model is fit over 10 epochs with updates every 200 images. The test data is used as the validation dataset, allowing you to see the skill of the model as it trains. A verbose value of 2 is used to reduce the output to one line for each training epoch.

Finally, the test dataset is used to evaluate the model and a classification error rate is printed.


# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Tying this all together, the complete code listing is provided below.

# Baseline MLP for MNIST dataset
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import np_utils
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape((X_train.shape[0], num_pixels)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], num_pixels)).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, kernel_initializer='normal', activation='relu'))
    model.add(Dense(num_classes, kernel_initializer='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Running the example might take a few minutes on a CPU.

Note: Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

You should see the output below. This very simple network defined in very few lines of code achieves a respectable error rate of 1.71%.

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
– 2s – loss: 0.2754 – acc: 0.9231 – val_loss: 0.1339 – val_acc: 0.9600
Epoch 2/10
– 2s – loss: 0.1089 – acc: 0.9684 – val_loss: 0.0935 – val_acc: 0.9717
Epoch 3/10
– 2s – loss: 0.0710 – acc: 0.9794 – val_loss: 0.0866 – val_acc: 0.9743
Epoch 4/10
– 2s – loss: 0.0496 – acc: 0.9854 – val_loss: 0.0732 – val_acc: 0.9766
Epoch 5/10
– 2s – loss: 0.0358 – acc: 0.9900 – val_loss: 0.0634 – val_acc: 0.9798
Epoch 6/10
– 2s – loss: 0.0257 – acc: 0.9933 – val_loss: 0.0597 – val_acc: 0.9826
Epoch 7/10
– 2s – loss: 0.0192 – acc: 0.9955 – val_loss: 0.0626 – val_acc: 0.9802
Epoch 8/10
– 2s – loss: 0.0142 – acc: 0.9969 – val_loss: 0.0612 – val_acc: 0.9817
Epoch 9/10
– 2s – loss: 0.0107 – acc: 0.9981 – val_loss: 0.0573 – val_acc: 0.9825
Epoch 10/10
– 2s – loss: 0.0081 – acc: 0.9985 – val_loss: 0.0558 – val_acc: 0.9829
Baseline Error: 1.71%


Simple Convolutional Neural Network for MNIST

Now that we have seen how to load the MNIST dataset and train a simple multi-layer perceptron model on it, it is time to develop a more sophisticated convolutional neural network or CNN model.

Keras does provide a lot of capability for creating convolutional neural networks.

In this section we will create a simple CNN for MNIST that demonstrates how to use all of the aspects of a modern CNN implementation, including Convolutional layers, Pooling layers and Dropout layers.

The first step is to import the classes and functions needed.

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils

Next we need to load the MNIST dataset and reshape it so that it is suitable for training a CNN. In Keras, the layers used for two-dimensional convolutions expect pixel values with the dimensions [samples][width][height][channels].

Note, we are using so-called channels-last ordering for consistency in this example.

In the case of RGB, the channels dimension would be 3 for the red, green and blue components, and it would be like having 3 image inputs for every color image. In the case of MNIST, where the pixel values are gray scale, the channels dimension is set to 1.
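
If you want to confirm which ordering your backend expects (an optional aside, not part of the original listing), you can query the Keras backend directly:

# Optional: check the image data format used by the Keras backend
from keras import backend as K
print(K.image_data_format())  # 'channels_last' means [samples][width][height][channels]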


# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape(X_train.shape[0], 28, 28, 1).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 28, 28, 1).astype('float32')

As before, it is a good idea to normalize the pixel values to the range 0 to 1 and one hot encode the output variable.


# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

Next we define our neural network model.

Convolutional neural networks are more complex than standard multi-layer perceptrons, so we will start with a simple structure that uses all of the elements required for state of the art results. Below summarizes the network architecture.

  1. The first hidden layer is a convolutional layer called Conv2D. The layer has 32 feature maps, each with a size of 5×5, and a rectifier activation function. This is the input layer, expecting images with the structure outlined above: [width][height][channels].
  2. Next we define a pooling layer that takes the max, called MaxPooling2D. It is configured with a pool size of 2×2.
  3. The next layer is a regularization layer using dropout, called Dropout. It is configured to randomly exclude 20% of the neurons in the layer in order to reduce overfitting.
  4. Next is a layer that converts the 2D matrix data to a vector, called Flatten. It allows the output to be processed by standard fully connected layers.
  5. Next is a fully connected layer with 128 neurons and a rectifier activation function.
  6. Finally, the output layer has 10 neurons for the 10 classes and a softmax activation function to output probability-like predictions for each class.

As before, the model is trained using logarithmic loss and the ADAM gradient descent algorithm.


def baseline_model():
    # create model
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

We evaluate the model the same way as before with the multi-layer perceptron. The CNN is fit over 10 epochs with a batch size of 200.


# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

Tying this all together, the complete example is listed below.

# Simple CNN for the MNIST Dataset
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1)).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
# define a simple CNN model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Conv2D(32, (5, 5), input_shape=(28, 28, 1), activation='relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = baseline_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

Running the example, the accuracy on the training and validation datasets is printed each epoch and, at the end, the final classification error rate is printed.

Note: Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

Epochs may take about 45 seconds to run on the GPU (e.g. on AWS). You can see that the network achieves an error rate of 0.95%, which is better than our simple multi-layer perceptron model above.

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] – 7s 120us/step – loss: 0.2495 – acc: 0.9275 – val_loss: 0.0854 – val_acc: 0.9737
Epoch 2/10
60000/60000 [==============================] – 7s 117us/step – loss: 0.0763 – acc: 0.9772 – val_loss: 0.0513 – val_acc: 0.9840
Epoch 3/10
60000/60000 [==============================] – 7s 119us/step – loss: 0.0548 – acc: 0.9836 – val_loss: 0.0431 – val_acc: 0.9857
Epoch 4/10
60000/60000 [==============================] – 7s 116us/step – loss: 0.0424 – acc: 0.9869 – val_loss: 0.0347 – val_acc: 0.9887
Epoch 5/10
60000/60000 [==============================] – 7s 116us/step – loss: 0.0350 – acc: 0.9894 – val_loss: 0.0411 – val_acc: 0.9873
Epoch 6/10
60000/60000 [==============================] – 7s 117us/step – loss: 0.0274 – acc: 0.9911 – val_loss: 0.0370 – val_acc: 0.9869
Epoch 7/10
60000/60000 [==============================] – 7s 117us/step – loss: 0.0238 – acc: 0.9919 – val_loss: 0.0339 – val_acc: 0.9888
Epoch 8/10
60000/60000 [==============================] – 7s 117us/step – loss: 0.0209 – acc: 0.9931 – val_loss: 0.0355 – val_acc: 0.9895
Epoch 9/10
60000/60000 [==============================] – 7s 117us/step – loss: 0.0179 – acc: 0.9944 – val_loss: 0.0300 – val_acc: 0.9906
Epoch 10/10
60000/60000 [==============================] – 7s 119us/step – loss: 0.0137 – acc: 0.9955 – val_loss: 0.0297 – val_acc: 0.9905
CNN Error: 0.95%
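
Once fit, the model can also be used to make predictions on individual images (an optional aside, not part of the original tutorial); for example, predicting the class of the first test image:

# Optional: predict the digit for the first test image using the fitted model
import numpy as np
probabilities = model.predict(X_test[0:1])  # shape (1, 10), one probability-like value per class
print(np.argmax(probabilities))             # index of the most likely digit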


Larger Convolutional Neural Network for MNIST

Now that we have seen how to create a simple CNN, let’s take a look at a model capable of close to state of the art results.

We import the classes and functions, then load and prepare the data the same as in the previous CNN example.

# Larger CNN for the MNIST Dataset
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1)).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

This time we define a larger CNN architecture with additional convolutional, max pooling and fully connected layers. The network topology can be summarized as follows.

  1. Convolutional layer with 30 feature maps of size 5×5.
  2. Pooling layer taking the max over 2×2 patches.
  3. Convolutional layer with 15 feature maps of size 3×3.
  4. Pooling layer taking the max over 2×2 patches.
  5. Dropout layer with a probability of 20%.
  6. Flatten layer.
  7. Fully connected layer with 128 neurons and rectifier activation.
  8. Fully connected layer with 50 neurons and rectifier activation.
  9. Output layer with 10 neurons and softmax activation.


# define the larger model
def larger_model():
    # create model
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(28, 28, 1), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Like the previous two experiments, the model is fit over 10 epochs with a batch size of 200.


# build the model
model = larger_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Large CNN Error: %.2f%%" % (100-scores[1]*100))

Tying this all together, the complete example is listed below.

# Larger CNN for the MNIST Dataset
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][width][height][channels]
X_train = X_train.reshape((X_train.shape[0], 28, 28, 1)).astype('float32')
X_test = X_test.reshape((X_test.shape[0], 28, 28, 1)).astype('float32')
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
# define the larger model
def larger_model():
    # create model
    model = Sequential()
    model.add(Conv2D(30, (5, 5), input_shape=(28, 28, 1), activation='relu'))
    model.add(MaxPooling2D())
    model.add(Conv2D(15, (3, 3), activation='relu'))
    model.add(MaxPooling2D())
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model
# build the model
model = larger_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=10, batch_size=200)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Large CNN Error: %.2f%%" % (100-scores[1]*100))

Running the example prints accuracy on the training and validation datasets each epoch and a final classification error rate.

Note: Your specific results may vary given the stochastic nature of the learning algorithm. Consider running the example a few times.

The model takes about 100 seconds to run per epoch, depending on your hardware. This slightly larger model achieves a respectable classification error rate of 0.83%.

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
60000/60000 [==============================] – 9s 157us/step – loss: 0.3871 – acc: 0.8776 – val_loss: 0.0884 – val_acc: 0.9715
Epoch 2/10
60000/60000 [==============================] – 9s 154us/step – loss: 0.1028 – acc: 0.9681 – val_loss: 0.0571 – val_acc: 0.9829
Epoch 3/10
60000/60000 [==============================] – 9s 153us/step – loss: 0.0740 – acc: 0.9781 – val_loss: 0.0468 – val_acc: 0.9851
Epoch 4/10
60000/60000 [==============================] – 9s 154us/step – loss: 0.0624 – acc: 0.9804 – val_loss: 0.0339 – val_acc: 0.9886
Epoch 5/10
60000/60000 [==============================] – 10s 161us/step – loss: 0.0496 – acc: 0.9845 – val_loss: 0.0383 – val_acc: 0.9878
Epoch 6/10
60000/60000 [==============================] – 9s 153us/step – loss: 0.0466 – acc: 0.9849 – val_loss: 0.0284 – val_acc: 0.9906
Epoch 7/10
60000/60000 [==============================] – 8s 141us/step – loss: 0.0375 – acc: 0.9884 – val_loss: 0.0282 – val_acc: 0.9909
Epoch 8/10
60000/60000 [==============================] – 8s 141us/step – loss: 0.0348 – acc: 0.9887 – val_loss: 0.0276 – val_acc: 0.9903
Epoch 9/10
60000/60000 [==============================] – 9s 144us/step – loss: 0.0317 – acc: 0.9900 – val_loss: 0.0254 – val_acc: 0.9922
Epoch 10/10
60000/60000 [==============================] – 8s 139us/step – loss: 0.0295 – acc: 0.9907 – val_loss: 0.0265 – val_acc: 0.9917
Large CNN Error: 0.83%


This is not an optimized network topology, nor is it a reproduction of a network topology from a recent paper. There is a lot of opportunity for you to tune and improve upon this model.

What is the best error rate score you can achieve?

Post your configuration and best score in the comments.
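
As one possible starting point (a minimal sketch, not part of the original tutorial, assuming the data and larger_model() are prepared as in the listing above), you could augment the training images with small random shifts and rotations using Keras' ImageDataGenerator:

# Sketch: train the larger CNN on augmented images (small random shifts and rotations)
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=10, width_shift_range=0.1, height_shift_range=0.1)
model = larger_model()
model.fit_generator(datagen.flow(X_train, y_train, batch_size=200),
    steps_per_epoch=len(X_train) // 200,
    epochs=10,
    validation_data=(X_test, y_test))
scores = model.evaluate(X_test, y_test, verbose=0)
print("Augmented CNN Error: %.2f%%" % (100 - scores[1] * 100))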

Resources on MNIST

The MNIST dataset is very well studied, and there are many additional resources on it that you might like to look into.

Summary

In this post you discovered the MNIST handwritten digit recognition problem and deep learning models developed in Python using the Keras library that are capable of achieving excellent results.

Working through this tutorial you learned:

  • How to load the MNIST dataset in Keras and generate plots of the dataset.
  • How to reshape the MNIST dataset and develop a simple but well performing multi-layer perceptron model on the problem.
  • How to use Keras to create convolutional neural network models for MNIST.
  • How to develop and evaluate larger CNN models for MNIST capable of near world class results.

Do you have any questions about handwriting recognition with deep learning or this post? Ask your question in the comments and I will do my best to answer.

Develop Deep Learning Projects with Python!

What If You Could Develop A Network in Minutes

…with just a few lines of Python

Discover how in my new Ebook:
Deep Learning With Python

It covers end-to-end projects on topics like:
Multilayer Perceptrons, Convolutional Nets and Recurrent Neural Nets, and more…

Finally Bring Deep Learning To Your Own Projects

Skip the Academics. Just Results.

See What’s Inside
