Handwritten Digits Recognition


We’ll show you how the Neptune machine learning platform can improve your machine learning process.

We will adapt the code from the deep learning Keras library to utilize features of Neptune. The example consists of a single Python file using Keras to train and evaluate a convolutional neural network that recognizes handwritten digits.

Integrating the code with the Neptune Client Library will allow us to run it as a Neptune job. We will discuss the improvements Neptune can introduce into our work and implement them in our job.

Dataset Information

Dataset: The MNIST database of handwritten digits.

Dataset size: 70,000 examples (60,000 examples in the training set and 10,000 examples in the test set).

Dataset description: Images of handwritten digits with labels from 0 to 9.

Business purpose: Automatic recognition of handwritten digits from digital photos.

Dataset credits: Yann LeCun, Courant Institute, NYU; Corinna Cortes, Google Labs, New York; Christopher J.C. Burges, Microsoft Research, Redmond.

Improvements Provided by Neptune

The code from the mnist_cnn example runs for over an hour when it’s executed on a machine with Intel® Core™ i7-5500U CPU and 16 GB of RAM. Neptune gives us the possibility to monitor the model’s performance during training. If the model doesn’t perform well enough, we can stop the job’s execution to try to improve the experiment or change the approach.

Easy parametrization of the code is another helpful feature. We could run multiple instances of the same experiment with different parameters to compare their performance.

Our job trains a model that classifies images of handwritten digits. In the validation phase, we could identify the digits that were classified incorrectly. We can then view them to see where our neural network needs enhancements.

Neptune helps us with all of these issues.

Let’s Start Editing the Code!

To run the code from this example, you need to have Neptune, Keras (with its backend) and the Pillow (PIL) imaging library installed.

We need a base source file to start with. Let’s create a file called mnist_cnn_neptune.py. We will go through the original code, emphasizing the places that need to be modified to integrate with Neptune.

If you want to download the code that is ready to run, it’s available on GitHub.


At first, we add imports of the required libraries:

from __future__ import print_function
import numpy as np
np.random.seed(1337)  # for reproducibility

from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.utils import np_utils
from keras import backend as K

Besides the original imports we need:

from keras.callbacks import Callback
from deepsense import neptune
from PIL import Image
import time

Job Configuration

In the next step, we need to create a Context to enable communication from the job to Neptune. Once we have the Context object, we can configure several channels for the model’s metrics:

ctx = neptune.Context()

# Channels to send metrics' values (loss and accuracy) at the end of every batch.
batch_train_loss_channel = ctx.job.create_channel(
    name='batch_train_loss',
    channel_type=neptune.ChannelType.NUMERIC)
batch_train_acc_channel = ctx.job.create_channel(
    name='batch_train_acc',
    channel_type=neptune.ChannelType.NUMERIC)

# Channels to send metrics' values (loss and accuracy) at the end of every epoch.
epoch_train_loss_channel = ctx.job.create_channel(
    name='epoch_train_loss',
    channel_type=neptune.ChannelType.NUMERIC)
epoch_train_acc_channel = ctx.job.create_channel(
    name='epoch_train_acc',
    channel_type=neptune.ChannelType.NUMERIC)

epoch_validation_loss_channel = ctx.job.create_channel(
    name='epoch_validation_loss',
    channel_type=neptune.ChannelType.NUMERIC)
epoch_validation_acc_channel = ctx.job.create_channel(
    name='epoch_validation_acc',
    channel_type=neptune.ChannelType.NUMERIC)

# A channel to send info about the progress of the job.
logging_channel = ctx.job.create_channel(
    name='logging_channel',
    channel_type=neptune.ChannelType.TEXT)

# A channel to send images of digits that were not recognized correctly.
false_predictions_channel = ctx.job.create_channel(
    name='false_predictions',
    channel_type=neptune.ChannelType.IMAGE)

We use two kinds of metrics measuring the quality of the model: the value of the loss function and accuracy. At the end of every batch, we want to send the values of the training metrics. At the end of every epoch, we want to send the values of training and evaluation metrics. For each metric, we need to create a NUMERIC channel.

An additional TEXT channel will be used to log events such as the end of a batch, the end of an epoch or the end of the validation phase.

An IMAGE channel will be used to send images of digits that were not recognized correctly. The images will be sent at the end of every epoch, and we will be able to browse them on our job’s dashboard.

In the next step, we need to declare charts that will be displayed on our job’s dashboard. We want to create two charts with a single series to display the values of batch metrics, and another two charts with two series to display the values of epoch metrics. The charts displaying values of epoch metrics will allow us to compare the values of training and validation metrics for consecutive epochs.

# Charts displaying training metrics' values updated at the end of every batch.
ctx.job.create_chart(
    name='Batch training loss',
    series={
        'training loss': batch_train_loss_channel
    })

ctx.job.create_chart(
    name='Batch training accuracy',
    series={
        'training': batch_train_acc_channel
    })

# Charts displaying training and validation metrics updated at the end of every epoch.
ctx.job.create_chart(
    name='Epoch training and validation loss',
    series={
        'training': epoch_train_loss_channel,
        'validation': epoch_validation_loss_channel
    })

ctx.job.create_chart(
    name='Epoch training and validation accuracy',
    series={
        'training': epoch_train_acc_channel,
        'validation': epoch_validation_acc_channel
    })

Utility Functions

We need to declare a few functions for simple tasks, such as formatting timestamps or preparing an image of an incorrectly recognized digit.

# Format the timestamp in a human-readable format.
def format_timestamp(timestamp):
    return time.strftime('%H:%M:%S', time.localtime(timestamp))

# Prepare an image of an incorrectly recognized digit to be sent to Neptune.
def false_prediction_neptune_image(raw_image, index, epoch_number, prediction, actual):
    false_prediction_image = Image.fromarray(raw_image)
    image_name = '(epoch {}) #{}'.format(epoch_number, index)
    image_description = 'Predicted: {}, actual: {}.'.format(prediction, actual)
    return neptune.Image(
        name=image_name,
        description=image_description,
        data=false_prediction_image)
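As a standalone sanity check (with a dummy 28×28 array standing in for one raw MNIST test image), the PIL conversion at the heart of this helper behaves as follows:

```python
import numpy as np
from PIL import Image

# A dummy 28x28 grayscale array in the same uint8 format as the raw MNIST images.
raw_image = np.zeros((28, 28), dtype=np.uint8)
raw_image[4:24, 13:16] = 255  # a crude vertical stroke, vaguely a "1"

img = Image.fromarray(raw_image)
print(img.size, img.mode)  # (28, 28) L
```

A 2-D uint8 array maps to an 8-bit grayscale ('L' mode) PIL image, which is exactly what we want to send through the IMAGE channel.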

Custom Keras Callbacks

To update the model’s quality metrics at the end of batches and epochs, we need to write our own callbacks that subclass keras.callbacks.Callback. For more information about callbacks, read Keras documentation.

class BatchEndCallback(Callback):
    def __init__(self):
        self.batch_id = 0

    def on_batch_end(self, batch, logs={}):
        self.batch_id += 1

        # Send training metrics.
        batch_train_loss_channel.send(x=self.batch_id, y=float(logs.get('loss')))
        batch_train_acc_channel.send(x=self.batch_id, y=float(logs.get('acc')))

        # Log the end of the batch.
        timestamp = time.time()
        batch_end_message = '{} Batch {} finished, batch size = {}.'.format(
            format_timestamp(timestamp), self.batch_id, logs.get('size'))

        logging_channel.send(x=timestamp, y=batch_end_message)

class EpochEndCallback(Callback):
    def __init__(self):
        self.epoch_id = 0
        self.false_predictions = 0

    def on_epoch_end(self, epoch, logs={}):
        self.epoch_id += 1

        # Send training and validation metrics.
        epoch_train_loss_channel.send(x=self.epoch_id, y=float(logs.get('loss')))
        epoch_train_acc_channel.send(x=self.epoch_id, y=float(logs.get('acc')))

        epoch_validation_loss_channel.send(x=self.epoch_id, y=float(logs.get('val_loss')))
        epoch_validation_acc_channel.send(x=self.epoch_id, y=float(logs.get('val_acc')))

        # Predict the digits for images of the test set.
        validation_predictions = model.predict_classes(X_test)

        # Identify the incorrectly classified images and send them to Neptune Dashboard.
        for index, (prediction, actual) in enumerate(zip(validation_predictions, y_test)):
            if prediction != actual:
                self.false_predictions += 1
                false_prediction_image = false_prediction_neptune_image(
                    raw_X_test[index], index, self.epoch_id, prediction, actual)
                false_predictions_channel.send(x=self.false_predictions, y=false_prediction_image)

        # Log the end of the epoch.
        timestamp = time.time()
        epoch_end_message = '{} Epoch {}/{} finished.'.format(
            format_timestamp(timestamp), self.epoch_id, nb_epoch)

        logging_channel.send(x=timestamp, y=epoch_end_message)

The first callback, named BatchEndCallback, sends the values of training metrics to the corresponding channels at the end of every batch. In addition, it sends logging information with a timestamp to logging_channel.

The second callback, named EpochEndCallback, performs multiple tasks at the end of every epoch. At first, it sends the values of training and evaluation metrics to Neptune. In the next step, it uses the partially trained model to predict the digits for images of the test set. The incorrectly classified images are identified and sent to the false_predictions_channel. Finally, our callback sends logging information with a timestamp to logging_channel.
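The misclassification scan inside on_epoch_end boils down to comparing two label vectors element by element; on dummy stand-in data (playing the roles of validation_predictions and y_test) it works like this:

```python
import numpy as np

# Dummy stand-ins for the model's predicted classes and the true test labels.
validation_predictions = np.array([7, 2, 1, 0, 4])
y_test = np.array([7, 2, 1, 6, 4])

false_indices = [index for index, (prediction, actual)
                 in enumerate(zip(validation_predictions, y_test))
                 if prediction != actual]
print(false_indices)  # [3]
```

Each index in the result identifies a test image whose picture gets sent to the false_predictions channel.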

Parametrizing the Job

We want to easily run multiple instances of our experiment with different training parameters. Then, based on the models’ performance, we can find the best parameters for our model. To parametrize our job, we can replace some of the values hardcoded in the original source file with job parameters. Here, there is only one value we want to parametrize: kernel_size.

Let’s replace the convolution kernel size with a job’s parameter named kernel_size:

batch_size = 128
nb_classes = 10
nb_epoch = 12

# input image dimensions
img_rows, img_cols = 28, 28
# number of convolutional filters to use
nb_filters = 32
# size of pooling area for max pooling
pool_size = (2, 2)
# convolution kernel size
kernel_size = (ctx.params.kernel_size, ctx.params.kernel_size)
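Outside of Neptune, the same pattern can be sketched with argparse (purely a stand-in for ctx.params, to show how the command-line flag ends up in kernel_size):

```python
import argparse

# argparse stand-in for ctx.params, illustrating how the flag passed via
# `neptune run mnist_cnn_neptune.py -- --kernel_size 3` reaches the code.
parser = argparse.ArgumentParser()
parser.add_argument('--kernel_size', type=int, required=True)
params = parser.parse_args(['--kernel_size', '3'])

kernel_size = (params.kernel_size, params.kernel_size)
print(kernel_size)  # (3, 3)
```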

Storing Examples from the Test Set in a Variable

We need to store the original, unprocessed value of X_test. It will be needed later in the code to extract the incorrectly classified images. Let’s do this just after loading the data.

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# let's store unprocessed Xs for the later use:
raw_X_test = X_test
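This works because the later preprocessing rebinds X_test to new arrays (reshape and astype both return fresh array objects), so raw_X_test keeps pointing at the original uint8 data. A minimal standalone illustration:

```python
import numpy as np

X_test = np.arange(4, dtype=np.uint8)    # stands in for the raw MNIST images
raw_X_test = X_test                      # a plain name binding, no copy

X_test = X_test.astype('float32') / 255  # rebinds X_test to a new float array
print(raw_X_test.dtype, X_test.dtype)    # uint8 float32
```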

Fit the Model

The input data needs some preparation before the model can be fit. We leave the code unchanged.

if K.image_dim_ordering() == 'th':
    X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
    X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
    X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

model = Sequential()

model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

Registering Callbacks

Now we are ready to call the fit() method on the model. In contrast to the original code, we will register callbacks we have previously declared. They will run at the end of batches and epochs.

model.fit(X_train, Y_train, batch_size=batch_size, nb_epoch=nb_epoch,
          verbose=1, validation_data=(X_test, Y_test),
          callbacks=[BatchEndCallback(), EpochEndCallback()])

Model Validation

Once training of our model is finished, we can calculate the overall accuracy and send it to logging_channel. Again, we need to modify the original code that was responsible for evaluation.

accuracy = model.evaluate(X_test, Y_test, verbose=0)[1]

timestamp = time.time()
logging_channel.send(
    x=timestamp,
    y='{} Evaluation finished. Overall accuracy = {}.'.format(
        format_timestamp(timestamp), accuracy))

Let’s Run the Job!

Once the source file is ready, we can run our job. In order to run our code as a Neptune job, we need to prepare a short configuration file describing the job:


name: Handwritten Digits Recognition
description: A convolutional neural network recognizing handwritten digits.
project: MNIST
parameters:
  - name: kernel_size
    description: The convolution kernel size.
    type: int
    required: true

Our configuration file contains the job’s name and description, the project the job belongs to, and a description of the job’s parameters. Our job has only one parameter, named kernel_size, which is used in model training.

We can run the job using the neptune run command in the directory containing the mnist_cnn_neptune.py and neptune.yaml files. Let’s set kernel_size to 3 - it is the value that was previously hardcoded in the source file.

$ neptune run mnist_cnn_neptune.py -- --kernel_size 3

The job is now running. It will take over one hour to complete (depending on your computer’s capabilities). You can configure Keras to run on GPU to make it run faster.

The command’s output will contain a link to the job’s dashboard. There we can observe logs sent to logging_channel and browse charts with the metrics’ values sent at the end of every batch and epoch.

> Job enqueued, id:
> To browse the job, follow:
> https://[your Neptune IP address]/#dashboard/job/a38afb52-40ca-4b4f-85b5-60ab78ca2d31

Running the Job with Different Parameters

The overall accuracy of the model trained with kernel_size = 3 is equal to 0.9875. It’s not a bad result, but we could get better performance with parameter tuning. Let’s run our job with kernel_size = 5:

$ neptune run mnist_cnn_neptune.py -- --kernel_size 5

With kernel_size = 5, the model’s accuracy will increase to 0.9915. The job may run about half an hour longer when executed on CPU.
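To put the improvement in perspective, a quick back-of-the-envelope calculation shows that the accuracy gain corresponds to roughly a one-third reduction of the test error rate:

```python
acc_k3 = 0.9875  # overall accuracy with kernel_size = 3
acc_k5 = 0.9915  # overall accuracy with kernel_size = 5

err_k3 = 1 - acc_k3  # ~125 misclassified images out of the 10000 in the test set
err_k5 = 1 - acc_k5  # ~85 misclassified images
reduction = (err_k3 - err_k5) / err_k3
print('{:.0%} relative error reduction'.format(reduction))  # prints '32% relative error reduction'
```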

Browse the Job’s Dashboard

Viewing Real-Time Charts and Logs

When we follow the link displayed in neptune run’s output, the job’s dashboard will open.

Job's dashboard

We can see four real-time charts with metrics of the model being trained. The two charts on the left are updated after every batch; the two on the right are updated after every epoch.

Let’s open the Channels tab. We can see 8 tiles, each one corresponding to a single channel. The tiles contain channels’ names and their last values.

View of job's channels

To browse values sent to the channel, we need to click on the corresponding tile. Let’s click on the logging_channel tile to browse the logs.

View of logging_channel's values

Viewing the Incorrectly Classified Digits

After the end of every epoch, the incorrectly classified images are sent to the false_predictions channel. Once the first epoch of our job is finished, we are able to browse these images in the dashboard.

Let’s open the Channels tab and click on the false_predictions tile.

View of false_predictions' values

We will see the images of digits that were incorrectly classified by our model in the last completed epoch. Below each image there is a corresponding epoch number and an index of the image in the test set.

To see the false predictions and the actual digits written, let’s switch to the detailed view.

Detailed view of false_predictions' values


We have successfully connected a piece of Python machine learning code with Neptune and learned how to use Neptune’s features to enhance our working process and make it more convenient.

This example only briefly explores the features offered by the Neptune machine learning platform. To learn more about Neptune’s capabilities, read the documentation on jobs, the CLI and the architecture.

Further Modifications

The performance of the trained machine learning model may still be enhanced by tuning the other parameters. You can replace the remaining hardcoded training parameters with job parameters, run a couple of jobs with different parameter values and observe how they affect the overall accuracy.

You can also execute the code on GPU to make it run faster.

You can find the complete source code of the example on GitHub.