We will show you how the Neptune machine learning platform can help you manage, visualize, and compare machine learning experiments. The following example shows how easy it is to integrate Neptune with your existing code.
We will adapt the Linear Regression example from the scikit-learn library to utilize features of Neptune. The example consists of a single Python file using scikit-learn to train and evaluate a simple linear regression model that predicts disease progression of diabetes patients.
Integrating the code with the Neptune Client Library will allow us to run the code multiple times as a single Grid Search Experiment. This kind of experiment provides an effortless way to execute the same code with different parameters and to compare all the results in the Web UI.
Dataset size: 442 examples (422 examples of the training set and 20 examples of the test set).
Dataset description: Ten normalized baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) were obtained for each of 442 diabetes patients, as well as the response of interest, a quantitative measure of disease progression one year after baseline.
Business purpose: Predict disease progression for diabetes patients in the next year.
Data set credits: Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani (2004) “Least Angle Regression,” Annals of Statistics.
Using Neptune's grid search functionality, we can easily check which variable from the dataset is the best predictor (yields the best metric value). Neptune gives us a convenient Web UI where we can compare and share the results of our experiments. Additionally, for every execution Neptune stores the parameter values and creates a snapshot of the source code used, so we are able to recreate every result.
Neptune stores images sent from the job’s code via image channels. We can use image channels to send a custom chart containing the regression line and target values.
To run the code from this example, you need to have the following installed:
We need the base source file to start with. Let’s download
and rename it to
First, we will go through the changes that are required to integrate the code with Neptune.
If you want to download the code that is ready to run, it’s available on GitHub.
First, let’s add the imports of the additional libraries.
    import matplotlib.pyplot as plt
    import numpy as np
    from sklearn import datasets, linear_model

    # The additional libraries.
    import io
    import time

    from deepsense import neptune
    from PIL import Image
In the next step, we need to create a Context object to enable communication from the job to Neptune.
ctx = neptune.Context()
Once we have the Context object, we can configure several channels to send logs, the image of the chart and the metric value.
We use a single metric to measure the quality of our model: the mean squared error (MSE). To send the metric value to Neptune, we need to create a NUMERIC channel. We will call it
    # A channel to send the Mean Squared Error metric value.
    mse_channel = ctx.job.create_channel(
        name='MSE',
        channel_type=neptune.ChannelType.NUMERIC)
To send a chart with the regression line to Neptune, we need an image channel. Let’s name it
    # A channel to send the regression chart.
    regression_chart_channel = ctx.job.create_channel(
        name='Regression chart',
        channel_type=neptune.ChannelType.IMAGE)
We also create a TEXT channel named logs_channel to send logging information about the job's execution.
    # A channel to log information about job's execution.
    logs_channel = ctx.job.create_channel(
        name='logs',
        channel_type=neptune.ChannelType.TEXT)
Our simple regression model uses only one feature. In the original code, it's the feature with index 2, the patient's BMI. Let's introduce a new numeric parameter named feature_index so we can select the feature index when running our experiment. That way we can test the importance of different features without changing the code.
After the line that loads the data set:
    # Load the diabetes dataset
    diabetes = datasets.load_diabetes()
let’s add the following code:
    # Add a tag containing the name of the feature.
    feature_names = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']
    used_feature_name = feature_names[ctx.params.feature_index]
    ctx.job.tags.append('diabetes-feature-' + used_feature_name)
Then let's replace the line that selects the feature:
    # Use only one feature
    diabetes_X = diabetes.data[:, np.newaxis, 2]
with:
    # Use only one feature
    diabetes_X = diabetes.data[:, np.newaxis, ctx.params.feature_index]
We replaced the hardcoded feature index with a job parameter and added a tag containing the name of the feature. The tag will be displayed on the job list, so we can easily identify the feature used in a specific experiment.
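The lookup and the tag text can be tried out without Neptune. The sketch below is a standalone illustration: `build_feature_tag` is a hypothetical helper (not part of the Neptune client), and the bounds check is an added assumption based on the 0 to 9 range stated for feature_index.

```python
# Feature names in the same column order as the list used for the tag above.
FEATURE_NAMES = ['age', 'sex', 'bmi', 'bp', 's1', 's2', 's3', 's4', 's5', 's6']


def build_feature_tag(feature_index):
    """Return the tag text for a feature index (hypothetical helper)."""
    # Reject indexes outside the ten dataset columns. Without this check a
    # negative index would silently wrap around with Python list indexing.
    if not 0 <= feature_index < len(FEATURE_NAMES):
        raise ValueError('feature_index must be between 0 and 9')
    return 'diabetes-feature-' + FEATURE_NAMES[feature_index]


print(build_feature_tag(2))  # diabetes-feature-bmi
```

Validating the index early is a design choice for this sketch; it makes a mistyped parameter fail with a readable message instead of tagging the job with the wrong feature.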
We leave the code training the model unchanged.
    # Split the data into training/testing sets
    diabetes_X_train = diabetes_X[:-20]
    diabetes_X_test = diabetes_X[-20:]

    # Split the targets into training/testing sets
    diabetes_y_train = diabetes.target[:-20]
    diabetes_y_test = diabetes.target[-20:]

    # Create linear regression object
    regr = linear_model.LinearRegression()

    # Train the model using the training sets
    regr.fit(diabetes_X_train, diabetes_y_train)
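For intuition about what the training step computes, here is a minimal pure-Python sketch of ordinary least squares with a single feature, using the closed-form slope and intercept. It is not scikit-learn's implementation; the function name and the toy data are illustrative assumptions, and the hold-out slice mirrors the `[:-20]` / `[-20:]` split on a smaller scale.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (slope, intercept) minimizing squared error for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and co-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on y = 2x + 1 (illustrative, not the diabetes data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]

# Hold out the last two points as a test set.
train_x, test_x = xs[:-2], xs[-2:]
train_y, test_y = ys[:-2], ys[-2:]

slope, intercept = fit_simple_linear_regression(train_x, train_y)
predictions = [slope * x + intercept for x in test_x]
print(slope, intercept, predictions)
```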
Instead of printing information to the console, we want to send it to Neptune through channels. This will enable us to browse the information in the Web UI.
Let’s replace the code printing the model’s coefficients:
    # The coefficients
    print('Coefficients: \n', regr.coef_)
with:
    # The coefficients
    logs_channel.send(x=time.time(), y='Coefficients: ' + str(regr.coef_))
Replace the code printing the mean squared error:
    # The mean square error
    print("Mean squared error: %.2f"
          % np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2))
with:
    # The mean square error
    mse = np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2)
    mse_channel.send(x=time.time(), y=mse)
    logs_channel.send(x=time.time(), y="Mean squared error: %.2f" % mse)
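The value sent through the MSE channel is the average of the squared differences between predictions and targets. As a small worked example, the same quantity can be computed in plain Python; the function name and the sample numbers are illustrative assumptions, not values from the diabetes dataset.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences between predictions and targets."""
    if len(predictions) != len(targets):
        raise ValueError('predictions and targets must have the same length')
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Squared errors are 1, 0 and 4, so the mean is 5 / 3.
example_mse = mean_squared_error([2.0, 4.0, 6.0], [1.0, 4.0, 8.0])
print("Mean squared error: %.2f" % example_mse)  # Mean squared error: 1.67
```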
Replace the code printing the variance score:
    # Explained variance score: 1 is perfect prediction
    print('Variance score: %.2f' % regr.score(diabetes_X_test, diabetes_y_test))
with:
    # Explained variance score: 1 is perfect prediction
    logs_channel.send(
        x=time.time(),
        y='Variance score: %.2f' % regr.score(diabetes_X_test, diabetes_y_test))
The original code plots a chart with a regression line and displays it in a window. Let's modify the code to send the chart to Neptune and make it visible in the Web UI.
The original code:
    # Plot outputs
    plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
    plt.plot(diabetes_X_test, regr.predict(diabetes_X_test), color='blue',
             linewidth=3)

    plt.xticks(())
    plt.yticks(())

    plt.show()
The modified code:
    # Plot outputs
    plt.scatter(diabetes_X_test, diabetes_y_test, color='black')
    plt.plot(diabetes_X_test, regr.predict(diabetes_X_test), color='blue',
             linewidth=3)

    # Convert the chart to an image.
    image_buffer = io.BytesIO()
    plt.savefig(image_buffer, format='png')
    image_buffer.seek(0)

    # Send the chart to Neptune through an image channel.
    regression_chart_channel.send(
        x=time.time(),
        y=neptune.Image(
            name='Regression chart',
            description='A chart containing predictions and target values '
                        'for diabetes progression regression. '
                        'Feature used: ' + used_feature_name,
            data=Image.open(image_buffer)))
Instead of displaying the chart in a window, we saved it as an image in a buffer and sent that image to Neptune via regression_chart_channel. To make the chart more descriptive, we removed the lines hiding the scale for the X and Y axes. The chart will be visible in the Web UI.
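The `image_buffer.seek(0)` call in the modified code matters because writing to an in-memory buffer leaves the position at the end of the data. The standard-library sketch below illustrates this; the placeholder bytes stand in for the PNG data that `plt.savefig` would write, and are not a valid image.

```python
import io

# Placeholder payload standing in for PNG bytes written by plt.savefig.
payload = b'placeholder image bytes'

buffer = io.BytesIO()
buffer.write(payload)

# After writing, the position is at the end, so reading returns nothing.
read_at_end = buffer.read()

# Rewinding lets a reader (such as Image.open) consume the data from the start.
buffer.seek(0)
read_from_start = buffer.read()

print(read_at_end == b'', read_from_start == payload)  # True True
```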
In order to create an experiment, we need to prepare a short configuration file describing it:
    name: Diabetes Progression Prediction
    description: Linear Regression with scikit-learn on diabetes dataset.
    project: Diabetes
    parameters:
      - name: feature_index
        type: int
        description: The index of the feature used to train the linear regression model. It should be between 0 and 9.
        required: true
    metric:
      channel: MSE
      direction: minimize
Our configuration file contains the experiment's name and description, the project it belongs to, the schema of its parameters that will be injected into the job, and the metric used for comparison.
Our experiment has only one parameter, named feature_index, which is responsible for selecting the feature used during training.
We've also explicitly declared the metric and bound it to the MSE channel that our code uses to send the calculated mean squared error of the model's predictions. The direction minimize means that the job with the smaller MSE value is considered the better one. A metric declaration is required in order to use the grid search functionality.
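To make the meaning of the direction concrete, here is a minimal sketch of picking the best job from a set of metric values. It is not Neptune's implementation: `pick_best_job` is a hypothetical helper, the `'maximize'` branch is an assumed counterpart that does not appear in the configuration above, and the job names and metric values are arbitrary placeholders rather than real results.

```python
def pick_best_job(metric_by_job, direction='minimize'):
    """Return the job name whose metric is best for the given direction."""
    if direction == 'minimize':
        return min(metric_by_job, key=metric_by_job.get)
    if direction == 'maximize':  # assumed counterpart, not shown in the source
        return max(metric_by_job, key=metric_by_job.get)
    raise ValueError('unknown direction: ' + repr(direction))


# Arbitrary placeholder metric values, not measured MSE results.
placeholder_metrics = {'job-a': 3.5, 'job-b': 1.25, 'job-c': 2.0}
best = pick_best_job(placeholder_metrics, direction='minimize')
print(best)  # job-b
```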
Once the source and the configuration files are ready, we can create a grid search experiment to compare the results of possible features and check which one is the best predictor (yields the lowest MSE).
We can run the experiment using the neptune run command in the directory containing
$ neptune run plot_ols_neptune.py -- --feature_index "(0, 9, 1)"
As the value of the feature_index parameter we've passed "(0, 9, 1)", a range from 0 to 9 with step 1. Passing multiple values for a parameter enables the grid search functionality.
In our case, Neptune will create ten jobs, each with a different value of
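The count of ten jobs follows from treating the range as inclusive at both ends. The sketch below expands such a range string into the individual values; how Neptune parses the argument internally is not shown in this example, so `expand_range` is a hypothetical helper and the inclusive upper bound is an assumption taken from the ten jobs described above.

```python
import ast


def expand_range(spec):
    """Expand a "(start, stop, step)" string into an inclusive list of ints."""
    # literal_eval safely parses the tuple literal without executing code.
    start, stop, step = ast.literal_eval(spec)
    if step <= 0:
        raise ValueError('step must be positive')
    # range() excludes its end, so add 1 to include the stop value itself.
    return list(range(start, stop + 1, step))


values = expand_range("(0, 9, 1)")
print(values)       # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(values))  # 10
```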
When the command is run, your browser should automatically open the experiment's dashboard in the Web UI, where you can track the overall progress of the experiment.
Neptune automatically picks the job with the lowest MSE and marks it as the best job. Select it to enter its dashboard.
To view the regression line of the selected job, let's navigate to the Channels tab.
Next, click on the Regression chart tile. You will see the chart’s thumbnail with a description.
To view the chart in full resolution, let’s click on the top right corner of the thumbnail.
To view the logs sent from the job to Neptune during the execution, we need to open the Channels tab and click on the logs tile.
We have used Neptune to manage a simple machine learning experiment. Neptune allowed us to run a grid search experiment and track its progress. Furthermore, the best feature was selected for us. We have also visualized the predictions by plotting them against target values, using Neptune’s image channels.
That is only a part of Neptune’s features. To see an advanced example and explore more features, see the Handwritten Digits example.