Remote Job Execution

Neptune comes with a simple queuing mechanism that can be used for remote job execution.

NOTE: To execute a Neptune job on a remote infrastructure you need shared storage. This can be any storage that can be mounted as a file system on your operating system. Both the enqueuing environment and the execution environment must have access to the shared storage, because it is used for storing snapshots of your code.
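One common way to satisfy this requirement is an NFS export mounted at the same path on both machines. This is only a hypothetical setup sketch (the host name and paths are placeholders, and any mountable storage works):

```shell
# Hypothetical setup: mount the same NFS export on the enqueuing machine
# and on every execution host, so both see the same snapshot directory.
sudo mount -t nfs storage-host:/exports/neptune /path/to/shared/storage
```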

Basic Usage

When you want to execute a job on a remote infrastructure, use the enqueue command.

In neptune.yaml, set the base path of Neptune Storage to the shared storage:

storage-base-path: /path/to/shared/storage

When you enqueue a job, it is created in the queued state.

neptune enqueue

>
> Job enqueued, id: 78b7ba83-9bb8-4405-b0b7-793fae2b566b
>
> To browse the job, follow:
> https://YOUR_NEPTUNE_IP:YOUR_NEPTUNE_PORT/#dashboard/job/78b7ba83-9bb8-4405-b0b7-793fae2b566b
>
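Since the job id has to be passed to neptune exec on the remote host, it can be convenient to capture it when enqueuing. A minimal sketch, which reuses the sample output line shown above (in practice you would pipe the real neptune enqueue command instead of the hard-coded ENQUEUE_OUTPUT variable):

```shell
# Extract the job id from the "Job enqueued, id: <uuid>" line so it can be
# handed to `neptune exec` on the remote host. ENQUEUE_OUTPUT stands in for
# the real `neptune enqueue` output here.
ENQUEUE_OUTPUT='Job enqueued, id: 78b7ba83-9bb8-4405-b0b7-793fae2b566b'
JOB_ID=$(printf '%s\n' "$ENQUEUE_OUTPUT" | sed -n 's/^Job enqueued, id: //p')
echo "$JOB_ID"   # 78b7ba83-9bb8-4405-b0b7-793fae2b566b
```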

Then, on a remote host, you can run neptune exec with the job id as an argument.

Example:

neptune exec 78b7ba83-9bb8-4405-b0b7-793fae2b566b

Worker Script

To avoid logging in to the remote execution environment every time you want to execute a job, you can run a simple script that executes enqueued jobs automatically. To do that, use neptune exec with the --resources parameter.

Example of a worker script:

#!/bin/bash

while true; do
    neptune exec --resources gpu scikit-learn tensorflow
    sleep 1m
done

This script executes, in an infinite loop, jobs that were enqueued with requirements that are a subset of the resources declared in the exec command. For example, a job enqueued with --requirements gpu tensorflow will be executed by this worker script, but a job enqueued with --requirements gpu keras will not, because the worker did not declare keras as a resource.
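The subset rule above can be illustrated with a small shell sketch. This is not Neptune code, only a model of the matching logic, using the same resource and requirement lists as the examples:

```shell
# Model of the worker matching rule: a job runs only if every one of its
# requirements appears in the worker's declared resources.
worker_resources="gpu scikit-learn tensorflow"

requirements_met() {
    local req
    for req in $1; do
        case " $worker_resources " in
            *" $req "*) ;;        # requirement is declared by the worker
            *) return 1 ;;        # undeclared resource: job is not picked up
        esac
    done
    return 0
}

requirements_met "gpu tensorflow" && echo "executed"   # prints "executed"
requirements_met "gpu keras"      || echo "skipped"    # prints "skipped"
```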