Jsonnet pipeline specifications
Jsonnet pipeline specifications are an experimental feature.
Pachyderm’s pipeline specification files are intuitive, simple, and language agnostic. They are, however, very static.
A jsonnet pipeline specification file is a thin wrapping layer on top of your JSON file, allowing you to parameterize a pipeline specification and thus adding a dynamic component to the creation and update of pipelines.
With jsonnet pipeline specs, you can easily reuse the baseline of a given pipeline spec while experimenting with different values for given fields.
Jsonnet Specs
Pachyderm’s Jsonnet pipeline specs are written in the open-source templating language jsonnet. Jsonnet wraps the baseline of a JSON file into a function, allowing the injection of parameters to a pipeline specification file.
All jsonnet pipeline specs have a .jsonnet extension.
As an example, check the file edges.jsonnet below. It is a parameterized version of the edges pipeline spec edges.json in the opencv example, used to inject a name modifier and an input repository name into the original pipeline specification.
Example 1
In this snippet of edges.jsonnet, the parameter src sits in place of what would have been the value of the repo field, acting as a placeholder for whatever value is passed to the jsonnet pipeline spec.
input: {
  pfs: {
    name: "images",
    glob: "/*",
    repo: src,
  }
},
See the full edges.jsonnet here:
////
// Template arguments:
//
// suffix : An arbitrary suffix appended to the name of this pipeline, for
//          disambiguation when multiple instances are created.
// src : the repo from which this pipeline will read the images to which
//       it applies edge detection.
////
function(suffix, src)
{
  pipeline: { name: "edges-"+suffix },
  description: "OpenCV edge detection on "+src,
  input: {
    pfs: {
      name: "images",
      glob: "/*",
      repo: src,
    }
  },
  transform: {
    cmd: [ "python3", "/edges.py" ],
    image: "pachyderm/opencv:0.0.1"
  }
}
Or check our full “jsonnet-ed” opencv example.
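If you have the open-source jsonnet command-line tool installed (an optional assumption; pachctl does not require it), you can render this function locally to preview the JSON it produces by passing values as top-level arguments:
jsonnet --tla-str suffix=1 --tla-str src=images edges.jsonnet
The command prints the resulting pipeline specification to stdout, which makes it easy to sanity-check your parameterization before creating the pipeline.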
To create or update a pipeline using a jsonnet pipeline specification file:
- add the --jsonnet flag to your pipeline create or pipeline update commands, followed by a local path to your jsonnet file or a URL.
- add --arg <parameter-name>=value for each variable.
Example 2
pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg src=images
The command above will generate a JSON file named edges-1.json, then create a pipeline of the same name, taking the repository images as its input.
Read Jsonnet’s complete standard library documentation to learn about all the variable types, string manipulation and mathematical functions, or assertions available to you.
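For instance, here is a minimal, hypothetical sketch (not part of the opencv example) that combines a standard library call with an assertion; std.length, std.asciiLower, and the assert syntax come from the jsonnet language itself, while the field values are placeholders:
function(suffix, src)
  // Fail early with a clear message if the name modifier is empty.
  assert std.length(suffix) > 0 : "suffix must not be empty";
  {
    pipeline: { name: std.asciiLower("edges-" + suffix) },
    input: { pfs: { name: "images", glob: "/*", repo: src } },
    transform: { cmd: ["python3", "/edges.py"], image: "pachyderm/opencv:0.0.1" },
  }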
At a minimum, your function should always have a parameter that acts as a name modifier. Because Pachyderm’s pipeline names must be unique, adding a prefix or a suffix to a pipeline’s generic name lets you quickly generate several pipelines from the same jsonnet pipeline specification file.
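Reusing edges.jsonnet from Example 2, for example, varying only the suffix argument yields distinct pipelines from the same file:
pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg src=images
pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=2 --arg src=images
These two commands create the pipelines edges-1 and edges-2, both reading from the images repository.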
Your .jsonnet file can create multiple pipelines at once as illustrated in our group example.
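As a sketch of that idea (the actual group example differs; the file name, parameter names, and repos below are hypothetical), a jsonnet function can return a list of pipeline specifications instead of a single object, here one edge-detection pipeline per input repo passed as a comma-separated string:
////
// multi-edges.jsonnet (hypothetical)
// suffix : name modifier shared by all generated pipelines.
// repos  : comma-separated list of input repositories.
////
function(suffix, repos)
  [
    {
      pipeline: { name: "edges-" + repo + "-" + suffix },
      input: { pfs: { name: "images", glob: "/*", repo: repo } },
      transform: { cmd: ["python3", "/edges.py"], image: "pachyderm/opencv:0.0.1" },
    }
    for repo in std.split(repos, ",")
  ]
A single pachctl create pipeline invocation with this file, passing for instance --arg repos=images,photos, would then create one edges pipeline per listed repository.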
Use Cases
Using jsonnet pipeline specifications, you could pass different images to the transform section of an otherwise identical JSON specification file to train multiple models on the same dataset, or switch from one input repo holding test data to another holding production data by parameterizing the input repo field.
During the development phase of a pipeline, it can be helpful to pass the tag of an image as a parameter: each rebuild of the pipeline’s code requires you to increment your tag value, and passing it as a parameter saves you from updating your JSON specification every time (see the sketch after this paragraph). You could also prepare a library of ready-made jsonnet pipeline specs for data science teams to instantiate according to their own sets of parameters.
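As a sketch of that pattern (the tag parameter below is illustrative and not part of the opencv example), only the transform image field needs to be built from the extra argument:
function(suffix, src, tag)
  {
    pipeline: { name: "edges-" + suffix },
    input: { pfs: { name: "images", glob: "/*", repo: src } },
    transform: {
      cmd: ["python3", "/edges.py"],
      // The image tag is injected at creation time, e.g. --arg tag=0.0.2,
      // so rebuilding the pipeline's code only means changing one argument.
      image: "pachyderm/opencv:" + tag,
    },
  }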
We will let you imagine more use cases in which those jsonnet specs can be helpful to you.