To create a pipeline, you need to define a pipeline specification in YAML, JSON, or Jsonnet.
Before You Start #
A basic pipeline must have all of the following:
- pipeline.name: The name of your pipeline.
- transform.cmd: The command that executes your user code.
- transform.image: The image that contains your user code.
- input.pfs.repo: The input repository that holds the data to transform.
- input.pfs.glob: The glob pattern used to identify the shape of datums.
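As a quick sanity check before submitting a spec, the required fields listed above can be verified locally. The following is a minimal sketch, not part of Pachyderm: the helper name `missing_required_fields` is hypothetical, and the image field is spelled `image`, as it appears in the examples on this page.

```python
import json

# Dotted paths of the fields that a basic pipeline must define.
REQUIRED_FIELDS = [
    "pipeline.name",
    "transform.cmd",
    "transform.image",
    "input.pfs.repo",
    "input.pfs.glob",
]


def missing_required_fields(spec):
    """Return the required dotted paths that are absent or empty in spec."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = spec
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                node = None
                break
            node = node[key]
        if node in (None, "", []):
            missing.append(path)
    return missing


spec = json.loads("""
{
  "pipeline": {"name": "edges"},
  "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
  "input": {"pfs": {"repo": "images", "glob": "/*"}}
}
""")

# A complete spec reports nothing missing; a partial one lists the gaps.
print(missing_required_fields(spec))
print(missing_required_fields({"pipeline": {"name": "edges"}}))
```

This only checks that the fields are present. It does not validate their values or any optional fields.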
How to Create a Pipeline #
Via Local File #
- Define a pipeline specification in YAML, JSON, or Jsonnet.
- Pass the pipeline configuration to Pachyderm:
pachctl create pipeline -f <pipeline_spec>
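If the spec is generated by a script rather than written by hand, the two steps above might look like the following sketch. The helper names `write_spec`, `create_pipeline_argv`, and `create_pipeline` are hypothetical, as is the file name `edges.json`; the only Pachyderm-specific part is the `pachctl create pipeline -f` command shown above.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def write_spec(spec, directory):
    """Write the spec as JSON and return the path of the new file."""
    path = Path(directory) / "edges.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


def create_pipeline_argv(spec_path):
    """Build the argument list for `pachctl create pipeline -f <pipeline_spec>`."""
    return ["pachctl", "create", "pipeline", "-f", str(spec_path)]


def create_pipeline(spec_path):
    """Run pachctl; call this only when pachctl is installed and connected."""
    return subprocess.run(create_pipeline_argv(spec_path), check=True)


spec = {
    "pipeline": {"name": "edges"},
    "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
    "input": {"pfs": {"repo": "images", "glob": "/*"}},
}

with tempfile.TemporaryDirectory() as tmp:
    spec_path = write_spec(spec, tmp)
    # Show the command instead of running it, so the sketch works offline.
    print(" ".join(create_pipeline_argv(spec_path)))
```

The sketch prints the command rather than executing it; `create_pipeline(spec_path)` would perform the actual call against a cluster.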
Via URL #
- Find a pipeline specification hosted in a public or internal repository.
- Pass the pipeline configuration to Pachyderm:
pachctl create pipeline -f https://raw.githubusercontent.com/pachyderm/pachyderm/2.7.x/examples/opencv/edges.json
Via Jsonnet #
Jsonnet pipeline specs let you create pipelines while passing a set of parameters dynamically, so you can reuse the baseline of a given pipeline while changing the values of chosen fields. For example, you can create multiple pipelines out of the same Jsonnet pipeline spec file while pointing each of them at a different input repository, parameterize a command line in the transform field of your pipelines, or dynamically pass various Docker images to train different models on the same dataset.
For illustration, the following example creates a pipeline named edges-1 and points its input repository at the repo 'images':
pachctl create pipeline --jsonnet jsonnet/edges.jsonnet --arg suffix=1 --arg src=images
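The contents of jsonnet/edges.jsonnet are not shown on this page. As a rough Python analogue (not Jsonnet, and not the actual file), the sketch below illustrates the idea of parameterization: one template, called with different `suffix` and `src` values, yields different specs. The function name `edges_spec`, the `"edges-" + suffix` naming scheme, and the repo name `photos` are assumptions made for illustration, inferred from the command above where suffix=1 produces edges-1.

```python
import json


def edges_spec(suffix, src):
    """Hypothetical template: build an edges pipeline spec for one input repo."""
    return {
        "pipeline": {"name": "edges-" + str(suffix)},
        "transform": {"cmd": ["python3", "/edges.py"], "image": "pachyderm/opencv"},
        "input": {"pfs": {"repo": src, "glob": "/*"}},
    }


# Same template, two different input repositories ("photos" is a made-up name).
for suffix, src in [("1", "images"), ("2", "photos")]:
    print(json.dumps(edges_spec(suffix, src), indent=2))
```

Only the pipeline name and the input repository change between the two generated specs; the transform stays identical, which is the reuse that parameterized specs are meant to provide.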
You can define multiple pipeline specifications in one file by separating the specs with the following separator: ---. This works in both JSON and YAML files.
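To make the multi-spec file layout concrete, the sketch below joins several JSON specs with the --- separator and splits them apart again. It only illustrates the file layout and is an assumption about how such a file could be handled locally; it does not describe how pachctl parses the file. The helper names `join_specs` and `split_specs`, and the second pipeline name, are hypothetical.

```python
import json


def join_specs(specs):
    """Serialize several specs into one text, separated by '---' lines."""
    return "\n---\n".join(json.dumps(s, indent=2) for s in specs) + "\n"


def split_specs(text):
    """Split a multi-spec JSON text on lines that contain only '---'."""
    chunks, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            chunks.append("\n".join(current))
            current = []
        else:
            current.append(line)
    chunks.append("\n".join(current))
    # Ignore empty chunks, e.g. from a trailing separator.
    return [json.loads(c) for c in chunks if c.strip()]


specs = [
    {"pipeline": {"name": "edges"}},
    {"pipeline": {"name": "montage"}},  # made-up second pipeline
]

combined = join_specs(specs)
print(combined)
print(split_specs(combined) == specs)
```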
Examples #
JSON #
{
  "pipeline": {
    "name": "edges"
  },
  "description": "A pipeline that performs image edge detection by using the OpenCV library.",
  "transform": {
    "cmd": [ "python3", "/edges.py" ],
    "image": "pachyderm/opencv"
  },
  "input": {
    "pfs": {
      "repo": "images",
      "glob": "/*"
    }
  }
}
YAML #
pipeline:
  name: edges
description: A pipeline that performs image edge detection by using the OpenCV library.
transform:
  cmd:
  - python3
  - "/edges.py"
  image: pachyderm/opencv
input:
  pfs:
    repo: images
    glob: "/*"
Considerations #
- When you create a pipeline, Pachyderm automatically creates an output
  repository with the same name as the pipeline. However, if a repo with
  that name already exists, your pipeline takes over its master branch.
  The files that were stored in the repo before will still be in the HEAD
  of the branch.