Spec #
This is a top-level attribute of the pipeline spec.
{
    "pipeline": {...},
    "transform": {...},
    "egress": {
        // Egress to an object store
        "URL": "s3://bucket/dir",
        // Egress to a database
        "sql_database": {
            "url": string,
            "file_format": {
                "type": string,
                "columns": [string]
            },
            "secret": {
                "name": string,
                "key": "PACHYDERM_SQL_PASSWORD"
            }
        }
    },
    ...
}
Attributes #
Attribute | Description |
---|---|
URL | The URL of the object store where the pipeline’s output data should be written. |
sql_database | An optional field that is used to specify how the pipeline should write output data to a SQL database. |
url | The URL of the SQL database, in the format postgresql://user:password@host:port/database. |
file_format | The file format of the output data, which can be specified as csv or tsv. This field also includes the column names that should be included in the output. |
secret | The name and key of the Kubernetes secret that contains the password for the SQL database. |
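As a sketch of how these attributes fit together, the fragment below fills in the sql_database form of egress with concrete values. The host, port, database name, column names, and secret name (db.example.com, 5432, reports, id/name/total, egress-db-secret) are hypothetical placeholders chosen for illustration. The URL follows the format given in the table, and the type value assumes csv is written exactly as the table spells it. JSON has no comment syntax, so these assumptions are listed here rather than inline.

```json
{
    "egress": {
        "sql_database": {
            "url": "postgresql://user:password@db.example.com:5432/reports",
            "file_format": {
                "type": "csv",
                "columns": ["id", "name", "total"]
            },
            "secret": {
                "name": "egress-db-secret",
                "key": "PACHYDERM_SQL_PASSWORD"
            }
        }
    }
}
```

Here secret.name would point at an existing Kubernetes secret, and secret.key names the entry in that secret that holds the database password.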
Behavior #
The egress field in a Pachyderm Pipeline Spec is used to specify how the pipeline should write the output data. The egress field supports two types of outputs: writing to an object store and writing to a SQL database.
Data is pushed after the user code finishes running but before the job is marked as successful. For more information, see Egress Data to an object store or Egress Data to a database.
The egress field is required if the pipeline needs to write output data to an external storage system.
When to Use #
You should use the egress field in a Pachyderm Pipeline Spec when you need to write the output data from your pipeline to an external storage system, such as an object store or a SQL database.
Example scenarios:
- Long-term data storage: If you need to store the output data from your pipeline for a long time, you can use the egress field to write the data to an object store, such as Amazon S3 or Google Cloud Storage.
- Data sharing: If you need to share the output data from your pipeline with external users or systems, you can use the egress field to write the data to an object store that is accessible to those users or systems.
- Analytics and reporting: If you need to perform further analytics or reporting on the output data from your pipeline, you can use the egress field to write the data to a SQL database that can be used for those purposes.
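For the long-term storage scenario, a complete spec might look like the following minimal sketch. Only the egress URL value is taken from the spec above. The pipeline name (archive-results), input repo (results), container image, and copy command are hypothetical, and the sketch assumes the usual pipeline, input, and transform fields of a pipeline spec.

```json
{
    "pipeline": {
        "name": "archive-results"
    },
    "input": {
        "pfs": {
            "repo": "results",
            "glob": "/"
        }
    },
    "transform": {
        "image": "alpine:3.19",
        "cmd": ["sh", "-c", "cp -r /pfs/results/* /pfs/out/"]
    },
    "egress": {
        "URL": "s3://bucket/dir"
    }
}
```

With a spec along these lines, whatever the user code writes to the pipeline's output would be pushed to the object store location after the code finishes and before the job is marked as successful.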