Helm Series

Deploy Target HCVs

About #

The Deploy Target section defines where you’re deploying Pachyderm; this is typically located at the top of your values.yaml file.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
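
As a minimal sketch (the set of accepted values in the comment reflects recent chart versions and should be verified against your chart's values.yaml):

deployTarget: "LOCAL" # one of LOCAL, AMAZON, GOOGLE, MICROSOFT, or CUSTOM, matching your infrastructure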
Global HCVs

About #

The Global section configures the connection to the PostgreSQL database. By default, it uses the included Postgres service.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
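
For example, a hedged sketch of the PostgreSQL connection settings (key names follow recent chart versions; confirm them against your chart's values.yaml):

global:
  postgresql:
    postgresqlHost: "postgres"      # hostname of the bundled service or your managed instance
    postgresqlPort: "5432"
    postgresqlDatabase: "pachyderm" # database in which pachd stores metadata
    postgresqlUsername: "pachyderm"
    postgresqlPassword: ""          # prefer a Kubernetes secret over a plaintext value in production
    postgresqlSSL: "disable"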
Console HCVs

About #

Console is the Graphical User Interface (GUI) for Pachyderm. Users who prefer to navigate and manage their project resources visually can connect to Console by authenticating against your configured OIDC provider. For personal-machine installations of Pachyderm, a user may access Console via localhost without authentication.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
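
An illustrative sketch of commonly set fields (the config keys shown are assumptions drawn from recent chart versions):

console:
  enabled: true
  image:
    pullPolicy: "IfNotPresent"
    tag: "" # defaults to the Console version pinned by the chart
  resources: {}
  config:
    oauthClientID: "console"
    oauthClientSecret: "" # recent chart versions generate a secret when this is left blank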
Enterprise Server HCVs

About #

Enterprise Server is a production management layer that centralizes the license registration of multiple Pachyderm clusters for Enterprise use and the setup of user authentication and authorization via OIDC.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
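
A minimal sketch (field names are assumptions based on recent chart versions; verify them before use):

enterpriseServer:
  enabled: false # set to true to deploy the Enterprise Server into this cluster
  service:
    type: ClusterIP
  resources: {}
  nodeSelector: {}
  tolerations: []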
ETCD HCVs

About #

The ETCD section configures the ETCD cluster in the deployment.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.

etcd:
  affinity: {}
  annotations: {}
  dynamicNodes: 1 # sets the number of nodes in the etcd StatefulSet;  analogous to the --dynamic-etcd-nodes argument to pachctl
  image:
    repository: "pachyderm/etcd"
    tag: "v3.5.1"
    pullPolicy: "IfNotPresent"
  maxTxnOps: 10000 # sets the --max-txn-ops in the container args
  priorityClassName: ""
  nodeSelector: {}
  podLabels: {} # specifies labels to add to the etcd pod.
  
  resources: # specifies the resource request and limits
    {}

    #limits:
    #  cpu: "1"
    #  memory: "2G"
    #requests:
    #  cpu: "1"
    #  memory: "2G"

  storageClass: "" #  defines what existing storage class to use; analogous to --etcd-storage-class argument to pachctl 
  storageSize: 10Gi # specifies the size of the volume to use for etcd.
  service:
    annotations: {} # specifies annotations to add to the etcd service.
    labels: {} # specifies labels to add to the etcd service.
    type: ClusterIP # specifies the Kubernetes type of the etcd service.
  tolerations: []
Ingress HCVs

About #

⚠️ ingress will be removed from the Helm chart once deploying Pachyderm with a proxy becomes mandatory.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
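
An illustrative sketch only (verify these field names against your chart version, and prefer the proxy for new deployments):

ingress:
  enabled: false
  host: "" # external hostname the ingress should serve
  annotations: {} # controller-specific annotations, e.g. ingress class or cert-manager settings
  tls:
    enabled: false
    secretName: "" # Kubernetes TLS secret containing tls.crt and tls.key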
Loki HCVs

About #

Loki Stack contains values that are passed to the loki-stack subchart. For more details on each service, see the official Loki and Promtail documentation.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.

loki-stack:
  loki:
    serviceAccount:
      automountServiceAccountToken: false
    persistence:
      enabled: true
      accessModes:
        - ReadWriteOnce
      size: 10Gi
      # More info for setting up storage classes on various cloud providers:
      # AWS: https://docs.aws.amazon.com/eks/latest/userguide/storage-classes.html
      # GCP: https://cloud.google.com/compute/docs/disks/performance#disk_types
      # Azure: https://docs.microsoft.com/en-us/azure/aks/concepts-storage#storage-classes
      storageClassName: ""
      annotations: {}
    priorityClassName: ""
    nodeSelector: {}
    tolerations: []
    config:
      limits_config:
        retention_period: 24h
        retention_stream:
          - selector: '{suite="pachyderm"}'
            priority: 1
            period: 168h # = 1 week
  grafana:
    enabled: false
  promtail:
    config:
      clients:
        - url: "http://{{ .Release.Name }}-loki:3100/loki/api/v1/push"
      snippets:
        # The scrapeConfigs section is copied from loki-stack-2.6.4
        # The pipeline_stages.match stanza has been added to prevent multiple lokis in a cluster from mixing their logs.
        scrapeConfigs: |
          - job_name: kubernetes-pods
            pipeline_stages:
              {{- toYaml .Values.config.snippets.pipelineStages | nindent 4 }}
              - match:
                  selector: '{namespace!="{{ .Release.Namespace }}"}'
                  action: drop
            kubernetes_sd_configs:
              - role: pod
            relabel_configs:
              - source_labels:
                  - __meta_kubernetes_pod_controller_name
                regex: ([0-9a-z-.]+?)(-[0-9a-f]{8,10})?
                action: replace
                target_label: __tmp_controller_name
              - source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_name
                  - __meta_kubernetes_pod_label_app
                  - __tmp_controller_name
                  - __meta_kubernetes_pod_name
                regex: ^;*([^;]+)(;.*)?$
                action: replace
                target_label: app
              - source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_instance
                  - __meta_kubernetes_pod_label_release
                regex: ^;*([^;]+)(;.*)?$
                action: replace
                target_label: instance
              - source_labels:
                  - __meta_kubernetes_pod_label_app_kubernetes_io_component
                  - __meta_kubernetes_pod_label_component
                regex: ^;*([^;]+)(;.*)?$
                action: replace
                target_label: component
              {{- if .Values.config.snippets.addScrapeJobLabel }}
              - replacement: kubernetes-pods
                target_label: scrape_job
              {{- end }}
              {{- toYaml .Values.config.snippets.common | nindent 4 }}
              {{- with .Values.config.snippets.extraRelabelConfigs }}
              {{- toYaml . | nindent 4 }}
              {{- end }}
        pipelineStages:
          - cri: {}
        common:
          # This is copy and paste of existing actions, so we don't lose them.
          # Cf. https://github.com/grafana/loki/issues/3519#issuecomment-1125998705
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_node_name
            target_label: node_name
          - action: replace
            source_labels:
              - __meta_kubernetes_namespace
            target_label: namespace
          - action: replace
            replacement: $1
            separator: /
            source_labels:
              - namespace
              - app
            target_label: job
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_name
            target_label: pod
          - action: replace
            source_labels:
              - __meta_kubernetes_pod_container_name
            target_label: container
          - action: replace
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
            target_label: __path__
          - action: replace
            regex: true/(.*)
            replacement: /var/log/pods/*$1/*.log
            separator: /
            source_labels:
              - __meta_kubernetes_pod_annotationpresent_kubernetes_io_config_hash
              - __meta_kubernetes_pod_annotation_kubernetes_io_config_hash
              - __meta_kubernetes_pod_container_name
            target_label: __path__
          - action: keep
            regex: pachyderm
            source_labels:
              - __meta_kubernetes_pod_label_suite
          # this gets all kubernetes labels as well
          - action: labelmap
            regex: __meta_kubernetes_pod_label_(.+)
    livenessProbe:
      failureThreshold: 5
      tcpSocket:
        port: http-metrics
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
PachD HCVs

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
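
A hedged sketch of a few commonly tuned fields (the storage block differs per backend, and key names should be verified against your chart's values.yaml):

pachd:
  enabled: true
  replicas: 1
  resources: {}
  storage:
    backend: "AMAZON" # e.g. AMAZON, GOOGLE, MICROSOFT, MINIO, or LOCAL, matching your deployTarget
    amazon:
      bucket: "" # object-storage bucket used when the AMAZON backend is selected
      region: ""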
PachW HCVs

About #

PachW enables fine-grained control of where compaction and object-storage interaction occur by running storage tasks in a dedicated Kubernetes deployment. Users can configure PachW’s min and max replicas as well as define nodeSelectors, tolerations, and resource requests. Using PachW allows power users to save on costs by claiming fewer resources and running storage tasks on less expensive nodes.

⚠️ If you are upgrading to 2.5.0+ for the first time and you wish to use PachW, you must calculate how many maxReplicas you need. By default, PachW is set to maxReplicas: 1, which is not sufficient for production runs.

maxReplicas #

You should set the maxReplicas value to at least match the number of pipeline replicas that you have. For high performance, we suggest taking the following approach:

number of pipelines * highest parallelism spec * 1.5 = maxReplicas

Let’s say you have 6 pipelines. One of these pipelines has a parallelism spec value of 6, and the rest are 5 or fewer.

6 * 6 * 1.5 = 54

minReplicas #

Workloads that commit rapidly, and therefore constantly process storage and compaction tasks, may want to increase minReplicas to keep instances on standby.

nodeSelectors #

Workloads that utilize GPUs and other expensive resources may want to add a node selector to scope PachW instances to less expensive nodes.

Values #

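As an illustrative sketch based on the fields described above (the block and field names are assumptions; verify them against your chart version):

pachw:
  minReplicas: 0   # instances kept on standby for storage and compaction tasks
  maxReplicas: 1   # raise for production workloads; see the sizing guidance above
  nodeSelector: {} # scope PachW to less expensive nodes if desired
  tolerations: []
  resources: {}    # requests/limits in standard Kubernetes format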
Kube Event Tail HCVs

About #

Kube Event Tail deploys a lightweight app that watches Kubernetes events and echoes them into logs.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
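
A minimal sketch (the kubeEventTail block name and its fields are assumptions based on recent chart versions):

kubeEventTail:
  enabled: true # set to false to skip deploying the event watcher
  resources: {}
  image:
    pullPolicy: "IfNotPresent"
    tag: "" # defaults to the version pinned by the chart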
PGBouncer HCVs

About #

The PGBouncer section configures a PGBouncer Postgres connection pooler.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.


pgbouncer:
  service:
    type: ClusterIP # defines the Kubernetes service type.
  annotations: {}
  priorityClassName: ""
  nodeSelector: {}
  tolerations: []
  image:
    repository: pachyderm/pgbouncer
    tag: 1.16.1-debian-10-r82
  resources: # defines resources in standard kubernetes format; unset by default.
    {}

    #limits:
    #  cpu: "1"
    #  memory: "2G"
    #requests:
    #  cpu: "1"
    #  memory: "2G"

  maxConnections: 10000 # defines the maximum number of concurrent connections into pgbouncer.
  defaultPoolSize: 80 # specifies the maximum number of concurrent connections from pgbouncer to the postgresql database.
PostgreSQL Subchart HCVs

About #

The PostgreSQL section controls the Bitnami PostgreSQL subchart. Pachyderm runs on Kubernetes, is backed by an object store of your choice, and comes with a bundled version of PostgreSQL (metadata storage) by default.

We recommend disabling this bundled PostgreSQL and using a managed database instance (such as RDS, CloudSQL, or PostgreSQL Server) for production environments.

See storage class details for your provider:

  • AWS | Min: 500Gi (GP2) / 1,500 IOPS
  • GCP | Min: 50Gi / 1,500 IOPS
  • Azure | Min: 256Gi / 1,100 IOPS

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
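
A hedged sketch of the fields most often changed here (the Bitnami subchart exposes many more options; see its documentation for the full set):

postgresql:
  enabled: true # set to false when using a managed database such as RDS or CloudSQL
  persistence:
    storageClass: "" # pick a storage class that meets the IOPS guidance above
    size: 10Gi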
CloudSQL Auth Proxy HCVs

About #

The CloudSQL Auth Proxy section configures the CloudSQL Auth Proxy for deploying Pachyderm on GCP with CloudSQL.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
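
An illustrative sketch (field names are assumptions to confirm against your chart's values.yaml):

cloudsqlAuthProxy:
  enabled: false
  connectionName: "" # CloudSQL instance connection name, in the form "<project>:<region>:<instance>"
  serviceAccount: "" # GCP service account with the Cloud SQL Client role
  resources: {}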
OpenID Connect HCVs

About #

The OIDC section of the Helm chart enables you to set up authentication through upstream IDPs. To use authentication, you must have an Enterprise license.

We recommend setting up this section alongside the Enterprise Server section of your Helm chart so that you can easily scale multiple clusters using the same authentication configurations.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.
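
A minimal, hedged sketch (exact keys vary by chart version and identity provider):

oidc:
  issuerURI: "" # issuer URI of the embedded identity service, typically served through the proxy
  mockIDP: false # true enables a test-only identity provider; never use it in production
  upstreamIDPs: [] # list of Dex connector configurations for your IDPs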
Test Connection HCVs

About #

The Test Connection section is used by Pachyderm to test connectivity during installation. This config is intended for organizations that do not have permission to pull Docker images directly from the Internet and instead need to mirror them locally.

Values #

The following shows commonly used configuration values for this section of your Helm chart's values.yaml.

testConnection:
  image:
    repository: alpine
    tag: latest
Proxy HCVs

About #

Proxy is a service that handles all Pachyderm traffic (S3, Console, OIDC, Dex, GRPC) on a single port; it’s great for exposing your cluster directly to the Internet.

Values #


# The proxy is a service to handle all Pachyderm traffic (S3, Console, OIDC, Dex, GRPC) on a single
# port; good for exposing directly to the Internet.
proxy:
  # If enabled, create a proxy deployment (based on the Envoy proxy) and a service to expose it.  If
  # ingress is also enabled, any Ingress traffic will be routed through the proxy before being sent
  # to pachd or Console.
  enabled: true
  # The external hostname (including port if nonstandard) that the proxy will be reachable at.
  # If you have ingress enabled and an ingress hostname defined, the proxy will use that.
  # Ingress will be deprecated in the future so configuring the proxy host instead is recommended.
  host: ""
  # The number of proxy replicas to run.  1 should be fine, but if you want more for higher
  # availability, that's perfectly reasonable.  Each replica can handle 50,000 concurrent
  # connections.  There is an affinity rule to prefer scheduling the proxy pods on the same node as
  # pachd, so a number here that matches the number of pachd replicas is a fine configuration.
  # (Note that we don't guarantee to keep the proxy<->pachd traffic on-node or even in-region.)
  replicas: 1
  # The envoy image to pull.
  image:
    repository: "envoyproxy/envoy-distroless"
    tag: "v1.27.1"
    pullPolicy: "IfNotPresent"
  # Set up resources.  The proxy is configured to shed traffic before using 500MB of RAM, so that's
  # a reasonable memory limit.  It doesn't need much CPU.
  resources:
    requests:
      cpu: 100m
      memory: 512Mi
    limits:
      memory: 512Mi
  # Any additional labels to add to the pods.  These are also added to the deployment and service
  # selectors.
  labels: {}
  # Any additional annotations to add to the pods.
  annotations: {}
  # A nodeSelector statement for each pod in the proxy Deployment, if desired.
  nodeSelector: {}
  # A tolerations statement for each pod in the proxy Deployment, if desired.
  tolerations: []
  # A priority class name for each pod in the proxy Deployment, if desired.
  priorityClassName: ""
  # Configure the service that routes traffic to the proxy.
  service:
    # The type of service can be ClusterIP, NodePort, or LoadBalancer.
    type: ClusterIP
    # If the service is a LoadBalancer, you can specify the IP address to use.
    loadBalancerIP: ""
    # The port to serve plain HTTP traffic on.
    httpPort: 80
    # The port to serve HTTPS traffic on, if enabled below.
    httpsPort: 443
    # If the service is a NodePort, you can specify the port to receive HTTP traffic on.
    httpNodePort: 30080
    httpsNodePort: 30443
    # Any additional annotations to add.
    annotations: {}
    # Any additional labels to add to the service itself (not the selector!).
    labels: {}
    # The proxy can also serve each backend service on a numbered port, and will do so for any port
    # not numbered 0 here.  If this service is of type NodePort, the port numbers here will be used
    # for the node port, and will need to be in the node port range.
    legacyPorts:
      console: 0 # legacy 30080, conflicts with default httpNodePort
      grpc: 0 # legacy 30650
      s3Gateway: 0 # legacy 30600
      oidc: 0 # legacy 30657
      identity: 0 # legacy 30658
      metrics: 0 # legacy 30656
    # externalTrafficPolicy determines cluster-wide routing policy; see "kubectl explain
    # service.spec.externalTrafficPolicy".
    externalTrafficPolicy: ""
  # Configuration for TLS (SSL, HTTPS).
  tls:
    # If true, enable TLS serving.  Enabling TLS is incompatible with support for legacy ports (you
    # can't get a generally-trusted certificate for port numbers), and disables support for
    # cleartext communication (cleartext requests will redirect to the secure server, and HSTS
    # headers are set to prevent downgrade attacks).
    #
    # Note that if you are planning on putting the proxy behind an ingress controller, you probably
    # want to configure TLS for the ingress controller, not the proxy.  This is intended for the
    # case where the proxy is exposed directly to the Internet.  (It is possible to have your
    # ingress controller talk to the proxy over TLS, in which case, it's fine to enable TLS here in
    # addition to in the ingress section above.)
    enabled: false
    # The secret containing "tls.key" and "tls.crt" keys that contain PEM-encoded private key and
    # certificate material.  Generate one with "kubectl create secret tls <name> --key=tls.key
    # --cert=tls.cert".  This format is compatible with the secrets produced by cert-manager, and
    # the proxy will pick up new data when cert-manager rotates the certificate.
    secretName: ""
    # If set, generate the secret from values here.  This is intended only for unit tests.
    secret: {}
preflightCheckJob:
  # If true, install a Kubernetes job that runs preflight checks from the configured Pachyderm
  # release.
  enabled: false

  # The version to preflight.  It is totally fine if this is newer than the currently-running pachd
  # version.
  image:
    repository: "pachyderm/pachd"
    pullPolicy: "IfNotPresent"
    tag: ""

  # misc k8s settings
  affinity: {}
  annotations: {}
  resources:
    {}
    #limits:
    #  cpu: "1"
    #  memory: "2G"
    #requests:
    #  cpu: "1"
    #  memory: "2G"
  priorityClassName: ""
  podLabels: {}
  nodeSelector: {}
  tolerations: []

  # logging settings
  sqlQueryLogs: false
  disableLogSampling: false
