Before You Start
This guide assumes that you have already tried Pachyderm locally and have the command-line tools used in this guide installed: kubectl, eksctl, the AWS CLI, Helm, and pachctl.
1. Create an EKS Cluster
- Use the eksctl tool to deploy an EKS cluster:
eksctl create cluster --name pachyderm-cluster --region <region> --profile <your named profile>
- Verify deployment:
kubectl get all
2. Create an S3 Bucket
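- Set BUCKET_NAME and AWS_REGION, then run the following command. For any region other than us-east-1, also add --create-bucket-configuration LocationConstraint=${AWS_REGION}: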
aws s3api create-bucket --bucket ${BUCKET_NAME} --region ${AWS_REGION}
- Verify that the bucket exists:
aws s3 ls
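Because the LocationConstraint rule differs between us-east-1 and every other region, a small helper can build the right arguments for you. This is a minimal sketch, assuming you pass the bucket name and region explicitly; the function names `create_bucket_args` and `create_bucket` are hypothetical, not part of Pachyderm or the AWS CLI.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the arguments for `aws s3api create-bucket`.
# Outside us-east-1, S3 needs a LocationConstraint matching the region;
# in us-east-1 the constraint must be left out.
create_bucket_args() {
  local bucket="$1" region="$2"
  local args="--bucket ${bucket} --region ${region}"
  if [ "${region}" != "us-east-1" ]; then
    args="${args} --create-bucket-configuration LocationConstraint=${region}"
  fi
  printf '%s\n' "${args}"
}

# Hypothetical wrapper: create the bucket (needs the AWS CLI and credentials).
create_bucket() {
  # Word splitting of the unquoted substitution is intentional here.
  # shellcheck disable=SC2046
  aws s3api create-bucket $(create_bucket_args "$1" "$2")
}
```

Usage: `create_bucket "${BUCKET_NAME}" "${AWS_REGION}"`, then `aws s3 ls` as above.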
3. Enable Persistent Volume Creation
- Create an IAM OIDC provider for your cluster.
- Install the Amazon EBS Container Storage Interface (CSI) driver on your cluster.
- Create a gp3 storage class manifest file (e.g., gp3-storageclass.yaml):
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp3
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp3
  fsType: ext4
- Set gp3 as your default storage class by applying the manifest:
kubectl apply -f gp3-storageclass.yaml
- Verify that it has been set as your default.
kubectl get storageclass
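The OIDC provider and EBS CSI driver steps above can be scripted with eksctl. This is a sketch following the pattern in AWS's EKS documentation, assuming you set `CLUSTER_NAME`, `AWS_REGION`, and `AWS_ACCOUNT_ID` yourself (for example from `aws sts get-caller-identity --query Account --output text`); the role name `AmazonEKS_EBS_CSI_DriverRole` and the function names are illustrative choices. Depending on the EKS version, a gp2 class may already be marked as default, in which case `kubectl get storageclass` shows two defaults; `demote_gp2` clears that annotation.

```shell
#!/usr/bin/env bash
# Assumed inputs: CLUSTER_NAME, AWS_REGION, AWS_ACCOUNT_ID (set by you).
CLUSTER_NAME="${CLUSTER_NAME:-pachyderm-cluster}"
EBS_CSI_ROLE="${EBS_CSI_ROLE:-AmazonEKS_EBS_CSI_DriverRole}"  # example name

# Build the IAM role ARN that the EBS CSI add-on will assume.
ebs_csi_role_arn() {
  printf 'arn:aws:iam::%s:role/%s\n' "$1" "$2"
}

enable_ebs_csi() {
  # 1. Associate an IAM OIDC provider with the cluster.
  eksctl utils associate-iam-oidc-provider \
    --cluster "${CLUSTER_NAME}" --region "${AWS_REGION}" --approve

  # 2. Create an IAM role (only the role) for the driver's controller service account.
  eksctl create iamserviceaccount \
    --name ebs-csi-controller-sa --namespace kube-system \
    --cluster "${CLUSTER_NAME}" --region "${AWS_REGION}" \
    --role-name "${EBS_CSI_ROLE}" --role-only \
    --attach-policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy \
    --approve

  # 3. Install the EBS CSI driver as an EKS add-on that uses the role.
  eksctl create addon --name aws-ebs-csi-driver \
    --cluster "${CLUSTER_NAME}" --region "${AWS_REGION}" \
    --service-account-role-arn "$(ebs_csi_role_arn "${AWS_ACCOUNT_ID}" "${EBS_CSI_ROLE}")" \
    --force
}

# Clear the default-class annotation on gp2 so gp3 is the only default.
demote_gp2() {
  kubectl patch storageclass gp2 -p \
    '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "false"}}}'
}
```

Run `enable_ebs_csi` before applying gp3-storageclass.yaml, and `demote_gp2` afterwards if needed.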
4. Set up an RDS PostgreSQL Instance
By default, Pachyderm runs with a bundled version of PostgreSQL. For production environments, it is strongly recommended that you disable the bundled version and use an RDS PostgreSQL instance.
Aurora Serverless PostgreSQL is not supported.
- In the RDS console, create a database in the region matching your Pachyderm cluster.
- Choose the PostgreSQL engine.
- Select a PostgreSQL version >= 13.3.
- Configure your DB instance as follows:
SETTING | Recommended value |
---|---|
DB instance identifier | Fill in with a unique name across all of your DB instances in the current region. |
Master username | Choose your Admin username. |
Master password | Choose your Admin password. |
DB instance class | The standard default should work. You can change the instance type later on to optimize performance and costs. |
Storage type and Allocated storage | If you select io1, keep the 100 GiB default size. Read more about storage for RDS on Amazon’s website. |
Storage autoscaling | If your workload is cyclical or unpredictable, enable storage autoscaling to allow RDS to scale up your storage when needed. |
Standby instance | We highly recommend creating a standby instance for production environments. |
VPC | Select the VPC of your Kubernetes cluster. Attention: After a database is created, you can’t change its VPC. Read more about VPCs and RDS in Amazon’s documentation. |
Subnet group | Pick a subnet group or create a new one. Read more about DB subnet groups in Amazon’s documentation. |
Public access | Set the Public access to No for production environments. |
VPC security group | Create a new VPC security group that opens the PostgreSQL port (5432 by default), or use an existing one. |
Password authentication or Password and IAM database authentication | Choose one or the other. |
Database name | In the Database options section, enter Pachyderm’s database name (this example uses pachyderm), then click Create database to create your PostgreSQL instance. Warning: If you do not specify a database name, Amazon RDS does not create a database. |
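If you prefer the AWS CLI to the console, the table above maps roughly onto `aws rds create-db-instance` flags. This is a hedged sketch: every variable (`AWS_REGION`, `DB_ID`, `PG_VERSION`, `DB_CLASS`, `DB_ADMIN_USER`, `DB_ADMIN_PASSWORD`, `DB_SUBNET_GROUP`, `DB_SECURITY_GROUP_ID`) is a placeholder you set, the 500 GiB autoscaling ceiling is an arbitrary example, and the helper names are hypothetical. The version check enforces the ">= 13.3" requirement before calling AWS.

```shell
#!/usr/bin/env bash
# Succeeds when the version in $1 is at least 13.3 (relies on GNU `sort -V`).
pg_version_supported() {
  [ "$(printf '%s\n' 13.3 "$1" | sort -V | head -n 1)" = "13.3" ]
}

# Hypothetical wrapper around `aws rds create-db-instance`.
# --multi-az adds a standby instance; --max-allocated-storage enables storage
# autoscaling up to that size in GiB (500 is only an example);
# --no-publicly-accessible keeps the instance private;
# --db-name creates the pachyderm database (RDS creates none if omitted).
create_pachyderm_db() {
  if ! pg_version_supported "${PG_VERSION}"; then
    echo "PostgreSQL ${PG_VERSION} is older than 13.3" >&2
    return 1
  fi
  aws rds create-db-instance \
    --region "${AWS_REGION}" \
    --db-instance-identifier "${DB_ID}" \
    --engine postgres \
    --engine-version "${PG_VERSION}" \
    --db-instance-class "${DB_CLASS}" \
    --master-username "${DB_ADMIN_USER}" \
    --master-user-password "${DB_ADMIN_PASSWORD}" \
    --allocated-storage 100 \
    --max-allocated-storage 500 \
    --multi-az \
    --no-publicly-accessible \
    --db-subnet-group-name "${DB_SUBNET_GROUP}" \
    --vpc-security-group-ids "${DB_SECURITY_GROUP_ID}" \
    --db-name pachyderm
}
```

Choose a subnet group in your cluster's VPC so the instance lands in the right network, since the VPC cannot be changed later.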
Standalone Clusters
If you are deploying a standalone cluster, you must create a second database named dex in your RDS instance for Pachyderm’s authentication service. Read more about dex on PostgreSQL in Dex’s documentation.
Multi-cluster setups use Enterprise Server to handle authentication, so you do not need to create a dex database.
- Create a new user account and grant it full CRUD permissions to both the pachyderm and (when applicable) dex databases. Read about managing PostgreSQL users and roles in this blog. Pachyderm uses the same username to connect to both pachyderm and dex.
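The user and database setup described above can be run through psql as the RDS master user. This is a sketch under several assumptions: the pachyderm database already exists, psql can reach the instance, and `RDS_HOST`, `DB_ADMIN_USER`, `PACH_DB_USER`, and `PACH_DB_PASSWORD` are placeholders you set (psql prompts for the master password unless `PGPASSWORD` is set). The names are passed as psql variables so the password is quoted by psql rather than by the shell. On PostgreSQL 15 and later, the user may additionally need privileges on the public schema inside each database.

```shell
#!/usr/bin/env bash
# Print the SQL; pass "standalone" as $1 to also create the dex database.
# :"pach_user" and :'pach_password' are psql variables, substituted by psql.
pachyderm_db_sql() {
  cat <<'SQL'
CREATE USER :"pach_user" WITH PASSWORD :'pach_password';
GRANT ALL PRIVILEGES ON DATABASE pachyderm TO :"pach_user";
SQL
  if [ "$1" = "standalone" ]; then
    cat <<'SQL'
CREATE DATABASE dex;
GRANT ALL PRIVILEGES ON DATABASE dex TO :"pach_user";
SQL
  fi
}

# Hypothetical runner: feed the SQL to psql, connected to RDS's default
# "postgres" database, and stop at the first error.
setup_pachyderm_db() {
  pachyderm_db_sql "$1" | psql \
    --host "${RDS_HOST}" --port 5432 \
    --username "${DB_ADMIN_USER}" --dbname postgres \
    -v ON_ERROR_STOP=1 \
    -v pach_user="${PACH_DB_USER}" \
    -v pach_password="${PACH_DB_PASSWORD}"
}
```

Usage: `setup_pachyderm_db standalone` for a standalone cluster, or `setup_pachyderm_db` when Enterprise Server handles authentication.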
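5. Create a Values File
Save one of the following configurations as my_pachyderm_values.yaml and replace the placeholder values; set postgresqlHost to your RDS endpoint. The second configuration adds an Enterprise license key (pachd.enterpriseLicenseKey).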
# Option 1: without an Enterprise license key
global:
  postgresql:
    postgresqlAuthType: "scram-sha-256" # use "md5" if using postgresql < 14
    postgresqlUsername: "username"
    postgresqlPassword: "password"
    # The name of the database should be Pachyderm's ("pachyderm" in the example above), not "dex"
    # See also
    # postgresqlExistingSecretName: "<yoursecretname>"
    postgresqlDatabase: "databasename"
    # The postgresql database host to connect to. Defaults to postgres service in subchart
    postgresqlHost: "RDS CNAME"
    # The postgresql database port to connect to. Defaults to postgres server in subchart
    postgresqlPort: "5432"
postgresql:
  # turns off the install of the bundled postgres.
  # If not using the built in Postgres, you must specify a Postgresql
  # database server to connect to in global.postgresql
  enabled: false
deployTarget: "AMAZON"
proxy:
  enabled: true
  service:
    type: LoadBalancer
pachd:
  storage:
    amazon:
      bucket: "bucket_name"
      # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      id: "AKIAIOSFODNN7EXAMPLE"
      # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-2"
  externalService:
    enabled: true
console:
  enabled: true
# Option 2: with an Enterprise license key
global:
  postgresql:
    postgresqlAuthType: "scram-sha-256" # use "md5" if using postgresql < 14
    postgresqlUsername: "username"
    postgresqlPassword: "password"
    # The name of the database should be Pachyderm's ("pachyderm" in the example above), not "dex"
    # See also
    # postgresqlExistingSecretName: "<yoursecretname>"
    postgresqlDatabase: "databasename"
    # The postgresql database host to connect to. Defaults to postgres service in subchart
    postgresqlHost: "RDS CNAME"
    # The postgresql database port to connect to. Defaults to postgres server in subchart
    postgresqlPort: "5432"
postgresql:
  # turns off the install of the bundled postgres.
  # If not using the built in Postgres, you must specify a Postgresql
  # database server to connect to in global.postgresql
  enabled: false
deployTarget: "AMAZON"
proxy:
  enabled: true
  service:
    type: LoadBalancer
pachd:
  storage:
    amazon:
      bucket: "bucket_name"
      # this is an example access key ID taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      id: "AKIAIOSFODNN7EXAMPLE"
      # this is an example secret access key taken from https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html (AWS Credentials)
      secret: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
      region: "us-east-2"
  # Enterprise key
  enterpriseLicenseKey: "YOUR_ENTERPRISE_TOKEN"
console:
  enabled: true
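Two values above depend on your RDS instance: postgresqlHost and postgresqlAuthType. The helpers below are a sketch for filling them in, assuming the AWS CLI is configured for the instance's region; `rds_endpoint` and `pg_auth_type` are hypothetical names, and the md5/scram-sha-256 rule is taken directly from the comment in the values file.

```shell
#!/usr/bin/env bash
# Print the endpoint address of an RDS instance (use it for postgresqlHost).
# $1 is your DB instance identifier.
rds_endpoint() {
  aws rds describe-db-instances --db-instance-identifier "$1" \
    --query 'DBInstances[0].Endpoint.Address' --output text
}

# Pick postgresqlAuthType from a PostgreSQL version such as "13.7" or "15":
# "md5" below major version 14, "scram-sha-256" otherwise.
pg_auth_type() {
  local major="${1%%.*}"
  if [ "${major}" -lt 14 ]; then
    echo "md5"
  else
    echo "scram-sha-256"
  fi
}
```

For example, `rds_endpoint pachyderm-db` prints the hostname to paste into postgresqlHost, assuming your instance is named pachyderm-db.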
6. Configure Helm
Run the following commands to add the Pachyderm repo to Helm and install Pachyderm with your values file:
helm repo add pachyderm https://helm.pachyderm.com
helm repo update
helm install pachyderm pachyderm/pachyderm -f my_pachyderm_values.yaml
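Because the values file is easy to mis-indent, it can help to render the chart locally before installing. This sketch wraps the commands above in a hypothetical `install_pachyderm` function, assuming the values file is named my_pachyderm_values.yaml by default. It swaps `helm install` for `helm upgrade --install`, which installs on the first run and upgrades on later runs, so you can re-run it after editing your values.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: add the repo, render the chart, then install/upgrade.
install_pachyderm() {
  local values="${1:-my_pachyderm_values.yaml}"
  if [ ! -f "${values}" ]; then
    echo "values file not found: ${values}" >&2
    return 1
  fi
  helm repo add pachyderm https://helm.pachyderm.com || return 1
  helm repo update || return 1
  # Rendering locally surfaces YAML mistakes before anything reaches the cluster.
  helm template pachyderm pachyderm/pachyderm -f "${values}" > /dev/null || return 1
  helm upgrade --install pachyderm pachyderm/pachyderm -f "${values}"
}
```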
7. Verify Installation
- In a new terminal, run the following command to check the status of your pods:
kubectl get pods
Sample output:
NAME                          READY   STATUS    RESTARTS   AGE
console-5b67678df6-s4d8c      1/1     Running   0          2m8s
etcd-0                        1/1     Running   0          2m8s
pachd-c5848b5c7-zwb8p         1/1     Running   0          2m8s
pg-bouncer-7b855cb797-jqqpx   1/1     Running   0          2m8s
- Re-run this command after a few minutes if pachd is not ready.
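Instead of re-running the check by hand, you can filter the output or block until pachd is ready. This is a sketch with hypothetical function names; `wait_for_pachd` assumes the pachd pods carry the label app=pachd, which you should confirm with `kubectl get pods --show-labels` before relying on it.

```shell
#!/usr/bin/env bash
# Read `kubectl get pods` output on stdin and print every pod whose READY
# count is incomplete or whose STATUS is not Running. Pods of completed
# Jobs (STATUS Completed) are reported too.
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2] || $3 != "Running") print $1 }'
}

# Block until pachd reports Ready or the timeout (default 300s) expires.
# Assumption: pods are labeled app=pachd. kubectl wait fails immediately
# if no matching pod exists yet.
wait_for_pachd() {
  kubectl wait --for=condition=Ready pod -l app=pachd --timeout="${1:-300s}"
}
```

Usage: `kubectl get pods | not_ready_pods` prints nothing once every pod is ready.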
8. Connect to Cluster
You’ll need your organization’s cluster URL (proxy.host) value to connect.
- Run the following command to get your cluster URL:
kubectl get services | grep pachyderm-proxy | awk '{print $4}'
- Connect to your cluster over HTTP, or over HTTPS if TLS is configured. Run only the command that matches your setup:
pachctl connect http://pachyderm.<your-proxy.host-value>
pachctl connect https://pachyderm.<your-proxy.host-value>
Optionally, open your browser and navigate to the Console UI.
You can check your Pachyderm version and connection to pachd at any time with the following command:
pachctl version
COMPONENT   VERSION
pachctl     2.8.0
pachd       2.8.0
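The output lists the client (pachctl) and server (pachd) versions separately, and a mismatch between them is worth investigating. This small filter is a sketch with a hypothetical name, `versions_match`; it assumes the two-column layout shown above.

```shell
#!/usr/bin/env bash
# Read `pachctl version` output on stdin; succeed only when the pachctl and
# pachd rows report the same version string.
versions_match() {
  awk '$1 == "pachctl" { c = $2 } $1 == "pachd" { d = $2 } END { exit !(c != "" && c == d) }'
}
```

Usage: `pachctl version | versions_match || echo "pachctl and pachd versions differ"`.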