Key Features and Benefits
Pachyderm's key features fall into four areas: data-driven pipelines, version control, autoscaling and deduplication, and infrastructure flexibility.
Data-driven Pipelines
- Automatically trigger pipelines based on changes in the data.
- Orchestrate batch or real-time data pipelines.
- Process only the data that has changed, instead of reprocessing everything.
- Provide reproducibility and data lineage across all pipelines.
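The idea behind processing only changed data can be illustrated with a small incremental-processing sketch. This is not Pachyderm's API or scheduler. It assumes that files are fingerprinted by content hash and that a run reprocesses only files whose hash differs from the previous run. The names `content_hash` and `incremental_run` are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Tuple


def content_hash(data: bytes) -> str:
    """Fingerprint a file's contents so that changes can be detected."""
    return hashlib.sha256(data).hexdigest()


def incremental_run(
    files: Dict[str, bytes],
    previous_index: Dict[str, str],
    transform: Callable[[bytes], bytes],
) -> Tuple[Dict[str, bytes], Dict[str, str]]:
    """Apply `transform` only to files that are new or changed since the last run.

    Hypothetical helper: returns the outputs for the processed files and the
    updated hash index, which the caller keeps for the next run.
    """
    current_index = {path: content_hash(data) for path, data in files.items()}
    changed = [p for p, h in current_index.items() if previous_index.get(p) != h]
    outputs = {path: transform(files[path]) for path in sorted(changed)}
    return outputs, current_index


if __name__ == "__main__":
    upper = lambda b: b.upper()
    # First run: every file is new, so both files are processed.
    run1, index = incremental_run({"a.txt": b"x", "b.txt": b"y"}, {}, upper)
    # Second run: only b.txt changed, so only b.txt is processed.
    run2, index = incremental_run({"a.txt": b"x", "b.txt": b"z"}, index, upper)
    print(sorted(run1), sorted(run2))
```

Keeping the hash index between runs is what lets a data change trigger work proportional to the change rather than to the whole data set.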
Version Control
- Track every change to your data automatically.
- Work with any file type.
- Support collaboration through a Git-like structure of commits.
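A Git-like commit structure for data can be sketched as a chain of snapshots. In this model, each commit records its parent and maps every file path to a content hash, so earlier versions stay readable. This is a toy, in-memory model for illustration only and does not reflect Pachyderm's actual storage format or client API. `ToyRepo` and its methods are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, Iterator, Optional


@dataclass
class Commit:
    id: str
    parent: Optional[str]
    files: Dict[str, str]  # path -> content hash of the file at this commit


class ToyRepo:
    """Hypothetical in-memory repository with Git-like commits over data files."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}     # content hash -> file bytes
        self.commits: Dict[str, Commit] = {}  # commit id -> commit
        self.head: Optional[str] = None

    def commit(self, changes: Dict[str, bytes]) -> str:
        # Start from the parent's snapshot so unchanged files carry over.
        files = dict(self.commits[self.head].files) if self.head else {}
        for path, data in changes.items():
            digest = hashlib.sha256(data).hexdigest()
            self.blobs[digest] = data
            files[path] = digest
        # The commit id is derived from the parent id and the full snapshot.
        payload = json.dumps({"parent": self.head, "files": files}, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = Commit(commit_id, self.head, files)
        self.head = commit_id
        return commit_id

    def read(self, path: str, commit_id: Optional[str] = None) -> bytes:
        # Read a file as of a given commit (defaults to HEAD).
        commit = self.commits[commit_id or self.head]
        return self.blobs[commit.files[path]]

    def log(self) -> Iterator[str]:
        # Walk from HEAD back to the first commit.
        commit_id = self.head
        while commit_id is not None:
            yield commit_id
            commit_id = self.commits[commit_id].parent


if __name__ == "__main__":
    repo = ToyRepo()
    first = repo.commit({"sales.csv": b"q1"})
    second = repo.commit({"sales.csv": b"q1,q2"})
    print(repo.read("sales.csv", first), repo.read("sales.csv"), list(repo.log()))
```

Because a commit only stores hashes, collaborators can refer to an exact version of the data by commit id.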
Autoscaling and Deduplication
- Autoscale jobs based on resource demand.
- Automatically parallelize large data sets.
- Automatically deduplicate data across repositories.
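One common way to deduplicate data across repositories is a shared content-addressed store. In such a store, identical chunks hash to the same key and are kept only once. The sketch below uses fixed-size chunking and SHA-256 keys as assumptions for illustration. It is not a description of Pachyderm's internal storage layer, and `ChunkStore` is a hypothetical name.

```python
import hashlib
from typing import Dict, List


class ChunkStore:
    """Hypothetical content-addressed chunk store shared by several repositories."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.chunks: Dict[str, bytes] = {}  # chunk hash -> chunk bytes

    def put(self, data: bytes) -> List[str]:
        """Split data into fixed-size chunks and return their keys in order."""
        keys = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # an identical chunk is stored once
            keys.append(key)
        return keys

    def get(self, keys: List[str]) -> bytes:
        """Reassemble a file from its chunk keys."""
        return b"".join(self.chunks[k] for k in keys)


if __name__ == "__main__":
    store = ChunkStore()
    repo_a = {"x.bin": store.put(b"abcdabcd")}  # two identical chunks
    repo_b = {"y.bin": store.put(b"abcdefgh")}  # shares "abcd" with repo_a
    print(len(store.chunks), store.get(repo_b["y.bin"]))
```

Each repository keeps only lists of chunk keys, so the bytes shared between the two files are stored once regardless of how many repositories reference them.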
Flexibility and Infrastructure Agnosticism
- Use existing cloud or on-premises infrastructure.
- Process any data type, size, or scale in batch or real-time pipelines.
- Build on a container-native architecture that lets developers package each pipeline step with the languages and libraries they choose.
- Integrate with existing tools and services, including CI/CD, logging, authentication, and data APIs.