oss-data-tools-landscape

Platform Management Tools

Platform management tools are essential for maintaining, optimizing, and governing data platforms. They encompass a wide range of functionalities including data quality assurance, data governance, workflow management, automation, and even environmental impact management (Green IT). These tools help organizations ensure the reliability, compliance, and efficiency of their data operations.

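They can be broadly categorized into six main areas: data quality and testing, governance, automation, Green IT, workflow management, and compliance and security.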

Available Tools

Here are summary tables of the main platform management tools we have identified, grouped by subcategory and listed alphabetically.

Data Quality & Testing

| Tool | Subcategory | Creation Date | Stars | Forks | Contributors | Last Release | Latest Commit | Meets Criteria* | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deequ | Data Quality & Testing | 07/08/2018 | 3508 | 567 | 72 | 19/08/2025 | 27/08/2025 | Yes | https://github.com/awslabs/deequ |
| Elementary | Data Quality & Testing | 30/08/2021 | 2154 | 197 | 74 | 31/08/2025 | 17/09/2025 | Yes | https://github.com/elementary-data/elementary |
| Great Expectations | Data Quality & Testing | 11/09/2017 | 10764 | 1625 | 402 | 16/09/2025 | 16/09/2025 | Yes | https://github.com/great-expectations/great_expectations |
| Soda Core | Data Quality & Testing | 14/12/2020 | 2173 | 242 | 49 | 12/06/2025 | 08/08/2025 | Yes | https://github.com/sodadata/soda-core |

Governance

| Tool | Subcategory | Creation Date | Stars | Forks | Contributors | Last Release | Latest Commit | Meets Criteria* | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Amundsen | Governance | 14/05/2019 | 4650 | 971 | 213 | 14/08/2024 | 02/04/2025 | Yes | https://github.com/amundsen-io/amundsen |
| Apache Atlas | Governance | 22/07/2017 | 2001 | 895 | 138 | N/A | 14/09/2025 | Yes | https://github.com/apache/atlas |
| Datahub | Governance | 18/11/2015 | 11040 | 3212 | 411 | 22/08/2025 | 17/09/2025 | Yes | https://github.com/datahub-project/datahub |
| Magda | Governance | 23/08/2016 | 562 | 98 | 30 | 11/09/2025 | 11/09/2025 | Yes | https://github.com/magda-io/magda |
| Marquez | Governance | 05/07/2018 | 2016 | 367 | 101 | 24/10/2024 | 27/03/2025 | Yes | https://github.com/MarquezProject/marquez |
| Open Metadata | Governance | 01/08/2021 | 7541 | 1427 | 353 | 17/09/2025 | 17/09/2025 | Yes | https://github.com/open-metadata/OpenMetadata |
| Spline | Governance | 30/05/2017 | 640 | 159 | 24 | 14/07/2025 | 17/09/2025 | Yes | https://github.com/AbsaOSS/spline |

Automation

| Tool | Subcategory | Creation Date | Stars | Forks | Contributors | Last Release | Latest Commit | Meets Criteria* | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| n8n | Automation | 22/06/2019 | 138629 | 43815 | 420 | 17/09/2025 | 17/09/2025 | Yes | https://github.com/n8n-io/n8n |
| Zapier Platform | Automation | 06/06/2019 | 436 | 211 | 70 | N/A | 17/09/2025 | Yes | https://github.com/zapier/zapier-platform |

Green IT

| Tool | Subcategory | Creation Date | Stars | Forks | Contributors | Last Release | Latest Commit | Meets Criteria* | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cloud Carbon Footprint | Green IT | 17/11/2020 | 979 | 307 | 97 | 11/05/2024 | 07/07/2024 | Yes | https://github.com/cloud-carbon-footprint/cloud-carbon-footprint |
| Code Carbon | Green IT | 12/05/2020 | 1551 | 222 | 83 | 15/07/2025 | 15/09/2025 | Yes | https://github.com/mlco2/codecarbon |
| Green Analysis Tools | Green IT | 27/12/2018 | 156 | 36 | 9 | 27/08/2022 | 03/02/2025 | No | https://github.com/cnumr/GreenIT-Analysis |
| SCI | Green IT | 30/06/2021 | 284 | 56 | 15 | 18/04/2024 | 07/08/2025 | No | https://github.com/Green-Software-Foundation/sci |

Workflow manager

| Tool | Subcategory | Creation Date | Stars | Forks | Contributors | Last Release | Latest Commit | Meets Criteria* | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Airflow | Workflow manager | 13/04/2015 | 42421 | 15610 | 416 | 29/08/2025 | 17/09/2025 | Yes | https://github.com/apache/airflow |
| Apache Dolphinscheduler | Workflow manager | 01/03/2019 | 13831 | 4889 | 359 | 25/08/2025 | 17/09/2025 | Yes | https://github.com/apache/dolphinscheduler |
| Argo Workflows | Workflow manager | 21/08/2017 | 16024 | 3376 | 421 | 11/09/2025 | 17/09/2025 | Yes | https://github.com/argoproj/argo-workflows |
| Dagster | Workflow manager | 30/04/2018 | 14014 | 1816 | 411 | 11/09/2025 | 17/09/2025 | Yes | https://github.com/dagster-io/dagster |
| Kestra | Workflow manager | 24/08/2019 | 21177 | 1829 | 183 | 16/09/2025 | 17/09/2025 | Yes | https://github.com/kestra-io/kestra |
| Luigi | Workflow manager | 20/09/2012 | 18485 | 2431 | 343 | 06/12/2024 | 16/05/2025 | Yes | https://github.com/spotify/luigi |
| Mage.ai | Workflow manager | 16/05/2022 | 8466 | 869 | 143 | 03/09/2025 | 11/09/2025 | Yes | https://github.com/mage-ai/mage-ai |
| Prefect | Workflow manager | 29/06/2018 | 20379 | 1942 | 353 | 17/09/2025 | 17/09/2025 | Yes | https://github.com/PrefectHQ/prefect |

Compliance & Security

| Tool | Subcategory | Creation Date | Stars | Forks | Contributors | Last Release | Latest Commit | Meets Criteria* | Link |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ARX | Compliance & Security | 13/04/2015 | 37015 | 14270 | 417 | 05/11/2024 | 07/11/2024 | No | https://github.com/arx-deidentifier/arx |
| Amnesia | Compliance & Security | 13/04/2015 | 37015 | 14270 | 417 | 05/11/2024 | 07/11/2024 | No | https://github.com/dTsitsigkos/Amnesia |

*Criteria: >40 contributors, >500 stars, and recent releases/commits
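
As a rough illustration of how these criteria could be applied, the sketch below filters a few rows copied from the tables above. The footnote does not define "recent", so the 365-day window, the reference date, the use of the latest commit alone, and the names `Tool` and `meets_criteria` are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: "recent" means a commit within 365 days of the reference date.
# The footnote itself does not give an exact window.
RECENT_WINDOW = timedelta(days=365)
REFERENCE_DATE = date(2025, 9, 17)  # latest commit date that appears in the tables


@dataclass
class Tool:
    name: str
    stars: int
    contributors: int
    latest_commit: date


def meets_criteria(tool: Tool, today: date = REFERENCE_DATE) -> bool:
    """Return True for >40 contributors, >500 stars, and a recent commit."""
    is_recent = today - tool.latest_commit <= RECENT_WINDOW
    return tool.contributors > 40 and tool.stars > 500 and is_recent


# Rows copied from the tables above (dates converted from DD/MM/YYYY).
tools = [
    Tool("Deequ", 3508, 72, date(2025, 8, 27)),
    Tool("Code Carbon", 1551, 83, date(2025, 9, 15)),
    Tool("Green Analysis Tools", 156, 9, date(2025, 2, 3)),
    Tool("SCI", 284, 15, date(2025, 8, 7)),
]

for tool in tools:
    print(f"{tool.name}: {'Yes' if meets_criteria(tool) else 'No'}")
```

Under these assumptions the four sample rows get the same Yes/No values as the "Meets Criteria" column; a different definition of "recent" could change the outcome for other rows.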

Tool Descriptions by Subcategory

Data Quality & Testing

  1. Deequ: A library built on top of Apache Spark for defining “unit tests for data”, which helps ensure data quality and integrity at scale. It lets you define data quality constraints and compute data quality metrics on large datasets.
  2. Elementary: An open-source data observability tool that monitors data quality and pipeline operations. It integrates with dbt and provides automated data quality testing, monitoring, and alerting capabilities.
  3. Great Expectations: A Python-based open-source library for validating, documenting, and profiling data. It helps teams maintain data quality and improve communication about data between teams.
  4. Soda Core: An open-source framework for data quality testing and monitoring. It allows users to create data quality checks using a simple YAML syntax and supports multiple data sources.
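
To make the idea of "unit tests for data" concrete, here is a minimal sketch in plain Python. It does not use the API of Deequ, Great Expectations, Soda Core, or Elementary; the helper names (`is_complete`, `is_unique`, `is_non_negative`, `run_checks`) and the sample `orders` data are hypothetical and only illustrate the general pattern of declaring constraints and evaluating them against a dataset.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]
Check = Callable[[List[Row]], bool]


def is_complete(column: str) -> Check:
    """Constraint: no row has a missing (None) value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def is_unique(column: str) -> Check:
    """Constraint: every value in `column` appears exactly once."""
    def check(rows: List[Row]) -> bool:
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    return check


def is_non_negative(column: str) -> Check:
    """Constraint: every non-missing value in `column` is >= 0."""
    return lambda rows: all(
        row[column] >= 0 for row in rows if row.get(column) is not None
    )


def run_checks(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, bool]:
    """Evaluate each named constraint and report pass/fail."""
    return {name: check(rows) for name, check in checks.items()}


# Hypothetical sample data with one missing amount.
orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": None},
    {"order_id": 3, "amount": 5.00},
]

results = run_checks(
    orders,
    {
        "order_id is complete": is_complete("order_id"),
        "order_id is unique": is_unique("order_id"),
        "amount is complete": is_complete("amount"),
        "amount is non-negative": is_non_negative("amount"),
    },
)
for name, passed in results.items():
    print(f"{'PASS' if passed else 'FAIL'}  {name}")
```

The tools listed above apply the same pattern at scale, typically pushing the checks down to Spark or a SQL engine and adding reporting and alerting on top.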

Governance

  1. Amundsen: A data discovery and metadata engine for improving the productivity of data analysts, data scientists, and engineers. It provides a searchable catalog of data resources across your organization.
  2. Apache Atlas: A scalable and extensible set of core foundational governance services. It enables enterprises to effectively and efficiently meet their compliance requirements and improve their data governance.
  3. Datahub: A modern data catalog built to enable end-to-end data discovery, data observability, and data governance. It provides a unified view of all data assets across an organization.
  4. Magda: A modern data catalog system that helps organizations manage, discover, and share their data assets. It focuses on federated data discovery and provides rich metadata management capabilities.
  5. Marquez: An open-source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It provides lineage tracking and version control for data assets.
  6. Open Metadata: A unified platform for metadata management, data discovery, and data governance. It offers comprehensive features for data quality, lineage, and collaboration.
  7. Spline: A data lineage tracking tool that automatically captures and visualizes data lineage from Apache Spark applications. It provides detailed insights into data transformations and dependencies.
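
Several of these tools track data lineage. As a simplified sketch of the underlying idea, lineage can be modeled as a graph from each dataset to the datasets it is derived from. The dataset names and the `upstream` helper below are hypothetical and unrelated to the actual data models of Marquez, Spline, or the other tools.

```python
from typing import Dict, List, Set

# Hypothetical lineage edges: each dataset maps to the datasets it is derived from.
lineage: Dict[str, List[str]] = {
    "raw_orders": [],
    "raw_customers": [],
    "clean_orders": ["raw_orders"],
    "customer_orders": ["clean_orders", "raw_customers"],
    "revenue_report": ["customer_orders"],
}


def upstream(dataset: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every dataset that `dataset` depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(graph.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph.get(current, []))
    return seen


# Which sources would affect the report if they changed?
print(sorted(upstream("revenue_report", lineage)))
```

A traversal like this is what makes impact analysis possible: given a report, it lists every source whose change or failure could affect it.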

Automation

  1. n8n: An extensible workflow automation tool focused on ease of use, flexibility, and self-hosting. It offers a no-code interface for building complex workflows, supports over 200 integrations, and lets users create custom nodes and functionality.
  2. Zapier Platform: An open-source platform for building integrations and automations between web applications. It provides a CLI and SDK for developers to create custom “Zaps” (automated workflows) and integrate new apps into the Zapier ecosystem.

Green IT

  1. Cloud Carbon Footprint: An application that estimates the energy use and carbon emissions of cloud computing workloads across different cloud providers.
  2. Code Carbon: A Python package that estimates the carbon emissions produced by computing resources during code execution.
  3. Green Analysis Tools: A set of tools for analyzing the environmental impact of software applications and websites.
  4. SCI: A specification that describes how to calculate a carbon intensity score for software applications, helping organizations measure and reduce their software’s environmental impact.
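
As we understand the SCI specification, the score is computed as ((E × I) + M) per R, where E is the energy consumed (kWh), I is the carbon intensity of that energy (gCO2eq/kWh), M is the embodied emissions attributed to the software (gCO2eq), and R is the functional unit (for example, per request or per user). The sketch below applies that formula; the function name `sci_score` and all input values are hypothetical.

```python
def sci_score(
    energy_kwh: float,
    carbon_intensity: float,
    embodied_emissions: float,
    functional_units: float,
) -> float:
    """Software Carbon Intensity: ((E * I) + M) per R, in gCO2eq per functional unit."""
    if functional_units <= 0:
        raise ValueError("functional_units (R) must be positive")
    operational = energy_kwh * carbon_intensity  # E * I, in gCO2eq
    return (operational + embodied_emissions) / functional_units


# Hypothetical inputs: 12 kWh consumed, 400 gCO2eq/kWh grid intensity,
# 1200 gCO2eq embodied emissions, 10,000 API requests served.
score = sci_score(
    energy_kwh=12.0,
    carbon_intensity=400.0,
    embodied_emissions=1200.0,
    functional_units=10_000,
)
print(f"{score:.2f} gCO2eq per request")
```

Because the score is a rate rather than a total, it can be compared across releases even when traffic changes.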

Workflow Manager

  1. Airflow: A platform to programmatically author, schedule, and monitor workflows. It allows you to define complex pipelines as directed acyclic graphs (DAGs).
  2. Apache Dolphinscheduler: A distributed and extensible workflow scheduler platform with a focus on cloud native architecture. It provides powerful scheduling capabilities and visual workflow management.
  3. Argo Workflows: A container-native workflow engine for orchestrating parallel jobs on Kubernetes. It’s designed for scenarios requiring complex job orchestration.
  4. Dagster: A data orchestrator for machine learning, analytics, and ETL. It provides a unified view of data pipelines and the assets they produce.
  5. Kestra: A modern orchestration and scheduling platform with powerful features for building and monitoring data pipelines.
  6. Luigi: A Python package that helps you build complex pipelines of batch jobs. It handles dependency resolution and workflow management.
  7. Mage.ai: A modern data pipeline tool for transforming and integrating data. It offers a user-friendly interface for building data pipelines.
  8. Prefect: A workflow management system designed for modern data stacks. It provides a flexible and intuitive API for defining and executing workflows.
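
The common thread across these tools is dependency resolution over a directed acyclic graph (DAG): a task runs only after everything it depends on has finished. The sketch below shows that idea using only Python's standard library (`graphlib`, available from Python 3.9). It is not the API of Airflow, Luigi, or any other tool listed here, and the task names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies: Dict[str, Set[str]] = {
    "extract": set(),
    "transform": {"extract"},
    "quality_check": {"transform"},
    "load": {"quality_check"},
    "report": {"load"},
    "notify": {"load"},
}

executed: List[str] = []


def make_task(name: str) -> Callable[[], None]:
    """Build a placeholder task that just records that it ran."""
    def task() -> None:
        executed.append(name)
    return task


tasks = {name: make_task(name) for name in dependencies}


def run_pipeline(
    deps: Dict[str, Set[str]], task_map: Dict[str, Callable[[], None]]
) -> List[str]:
    """Run every task after its dependencies; raises CycleError if deps is not a DAG."""
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        task_map[name]()
    return order


order = run_pipeline(dependencies, tasks)
print(" -> ".join(order))
```

Real workflow managers build scheduling, retries, parallel execution, and monitoring on top of this ordering step.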

Compliance & Security

  1. ARX: Comprehensive open-source software for anonymizing sensitive personal data. It implements a wide variety of privacy-preserving techniques and risk metrics for statistical disclosure control.
  2. Amnesia: A data anonymization tool that focuses on the protection of personal data through various anonymization techniques. It provides both a graphical user interface and API access for data anonymization tasks.
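
One widely used privacy model in this space is k-anonymity: every combination of quasi-identifier values must be shared by at least k records. The sketch below measures k before and after generalizing an age column into ranges. It is an illustration in plain Python, not the API of ARX or Amnesia; the helper names and the `patients` records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def generalize_age(age: int, bucket: int = 10) -> str:
    """Replace an exact age with a range such as '30-39'."""
    low = (age // bucket) * bucket
    return f"{low}-{low + bucket - 1}"


def k_anonymity(records: List[Dict[str, object]], quasi_identifiers: Sequence[str]) -> int:
    """Return the size of the smallest group sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


# Hypothetical records; "age" and "zip" are treated as quasi-identifiers.
patients = [
    {"age": 34, "zip": "75001", "diagnosis": "A"},
    {"age": 37, "zip": "75001", "diagnosis": "B"},
    {"age": 52, "zip": "75002", "diagnosis": "A"},
    {"age": 58, "zip": "75002", "diagnosis": "C"},
]

generalized = [{**p, "age": generalize_age(p["age"])} for p in patients]

k_before = k_anonymity(patients, ["age", "zip"])
k_after = k_anonymity(generalized, ["age", "zip"])
print(f"k before generalization: {k_before}")
print(f"k after generalization:  {k_after}")
```

Generalization trades precision for privacy: the coarser the ranges, the larger the groups and the harder it is to single out an individual.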

Making the Right Choice

When choosing platform management tools, start from the problem you need to solve:

For ensuring data quality, tools like Great Expectations or Deequ provide robust solutions. For data governance, platforms like Datahub or Open Metadata offer comprehensive capabilities. Complex workflow management might require tools like Airflow or Dagster. For business process automation, both n8n and Zapier Platform provide flexible solutions. Organizations focusing on environmental impact can leverage tools like Code Carbon or Cloud Carbon Footprint.

Remember that effective platform management often requires a combination of tools to address different aspects of data operations. The key is choosing tools that integrate well with your existing data stack and align with your organization’s strategy and goals.

The open-source community has developed numerous solutions for various aspects of data handling, including: