Data Governance and Data Pipeline - Data Engineering Digest

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

Summary: Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that goal. Data governance is the binding force between these two parts of the organization. At what point does a lack of an explicit governance policy become a liability?

Data Governance

Data Governance Government Data Lake High Quality Data

Low Friction Data Governance With Immuta

Data Engineering Podcast

DECEMBER 21, 2020

Summary: Data governance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. What is data governance? How is the Immuta platform architected?

Data Governance

Data Governance Government Data Lake Banking

A Holistic Approach To Data Governance Through Self Reflection At Collibra

Data Engineering Podcast

MAY 20, 2021

Summary: Data governance is a phrase that means many different things to many different people. This is because it is actually a concept that encompasses the entire lifecycle of data, across all of the people in an organization who interact with it.

Data Governance

Data Governance Government Data Warehouse Data

Beyond Legacy Detection: How AI-Driven Data Governance Surpasses Traditional Methods

Striim

MARCH 4, 2025

These incidents serve as a stark reminder that legacy data governance systems, built for a bygone era, are struggling to fend off modern cyber threats. They react too slowly, too rigidly, and can’t keep pace with the dynamic, sophisticated attacks occurring today, leaving hackable data exposed.

Data Governance

Data Governance Government Healthcare NoSQL

Simplified End-to-End Development for Production-Ready Data Pipelines, Applications, and ML Models

Snowflake

JUNE 4, 2024

Snowflake’s new Python API (GA soon) simplifies data pipelines and is readily available through pip install snowflake. Additionally, Dynamic Tables are a new table type that you can use at every stage of your processing pipeline. Interact with Snowflake objects directly in Python. Automate or code, the choice is yours.

Data Pipeline

Data Pipeline Python SQL Database

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. We’ll answer the question, “What are data pipelines?”

Data Pipeline

Data Pipeline Designing Data Lake Data Warehouse

Data governance beyond SDX: Adding third party assets to Apache Atlas

Cloudera

MARCH 9, 2021

In this blog, we’ll highlight the key CDP aspects that provide data governance and lineage and show how they can be extended to incorporate metadata for non-CDP systems from across the enterprise. Sketch of the end-to-end data pipeline. Apache Atlas as a fundamental part of SDX. Assets: Files. RDBMS Database Table.

Data Governance

Data Governance Government Metadata Datasets

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

AI data engineers are data engineers who are responsible for developing and managing data pipelines that support AI and GenAI data products. Essential Skills for AI Data Engineers: Expertise in Data Pipelines and ETL Processes. A foundational skill for data engineers?

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

How does the focus on data assets/data products shift your approach to observability as compared to a table/pipeline centric approach? With the focus on sharing ownership beyond the boundaries of the data team, there is a strong correlation with data governance principles. Want to see Starburst in action?

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

How To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

As we look towards 2025, it’s clear that data teams must evolve to meet the demands of evolving technology and opportunities. In this blog post, we’ll explore key strategies that data teams should adopt to prepare for the year ahead. The anticipated growth in data pipelines presents both challenges and opportunities.

Data Pipeline

Data Pipeline Metadata Data Workflow Data

An IBM Z Data Integration Success Story

Precisely

MARCH 28, 2025

The data generated was as varied as the departments relying on these applications. Some departments used IBM Db2, while others relied on VSAM files or IMS databases, creating complex data governance processes and costly data pipeline maintenance.

Data Integration

Data Integration Pipeline-centric Database-centric Kafka

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

Kafka

Kafka Data Lake High Quality Data SQL

Gain an AI Advantage with Data Governance and Quality

Precisely

AUGUST 29, 2024

Key Takeaways: Data quality ensures your data is accurate, complete, reliable, and up to date – powering AI conclusions that reduce costs and increase revenue and compliance. Data observability continuously monitors data pipelines and alerts you to errors and anomalies. stored: where is it located?

Data Governance

Data Governance Government High Quality Data Datasets

Data Governance: Framework, Tools, Principles, Benefits

Knowledge Hut

APRIL 20, 2023

Data governance refers to the set of policies, procedures, mix of people and standards that organisations put in place to manage their data assets. It involves establishing a framework for data management that ensures data quality, privacy, security, and compliance with regulatory requirements.

Data Governance

Data Governance Government Data Cleanse Data Security

The Challenge of Data Quality and Availability—And Why It’s Holding Back AI and Analytics

Striim

APRIL 18, 2025

Business Intelligence Needs Fresh Insights: Data-driven organizations make strategic decisions based on dashboards, reports, and real-time analytics. If data is delayed, outdated, or missing key details, leaders may act on the wrong assumptions. Poor data management can lead to compliance risks, legal issues, and reputational damage.

High Quality Data

High Quality Data Business Intelligence Unstructured Data Data Pipeline

The last (but not least)”ops” you need for your data : DataGovops

François Nguyen

JANUARY 18, 2021

To finish the trilogy (DataOps, MLOps), let’s talk about DataGovOps, or how you can support your Data Governance initiative. The last part added covers data security and privacy. Every data governance policy on this topic must be readable by code so that it can act on your data platform (access management, masking, etc.).

Data Governance

Data Governance Metadata Government Data Pipeline

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Want to see Starburst in action?

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

6 Ways To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

As we look towards 2025, it’s clear that data teams must evolve to meet the demands of evolving technology and opportunities. In this blog post, we’ll explore key strategies that data teams should adopt to prepare for the year ahead. The anticipated growth in data pipelines presents both challenges and opportunities.

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

SQL

SQL Data Lake High Quality Data Machine Learning

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality. How do we build data products? How can we interoperate between the data domains? We want interoperability for any stored data, yet we also have to think about how to store the data in a specific node to optimize the processing.

Technology

Technology Architecture Google Cloud Metadata

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Want to see Starburst in action?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Building

Building Data Lake High Quality Data Machine Learning

Data Engineering Weekly #198

Data Engineering Weekly

NOVEMBER 24, 2024

The Recommendation Platform (RecP) leverages a structured pipeline approach to standardize the resolution of machine learning challenges, allowing for component reusability across various use cases and enabling customers to define complex recommendation logic.

Data Engineering

Data Engineering Data Engineer Engineering Insurance

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog: Data Engineering

MAY 20, 2024

Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.

Data Pipeline

Data Pipeline BI Data Lake Data Warehouse

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

Data Lake

Data Lake High Quality Data BI Data Workflow

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Building

Building Data Lake High Quality Data Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

Data Lake

Data Lake High Quality Data Government Machine Learning

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Want to see Starburst in action?

Management

Management Data Lake High Quality Data Machine Learning

The Future of Data Governance: 4 Trends to Watch Out For

Monte Carlo

JANUARY 24, 2024

Data is among your company’s most valuable commodities, but only if you know how to manage it. More data, more access to data, and more regulations mean data governance has become a higher-stakes game. Data Governance Trends The biggest data governance trend isn’t really a trend at all—rather, it’s a state of mind.

Data Governance

Data Governance Government Data Lake Data Architecture

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

We have also included vendors for the specific use cases of ModelOps, MLOps, DataGovOps and DataSecOps which apply DataOps principles to machine learning, AI, data governance, and data security operations. Airflow: An open-source platform to programmatically author, schedule, and monitor data pipelines.

Consulting

Consulting Machine Learning Data Science Data Pipeline

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Programming

Programming Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Database

Database Technology Data Lake High Quality Data

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Want to see Starburst in action?

Process

Process Data Lake High Quality Data Machine Learning

Why Real-Time Data Will Define 2025

Striim

MARCH 14, 2025

The organizations that win in 2025 won’t be the ones with the biggest AI models; they’ll be the ones with real-time, AI-ready data infrastructures that enable continuous learning, adaptive decision-making, and assist regulatory compliance at scale. Static AI models trained on stale data will deliver poor outcomes. What’s changing?

Government

Government Data Pipeline Data Lake Architecture

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

Non-relational Database

Non-relational Database Relational Database Database Designing

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

This episode is brought to you by Starburst - a data lake analytics platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

Data Process

Data Process Process Data Lake High Quality Data

Which Team Should Own Data Quality?

Towards Data Science

JUNE 8, 2023

This post will focus on the most common team ownership models including: data engineering, data reliability engineering, analytics engineering, data quality analysts, and data governance teams. Why is data quality ownership important? The governance team treats every team output as a data product.

Data Governance

Data Governance Government Generalist Data Engineering

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Database

Database Data Lake High Quality Data Data Workflow

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data Architecture Machine Learning

How Skyscanner Enabled Data & AI Governance with Monte Carlo

Monte Carlo

NOVEMBER 21, 2024

We were excited to sit down with Skyscanner’s Principal Software Engineer JM Laplante and Director of Engineering Michael Ewins — fresh off his inspiring presentation at Big Data London — to learn how their teams are harnessing data lineage and observability to enable data governance at scale.

Government

Government Datasets Data Governance Data
