Data Pipeline and High Quality Data - Data Engineering Digest

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka

Kafka Data Lake High Quality Data SQL

The Challenge of Data Quality and Availability—And Why It’s Holding Back AI and Analytics

Striim

APRIL 18, 2025

Without high-quality, available data, companies risk misinformed decisions, compliance violations, and missed opportunities. Why AI and Analytics Require Real-Time, High-Quality Data: To extract meaningful value from AI and analytics, organizations need data that is continuously updated, accurate, and accessible.

High Quality Data

High Quality Data Business Intelligence Unstructured Data Data Pipeline

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Dagster offers a new approach to building and running data platforms and data pipelines.

SQL

SQL Data Lake High Quality Data Machine Learning

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Building

Building Data Lake High Quality Data Machine Learning

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

AI data engineers are data engineers who are responsible for developing and managing data pipelines that support AI and GenAI data products. Essential Skills for AI Data Engineers: Expertise in Data Pipelines and ETL Processes. A foundational skill for data engineers?

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Data Lake

Data Lake High Quality Data BI Data Workflow

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Management

Management Data Lake High Quality Data Machine Learning

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Process

Process Data Lake High Quality Data Machine Learning

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Building

Building Data Lake High Quality Data Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake

Data Lake High Quality Data Government Machine Learning

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Programming

Programming Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Database

Database Technology Data Lake High Quality Data

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Governance

Data Governance Government Data Lake High Quality Data

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Non-relational Database

Non-relational Database Relational Database Database Designing

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Process

Data Process Process Data Lake High Quality Data

Monte Carlo and Databricks Partner to Deliver Data + AI Observability

Monte Carlo

MARCH 19, 2025

Monte Carlo and Databricks double-down on their partnership, helping organizations build trusted AI applications by expanding visibility into the data pipelines that fuel the Databricks Data Intelligence Platform. Read on to discover how we’re helping organizations ensure reliability across the entire data + AI lifecycle.

Unstructured Data

Unstructured Data Data Pipeline High Quality Data Banking

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake Building High Quality Data AWS

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Database

Database Data Lake High Quality Data Data Workflow

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Architecture Machine Learning

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. Data lakes are notoriously complex. … webapps vs. data pipelines vs. exploratory analysis, etc.)

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Architecture

Architecture Data Lake High Quality Data SQL

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Dagster offers a new approach to building and running data platforms and data pipelines.

Project

Project Data Lake High Quality Data Data Workflow

Data Integration for AI: Top Use Cases and Steps for Success

Precisely

FEBRUARY 20, 2025

When your AI has access to all this high-quality data, you gain more relevant insights that help you power better decision-making and foster trust in AI outputs. This applies to both the development quality and performance characteristics of your data pipelines as well as the data quality and overlay governance for this process.

Data Integration

Data Integration Government Datasets Data Pipeline

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Systems

Systems Data Lake High Quality Data Google Cloud

Data Pipelines in the Healthcare Industry

DareData

JULY 29, 2020

With these points in mind, I argue that the biggest hurdle to the widespread adoption of these advanced techniques in the healthcare industry is not intrinsic to the industry itself, or in any way related to its practitioners or patients, but simply the current lack of high-quality data pipelines.

Data Pipeline

Data Pipeline Healthcare Medical Pipeline-centric

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Summary: The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. Sponsored By: Starburst

Systems

Systems Designing Data Lake SQL

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Project

Project Data Lake SQL High Quality Data

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

PostgreSQL

PostgreSQL Data Lake SQL High Quality Data

Data Engineering Weekly #206

Data Engineering Weekly

FEBRUARY 2, 2025

The article advocates for a "shift left" approach to data processing, improving data accessibility, quality, and efficiency for operational and analytical use cases. The CDC approach addresses challenges like time travel, data validation, performance, and cost by replicating operational data to an AWS S3-based Iceberg Data Lake.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

No Python, No SQL Templates, No YAML: Why Your Open Source Data Quality Tool Should Generate 80% Of Your Data Quality Tests Automatically

DataKitchen

FEBRUARY 17, 2025

Current open-source frameworks such as YAML-based Soda Core, Python-based Great Expectations, and dbt SQL help speed up the creation of data quality tests. They all live in the realm of software: domain-specific languages that help you write data quality tests.

SQL

SQL Python Government Data Engineering

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data SQL Architecture

5 Takeaways from the Data Pipeline Automation Summit 2023

Ascend.io

APRIL 27, 2023

Going into the Data Pipeline Automation Summit 2023, we were thrilled to connect with our customers and partners and share the innovations we’ve been working on at Ascend. The summit explored the future of data pipeline automation and the endless possibilities it presents.

Data Pipeline

Data Pipeline Pipeline-centric Data Validation Data Engineering

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake SQL High Quality Data Architecture

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

While working in Azure with our customers, we have noticed several standard Azure tools people use to develop data pipelines and ETL or ELT processes. We counted ten ‘standard’ ways to transform and set up batch data pipelines in Microsoft Azure.

Data Pipeline

Data Pipeline BI Machine Learning Data Preparation

Build vs Buy Data Pipeline Guide

Monte Carlo

APRIL 24, 2023

Build vs. buy orchestration tooling: Unlike the other components we’ve discussed in Part 3, data pipelines don’t require orchestration to be considered functional, at least not at a foundational level. And data orchestration tools are generally easy to stand up for initial use cases. Missed Nishith’s 5 considerations?

Data Pipeline

Data Pipeline Building Data Ingestion BI

Data Engineering Weekly #178

Data Engineering Weekly

JUNE 30, 2024

Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. It is a good reminder to the data industry that we need to solve the fundamentals of data engineering to utilize AI better.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data
