Data Lake and Data Workflow - Data Engineering Digest

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake Building High Quality Data AWS

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council and use code *depod20* to register today!

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka

Kafka Data Lake High Quality Data SQL

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

11 Data Engineering Best Practices To Streamline Your Data Workflows

ProjectPro

JUNE 6, 2025

These practices are crucial for building robust and scalable data pipelines, maintaining data quality, and enabling data-driven decision-making. Let us dive into some of the crucial best practices for data engineering that data engineers must implement in their data workflows and projects.

Data Workflow

Data Workflow Data Engineer Data Engineering Data Cleanse

New Fivetran connector streamlines data workflows for real-time insights

ThoughtSpot

SEPTEMBER 6, 2023

The pathway from ETL to actionable analytics can often feel disconnected and cumbersome, leading to frustration for data teams and long wait times for business users. And even when we manage to streamline the data workflow, those insights aren’t always accessible to users unfamiliar with antiquated business intelligence tools.

Data Workflow

Data Workflow Data Lake Raw Data Business Intelligence

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?

Data Lake

Data Lake High Quality Data Hadoop Government

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data Lake

Data Lake High Quality Data Data Pipeline Government

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Kafka

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.

Data Lake

Data Lake High Quality Data Metadata Government

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Non-relational Database

Non-relational Database Relational Database Database Designing

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data Lake

Data Lake High Quality Data Hadoop Data Pipeline

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. This episode is brought to you by Starburst, an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake.

Process

Process Data Lake High Quality Data Government

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Data Process

Data Process Process Data Lake High Quality Data

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Programming

Programming Data Lake High Quality Data Data Pipeline

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Database

Database Technology Data Lake High Quality Data

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Building

Building Data Lake High Quality Data Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.

Data Lake

Data Lake High Quality Data Government Data Pipeline

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary: Data lakehouse architectures are gaining popularity due to the flexibility and cost-effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. What is involved in integrating Nessie into a given data stack?

Data Lake

Data Lake High Quality Data Architecture Data Pipeline

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Building

Building Data Lake High Quality Data Machine Learning

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Project

Project Data Lake High Quality Data Data Workflow

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Management

Management Data Lake High Quality Data Government

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Architecture

Architecture Data Lake High Quality Data Java

7 Popular Azure ETL Tools for Data Engineers in 2025

ProjectPro

JUNE 6, 2025

Azure Data Factory 2. Azure Data Lake Storage 7. Azure Logic Apps Azure ETL Best Practices for Big Data Projects Get Your Hands-on Azure ETL Projects with ProjectPro! He explores their collaborative potential in orchestrating, exploring, and analyzing data, shaping a secure and comprehensive data engineering landscape.

ETL Tools

ETL Tools Data Engineering Data Engineer Data Lake

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues for every part of your data workflow, from migration to deployment. Datafold has recently launched a 3-in-1 product experience to support accelerated data migrations.

Systems

Systems Designing Data Lake SQL

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

In this episode she shares the practical steps to implementing a data governance practice in your organization, and the pitfalls to avoid. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Governance

Data Governance Government Data Lake High Quality Data

Microsoft Azure Data Factory Training Free For Beginners

ProjectPro

JUNE 6, 2025

The Microsoft Azure Data Factory Training is a beginner-friendly guide that explores the benefits and functionality of the Azure Data Factory. This training course showcases ADF’s scalability, flexibility, and seamless integration with Azure services like Blob Storage, SQL Database, and Data Lake Storage.

Data Lake

Data Lake Cloud Computing Data Workflow Data Pipeline

50+ Azure Data Factory Interview Questions and Answers [2025]

ProjectPro

JUNE 6, 2025

Azure Data Factory is a cloud-based, fully managed, serverless ETL and data integration service offered by Microsoft Azure for automating data movement from its source to, say, a data lake or data warehouse using ETL (extract-transform-load) or ELT (extract-load-transform).

Data Lake

Data Lake Metadata SQL Datasets

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Project

Project Data Lake High Quality Data SQL

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

PostgreSQL

PostgreSQL Data Lake High Quality Data SQL

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

It incorporates elements from several Microsoft products working together, like Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience. No matter the workload, Fabric stores all data on OneLake, a single, unified data lake built on the Delta Lake model.

BI

BI Pipeline-centric Data Lake Google Cloud

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. Data lakes are notoriously complex. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe four topics that make data mesh a reality. With this third platform generation, you get more real-time data analytics and a cost reduction, because this infrastructure is easier to manage in the cloud thanks to managed services.

Technology

Technology Architecture Google Cloud Metadata

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Data Lake

Data Lake High Quality Data SQL Architecture

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Data Lake

Data Lake High Quality Data SQL Architecture

Data Engineering Weekly #206

Data Engineering Weekly

FEBRUARY 2, 2025

Affirm: Expressive Time Travel and Data Validation for Financial Workloads. Affirm migrated from daily MySQL snapshots to Change Data Capture (CDC) replay using Apache Iceberg for its data lake, improving data integrity and governance.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Systems

Systems Data Lake High Quality Data Google Cloud

9 Data Integration Projects For You To Practice in 2025

ProjectPro

JUNE 6, 2025

Design A Data Integration Pipeline Using Airflow And dbt You will develop a comprehensive data pipeline using the Airflow data integration platform in this retail data pipeline project. These steps ensure smooth data processing, maintain data integrity, and enable seamless analysis in retail data environments.

Data Integration

Data Integration Project Data Lake PostgreSQL

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in implementing these constraints to your data workflows.

Metadata

Metadata Data Lake Business Intelligence MongoDB

Azure Databricks: Streamline Your Data Engineering Workflows

ProjectPro

JUNE 6, 2025

“Unlock the potential of your data with Azure Databricks: a unified analytics platform that combines the power of Apache Spark with the ease of Azure.” Azure Databricks is a fully managed service provided by Microsoft that offers the capabilities to create an open data lakehouse within the Azure cloud environment.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake
