Data Lake and Data Pipeline - Data Engineering Digest

Keep Your Data Lake Fresh With Real Time Streams Using Estuary

Data Engineering Podcast

MAY 21, 2023

In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology of Estuary and how you can start using it today to build a real-time data lake without all of the headaches. What is the impact of continuous data flows on DAGs and the orchestration of transforms?

Data Lake, Machine Learning, Kafka, Data Warehouse

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data.

Data Lake, Building, High Quality Data, AWS

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council and use code *depod20* to register today!

Data Lake, High Quality Data, Data Warehouse, Google Cloud

8 Essential Data Pipeline Design Patterns You Should Know

Monte Carlo

NOVEMBER 21, 2024

Whether it’s customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way. That’s where data pipeline design patterns come in.

Data Pipeline, Designing, Lambda Architecture, Kafka

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

AUGUST 3, 2021

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large-scale historical analysis.

Data Lake, Data Warehouse, Hadoop, Architecture

Realtime Data Applications Made Easier With Meroxa

Data Engineering Podcast

APRIL 23, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. RudderStack provides all your customer data pipelines in one platform.

Data Lake, Kafka, Machine Learning, Data Warehouse

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics.

Kafka, Data Lake, High Quality Data, SQL

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. We’ll answer the question, “What are data pipelines?”

Data Pipeline, Designing, Data Lake, Data Warehouse

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Cloudera

AUGUST 13, 2021

Every enterprise is trying to collect and analyze data to get better insights into its business. Whether consuming log files, sensor metrics, or other unstructured data, most enterprises manage and deliver data to the data lake and leverage various applications, such as ETL tools, search engines, and databases, for analysis.

Data Pipeline, Data Lake, ETL Tools, Unstructured Data

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

JUNE 25, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. RudderStack provides all your customer data pipelines in one platform.

Data Engineering, Data Engineer, Python, Engineering

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

SEPTEMBER 11, 2022

Building reliable data pipelines is a complex and costly undertaking with many layered requirements. In order to reduce the amount of time and effort required to build pipelines that power critical insights, Manish Jethani co-founded Hevo Data. Data stacks are becoming more and more complex.

Data Pipeline, Building, MongoDB, MySQL

Exploring Processing Patterns For Streaming Data Integration In Your Data Lake

Data Engineering Podcast

NOVEMBER 20, 2021

Summary: One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. In this episode Ori Rafael shares his experiences at Upsolver building scalable stream processing for integrating and analyzing data, and the tradeoffs involved when coming from a batch-oriented mindset.

Data Lake, Data Integration, Lambda Architecture, Process

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Data Lake, High Quality Data, Machine Learning, Data Pipeline

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?

Data Lake, High Quality Data, Hadoop, Machine Learning

Streaming Data Pipelines Made SQL With Decodable

Data Engineering Podcast

OCTOBER 28, 2021

In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Start trusting your data with Monte Carlo today!

Data Pipeline, SQL, Data Warehouse, Data Lake

Insights And Advice On Building A Data Lake Platform From Someone Who Learned The Hard Way

Data Engineering Podcast

MAY 15, 2022

Summary: Designing a data platform is a complex and iterative undertaking that requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds further layers of difficulty. When is a data lake architecture the wrong choice?

Data Lake, Building, Architecture, BI

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how RisingWave is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

SQL, Data Lake, High Quality Data, Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Data Lake, High Quality Data, Hadoop, Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Building, Data Lake, High Quality Data, Machine Learning

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises.

Data Lake, High Quality Data, BI, Data Workflow

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that Stripe has built by combining Trino and Iceberg, as well as the challenges they face in supporting the myriad workloads thrown at this layer of their data platform.

Data Lake, High Quality Data, Metadata, Machine Learning

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Programming, Data Lake, High Quality Data, Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Database, Technology, Data Lake, High Quality Data

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Building, Data Lake, High Quality Data, Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Dagster offers a new approach to building and running data platforms and data pipelines.

Data Lake, High Quality Data, Government, Machine Learning

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary: Data lakehouse architectures are gaining popularity due to the flexibility and cost-effectiveness they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. What is involved in integrating Nessie into a given data stack?

Data Lake, High Quality Data, Architecture, Machine Learning

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

Microsoft Fabric incorporates elements from several Microsoft products, such as Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience. No matter the workload, Fabric stores all data in OneLake, a single, unified data lake built on the Delta Lake model.

BI, Pipeline-centric, Data Lake, Google Cloud

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Trusted by teams of all sizes, including Comcast and DoorDash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises.

Non-relational Database, Relational Database, Database, Designing

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Pipeline-centric, Engineering, Data Lake, High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Data lakes are notoriously complex. This episode is brought to you by Starburst, an end-to-end data lakehouse platform for data engineers who are battling to build and scale high-quality data pipelines on the data lake.

Management, Data Lake, High Quality Data, Machine Learning

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data.

Data Process, Process, Data Lake, High Quality Data

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. This episode is brought to you by Starburst, an end-to-end data lakehouse platform for data engineers who are battling to build and scale high-quality data pipelines on the data lake.

Process, Data Lake, High Quality Data, Machine Learning

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex.

Database, Data Lake, High Quality Data, Data Workflow

A Roadmap To Bootstrapping The Data Team At Your Startup

Data Engineering Podcast

MAY 28, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. RudderStack provides all your customer data pipelines in one platform.

Data Lake, Machine Learning, Data Warehouse, Education

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

In this episode she shares the practical steps to implementing a data governance practice in your organization, and the pitfalls to avoid. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Governance, Government, Data Lake, High Quality Data

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake, High Quality Data, NoSQL, Data Warehouse

What Happens When The Abstractions Leak On Your Data

Data Engineering Podcast

MAY 14, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. RudderStack provides all your customer data pipelines in one platform.

Data Lake, Machine Learning, Data Warehouse, AWS

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. Data lakes are notoriously complex.

Software Engineering, Software Engineer, Engineering, Data Lake

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

AI data engineers are data engineers who are responsible for developing and managing data pipelines that support AI and GenAI data products. Essential skills for AI data engineers include expertise in data pipelines and ETL processes, a foundational skill for data engineers.

Data Engineering, Data Engineer, Engineering, Unstructured Data

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics.

Architecture, Data Lake, High Quality Data, SQL

Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

Data Engineering Podcast

DECEMBER 28, 2022

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often take hours to days or even weeks. RudderStack helps you build a customer data platform on your warehouse or data lake.

Data Lake, Data Warehouse, Data Pipeline, MongoDB

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data.

Project, Data Lake, High Quality Data, Data Workflow

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems

Data Engineering Podcast

DECEMBER 25, 2022

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often take hours to days or even weeks. RudderStack helps you build a customer data platform on your warehouse or data lake.

Machine Learning, Systems, Data Lake, Data Warehouse

Building Self Serve Business Intelligence With AI And Semantic Modeling At Zenlytic

Data Engineering Podcast

APRIL 16, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. RudderStack provides all your customer data pipelines in one platform.

Business Intelligence, Building, Data Lake, BI