Architecture, Data Lake and Data Workflow

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake Building High Quality Data AWS

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. To start, can you share your definition of what constitutes a "Data Lakehouse"?

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. Materialize]([link] You shouldn't have to throw away the database to build with fast-changing data.

Kafka

Kafka Data Lake High Quality Data SQL

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Data lakes are notoriously complex. With Materialize, you can!

Architecture

Architecture Data Lake High Quality Data SQL

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. With this 3rd platform generation, you have more real time data analytics and a cost reduction because it is easier to manage this infrastructure in the cloud thanks to managed services.

Technology

Technology Architecture Google Cloud Metadata

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Summary Data lakehouse architectures have been gaining significant adoption. To accelerate adoption in the enterprise Microsoft has created the Fabric platform, based on their OneLake architecture. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Machine Learning

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

It incorporates elements from several Microsoft products working together, like Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience. Its multi-cluster shared data architecture is one of its primary features.

BI

BI Pipeline-centric Data Lake Google Cloud

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Technology Data Lake High Quality Data

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. How much of the complexity is due to the nature of streaming data vs. the architectural realities of Flink? How has the lack of visibility into the flow of data in Flink impacted the ways that teams think about where/when/how to apply it? Data lakes are notoriously complex.

Process

Process Data Lake High Quality Data Machine Learning

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Process

Data Process Process Data Lake High Quality Data

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Summary Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. How has that changed the architectural approach to CDPs? Want to see Starburst in action?

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. Can you describe the architecture of Nessie?

Data Lake

Data Lake High Quality Data Architecture Machine Learning

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Programming

Programming Data Lake High Quality Data Machine Learning

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary A significant portion of data workflows involve storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Want to see Starburst in action?

Data Lake

Data Lake High Quality Data Government Machine Learning

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. Datafold has recently launched a 3-in-1 product experience to support accelerated data migrations. Data lakes are notoriously complex.

Systems

Systems Designing Data Lake SQL

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Project

Project Data Lake High Quality Data Data Workflow

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. Materialize]([link] You shouldn't have to throw away the database to build with fast-changing data.

Project

Project Data Lake SQL High Quality Data

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. Materialize]([link] You shouldn't have to throw away the database to build with fast-changing data.

PostgreSQL

PostgreSQL Data Lake SQL High Quality Data

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold Data lakes are notoriously complex. With Materialize, you can! Sponsored By: Starburst : ![Starburst

Software Engineering

Software Engineering Software Engineer Engineering Data Lake

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. Materialize]([link] You shouldn't have to throw away the database to build with fast-changing data.

Data Lake

Data Lake High Quality Data SQL Architecture

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. With Materialize, you can! That’s three free boards at dataengineeringpodcast.com/miro.

Data Lake

Data Lake SQL High Quality Data Architecture

Data Engineering Weekly #206

Data Engineering Weekly

FEBRUARY 2, 2025

[link] Adam Bellemare & Thomas Betts: The End of the Bronze Age: Rethinking the Medallion Architecture I’m always a bit uncomfortable with medallion architecture since it is a glorified term for the traditional ETL process. link] All rights reserved ProtoGrowth Inc, India.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Benjamin Kennedy, Cloud Solutions Architect at Striim, emphasizes the outcome-driven nature of data pipelines.

Data Pipeline

Data Pipeline Designing Data Lake Data Warehouse

DataOps Architecture: 5 Key Components and How to Get Started

Databand.ai

AUGUST 30, 2023

DataOps Architecture: 5 Key Components and How to Get Started Ryan Yackel August 30, 2023 What Is DataOps Architecture? DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. As a result, they can be slow, inefficient, and prone to errors.

Architecture

Architecture Data Ingestion Data Governance Data Cleanse

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing: Table of Contents What is data pipeline architecture? Why is data pipeline architecture important? What is data pipeline architecture? Why is data pipeline architecture important?

Data Pipeline

Data Pipeline Architecture Data Lake Data Warehouse

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

SEPTEMBER 11, 2022

RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control.

Data Pipeline

Data Pipeline Building MongoDB MySQL

Getting the Most From Your Modern Data Platform: A Three-Phase Approach

Snowflake

JULY 22, 2024

Focus areas include: Migrating your current data functionalities and KPIs to Snowflake. Determining an architecture and a scalable data model to integrate more source systems in the future. top modernizing your data lake with Snowflake, watch our on demand webinar.

Government

Government Data Cloud Hadoop

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-

Data Engineering Podcast

JULY 3, 2022

In order to quickly identify if and how two data systems are out of sync Gleb Mezhanskiy and Simon Eskildsen partnered to create the open source data-diff utility. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability.

Data Integration

Data Integration MongoDB MySQL Scala

Fire Your Super-Smart Data Consultants with DataOps

DataKitchen

JANUARY 25, 2022

When an especially gifted data engineer decided to leave, it left the team scrambling to support routine functions. The dynamic nature of the consulting team meant that architectural decisions made at the data engineering level were often short-sighted and incoherent.

Consulting

Consulting Recruitment Data Lake Data Engineering

The State of Data Engineering in 2024: Key Insights and Trends

Data Engineering Weekly

DECEMBER 16, 2024

Evolution of Data Lake Technologies The data lake ecosystem has matured significantly in 2024, particularly in table formats and storage technologies. This architecture incorporates real-time query processing and semantic search capabilities, enabling faster and more accurate content discovery.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

As organizations seek greater value from their data, data architectures are evolving to meet the demand — and table formats are no exception. It was designed to support high-volume data exchange and compatibility across different system versions, which is essential for streaming architectures such as Apache Kafka.

Data Lake

Data Lake Metadata Hadoop Data Governance

Data Engineering Trends With Aswin & Ananth

Data Engineering Weekly

DECEMBER 25, 2023

Integrating AI into data workflows is not just a trend but a paradigm shift, making data processes more efficient and intelligent. Lake House Architectures: The New Frontier Lakehouse architectures have been at the forefront of data engineering discussions this year.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

DataOps For Business Analytics Teams

DataKitchen

JANUARY 3, 2022

There’s a recent trend toward people creating data lake or data warehouse patterns and calling it data enablement or a data hub. DataOps expands upon this approach by focusing on the processes and workflows that create data enablement and business analytics. DataOps Process Hub.

Business Analyst

Business Analyst Data Lake Consulting Data Analytics

Unleashing the Power of CDC With Snowflake

Workfall

JUNE 12, 2023

Moreover, it facilitates the implementation of microservices architectures and event-driven systems, automating reactions to data changes without manual intervention. It captures incremental changes from transactional databases or other sources, efficiently loading them into data warehouses or data lakes.

Telecommunication

Telecommunication Metadata Healthcare Finance

Build A Data Lake For Your Security Logs With Scanner

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Webinars

Trending Sources

Troubleshooting Kafka In Production

Webinars

Addressing The Challenges Of Component Integration In Data Platform Architectures

Toward a Data Mesh (part 2) : Architecture & Technologies

Stitching Together Enterprise Analytics With Microsoft Fabric

Making Email Better With AI At Shortwave

Tackling Real Time Streaming Data With SQL Using RisingWave

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Being Data Driven At Stripe With Trino And Iceberg

X-Ray Vision For Your Flink Stream Processing With Datorios

Designing A Non-Relational Database Engine

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Modern Customer Data Platform Principles

Build Your Second Brain One Piece At A Time

Version Your Data Lakehouse Like Your Software With Nessie

When And How To Conduct An AI Program

Reconciling The Data In Your Databases With Datafold

Data Sharing Across Business And Platform Boundaries

Designing Data Transfer Systems That Scale

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Unlocking Your dbt Projects With Practical Advice For Practitioners

Shining Some Light In The Black Box Of PostgreSQL Performance

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Designing Data Platforms For Fintech Companies

Adding An Easy Mode For The Modern Data Stack With 5X

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Weekly #206

A Guide to Data Pipelines (And How to Design One From Scratch)

DataOps Architecture: 5 Key Components and How to Get Started

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Getting the Most From Your Modern Data Platform: A Three-Phase Approach

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-

Fire Your Super-Smart Data Consultants with DataOps

The State of Data Engineering in 2024: Key Insights and Trends

The Evolution of Table Formats

Data Engineering Trends With Aswin & Ananth

DataOps For Business Analytics Teams

Unleashing the Power of CDC With Snowflake

Stay Connected