Remove Data Governance Remove Data Warehouse Remove Data Workflow
article thumbnail

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake 262
article thumbnail

Data Engineering Weekly #198

Data Engineering Weekly

link] Jon Osborn: Best Practices for Using QUERY_TAG in Snowflake The modern data warehouses are good at running at scale, given the cost is not a constraint. link] Grab: Metasense V2 - Enhancing, improving, and productionisation of LLM-powered data governance.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. How do we build data products ? How can we interoperate between the data domains ? As you can see, this is in the code part where you are building your data pipelines, a misnomer because this is an over simplification.

article thumbnail

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

SQL 173
article thumbnail

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Process 147
article thumbnail

Modern Customer Data Platform Principles

Data Engineering Podcast

A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

Data Lake 147
article thumbnail

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

Datafold also helps automate regression testing of ETL code with its Data Diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values. RudderStack’s smart customer data pipeline is warehouse-first.