The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the underlying technology. In the beginning, there was a data warehouse. The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.
In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow. Contact Info: LinkedIn. Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?
A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning.
Summary: There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. Firebolt is the fastest cloud data warehouse. Visit dataengineeringpodcast.com/firebolt to get started.
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Customers that require a hybrid of these to support many different tools and languages have built a data lakehouse.
In this episode Emily Riederer shares her work to create a controlled vocabulary for managing the semantic elements of the data managed by her team and encoding it in the schema definitions in her data warehouse (star/snowflake schema, data vault, etc.). What do you have planned for the future of dbtplyr?
Summary: The data warehouse has become the focal point of the modern data platform. With increased usage of data across businesses, and a diversity of locations and environments where data needs to be managed, the warehouse engine needs to be fast and easy to manage.
In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.
Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). To start, can you share your definition of what constitutes a "Data Lakehouse"? Closing Announcements: Thank you for listening!
When most people think of master data management, they first think of customers and products. But master data encompasses so much more than data about customers and products. Challenges of Master Data Management: A decade ago, master data management (MDM) was a much simpler proposition than it is today.
In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third party data. Tired of deploying bad data?
He describes how the platform is architected, the challenges related to selling cloud technologies into enterprise organizations, and how you can adopt Matillion for your own workflows to reduce the maintenance burden of data integration workflows. No more shipping and praying, you can now know exactly what will change in your database!
In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency. Data is at the core of every product and service at Meta.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. Can you describe what SQLMesh is and the story behind it? DataOps is a term that has been co-opted and overloaded.
Understand how BigQuery inserts, deletes and updates: Once again Vu took time to deep dive into BigQuery internals, this time to explain how data management is done. Pandera, a data validation library for dataframes, now supports Polars.
This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable. Meta’s Data Infrastructure teams have been rethinking how data management systems are designed.
When it was difficult to wire together the event collection, data modeling, reporting, and activation it made sense to buy monolithic products that handled every stage of the customer data lifecycle. Now that the data warehouse has taken center stage a new approach of composable customer data platforms is emerging.
Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today? Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses.
Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to do so. Despite these limitations, data warehouses, introduced in the late 1980s based on ideas developed even earlier, remain in widespread use today for certain business intelligence and data analysis applications.
This truth was hammered home recently when ride-hailing giant Uber found itself on the receiving end of a staggering €290 million ($324 million) fine from the Dutch Data Protection Authority. The reason? Poor data warehouse governance practices that led to the improper handling of sensitive European driver data.
In this post, we will be particularly interested in the impact that cloud computing has had on the modern data warehouse. We will explore the different options for data warehousing and how you can leverage this information to make the right decisions for your organization. Understanding the Basics: What is a Data Warehouse?
Summary: Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used.
Over the past decade, we have gained a deeper understanding of our data, by embedding privacy considerations into every stage of product development, ensuring a more secure and responsible approach to data management. Consider the data flow from online systems to the data warehouse, as shown in the diagram below.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. Can you describe what Agile Data Engine is and the story behind it? RudderStack also supports real-time use cases.
Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth. Can you start by sharing your conception of the responsibilities of a data team? When is it more practical to outsource the data work?
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Snowflake was founded in 2012 around its data warehouse product, which is still its core offering; Databricks was founded in 2013 out of academia by Spark's co-creators, with Spark becoming a top-level Apache project in 2014. Databricks is focusing on simplification (serverless, auto BI, improved PySpark) while evolving into a data warehouse.
Nowadays, when it comes to data management, every business has to make one critical decision: whether to use a Data Mesh or a Data Warehouse. Both are strong data management architectures, but they are designed to support different needs and various organizational structures.
In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
In this episode Vinoth shares the history of the project, how its architecture allows for building more frequently updated analytical queries, and the work being done to add a more polished experience to the data lake paradigm. RudderStack’s smart customer data pipeline is warehouse-first.
In this episode the host Tobias Macey shares his reflections on recent experiences where the abstractions leaked and some observations on how to deal with that situation in a data platform architecture. What do you have planned for the future of your data platform?
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Data observability has been gaining adoption for a number of years now, with a large focus on data warehouses.
Consensus seeking: Whether you think that old-school data warehousing concepts are fading or not, the quest to achieve conformed dimensions and conformed metrics is as relevant as it ever was. The data warehouse needs to reflect the business, and the business should have clarity on how it thinks about analytics.
In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides. RudderStack helps you build a customer data platform on your warehouse or data lake.
In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration combined with semantic modeling that enables an intuitive way for everyone in the business to access the data that they need to succeed in their work. Can you describe what Zenlytic is and the story behind it?
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data and analytics leaders, 2023 is your year to sharpen your leadership skills, refine your strategies and lead with purpose. Missing data? Can you describe what the SQLake product is and the story behind it?
In this episode Srujan Akula explains how the system is implemented and how you can start using it today with your existing data systems. Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. Your flagship (only?) What is the scope and goal of that platform?
This is not surprising when you consider all the benefits, such as reducing complexity [and] costs and enabling zero-copy data access (ideal for centralizing data governance). Commercially, we heard AI use cases around treasury services, fraud detection and risk analytics.
Key Takeaways: Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. or any other destination you choose.
As a result, lakehouses support more dynamic and flexible data architectures, catering to a broader range of analytics and operational workloads. For instance, in a fast-paced retail environment, lakehouses can ensure that inventory data remains up-to-date and accurate in the data warehouse, optimizing supply chain efficiency.
Summary: Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo.
In this episode Mark Grover explains what he is building at Stemma, how it expands on the success of the Amundsen project, and why trust is the most important asset for data teams. RudderStack’s smart customer data pipeline is warehouse-first.