The goal of this post is to understand how data integrity best practices have been embraced time and time again, regardless of the underlying technology. In the beginning, there was the data warehouse. The data warehouse (DW) is an approach to data architecture and structured data management that really hit its stride in the early 1990s.
A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning.
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Customers who require a hybrid of these patterns to support many different tools and languages have built a data lakehouse.
In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache. What is the impact of continuous data flows on DAGs/orchestration of transforms? Closing Announcements: Thank you for listening!
Summary A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex. To start, can you share your definition of what constitutes a "Data Lakehouse"?
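As a rough illustration of that combination, here is a minimal sketch (assuming a hypothetical local directory of Parquet files and the DuckDB library, not any particular vendor's product) of how a lakehouse-style engine presents a warehouse-like SQL interface directly over open file formats in cheap storage:

```python
import duckdb

# Hypothetical layout: a "lake" of Parquet files sitting in plain storage.
# A lakehouse-style engine queries them in place with warehouse-style SQL,
# rather than requiring a load step into a proprietary store.
con = duckdb.connect()
result = con.sql(
    """
    SELECT region, SUM(amount) AS total_sales
    FROM read_parquet('lake/sales/*.parquet')  -- path is illustrative
    GROUP BY region
    ORDER BY total_sales DESC
    """
).fetchall()
print(result)
```

The point of the sketch is the direction of travel: the SQL layer comes to the files, instead of the files being loaded into the SQL engine.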
In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. You can collect, transform, and route data across your entire stack with the platform's event streaming, ETL, and reverse ETL pipelines.
Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis.
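One common way to reconcile the two, sketched below under assumed column names (`id`, `amount`, `updated_at`) using pandas, is an incremental upsert: fold a small batch of fresh records into a large historical table by keeping only the newest version of each key:

```python
import pandas as pd

# Hypothetical data: a large historical table and a small batch of updates.
history = pd.DataFrame(
    {"id": [1, 2, 3], "amount": [100, 200, 300], "updated_at": [1, 1, 1]}
)
updates = pd.DataFrame(
    {"id": [2, 4], "amount": [250, 400], "updated_at": [2, 2]}
)

# Upsert: union both sets, then keep the latest row per key.
merged = (
    pd.concat([history, updates])
    .sort_values("updated_at")
    .drop_duplicates(subset="id", keep="last")
    .sort_values("id")
)
print(merged)
# Row 2 now reflects the update; row 4 is newly inserted.
```

Streaming systems apply essentially this reconciliation continuously rather than in scheduled batches, which is where the operational difficulty comes from.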
In this episode Priyendra Deshwal explains how NetSpring is designed to empower your product and data teams to build and explore insights around your products in a streamlined and maintainable workflow. Visit: dataengineeringpodcast.com/data-council today! Don't miss out on their only event this year!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. Can you describe what SQLMesh is and the story behind it? DataOps is a term that has been co-opted and overloaded.
Summary Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics.
Summary The current trend in data management is to centralize the responsibilities of storing and curating the organization's information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access.
Summary There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. Firebolt is the fastest cloud data warehouse. Visit dataengineeringpodcast.com/firebolt to get started.
Summary Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert.
Summary Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud-native architectures that take advantage of dynamic scaling and the separation of compute and storage.
Summary A data lake can be a highly valuable resource, as long as it is well built and well managed. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.
Despite these limitations, data warehouses, introduced in the late 1980s based on ideas developed even earlier, remain in widespread use today for certain business intelligence and data analysis applications. While data warehouses are still in use, their use cases are limited because they only support structured data.
Summary One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. In this episode Ori Rafael shares his experiences from Upsolver and building scalable stream processing for integrating and analyzing data, and what the tradeoffs are when coming from a batch-oriented mindset.
This article looks at the options available for storing and processing big data, which is too large for conventional databases to handle. There are two main options available: a data lake and a data warehouse. What is a Data Warehouse? What is a Data Lake?
Summary The market for data warehouse platforms is large and varied, with options for every use case. It was interesting to learn about some of the custom data types and performance optimizations that are included. For someone getting started with ClickHouse, can you describe how they should be thinking about data modeling?
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
Summary Designing a data platform is a complex and iterative undertaking which requires accounting for many conflicting needs. Designing a platform that relies on a data lake as its central architectural tenet adds additional layers of difficulty. What are the elements that are still cumbersome or intractable?
When it was difficult to wire together the event collection, data modeling, reporting, and activation, it made sense to buy monolithic products that handled every stage of the customer data lifecycle. Now that the data warehouse has taken center stage, a new approach of composable customer data platforms is emerging.
The terms "data warehouse" and "data lake" may have confused you, and you have some questions. Essentially, the difference between a lake and a warehouse comes down to structure: a data lake stores raw data in its native format, while a data warehouse contains historical data that has been cleaned and arranged.
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth. Can you start by sharing your conception of the responsibilities of a data team? When is it more practical to outsource the data work?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Data lakes are notoriously complex. My thanks to the team at Code Comments for their support.
He describes how the platform is architected, the challenges related to selling cloud technologies into enterprise organizations, and how you can adopt Matillion for your own workflows to reduce the maintenance burden of data integration. No more shipping and praying: you can now know exactly what will change in your database!
"Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later." The terms data lake and data warehouse are frequently encountered when it comes to storing large volumes of data. Data Warehouse Architecture. What is a Data Lake?
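The slogan maps to schema-on-read versus schema-on-write. Below is a minimal sketch of the two load styles, assuming invented field names and raw JSON-lines input rather than any real pipeline:

```python
import json

# Hypothetical raw events, as they might land in a data lake ("load first").
raw_lines = [
    '{"user": "a", "amount": "19.99", "ts": "2024-01-01"}',
    '{"user": "b", "amount": "oops"}',  # malformed value, missing field
]

# Data lake style (schema-on-read): store everything now, interpret later.
lake = [json.loads(line) for line in raw_lines]

# Warehouse style (schema-on-write): validate and shape before loading;
# bad rows are rejected at ingest time ("think first").
warehouse = []
for record in lake:
    try:
        warehouse.append(
            {"user": str(record["user"]),
             "amount": float(record["amount"]),
             "ts": record["ts"]}
        )
    except (KeyError, ValueError):
        pass  # a real pipeline would route this to a dead-letter queue

print(len(lake), "rows in the lake;", len(warehouse), "rows in the warehouse")
```

The lake keeps both rows and defers interpretation; the warehouse admits only the row that satisfies the schema, which is exactly the "think first, load later" tradeoff.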
Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data, they constrain the possibilities of what data you can store and how it can be used. We feel your pain.
Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses.
In this episode the host Tobias Macey shares his reflections on recent experiences where the abstractions leaked and some observations on how to deal with that situation in a data platform architecture. What do you have planned for the future of your data platform?
Different vendors offering data warehouses, data lakes, and now data lakehouses all offer their own distinct advantages and disadvantages for data teams to consider. So let's get to the bottom of the big question: what kind of data storage layer will provide the strongest foundation for your data platform?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. Can you describe what Agile Data Engine is and the story behind it?
In this episode Paul Blankley and Ryan Janssen explore the power of natural language driven data exploration combined with semantic modeling that enables an intuitive way for everyone in the business to access the data that they need to succeed in their work. Can you describe what Zenlytic is and the story behind it?
Open table formats track the data files within a table along with their column statistics, enabling efficient data management and retrieval by storing these files chronologically, with a history of DDL and DML actions and an index of data file locations. Such a format can also be integrated into major data platforms like Snowflake.
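To make the metadata idea concrete, here is a toy sketch (all names invented, not any real table format's API) of a table that records its data files together with per-column min/max statistics, which a query engine could use to skip files entirely:

```python
from dataclasses import dataclass, field

@dataclass
class DataFile:
    path: str
    row_count: int
    column_stats: dict  # per-column (min, max), e.g. {"amount": (0.0, 49.0)}

@dataclass
class TableMetadata:
    files: list = field(default_factory=list)
    history: list = field(default_factory=list)  # log of DDL/DML actions

    def append_file(self, data_file: DataFile) -> None:
        self.files.append(data_file)
        self.history.append(("ADD_FILE", data_file.path))

    def prune(self, column: str, value: float) -> list:
        """Return only files whose stats admit rows where column == value."""
        return [
            f for f in self.files
            if f.column_stats[column][0] <= value <= f.column_stats[column][1]
        ]

table = TableMetadata()
table.append_file(DataFile("part-0.parquet", 1000, {"amount": (0.0, 49.0)}))
table.append_file(DataFile("part-1.parquet", 1000, {"amount": (50.0, 99.0)}))

# A query filtering on amount = 75 only needs to read part-1.parquet.
print([f.path for f in table.prune("amount", 75.0)])
```

Real formats persist this metadata alongside the data files themselves, which is what lets any compatible engine prune and time-travel over the same table.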
Data management today is a continually developing field that requires careful consideration when deciding which solution to implement to store, process, and analyze data effectively. Two forms are frequently selected: the data warehouse and the data lake.
In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides. RudderStack helps you build a customer data platform on your warehouse or data lake.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode.
In this episode Satish Jayanthi explores the benefits of incorporating column-aware tooling in the data modeling process. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake.
The Dominance of Lakehouses and Mutation Support. Lakehouses have become a standard pattern in data infrastructure, combining the best features of data lakes and warehouses. Unlike data lakes, which are predominantly append-only, lakehouses support data mutation natively.
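To see why mutation is hard on append-only storage, consider a toy copy-on-write scheme (the layout below is invented for illustration, not any real format's internals): immutable files are never edited in place; an update rewrites the affected file and commits a new table version that points at it:

```python
# Toy copy-on-write update over immutable files: an update rewrites the
# affected file and commits a new table version in the metadata.
versions = {
    1: {"part-0": [("a", 100), ("b", 200)]},  # version 1 of the table
}

def update(table_version: int, key: str, new_value: int) -> int:
    """Rewrite files containing `key`, producing a new table version."""
    new_files = {}
    for name, rows in versions[table_version].items():
        if any(k == key for k, _ in rows):
            rows = [(k, new_value if k == key else v) for k, v in rows]
            name = name + "-rewritten"
        new_files[name] = rows
    new_version = table_version + 1
    versions[new_version] = new_files  # an atomic metadata swap, in spirit
    return new_version

v2 = update(1, "b", 250)
print(versions[v2])   # reads at v2 see the update
print(versions[1])    # the old version stays readable (time travel)
```

Lakehouse table formats bake this bookkeeping into the table itself, which is what lets them offer native updates and deletes over storage that only supports appends.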
In this episode Gleb Mezhanskiy shares some valuable advice and insights into how you can build reliable and well-tested data assets with dbt and data-diff (see also the July 2021 and July 2022 interviews about data-diff). What are the roadblocks to data testing/validation that you see teams run into most often?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data and analytics leaders, 2023 is your year to sharpen your leadership skills, refine your strategies, and lead with purpose. Missing data? Can you describe what the SQLake product is and the story behind it?
In this episode Srujan Akula explains how the system is implemented and how you can start using it today with your existing data systems. Trusted by the data teams at Fox, JetBlue, and PagerDuty, Monte Carlo solves the costly problem of broken data pipelines. What is the scope and goal of your flagship (only?) platform?
In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in applying these constraints to your data workflows. Can you describe your conception of a data contract? Closing Announcements: Thank you for listening!