Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
Want to process petabyte-scale data at real-time streaming ingestion rates, build data pipelines 10 times faster with 99.999% reliability, and see a 20x improvement in query performance compared to traditional data lakes? Enter the world of Databricks Delta Lake. Delta Lake is a game-changer for big data.
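To make those claims concrete, here is a minimal local sketch using the open-source delta-spark package; the table path and toy dataset are placeholders, not anything from the quoted post.

```python
# A minimal Delta Lake quickstart: create a table, append to it, and time
# travel back to the first commit. Assumes `pip install delta-spark pyspark`.
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("delta-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

path = "/tmp/delta/events"  # placeholder location
spark.range(0, 5).write.format("delta").save(path)                  # version 0
spark.range(5, 10).write.format("delta").mode("append").save(path)  # version 1

# Each write above is an ACID commit; read the table as of version 0.
spark.read.format("delta").option("versionAsOf", 0).load(path).show()
```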
Businesses are now looking for different types of data storage to store and manage their data effectively. Organizations can collect millions of data points, but if they fall short in storing that data, those efforts […] The post A Comprehensive Guide to Data Lake vs. Data Warehouse appeared first on Analytics Vidhya.
Introduction A data lake is a centralized and scalable repository storing structured and unstructured data. The need for a data lake arises from the growing volume, variety, and velocity of data companies need to manage and analyze.
That's where data lakes come in. This guide is your roadmap to building a data lake from scratch. We'll break down the fundamentals, walk you through the architecture, and share actionable steps to set up a robust and scalable data lake. Table of Contents: What is a Data Lake?
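Before diving into such a guide, it helps to see how small a "lake" can be at its core: files under well-chosen prefixes. The sketch below writes Hive-style partitioned Parquet with pandas; the paths and the toy frame are hypothetical, and pyarrow is assumed to be installed.

```python
# One common data lake convention: raw data landed as Parquet files under
# zone/source/partition prefixes. Requires pandas and pyarrow.
import pandas as pd

df = pd.DataFrame({
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "user_id": [1, 2, 3],
    "action": ["view", "click", "view"],
})

# Hive-style partitioning (event_date=...) lets engines such as Spark,
# Trino, or DuckDB prune partitions instead of scanning everything.
df.to_parquet("lake/raw/clickstream", partition_cols=["event_date"])
```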
In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology, and how you can start using it today to build a real-time data lake without all of the headache. What is the impact of continuous data flows on DAGs/orchestration of transforms? RudderStack also supports real-time use cases.
Many organizations are struggling to store, manage, and analyze data due to its exponential growth. To address these issues, cloud-based data lakes allow organizations to gather any form of data, whether structured or unstructured, and make it accessible for use across various applications.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later.” The terms data lake and data warehouse frequently come up when it comes to storing large volumes of data. Data Warehouse Architecture. What is a Data Lake?
Did you know that the global data lakes market will likely grow at a CAGR of 29.9% and reach USD 17.60 billion by 2026? Modern businesses are more likely to make data-driven decisions. Organizations are generating a massive volume of data due to the rise in digitalization. What is Azure Data Lake?
Microsoft offers Azure Data Lake, a cloud-based data storage and analytics solution. It is capable of effectively handling enormous amounts of structured and unstructured data. Therefore, it is a popular choice for organizations that need to process and analyze big data files.
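As an illustration of what working with Azure Data Lake Storage Gen2 looks like in practice, here is a minimal sketch using the azure-storage-file-datalake SDK; the account URL, container, and file path are placeholders.

```python
# Upload a small object to an ADLS Gen2 file system (container).
# Assumes `pip install azure-identity azure-storage-file-datalake` and an
# existing container named "raw"; all names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("raw")
file = fs.get_file_client("clickstream/2024-01-01/events.json")
file.upload_data(b'{"user_id": 1, "action": "view"}', overwrite=True)
```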
Digital tools and technologies help organizations generate large amounts of data daily, requiring efficient governance and management. This is where the AWS data lake comes in. With the AWS data lake, organizations and businesses can store, analyze, and process structured and unstructured data of any size.
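For comparison, the equivalent first step on AWS is typically landing objects in S3 with boto3, as in this minimal sketch; the bucket and key are placeholders, and governance layers such as Lake Formation or Glue are out of scope here.

```python
# Land a raw JSON record in an S3-backed data lake. Assumes boto3 is
# installed and AWS credentials are configured; names are placeholders.
import json
import boto3

s3 = boto3.client("s3")
record = {"user_id": 1, "action": "view", "ts": "2024-01-01T00:00:00Z"}
s3.put_object(
    Bucket="my-data-lake",
    Key="raw/clickstream/dt=2024-01-01/events.json",
    Body=json.dumps(record).encode("utf-8"),
)
```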
A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.
Responding to data overload with a security data lake: security professionals have to continually up their game to make sure that, from all the data at their disposal, they're using the correct inputs to identify vulnerabilities and incidents. In it, we discuss three layers of AI that can become an attack surface.
In this article, we will explore the evolution of Iceberg, its key features like ACID transactions, partition evolution, and time travel, and how it integrates with modern data lakes. We'll also dive into […] The post How to Use Apache Iceberg Tables? appeared first on Analytics Vidhya.
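To show the time-travel feature the article mentions, here is a minimal sketch with the pyiceberg library; the REST catalog URI and table identifier are placeholders for whatever your deployment exposes.

```python
# Read an Iceberg table's current data and an earlier snapshot.
# Assumes `pip install "pyiceberg[pyarrow,pandas]"` and a reachable REST
# catalog; the URI and table identifier below are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default", uri="http://localhost:8181")
table = catalog.load_table("analytics.events")

current = table.scan().to_pandas()  # latest snapshot

# Time travel: pick the table's first recorded snapshot and scan it.
first_snapshot = table.history()[0].snapshot_id
as_of = table.scan(snapshot_id=first_snapshot).to_pandas()
```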
Deploying upstream data profiling, validation, and cleansing rules was required to ensure garbage wasn't coming in, and suddenly organizations were discussing their plans for big data governance when they had yet to figure out how to implement little data governance. A data lake!
A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning.
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like the data warehouse, data lake, and data lakehouse, and distributed patterns such as data mesh.
Summary A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council and use code depod20 to register today!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs, ranging from AI to data applications to complete analytics.
Separation of storage and compute: lakebases store their data in modern data lakes (object stores) in open formats, which enables scaling compute and storage separately, leading to lower TCO and eliminating lock-in. When scaled to zero, the cost of a lakebase is just the cost of storing the data on cheap data lakes.
Table of contents: 1. Introduction 2. Objective 3. Prerequisite 4.2 AWS Infrastructure costs 4.3 Data lake structure 5. Code walkthrough 5.1 Loading user purchase data into the data warehouse 5.2 Loading classified movie review data into the data warehouse 5.3 Generating user behavior metric 5.4 Checking results 6.
We thank Vishnu Vettrivel, Founder, and Alex Thomas, Principal Data Scientist, for their contributions. This is a collaborative post from Databricks and wisecube.ai.
It combines the flexibility and scalability of data lake storage with the data analytics, data governance, and data management functionality of the data warehouse.
Introduction Enterprises now generate vast quantities of data, which can be a high-end source of business intelligence and insight when used appropriately. Delta Lake allows businesses to access and break down new data in real time.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?
Data stewards can also set up Request for Access (private preview) by setting a new visibility property on objects along with contact details so the right person can easily be reached to grant access.
Before it migrated to Snowflake in 2022, WHOOP was using a catalog of tools (Amazon Redshift for SQL queries and BI tooling, Dremio for a data lake, PostgreSQL databases, and others) that had ultimately become expensive to manage and difficult to maintain, let alone scale.
What if your data lake could do more than just store information? What if it could think like a database? As data lakehouses evolve, they transform how enterprises manage, store, and analyze their data.
Conclusion By understanding how transaction logs work in MySQL and PostgreSQL, we gain valuable insights into how CDC tools leverage these logs to perform incremental replication to downstream systems such as streaming applications, data lakes, and analytics platforms.
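As a minimal illustration of that mechanism, the sketch below tails PostgreSQL's write-ahead log through logical decoding with psycopg2, the same primitive CDC tools build on. It assumes wal_level=logical and the wal2json output plugin; the DSN and slot name are placeholders.

```python
# Stream committed changes from PostgreSQL's WAL via logical decoding.
# Assumes psycopg2, wal_level=logical, and the wal2json plugin installed.
import psycopg2
from psycopg2.extras import LogicalReplicationConnection

conn = psycopg2.connect(
    "dbname=shop user=cdc",  # placeholder DSN
    connection_factory=LogicalReplicationConnection,
)
cur = conn.cursor()
cur.create_replication_slot("demo_slot", output_plugin="wal2json")
cur.start_replication(slot_name="demo_slot", decode=True)

def on_change(msg):
    # Each payload is a JSON description of a committed insert/update/delete;
    # a CDC pipeline would ship this downstream instead of printing it.
    print(msg.payload)
    msg.cursor.send_feedback(flush_lsn=msg.data_start)

cur.consume_stream(on_change)  # blocks, delivering changes as they commit
```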
Snowflake is now making it even easier for customers to bring the platform’s usability, performance, governance and many workloads to more data with Iceberg tables (now generally available), unlocking full storage interoperability. Iceberg tables provide compute engine interoperability over a single copy of data.
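One practical consequence of that interoperability: from Snowflake's side, an Iceberg table is queried like any other table, while the data stays in open format in object storage. Here is a minimal sketch with snowflake-connector-python, with all connection parameters and the table name as placeholders.

```python
# Query a Snowflake-managed Iceberg table through the standard connector.
# Assumes `pip install snowflake-connector-python`; placeholders throughout.
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="ANALYTICS_WH", database="LAKE", schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM events_iceberg")  # plain SQL over Iceberg
print(cur.fetchone())
```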
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
Azure Data Factory is a cloud-based, fully managed, serverless ETL and data integration service offered by Microsoft Azure for automating data movement from its native place to, say, a data lake or data warehouse, using ETL (extract-transform-load) or ELT (extract-load-transform).
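To give a feel for the service's API, here is a minimal sketch that triggers an existing pipeline run with the azure-mgmt-datafactory SDK; the subscription, resource group, factory, and pipeline names are placeholders.

```python
# Trigger a run of an existing Data Factory pipeline and print its run id.
# Assumes `pip install azure-identity azure-mgmt-datafactory`.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)
run = client.pipelines.create_run(
    resource_group_name="rg-data",  # placeholders
    factory_name="adf-demo",
    pipeline_name="copy_to_lake",
)
print(run.run_id)  # poll client.pipeline_runs.get(...) with this id
```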
In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.
It offers a simple and efficient solution for data processing in organizations. It gives users a data integration tool that organizes data from many sources, formats it, and stores it in a single repository, such as a data lake or data warehouse, where it can be used to facilitate business decisions.