Data Lake - Data Engineering Digest

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Cloud Storage Metadata Data Warehouse

A Comprehensive Guide to Data Lake vs. Data Warehouse

Analytics Vidhya

FEBRUARY 2, 2023

Now, businesses are looking for different types of data storage to store and manage their data effectively. Organizations can collect millions of data points, but if they lack the means to store that data, those efforts […]

Data Lake

Data Lake Data Warehouse Data Storage Data

Setting up Data Lake on GCP using Cloud Storage and BigQuery

Analytics Vidhya

FEBRUARY 25, 2023

A data lake is a centralized and scalable repository storing structured and unstructured data. The need for a data lake arises from the growing volume, variety, and velocity of data companies need to manage and analyze.

Cloud Storage

Cloud Storage Data Lake Cloud Unstructured Data

Keep Your Data Lake Fresh With Real Time Streams Using Estuary

Data Engineering Podcast

MAY 21, 2023

In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache. What is the impact of continuous data flows on DAGs and the orchestration of transforms? RudderStack also supports real-time use cases.

Data Lake

Data Lake Machine Learning Kafka Data Warehouse

Data Lake / Lakehouse Guide: Powered by Data Lake Table Formats (Delta Lake, Iceberg, Hudi)

Simon Späti

AUGUST 25, 2022

Ever wanted or been asked to build an open-source Data Lake offloading data for analytics? Didn’t know the difference between a Data Lakehouse and a Data Warehouse? Asked yourself what components and features that would include?

Data Lake

Data Lake Data Warehouse Government Data

Data Lake / Lakehouse Guide: Powered by Data Lake Table Formats (Delta Lake, Iceberg, Hudi)

Simon Späti

SEPTEMBER 30, 2022

Ever wanted or been asked to build an open-source Data Lake offloading data for analytics? Didn’t know the difference between a Data Lakehouse and a Data Warehouse? Asked yourself what components and features that would include?

Data Lake

Data Lake Data Warehouse Government Data

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake Building High Quality Data AWS

How to Build and Work with AWS Data Lake?

Hevo

JANUARY 17, 2025

Digital tools and technologies help organizations generate large amounts of data daily, requiring efficient governance and management. This is where the AWS data lake comes in. With the AWS data lake, organizations and businesses can store, analyze, and process structured and unstructured data of any size.

Data Lake

Data Lake AWS Unstructured Data Building

A Data Lake, You Call It? It’s a Data Swamp

KDnuggets

FEBRUARY 5, 2024

How and why the data lake architecture often fails to meet its promises. And how better governance helps mitigate such challenges.

Data Lake

Data Lake IT Government Data

Data Warehouses vs. Data Lakes vs. Data Marts: Need Help Deciding?

KDnuggets

OCTOBER 30, 2023

A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.

Data Lake

Data Lake Data Warehouse Data Storage Data

What is the difference between a data lake and a data warehouse?

Start Data Engineering

APRIL 12, 2022

With the data ecosystem growing fast, new terms are coming up every week.

Data Lake

Data Lake Data Warehouse Data

Data warehouses vs Data Lakes vs Databases – Which One Do You Need

Seattle Data Guy

DECEMBER 19, 2022

Whether it's helping increase revenue by finding new customers or reducing costs, all of it starts with data.

Data Lake

Data Lake Data Warehouse Database Data Storage

Predictions 2025: AI As Cybersecurity Tool and Target

Snowflake

JANUARY 8, 2025

Responding to data overload with a security data lake: Security professionals have to continually up their game to make sure that, from all the data at their disposal, they're using the correct inputs to identify vulnerabilities and incidents. In it, we discuss three layers of AI that can become an attack surface.

Data Lake

Data Lake Data Security Machine Learning Technology

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

AUGUST 3, 2021

Summary: Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

How to Use Apache Iceberg Tables?

Analytics Vidhya

MARCH 12, 2025

In this article, we will explore the evolution of Iceberg, its key features like ACID transactions, partition evolution, and time travel, and how it integrates with modern data lakes. We'll also dive into […]

Data Lake

Data Lake Designing IT Data

Data Warehouses Vs Operational Data Stores Vs Data Lakes – How To Store Your Data For Analytics

Seattle Data Guy

AUGUST 2, 2023

A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning.

Data Lake

Data Lake Data Warehouse Data Machine Learning

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

Deploying upstream data profiling, validation, and cleansing rules was required to ensure garbage wasn't coming in, and suddenly organizations were discussing their plans for big data governance when they had yet to figure out how to implement little data governance. A data lake!

Data Integration

Data Integration Hadoop Data Warehouse Data Lake

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse, data lake and data lakehouse, and distributed patterns such as data mesh.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council and use code depod20 to register today!

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Data Lakes and SQL: A Match Made in Data Heaven

KDnuggets

JANUARY 16, 2023

In this article, we will discuss the benefits of using SQL with a data lake and how it can help organizations unlock the full potential of their data.

Data Lake

Data Lake SQL Data IT

Data Access API over Data Lake Tables Without the Complexity

Towards Data Science

SEPTEMBER 27, 2023

Build a robust GraphQL API service on top of your S3 data lake files with DuckDB and Go. This data might be primarily used for internal reporting, but might also be valuable for other services in our organization.

Data Lake

Data Lake Accessible Accessibility SQL

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka

Kafka Data Lake High Quality Data SQL

Realtime Data Applications Made Easier With Meroxa

Data Engineering Podcast

APRIL 23, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

Data Lake

Data Lake Kafka Machine Learning Data Warehouse

Data Engineering Project for Beginners - Batch edition

Start Data Engineering

MAY 11, 2022

Topics covered: introduction, objective, prerequisites, AWS infrastructure costs, data lake structure, a code walkthrough (loading user purchase data into the data warehouse, loading classified movie review data into the data warehouse, generating the user behavior metric), and checking results.

Data Engineering

Data Engineering Data Engineer Project Data Lake

Enabling Security for Hadoop Data Lake on Google Cloud Storage

Uber Engineering

OCTOBER 27, 2024

Ready to boost your Hadoop Data Lake security on GCP? Our latest blog dives into enabling security for Uber’s modernized batch data lake on Google Cloud Storage!

Cloud Storage

Cloud Storage Google Cloud Data Lake Hadoop

Building a Life Sciences Knowledge Graph with a Data Lake

databricks

JANUARY 26, 2023

We thank Vishnu Vettrivel, Founder, and Alex Thomas, Principal Data Scientist, for their contributions. This is a collaborative post from Databricks and wisecube.ai.

Data Lake

Data Lake Building Data

A Comprehensive Guide on Delta Lake

Analytics Vidhya

FEBRUARY 27, 2023

Enterprises here and now catalyze vast quantities of data, which can be a high-end source of business intelligence and insight when used appropriately. Delta Lake allows businesses to access and break new data down in real time.

Data Lake

Data Lake Business Intelligence Designing Accessible

DataMesh: How Uber laid the foundations for the data lake cloud migration

Uber Engineering

OCTOBER 27, 2024

Learn how Uber is streamlining the Cloud migration of its massive Data Lake by incorporating key Data Mesh principles.

Data Lake

Data Lake Cloud Data IT

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

JUNE 25, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.

Data Engineering

Data Engineering Data Engineer Python Engineering

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Data stewards can also set up Request for Access (private preview) by setting a new visibility property on objects along with contact details so the right person can easily be reached to grant access.

Data Architecture

Data Architecture Architecture Data Lake Kafka

The Future of Data Lakehouses: A Fireside Chat with Vinoth Chandar - Founder CEO Onehouse & PMC Chair of Apache Hudi

Data Engineering Weekly

JANUARY 8, 2025

What if your data lake could do more than just store information—what if it could think like a database? As data lakehouses evolve, they transform how enterprises manage, store, and analyze their data.

Data Lake

Data Lake Datasets Retail Data Ingestion

Cloud Data Warehouse Migrations: Success Stories from WHOOP and Nexon

Snowflake

NOVEMBER 26, 2024

Before it migrated to Snowflake in 2022, WHOOP was using a catalog of tools — Amazon Redshift for SQL queries and BI tooling, Dremio for a data lake, PostgreSQL databases and others — that had ultimately become expensive to manage and difficult to maintain, let alone scale.

Data Warehouse

Data Warehouse Cloud PostgreSQL Hadoop

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Snowflake is now making it even easier for customers to bring the platform’s usability, performance, governance and many workloads to more data with Iceberg tables (now generally available), unlocking full storage interoperability. Iceberg tables provide compute engine interoperability over a single copy of data.

Data Lake

Data Lake BI Business Intelligence Metadata

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

It incorporates elements from several Microsoft products working together, like Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience. No matter the workload, Fabric stores all data on OneLake, a single, unified data lake built on the Delta Lake model.

BI

BI Pipeline-centric Data Lake Google Cloud

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Building a Data Lake on PB scale with Apache Spark

Towards Data Science

JANUARY 26, 2023

How we deal with Big Data at Emplifi.

Data Lake

Data Lake Building Data Science Big Data

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data BI Data Workflow

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Programming

Programming Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Technology Data Lake High Quality Data
