Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
Introduction The volume of data is growing rapidly, with vast numbers of data points produced every second. Businesses are therefore looking for different types of data storage to store and manage their data effectively.
Introduction A data lake is a centralized and scalable repository storing structured and unstructured data. The need for a data lake arises from the growing volume, variety, and velocity of data companies need to manage and analyze.
Summary Batch vs. streaming is a long-running debate in the world of data integration and transformation. In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache.
Does the LLM capture all the relevant data and context required for it to deliver useful insights? (Not to mention the crazy stories about GenAI making up answers without the data to back them up!) Are we allowed to use all the data, or are there copyright or privacy concerns? But simply moving the data wasn't enough.
Summary Monitoring and auditing IT systems for security events (SIEM) requires the ability to quickly analyze massive volumes of unstructured log data. Cliff Crosland co-founded Scanner to provide fast querying of high-scale log data for security auditing. A query engine is useless without data to analyze.
Ever wanted, or been asked, to build an open-source data lake offloading data for analytics? Didn't know the difference between a data lakehouse and a data warehouse? Asked yourself what components and features that would include?
A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.
Contents: Introduction, Data lakes and data warehouses, Data lake vs. data warehouse, Criteria to choose lake and warehouse tools, Conclusion, Further reading, References. Introduction With the data ecosystem growing fast, new terms are coming up every week.
By Reseun McClendon Today, your enterprise must effectively collect, store, and integrate data from disparate sources to provide both operational and analytical benefits. Whether it's helping increase revenue by finding new customers or reducing costs, all of it starts with data.
Digital tools and technologies help organizations generate large amounts of data daily, requiring efficient governance and management. This is where the AWS data lake comes in. With the AWS data lake, organizations and businesses can store, analyze, and process structured and unstructured data of any size.
Summary A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex. Join the event for the global data community, Data Council Austin.
Summary Data lake architectures have largely been biased toward batch-processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis.
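The merge problem described above can be sketched as a toy upsert in plain Python. All names here are hypothetical; real streaming-friendly table formats (Hudi, Delta, Iceberg) implement this same key-based merge at the data-file level rather than in memory.

```python
# Toy sketch: merging fast, incremental updates into historical data.
# Hypothetical names; lakehouse table formats do this at file level.

def merge_upserts(historical, updates, key="id"):
    """Return historical rows with updates applied (insert or overwrite by key)."""
    merged = {row[key]: row for row in historical}   # index history by key
    for row in updates:                              # newer rows win
        merged[row[key]] = row
    return sorted(merged.values(), key=lambda r: r[key])

history = [{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]
stream_batch = [{"id": 2, "status": "shipped"}, {"id": 3, "status": "new"}]

print(merge_upserts(history, stream_batch))
```

The expensive part at scale is not the merge logic itself but doing it incrementally over immutable files without rewriting the whole historical table, which is exactly the gap the episode discusses.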
A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning.
Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. Can you describe what Meroxa is and the story behind it?
Advanced AI will open up new attack vectors and also deliver new tools for protecting an organization's data. But the underlying challenge is the sheer quantity of data that overworked cybersecurity teams face as they try to answer basic questions such as, "Are we under attack?"
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like the data warehouse, data lake, and data lakehouse, and distributed patterns such as the data mesh.
In this article, we will explore the evolution of Iceberg, its key features like ACID transactions, partition evolution, and time travel, and how it integrates with modern data lakes. We'll also dive into […] The post How to Use Apache Iceberg Tables? appeared first on Analytics Vidhya.
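The time-travel feature named above can be illustrated with a toy snapshot model in plain Python. The class and names below are hypothetical; real Iceberg records each commit as an immutable snapshot in table metadata, and query engines can read the table as of a snapshot id or timestamp.

```python
# Toy model of snapshot-based time travel, the mechanism Iceberg relies on:
# every commit produces a new immutable snapshot; a read pins one snapshot.
# Hypothetical names; real Iceberg keeps snapshots in metadata files.

class SnapshotTable:
    def __init__(self):
        self.snapshots = []          # list of immutable row-lists

    def commit(self, rows):
        """Record a new snapshot of the table; return its snapshot id."""
        self.snapshots.append(list(rows))
        return len(self.snapshots) - 1

    def read(self, snapshot_id=None):
        """Read the latest snapshot, or travel back to an earlier one."""
        if not self.snapshots:
            return []
        sid = len(self.snapshots) - 1 if snapshot_id is None else snapshot_id
        return self.snapshots[sid]

t = SnapshotTable()
v0 = t.commit([{"id": 1, "qty": 5}])
v1 = t.commit([{"id": 1, "qty": 7}])
print(t.read())      # latest state
print(t.read(v0))    # time travel to the first snapshot
```

Because old snapshots are never mutated, reading "as of" an earlier version is just a pointer lookup, which is why time travel in these formats is cheap.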
Contents: Introduction 2. Objective 3. Prerequisite 4.2 AWS Infrastructure costs 4.3 Code walkthrough 5. Data lake structure 5.1 Generating user behavior metric 5.2 Loading user purchase data into the data warehouse 5.3 Loading classified movie review data into the data warehouse 5.4 Checking results 6.
Data Access API over Data Lake Tables Without the Complexity: build a robust GraphQL API service on top of your S3 data lake files with DuckDB and Go. This data might be primarily used for internal reporting, but might also be valuable for other services in our organization.
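The core of such a data-access layer is a thin query function that an API resolver calls. As a rough, self-contained stand-in (the article pairs DuckDB over S3 Parquet with a Go GraphQL service; the sketch below swaps in Python's stdlib sqlite3 purely so it runs anywhere, and all table and column names are hypothetical):

```python
# Minimal sketch of a query layer under a data-access API. The real post
# uses DuckDB querying Parquet files on S3, fronted by a Go GraphQL
# service; sqlite3 stands in here as a self-contained query engine.
import sqlite3

def make_engine():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user TEXT, amount REAL)")
    con.executemany("INSERT INTO events VALUES (?, ?)",
                    [("a", 10.0), ("a", 5.0), ("b", 2.5)])
    return con

def totals_by_user(con):
    """The kind of aggregate a GraphQL resolver would expose."""
    cur = con.execute(
        "SELECT user, SUM(amount) FROM events GROUP BY user ORDER BY user")
    return dict(cur.fetchall())

con = make_engine()
print(totals_by_user(con))   # aggregates served to API consumers
```

The design point is that the API layer stays stateless: each resolver delegates to the embedded query engine, which reads the lake files directly.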
Many of our customers, from Marriott to AT&T, start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we're focusing on customers who migrated from a cloud data warehouse to Snowflake and some of the benefits they saw, including annual cost savings in the millions.
Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. Can you describe what SQLMesh is and the story behind it?
It's easy these days for an organization's data infrastructure to begin looking like a maze, with an accumulation of point solutions here and there. Snowflake is committed to simplifying that picture, continually adding features to help our customers streamline how they architect their data infrastructure. Here's a closer look.
Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
We thank Vishnu Vettrivel, Founder, and Alex Thomas, Principal Data Scientist, for their contributions. This is a collaborative post from Databricks and wisecube.ai.
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.
Together with a dozen experts and leaders at Snowflake, I have done exactly that, and today we debut the result: the "Snowflake Data + AI Predictions 2024" report. When you're running a large language model, you need observability into how the model may change as it ingests new data. The next evolution in data is making it AI-ready.
Exploring Apache Hudi 1.0: What if your data lake could do more than just store information? What if it could think like a database? As data lakehouses evolve, they transform how enterprises manage, store, and analyze their data. Hudi 1.0 represented a significant leap forward in data lakehouse technology.
Data warehouses and data lakes play a crucial role for many businesses. They give businesses access to the data from all of their various systems, often integrating it so that end users can answer business-critical questions.
Ready to boost your Hadoop data lake security on GCP? Our latest blog dives into enabling security for Uber's modernized batch data lake on Google Cloud Storage!
Introduction Enterprises now generate vast quantities of data, which can be a high-end source of business intelligence and insight when used appropriately. Delta Lake allows businesses to access and break down new data in real time.
Summary Stripe is a company that relies on data to power their products and business. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.
Summary A core differentiator of Dagster in the ecosystem of data orchestration is their focus on software-defined assets as a means of building declarative workflows. Your first 30 days are free! Want to see Starburst in action? What problems are you trying to solve with Dagster+?
Summary Building a data platform is a substantial engineering endeavor. This episode is supported by Code Comments, an original podcast from Red Hat.
A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization.
Summary Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. Dagster offers a new approach to building and running data platforms and data pipelines.
Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost-effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog.
Summary Data lakehouse architectures have been gaining significant adoption. What are the benefits of embedding Copilot into the data engine?
Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for transparency and ownership in data systems.