Data, Data Lake and Metadata - Data Engineering Digest

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Cloud Storage Metadata Data Warehouse

Databricks Delta Lake: A Scalable Data Lake Solution

ProjectPro

JUNE 6, 2025

Want to process peta-byte scale data with real-time streaming ingestions rates, build 10 times faster data pipelines with 99.999% reliability, witness 20 x improvement in query performance compared to traditional data lakes, enter the world of Databricks Delta Lake now. Delta Lake is a game-changer for big data.

Data Lake

Data Lake Data Warehouse Metadata Unstructured Data

How to Build a Data Lake?

ProjectPro

JUNE 6, 2025

This guide is your roadmap to building a data lake from scratch. We'll break down the fundamentals, walk you through the architecture, and share actionable steps to set up a robust and scalable data lake. Traditional data storage systems like data warehouses were designed to handle structured and preprocessed data.

Data Lake

Data Lake Building Hadoop Raw Data

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

MORE WEBINARS

Level Up Your Data Platform With Active Metadata

Data Engineering Podcast

JUNE 19, 2022

Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. In order to level up their value a new trend of active metadata is being implemented, allowing use cases like keeping BI reports up to date, auto-scaling your warehouses, and automated data governance.

Metadata

Metadata MongoDB MySQL Scala

Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

JUNE 6, 2025

“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later” The terms data lake and data warehouse are frequently stumbled upon when it comes to storing large volumes of data. Data Warehouse Architecture What is a Data lake?

Data Lake

Data Lake Data Warehouse Cloud Hadoop

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse , data lake and data lakehouse , and distributed patterns such as data mesh.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

50+ Azure Data Factory Interview Questions and Answers [2025]

ProjectPro

JUNE 6, 2025

Discover 50+ Azure Data Factory interview questions and answers for all experience levels. A report by ResearchAndMarkets projects the global data integration market size to grow from USD 12.24 A report by ResearchAndMarkets projects the global data integration market size to grow from USD 12.24 billion in 2020 to USD 24.84

Data Lake

Data Lake Metadata SQL Datasets

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Snowflake

DECEMBER 4, 2024

Together with a dozen experts and leaders at Snowflake, I have done exactly that, and today we debut the result: the “ Snowflake Data + AI Predictions 2024 ” report. When you’re running a large language model, you need observability into how the model may change as it ingests new data. The next evolution in data is making it AI ready.

Unstructured Data

Unstructured Data Data Lake Metadata Deep Learning

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

It’s easy these days for an organization’s data infrastructure to begin looking like a maze, with an accumulation of point solutions here and there. Snowflake is committed to doing just that by continually adding features to help our customers simplify how they architect their data infrastructure. Here’s a closer look.

Data Architecture

Data Architecture Architecture Data Lake Kafka

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

Data Engineering Podcast

NOVEMBER 10, 2021

Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata across various tools. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.

Metadata

Metadata Data Warehouse Data Lake Kafka

Cloudera Lakehouse Optimizer Makes it Easier Than Ever to Deliver High-Performance Iceberg Tables

Cloudera

OCTOBER 10, 2024

The open data lakehouse is quickly becoming the standard architecture for unified multifunction analytics on large volumes of data. It combines the flexibility and scalability of data lake storage with the data analytics, data governance, and data management functionality of the data warehouse.

IT

IT Data Lake Metadata Data Warehouse

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

Summary Stripe is a company that relies on data to power their products and business. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.

Data Lake

Data Lake High Quality Data Metadata Government

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

JUNE 6, 2025

Do ETL and data integration activities seem complex to you? Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry. Did you know the global big data market will likely reach $268.4 Businesses are leveraging big data now more than ever.

AWS

AWS Scala Metadata Data Lake

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

SEPTEMBER 1, 2021

Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.

Data Lake

Data Lake Cloud AWS SQL

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.

Architecture

Architecture Systems Data Lake Google Cloud

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata BI Data Lake Business Intelligence

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Cloudera

NOVEMBER 15, 2024

In fact, according to the Identity Theft Resource Center (ITRC) Annual Data Breach Report , there were 2,365 cyber attacks in 2023 with more than 300 million victims, and a 72% increase in data breaches since 2021. However, there is a fundamental challenge standing in the way of being successful: data.

Metadata

Metadata Unstructured Data Data Lake Government

Secure Data Sharing and Interoperability Powered by Iceberg REST Catalog

Cloudera

DECEMBER 3, 2024

Many enterprises have heterogeneous data platforms and technology stacks across different business units or data domains. For decades, they have been struggling with scale, speed, and correctness required to derive timely, meaningful, and actionable insights from vast and diverse big data environments.

Metadata

Metadata SQL Data Warehouse Database

Reflecting On The Past 6 Years Of Data Engineering

Data Engineering Podcast

FEBRUARY 5, 2023

In that time there have been a number of generational shifts in how data engineering is done. Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Materialize]([link] Looking for the simplest way to get the freshest data possible to your teams?

Data Engineering

Data Engineering Data Engineer Engineering PostgreSQL

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 19, 2023

Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. We feel your pain.

IT

IT Data Lake Metadata Data Warehouse

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. Data lakes are notoriously complex.

Architecture

Architecture Data Lake High Quality Data Java

What is Apache Iceberg: Features, Architecture & Use Cases

ProjectPro

JUNE 6, 2025

Explore what is Apache Iceberg, what makes it different, and why it’s quickly becoming the new standard for data lake analytics. Data lakes were born from a vision to democratize data, enabling more people, tools, and applications to access a wider range of data. Metadata Layer 3.

Architecture

Architecture Data Lake Metadata Cloud Storage

Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

Data Engineering Podcast

MAY 22, 2022

Summary Machine learning has become a meaningful target for data applications, bringing with it an increase in the complexity of orchestrating the entire data flow. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.

Machine Learning

Machine Learning Data Engineering Data Engineer Cloud

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Thousands of customers have worked with Snowflake to cost-effectively build a secure data foundation as they look to solve a growing variety of business problems with more data. Increasingly customers are looking to expand that powerful foundation to a broader set of data across their enterprise. Why use Iceberg Tables?

Data Lake

Data Lake BI Government Business Intelligence

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. How do we build data products ? How can we interoperate between the data domains ?

Technology

Technology Architecture Google Cloud Metadata

Top 10 AWS Services for Data Engineering Projects

ProjectPro

JUNE 6, 2025

Data engineering is the foundation for data science and analytics by integrating in-depth knowledge of data technology, reliable data governance and security, and a solid grasp of data processing. Data engineers need to meet various requirements to build data pipelines.

AWS

AWS Data Engineering Data Engineer Engineering

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

Summary One of the reasons that data work is so challenging is because no single person or team owns the entire process. This introduces friction in the process of collecting, processing, and using data. In order to reduce the potential for broken pipelines some teams have started to adopt the idea of data contracts.

Metadata

Metadata Data Lake Business Intelligence MongoDB

Data Engineering Weekly #209

Data Engineering Weekly

FEBRUARY 23, 2025

Try Astro Free → Editor’s Note: Data Council 2025, Apr 22-24, Oakland, CA Data Council has always been one of my favorite events to connect with and learn from the data engineering community. Data Council 2025 is set for April 22-24 in Oakland, CA. The results? will shape the future of DataOps.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

In an effort to better understand where data governance is heading, we spoke with top executives from IT, healthcare, and finance to hear their thoughts on the biggest trends, key challenges, and what insights they would recommend. With that, let’s get into the governance trends for data leaders! Want to Save This Guide for Later?

Government

Government Data Governance Finance Metadata

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. By incorporating column-level lineage in the data modeling process it encourages a more robust and well-informed design.

Data Lake

Data Lake Metadata Data Architecture Data Warehouse

Emerging Big Data Trends for 2023

ProjectPro

JUNE 6, 2025

"Data and analytics are already shaking up multiple industries, and the effects will only become more pronounced as adoption reaches critical mass.” ” said the McKinsey Global Institute (MGI) in its executive overview of last month's report: "The Age of Analytics: Competing in a Data-Driven World."

Big Data

Big Data Hadoop Data Lake Data Governance

20 Best Open Source Big Data Projects to Contribute on GitHub

ProjectPro

JUNE 6, 2025

The adaptability and technical superiority of such open-source big data projects make them stand out for community use. As per the surveyors, Big data (35 percent), Cloud computing (39 percent), operating systems (33 percent), and the Internet of Things (31 percent) are all expected to be impacted by open source shortly.

Big Data

Big Data Project Metadata Programming Language

Microsoft Fabric Architecture Explained: Core Components & Benefit

Edureka

MAY 27, 2025

Microsoft Fabric is a next-generation data platform that combines business intelligence, data warehousing, real-time analytics, and data engineering into a single integrated SaaS framework. The architecture of Microsoft Fabric is based on several essential elements that work together to simplify data processes: 1.

Architecture

Architecture BI Business Intelligence Data Lake

Beyond Kafka: Conversation with Jark Wu on Fluss - Streaming Storage for Real-Time Analytics

Data Engineering Weekly

FEBRUARY 18, 2025

Fluss is a compelling new project in the realm of real-time data processing. Architecture Difference The first difference is the Data Model. In contrast, Fluss adopts a Lakehouse-native design with structured tables, explicit schemas, and support for all kinds of data types; it directly mirrors the Lakehouse paradigm.

Kafka

Kafka Lambda Architecture SQL Data Lake

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

NOVEMBER 7, 2024

Data is central to modern business and society. Depending on what sort of leaky analogy you prefer, data can be the new oil , gold , or even electricity. Of course, even the biggest data sets are worthless, and might even be a liability, if they arent organized properly.

Data Lake

Data Lake Data Warehouse Business Intelligence Unstructured Data

Build an Open Data Lakehouse with Iceberg Tables, Now in Public Preview

Snowflake

DECEMBER 4, 2023

Apache Iceberg’s ecosystem of diverse adopters, contributors and commercial support continues to grow, establishing itself as the industry standard table format for an open data lakehouse architecture. Snowflake’s support for Iceberg Tables is now in public preview, helping customers build and integrate Snowflake into their lake architecture.

Building

Building Metadata Cloud Storage AWS

Snowflake and S3 Data Lake

Cloudyard

DECEMBER 13, 2022

Read Time: 4 Minute, 23 Second During this post we will discuss how AWS S3 service and Snowflake integration can be used as Data Lake in current organizations. How customer has migrated On Premises EDW to Snowflake to leverage snowflake Data Lake capabilities. Create S3 bucket to hold the tables data.

Data Lake

Data Lake AWS Metadata Data

Data Engineering Weekly #215

Data Engineering Weekly

APRIL 6, 2025

The article summarizes the recent macro trends in AI and data engineering, focusing on Vibe coding, human-in-the-loop system design, and rapid simplification of developer tooling. As these assistants evolve, they signal a future where scalable, low-latency data pipelines become essential for seamless, intelligent user experiences.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

How to Learn AWS for Data Engineering?

ProjectPro

JUNE 6, 2025

Becoming a successful aws data engineer demands you to learn AWS for data engineering and leverage its various services for building efficient business applications. million organizations that want to be data-driven choose AWS as their cloud services partner. Table of Contents Why Learn AWS for Data Engineering?

AWS

AWS Data Engineering Data Engineer Engineering

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

Data Engineering Podcast

NOVEMBER 20, 2022

Sifflet is a platform that brings your entire data stack into focus to improve the reliability of your data assets and empower collaboration across your teams. In this episode CEO and founder Salma Bakouk shares her views on the causes and impacts of "data entropy" and how you can tame it before it leads to failures.

Data Lake

Data Lake MongoDB Data Ingestion Scala

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

Data Engineering Podcast

DECEMBER 29, 2022

Summary Making effective use of data requires proper context around the information that is being used. Rehgan Avon co-founded AlignAI to help address this challenge through a more purposeful platform designed to collect and distribute the knowledge of how and why data is used in a business. Missing data? Stale dashboards?

Management

Management Metadata Data Lake Business Intelligence

A 2025 Guide to Ace the Netflix Data Engineer Interview

ProjectPro

JUNE 6, 2025

Get ready for your Netflix Data Engineer interview in 2024 with this comprehensive guide. It's your go-to resource for practical tips and a curated list of frequently asked Netflix Data Engineer Interview Questions and Answers. That's where the role of Netflix Data Engineers comes in. petabytes of data. Interested?

Data Engineering

Data Engineering Data Engineer Engineering NoSQL

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

MARCH 27, 2022

Summary Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds additional stress to an already complex endeavor.

Data Governance

Data Governance Government Cloud Building

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake?

Data Lake

Data Lake Process Metadata Data Warehouse

How Apache Iceberg Is Changing the Face of Data Lakes

Databricks Delta Lake: A Scalable Data Lake Solution

Webinars

Trending Sources

How to Build a Data Lake?

Webinars

Level Up Your Data Platform With Active Metadata

Data Lake vs Data Warehouse - Working Together in the Cloud

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

50+ Azure Data Factory Interview Questions and Answers [2025]

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Simplifying Data Architecture and Security to Accelerate Value

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

Cloudera Lakehouse Optimizer Makes it Easier Than Ever to Deliver High-Performance Iceberg Tables

Being Data Driven At Stripe With Trino And Iceberg

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Why Open Table Format Architecture is Essential for Modern Data Systems

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Empower Your Cyber Defenders with Real-Time Analytics Author: Carolyn Duby, Field CTO

Secure Data Sharing and Interoperability Powered by Iceberg REST Catalog

Reflecting On The Past 6 Years Of Data Engineering

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Addressing The Challenges Of Component Integration In Data Platform Architectures

What is Apache Iceberg: Features, Architecture & Use Cases

Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Toward a Data Mesh (part 2) : Architecture & Technologies

Top 10 AWS Services for Data Engineering Projects

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Weekly #209

2024 Governance Trends for Data Leaders

How Column-Aware Development Tooling Yields Better Data Models

Emerging Big Data Trends for 2023

20 Best Open Source Big Data Projects to Contribute on GitHub

Microsoft Fabric Architecture Explained: Core Components & Benefit

Beyond Kafka: Conversation with Jark Wu on Fluss - Streaming Storage for Real-Time Analytics

Data Lake vs. Data Warehouse vs. Data Lakehouse

Build an Open Data Lakehouse with Iceberg Tables, Now in Public Preview

Snowflake and S3 Data Lake

Data Engineering Weekly #215

How to Learn AWS for Data Engineering?

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

A 2025 Guide to Ace the Netflix Data Engineer Interview

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

Stay Connected