Accessibility, Data Lake and Metadata - Data Engineering Digest

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake, Metadata, Cloud Storage, Data Warehouse

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse, data lake and data lakehouse, and distributed patterns such as data mesh.

Data Lake, Data Warehouse, Cloud, Unstructured Data

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Snowflake

DECEMBER 4, 2024

The next evolution in data is making it AI ready. For years, an essential tenet of digital transformation has been to make data accessible, to break down silos so that the enterprise can draw value from all of its data. For this reason, internal-facing AI will continue to be the focus for the next couple of years.

Unstructured Data, Data Lake, Deep Learning, Structured Data

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

With Hybrid Tables’ fast, high-concurrency point operations, you can store application and workflow state directly in Snowflake, serve data without reverse ETL and build lightweight transactional apps while maintaining a single governance and security model for both transactional and analytical data — all on one platform.

Data Architecture, Architecture, Data Lake, Kafka

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. What are the other systems that feed into and rely on the Trino/Iceberg service?

Data Lake, High Quality Data, Metadata, Machine Learning

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata, BI, Data Lake, Business Intelligence

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Snowflake is now making it even easier for customers to bring the platform’s usability, performance, governance and many workloads to more data with Iceberg tables (now generally available), unlocking full storage interoperability. Iceberg tables provide compute engine interoperability over a single copy of data.

Data Lake, BI, Business Intelligence, Metadata

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

First, we create an Iceberg table in Snowflake and then insert some data. Then, we add another column called HASHKEY, add more data, and locate the S3 file containing metadata for the Iceberg table. In the screenshot below, we can see that the metadata file for the Iceberg table retains the snapshot history.

Architecture, Systems, Data Lake, Google Cloud

Reflecting On The Past 6 Years Of Data Engineering

Data Engineering Podcast

FEBRUARY 5, 2023

Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses. Go to dataengineeringpodcast.com/materialize. Support Data Engineering Podcast.

Data Engineering, Data Engineer, Engineering, PostgreSQL

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Summary: Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Data lakes are notoriously complex.

Architecture, Data Lake, High Quality Data, SQL

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 19, 2023

Because of their complete ownership of your data, they constrain the possibilities of what data you can store and how it can be used. Can you describe what Iceberg is and its position in the data lake/lakehouse ecosystem? Acryl: The modern data stack needs a reimagined metadata management platform.

IT, Data Lake, Metadata, Data Warehouse

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe four topics to make data mesh a reality. With this third platform generation, you have more real-time data analytics and a cost reduction because it is easier to manage this infrastructure in the cloud thanks to managed services. What you have to code is this workflow!

Technology, Architecture, Google Cloud, Metadata

Data Engineering Weekly #209

Data Engineering Weekly

FEBRUARY 23, 2025

Alireza Sadeghi: Open Source Data Engineering Landscape 2025. This article gives a comprehensive overview of the 2025 open-source data engineering landscape, highlighting key trends, active projects, and emerging technologies.

Data Engineering, Data Engineer, Engineering, Kafka

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

Data governance plays a critical role in the successful implementation of Generative AI (GenAI) and large language models (LLMs), with 86.7% of respondents rating it as highly impactful. It serves as a vital protective measure, ensuring proper data access while managing risks like data breaches and unauthorized use.

Government, Data Governance, Finance, Metadata

Beyond Kafka: Conversation with Jark Wu on Fluss - Streaming Storage for Real-Time Analytics

Data Engineering Weekly

FEBRUARY 18, 2025

Fluss uses Lakehouse as a tiered storage, and data will be converted and tiered into data lakes periodically; Fluss only retains a small portion of recent data. So you only need to store one copy of data for your streaming and Lakehouse. The TabletServer stores data and provides I/O services directly to users.

Kafka, Lambda Architecture, SQL, Architecture

Snowflake and S3 Data Lake

Cloudyard

DECEMBER 13, 2022

This post discusses how integrating the AWS S3 service with Snowflake can serve as a data lake in current organizations, and how a customer migrated an on-premises EDW to Snowflake to leverage Snowflake's data lake capabilities.

Data Lake, AWS, Metadata, Data

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

NOVEMBER 7, 2024

As the magnitude and role of data in society have changed, so have the tools for dealing with it. While a 3,500+ year data retention capability for data stored on clay tablets is impressive, the access latency and forward compatibility of clay tablets fall a little short.

Data Lake, Data Warehouse, Business Intelligence, Unstructured Data

Top Data Lake Vendors (Quick Reference Guide)

Monte Carlo

APRIL 24, 2023

Data lakes are useful, flexible data storage repositories that enable many types of data to be stored in their rawest state. Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption.

Data Lake, Google Cloud, Data Warehouse, AWS

Data Lakes vs. Data Warehouses

Grouparoo

JANUARY 11, 2022

When it comes to storing large volumes of data, a simple database will be impractical due to the processing and throughput inefficiencies that emerge when managing and accessing big data. This article looks at the options available for storing and processing big data, which is too large for conventional databases to handle.

Data Lake, Data Warehouse, Unstructured Data, Raw Data

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lake, Process, Metadata, Data Warehouse

Iceberg Is An Implementation Detail

dbt Developer Hub

OCTOBER 3, 2024

These formats are changing the way data is stored and metadata accessed. Apache Iceberg is a high-performance open table format developed for modern data lakes. Iceberg Data Catalog - an open-source metadata management system that tracks the schema, partition, and versions of Iceberg tables.

Metadata, Data Lake, Data Storage, Accessibility

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

MARCH 27, 2022

Acryl Data provides DataHub as an easy-to-consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl. RudderStack helps you build a customer data platform on your warehouse or data lake. Can you describe what Privacera is and the story behind it?

Data Governance, Government, Cloud, Building

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Who has access to it?

Data Lake, Data Warehouse, Unstructured Data, Government

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics: a data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake, Architecture, IT, Amazon Web Services

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

That’s why it’s essential for teams to choose the right architecture for the storage layer of their data stack. But, the options for data storage are evolving quickly. Different vendors offering data warehouses, data lakes, and now data lakehouses all offer their own distinct advantages and disadvantages for data teams to consider.

Data Lake, Data Warehouse, Unstructured Data, Raw Data

Data Mesh vs Data Lake: Pros, Cons, & How to Decide

Monte Carlo

JANUARY 23, 2023

When it comes to the data community, there’s always a debate broiling about something, and right now “data mesh vs data lake” is right at the top of that list. In this post we compare and contrast the data mesh vs data lake to illustrate the benefits of each and help discover what’s right for your data platform.

Data Lake, Architecture, Business Intelligence, Unstructured Data

Unifying Iceberg Tables on Snowflake

Snowflake

AUGUST 31, 2023

Catalog Integration: Our newly developed Catalog Integration feature allows you to seamlessly plug Snowflake into other Iceberg catalogs tracking table metadata. Since 2021, Snowflake has had External Tables for the purpose of read-only querying external data lakes.

Metadata, AWS, Data Lake, Datasets

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

With CDW, as an integrated service of CDP, your line of business gets immediate resources needed for faster application launches and expedited data access, all while protecting the company’s multi-year investment in centralized data management, security, and governance. Proprietary file formats mean no one else is invited in!

IT, Data Lake, Data Warehouse, Cloud Storage

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.

Architecture, Metadata, Kafka, Government

Are Apache Iceberg Tables Right For Your Data Lake? 6 Reasons Why.

Monte Carlo

NOVEMBER 14, 2023

Databricks announced that Delta tables metadata will also be compatible with the Iceberg format, and Snowflake has also been moving aggressively to integrate with Iceberg. How Apache Iceberg tables structure metadata. Is your data lake a good fit for Iceberg? I think it’s safe to say it’s getting pretty cold in here.

Data Lake, Metadata, Data Warehouse, SQL

Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

AUGUST 11, 2021

“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later.” The terms data lake and data warehouse are frequently stumbled upon when it comes to storing large volumes of data. Data Warehouse Architecture. What is a Data Lake?

Data Lake, Data Warehouse, Cloud, Hadoop

Business Intelligence In The Palm Of Your Hand With Zing Data

Data Engineering Podcast

DECEMBER 4, 2022

Summary: Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is through a web or desktop application running on a powerful laptop. Zing Data is building a mobile native platform for business intelligence.

Business Intelligence, Metadata, BI, MongoDB

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

FEBRUARY 8, 2023

It offers a simple and efficient solution for data processing in organizations. It offers users a data integration tool that organizes data from many sources, formats it, and stores it in a single repository, such as data lakes, data warehouses, etc.

AWS, Scala, Metadata, Data Lake

Understanding The Role Of The Chief Data Officer

Data Engineering Podcast

AUGUST 21, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. RudderStack helps you build a customer data platform on your warehouse or data lake.

Metadata, MongoDB, MySQL, Data Lake

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Summary: As your data needs scale across an organization, the need for a carefully considered approach to collection, storage, organization, and access becomes increasingly critical. In terms of infrastructure, what are the components of a modern data architecture and how has that changed over the years?

Data Lake, Data Warehouse, Data Architecture, Architecture

The Security Challenges of Data Warehousing in the Cloud

Cloudera

NOVEMBER 5, 2020

Registering an Environment provides CDP with access to your cloud provider account and identifies the resources in your cloud provider account that CDP services can access or provision. When you register an Environment in CDP, a Data Lake is automatically deployed for that environment. Cloudera Data Warehouse Security.

Cloud, Data Lake, Data Warehouse, Metadata

Operational Database Security – Part 2

Cloudera

SEPTEMBER 23, 2020

Comprehensive auditing is provided to enable enterprises to effectively and efficiently meet their compliance requirements by auditing access and other types of operations across OpDB (through HBase). User, business classification of asset accessed. Policy outcome (access or deny).

Database, Data Lake, Metadata, Java

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

Data Engineering Podcast

NOVEMBER 27, 2022

Summary: The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. Atlan is the metadata hub for your data ecosystem.

Data Process, Process, Metadata, Business Intelligence

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

Data Engineering Podcast

AUGUST 28, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. RudderStack helps you build a customer data platform on your warehouse or data lake.

Data Engineering, Data Engineer, MongoDB, Metadata

Reflections On Designing A Data Platform From Scratch

Data Engineering Podcast

FEBRUARY 27, 2022

Visit them today at dataengineeringpodcast.com/timescale. RudderStack helps you build a customer data platform on your warehouse or data lake. The warehouse (BigQuery, Snowflake, Redshift) has become the focal point of the "modern data stack". Data orchestration: Who will be managing the workflow logic?

Designing, Metadata, Data Lake, Relational Database

Insights from the Gartner Data & Analytics Summit in London: Embracing Data Leadership and Strategy

Precisely

JUNE 24, 2024

His key takeaways from the conversation were that “data leaders are under tremendous pressure to collaborate within the C-Suite on projects that deliver true business value.” Explore the key topics and insights from this event below, and get inspired to apply these takeaways for success in your own data-driven journey.

Food, Data Analytics, Pharmaceutical, Consulting

Migrate Hive data from CDH to CDP public cloud

Cloudera

JUNE 25, 2021

This blog post outlines detailed step-by-step instructions to perform Hive Replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake. CDP Data Lake cluster versions – CM 7.4.0, Runtime 7.2.8. Pre-Check: Data Lake Cluster. Understanding Ranger Policies in Data Lake Cluster.

Cloud, Data Lake, Cloud Storage, Metadata

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

With Cloudera’s vision of hybrid data, enterprises adopting an open data lakehouse can easily get application interoperability and portability to and from on premises environments and any public cloud without worrying about data scaling. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake, Business Intelligence, Metadata, Data Warehouse

Snowflake Expands Partnership with Microsoft to Improve Interoperability Through Apache Iceberg

Snowflake

MAY 21, 2024

This will enable our joint customers to experience bidirectional data access between Snowflake and Microsoft Fabric, with a single copy of data with OneLake in Fabric. Organizations using both platforms will be able to do so more cost-effectively, rather than building pipelines or maintaining copies of data in each platform.

Metadata, Cloud, Accessibility, Accessible
