Data Architecture, Data Lake and Metadata

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Cloud Storage Metadata Data Warehouse

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Whether it’s unifying transactional and analytical data with Hybrid Tables, improving governance for an open lakehouse with Snowflake Open Catalog or enhancing threat detection and monitoring with Snowflake Horizon Catalog, Snowflake is reducing the number of moving parts to give customers a fully managed service that just works.

Data Architecture

Data Architecture Architecture Data Lake Kafka

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Each of these architectures has its own unique strengths and tradeoffs.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata BI Data Lake Business Intelligence

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

SEPTEMBER 1, 2021

Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. That leaves DataOps reactive to data quality issues and can make your consumers lose confidence in your data. lets you identify data quality issues and their root causes from a single dashboard.

Data Lake

Data Lake Cloud AWS SQL

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

First, we create an Iceberg table in Snowflake and then insert some data. Then, we add another column called HASHKEY, add more data, and locate the S3 file containing metadata for the Iceberg table. In the screenshot below, we can see that the metadata file for the Iceberg table retains the snapshot history.
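
As a rough illustration of the snapshot history described above, the following Python sketch reads a trimmed-down, made-up metadata document and lists its snapshots in commit order. The field names (`snapshots`, `schemas`, `current-snapshot-id`, `timestamp-ms`) follow the Iceberg table metadata layout, but the values, the `ID` column, and the `snapshot_history` helper are assumptions for illustration only; a real metadata file written to S3 contains many more fields.

```python
import json
from datetime import datetime, timezone

# Hypothetical, heavily trimmed metadata document. Values are invented;
# only the field names mirror the Iceberg table metadata layout.
METADATA_JSON = """
{
  "format-version": 2,
  "current-schema-id": 1,
  "current-snapshot-id": 2,
  "schemas": [
    {"schema-id": 0, "fields": [{"id": 1, "name": "ID", "type": "long"}]},
    {"schema-id": 1, "fields": [{"id": 1, "name": "ID", "type": "long"},
                                 {"id": 2, "name": "HASHKEY", "type": "string"}]}
  ],
  "snapshots": [
    {"snapshot-id": 1, "timestamp-ms": 1700000000000, "schema-id": 0,
     "summary": {"operation": "append"}},
    {"snapshot-id": 2, "timestamp-ms": 1700000600000, "schema-id": 1,
     "summary": {"operation": "append"}}
  ]
}
"""


def snapshot_history(metadata):
    """Return one record per snapshot, oldest first, with the columns it saw."""
    # Map each schema id to its list of column names.
    columns_by_schema = {
        schema["schema-id"]: [field["name"] for field in schema["fields"]]
        for schema in metadata["schemas"]
    }
    history = []
    for snap in sorted(metadata["snapshots"], key=lambda s: s["timestamp-ms"]):
        committed = datetime.fromtimestamp(snap["timestamp-ms"] / 1000, tz=timezone.utc)
        history.append({
            "snapshot_id": snap["snapshot-id"],
            "committed_at": committed.isoformat(),
            "operation": snap["summary"]["operation"],
            "columns": columns_by_schema[snap["schema-id"]],
            "is_current": snap["snapshot-id"] == metadata["current-snapshot-id"],
        })
    return history


if __name__ == "__main__":
    for entry in snapshot_history(json.loads(METADATA_JSON)):
        print(entry)
```

In this made-up example the first snapshot predates the HASHKEY column and the second includes it, which is the kind of history the excerpt says the metadata file retains.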

Architecture

Architecture Systems Data Lake Google Cloud

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Snowflake is now making it even easier for customers to bring the platform’s usability, performance, governance and many workloads to more data with Iceberg tables (now generally available), unlocking full storage interoperability. Iceberg tables provide compute engine interoperability over a single copy of data.

Data Lake

Data Lake BI Business Intelligence Metadata

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

Modern data architectures. To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Deploying modern data architectures. Lack of sharing hinders the elimination of fraud, waste, and abuse.

Data Architecture

Data Architecture Architecture Data Lake NoSQL

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. How has the move to the cloud for data warehousing/data platforms influenced the practice of data modeling?

Data Lake

Data Lake Machine Learning Metadata Data Architecture

Data Engineering Weekly #209

Data Engineering Weekly

FEBRUARY 23, 2025

Alireza Sadeghi: Open Source Data Engineering Landscape 2025. This article gives a comprehensive overview of the 2025 open-source data engineering landscape, highlighting key trends, active projects, and emerging technologies.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Using the metaphor of a museum curator carefully managing the precious resources on display and in the vaults, he discusses the various layers of an enterprise data strategy. Can you walk through the stages of an ideal lifecycle for data within the context of an organization’s uses for it?

Data Lake

Data Lake Data Warehouse Data Architecture Architecture

Modern Data Architectures Provide a Foundation for Innovation

Precisely

JUNE 6, 2023

At Precisely’s Trust ’23 conference, Chief Operating Officer Eric Yau hosted an expert panel discussion on modern data architectures. The group kicked off the session by exchanging ideas about what it means to have a modern data architecture.

Data Architecture

Data Architecture Architecture Metadata Data Lake

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

Chief Technology Officer, Information Technology Industry Organizations have spent the past decade accumulating, maintaining, and securing data lakes/warehouses/fabrics that will now be expected to drive AI/LLM use cases. The technology for metadata management, data quality management, etc., No problem!

Government

Government Data Governance Finance Metadata

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

Today, as data sources become increasingly varied, data management becomes more complex, and agility and scalability become essential traits for data leaders, data fabric is quickly becoming the future of data architecture. If data fabric is the future, how can you get your organization up-to-speed?

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Data discovery tools and platforms can help.

Data Lake

Data Lake Data Warehouse Unstructured Data Government

Data Mesh vs Data Lake: Pros, Cons, & How to Decide

Monte Carlo

JANUARY 23, 2023

When it comes to the data community, there’s always a debate broiling about something— and right now “data mesh vs data lake” is right at the top of that list. In this post we compare and contrast the data mesh vs data lake to illustrate the benefits of each and help discover what’s right for your data platform.

Data Lake

Data Lake Architecture Business Intelligence Unstructured Data

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

With Cloudera’s vision of hybrid data, enterprises adopting an open data lakehouse can easily get application interoperability and portability to and from on premises environments and any public cloud without worrying about data scaling. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake

Data Lake Business Intelligence Metadata Data Warehouse

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse BI SQL

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

In this context, data management in an organization is a key point for the success of its projects involving data. One of the main aspects of correct data management is the definition of a data architecture. What is Delta Lake? The data became useless. The Lakehouse architecture was one of them.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Architecture

Architecture Metadata Kafka Government

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics: a data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake

Data Lake Architecture IT Amazon Web Services

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.

Pharmaceutical

Pharmaceutical Data Lake Data Architecture Architecture

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud, and as they adopt more converged architectures like the Lakehouse. 1: Multi-function analytics. 3: Open Performance.

Metadata

Metadata Data Architecture Machine Learning BI

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started their data journey in 2019, when a data lake initiative was started to consolidate complex data sources and enable the bank to use a single version of truth for decision making.

Banking

Banking Management Data Lake Professional Services

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

As organizations seek greater value from their data, data architectures are evolving to meet the demand — and table formats are no exception. At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files.

Data Lake

Data Lake Metadata Hadoop Data Governance

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

To get a better understanding of a data architect’s role, let’s clear up what data architecture is. Data architecture is the organization and design of how data is collected, transformed, integrated, stored, and used by a company. Sample of a high-level data architecture blueprint for Azure BI programs.

Data Architect

Data Architect Certification Generalist Big Data

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake. What is a data fabric?

Data Lake

Data Lake Analytics Application Cloud Storage Architecture

The State of Data Engineering in 2024: Key Insights and Trends

Data Engineering Weekly

DECEMBER 16, 2024

Grab’s Metasense, Uber’s DataK9, and Meta’s classification systems use AI to automatically categorize vast data sets, reducing manual efforts and improving accuracy. Beyond classification, organizations now use AI for automated metadata generation and data lineage tracking, creating more intelligent data infrastructures.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Modern Data Management Essentials: Exploring Data Fabric

Precisely

JULY 18, 2024

Key Takeaways: Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata.

Data Management

Data Management Management Metadata Database-centric

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.

Education

Education Unstructured Data Data Lake Data Warehouse

Data Lakehouse: Concept, Key Features, and Architecture Layers

AltexSoft

NOVEMBER 10, 2021

The pun being obvious, there’s more to that than just a new term: Data lakehouses combine the best features of both data lakes and data warehouses and this post will explain this all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: What’s the difference.

Architecture

Architecture Data Lake Data Warehouse Metadata

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

OCTOBER 30, 2021

Managing data and metadata. Data can be stored in different ways: a data warehouse, numerous data lakes and data hubs, etc. Data engineers control how data is stored and structured within those locations. Providing data access tools. Let’s go through the main areas.

Data Engineering

Data Engineering Data Engineer Engineering Machine Learning

Moving Past ETL and ELT: Understanding the EtLT Approach

Ascend.io

AUGUST 31, 2023

In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Central to this transformation are two shifts.

Data Lake

Data Lake Data Warehouse ETL Tools Data Pipeline

Know Before You Go: Gartner Data & Analytics Summit 2025 in London

Monte Carlo

FEBRUARY 13, 2025

Summit Essentials Date & Location The Gartner Data & AI Summit takes place May 12-15th, 2025 in London, England. This year, the event will uncover the latest in data management, data trends, governance, and data architecture to deliver value for the future.

Data Analytics

Data Analytics Government Data Architecture Data Lake

Just Launched: Dremio SQL Query Engine Data Quality Monitoring

Monte Carlo

AUGUST 30, 2024

It’s our goal at Monte Carlo to provide data observability and quality across the enterprise by monitoring every system vital in the delivery of data from source to consumption. We started with popular modern data warehouses and quickly expanded our support as data lakes became data lakehouses.

SQL

SQL Engineering Data Lake High Quality Data

A Day in the Life of a DataOps Engineer

DataKitchen

OCTOBER 11, 2021

First, you must understand the existing challenges of the data team, including the data architecture and end-to-end toolchain. Figure 2: Example data pipeline with DataOps automation. In this project, I automated data extraction from SFTP, the public websites, and the email attachments. Monitoring Job Metadata.

Engineering

Engineering Business Analyst BI Metadata

Three Reference Architectures for Real-Time Analytics On Streaming Data

Rockset

APRIL 26, 2023

We’ve noticed many common patterns across streaming data architectures and we’ll be sharing a blueprint for three of the most popular: anomaly detection, IoT, and recommendations. Offline feature store : Detecting anomalies requires historical data in order to have a baseline for comparisons. The database has two primary jobs.
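
To make the point about needing a historical baseline concrete, here is a minimal Python sketch that flags a new reading by comparing it with past values. It uses a simple z-score check, which is an assumption chosen for illustration and not the method described in the article; the `is_anomaly` helper, the threshold of 3.0, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def is_anomaly(history, value, threshold=3.0):
    """Flag `value` if it lies more than `threshold` sample standard
    deviations away from the mean of the historical readings."""
    if len(history) < 2:
        raise ValueError("need at least two historical readings for a baseline")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return value != baseline
    return abs(value - baseline) / spread > threshold


if __name__ == "__main__":
    # Invented sensor readings standing in for data from an offline store.
    past_readings = [10, 11, 9, 10, 10, 11, 9, 10]
    print(is_anomaly(past_readings, 10.5))
    print(is_anomaly(past_readings, 20))
```

Without the `history` argument there is nothing to compare a new value against, which is why the excerpt pairs anomaly detection with a store of historical data.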

Architecture

Architecture Transportation Data Lake Insurance

DataOps Architecture: 5 Key Components and How to Get Started

Databand.ai

AUGUST 30, 2023

A DataOps architecture is the structural foundation that supports the implementation of DataOps principles within an organization. It encompasses the systems, tools, and processes that enable businesses to manage their data more efficiently and effectively. As a result, they can be slow, inefficient, and prone to errors.

Architecture

Architecture Data Ingestion Data Governance Data Cleanse

The Good and the Bad of Databricks Lakehouse Platform

AltexSoft

MARCH 30, 2023

What is Databricks? Databricks is an analytics platform with a unified set of tools for data engineering, data management, data science, and machine learning. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake used to host large amounts of raw data.

Scala

Scala Data Lake Machine Learning BI

Zero-ETL, ChatGPT, And The Future of Data Engineering

Towards Data Science

APRIL 3, 2023

Cons : Will this concept make it easier or more difficult for organizations to scale their data products? Another fundamental question, which could be asked of many of these futuristic data trends, is do the byproducts of data pipelines (code, data, metadata) contain value for data teams that is worth preserving?

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Data Engineering Glossary

Silectis

JANUARY 3, 2021

Big Query: Google’s cloud data warehouse. Data Architecture: Data architecture is a composition of models, rules, and standards for all data systems and interactions between them. Data Catalog: An organized inventory of data assets relying on metadata to help with data management.

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Data Engineering Weekly #114

Data Engineering Weekly

JANUARY 15, 2023

SiliconANGLE theCUBE: Analyst Predictions 2023 - The Future of Data Management. By far one of the best analyses of trends in Data Management. 2023 predictions from the panel are: Unified metadata becomes kingmaker. The names hold less meaning to the outcome, but it’s fancy.

Data Engineering

Data Engineering Data Engineer Engineering Metadata
