It leverages knowledge graphs to keep track of all the data sources and data flows, using AI to fill in the gaps so you have a comprehensive metadata management solution. Together, Cloudera and Octopai will help reinvent how customers manage their metadata and track lineage across all their data sources.
Data mesh and data fabric are two modern data architectures that serve to enable better data flow, faster decision-making, and more agile operations. Both architectures share the goal of making data more actionable and accessible for users within an organization.
What if you could streamline your efforts while still building an architecture that best fits your business and technology needs? At BUILD 2024, we announced several enhancements and innovations designed to help you build and manage your data architecture on your terms. Here’s a closer look.
Shane diagrams Monte Carlo’s architecture, explaining how it uses agents, metadata, and query logs to provide lineage and monitor data health across complex stacks (Snowflake, Databricks, etc.). We then dive deep into Monte Carlo Data, defining data observability and the crucial concept of “data downtime” (TTD + TTR).
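The "data downtime" definition above (TTD plus TTR) is simple enough to compute directly. A minimal sketch, with function and variable names of my own choosing (not a Monte Carlo API):

```python
from datetime import datetime, timedelta

def data_downtime(incident_start: datetime, detected_at: datetime,
                  resolved_at: datetime) -> timedelta:
    """Total data downtime: time-to-detection (TTD) plus time-to-resolution (TTR)."""
    ttd = detected_at - incident_start   # how long the issue went unnoticed
    ttr = resolved_at - detected_at      # how long the fix took once detected
    return ttd + ttr

start = datetime(2024, 1, 1, 8, 0)
detected = datetime(2024, 1, 1, 10, 30)  # TTD = 2.5 h
resolved = datetime(2024, 1, 1, 14, 0)   # TTR = 3.5 h
total = data_downtime(start, detected, resolved)
print(total)  # 6:00:00
```

Tracking TTD and TTR separately is what makes the metric actionable: detection improvements and resolution improvements are usually owned by different parts of the stack.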
Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. To level up its value, a new trend of active metadata is being implemented, enabling use cases like keeping BI reports up to date, auto-scaling your warehouses, and automating data governance.
2. Introduction 3. Setup & Logging architecture 3.1. Data Pipeline Logging Best Practices 3.2. Metadata: Information about pipeline runs, & data flowing through your pipeline 3.3. Obtain visibility into the code's execution sequence using text logs 3.4. Understand resource usage by tracking Metrics
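The practices named in this outline (text logs for execution sequence, plus metadata about each run) can be sketched with the standard library alone. Names here are illustrative, not from the linked post:

```python
import logging

# Text logs give visibility into the code's execution sequence.
logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("pipeline")

def run_stage(name: str, rows_in: int) -> dict:
    """Run one pipeline stage and return run metadata about the data flowing through."""
    logger.info("stage %s started", name)
    rows_out = rows_in - 1  # pretend one bad row was dropped during cleaning
    metadata = {"stage": name, "rows_in": rows_in, "rows_out": rows_out}
    logger.info("stage %s finished: %s", name, metadata)
    return metadata

meta = run_stage("clean", rows_in=100)
print(meta["rows_out"])  # 99
```

Emitting the run metadata as a structured dict (rather than only free text) is what makes it queryable later, e.g. for alerting on row-count drops between runs.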
Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms. Adopting an Open Table Format architecture is becoming indispensable for modern data systems. In this blog, we will discuss: What is the Open Table Format (OTF)?
Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Iceberg tables become interoperable while maintaining ACID compliance by adding a layer of metadata to the data files in a user's object storage.
Key Takeaways: Prioritize metadata maturity as the foundation for scalable, impactful data governance. The past year brought significant changes, from the growing importance of metadata maturity to the increasing convergence of data governance and data quality practices. How can you further improve your strategy moving forward?
How to build Data Products, or never call me Data Pipeline any more. You have this interesting schema in the second article on Data Mesh by Zhamak Dehghani: "Data mesh introduces the concept of data product as its architectural quantum." "We are Data Teams" versus "we have to patch the server with the latest version and do the tests."
The debate around table formats and Lakehouse architectures continues, but the focus is on unifying data ecosystems to enable AI-driven insights. Moreover, we anticipate a growing emphasis on intelligent data platforms that unify data and metadata, further supported by efforts to enhance data cataloging and lineage tracking.
Within the context of a data mesh architecture, I will present industry settings / use cases where the particular architecture is relevant and highlight the business value that it delivers against business and technology areas. Introduction to the Data Mesh Architecture and its Required Capabilities.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.
This scenario underscored the need for a new recommender system architecture where member preference learning is centralized, enhancing accessibility and utility across different models. Therefore, it's also important to let foundation models use metadata information of entities and inputs, not just member interaction data.
Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata across various tools. After experiencing the impacts of fragmented metadata and previous attempts at building a solution Suresh Srinivas and Sriharsha Chintalapani created the OpenMetadata project.
Understanding DataSchema requires grasping schematization, which defines the logical structure and relationships of data assets, specifying field names, types, metadata, and policies. This supports creating a canonical representation for compliance tools and an accurate understanding of data, enabling the application of privacy safeguards at scale.
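A schematized asset of this kind can be sketched as plain dataclasses: fields carry policy tags, and compliance tooling queries the schema rather than the data itself. The class and field names below are hypothetical illustrations, not a real DataSchema API:

```python
from dataclasses import dataclass, field

@dataclass
class SchemaField:
    name: str
    type: str
    policies: list = field(default_factory=list)  # e.g. privacy annotations

@dataclass
class DataSchema:
    asset: str
    fields: list

    def fields_with_policy(self, policy: str) -> list:
        """Find fields carrying a given policy tag, e.g. to apply privacy safeguards."""
        return [f.name for f in self.fields if policy in f.policies]

schema = DataSchema(
    asset="users",
    fields=[
        SchemaField("user_id", "string", ["PII"]),
        SchemaField("country", "string"),
        SchemaField("email", "string", ["PII"]),
    ],
)
print(schema.fields_with_policy("PII"))  # ['user_id', 'email']
```

Keeping policies on the schema is what makes safeguards scale: a single query over metadata covers every dataset that conforms to it.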
Though this microservice architecture has worked out really well for our API engineers, when our clients need to fetch data they find themselves talking to several of these microservices. Addressing partial failures and resilience issues due to multiple network calls in the distributed microservices architecture.
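One common way to address those partial failures is to bound retries per downstream call and degrade gracefully instead of failing the whole fan-out. A hedged sketch under assumed service names (the services and fallbacks here are invented for illustration):

```python
def call_with_retry(service, retries: int = 2, fallback=None):
    """Call one microservice with bounded retries; fall back rather than fail the aggregate."""
    for attempt in range(retries + 1):
        try:
            return service()
        except ConnectionError:
            if attempt == retries:
                return fallback  # degrade gracefully instead of erroring
    return fallback

def flaky_ratings():
    # Stand-in for a downstream service that is currently unreachable.
    raise ConnectionError("ratings service unavailable")

def stable_titles():
    return ["Stranger Things"]

# The client fans out to several microservices; one failure should not
# take down the whole response.
response = {
    "titles": call_with_retry(stable_titles, fallback=[]),
    "ratings": call_with_retry(flaky_ratings, fallback={}),
}
print(response)  # {'titles': ['Stranger Things'], 'ratings': {}}
```

In production this pattern is usually paired with timeouts and circuit breakers so retries do not amplify load on an already struggling service.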
Over the past several years, data leaders asked many questions about where they should keep their data and what architecture they should implement to serve an incredible breadth of analytic use cases. And for that future to be a reality, data teams must shift their attention to metadata, the new turf war for data.
The promise of a modern data lakehouse architecture. These challenges require architecture changes and adoption of new table formats that can support massive scale, offer greater flexibility of compute engine and data types, and simplify schema evolution. This is the promise of the modern data lakehouse architecture.
Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams across the organization. Can you describe the system architecture that you have built at Acryl? What are some examples of automated actions that can be triggered from metadata changes?
It has quickly gained popularity over the past ten years, becoming the leading monitoring stack for contemporary applications thanks to its combination of querying features and cloud-native architecture. Some of them may be configured to filter and match container metadata, making them perfect for ephemeral Kubernetes workloads.
These technological shifts have brought about corresponding changes in data and platform architectures for managing data and analytical workflows. She also discusses her views on the role of the data lakehouse as a building block for these architectures and the ongoing influence that it will have as the technology matures.
In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI. It is a critical feature for delivering unified access to data in distributed, multi-engine architectures.
In this case, the main stakeholders are: - Title Launch Operators. Role: Responsible for setting up the title and its metadata into our systems. TitleSetup: A title's setup includes essential attributes like metadata (e.g., artwork, trailers, supplemental messages). What is the architecture of the systems involved?
This blog will summarise the security architecture of a CDP Private Cloud Base cluster. The architecture reflects the four pillars of security engineering best practice: Perimeter, Data, Access and Visibility. System metadata is reviewed and updated regularly.
Meanwhile, operations teams use entity extraction on documents to automate workflows and enable metadata-driven analytical filtering. As data volumes grow and AI automation expands, cost efficiency in processing with LLMs depends on both system architecture and model flexibility.
In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance. Any delays in metadata retrieval can negatively impact user experience, resulting in decreased productivity and satisfaction. What is Atlas?
The release of Cloudera Data Platform (CDP) Private Cloud Base edition provides customers with a next generation hybrid cloud architecture. All three will be quorums of ZooKeepers and HDFS Journal nodes to track changes to HDFS metadata stored on the NameNodes.
The general two-tower model architecture with training objective and serving illustration is shown in Fig. 2. User sequence modeling in two-tower architecture. Fig. 4 illustrates the system architecture for embedding-based retrieval with auto retraining adopted. The metadata is generated together with the index.
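The serving side of a two-tower model reduces to a dot-product search: item embeddings are precomputed offline into an index, and at request time only the user tower runs. A toy sketch with hand-written embeddings (real towers are learned networks, and the item ids here are invented):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# "Item tower" outputs, precomputed offline into an index alongside metadata.
item_index = {
    "movie_a": [0.9, 0.1],
    "movie_b": [0.2, 0.8],
    "movie_c": [0.7, 0.3],
}

def retrieve(user_embedding, k=2):
    """Score every indexed item against the user embedding and return the top-k ids."""
    scored = sorted(item_index,
                    key=lambda i: dot(user_embedding, item_index[i]),
                    reverse=True)
    return scored[:k]

user = [1.0, 0.0]      # "user tower" output for one request
print(retrieve(user))  # ['movie_a', 'movie_c']
```

Because the two towers only interact through the final dot product, the item side can be served from an approximate nearest-neighbor index, which is what makes this architecture scale to large catalogs.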
Implementing a modern data architecture makes it possible for financial institutions to break down legacy data silos, simplifying data management, governance, and integration — and driving down costs. However, because most institutions lack a modern data architecture , they struggle to manage, integrate and analyze financial data at pace.
Architecture Difference: The first difference is the Data Model. The fourth difference is the Lakehouse Architecture: Fluss embraces the Lakehouse Architecture. On the other hand, Fluss is a Kappa Architecture; it stores one copy of data and presents it as a stream or a table, depending on the use case.
This ecosystem includes: Catalogs: services that manage metadata about Iceberg tables (e.g., …). Maintenance Processes: operations that optimize Iceberg tables, such as compacting small files and managing metadata. Metadata Overhead: Iceberg relies heavily on metadata to track table changes and enable features like time travel.
Modern data architectures. To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Data Mesh: A type of data platform architecture that embraces the ubiquity of data in the enterprise by leveraging a domain-oriented, self-serve design.
The blog outlines the challenges of traditional offset management, including inaccuracies stemming from control records and potential issues with stale metadata during leader changes. It highlights the benefits of committing the leader epoch alongside the offset.
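The benefit of committing the leader epoch alongside the offset can be shown in a toy model: on resume, a consumer compares its committed epoch to the partition's current leader epoch, and a mismatch signals that the position was taken under a stale leader and must be validated rather than trusted. This is a simplified illustration, not the real Kafka protocol or client API:

```python
# Simplified model of epoch-aware offset commits (not the real Kafka protocol).
committed = {}

def commit(partition: int, offset: int, leader_epoch: int) -> None:
    """Store the offset together with the leader epoch it was observed under."""
    committed[partition] = {"offset": offset, "leader_epoch": leader_epoch}

def resume(partition: int, current_epoch: int):
    """Validate a committed position against the partition's current leader epoch."""
    pos = committed[partition]
    if pos["leader_epoch"] < current_epoch:
        # A leader change happened since the commit: the offset may be beyond
        # the new leader's log, so it must be reconciled before fetching.
        return ("validate_offset", pos["offset"])
    return ("seek", pos["offset"])  # epoch matches: safe to resume directly

commit(partition=0, offset=42, leader_epoch=3)
print(resume(partition=0, current_epoch=3))  # ('seek', 42)
print(resume(partition=0, current_epoch=4))  # ('validate_offset', 42)
```

Without the stored epoch, the second case is indistinguishable from the first, which is exactly the stale-metadata inaccuracy the blog describes.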
As data volumes grow, scalable solutions like data mesh and data fabric architectures are becoming more widespread due to their flexibility in complex organizational structures. Developing skills is also crucial, as teams need education on new architectures and technologies, along with fostering collaboration across different areas.
Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.
Can you describe what role Trino and Iceberg play in Stripe's data architecture? What kinds of questions are you answering with table metadata? What use case/team does that support? What is the comparative utility of the Iceberg REST catalog? What are the shortcomings of Trino and Iceberg? Email hosts@dataengineeringpodcast.com with your story.
Hosted weekly by Paul Muller, The AI Forecast speaks to experts in the space to understand the ins and outs of AI in the enterprise, the kinds of data architectures and infrastructures that support it, the guardrails that should be put in place, and the success stories to emulate or cautionary tales to learn from.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Your host is Tobias Macey, and today I'm reflecting on the major trends in data engineering over the past 6 years. Interview: Introduction. 6 years of running the Data Engineering Podcast. Around the first time that data engineering was discussed as (..)
Technical Design LLM as Relevance Model Model Architecture We use a cross-encoder language model to predict a Pin's relevance to a query, along with Pin text, as shown in Figure 1. Figure 1: The cross-encoder architecture in the relevance teacher model. The task is formulated as a multiclass classification problem.
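The multiclass formulation means the model scores the concatenated (query, Pin text) pair jointly and a softmax over relevance classes turns the logits into probabilities. In the sketch below, the scoring function is a toy word-overlap stand-in for the language model, and the class names are invented for illustration:

```python
import math

CLASSES = ["irrelevant", "somewhat_relevant", "relevant"]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_cross_encoder(query: str, pin_text: str):
    """Stand-in for the LM: one logit per class from word overlap of the joint input."""
    overlap = len(set(query.split()) & set(pin_text.split()))
    return [1.0 - overlap, 1.0, 1.0 + overlap]  # more overlap -> more relevant

probs = softmax(toy_cross_encoder("modern kitchen decor", "modern kitchen ideas"))
predicted = CLASSES[probs.index(max(probs))]
print(predicted)  # relevant
```

Unlike a two-tower model, the cross-encoder sees query and Pin text together, which is why it is accurate enough to serve as a teacher for distillation but too expensive to run over a full catalog.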
Apache Iceberg’s ecosystem of diverse adopters, contributors and commercial support continues to grow, establishing itself as the industry standard table format for an open data lakehouse architecture. Snowflake’s support for Iceberg Tables is now in public preview, helping customers build and integrate Snowflake into their lake architecture.
To name a few: privacy and security considerations, compliance demands, interest in emerging data management architectures like data mesh and data fabric, and increased AI adoption. The findings show that data governance is the most-cited data challenge inhibiting progress toward AI initiatives (62%). This is likely driven by various factors.
Process all your data where it already lives Fragmented data environments and complex cloud architectures impede efficiency and innovation. This resulting streamlined architecture reduces complexity, accelerates time-to-insight and lowers total cost of ownership.
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Can you describe the current architecture of your data platform?