Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

Additionally, multiple copies of the same data locked in proprietary systems contribute to version control issues, redundancies, staleness, and management headaches. It leverages knowledge graphs to keep track of all the data sources and data flows, using AI to fill the gaps so you have the most comprehensive metadata management solution.

Establishing a Large Scale Learned Retrieval System at Pinterest

Pinterest Engineering

Modern large-scale recommendation systems usually include multiple stages: retrieval selects candidates from pools of billions of items, and ranking predicts which items a user is likely to engage with from the trimmed candidate set retrieved in the earlier stages [2]. Figure: General multi-stage recommendation system design at Pinterest.

Level Up Your Data Platform With Active Metadata

Data Engineering Podcast

Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. A variety of platforms have been developed to capture and analyze that information to great effect, but they are inherently limited in their utility due to their nature as storage systems.

How Meta discovers data flows via lineage at scale

Engineering at Meta

It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta's systems. In this blog, we will delve into an early stage in PAI implementation: data lineage. Data lineage enables us to efficiently navigate these assets and protect user data.

Unapologetically Technical Episode 20 – Shane Murray

Jesse Anderson

Shane diagrams Monte Carlo’s architecture, explaining how it uses agents, metadata, and query logs to provide lineage and monitor data health across complex stacks (Snowflake, Databricks, etc.). We then dive deep into Monte Carlo Data, defining data observability and the crucial concept of “data downtime” (TTD + TTR).

How Meta understands data at scale

Engineering at Meta

Meta’s vast and diverse systems make it particularly challenging to comprehend its structure, meaning, and context at scale. We discovered that a flexible and incremental approach was necessary to onboard the wide variety of systems and languages used in building Meta’s products. We believe that privacy drives product innovation.

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. In this blog, we will discuss: What is the Open Table Format (OTF)? These systems are built on open standards and offer immense analytical and transactional processing flexibility.