Accessibility, Data Warehouse and Metadata

Data logs: The latest evolution in Meta’s access tools

Engineering at Meta

FEBRUARY 4, 2025

Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand. Users have a variety of tools they can use to manage and access their information on Meta platforms. What are data logs?

Accessibility

Accessibility Accessible Raw Data Data Warehouse

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Metadata Cloud Storage Data Warehouse

How Meta discovers data flows via lineage at scale

Engineering at Meta

JANUARY 22, 2025

These stages propagate through various systems including function-based systems that load, process, and propagate data through stacks of function calls in different programming languages (e.g., Hack, C++, Python, etc.). For simplicity, we will demonstrate these for the web, the data warehouse, and AI, per the diagram below.

Data Warehouse

Data Warehouse SQL Programming Language Data

How Meta understands data at scale

Engineering at Meta

APRIL 28, 2025

Challenge: Understanding at scale (lack of foundation). At Meta, we manage hundreds of data systems and millions of assets across our family of apps. Each product features its own distinct data model, physical schema, query language, and access patterns. Approach: Creating a canonical representation for compliance tools.

Metadata

Metadata Data Utilities Data Warehouse

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse, data lake and data lakehouse, and distributed patterns such as data mesh.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Snowflake

DECEMBER 4, 2024

The next evolution in data is making it AI ready. For years, an essential tenet of digital transformation has been to make data accessible, to break down silos so that the enterprise can draw value from all of its data. For this reason, internal-facing AI will continue to be the focus for the next couple of years.

Unstructured Data

Unstructured Data Data Lake Deep Learning Structured Data

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Netflix Tech

OCTOBER 27, 2020

Usually, data scientists and engineers write Extract-Transform-Load (ETL) jobs and pipelines using big data compute technologies, like Spark or Presto, to process this data and periodically compute key information for a member or a video. The processed data is typically stored as data warehouse tables in AWS S3.

Data Warehouse

Data Warehouse Datasets Data Big Data

Key considerations when making a decision on a Cloud Data Warehouse

Cloudera

MAY 17, 2021

Making a decision on a cloud data warehouse is a big deal. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.

Data Warehouse

Data Warehouse Cloud Government Metadata

Reflecting On The Past 6 Years Of Data Engineering

Data Engineering Podcast

FEBRUARY 5, 2023

Sign up now for early access to Materialize and get started with the power of streaming data with the same simplicity and low implementation cost as batch cloud data warehouses. Go to dataengineeringpodcast.com/materialize. Support Data Engineering Podcast.

Data Engineering

Data Engineering Data Engineer Engineering PostgreSQL

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

First, we create an Iceberg table in Snowflake and then insert some data. Then, we add another column called HASHKEY, add more data, and locate the S3 file containing metadata for the Iceberg table. In the screenshot below, we can see that the metadata file for the Iceberg table retains the snapshot history.

Architecture

Architecture Systems Data Lake Google Cloud

The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 19, 2023

Summary: Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data, they constrain the possibilities of what data you can store and how it can be used.

IT

IT Data Lake Metadata Data Warehouse

Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses

Data Engineering Weekly

MARCH 5, 2025

This ecosystem includes: Catalogs: Services that manage metadata about Iceberg tables. Compute Engines: Tools that query and process data stored in Iceberg tables (e.g., Trino, Spark, Snowflake, DuckDB). Maintenance Processes: Operations that optimize Iceberg tables, such as compacting small files and managing metadata.

Hadoop

Hadoop Metadata Data Ingestion Data Governance

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality. With this 3rd platform generation, you have more real-time data analytics and a cost reduction because it is easier to manage this infrastructure in the cloud thanks to managed services. What you have to code is this workflow!

Technology

Technology Architecture Google Cloud Metadata

Importance of Column Selection in AI-driven automated insights

ThoughtSpot

APRIL 7, 2025

The challenge, however, lies in accessing the relevant data. This is problematic due to the following reasons: High Cost: Fetching data for each cloud data warehouse (CDW) query with specific filters is computationally expensive. Data filtering algorithms: Let's look at the algorithm at work.

Metadata

Metadata Algorithm BI Machine Learning

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part1

Cloudera

FEBRUARY 9, 2021

Today’s customers have a growing need for faster end-to-end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability.

Data Warehouse

Data Warehouse Cloud Kafka Cloud Storage

The Rise of the Data Engineer

Maxime Beauchemin

JANUARY 20, 2017

Data modeling is changing: Typical data modeling techniques, like the star schema, which defined our approach to data modeling for the analytics workloads typically associated with data warehouses, are less relevant than they once were.

Data Engineering

Data Engineering Data Engineer Engineering ETL Tools

The High Cost of Poor Data Warehouse Governance

Monte Carlo

SEPTEMBER 10, 2024

This truth was hammered home recently when ride-hailing giant Uber found itself on the receiving end of a staggering €290 million ($324 million) fine from the Dutch Data Protection Authority. The reason? Poor data warehouse governance practices that led to the improper handling of sensitive European driver data.

Data Warehouse

Data Warehouse Government Data Governance Metadata

Enabling Self-Service Business Insights with Cloudera Data Warehouse

Cloudera

JANUARY 11, 2021

How self-service data warehousing frees IT resources. Cloudera Data Warehouse (CDW) is a cloud service and an integral part of the newly released Cloudera Data Platform (CDP). Key features are: Highly scalable and performant open-source engines for BI and data warehousing workloads. Simplified provisioning.

Data Warehouse

Data Warehouse Pharmaceutical Data Lake BI

Stop Overcomplicating Data Quality

Towards Data Science

DECEMBER 10, 2024

Take advantage of old school database tricks. In the last 10–15 years we've seen massive changes to the data industry, notably big data, parallel processing, cloud computing, data warehouses, and new tools (lots and lots of new tools). Consequently, we've had to say goodbye to some things to make room for all this new stuff.

PostgreSQL

PostgreSQL Data Python SQL

The Downfall of the Data Engineer

Maxime Beauchemin

AUGUST 28, 2017

Consensus seeking: Whether you think that old-school data warehousing concepts are fading or not, the quest to achieve conformed dimensions and conformed metrics is as relevant as it ever was. The data warehouse needs to reflect the business, and the business should have clarity on how it thinks about analytics.

Data Engineering

Data Engineering Data Engineer Engineering Software Engineer

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

NOVEMBER 7, 2024

While a +3500 year data retention capability for data stored on clay tablets is impressive, the access latency and forward compatibility of clay tablets fall a little short. Similarly, data platforms based on the business needs of the past don't always meet the needs of today.

Data Lake

Data Lake Data Warehouse Business Intelligence Unstructured Data

Data Lakes vs. Data Warehouses

Grouparoo

JANUARY 11, 2022

When it comes to storing large volumes of data, a simple database will be impractical due to the processing and throughput inefficiencies that emerge when managing and accessing big data. This article looks at the options available for storing and processing big data, which is too large for conventional databases to handle.

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

A Look At The Data Systems Behind The Gameplay For League Of Legends

Data Engineering Podcast

NOVEMBER 20, 2022

Summary: The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. Atlan is the metadata hub for your data ecosystem.

Systems

Systems Metadata Data Pipeline MongoDB

Ready-to-go sample data pipelines with Dataflow

Netflix Tech

DECEMBER 3, 2022

The most commonly used one is dataflow project, which helps folks manage their data pipeline repositories through creation, testing, deployment, and a few other activities. It lets you create YAML-formatted mock data files based on selected tables, columns, and a few rows of data from the Netflix data warehouse.

Data Pipeline

Data Pipeline Scala Metadata Food

Unlocking The Power of Data Lineage In Your Platform with OpenLineage

Data Engineering Podcast

MAY 18, 2021

RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Interview: Introduction. How did you get involved in the area of data management?

Metadata

Metadata Kafka Data Warehouse Hadoop

The Hidden Threats in Your Data Warehouse Layers (And How to Fix Them)

Monte Carlo

AUGUST 6, 2024

Data warehouses are the centralized repositories that store and manage data from various sources. They are integral to an organization’s data strategy, ensuring data accessibility, accuracy, and utility. Integration Layer: Where your data transformations and business logic are applied.

Data Warehouse

Data Warehouse Raw Data Machine Learning BI

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

MARCH 27, 2022

You have full control over your data and their plugin system lets you integrate with all of your other data tools, including data warehouses and SaaS platforms. Acryl: The modern data stack needs a reimagined metadata management platform.

Data Governance

Data Governance Government Cloud Building

Get Your Analytics Insights Instantly – Without Abandoning Central IT

Cloudera

JANUARY 21, 2021

While cloud-native, point-solution data warehouse services may serve your immediate business needs, there are dangers to the corporation as a whole when you do your own IT this way. Cloudera Data Warehouse (CDW) is here to save the day! CDW is an integrated data warehouse service within Cloudera Data Platform (CDP).

IT

IT Data Lake Data Warehouse Cloud Storage

Is Modern Data Warehouse Architecture Broken?

Monte Carlo

APRIL 16, 2022

The data warehouse is the foundation of the modern data stack, so it caught our attention when we saw Convoy head of data Chad Sanderson declare, “the data warehouse is broken” on LinkedIn. Treating data like an API. Immutable data warehouses have challenges too.

Data Warehouse

Data Warehouse Architecture Data Data Architect

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

Your sunk costs are minimal and if a workload or project you are supporting becomes irrelevant, you can quickly spin down your cloud data warehouses and not be “stuck” with unused infrastructure. Cloud deployments for suitable workloads give you the agility to keep pace with rapidly changing business and data needs.

Cloud

Cloud Data Warehouse Banking Data

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

Different vendors offering data warehouses, data lakes, and now data lakehouses all offer their own distinct advantages and disadvantages for data teams to consider. So let’s get to the bottom of the big question: what kind of data storage layer will provide the strongest foundation for your data platform?

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

ThoughtSpot Sage: data security with large language models

ThoughtSpot

MAY 31, 2023

This includes services that: manage and monitor the tenant-specific resources (this does not include access to tenant data); maintain indexed data to serve as your application home page. This multi-tenant service isolates the tenant metadata index, authorizing and filtering the search answer requests from every tenant.

Data Security

Data Security Metadata Data Warehouse Transportation

Business Intelligence In The Palm Of Your Hand With Zing Data

Data Engineering Podcast

DECEMBER 4, 2022

Summary: Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is through a web or desktop application running on a powerful laptop. Zing Data is building a mobile native platform for business intelligence.

Business Intelligence

Business Intelligence Metadata BI MongoDB

Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

AUGUST 11, 2021

“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later” The terms data lake and data warehouse are frequently stumbled upon when it comes to storing large volumes of data. Data Warehouse Architecture What is a Data lake?

Data Lake

Data Lake Data Warehouse Cloud Hadoop

A decade of scaling (real-time) analytics and master data at Picnic

Picnic Engineering

MARCH 28, 2025

TL;DR: Over the past decade, Picnic transformed its approach to data, evolving from a single, all-purpose data team into multiple specialized teams using a lean, scalable tech stack. We empowered analysts by giving them access and training to the same tools as engineers, dramatically increasing speed and impact. The impact was massive.

Data Warehouse

Data Warehouse PostgreSQL Python Machine Learning

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Summary: As your data needs scale across an organization, the need for a carefully considered approach to collection, storage, organization, and access becomes increasingly critical. In terms of infrastructure, what are the components of a modern data architecture and how has that changed over the years?

Data Lake

Data Lake Data Warehouse Data Architecture Architecture

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

Data Engineering Podcast

JULY 31, 2022

Summary: Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Atlan is the metadata hub for your data ecosystem. Can you describe what Manta is and the story behind it?

IT

IT Metadata MongoDB MySQL

The Security Challenges of Data Warehousing in the Cloud

Cloudera

NOVEMBER 5, 2020

Many organizations struggle to meet growing and variable data warehouse demands. This is exactly what Cloudera Data Platform (CDP) provides to the Cloudera Data Warehouse. CDP is a data platform that is optimized for both business units and central IT. Cloudera Data Warehouse Security.

Cloud

Cloud Data Lake Data Warehouse Metadata

Next Stop – Building a Data Pipeline from Edge to Insight

Cloudera

FEBRUARY 8, 2021

Geolocation data is also stored, which will help map customer locations to latitudes and longitudes to better understand where these motors are located after being sold in a vehicle. ECC will use Cloudera Data Engineering (CDE) to address the above data challenges (see Fig. 2, ECC data enrichment pipeline).

Data Pipeline

Data Pipeline Building Manufacturing Data Warehouse

Data News — Week 23.05

Christophe Blefari

FEBRUARY 3, 2023

The article shows our Netflix art creators are using past data to create new artworks. eBay, Variable Hub, a data access layer for risk decisioning: looks like a feature store but for risk topics. The idea is to create a unified layer that stores all the data needed to take decisions. It makes sense.

BI

BI Google Cloud Machine Learning SQL

Modern Data Engineering

Towards Data Science

NOVEMBER 4, 2023

Often it is a data warehouse solution (DWH) in the central part of our infrastructure. Data warehouse example. It’s worth mentioning that its data frame transformations have been included in one of the basic methods of data loading for many modern data warehouses.

Data Engineering

Data Engineering Data Engineer Engineering BI

Ensono Cuts Costs with Snowflake Connector for ServiceNow

Snowflake

APRIL 23, 2024

If you’re a Snowflake customer using ServiceNow’s popular SaaS application to manage your digital workloads, data integration is about to get a lot easier — and less costly. The connector provides immediate access to up-to-date ServiceNow data without the need to manually integrate against API endpoints.

Data Warehouse

Data Warehouse Data Integration Consulting SQL

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

Data Engineering Podcast

NOVEMBER 27, 2022

Summary: The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. Atlan is the metadata hub for your data ecosystem.

Data Process

Data Process Process Metadata Business Intelligence

Accenture’s Smart Data Transition Toolkit Now Available for Cloudera Data Platform

Cloudera

AUGUST 31, 2021

Cloudera and Accenture demonstrate strength in their relationship with an accelerator called the Smart Data Transition Toolkit for migration of legacy data warehouses into Cloudera Data Platform. Accenture’s Smart Data Transition Toolkit. Are you looking for your data warehouse to support the hybrid multi-cloud?

Data Warehouse

Data Warehouse Database-centric Metadata Cloud
