In order to build high-quality data lineage, we developed different techniques to collect data flow signals across different technology stacks: static code analysis for different languages (Hack, C++, Python, etc.), runtime instrumentation, and input and output data matching.
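As a rough illustration of the input-and-output matching idea (a sketch only, not the authors' implementation; the job records and fingerprinting scheme are invented for the example), the snippet below fingerprints the datasets each job writes and links any job whose input fingerprint matches another job's output fingerprint:

```python
import hashlib
from collections import defaultdict

# Hypothetical job records: each job declares the datasets it reads and writes.
jobs = {
    "ingest_orders":  {"inputs": ["raw.orders"],         "outputs": ["staging.orders"]},
    "orders_daily":   {"inputs": ["staging.orders"],     "outputs": ["marts.orders_daily"]},
    "exec_dashboard": {"inputs": ["marts.orders_daily"], "outputs": ["reports.exec_kpis"]},
}

def fingerprint(dataset_name: str) -> str:
    """Stand-in for a content fingerprint (e.g., a hash of schema plus sampled rows)."""
    return hashlib.sha256(dataset_name.encode()).hexdigest()

# Index producers by the fingerprint of what they write.
producers = defaultdict(list)
for job, io in jobs.items():
    for out in io["outputs"]:
        producers[fingerprint(out)].append((job, out))

# Match each job's inputs to upstream producers to recover lineage edges.
edges = []
for job, io in jobs.items():
    for inp in io["inputs"]:
        for upstream_job, dataset in producers.get(fingerprint(inp), []):
            edges.append((upstream_job, job, dataset))

for upstream, downstream, dataset in edges:
    print(f"{upstream} -> {downstream} via {dataset}")
```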
Data modeling is changing. Typical data modeling techniques — like the star schema — which defined our approach to data modeling for the analytics workloads typically associated with data warehouses, are less relevant than they once were.
In this article, Chad Sanderson, Head of Product, Data Platform at Convoy and creator of Data Quality Camp, introduces a new application of data contracts: in your data warehouse. In the last couple of posts, I’ve focused on implementing data contracts in production services.
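As a loose sketch of what a warehouse-side data contract can look like (not Convoy's or the author's implementation; the contract fields and table name are invented), the snippet below declares expected columns, types, and nullability for a table and validates a batch of rows against them:

```python
from dataclasses import dataclass, field

@dataclass
class DataContract:
    """A minimal, illustrative contract: required columns, types, and nullability."""
    table: str
    columns: dict                       # column name -> expected Python type
    not_null: set = field(default_factory=set)

    def validate(self, rows: list[dict]) -> list[str]:
        violations = []
        for i, row in enumerate(rows):
            for col, expected_type in self.columns.items():
                if col not in row:
                    violations.append(f"row {i}: missing column '{col}'")
                elif row[col] is None:
                    if col in self.not_null:
                        violations.append(f"row {i}: null in non-nullable '{col}'")
                elif not isinstance(row[col], expected_type):
                    violations.append(f"row {i}: '{col}' expected {expected_type.__name__}")
        return violations

orders_contract = DataContract(
    table="analytics.orders",
    columns={"order_id": int, "amount_usd": float, "status": str},
    not_null={"order_id", "status"},
)

batch = [
    {"order_id": 1, "amount_usd": 19.99, "status": "shipped"},
    {"order_id": None, "amount_usd": 5.00, "status": "pending"},
]
print(orders_contract.validate(batch))
```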
However, for all of our uncertified data, which remained the majority of our offline data, we lacked visibility into its quality and didn’t have clear mechanisms for up-leveling it. How could we scale the hard-fought wins and best practices of Midas across our entire data warehouse?
High-quality data is necessary for the success of every data-driven company. It is now the norm for tech companies to have a well-developed data platform. This makes it easy for engineers to generate, transform, store, and analyze data at the petabyte scale. What and Where is Data Quality?
With this announcement, we welcome our customers' data teams to streamline data transformation pipelines in their open data lakehouse, using any engine on top of data in any format and any form factor, and to deliver high-quality data that their business can trust. The Open Data Lakehouse.
It then passes through various ranking systems like Mustang, Superroot, and NavBoost, which refine the results to the top 10 based on factors like content quality, user behavior, and link analysis. The author did an amazing job of describing how Parquet stores data, along with its compression and metadata strategies.
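To make the Parquet points above concrete, here is a small example using pyarrow (file path, column names, and codec are arbitrary choices for illustration): it writes a columnar table with snappy compression and then reads back the footer metadata without scanning the data pages.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Build a tiny in-memory table; Parquet stores each column contiguously (columnar layout).
table = pa.table({
    "user_id": [1, 2, 3, 4],
    "country": ["US", "DE", "US", "IN"],
    "spend":   [10.5, 3.2, 7.9, 1.1],
})

# Write with a codec; Parquet compresses each column chunk independently.
pq.write_table(table, "users.parquet", compression="snappy")

# The footer metadata describes row groups, column chunks, encodings, and per-column
# statistics without requiring a scan of the data pages themselves.
meta = pq.ParquetFile("users.parquet").metadata
print(meta.num_rows, meta.num_row_groups)
print(meta.row_group(0).column(0))   # stats for the first column of the first row group
```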
Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata. As data grows exponentially, so do the complexities of managing and leveraging it to fuel AI and analytics.
You know what they always say: data lakehouse architecture is like an onion. …OK, data lakehouse architecture combines the benefits of data warehouses and data lakes, bringing together the structure and performance of a data warehouse with the flexibility of a data lake.
It’s our goal at Monte Carlo to provide data observability and quality across the enterprise by monitoring every system vital in the delivery of data from source to consumption. We started with popular modern data warehouses and quickly expanded our support as data lakes became data lakehouses.
Carefully curated test data (realistic samples, edge cases, golden datasets) that reveals issues early. Proper tooling and environment (the Python ecosystem for Great Expectations, data warehouse credentials and macros for dbt).
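To make the "edge cases and golden datasets" point concrete, here is a tool-agnostic sketch in plain pandas (column names and rules are invented): a curated sample deliberately includes nulls, duplicates, and boundary values so that basic checks fail loudly before the same rules are encoded as Great Expectations suites or dbt tests.

```python
import pandas as pd

# A curated "golden" sample: normal rows plus deliberate edge cases
# (null amount, duplicate order_id, negative amount, boundary date).
golden = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount":   [19.99, None, 5.0, -1.0, 0.0],
    "order_date": pd.to_datetime(
        ["2024-01-01", "2024-02-29", "2024-02-29", "2024-06-30", "1970-01-01"]
    ),
})

def check(df: pd.DataFrame) -> list[str]:
    """Simple checks mirroring what would later become dbt tests or expectations."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("order_id is not unique")
    if df["amount"].isna().any():
        failures.append("amount contains nulls")
    if (df["amount"].dropna() < 0).any():
        failures.append("amount contains negative values")
    return failures

print(check(golden))   # the golden dataset should trip every rule on purpose
```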
Data lineage provides the answer by telling you which upstream sources and downstream ingestors were impacted, as well as which teams are generating the data and who is accessing it. Both terms are focused on the practice of ensuring healthy, high-quality data across an organization. It is still relevant today.
A data fabric is an architecture design presented as an integration and orchestration layer built on top of multiple disjointed data sources like relational databases, data warehouses, data lakes, data marts, IoT, legacy systems, etc., to provide a unified view of all enterprise data.
Adopting a cloud data warehouse like Snowflake is an important investment for any organization that wants to get the most value out of their data. A query like the one sketched below will fetch a list of all tables within a database, along with helpful metadata about their settings.
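Here is one way such a query might look, run through the Snowflake Python connector (account, credentials, and database name are placeholders, and the columns selected from INFORMATION_SCHEMA.TABLES can be adjusted to taste):

```python
import snowflake.connector  # pip install snowflake-connector-python

# Connection details are placeholders; supply your own account and credentials.
conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    database="ANALYTICS",
)

# List every table in the database with useful metadata (row counts, size,
# retention, and timestamps) straight from INFORMATION_SCHEMA.
query = """
    SELECT table_schema,
           table_name,
           table_type,
           row_count,
           bytes,
           retention_time,
           created,
           last_altered
    FROM   information_schema.tables
    WHERE  table_schema <> 'INFORMATION_SCHEMA'
    ORDER  BY table_schema, table_name
"""

cur = conn.cursor()
try:
    for row in cur.execute(query):
        print(row)
finally:
    cur.close()
    conn.close()
```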
While data engineering and Artificial Intelligence (AI) may seem like distinct fields at first glance, their symbiosis is undeniable. The foundation of any AI system is high-quality data. Here lies the critical role of data engineering: preparing and managing data to feed AI models.
Data in Place refers to the organized structuring and storage of data within a specific storage medium, be it a database, bucket store, files, or other storage platforms. In the contemporary data landscape, data teams commonly utilize data warehouses or lakes to arrange their data into L1, L2, and L3 layers.
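As a rough, illustrative sketch of that layered arrangement (the L1/L2/L3 naming and transformations here are assumptions for the example, not a prescribed standard), the snippet below moves the same records from a raw landing layer through cleaning into an aggregated layer:

```python
import pandas as pd

# L1: raw landing layer, loaded as-is from the source system.
l1_raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount":   ["19.99", "5.00", "5.00", "bad"],
    "country":  ["us", "DE", "DE", "US"],
})

# L2: cleaned and conformed layer: typed columns, deduplicated, standardized values.
l2_clean = (
    l1_raw.drop_duplicates(subset="order_id")
          .assign(
              amount=lambda df: pd.to_numeric(df["amount"], errors="coerce"),
              country=lambda df: df["country"].str.upper(),
          )
          .dropna(subset=["amount"])
)

# L3: aggregated consumption layer: business-level metrics ready for BI.
l3_metrics = l2_clean.groupby("country", as_index=False)["amount"].sum()
print(l3_metrics)
```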
In addition to discussing what the data observability category is, Gartner also makes a point to explain how it both differs from and complements traditional data quality approaches, as well as the vendors placed in its augmented data quality category, which have their own critical capabilities—some of which overlap.
How does it impact data warehouse/lakehouse performance? Data observability tools should offer both broad, automated metadata monitoring across all tables once they have been added to your selected schemas and deep monitoring for issues inherent in the data itself. Robust role-based access controls?
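To make "broad metadata monitoring" concrete (a tool-agnostic sketch under assumed metadata fields, not any vendor's implementation), the snippet below flags tables whose warehouse metadata shows stale updates or a sudden row-count drop, without reading the data itself:

```python
from datetime import datetime, timedelta, timezone

# Metadata as a warehouse might report it (e.g., from information_schema):
# last update time and current row count per table. Values are made up.
now = datetime.now(timezone.utc)
table_metadata = {
    "analytics.orders":    {"last_altered": now - timedelta(hours=2), "row_count": 1_050_000},
    "analytics.customers": {"last_altered": now - timedelta(days=3),  "row_count": 40_000},
}
# Yesterday's row counts, kept by the monitor for comparison.
previous_row_counts = {"analytics.orders": 1_000_000, "analytics.customers": 96_000}

def metadata_alerts(metadata, previous, max_staleness=timedelta(hours=24), max_drop=0.5):
    """Flag stale tables and tables whose row count fell by more than max_drop."""
    alerts = []
    for table, meta in metadata.items():
        if now - meta["last_altered"] > max_staleness:
            alerts.append(f"{table}: no update in {now - meta['last_altered']}")
        prev = previous.get(table)
        if prev and meta["row_count"] < prev * (1 - max_drop):
            alerts.append(f"{table}: row count dropped {prev} -> {meta['row_count']}")
    return alerts

for alert in metadata_alerts(table_metadata, previous_row_counts):
    print(alert)
```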
At some point in the last two decades, the size of our data became inextricably linked to our ego. We watched enviously as FAANG companies talked about optimizing hundreds of petabytes in their data lakes or data warehouses. We imagined what it would be like to manage big data quality at that scale.
Whether it be the marketing team seeking customer insights, the finance team working on budgeting, or executives crafting business strategies, data needs to be shared in a manner that aligns with their specific objectives and competencies. It is the stage where data truly becomes a product, delivering tangible value to its end users.
Often, teams run custom data tests as part of a deployment pipeline, or schedule them on production systems via job schedulers like Apache Airflow, dbt Cloud, or the built-in schedulers in your data warehouse solution. We could talk data quality all day long. Here are some common use cases for dbt tests.
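For instance, an Airflow DAG along these lines (Airflow 2.4+ style) can run dbt models and then dbt tests on a daily schedule; the project path, profiles location, and schedule are placeholders, and this is a minimal sketch rather than a production setup.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal daily DAG that runs dbt tests against the warehouse after the
# transformation job finishes; paths and profile location are placeholders.
with DAG(
    dag_id="dbt_test_nightly",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/analytics/dbt_project && dbt run --profiles-dir .",
    )
    test_models = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/analytics/dbt_project && dbt test --profiles-dir .",
    )
    run_models >> test_models
```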
“What excites me about Monte Carlo is their vision for making the delivery of data more reliable and transparent through observability. Their artificial intelligence data-driven platform relies on high-quality data to make coverage recommendations for customers.
Data Warehouse (or Lakehouse) Migration 34. Integrate Data Stacks Post Merger 35. Know When to Fix vs. Refactor Data Pipelines 36. Improve DataOps Processes 37. Another common breaking schema change scenario is when data teams sync their production database with their data warehouse, as is the case with Freshly.
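As a rough illustration of catching that kind of breaking schema change before it propagates (a sketch only, not how Freshly or any particular tool handles it; table and column names are invented), the snippet below compares the production table's current columns against the schema the warehouse sync expects:

```python
# Expected schema for the warehouse sync, captured when the pipeline was built.
expected_columns = {
    "orders": {"order_id": "bigint", "customer_id": "bigint", "amount": "numeric"},
}

# Columns reported by the production database right now (e.g., pulled from
# information_schema.columns); hard-coded here to show a breaking change.
current_columns = {
    "orders": {"order_id": "bigint", "customer_ref": "bigint", "amount": "numeric"},
}

def schema_drift(expected: dict, current: dict) -> dict:
    """Return dropped, added, and retyped columns per table."""
    report = {}
    for table, exp_cols in expected.items():
        cur_cols = current.get(table, {})
        report[table] = {
            "dropped": sorted(set(exp_cols) - set(cur_cols)),
            "added": sorted(set(cur_cols) - set(exp_cols)),
            "retyped": sorted(
                c for c in set(exp_cols) & set(cur_cols) if exp_cols[c] != cur_cols[c]
            ),
        }
    return report

print(schema_drift(expected_columns, current_columns))
# {'orders': {'dropped': ['customer_id'], 'added': ['customer_ref'], 'retyped': []}}
```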