
How Meta discovers data flows via lineage at scale

Engineering at Meta

In order to build high-quality data lineage, we developed different techniques to collect data flow signals across different technology stacks: static code analysis for different languages (e.g., Hack, C++, Python), runtime instrumentation, and input and output data matching.
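The Meta post names static code analysis as one lineage signal. As a minimal sketch (not Meta's implementation, which operates on Hack, C++, and Python at scale), a regex pass over a SQL statement can recover table-level lineage by mapping the write target to the tables it reads from:

```python
import re

def extract_lineage(sql: str) -> dict:
    """Toy static analysis: map an INSERT target table to its source tables."""
    target = re.search(r"INSERT\s+INTO\s+(\w+)", sql, re.IGNORECASE)
    sources = re.findall(r"(?:FROM|JOIN)\s+(\w+)", sql, re.IGNORECASE)
    return {target.group(1): sorted(set(sources))} if target else {}

query = """
INSERT INTO daily_revenue
SELECT o.day, SUM(o.amount)
FROM orders o
JOIN payments p ON o.id = p.order_id
GROUP BY o.day
"""
print(extract_lineage(query))  # {'daily_revenue': ['orders', 'payments']}
```

A production system would use a real SQL parser (and per-language ASTs for application code) rather than regexes, then cross-check the statically inferred edges against runtime and data-matching signals.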


The Rise of the Data Engineer

Maxime Beauchemin

Data modeling is changing. Typical data modeling techniques, like the star schema, which defined our approach to data modeling for the analytics workloads typically associated with data warehouses, are less relevant than they once were.


Trending Sources


Implementing Data Contracts in the Data Warehouse

Monte Carlo

In this article, Chad Sanderson, Head of Product, Data Platform at Convoy and creator of Data Quality Camp, introduces a new application of data contracts: in your data warehouse. In the last couple of posts, I've focused on implementing data contracts in production services.


Data Quality Score: The next chapter of data quality at Airbnb

Airbnb Tech

However, for all of our uncertified data, which remained the majority of our offline data, we lacked visibility into its quality and didn’t have clear mechanisms for up-leveling it. How could we scale the hard-fought wins and best practices of Midas across our entire data warehouse?


From Big Data to Better Data: Ensuring Data Quality with Verity

Lyft Engineering

High-quality data is necessary for the success of every data-driven company. It is now the norm for tech companies to have a well-developed data platform. This makes it easy for engineers to generate, transform, store, and analyze data at the petabyte scale. What and Where is Data Quality?


Cloudera's Open Data Lakehouse Supercharged with dbt Core™

Cloudera

With this announcement, we welcome our customer data teams to streamline data transformation pipelines in their open data lakehouse, using any engine, on top of data in any format and in any form factor, and deliver high-quality data their business can trust. The Open Data Lakehouse.


Data Engineering Weekly #186

Data Engineering Weekly

It then passes through various ranking systems like Mustang, Superroot, and NavBoost, which refine the results to the top 10 based on factors like content quality, user behavior, and link analysis. The author did an amazing job of describing how Parquet stores the data and compression and metadata strategies.