Data Pipeline, Data Warehouse and Metadata

Ready-to-go sample data pipelines with Dataflow

Netflix Tech

DECEMBER 3, 2022

by Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. Intro: This post is for all data practitioners who are interested in learning about bootstrapping, standardization, and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow.

Data Pipeline Scala Metadata Food

Data News — Week 24.11

Christophe Blefari

MARCH 15, 2024

Attributing Snowflake cost to whom it belongs — Fernando gives ideas about metadata management to attribute better Snowflake cost. Forward thinking Dataviz is hierarchical — Malloy, once again, provides an excellent article about a new way to see data visualisations. This is Croissant. It's inspirational.

Metadata Data Datasets Data Warehouse

How Meta understands data at scale

Engineering at Meta

APRIL 28, 2025

To ensure comprehensive protection, it is essential to apply the necessary steps to all systems that store or process data, including distributed systems (web systems, chat, mobile and backend services) and data warehouses. Consider the data flow from online systems to the data warehouse, as shown in the diagram below.

Metadata Data Utilities Data Warehouse

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

Data Engineering Podcast

NOVEMBER 10, 2021

Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata across various tools. Start trusting your data with Monte Carlo today! Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads?

Metadata Data Warehouse Data Lake BI

Next Stop – Building a Data Pipeline from Edge to Insight

Cloudera

FEBRUARY 8, 2021

Below is the entire set of steps in the data lifecycle, and each step in the lifecycle will be supported by a dedicated blog post (see Fig. 1): Data Collection – data ingestion and monitoring at the edge (whether the edge be industrial sensors or people in a vehicle showroom). 2 ECC data enrichment pipeline.

Data Pipeline Building Manufacturing Data Warehouse

Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

OCTOBER 14, 2019

Summary Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. They provide an AWS-native, serverless, data infrastructure that installs in your VPC. Datacoral helps data engineers build and manage the flow of data pipelines without having to manage any infrastructure.

Data Warehouse PostgreSQL AWS Programming Language

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

Data Engineering Podcast

OCTOBER 15, 2021

Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams across the organization. The DataHub project was created as a way to bring order to the scale of LinkedIn’s data needs. No more scripts, just SQL.

Metadata BI Data Warehouse Government

Optimizing data warehouse storage

Netflix Tech

DECEMBER 21, 2020

By Anupom Syam. Background: At Netflix, our current data warehouse contains hundreds of Petabytes of data stored in AWS S3, and each day we ingest and create additional Petabytes. Some of the optimizations are prerequisites for a high-performance data warehouse.

Data Warehouse Metadata Algorithm Data

Data Pipeline Observability: A Model For Data Engineers

Databand.ai

JUNE 28, 2023

By Eitan Chazbani, June 29, 2023. Data pipeline observability is your ability to monitor and understand the state of a data pipeline at any time. We believe the world’s data pipelines need better data observability.

Data Pipeline Data Engineering Data Engineer Engineering

Data Engineering Weekly #198

Data Engineering Weekly

NOVEMBER 24, 2024

Jon Osborn: Best Practices for Using QUERY_TAG in Snowflake. The modern data warehouses are good at running at scale, given the cost is not a constraint. The service offers configurable counter types optimized for various use cases with a unified Control Plane configuration. I’ve seen a similar work by Ben E.

Data Engineering Data Engineer Engineering Insurance

Data logs: The latest evolution in Meta’s access tools

Engineering at Meta

FEBRUARY 4, 2025

Meta joins the Data Transfer Project and has continuously led the development of shared technologies that enable users to port their data from one platform to another. 2024: Users can access data logs in Download Your Information. What are data logs?

Accessible Accessibility Raw Data Data Warehouse

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Struggling with broken pipelines? Missing data?

Metadata Business Intelligence Data Lake BI

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

Data Engineering Podcast

DECEMBER 29, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Struggling with broken pipelines? Missing data?

Management Metadata Business Intelligence Data Lake

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. With this 3rd platform generation, you have more real time data analytics and a cost reduction because it is easier to manage this infrastructure in the cloud thanks to managed services.

Technology Architecture Google Cloud Metadata

A Look At The Data Systems Behind The Gameplay For League Of Legends

Data Engineering Podcast

NOVEMBER 20, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it.

Systems Metadata Data Pipeline MongoDB

Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

Data Engineering Podcast

AUGUST 13, 2022

Data stacks are becoming more and more complex. This brings infinite possibilities for data pipelines to break and a host of other issues, severely deteriorating the quality of the data and causing teams to lose trust.

Metadata MongoDB MySQL Scala

Making The Total Cost Of Ownership For External Data Manageable With Crux

Data Engineering Podcast

JULY 17, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Data Management Management Metadata MongoDB

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

Data Engineering Podcast

JUNE 19, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Metadata Unstructured Data MongoDB MySQL

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Data Engineering Podcast

JUNE 26, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Datasets Unstructured Data Metadata MongoDB

Building A System Of Record For Your Organization's Data Ecosystem At Metaphor

Data Engineering Podcast

DECEMBER 19, 2021

Summary Building a well managed data ecosystem for your organization requires a holistic view of all of the producers, consumers, and processors of information. The team at Metaphor are building a fully connected metadata layer to provide both technical and social intelligence about your data. No more scripts, just SQL.

Systems Building Metadata Data Warehouse

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

What is the importance of embedding column-level lineage awareness into transformation tool vs. layering on top w/ dedicated lineage/metadata tooling? What are the most interesting, innovative, or unexpected ways that you have seen column-aware data modeling used? ML, reverse ETL, etc.)

Data Lake Machine Learning Metadata Data Architecture

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog: Data Engineering

MAY 20, 2024

Continuous Integration and Continuous Delivery (CI/CD) for Data Pipelines: It is a Game-Changer with AnalyticsCreator! The need for efficient and reliable data pipelines is paramount in data science and data engineering. They transform data into a consistent format for users to consume.

Data Pipeline BI Data Lake Data Warehouse

Unlocking The Power of Data Lineage In Your Platform with OpenLineage

Data Engineering Podcast

MAY 18, 2021

Summary Data lineage is the common thread that ties together all of your data pipelines, workflows, and systems. In order to get a holistic understanding of your data quality, where errors are occurring, or how a report was constructed you need to track the lineage of the data from beginning to end.

Metadata Kafka Data Warehouse Hadoop

Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data

Data Engineering Podcast

NOVEMBER 27, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Struggling with broken pipelines? Missing data? Stale dashboards? If this resonates with you, you’re not alone.

Data Process Process Metadata Business Intelligence

The View From The Lakehouse Of Architectural Patterns For Your Data Platform

Data Engineering Podcast

JULY 3, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Architecture Metadata MongoDB MySQL

Modern Data Engineering

Towards Data Science

NOVEMBER 4, 2023

I’d like to discuss some popular Data engineering questions: Modern data engineering (DE). Does your DE work well enough to fuel advanced data pipelines and Business intelligence (BI)? Are your data pipelines efficient? Often it is a data warehouse solution (DWH) in the central part of our infrastructure.

Data Engineering Data Engineer Engineering BI

Business Intelligence In The Palm Of Your Hand With Zing Data

Data Engineering Podcast

DECEMBER 4, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day.

Business Intelligence Metadata BI MongoDB

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

DECEMBER 16, 2019

Summary Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform.

Metadata PostgreSQL Datasets Data Warehouse

An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch

Data Engineering Podcast

DECEMBER 25, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan's active metadata capabilities. Struggling with broken pipelines? Missing data? Again, be prepared to have metadata challenges especially.

Building Metadata Business Intelligence Data Lake

An Engineering Guide to Data Quality - A Data Contract Perspective - Part 2

Data Engineering Weekly

MAY 16, 2023

I won’t bore you with the importance of data quality in the blog. Instead, let’s examine the current data pipeline architecture and ask why data quality is expensive. Instead of looking at the implementation of the data quality frameworks, let's examine the architectural patterns of the data pipeline.

Engineering Kafka Data Pipeline Data Warehouse

How to learn data engineering

Christophe Blefari

JANUARY 20, 2024

Data engineering inherits from years of data practices in US big companies. Hadoop initially led the way with Big Data and distributed computing on-premise to finally land on Modern Data Stack — in the cloud — with a data warehouse at the center. My advice on this point is to learn from others.

Data Engineering Data Engineer Engineering Hadoop

Real World Change Data Capture At Datacoral

Data Engineering Podcast

MARCH 22, 2021

Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

Data Warehouse Metadata Hadoop Data Lake

Charting the Path of Riskified's Data Platform Journey

Data Engineering Podcast

JULY 10, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Metadata MongoDB MySQL Machine Learning

Altus Data Warehouse

Cloudera

SEPTEMBER 9, 2018

We are proud to announce the general availability of Cloudera Altus Data Warehouse, the only cloud data warehousing service that brings the warehouse to the data. Modern data warehousing for the cloud. Using Cloudera Altus for your cloud data warehouse.

Data Warehouse Metadata Cloud Storage Cloud

The Hidden Threats in Your Data Warehouse Layers (And How to Fix Them)

Monte Carlo

AUGUST 6, 2024

Data warehouses are the centralized repositories that store and manage data from various sources. They are integral to an organization’s data strategy, ensuring data accessibility, accuracy, and utility. However, beneath their surface lies a host of invisible risks embedded within the data warehouse layers.

Data Warehouse Raw Data Machine Learning BI

Making The Open Data Lakehouse Affordable Without The Overhead At Iomete

Data Engineering Podcast

OCTOBER 9, 2022

Summary The core of any data platform is the centralized storage and processing layer. For many that is a data warehouse, but in order to support a diverse and constantly changing set of uses and technologies the data lakehouse is a paradigm that offers a useful balance of scale and cost, with performance and ease of use.

Metadata MongoDB AWS MySQL

What "Data Lineage Done Right" Looks Like And How They're Doing It At Manta

Data Engineering Podcast

JULY 31, 2022

Summary Data lineage is the roadmap for your data platform, providing visibility into all of the dependencies for any report, machine learning model, or data warehouse table that you are working with. Atlan is the metadata hub for your data ecosystem. Can you describe what Manta is and the story behind it?

IT Metadata MongoDB MySQL

Stick All Of Your Systems And Data Together With SaaSGlue As Your Workflow Manager

Data Engineering Podcast

JULY 5, 2021

Summary At the core of every data pipeline is a workflow manager (or several). Deploying, managing, and scaling that orchestration can consume a large fraction of a data team’s energy, so it is important to pick something that provides the power and flexibility that you need.

Systems Management Data Warehouse Programming Language

Understanding The Immune System With Data At ImmunAI

Data Engineering Podcast

FEBRUARY 20, 2022

RudderStack’s smart customer data pipeline is warehouse-first. It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more.

Systems Software Engineer Software Engineering Data Warehouse

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

Different vendors offering data warehouses, data lakes, and now data lakehouses all offer their own distinct advantages and disadvantages for data teams to consider. So let’s get to the bottom of the big question: what kind of data storage layer will provide the strongest foundation for your data platform?

Data Lake Data Warehouse Unstructured Data Raw Data

How to Simplify Data Pipelines with DBT and Airflow?

Workfall

AUGUST 14, 2023

Reading Time: 7 minutes In today’s data-driven world, efficient data pipelines have become the backbone of successful organizations. These pipelines ensure that data flows smoothly from various sources to its intended destinations, enabling businesses to make informed decisions and gain valuable insights.

Data Pipeline Data Raw Data Database

Extreme data center pressure? Burst to the cloud with CDP!

Cloudera

NOVEMBER 12, 2020

Your sunk costs are minimal, and if a workload or project you are supporting becomes irrelevant, you can quickly spin down your cloud data warehouses and not be “stuck” with unused infrastructure. Cloud deployments for suitable workloads give you the agility to keep pace with rapidly changing business and data needs.

Cloud Data Warehouse Banking Data

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

APRIL 5, 2021

Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days. Once you sign up and create an alert in Datafold for your company data, they will send you a cool water flask.

Data Warehouse Data Pipeline BI Metadata

The Grand Vision And Present Reality of DataOps

Data Engineering Podcast

MAY 3, 2021

They explain how to think about your data systems in a holistic and maintainable fashion, the security challenges that threaten to derail your efforts, and the power of using metadata as the foundation of everything that you do. Modern Data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Data Warehouse Data Pipeline BI Metadata
