Data Pipeline, Engineering and Metadata

Data Engineering Best Practices - #2. Metadata & Logging

Start Data Engineering

FEBRUARY 22, 2024

Data Pipeline Logging Best Practices. Metadata: information about pipeline runs and data flowing through your pipeline. Introduction. Setup & Logging architecture. Obtain visibility into the code’s execution sequence using text logs. Monitoring UI & Traceability.

Metadata

Metadata Data Engineering Data Engineer Engineering

Ready-to-go sample data pipelines with Dataflow

Netflix Tech

DECEMBER 3, 2022

By Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. Intro: This post is for all data practitioners who are interested in learning about bootstrapping, standardization and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow.

Data Pipeline

Data Pipeline Scala Metadata Food

Level Up Your Data Platform With Active Metadata

Data Engineering Podcast

JUNE 19, 2022

Summary: Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. To increase its value, a new trend of active metadata is being implemented, allowing use cases like keeping BI reports up to date, auto-scaling your warehouses, and automated data governance.

Metadata

Metadata MongoDB MySQL Scala

Data News — Week 24.11

Christophe Blefari

MARCH 15, 2024

Cognition AI introduced Devin, the first AI software engineer. Devin can, unassisted, do software engineering tasks like fixing GitHub issues (13% success rate; the previous best was ~5%), apply to jobs on Upwork, and train and fine-tune its own models. Arrow doing a lot of the data operation heavy lifting.

Metadata

Metadata Datasets Data Data Warehouse

Data Pipeline Observability: A Model For Data Engineers

Databand.ai

JUNE 28, 2023

Eitan Chazbani, June 29, 2023. Data pipeline observability is your ability to monitor and understand the state of a data pipeline at any time. We believe the world’s data pipelines need better data observability.

Data Pipeline

Data Pipeline Data Engineering Data Engineer Engineering

Declarative Data Pipelines with Hoptimator

LinkedIn Engineering

JUNE 26, 2023

However, we've found that this vertical self-service model doesn't work particularly well for data pipelines, which involve wiring together many different systems into end-to-end data flows. Data pipelines power foundational parts of LinkedIn's infrastructure, including replication between data centers.

Data Pipeline

Data Pipeline Kafka SQL MySQL

Eliminate Friction In Your Data Platform Through Unified Metadata Using OpenMetadata

Data Engineering Podcast

NOVEMBER 10, 2021

Summary: A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata across various tools. Data engineers struggling with unreliable data need look no further than Monte Carlo, the world’s first end-to-end, fully automated Data Observability Platform!

Metadata

Metadata Data Warehouse Data Lake BI

Next Stop – Building a Data Pipeline from Edge to Insight

Cloudera

FEBRUARY 8, 2021

Below is the entire set of steps in the data lifecycle, and each step in the lifecycle will be supported by a dedicated blog post (see Fig. 1): Data Collection – data ingestion and monitoring at the edge (whether the edge be industrial sensors or people in a vehicle showroom).

Data Pipeline

Data Pipeline Building Manufacturing Data Warehouse

How to learn data engineering

Christophe Blefari

JANUARY 20, 2024

This is a special edition of the Data News. Right now I'm on holiday, finishing a hiking week in Corsica 🥾, so I wrote this special edition about how to learn data engineering in 2024. Who are the data engineers?

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Data Engineering Weekly #177

Data Engineering Weekly

JUNE 24, 2024

Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. Bessemer publishes the Data + AI infrastructure market map to help companies understand the landscape and the key players.

Data Engineering

Data Engineering Data Engineer Engineering Google Cloud

Our First Netflix Data Engineering Summit

Netflix Tech

DECEMBER 14, 2023

Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community!

Data Engineering

Data Engineering Data Engineer Engineering Metadata

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Netflix Tech

NOVEMBER 14, 2023

By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions.

Data Engineering

Data Engineering Data Engineer Engineering Metadata

Why Column-Aware Metadata Is Key to Automating Data Transformations

Snowflake

JANUARY 25, 2023

Over the multiple decades I’ve spent in the data industry, one observation has remained nearly constant: the majority of the work in building a data analytics platform revolves around data transformations (what we used to call “the T in ETL or ELT”). We cannot scale our expertise as fast as we can scale the Data Cloud.

Metadata

Metadata Data Pipeline Government Data

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

Data Engineering Podcast

OCTOBER 15, 2021

Summary: The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams across the organization. The DataHub project was created as a way to bring order to the scale of LinkedIn’s data needs. How is the governance of DataHub being managed?

Metadata

Metadata BI Data Warehouse Government

Data Engineering Weekly #176

Data Engineering Weekly

JUNE 16, 2024

Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. Picnic: Open-sourcing dbt-score: lint model metadata with ease!

Data Engineering

Data Engineering Data Engineer Engineering Kafka

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Cloud Native Data Orchestration For Machine Learning And Data Engineering With Flyte

Data Engineering Podcast

MAY 22, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Machine Learning

Machine Learning Data Engineering Data Engineer Cloud

A Look At The Data Systems Behind The Gameplay For League Of Legends

Data Engineering Podcast

NOVEMBER 20, 2022

Summary: The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. Atlan is the metadata hub for your data ecosystem.

Systems

Systems Metadata Data Pipeline MongoDB

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in implementing these constraints to your data workflows. Atlan is the metadata hub for your data ecosystem. Struggling with broken pipelines? Missing data?

Metadata

Metadata Business Intelligence Data Lake BI

Expanding The Reach of Business Intelligence Through Ubiquitous Embedded Analytics With Sisense

Data Engineering Podcast

OCTOBER 30, 2022

In this episode Amir Orad discusses the Sisense platform and how it facilitates the embedding of analytics and data insights in every aspect of organizational and end-user experiences. Atlan is the metadata hub for your data ecosystem. And don’t forget to thank them for their continued support of this show!

Business Intelligence

Business Intelligence Metadata MongoDB MySQL

Build Data Products Without A Data Team Using AgileData

Data Engineering Podcast

NOVEMBER 13, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Building

Building Metadata MongoDB MySQL

Effective Pandas Patterns For Data Engineering

Data Engineering Podcast

JANUARY 30, 2022

Summary: Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result, it has become a standard tool for data engineers for a wide range of applications. You can observe your pipelines with built-in metadata search and column-level lineage.

Data Engineering

Data Engineering Data Engineer Engineering Python

Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI

Data Engineering Podcast

DECEMBER 29, 2022

In this episode she shares the strategic and tactical elements of how to make more effective use of the technical and organizational resources that are available to you for getting work done with data. Atlan is the metadata hub for your data ecosystem. Struggling with broken pipelines? Missing data?

Management

Management Metadata Business Intelligence Data Lake

An Engineering Guide to Data Quality - A Data Contract Perspective - Part 2

Data Engineering Weekly

MAY 16, 2023

I won’t bore you with the importance of data quality in this blog. Instead, let’s examine the current data pipeline architecture and ask why data quality is expensive. Instead of looking at the implementation of the data quality frameworks, let’s examine the architectural patterns of the data pipeline.

Engineering

Engineering Kafka Data Pipeline Data Warehouse

Collecting And Retaining Contextual Metadata For Powerful And Effective Data Discovery

Data Engineering Podcast

AUGUST 13, 2022

In this episode Shinji Kim discusses the challenges of data discovery and how to collect and preserve additional context about each piece of information so that you can find what you need when you don’t even know what you’re looking for yet. Data stacks are becoming more and more complex.

Metadata

Metadata MongoDB MySQL Scala

An Introduction To Data And Analytics Engineering For Non-Programmers

Data Engineering Podcast

JANUARY 15, 2022

In this episode Brian McMillan shares his work on the book "Building Data Products" and how he is working to educate business users and data professionals about the combination of technical, economical, and business considerations that need to be blended for these projects to succeed.

Engineering

Engineering Electronics Data Pipeline ETL Tools

Build Better Data Products By Creating Data, Not Consuming It

Data Engineering Podcast

NOVEMBER 6, 2022

Summary: A lot of the work that goes into data engineering is trying to make sense of the "data exhaust" from other applications and services. Atlan is the metadata hub for your data ecosystem. Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day.

Building

Building IT Metadata MongoDB

Data Engineering Weekly #198

Data Engineering Weekly

NOVEMBER 24, 2024

Editor’s Note: Launching Data & Gen-AI courses in 2025. I can’t believe DEW will soon reach its 200th edition. What I started as a fun hobby has become one of the top-rated newsletters in the data engineering industry.

Data Engineering

Data Engineering Data Engineer Engineering Insurance

Making The Total Cost Of Ownership For External Data Manageable With Crux

Data Engineering Podcast

JULY 17, 2022

In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third party data. Atlan is the metadata hub for your data ecosystem.

Data Management

Data Management Management Metadata MongoDB

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

Data Engineering Podcast

AUGUST 28, 2022

Summary: The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. Atlan is the metadata hub for your data ecosystem.

Data Engineering

Data Engineering Data Engineer MongoDB Metadata

How To Bring Agile Practices To Your Data Projects

Data Engineering Podcast

OCTOBER 23, 2022

In this episode Shane Gibson shares practical advice and insights from his years of experience as a consultant and engineer working in data about how to adopt agile principles in your data work so that you can move faster and provide more value to the business, while building systems that are maintainable and adaptable.

Project

Project Metadata MongoDB MySQL

The Data Discovery Team

Jesse Anderson

NOVEMBER 14, 2023

A guest post by Ole Olesen-Bagneux. In this blog post I would like to describe a new data team that I call ‘the data discovery team’. 1) The data discovery team must work on discovering the IT landscape. That is done via a careful examination of all metadata repositories describing data sources.

Metadata

Metadata Data Science Big Data Data

Metadata: What Is It and Why it Matters

Ascend.io

JULY 11, 2024

Metadata is the information that provides context and meaning to data, ensuring it’s easily discoverable, organized, and actionable. It enhances data quality, governance, and automation, transforming raw data into valuable insights. This is what managing data without metadata feels like. Chaos, right?

Metadata

Metadata IT Government High Quality Data

Data Engineering Weekly #179

Data Engineering Weekly

JULY 7, 2024

Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. The author highlights Paimon’s consistency model by examining the metadata model.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

Data Engineering Podcast

JUNE 19, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Metadata

Metadata Unstructured Data MongoDB MySQL

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Data Engineering Podcast

JUNE 26, 2022

In this episode Isaac Brodsky explains how the Unfolded platform is architected, their experience joining the team at Foursquare, and how you can start using it for analyzing your spatial data today. Atlan is the metadata hub for your data ecosystem. And don’t forget to thank them for their continued support of this show!

Datasets

Datasets Unstructured Data Metadata MongoDB

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe four topics to make data mesh a reality. We want interoperability for any stored data, yet we also have to think about how to store the data in a specific node to optimize the processing. We want to have our hands free and be totally devoted to DevOps principles.

Technology

Technology Architecture Google Cloud Metadata

Data Engineering Weekly #186

Data Engineering Weekly

AUGUST 25, 2024

Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. Conference Alert: Data Engineering for AI/ML. This is a virtual conference at the intersection of Data and AI.

Data Engineering

Data Engineering Data Engineer Engineering Database-centric

What Are Intelligent Data Pipelines?

Ascend.io

APRIL 26, 2023

Data teams worldwide are building data pipelines with the point solutions that make up the “modern data stack.” Pipelines built with this approach are slow and require constant manual reprogramming and updates. For a deep dive into this level of automation, take a look at our What Is Data Pipeline Automation paper.

Data Pipeline

Data Pipeline Datasets Metadata Data

Snowflake Invests in Metaplane for Deep, End-to-End Observability in the Data Cloud

Snowflake

MAY 15, 2024

We’re excited to announce today that Snowflake has invested in Metaplane, a leading end-to-end data observability platform that helps data teams improve the quality and performance of their data. Metaplane ensures that every company can trust the data that powers their business.

Cloud

Cloud Metadata Data Pipeline Government

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

OCTOBER 30, 2021

Was Nikola Tesla a scientist or engineer? How about Edison? Or Da Vinci? These men didn’t stop at scientific research and ended up conceptualizing or engineering their inventions. Engineers are not only the ones bearing helmets and operating on construction sites. Data science vs data engineering.

Data Engineering

Data Engineering Data Engineer Engineering Machine Learning

Business Intelligence In The Palm Of Your Hand With Zing Data

Data Engineering Podcast

DECEMBER 4, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. Missing data?

Business Intelligence

Business Intelligence Metadata BI MongoDB

3. Psyberg: Automated end to end catch up

Netflix Tech

NOVEMBER 14, 2023

Now, let’s explore the state of our pipelines after incorporating Psyberg. Pipelines After Psyberg: Let’s explore how different modes of Psyberg could help with a multistep data pipeline. The session metadata table can then be read to determine the pipeline input.

Metadata

Metadata Data Pipeline Scala Data Workflow

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Developing event-driven pipelines is going to be a lot easier - Meet Functions!

Architecture

Architecture Data Lake High Quality Data SQL
