by Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. Intro: This post is for all data practitioners who are interested in learning about the bootstrapping, standardization, and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow.
Data Pipeline Logging Best Practices. Contents: 1. Introduction; 2. Setup & logging architecture; 3. Data pipeline logging best practices: 3.1. Metadata: information about pipeline runs and data flowing through your pipeline; 3.2. Obtain visibility into the code's execution sequence using text logs; 3.3. Monitoring UI & traceability; 3.5. …
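To make the metadata practice (3.1) concrete, here is a minimal sketch of structured, per-step run logging: one JSON record per step carrying a run id, row counts, and timing. The step and field names are illustrative, not taken from the post.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("pipeline")

RUN_ID = str(uuid.uuid4())  # one id per pipeline run, attached to every record

def log_step_metadata(step: str, rows_in: int, rows_out: int, started: float) -> None:
    """Emit one structured record per pipeline step: run id, row counts, timing."""
    logger.info(json.dumps({
        "run_id": RUN_ID,
        "step": step,
        "rows_in": rows_in,
        "rows_out": rows_out,
        "duration_s": round(time.time() - started, 3),
    }))

started = time.time()
rows = [{"user": "a"}, {"user": "b"}, {"user": None}]
cleaned = [r for r in rows if r["user"] is not None]  # toy transform step
log_step_metadata("drop_null_users", len(rows), len(cleaned), started)
```

Because every record shares the run id, text logs like these can later be joined back into a single execution trace (practice 3.2).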
Summary Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. To level up its value, a new trend of active metadata is being adopted, enabling use cases like keeping BI reports up to date, auto-scaling your warehouses, and automating data governance.
Saying mainly that "Sora is a tool to extend creativity." One last point: Mira has been mocked and criticised online because, as a CTO, she wasn't able to say which public or licensed data Sora has been trained on. This is related to Paris testing automated video surveillance during the Olympics. This is Croissant.
Today, we are going to apply these principles to data pipelines. The idea is to transpose these 7 principles to data pipelines, knowing that data pipelines are 100% flexible: if you have the skills, you build the pipeline you want. What does a bad data pipeline process look like?
Managing and understanding large-scale data ecosystems is a significant challenge for many organizations, requiring innovative solutions to efficiently safeguard user data. To address these challenges, we made substantial investments in advanced data understanding technologies, as part of our Privacy Aware Infrastructure (PAI).
However, we've found that this vertical self-service model doesn't work particularly well for data pipelines, which involve wiring together many different systems into end-to-end data flows. Data pipelines power foundational parts of LinkedIn's infrastructure, including replication between data centers.
Data, data, data. It does seem we are not only surrounded by talk about data, but by the actual data itself. We are collecting data from every nook and cranny of the universe (literally!). We cannot scale our expertise as fast as we can scale the Data Cloud. This must change. The solution?
Summary A significant source of friction and wasted effort in building and integrating data management systems is the fragmentation of metadata across various tools. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform.
You can read part 1 here: Digital Transformation is a Data Journey From Edge to Insight. The first blog introduced a mock connected vehicle manufacturing company, The Electric Car Company (ECC), to illustrate the manufacturing data path through the data lifecycle. Figure 1: The enterprise data lifecycle.
Data Pipeline Observability: A Model for Data Engineers. Eitan Chazbani, June 29, 2023. Data pipeline observability is your ability to monitor and understand the state of a data pipeline at any time. We believe the world's data pipelines need better data observability.
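As a minimal illustration of that definition, here is a sketch of one common observability primitive: a freshness check against a pipeline's last successful run. The run registry and SLA values are hypothetical; in practice they would come from a scheduler or metadata store.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical run registry, hard-coded to keep the sketch self-contained.
last_successful_runs = {
    "orders_daily": datetime.now(timezone.utc) - timedelta(hours=26),
    "users_hourly": datetime.now(timezone.utc) - timedelta(minutes=40),
}

def check_freshness(pipeline: str, max_age: timedelta) -> bool:
    """Return True if the pipeline's last success is within its freshness SLA."""
    age = datetime.now(timezone.utc) - last_successful_runs[pipeline]
    if age > max_age:
        print(f"ALERT: {pipeline} is stale by {age - max_age}")
        return False
    return True

check_freshness("orders_daily", max_age=timedelta(hours=24))  # stale -> alert
check_freshness("users_hourly", max_age=timedelta(hours=1))   # fresh
```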
Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams across the organization. The DataHub project was created as a way to bring order to the scale of LinkedIn’s data needs. No more scripts, just SQL.
Editor’s Note: Launching Data & Gen-AI courses in 2025. I can’t believe DEW is approaching its 200th edition. What I started as a fun hobby has become one of the top-rated newsletters in the data engineering industry. We are planning many exciting product lines to trial and launch in 2025.
As we approach 2025, data teams find themselves at a pivotal juncture. The rapid evolution of technology and the increasing demand for data-driven insights have placed immense pressure on these teams. The future of data teams depends on their ability to adapt to new challenges and seize emerging opportunities.
Different teams love using the same data in totally different ways. That's where data dictionary tools come in. A data dictionary tool helps define and organize your data so everyone's speaking the same language.
Over the past several years, data leaders asked many questions about where they should keep their data and what architecture they should implement to serve an incredible breadth of analytic use cases. The future for most data teams will be multi-cloud and hybrid. It no longer matters where the data is.
Summary Stripe is a company that relies on data to power their products and business. In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform.
We're sharing how Meta built support for data logs, which provide people with additional data about how they use our products. Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand.
Summary Building data products is an undertaking that has historically required substantial investments of time and talent. With the rise in cloud platforms and self-serve data technologies the barrier of entry is dropping. Atlan is the metadata hub for your data ecosystem.
Today’s post follows the same philosophy: fitting local and cloud pieces together to build a data pipeline. And, when it comes to data engineering solutions, it’s no different: they have databases, ETL tools, streaming platforms, and so on, a set of tools that makes our lives easier (as long as you pay for them). Not sponsored.
Summary A lot of the work that goes into data engineering is trying to make sense of the "data exhaust" from other applications and services. Atlan is the metadata hub for your data ecosystem. Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day.
A Guest Post by Ole Olesen-Bagneux. In this blog post I would like to describe a new data team that I call ‘the data discovery team’. Data discovery is thought of in different ways in data science and in information science, respectively. In an enterprise data reality, searching for data is a bit of a hassle.
Summary The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. Atlan is the metadata hub for your data ecosystem.
Not too long ago, almost all data architectures and data team structures followed a centralized approach. As a data or analytics engineer, you knew where to find all the transformation logic and models because they were all in the same codebase. There was only one data team, two at most.
Summary One of the reasons that data work is so challenging is that no single person or team owns the entire process. This introduces friction into the process of collecting, processing, and using data. To reduce the potential for broken pipelines, some teams have started to adopt the idea of data contracts.
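To make the idea concrete, here is a minimal sketch of what enforcing a data contract can look like: the producing team declares a schema, and the pipeline rejects records that violate it before they propagate downstream. The field names and types are illustrative, not from the episode.

```python
# Hypothetical contract declared by the producing team.
CONTRACT = {
    "order_id": str,
    "amount_cents": int,
    "currency": str,
}

def validate(record: dict) -> list[str]:
    """Return a list of contract violations for one record (empty = valid)."""
    errors = []
    for field, expected_type in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(record[field]).__name__}")
    return errors

print(validate({"order_id": "o-1", "amount_cents": 1299, "currency": "USD"}))  # []
print(validate({"order_id": "o-2", "amount_cents": "12.99"}))  # two violations
```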
By Abhinaya Shetty, Bharath Mummadisetty. At Netflix, our Membership and Finance Data Engineering team harnesses diverse data related to plans, pricing, membership life cycle, and revenue to fuel analytics, power various dashboards, and make data-informed decisions. We expect complete and accurate data at the end of each run.
We just announced the general availability of Cloudera DataFlow Designer, bringing self-service data flow development to all CDP Public Cloud customers. In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development.
TL;DR After setting up and organizing the teams, we are describing 4 topics to make data mesh a reality. How do we build data products? How can we interoperate between the data domains?
Summary Data is useless if it isn’t being used, and you can’t use it if you don’t know where it is. Data catalogs were the first solution to this problem, but they are only helpful if you know what you are looking for. Data stacks are becoming more and more complex. Sifflet also offers a 2-week free trial.
The challenges around memory, data size, and runtime are exciting to read. Sampling is an obvious strategy for limiting data size, but the layered approach and the dynamic inclusion of dependencies are key techniques I learned from the case study. This count helps to ensure data consistency when deleting and compacting segments.
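As a concrete illustration of sampling to bound data size, here is a minimal reservoir-sampling sketch (a standard technique, not taken from the case study itself): it keeps a uniform random sample in constant memory no matter how large the input stream grows.

```python
import random

def reservoir_sample(stream, k: int, seed: int = 42) -> list:
    """Keep a uniform random sample of k items from a stream of unknown
    size, using O(k) memory regardless of how large the stream is."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # item i kept with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1_000_000), k=5))
```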
Microsoft Fabric is a next-generation data platform that combines business intelligence, data warehousing, real-time analytics, and data engineering into a single integrated SaaS framework. The architecture of Microsoft Fabric is based on several essential elements that work together to simplify data processes: 1.
Applying those same practices to data can prove challenging due to the number of systems that need to be included to implement a complete feature. Atlan is the metadata hub for your data ecosystem. Data engineers don’t enjoy writing, maintaining, and modifying ETL pipelines all day, every day.
Summary There are extensive and valuable data sets that are available outside the bounds of your organization. Whether that data is public, paid, or scraped it requires investment and upkeep to acquire and integrate it with your systems. Atlan is the metadata hub for your data ecosystem.
Editor’s Note: Data Council 2025, Apr 22-24, Oakland, CA. Data Council has always been one of my favorite events to connect with and learn from the data engineering community. Data Council 2025 is set for April 22-24 in Oakland, CA. The results will shape the future of DataOps.
Summary Data analysis is a valuable exercise that is often out of reach of non-technical users as a result of the complexity of data systems. Atlan is the metadata hub for your data ecosystem. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.
Engineers from across the company came together to share best practices on everything from Data Processing Patterns to Building Reliable Data Pipelines. The result was a series of talks which we are now sharing with the rest of the Data Engineering community!
These are common LinkedIn requests. The article resonated with me when I read it.
In its current incarnation, it has become a ubiquitous need for analytics and opportunities to answer questions with data. In this episode Amir Orad discusses the Sisense platform and how it facilitates the embedding of analytics and data insights in every aspect of organizational and end-user experiences.
To finish the trilogy (DataOps, MLOps), let’s talk about DataGovOps, or how you can support your Data Governance initiative. In every step, we do not just read, transform, and write data; we also do the same with the metadata. In the last part, the data security and privacy aspects were added.
[link] Jing Ge: Context Matters — The Vision of Data Analytics and Data Science Leveraging MCP and A2A. All aspects of software engineering are rapidly being automated with various coding AI tools, as seen in the AI technology radar. Data engineering is one aspect where I see a few startups starting to disrupt.
Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. Regulatory requirements and privacy concerns are often a big hurdle to training on context-rich data.
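For readers new to Airflow, here is a minimal sketch of what a data pipeline looks like as a DAG, assuming Airflow 2.4+ and the TaskFlow API; the task names, schedule, and values are illustrative, not from the Astro announcement.

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_pipeline():
    @task
    def extract() -> list[int]:
        # Stand-in for pulling rows from a source system.
        return [1, 2, 3]

    @task
    def transform(values: list[int]) -> int:
        return sum(values)

    @task
    def load(total: int) -> None:
        print(f"loaded total={total}")

    # TaskFlow infers the dependency graph from these calls.
    load(transform(extract()))

example_pipeline()
```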
Summary Making effective use of data requires proper context around the information that is being used. Rehgan Avon co-founded AlignAI to help address this challenge through a more purposeful platform designed to collect and distribute the knowledge of how and why data is used in a business. Struggling with broken pipelines?
According to Infosys, 35% of AI projects will either fail or experience delays because of poor data quality. There’s a huge gap between the data quality most companies have by default and the data quality needed for successful AI. Metaplane ensures that every company can trust the data that powers their business.
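To make that data quality gap concrete, here is a minimal sketch of the kind of automated check a pipeline can run before publishing a table; the column, threshold, and rows are made up for illustration.

```python
# Toy table; in practice these rows would come from a warehouse query.
rows = [
    {"email": "a@example.com"}, {"email": None},
    {"email": "c@example.com"}, {"email": "d@example.com"},
]

def null_rate(rows: list[dict], column: str) -> float:
    """Fraction of rows where the given column is missing or null."""
    return sum(1 for r in rows if r.get(column) is None) / len(rows)

MAX_NULL_RATE = 0.10  # fail the run if more than 10% of emails are missing
rate = null_rate(rows, "email")
if rate > MAX_NULL_RATE:
    print(f"data quality check failed: {rate:.0%} null emails")
```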