RevOps teams want to streamline processes… But automation isn't just for analytics. The post Best Automation Tools In 2025 for Data Pipelines, Integrations, and More appeared first on Seattle Data Guy.
Data integration is critical for organizations of all sizes and industries, and one of the leading providers of data integration tools is Talend, which offers the flagship product Talend Studio. In 2023, Talend was acquired by Qlik, combining the two companies' data integration and analytics tools under one roof.
In the data-driven world […] Determine success by the precision of your charts, the equipment's dependability, and your crew's expertise. A single mistake, glitch, or slip-up could endanger the trip. The post Monitoring Data Quality for Your Big Data Pipelines Made Easy appeared first on Analytics Vidhya.
Building efficient data pipelines with DuckDB: Introduction · Project demo · Use DuckDB to process data, not for multiple users to access data · Cost calculation: DuckDB + ephemeral VMs = dirt-cheap data processing · Processing data less than 100GB? Use DuckDB.
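A minimal sketch of the single-machine pattern that excerpt describes: use DuckDB in-process to aggregate a sub-100GB dataset and write the result back out, no cluster required. The file paths and column names are assumptions for illustration only.

```python
import duckdb

con = duckdb.connect()  # in-memory database; pass a file path to persist
con.sql("""
    COPY (
        SELECT customer_id, SUM(amount) AS total_spend
        FROM read_parquet('data/orders/*.parquet')   -- hypothetical input files
        GROUP BY customer_id
    ) TO 'customer_spend.parquet' (FORMAT PARQUET)
""")
```

Run on an ephemeral VM, a job like this spins up, processes the files, writes its output, and shuts down, which is where the "dirt-cheap" cost profile comes from.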
Adding high-quality entity resolution capabilities to enterprise applications, services, data fabrics or data pipelines can be daunting and expensive. Organizations often invest millions of dollars and years of effort to achieve subpar results.
Introduction: The demand for data to feed machine learning models, data science research, and time-sensitive insights is higher than ever; thus, processing the data becomes complex. To make these processes efficient, data pipelines are necessary. Appeared first on Analytics Vidhya.
Introduction: Data is the fuel of the IT industry and of data science projects in today's online world. IT organizations rely heavily on real-time insights derived from streaming data sources, and handling and processing that streaming data is the hardest part of data analysis.
Introduction: Data pipelines play a critical role in the processing and management of data in modern organizations. A well-designed data pipeline can help organizations extract valuable insights from their data, automate tedious manual processes, and ensure the accuracy of data processing.
With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suited to any use case, from ETL/ELT to running ML/AI operations in production.
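For readers who haven't seen workflows-as-Python-code, here is a minimal sketch of an Airflow DAG using the TaskFlow API (Airflow 2.x). The DAG id, schedule, and task logic are illustrative assumptions, not taken from the article.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def extract() -> list[dict]:
        # Pretend these rows came from an upstream API or database.
        return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Double each value; a stand-in for real business logic.
        return [{**r, "value": r["value"] * 2} for r in rows]

    @task
    def load(rows: list[dict]) -> None:
        print(f"Loading {len(rows)} rows")

    load(transform(extract()))


example_etl()
```

Because the workflow is plain Python, the tasks and their dependencies can be generated dynamically, which is what makes Airflow pipelines scale from simple ETL to production ML orchestration.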
Why Future-Proofing Your Data Pipelines Matters: Data has become the backbone of decision-making in businesses across the globe. The ability to harness and analyze data effectively can make or break a company's competitive edge. Resilience and adaptability are the cornerstones of a future-proof data pipeline.
Introduction · Testing your data pipeline: 1. End-to-end system testing, 2. Data quality testing, 3. Monitoring and alerting, 4. Unit and contract testing · Conclusion · Further reading. Testing data pipelines is different from testing other applications, like a website backend.
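As a concrete illustration of the unit-testing idea, here is a hedged sketch of a pipeline transform plus a pytest-style test that doubles as a small data quality check. The function, columns, and values are hypothetical.

```python
import pandas as pd


def dedupe_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only the latest row per order_id (a typical pipeline transform)."""
    return (
        df.sort_values("updated_at")
        .drop_duplicates("order_id", keep="last")
        .reset_index(drop=True)
    )


def test_dedupe_orders_keeps_latest():
    df = pd.DataFrame(
        {
            "order_id": [1, 1, 2],
            "updated_at": ["2024-01-01", "2024-01-02", "2024-01-01"],
            "amount": [10, 15, 20],
        }
    )
    out = dedupe_orders(df)
    assert out["order_id"].is_unique                              # data quality: no duplicates
    assert out.loc[out["order_id"] == 1, "amount"].item() == 15   # latest row wins
```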
We all keep hearing about Arrow this and Arrow that… it seems every new tool built today for data engineering is at least partly based on Arrow's in-memory format. So, […] The post PyArrow vs Polars (vs DuckDB) for Data Pipelines appeared first on Confessions of a Data Guy.
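To make the comparison concrete, here is a small sketch of the same aggregation expressed with PyArrow, Polars, and DuckDB over one in-memory Arrow table; the column names and values are made up for illustration.

```python
import duckdb
import polars as pl
import pyarrow as pa

# One Arrow table shared by all three tools.
tbl = pa.table({"city": ["NYC", "NYC", "LA"], "sales": [10, 20, 5]})

# PyArrow: group and aggregate on the Arrow table directly.
pa_result = tbl.group_by("city").aggregate([("sales", "sum")])

# Polars: zero-copy conversion from Arrow, then an expression-based aggregation.
pl_result = pl.from_arrow(tbl).group_by("city").agg(pl.col("sales").sum())

# DuckDB: query the Arrow table with SQL; it is picked up by variable name.
db_result = duckdb.sql("SELECT city, SUM(sales) AS sales FROM tbl GROUP BY city").arrow()
```

All three can read the same Arrow table without converting it to a private format, which is the interoperability that makes Arrow's in-memory layout so pervasive.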
Introduction: Imagine yourself as a data professional tasked with creating an efficient data pipeline to streamline processes and generate real-time information. Sounds challenging, right? That's where Mage AI comes in, ensuring that lenders operating online gain a competitive edge.
Let’s set the scene: your company collects data, and you need to do something useful with it. Whether it’s customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way.
by Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. Intro: This post is for all data practitioners who are interested in learning about bootstrapping, standardization, and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow.
Rust has been on my mind a lot lately, probably because of Data Engineering boredom, watching Spark clusters chug along like some medieval farm worker endlessly trudging through the muck and mire of life. Appeared first on Confessions of a Data Guy.
Building data pipelines is a very important skill to learn as a data engineer. A data pipeline is just a series of steps that transport data from one location to another, frequently transforming it along the way.
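A bare-bones sketch of that definition: a pipeline as a few functions that move data from one location to another while transforming it in between. The CSV paths and column names are hypothetical.

```python
import csv


def extract(path: str) -> list[dict]:
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def transform(rows: list[dict]) -> list[dict]:
    # Example transformation: normalize emails and drop rows without one.
    return [{**r, "email": r["email"].strip().lower()} for r in rows if r.get("email")]


def load(rows: list[dict], path: str) -> None:
    if not rows:
        return
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    load(transform(extract("users_raw.csv")), "users_clean.csv")
```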
For production-grade LLM apps, you need a robust data pipeline. This article covers the different stages of building a Gen AI data pipeline and what each stage includes.
Summary: Data pipelines are the core of every data product, ML model, and business intelligence dashboard. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data.
Build a streaming data pipeline using Formula 1 data, Python, Kafka, and RisingWave as the streaming database, and visualize all the real-time data in Grafana.
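A hedged sketch of the ingestion side of such a pipeline: producing JSON events to a Kafka topic with kafka-python, which a streaming database like RisingWave could then consume as a source. The broker address, topic name, and payload fields are assumptions, not taken from the tutorial.

```python
import json
import time

from kafka import KafkaProducer  # kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One hypothetical lap-time event; a real pipeline would emit these continuously.
lap_event = {"driver": "VER", "lap": 12, "lap_time_s": 93.4, "ts": time.time()}
producer.send("f1_lap_times", value=lap_event)
producer.flush()
```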
Introduction · Run Data Pipelines: batch pipelines, stream pipelines, event-driven pipelines, LLM RAG pipelines. Whether you are new to data engineering or have been in the data field for a few years, one of the most challenging parts of learning new frameworks is setting them up!
Introduction: Companies can access a large pool of data in the modern business environment, and using this data in real time may produce insights that spur corporate success. Real-time dashboards built on platforms such as GCP provide strong data visualization and actionable information for decision-makers.
We are excited to announce the availability of data pipeline replication, which is now in public preview. In the event of an outage, this powerful new capability lets you easily replicate and fail over your entire data ingestion and transformation pipelines in Snowflake with minimal downtime.
In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. This allows your applications to handle large data sets and complex workflows efficiently.
Have you ever wondered, at a high level, what it's like to build production-level data pipelines on Databricks? What does it look like, and what tools do you use? The post Building Databricks Data Pipelines 101 appeared first on Confessions of a Data Guy.
HTTP is a protocol commonly used for websites: a request asks the Internet for exactly what you want, and a response is what you get back from the server. API data extraction = GET-ting data from a server: GET data, then GET data for a specific entity.
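A small sketch of the GET pattern described above using the requests library; the base URL, endpoints, and entity id are placeholders, not a real API.

```python
import requests

BASE_URL = "https://api.example.com"  # placeholder API

# GET a collection of data.
resp = requests.get(f"{BASE_URL}/orders", params={"limit": 10}, timeout=10)
resp.raise_for_status()
orders = resp.json()

# GET data for a specific entity.
resp = requests.get(f"{BASE_URL}/orders/42", timeout=10)
resp.raise_for_status()
order_42 = resp.json()
```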
We're increasingly seeing software engineering workloads that are deeply intertwined with a strong data foundation. The rise of AI, for example, will depend on the collaboration between data and development. This means faster development and happier data teams. Let's dive deeper into what we announced.
In this fast-paced digital era, multiple sources like IoT devices, social media platforms, and financial systems generate data continuously and in real time. Every business wants to analyze this data in real time to stay ahead of the competition. It has the ability to […]
You know, for all the hordes of content, books, and videos produced in the "Data Space" over the last few years, famous or otherwise, it seems I find there are volumes of information on the pieces and parts of working in Data. Appeared first on Confessions of a Data Guy.
Data consumers, such as data analysts and business users, care mostly about the production of data assets. On the other hand, data engineers have historically focused on modeling the dependencies between tasks (instead of data assets) with an orchestrator tool. How can we reconcile both worlds?
Last point: Mira has been mocked and criticised online because, as a CTO, she wasn't able to say which public / licensed data Sora has been trained on, saying mainly that "Sora is a tool to extend creativity." Pandera, a data validation library for dataframes, now supports Polars.
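A minimal sketch of what Pandera validation over a Polars DataFrame can look like, assuming a recent pandera release installed with Polars support; the schema, columns, and data are illustrative assumptions.

```python
import pandera.polars as pa
import polars as pl

# Hypothetical schema for a prices table.
schema = pa.DataFrameSchema(
    {
        "ticker": pa.Column(str),
        "price": pa.Column(float, pa.Check.ge(0)),  # prices must be non-negative
    }
)

df = pl.DataFrame({"ticker": ["AAPL", "MSFT"], "price": [189.3, 402.1]})
validated = schema.validate(df)  # raises a schema error if a check fails
```

On failure, validate raises; on success it returns the frame, so bad data is caught before it flows further down the pipeline.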
Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode, DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. Can you describe what Meroxa is and the story behind it?
Data pipelines are the backbone of your business's data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. That's where real-time data and stream processing can help. We'll answer the question, "What are data pipelines?"
In a data-driven world, behind-the-scenes heroes like data engineers play a crucial role in ensuring smooth data flow. A data engineer investigates the issue, identifies a glitch in the e-commerce platform's data funnel, and swiftly implements seamless data pipelines.
We are proud to announce two new analyst reports recognizing Databricks in the data engineering and data streaming space: IDC MarketScape: Worldwide Analytic.
Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex. Join the event for the global data community, Data Council Austin.
Data transformations as functions lead to maintainable code. Track pipeline progress (logging, Observer) with objects. Use objects to store configurations of data systems (e.g., …). Classes let you define reusable code and pipeline patterns.
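A hedged sketch of those patterns together: a transformation as a plain function, a small object (here a dataclass) holding a data system's configuration, and logging to track pipeline progress. All names and fields are illustrative assumptions.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")


@dataclass
class WarehouseConfig:
    """Object holding the configuration of a (hypothetical) data system."""
    host: str
    database: str
    schema: str = "analytics"


def clean_emails(rows: list[dict]) -> list[dict]:
    """A transformation as a pure function: easy to test and reuse."""
    return [{**r, "email": r["email"].strip().lower()} for r in rows]


def run(rows: list[dict], config: WarehouseConfig) -> list[dict]:
    logger.info("Starting pipeline against %s.%s", config.database, config.schema)
    cleaned = clean_emails(rows)
    logger.info("Cleaned %d rows", len(cleaned))
    return cleaned
```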
Introduction: Apache Airflow is a crucial component in data orchestration and is known for its capability to handle intricate workflows and automate data pipelines. Many organizations have chosen it due to its flexibility and strong scheduling capabilities.
Is your business incapacitated by slow and unreliable data pipelines in today's hyper-competitive environment? Data pipelines are the backbone that guarantees real-time access to critical information for informed and quicker decisions. The data pipeline market is set to grow from USD 6.81
Introduction · Split your CTEs/subqueries into separate functions (or models, if using dbt) · Unit test your functions for maintainability and evolution of logic · Conclusion · Required reading. If you've been in the data space long enough, you will have come across really long SQL scripts that someone wrote years ago.
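The same refactoring idea sketched in Python with DuckDB's relational API rather than dbt: each piece of the long script becomes its own function (analogous to a CTE or dbt model) that can be composed and unit tested in isolation. Table and column names are hypothetical.

```python
import duckdb


def daily_revenue(con: duckdb.DuckDBPyConnection) -> duckdb.DuckDBPyRelation:
    # What used to be one CTE inside a long script becomes its own function.
    return con.sql(
        "SELECT order_date, SUM(amount) AS revenue FROM orders GROUP BY order_date"
    )


def top_days(con: duckdb.DuckDBPyConnection, n: int = 5) -> duckdb.DuckDBPyRelation:
    # Composes the previous step, much like one dbt model ref-ing another.
    return daily_revenue(con).order("revenue DESC").limit(n)


con = duckdb.connect()
con.sql(
    "CREATE TABLE orders AS "
    "SELECT * FROM (VALUES ('2024-01-01', 10.0), ('2024-01-02', 25.0)) t(order_date, amount)"
)
print(top_days(con).fetchall())
```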
Here's where leading futurist and investor Tomasz Tunguz thinks data and AI stand at the end of 2024, plus a few predictions of my own. 2025 data engineering trends incoming, including: small data is the future of AI (Tomasz), and the lines are blurring for analysts and data engineers (Barr).