Whether automating a report or setting up retraining pipelines for machine learning models, the idea was always the same: do less manual work and get more consistent results. But automation isn’t just for analytics.
In 2023, Talend was acquired by Qlik, combining the two companies’ data integration and analytics tools under one roof. In January 2024, Talend discontinued Talend Open… The post Alternatives to Talend: How To Migrate Away From Talend For Your Data Pipelines appeared first on Seattle Data Guy.
Building efficient data pipelines with DuckDB. Outline: Introduction; Project demo; Use DuckDB to process data, not for multiple users to access data; Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing; Processing data less than 100GB? Use DuckDB.
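As a rough illustration of the “process data with DuckDB, cheaply, on a single machine” idea, the sketch below aggregates some Parquet files in-memory; the file paths and column names are hypothetical, not taken from the post.

```python
import duckdb

# In-memory database: nothing to provision, ideal for an ephemeral VM
con = duckdb.connect()

# Aggregate raw order files and write the result back out as Parquet
con.execute("""
    COPY (
        SELECT customer_id,
               SUM(amount) AS total_spend,
               COUNT(*)    AS order_count
        FROM read_parquet('raw/orders/*.parquet')
        GROUP BY customer_id
    ) TO 'output/customer_spend.parquet' (FORMAT PARQUET)
""")
```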
Determine success by the precision of your charts, the equipment’s dependability, and your crew’s expertise. A single mistake, glitch, or slip-up could endanger the trip. In the data-driven world […] The post Monitoring Data Quality for Your Big Data Pipelines Made Easy appeared first on Analytics Vidhya.
Adding high-quality entity resolution capabilities to enterprise applications, services, data fabrics or data pipelines can be daunting and expensive. Organizations often invest millions of dollars and years of effort to achieve subpar results.
Introduction The demand for data to feed machine learning models, data science research, and time-sensitive insights is higher than ever; thus, processing the data becomes complex. To make these processes efficient, data pipelines are necessary.
Handling and processing streaming data is some of the hardest work in data analysis. Streaming data is data that is emitted at high volume […] The post Kafka to MongoDB: Building a Streamlined Data Pipeline appeared first on Analytics Vidhya.
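A minimal sketch of such a Kafka-to-MongoDB pipeline, assuming the kafka-python and pymongo client libraries; the topic, broker address, and database names are placeholders.

```python
from json import loads

from kafka import KafkaConsumer          # pip install kafka-python
from pymongo import MongoClient          # pip install pymongo

# Consume JSON events from a hypothetical "orders" topic
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda m: loads(m.decode("utf-8")),
    auto_offset_reset="earliest",
)

# Land each event in a MongoDB collection
collection = MongoClient("mongodb://localhost:27017")["shop"]["orders"]

for message in consumer:
    doc = message.value                   # already a dict after deserialization
    collection.insert_one(doc)
    print(f"stored order {doc.get('order_id')}")
```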
Introduction Data pipelines play a critical role in the processing and management of data in modern organizations. A well-designed data pipeline can help organizations extract valuable insights from their data, automate tedious manual processes, and ensure the accuracy of data processing.
With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow lets you define workflows as Python code, allowing for dynamic and scalable pipelines suited to any use case, from ETL/ELT to running ML/AI operations in production.
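As a small illustration of defining a workflow as Python code, here is a minimal Airflow 2.x DAG; the DAG id, schedule, and tasks are hypothetical, not taken from the post.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder tasks used only for illustration
def extract():
    print("pulling raw data")

def transform():
    print("cleaning and joining")

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",        # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    extract_task >> transform_task   # run transform only after extract succeeds
```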
Why Future-Proofing Your DataPipelines Matters Data has become the backbone of decision-making in businesses across the globe. The ability to harness and analyze data effectively can make or break a company’s competitive edge. Resilience and adaptability are the cornerstones of a future-proof datapipeline.
Testing data pipelines is different from testing other applications, like a website backend. Outline: Introduction; Testing your data pipeline: 1. End-to-end system testing, 2. Data quality testing, 3. Monitoring and alerting, 4. Unit and contract testing; Conclusion; Further reading.
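To make the unit and data quality testing ideas concrete, here is a small pytest-style sketch; the dedupe_orders transformation and its fields are hypothetical.

```python
# Hypothetical transformation under test
def dedupe_orders(orders: list[dict]) -> list[dict]:
    seen, result = set(), []
    for order in orders:
        if order["order_id"] not in seen:
            seen.add(order["order_id"])
            result.append(order)
    return result

# Unit test: exercise the transformation logic in isolation
def test_dedupe_keeps_first_occurrence():
    rows = [{"order_id": 1, "amount": 10}, {"order_id": 1, "amount": 99}]
    assert dedupe_orders(rows) == [{"order_id": 1, "amount": 10}]

# Data quality test: assert properties of the produced dataset itself
def test_no_negative_amounts():
    output = dedupe_orders([{"order_id": 2, "amount": 5}])
    assert all(row["amount"] >= 0 for row in output)
```

Run with `pytest` as part of CI so both kinds of checks gate every pipeline change.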
We all keep hearing about Arrow this and Arrow that … it seems every new tool built for Data Engineering today is at least partly based on Arrow’s in-memory format. So, […] The post PyArrow vs Polars (vs DuckDB) for Data Pipelines appeared first on Confessions of a Data Guy.
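A quick taste of how the three tools sit on the same Arrow data, assuming recent versions of pyarrow, polars, and duckdb; the toy table is made up for illustration.

```python
import duckdb
import polars as pl
import pyarrow as pa

# A tiny Arrow table; all three tools can work against the same in-memory format
arrow_table = pa.table({"city": ["NYC", "NYC", "SF"], "sales": [10, 20, 5]})

# PyArrow: aggregate directly on the table (pyarrow >= 7)
pa_result = arrow_table.group_by("city").aggregate([("sales", "sum")])

# Polars: zero-copy view of the Arrow data, then aggregate
pl_result = pl.from_arrow(arrow_table).group_by("city").agg(pl.col("sales").sum())

# DuckDB: run SQL straight over the Arrow table via a replacement scan
db_result = duckdb.sql(
    "SELECT city, SUM(sales) AS sales FROM arrow_table GROUP BY city"
).arrow()
```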
Introduction Imagine yourself as a data professional tasked with creating an efficient data pipeline to streamline processes and generate real-time information. Sounds challenging, right? That’s where Mage AI comes in to ensure that the lenders operating online gain a competitive edge.
by Jasmine Omeke, Obi-Ike Nwoke, Olek Gorajek. Intro: This post is for all data practitioners who are interested in learning about bootstrapping, standardization and automation of batch data pipelines at Netflix. You may remember Dataflow from the post we wrote last year titled Data pipeline asset management with Dataflow.
Whether it’s customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way. That’s where data pipeline design patterns, such as the Data Mesh pattern, come in.
Building data pipelines is a very important skill that you should learn as a data engineer. A data pipeline is just a series of procedures that transport data from one location to another, frequently changing it along the way.
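As a minimal sketch of that idea, a pipeline can be as small as three composed functions (extract, transform, load); the file names and fields below are made up.

```python
import csv
import json

def extract(path: str) -> list[dict]:
    # Pull raw rows from a CSV file
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[dict]:
    # Change the data along the way: cast types, drop bad rows
    return [
        {"user_id": r["user_id"], "amount": float(r["amount"])}
        for r in rows
        if r.get("amount")
    ]

def load(rows: list[dict], path: str) -> None:
    # Land the cleaned data at its destination
    with open(path, "w") as f:
        json.dump(rows, f, indent=2)

if __name__ == "__main__":
    load(transform(extract("payments.csv")), "payments_clean.json")
```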
For production-grade LLM apps, you need a robust data pipeline. This article talks about the different stages of building a Gen AI data pipeline and what each stage includes.
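A schematic sketch of typical stages in such a pipeline (ingest, chunk, embed, index); the embed function and VectorIndex class are placeholders rather than any specific library’s API.

```python
def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping chunks for retrieval."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(chunks: list[str]) -> list[list[float]]:
    """Placeholder: call your embedding model of choice here."""
    return [[float(len(c))] for c in chunks]  # stand-in vectors

class VectorIndex:
    """Placeholder for a vector store (managed service or local index)."""
    def __init__(self):
        self.items = []
    def upsert(self, vectors, payloads):
        self.items.extend(zip(vectors, payloads))

documents = ["Refund policy: items can be returned within 30 days..."]
index = VectorIndex()
for doc in documents:
    pieces = chunk(doc)          # chunk
    index.upsert(embed(pieces), pieces)  # embed + index
```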
Rust has been on my mind a lot lately, probably because of Data Engineering boredom, watching Spark clusters chug along like some medieval farm worker endlessly trudging through the muck and mire of life.
Summary Data pipelines are the core of every data product, ML model, and business intelligence dashboard. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data.
Build a streaming data pipeline using Formula 1 data, Python, Kafka, and RisingWave as the streaming database, and visualize all the real-time data in Grafana.
We are excited to announce the availability of data pipeline replication, which is now in public preview. In the event of an outage, this powerful new capability lets you easily replicate and fail over your entire data ingestion and transformation pipelines in Snowflake with minimal downtime.
Have you ever wondered, at a high level, what it’s like to build production-level data pipelines on Databricks? What does it look like, and what tools do you use? The post Building Databricks Data Pipelines 101 appeared first on Confessions of a Data Guy.
In this fast-paced digital era, multiple sources like IoT devices, social media platforms, and financial systems generate data continuously and in real time. Every business wants to analyze this data in real time to stay ahead of the competition. It has the ability to […]
Snowflake’s new Python API (GA soon) simplifies data pipelines and is readily available through pip install snowflake. Additionally, Dynamic Tables are a new table type that you can use at every stage of your processing pipeline. Interact with Snowflake objects directly in Python. Automate or code, the choice is yours.
Since the previous Python connector API mostly communicated via SQL, it hindered the ability to manage Snowflake objects natively in Python, restricting data pipeline efficiency and the ability to complete complex tasks. Or, experience these features firsthand at our free Dev Day event on June 6th in the Demo Zone.
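As a rough sketch of where a Dynamic Table fits in a processing pipeline, the snippet below creates one over a couple of staging tables. It uses the classic snowflake-connector-python rather than the new object API, and the credentials, warehouse, and table/column names are placeholders.

```python
import snowflake.connector

# Placeholder connection details for illustration only
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="transform_wh", database="analytics", schema="marts",
)

# A Dynamic Table keeps itself refreshed within the declared target lag
conn.cursor().execute("""
    CREATE OR REPLACE DYNAMIC TABLE orders_enriched
    TARGET_LAG = '5 minutes'
    WAREHOUSE = transform_wh
    AS
    SELECT o.order_id, o.amount, c.region
    FROM raw.orders o
    JOIN raw.customers c ON o.customer_id = c.customer_id
""")
```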
Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. We’ll answer the question, “What are data pipelines?” Table of Contents: What are Data Pipelines?
Outline: Introduction 2. APIs are a way to communicate between systems on the Internet 2.1. HTTP is a protocol commonly used for websites 2.1.1. Request: Ask the Internet exactly what you want … 3. API data extraction = GET-ting data from a server 3.1. GET data 3.1.1. GET data for a specific entity
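A minimal sketch of GET-ting data for a specific entity with Python’s requests library; the endpoint, path, and query parameters are hypothetical.

```python
import requests

# Hypothetical API endpoint used only for illustration
BASE_URL = "https://api.example.com/v1"

# Ask the server exactly what you want: one order, a few fields
response = requests.get(
    f"{BASE_URL}/orders/12345",
    params={"fields": "id,status,total"},   # narrow the response
    headers={"Accept": "application/json"},
    timeout=10,
)
response.raise_for_status()   # fail fast on 4xx/5xx
order = response.json()
print(order)
```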
Is your business held back by slow and unreliable data pipelines in today’s hyper-competitive environment? Data pipelines are the backbone that guarantees real-time access to critical information for faster, better-informed decisions. The data pipeline market is set to grow from USD 6.81
We are proud to announce two new analyst reports recognizing Databricks in the data engineering and data streaming space: IDC MarketScape: Worldwide Analytic.
Data consumers, such as data analysts and business users, care mostly about the production of data assets. On the other hand, data engineers have historically focused on modeling the dependencies between tasks (instead of data assets) with an orchestrator tool. How can we reconcile both worlds?
Outline: 1. Introduction 2. Data transformations as functions lead to maintainable code 3. Objects help track things (aka state) 3.1. Use objects to store configurations of data systems (e.g., Spark, etc.) 3.2. Track pipeline progress (logging, Observer) with objects 4. Class lets you define reusable code and pipeline patterns
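A small sketch of those ideas, with a dataclass holding configuration, plain functions doing the transformations, and logging tracking progress; all names are made up.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline")

# Object that stores the configuration of a data system (hypothetical fields)
@dataclass
class WarehouseConfig:
    host: str
    database: str
    schema: str

# Data transformations as plain functions: easy to test and compose
def remove_cancelled(orders: list[dict]) -> list[dict]:
    return [o for o in orders if o["status"] != "cancelled"]

def add_total(orders: list[dict]) -> list[dict]:
    return [{**o, "total": o["qty"] * o["unit_price"]} for o in orders]

def run(orders: list[dict], config: WarehouseConfig) -> list[dict]:
    logger.info("Loading into %s.%s", config.database, config.schema)  # track progress
    return add_total(remove_cancelled(orders))

if __name__ == "__main__":
    cfg = WarehouseConfig(host="localhost", database="analytics", schema="staging")
    rows = [{"status": "paid", "qty": 2, "unit_price": 9.5},
            {"status": "cancelled", "qty": 1, "unit_price": 3.0}]
    print(run(rows, cfg))
```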
Our customers rely on NiFi as well as the associated sub-projects (Apache MiNiFi and Registry) to connect to structured, unstructured, and multi-modal data from a variety of data sources – from edge devices to SaaS tools to server logs and change data capture streams. Cloudera DataFlow 2.9
Introduction: If you’ve been in the data space long enough, you will have come across really long SQL scripts that someone wrote years ago. Outline: 1. Introduction 2. Split your SQL into smaller parts 2.1. Start with a baseline validation to ensure that your changes do not change the output too much … Conclusion. Required reading.
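A hedged sketch of the baseline-validation step: run the legacy script and the refactored version side by side and compare their outputs before swapping anything in. The DuckDB database and SQL file names are placeholders.

```python
import duckdb

con = duckdb.connect("warehouse.duckdb")

# Hypothetical files: the original monolithic query and the refactored version
old_sql = open("sql/monthly_revenue_legacy.sql").read()
new_sql = open("sql/monthly_revenue_refactored.sql").read()

baseline = con.execute(old_sql).fetchall()
candidate = con.execute(new_sql).fetchall()

# The refactor should not change the output (beyond an agreed tolerance)
assert len(baseline) == len(candidate), "row counts drifted after the refactor"
assert sorted(baseline) == sorted(candidate), "rows differ after the refactor"
print("Refactored SQL matches the baseline")
```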
Outline: 1. Introduction 2. Features crucial to building and maintaining data pipelines 2.1. Schedulers to run data pipelines at specified frequency 2.2. Orchestrators to define the order of execution of your pipeline tasks 2.2.1. Define the order of execution of pipeline tasks with a DAG
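To illustrate how an orchestrator derives execution order from a DAG, here is a tiny sketch using Python’s standard-library graphlib; the task names and dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline tasks mapped to their upstream dependencies
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "publish_report": {"join_orders_customers"},
}

# An orchestrator runs tasks in an order that respects the DAG edges
for task in TopologicalSorter(dag).static_order():
    print(f"running {task}")
```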
Introduction: Whether you are new to data engineering or have been in the data field for a few years, one of the most challenging parts of learning new frameworks is setting them up! Outline: Run Data Pipelines; Batch pipelines; Stream pipelines; Event-driven pipelines; LLM RAG pipelines.
Introduction Companies can access a large pool of data in the modern business environment, and using this data in real time may produce insightful results that can spur corporate success. Real-time dashboards built on platforms such as GCP provide strong data visualization and actionable information for decision-makers.
"I can't think of anything that's been more powerful since the desktop computer." — Michael Carbin, Associate Professor, MIT, and Founding Advisor, MosaicML A.
You know, for all the hordes of content, books, and videos produced in the “Data Space” over the last few years, famous or otherwise, I find there are volumes of information on the pieces and parts of working in Data.
A data engineer investigates the issue, identifies a glitch in the e-commerce platform’s data funnel, and swiftly implements seamless data pipelines. While data scientists and analysts receive […] The post What Data Engineers Really Do? appeared first on Analytics Vidhya.
Here’s how Snowflake Cortex AI and Snowflake ML are accelerating the delivery of trusted AI solutions for the most critical generative AI applications: Natural language processing (NLP) for data pipelines: Large language models (LLMs) have transformative potential, but integrating batch inference into pipelines can often be cumbersome.
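As a hedged sketch of batch LLM inference inside a pipeline, the query below calls the Cortex COMPLETE SQL function over rows of a table via the Python connector; the credentials, model choice, and product_reviews table are placeholders, not taken from the post.

```python
import snowflake.connector

# Placeholder connection details for illustration only
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="llm_wh", database="analytics", schema="raw",
)
cur = conn.cursor()

# Run an LLM completion per row, directly inside the warehouse
cur.execute("""
    SELECT review_id,
           SNOWFLAKE.CORTEX.COMPLETE(
               'mistral-large',
               'Summarize this product review in one sentence: ' || review_text
           ) AS summary
    FROM product_reviews
    LIMIT 10
""")
for review_id, summary in cur.fetchall():
    print(review_id, summary)
```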
Introduction Apache Airflow is a crucial component in data orchestration and is known for its capability to handle intricate workflows and automate data pipelines. Many organizations have chosen it due to its flexibility and strong scheduling capabilities.