Building efficient data pipelines with DuckDB. Topics covered: Introduction; Project demo; Use DuckDB to process data, not for multiple users to access data; Cost calculation: DuckDB + ephemeral VMs = dirt cheap data processing; Processing data less than 100GB? Use DuckDB.
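The pattern the article describes is easy to sketch. Below is a minimal, hypothetical example of using DuckDB in Python to crunch a batch of files on a single (possibly ephemeral) VM; the file paths and column names are illustrative assumptions, not taken from the article.

```python
import duckdb  # pip install duckdb

# In-process database: nothing to provision, which is what makes
# short-lived, cheap VMs practical for this kind of processing.
con = duckdb.connect()

# Hypothetical input: a folder of Parquet files well under 100GB.
con.execute("""
    CREATE TABLE daily_summary AS
    SELECT customer_id,
           CAST(event_time AS DATE) AS event_date,
           SUM(amount)              AS total_amount
    FROM read_parquet('raw_events/*.parquet')
    GROUP BY customer_id, CAST(event_time AS DATE)
""")

# Persist the result for downstream consumers, then let the VM terminate.
con.execute("COPY daily_summary TO 'daily_summary.parquet' (FORMAT PARQUET)")
```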
Whether it's customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way. That's where data pipeline design patterns come in.
Register now and join thousands of fellow developers, data scientists and engineers to learn about the future of AI agents, how to effectively scale pandas, how to create retrieval-augmented generation (RAG) chatbots and much, much more. From Snowflake Native Apps to machine learning, there’s sure to be something fresh for everyone.
Since the previous Python connector API mostly communicated via SQL, it also hindered the ability to manage Snowflake objects natively in Python, restricting data pipeline efficiency and the ability to complete complex tasks. Or, experience these features firsthand at our free Dev Day event on June 6th in the Demo Zone.
Data Pipeline Observability: A Model for Data Engineers, by Eitan Chazbani, June 29, 2023. Data pipeline observability is your ability to monitor and understand the state of a data pipeline at any time. We believe the world's data pipelines need better data observability.
Data pipelines are the backbone of your business's data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. We'll answer the question, "What are data pipelines?"
Summary: Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain the workflows and data pipelines that transform, clean, and integrate it. RudderStack's smart customer data pipeline is warehouse-first.
Building data pipelines isn't always straightforward. The gap between the shiny "hello world" examples of demos and the gritty reality of messy data and imperfect formats is sometimes all too […].
How Organizations Can Overcome Data Quality and Availability Challenges: Many businesses are shifting toward real-time data pipelines to ensure their AI and analytics strategies are built on reliable information. Enabling AI & ML with Adaptive Data Pipelines: AI models require ongoing updates to stay relevant.
Snowflake enables organizations to be data-driven by offering an expansive set of features for creating performant, scalable, and reliable data pipelines that feed dashboards, machine learning models, and applications. But before data can be transformed and served or shared, it must be ingested from source systems.
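As a concrete, hypothetical illustration of that ingestion step, the sketch below uses the Snowflake Python connector to copy staged files into a raw table; the connection parameters, stage, and table names are placeholders rather than a prescribed setup.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials and object names.
conn = snowflake.connector.connect(
    account="my_account",
    user="loader",
    password="***",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)

cur = conn.cursor()
try:
    # Load staged source files into a raw table before any transformation or sharing.
    cur.execute("""
        COPY INTO RAW.ORDERS
        FROM @landing_stage/orders/
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)
    """)
finally:
    cur.close()
    conn.close()
```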
Teams can deploy and manage experiments using data available in Snowflake in real time, without having to deal with access to a third-party platform, exfiltrating sensitive conversion data, or building complex data pipelines to get the data they need.
Blog: An instant demo of data lineage is worth a thousand words. Written by Ross Turk on August 10, 2021. They say that a picture is worth a thousand words. If you've ever tried to describe how all the jobs in your data pipeline are interrelated using just words, I am sure it wasn't easy.
For those using a robust analytics database, such as the Snowflake® Data Cloud, adding the power of a data engineering platform can help maximize the value you're getting out of that database. Magpie Fills in the Gaps for Better Data Engineering: And we're not talking about just ETL (extract, transform, load). Magpie can help.
Monte Carlo and Databricks double down on their partnership, helping organizations build trusted AI applications by expanding visibility into the data pipelines that fuel the Databricks Data Intelligence Platform. This comprehensive visibility helps teams identify and resolve data issues before they cascade into AI failures.
When implemented effectively, smart data pipelines seamlessly integrate data from diverse sources, enabling swift analysis and actionable insights. They empower data analysts and business users alike by providing critical information while protecting sensitive production systems. What is a Smart Data Pipeline?
With their extended partnership, the data + AI observability leader and the AI Data Cloud bring reliability to structured and unstructured data pipelines in Snowflake Cortex AI. Read on for more details and find out how we're thinking about unstructured data observability for AI. Why observability for unstructured data?
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
A well-executed data pipeline can make or break your company's ability to leverage real-time insights and stay competitive. Thriving in today's world requires building modern data pipelines that make moving data and extracting valuable insights quick and simple. What is a Data Pipeline?
And now, from the mind of Barr Moses, comes the historic next children's literary classic: Mastering Data Quality And Your ABCs. After all, in the age of virtual reality, generative AI, and cyber trucks, why shouldn't children also learn how to write their first dbt test or spin up their first data observability solution?
Tools like Python's requests library or ETL/ELT tools can facilitate data enrichment by automating the retrieval and merging of external data. Read more: Discover how to build a data pipeline in 6 steps. Data Integration: Data integration involves combining data from different sources into a single, unified view.
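For instance, a minimal enrichment step with requests might look like the sketch below; the endpoint, query parameter, and response fields are hypothetical stand-ins for whatever enrichment service you actually call.

```python
import requests  # pip install requests

def enrich_customer(record: dict) -> dict:
    """Fetch external attributes for a customer and merge them into the record."""
    resp = requests.get(
        "https://api.example.com/companies",              # hypothetical enrichment API
        params={"domain": record["email"].split("@")[-1]},
        timeout=10,
    )
    resp.raise_for_status()
    external = resp.json()
    # Merge the retrieved attributes with the original record.
    return {**record,
            "industry": external.get("industry"),
            "employees": external.get("employees")}

enriched = [enrich_customer(r) for r in [{"id": 1, "email": "ana@acme.com"}]]
```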
How to analyze and resolve data pipeline incidents in Databand, by Niv Sluzki, 2022-09-09. A data pipeline failure can cripple your downstream data flows. Whether it failed to start or quit unexpectedly, you need to know immediately if there is a pipeline incident.
It’s important to be able to talk about them, but in reality, no one has been able to deploy them and no one has had any success outside of a demo. We’re going to see an explosion in the total number of pipelines but with much smaller data volumes.” But the more pipelines expand, the more difficult data quality becomes.
—David Webb, Data Architect at Travelpass. Build modern data pipelines with Snowflake Python APIs: Snowflake's latest suite of Python APIs (GA soon) simplifies the data pipeline development process with Python.
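As a rough sketch of what managing pipeline objects natively in Python looks like, the example below creates a scheduled task with the Snowflake Python API (snowflake.core). The module paths, Task fields, and object names reflect my reading of the public docs and should be treated as assumptions; check the current API reference before relying on them.

```python
from datetime import timedelta
from snowflake.snowpark import Session
from snowflake.core import Root          # pip install snowflake
from snowflake.core.task import Task

# Placeholder connection parameters.
session = Session.builder.configs({"account": "my_account",
                                   "user": "etl",
                                   "password": "***"}).create()
root = Root(session)

# Define a task as a Python object instead of hand-writing DDL (assumed API shape).
refresh = Task(
    name="refresh_daily_summary",
    definition="CALL refresh_daily_summary()",   # hypothetical stored procedure
    schedule=timedelta(hours=1),
)
root.databases["ANALYTICS"].schemas["PIPELINES"].tasks.create(refresh)
```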
How to create data pipeline and data quality SLA alerts in Databand, by Helen Soloveichik, 2022-09-20. Data engineers often get inundated by alerts from data issues. Databand helps fix this problem by breaking through noisy alerts with focused alerting and routing when data pipeline and quality issues occur.
That's where data observability comes in. Tools like Monte Carlo monitor your data pipelines and flag issues like broken jobs, missing records, or sudden spikes before they mess up your reports. That's how you move from just working with data to actually making confident, smart decisions based on it.
Striim customers often utilize a single streaming source for delivery into Kafka, cloud data warehouses, and cloud storage, simultaneously and in real time. Building streaming data pipelines shouldn't require custom coding: building data pipelines and working with streaming data should not require writing code by hand.
Seamless Integration for Instant Insight: To maximize the benefits of real-time analytics, organizations need platforms that can seamlessly integrate AI models into their data pipelines. Striim provides the architecture to apply trained models to incoming data as it flows through the system.
Open-source frameworks like YAML-based Soda Core, Python-based Great Expectations, and dbt SQL tests help speed up the creation of data quality tests. They are all software: domain-specific languages that help you write data quality tests.
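To make the idea concrete without tying it to any one framework's syntax, here is a plain pandas sketch of the kinds of checks those tools let you declare; the column names and rules are illustrative assumptions.

```python
import pandas as pd

def check_orders(df: pd.DataFrame) -> list[str]:
    """Return the data quality checks that failed for an orders table."""
    failures = []
    if df["order_id"].isnull().any():
        failures.append("order_id contains nulls")
    if df["order_id"].duplicated().any():
        failures.append("order_id is not unique")
    if (df["amount"] < 0).any():
        failures.append("amount contains negative values")
    return failures

df = pd.DataFrame({"order_id": [1, 2, 2], "amount": [10.0, -5.0, 7.5]})
print(check_orders(df))  # ['order_id is not unique', 'amount contains negative values']
```

Frameworks like Soda Core or Great Expectations express the same rules declaratively and add scheduling, reporting, and alerting around them.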
This approach delivers meaningful insights the moment data arrives, supported by continuous learning algorithms that adapt models dynamically to evolving conditions. By integrating AI/ML models directly into streaming data pipelines, organizations can detect anomalies, predict cascading impacts, and execute real-time interventions.
Going into the Data Pipeline Automation Summit 2023, we were thrilled to connect with our customers and partners and share the innovations we've been working on at Ascend. The summit explored the future of data pipeline automation and the endless possibilities it presents.
kyutai released Moshi, a "voice-enabled AI." The team at kyutai developed the model audio-interface-first, with an audio language model, which makes conversation with the AI feel more real (demo at 5:00 min): it can interrupt you or kind of "think" (meaning predict the next audio segment) while it speaks.
With this setting, the connector initially reads all the data from selected objects and then switches to incremental loading, ensuring ongoing updates are seamlessly captured. This flexibility makes it easy to keep your data pipeline running efficiently. Ready to power your business with real-time data?
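The load-then-increment behavior itself is a common pattern. A generic, hypothetical sketch (not the connector's actual configuration) is shown below, with a stored watermark deciding whether a run is a full snapshot or an incremental pull.

```python
from datetime import datetime, timezone

def sync(read_all, read_since, write, watermark=None):
    """First run copies every row; later runs copy only rows changed since
    the last watermark. All callables are hypothetical stand-ins for the
    connector's source reads and destination writes."""
    rows = read_all() if watermark is None else read_since(watermark)
    write(rows)
    return datetime.now(timezone.utc)  # persist as the next run's watermark

# In-memory stand-ins for a source object and a destination:
data = [{"id": 1, "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}]
wm = sync(lambda: data,
          lambda ts: [r for r in data if r["updated_at"] > ts],
          print)                      # initial full load
sync(lambda: data,
     lambda ts: [r for r in data if r["updated_at"] > ts],
     print, watermark=wm)             # subsequent incremental load
```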
Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan's active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.
Bi-Directional Pipelines: Combining the Salesforce CDC Reader with Striim's Snowflake CDC Reader creates the fastest bi-directional pipelines, keeping both Salesforce and Snowflake in sync. Unified Platform: Striim integrates Salesforce with other databases, providing a unified platform for handling all your Salesforce data pipelines.
Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.
Data leaders will be able to simplify and accelerate the development and deployment of data pipelines, saving time and money by enabling true self-service. It is no secret that data leaders are under immense pressure. For more information or to see a demo, go to the DataFlow product page.
It's important to be able to talk about them, but no one has had any success outside of a demo. Pipelines are expanding, but quality coverage isn't (Tomasz): At a dinner with a bunch of heads of AI, I asked how many people were satisfied with the quality of the outputs, and no one raised their hands.
AI-assisted data modeling on shared data workloads: Data sharing is critical to inform decision-making across the organization. Snowflake eliminates the data sharing complexities of traditional data pipelines, making data secure, governed, and easily ready to query.