Because the previous Python connector API communicated mostly via SQL, it hindered the ability to manage Snowflake objects natively in Python, restricting data pipeline efficiency and the ability to complete complex tasks. To get started, explore the comprehensive API documentation, which will guide you through every step.
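A minimal sketch of what native object management can look like, assuming the newer Snowflake Python API (the snowflake.core package); the connection parameters and database name are placeholders:

```python
# Managing a Snowflake object as a first-class Python object instead of raw SQL.
# Assumes the snowflake.core package; credentials below are placeholders.
from snowflake.core import Root
from snowflake.core.database import Database
from snowflake.snowpark import Session

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
}).create()

root = Root(session)

# Create a database natively, no hand-written DDL string required.
root.databases.create(Database(name="ANALYTICS_DB"))
```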
Building reliable data pipelines is a complex and costly undertaking with many layered requirements. To reduce the time and effort required to build pipelines that power critical insights, Manish Jethani co-founded Hevo Data. Data stacks are becoming more and more complex.
Those coveted insights live at the end of a process lovingly known as the data pipeline. The pathway from ETL to actionable analytics can often feel disconnected and cumbersome, leading to frustration for data teams and long wait times for business users.
As large language models (LLMs) and AI agents become indispensable in everything from customer service to autonomous vehicles, the ability to manage, analyze, and optimize unstructured data has become a strategic imperative. Billions of social media posts, hours of video content, and terabytes of sensor data are produced daily.
Data pipelines are the backbone of your business's data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. We'll answer the question, "What are data pipelines?"
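For illustration, a toy pipeline with the three classic stages (extract, transform, load); the file name, columns, and in-memory "warehouse" are hypothetical stand-ins:

```python
# A minimal data pipeline: extract rows from a CSV, transform them, load them.
import csv

def extract(path):
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    for row in rows:
        row["amount"] = float(row["amount"])  # normalize types
        if row["amount"] > 0:                 # drop bad records
            yield row

def load(rows, sink):
    for row in rows:
        sink.append(row)  # stand-in for a warehouse write

warehouse = []
load(transform(extract("orders.csv")), warehouse)
```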
Summary Every part of the business relies on data, yet only a small team has the context and expertise to build and maintain workflows and data pipelines to transform, clean, and integrate it. RudderStack's smart customer data pipeline is warehouse-first.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!
Today's organizations recognize the importance of data-driven decision-making, but the process of setting up a data pipeline that's easy to use, easy to track and easy to trust continues to be a complex challenge. Snowflake and DataOps.live's integrated solutions simplify the development, testing and deployment of data workflows.
As we look towards 2025, it's clear that data teams must evolve to meet the demands of evolving technology and opportunities. In this blog post, we'll explore key strategies that data teams should adopt to prepare for the year ahead. How effective are your current data workflows?
Dataflow: Netflix's homegrown CLI tool for data pipeline management. workflow: see data pipeline. data pipeline: a set of tasks (a DAG) for the purpose of transforming data using some business logic; the logic can be in declarative (e.g., SQL) or compiled (e.g., JAR) form to be executed as part of the user-defined data pipeline. namespace: …
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.
Introduction Apache Airflow is a crucial component in data orchestration, known for its capability to handle intricate workflows and automate data pipelines. Many organizations have chosen it for its flexibility and strong scheduling capabilities.
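A minimal Airflow DAG sketch showing the scheduling style the excerpt refers to, assuming Airflow 2.x; the DAG id and task bodies are placeholders:

```python
# A two-task daily pipeline: extract then load, wired as a DAG.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling source data")       # placeholder for real extraction logic

def load():
    print("writing to the warehouse")  # placeholder for real load logic

with DAG(
    dag_id="example_daily_pipeline",   # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task          # declare the dependency
```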
Tools like Python's requests library or ETL/ELT tools can facilitate data enrichment by automating the retrieval and merging of external data. Read More: Discover how to build a data pipeline in 6 steps Data Integration Data integration involves combining data from different sources into a single, unified view.
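A hedged sketch of enrichment with the requests library; the endpoint URL and field names are hypothetical:

```python
# Enrich a source record with attributes fetched from an external API.
import requests

def enrich_with_geo(record: dict) -> dict:
    resp = requests.get(
        "https://api.example.com/geo",        # hypothetical enrichment endpoint
        params={"ip": record["ip_address"]},  # hypothetical source field
        timeout=5,
    )
    resp.raise_for_status()
    geo = resp.json()
    # Merge the external attributes into the source record.
    record["country"] = geo.get("country")
    record["city"] = geo.get("city")
    return record
```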
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
When implemented effectively, smart data pipelines seamlessly integrate data from diverse sources, enabling swift analysis and actionable insights. They empower data analysts and business users alike by providing critical information while protecting sensitive production systems. What is a Smart Data Pipeline?
Faster, easier AI/ML and data engineering workflows: explore, analyze and visualize data using Python and SQL. Discover valuable business insights through exploratory data analysis. Develop scalable data pipelines and transformations for data engineering.
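A small exploratory-data-analysis sketch in Python with pandas; the file name and column names are hypothetical:

```python
# First-pass EDA: shape, types, summary stats, missingness, one aggregation.
import pandas as pd

df = pd.read_csv("events.csv")  # hypothetical dataset

print(df.shape)           # rows x columns
print(df.dtypes)          # column types
print(df.describe())      # summary statistics for numeric columns
print(df.isna().mean())   # fraction of missing values per column

# Top users by total spend, a typical first aggregation.
print(df.groupby("user_id")["amount"].sum().nlargest(10))
```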
By creating custom linting rules tailored to their team's needs, Next Insurance has improved its data workflows' maintainability, scalability, and quality, making it easier for engineers to collaborate and debug issues.
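The excerpt doesn't say which linting framework Next Insurance used, so here is a generic illustration of a custom rule built on Python's ast module: flag "SELECT *" inside embedded SQL strings, a common pipeline smell.

```python
# Illustrative custom lint rule (not Next Insurance's actual tooling):
# walk a module's AST and report string literals containing "SELECT *".
import ast
import sys

def check_file(path: str) -> int:
    tree = ast.parse(open(path).read(), filename=path)
    violations = 0
    for node in ast.walk(tree):
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            if "select *" in node.value.lower():
                print(f"{path}:{node.lineno}: avoid SELECT * in pipeline SQL")
                violations += 1
    return violations

if __name__ == "__main__":
    # Exit nonzero if any file has a violation, so CI can fail the build.
    sys.exit(1 if any(check_file(p) for p in sys.argv[1:]) else 0)
```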
Summary A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
This trend breaks down information silos within an organization so that more teams across the business can make decisions based on data without needing deep technical expertise. Data engineering presents more and more problems, which makes the job of a data engineering consultant more difficult.
These engineering functions are almost exclusively concerned with data pipelines, spanning ingestion, transformation, orchestration, and observation — all the way to data product delivery to the business tools and downstream applications. Pipelines need to grow faster than the cost to run them.
TL;DR After setting up and organizing the teams, we describe four topics that make data mesh a reality. We want interoperability for any stored data, rather than having to think about how to store the data in a specific node to optimize processing. We want our hands free, totally devoted to DevOps principles.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.
Summary The first step of a data pipeline is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor.
Just as a watchmaker meticulously adjusts every tiny gear and spring in harmonious synchrony for flawless performance, modern data pipeline optimization requires a similar level of finesse and attention to detail. Learn how cost, processing speed, resilience, and data quality all contribute to effective data pipeline optimization.
Data Engineering Weekly readers get a 15% discount by registering at the following link: [link] Gustavo Akashi: Building data pipelines effortlessly with a DAG Builder for Apache Airflow. Every code-first data workflow grew into a UI-based or YAML-based workflow.
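To make the YAML-based idea concrete, a hedged sketch that builds an Airflow DAG from a config file, assuming Airflow 2.x and PyYAML; the YAML schema here is hypothetical, not that of any particular DAG-builder tool:

```python
# Generate Airflow tasks and dependencies from a YAML spec instead of
# hand-writing the DAG. The spec format below is a made-up illustration.
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.bash import BashOperator

spec = yaml.safe_load("""
dag_id: yaml_defined_pipeline
tasks:
  - id: extract
    command: python extract.py
  - id: load
    command: python load.py
    upstream: [extract]
""")

with DAG(dag_id=spec["dag_id"], start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False) as dag:
    # One BashOperator per declared task.
    tasks = {
        t["id"]: BashOperator(task_id=t["id"], bash_command=t["command"])
        for t in spec["tasks"]
    }
    # Wire dependencies from the declared upstream lists.
    for t in spec["tasks"]:
        for up in t.get("upstream", []):
            tasks[up] >> tasks[t["id"]]
```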
This not only jeopardizes the integrity and robustness of production environments but also compounds challenges for both data scientists and engineers. This article delves into the reasons behind our assertion: data science notebooks are not your best choice for production data pipelines. What Are Jupyter Notebooks?