At Uber’s scale, thousands of microservices serve millions of rides and deliveries a day, generating more than a hundred petabytes of raw data. Internally, engineering and data teams across the company leverage this data to improve the Uber experience.
These practices are crucial for building robust and scalable data pipelines, maintaining data quality, and enabling data-driven decision-making. Let us dive into some of the key best practices that data engineers must implement in their data workflows and projects.
The pathway from ETL to actionable analytics can often feel disconnected and cumbersome, leading to frustration for data teams and long wait times for business users. And even when we manage to streamline the data workflow, those insights aren’t always accessible to users unfamiliar with antiquated business intelligence tools.
Airflow and dbt share the same overall purpose: helping teams provide reliable data, through a standard interface, to the users they interact with.
As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. With the 3.0 (..)
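For readers who have not used Airflow, a minimal sketch of a daily pipeline using the TaskFlow API is shown below; the DAG name and the extract/transform/load steps are hypothetical placeholders, not taken from any of the articles above.

```python
# Minimal, hypothetical Airflow DAG using the TaskFlow API (Airflow 2.4+).
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_daily_pipeline():
    @task
    def extract() -> list[dict]:
        # Pull raw records from a source system (stubbed here).
        return [{"id": 1, "amount": 42.0}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # Apply a simple, illustrative business rule to each record.
        return [{**r, "amount_usd": r["amount"]} for r in records]

    @task
    def load(records: list[dict]) -> None:
        # Write the transformed records to a target (stubbed here).
        print(f"loaded {len(records)} records")

    load(transform(extract()))

# Instantiating the decorated function registers the DAG with Airflow.
example_daily_pipeline()
```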
Our Data Workflow Platform team introduces WorkflowGuard: a new service to govern executions, prioritize resources, and manage the lifecycle of repetitive data jobs. Check out how it improved workflow reliability and cost efficiency while bringing more observability to users.
And to create significant technology and team efficiencies, organizations need to consider opportunities to integrate LLM pipelines with existing structured data workflows. This unification can also empower data engineers, who already manage structured pipelines, to easily onboard and maintain unstructured data workflows.
As large language models (LLMs) and AI agents become indispensable in everything from customer service to autonomous vehicles, the ability to manage, analyze, and optimize unstructured data has become a strategic imperative. Billions of social media posts, hours of video content, and terabytes of sensor data are produced daily.
This means enterprises can run unstructured data workflows, powered by AI agents, without moving data out of Snowflake, which enhances trust and helps support compliance. First, Snowflake has enabled us to strengthen user trust in our app. Second, we're optimizing scalability.
Our joint collaboration will enable the following: Seamless Data Sharing and Interoperability: The integration enables AWS customers to leverage Cloudera’s data lakehouse capabilities alongside Snowflake’s AI Data Cloud, facilitating unified data access and sharing across platforms. Enhanced AI/ML Performance: The partnership optimizes data workflows (..)
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.
Dagster vs Airflow: Overview Dagster and Airflow are two popular open-source tools that have emerged as leaders in data orchestration. They are often compared because of their shared goal of automating data workflows and their widespread adoption in the data engineering community. What is Airflow? What is Dagster?
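For contrast with the Airflow sketch earlier, here is a minimal, hypothetical example of the same kind of pipeline expressed as Dagster software-defined assets; the asset names and logic are placeholders for illustration only.

```python
# Minimal, hypothetical Dagster pipeline using software-defined assets.
from dagster import Definitions, asset

@asset
def raw_orders() -> list[dict]:
    # In a real project this would read from a source system.
    return [{"id": 1, "amount": 42.0}, {"id": 2, "amount": -1.0}]

@asset
def cleaned_orders(raw_orders: list[dict]) -> list[dict]:
    # Dagster infers the dependency from the parameter name, so this
    # asset materializes after raw_orders.
    return [r for r in raw_orders if r["amount"] > 0]

# Register the assets so Dagster tooling can discover and materialize them.
defs = Definitions(assets=[raw_orders, cleaned_orders])
```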
DataOps.live keeps users at the forefront of data engineering. DataOps.live works together with Snowflake to augment and extend native Snowflake features, resulting in advanced DataOps workflows for Snowflake customers. Snowflake and DataOps.live's integrated solutions simplify the development, testing, and deployment of data workflows.
Open Source Data Pipeline Tools: Open-source data pipeline tools are pivotal in data engineering, offering organizations flexible and scalable solutions for managing the end-to-end data workflow. Here is the list of robust Data Pipeline Tools in Azure for scalable and optimized management of diverse data sources.
It's like the ultimate solution for managing and automating big data workflows. Did you know 93% of seasoned Airflow users are willing to recommend this powerful data orchestration tool? Businesses from various sectors leverage it to manage and automate massive data workflows seamlessly. Crazy, right?
Deploy DataOps: DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration, and improve the agility of data teams. How effective are your current data workflows?
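As an illustration of the DataOps idea, a quality gate like the hypothetical sketch below could run in CI before a pipeline change is promoted; the table name and the rules are assumptions made for the example, not part of any specific tool.

```python
# Hypothetical DataOps-style quality gate that could run in a CI job.
import sqlite3

def check_orders_table(conn: sqlite3.Connection) -> list[str]:
    failures = []
    # Rule 1: the table must not be empty.
    (count,) = conn.execute("SELECT COUNT(*) FROM orders").fetchone()
    if count == 0:
        failures.append("orders is empty")
    # Rule 2: the primary key must be unique.
    (dupes,) = conn.execute(
        "SELECT COUNT(*) - COUNT(DISTINCT id) FROM orders"
    ).fetchone()
    if dupes:
        failures.append(f"{dupes} duplicate ids in orders")
    return failures

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.execute("INSERT INTO orders VALUES (1, 10.0), (2, 20.0)")
    problems = check_orders_table(conn)
    assert not problems, problems  # fail the CI job if any rule is violated
```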
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
Data engineers gain insights into pipeline performance, data movement, and potential bottlenecks. This skill is crucial for maintaining smooth data workflows and ensuring data integrity. This phase also underscores the seamless integration of Azure Data Factory with a range of Azure services.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Your first 30 days are free!
However, creating and deploying these agents often involves challenges such as managing complex data workflows, integrating machine learning models, and ensuring scalability across operations. Phidata offers better support for integration with external data sources, whereas CrewAI focuses on refining AI pipelines within its ecosystem.
Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.
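A rough sketch of the kind of source-to-target validation being described: compare row counts and a simple checksum between two copies of a table. The table, columns, and checksum formula are illustrative assumptions; a dedicated tool such as Datafold does considerably more than this.

```python
# Hypothetical source-to-target validation: row count plus a crude checksum.
import sqlite3

def table_fingerprint(conn: sqlite3.Connection, table: str) -> tuple[int, int]:
    (rows,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    (checksum,) = conn.execute(
        f"SELECT COALESCE(SUM(id * 31 + CAST(amount * 100 AS INTEGER)), 0) "
        f"FROM {table}"
    ).fetchone()
    return rows, checksum

src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    db.execute("INSERT INTO orders VALUES (1, 10.0), (2, 20.5)")

# A mismatch here would indicate the replicated table has drifted.
assert table_fingerprint(src, "orders") == table_fingerprint(dst, "orders"), \
    "source and target have diverged"
```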
By creating custom linting rules tailored to their team's needs, Next Insurance has improved its data workflows' maintainability, scalability, and quality, making it easier for engineers to collaborate and debug issues.
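A custom lint rule in this spirit can be as small as the hypothetical standalone check below, which flags SELECT * in SQL files; in practice such rules are usually packaged as plugins for an existing linter rather than written from scratch.

```python
# Hypothetical standalone lint rule: flag "SELECT *" in SQL files.
import re
import sys
from pathlib import Path

SELECT_STAR = re.compile(r"select\s+\*", re.IGNORECASE)

def lint_sql_file(path: Path) -> list[str]:
    violations = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if SELECT_STAR.search(line):
            violations.append(f"{path}:{lineno}: avoid SELECT *")
    return violations

if __name__ == "__main__":
    # Usage: python lint_sql.py models/*.sql
    problems = [v for p in sys.argv[1:] for v in lint_sql_file(Path(p))]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```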
This week on KDnuggets: Discover GitHub repositories from machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job • Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company • And much, (..)
This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues for every part of your data workflow, from migration to deployment. Datafold has recently launched a 3-in-1 product experience to support accelerated data migrations.
Deeply integrated with the lakehouse, Lakebase simplifies operational data workflows. It eliminates fragile ETL pipelines and complex infrastructure, enabling teams to move faster and deliver intelligent applications on a unified data platform. In this blog, we propose a new architecture for OLTP databases called a lakebase.
Read More: Snowflake Snowpark: Overview, Benefits, and How to Harness Its Power. Best Practices in Data Transformation: Implementing best practices in data transformation is essential to maintain high-quality, consistent, and secure data workflows.
Can you describe the workflow for building autonomous linkages across data assets that are modelled as JSON-LD? What are the most interesting, innovative, or unexpected ways that you have seen JSON-LD used for data workflows? When is JSON-LD the wrong choice?
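For readers unfamiliar with JSON-LD, the hypothetical snippet below shows how a data asset and its link to an upstream asset might be described; the vocabulary, URLs, and identifiers are made up for illustration.

```python
# Hypothetical JSON-LD description of a data asset and its upstream linkage.
import json

dataset = {
    "@context": {
        "name": "http://schema.org/name",
        "derivedFrom": {"@id": "http://example.org/derivedFrom", "@type": "@id"},
    },
    "@id": "http://example.org/datasets/daily_orders",
    "name": "daily_orders",
    # The link to the upstream asset is just another node reference, which is
    # what makes automated linkage across assets possible.
    "derivedFrom": "http://example.org/datasets/raw_orders",
}

print(json.dumps(dataset, indent=2))
```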
TL;DR After setting up and organizing the teams, we describe 4 topics to make data mesh a reality. As you can see, this is the code part where you build your data pipelines, which is something of a misnomer because it is an oversimplification. The other benefit is that you can also use parameters and build generic workflows that can be re-used.
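A minimal sketch of that parameterized, reusable workflow idea is shown below; the table names, parameter object, and transform are all hypothetical, and the point is only that one pipeline function can be shared across domains by passing parameters instead of copying code.

```python
# Hypothetical generic, parameterized pipeline reused across domains.
from dataclasses import dataclass
from typing import Callable

@dataclass
class PipelineParams:
    source_table: str
    target_table: str
    transform: Callable[[dict], dict]

def run_pipeline(rows: list[dict], params: PipelineParams) -> list[dict]:
    # Each domain team supplies its own parameters; the orchestration logic
    # stays identical and is maintained in one place.
    print(f"{params.source_table} -> {params.target_table}")
    return [params.transform(r) for r in rows]

orders_params = PipelineParams(
    source_table="raw.orders",
    target_table="mart.orders",
    transform=lambda r: {**r, "amount_usd": r["amount"]},
)
print(run_pipeline([{"id": 1, "amount": 9.5}], orders_params))
```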