Data Workflow - Data Engineering Digest

Managing Uber’s Data Workflows at Scale

Uber Engineering

FEBRUARY 28, 2019

At Uber’s scale, thousands of microservices serve millions of rides and deliveries a day, generating more than a hundred petabytes of raw data. Internally, engineering and data teams across the company leverage this data to improve the Uber experience.

Data Workflow

Data Workflow Management Raw Data Data

New Fivetran connector streamlines data workflows for real-time insights

ThoughtSpot

SEPTEMBER 6, 2023

The pathway from ETL to actionable analytics can often feel disconnected and cumbersome, leading to frustration for data teams and long wait times for business users. And even when we manage to streamline the data workflow, those insights aren’t always accessible to users unfamiliar with antiquated business intelligence tools.

Data Workflow

Data Workflow Raw Data Data Lake Business Intelligence

Introducing WorkflowGuard: The Workflow Governance and Observability System That Oversees over 120,000 Data Workflows

Uber Engineering

JANUARY 16, 2023

Our Data Workflow Platform team introduces WorkflowGuard: a new service to govern executions, prioritize resources, and manage the life cycle of repetitive data jobs. Check out how it improved workflow reliability and cost efficiency while bringing more observability to users.

Data Workflow

Data Workflow Government Systems Data

Scale Unstructured Text Analytics with Batch LLM Inference

Snowflake

MARCH 6, 2025

And to create significant technology and team efficiencies, organizations need to consider opportunities to integrate LLM pipelines with existing structured data workflows. This unification can also empower data engineers, who already manage structured pipelines, to easily onboard and maintain unstructured data workflows.

Unstructured Data

Unstructured Data Medical Media Data Workflow

The Emerging Role of AI Data Engineers - The New Strategic Role for AI-Driven Success

Data Engineering Weekly

JANUARY 15, 2025

As large language models (LLMs) and AI agents become indispensable in everything from customer service to autonomous vehicles, the ability to manage, analyze, and optimize unstructured data has become a strategic imperative. Billions of social media posts, hours of video content, and terabytes of sensor data are produced daily.

Data Engineering

Data Engineering Data Engineer Unstructured Data Engineering

5 Hidden Gem Python Libraries for Data Science

KDnuggets

SEPTEMBER 9, 2024

Exploring the not-so-famous data science libraries that can be useful in your data workflow.

Data Science

Data Science Python Data Workflow Data

Utilizing Pandas AI for Data Analysis

KDnuggets

APRIL 16, 2024

Bring the latest AI implementation to Pandas to improve your data workflow.

Utilities

Utilities Data Analysis Data Workflow Data

Startup Spotlight: How ROE AI Empowers Data Teams

Snowflake

MARCH 26, 2025

This means enterprises can run unstructured data workflows, powered by AI agents, without moving data out of Snowflake, which enhances trust and helps support compliance. First, Snowflake has enabled us to strengthen user trust in our app. Second, we’re optimizing scalability.

Unstructured Data

Unstructured Data SQL Data Data Workflow

A Tour of Python NLP Libraries

KDnuggets

JUNE 17, 2024

Exploring the available text Python packages for your data workflow.

Python

Python Data Workflow Data

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

5 Free Courses to Master Data Engineering

KDnuggets

NOVEMBER 30, 2023

Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company.

Data Engineering

Data Engineering Data Engineer Engineering Data Workflow

Snowflake Ventures Invests in DataOps.live Bringing Advanced DevOps Capabilities to the AI Data Cloud

Snowflake

MARCH 24, 2025

DataOps.live keeps users at the forefront of data engineering. DataOps.live works together with Snowflake to augment and extend native Snowflake features, resulting in advanced DataOps workflows for Snowflake customers. Snowflake and DataOps.live’s integrated solutions simplify the development, testing and deployment of data workflows.

Cloud

Cloud Data Pipeline Data Workflow Data Engineering

10 Advanced Python Tricks for Data Scientists

KDnuggets

JANUARY 27, 2025

Master cleaner, faster code with these essential techniques to supercharge your data workflows.

Python

Python Data Workflow Data Coding

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

5 Lesser-Known Data Transformation Techniques for Better Analysis

KDnuggets

OCTOBER 22, 2024

Utilize these transformation techniques in your data workflow.

Data Workflow

Data Workflow Utilities Data

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka

Kafka Data Lake High Quality Data SQL

How To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

Deploy DataOps: DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration and improve the agility of data teams. How effective are your current data workflows?

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Top Data Python Packages to Know in 2023

KDnuggets

JANUARY 4, 2023

These Python packages would improve your data workflow.

Python

Python Data Workflow Data Data Science

KDnuggets News, December 6: GitHub Repositories to Master Machine Learning • 5 Free Courses to Master Data Engineering

KDnuggets

DECEMBER 6, 2023

This week on KDnuggets: Discover GitHub repositories from machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job • Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company • And much, (..)

Machine Learning

Machine Learning Data Engineering Data Engineer Engineering

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Your first 30 days are free!

Project

Project Data Lake High Quality Data Data Workflow

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Data Engineering Weekly #198

Data Engineering Weekly

NOVEMBER 24, 2024

By creating custom linting rules tailored to their team's needs, Next Insurance has improved its data workflows' maintainability, scalability, and quality, making it easier for engineers to collaborate and debug issues.

Data Engineering

Data Engineering Data Engineer Engineering Insurance

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues for every part of your data workflow, from migration to deployment. Datafold has recently launched a 3-in-1 product experience to support accelerated data migrations.

Systems

Systems Designing Data Lake SQL

6 Ways To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

Deploy DataOps: DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration and improve the agility of data teams. How effective are your current data workflows?

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Introducing Snowflake Notebooks, an End-to-End Interactive Environment for Data & AI Teams

Snowflake

JUNE 6, 2024

Get more out of your data: Top use cases for Snowflake Notebooks. To see what’s possible and change how you interact with Snowflake data, check out the various use cases you can achieve in a single interface. Integrated data analysis: Manage your entire data workflow within a single, intuitive environment.

SQL

SQL Python Machine Learning Data Workflow

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Management

Management Data Lake High Quality Data Machine Learning

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Machine Learning

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Process

Process Data Lake High Quality Data Machine Learning

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality. As you can see, this is in the code part where you are building your data pipelines, a misnomer because this is an oversimplification. The other benefit is that you can also use parameters and build generic workflows to be reused.

Technology

Technology Architecture Google Cloud Metadata

Complete Guide to Data Transformation: Basics to Advanced

Ascend.io

OCTOBER 28, 2024

Best Practices in Data Transformation: Implementing best practices in data transformation is essential to maintain high-quality, consistent, and secure data workflows.

Raw Data

Raw Data Datasets Aggregated Data Data Pipeline

Data Engineering Weekly #196

Data Engineering Weekly

NOVEMBER 3, 2024

Data Engineering Weekly readers get a 15% discount by registering at the following link: [link]. Gustavo Akashi: Building data pipelines effortlessly with a DAG Builder for Apache Airflow. Every code-first data workflow grew into a UI-based or YAML-based workflow.

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Building Linked Data Products With JSON-LD

Data Engineering Podcast

SEPTEMBER 17, 2023

Can you describe the workflow for building autonomous linkages across data assets that are modelled as JSON-LD? What are the most interesting, innovative, or unexpected ways that you have seen JSON-LD used for data workflows? When is JSON-LD the wrong choice?

Building

Building SQL BI Python

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Governance

Data Governance Government Data Lake High Quality Data

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

How to Automate PySpark Pipelines on AWS EMR With Airflow

Towards Data Science

AUGUST 22, 2023

Optimising big data workflow orchestration.

AWS

AWS Big Data Data Science Data Workflow

The Scoop: Tech Layoffs in 2022

The Pragmatic Engineer

NOVEMBER 16, 2022

7 November Domino Data Lab (data workflow platform, Series E). In Mexico, the company had a large initiative with private drivers and a fleet of Tesla cars. That operation is in talks of selling to other investors. Meta - 13% layoffs. Widely-reported, e.g. here. 25% of their workforce laid off. Verified.

Healthcare

Healthcare Software Engineering Software Engineer Data Workflow

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Process

Data Process Process Data Lake High Quality Data

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Data Lake

Data Lake Building High Quality Data AWS
