Data Engineer and Data Workflow - Data Engineering Digest

Data Engineering Weekly #198

Data Engineering Weekly

NOVEMBER 24, 2024

Editor’s Note: Launching Data & Gen-AI courses in 2025. I can’t believe DEW is almost at its 200th edition. What I started as a fun hobby has become one of the top-rated newsletters in the data engineering industry. We are planning many exciting product lines to trial and launch in 2025.

Data Engineering

Data Engineering Data Engineer Engineering Insurance

The Emerging Role of AI Data Engineers - The New Strategic Role for AI-Driven Success

Data Engineering Weekly

JANUARY 15, 2025

The Critical Role of AI Data Engineers in a Data-Driven World: How does a chatbot seamlessly interpret your questions? The answer lies in unstructured data processing, a field that powers modern artificial intelligence (AI) systems. How does a self-driving car understand a chaotic street scene?

Data Engineering

Data Engineering Data Engineer Unstructured Data Engineering

Top 10 Data Engineering Trends in 2025

Edureka

APRIL 22, 2025

Data engineering can help with it. It is the force behind seamless data flow, enabling everything from AI-driven automation to real-time analytics. Key Trends in Data Engineering for 2025: In the fast-paced world of technology, data engineering services keep data-focused companies running.

Data Engineering

Data Engineering Data Engineer Engineering Consulting

5 Free Courses to Master Data Engineering

KDnuggets

NOVEMBER 30, 2023

Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company.

Data Engineering

Data Engineering Data Engineer Engineering Data Workflow

Data Engineering Weekly #196

Data Engineering Weekly

NOVEMBER 3, 2024

Data Engineering Weekly readers get a 15% discount by registering at the following link: [link]. Gustavo Akashi: Building data pipelines effortlessly with a DAG Builder for Apache Airflow. Every code-first data workflow grew into a UI-based or YAML-based workflow.

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Data Engineering Weekly #214

Data Engineering Weekly

MARCH 30, 2025

Editor’s Note: Data Council 2025, Apr 22-24, Oakland, CA. Data Council has always been one of my favorite events to connect with and learn from the data engineering community. Data Council 2025 is set for April 22-24 in Oakland, CA. [link] BVP: Roadmap: Data 3.0

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Data Engineering Weekly #206

Data Engineering Weekly

FEBRUARY 2, 2025

Moreover, advancements in hardware and the economics of cloud pricing further support the case for single-node processing, offering simplified architecture, better resource utilization, and seamless integration with modern data workflows. [link]

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

KDnuggets News, December 6: GitHub Repositories to Master Machine Learning • 5 Free Courses to Master Data Engineering

KDnuggets

DECEMBER 6, 2023

This week on KDnuggets: Discover GitHub repositories from machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job • Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company • And much, (..)

Machine Learning

Machine Learning Data Engineering Data Engineer Engineering

New Fivetran connector streamlines data workflows for real-time insights

ThoughtSpot

SEPTEMBER 6, 2023

The pathway from ETL to actionable analytics can often feel disconnected and cumbersome, leading to frustration for data teams and long wait times for business users. And even when we manage to streamline the data workflow, those insights aren’t always accessible to users unfamiliar with antiquated business intelligence tools.

Data Workflow

Data Workflow Raw Data Data Lake Business Intelligence

Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease

Snowflake

APRIL 17, 2024

This traditional SQL-centric approach often challenged data engineers working in a Python environment, requiring context-switching and limiting the full potential of Python’s rich libraries and frameworks. To get started, explore the comprehensive API documentation, which will guide you through every step.

Data Pipeline

Data Pipeline Python Data Engineering Data Engineer

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes. Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Effective Pandas Patterns For Data Engineering

Data Engineering Podcast

JANUARY 30, 2022

Summary: Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result, it has become a standard tool for data engineers for a wide range of applications. What are the main tasks that you have seen Pandas used for in a data engineering context?

Data Engineering

Data Engineering Data Engineer Engineering Python

Scale Unstructured Text Analytics with Batch LLM Inference

Snowflake

MARCH 6, 2025

And to create significant technology and team efficiencies, organizations need to consider opportunities to integrate LLM pipelines with existing structured data workflows. This unification can also empower data engineers, who already manage structured pipelines, to easily onboard and maintain unstructured data workflows.

Unstructured Data

Unstructured Data Medical Media Data Workflow

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka Data Lake High Quality Data SQL

Snowflake Ventures Invests in DataOps.live Bringing Advanced DevOps Capabilities to the AI Data Cloud

Snowflake

MARCH 24, 2025

Reducing time to success allows organizations to see immediate value from their data investments and scale up productivity. Our investment in DataOps.live, a SaaS platform for data engineering and operations, will help Snowflake users accelerate that timeline.

Cloud

Cloud Data Pipeline Data Workflow Data Engineer

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Summary: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization.

Data Lake

Data Lake High Quality Data BI Data Workflow

An Exploration Of What Data Automation Can Provide To Data Engineers And Ascend's Journey To Make It A Reality

Data Engineering Podcast

AUGUST 28, 2022

Summary: The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. The only thing worse than having bad data is not knowing that you have it.

Data Engineering

Data Engineering Data Engineer MongoDB Metadata

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the benefits of embedding Copilot into the data engine?

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

How to Become a Data Engineer in 2024?

Knowledge Hut

DECEMBER 26, 2023

Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is Data Science? What are the roles and responsibilities of a Data Engineer? And many more.

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

How To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

Another significant challenge is the reactive nature of operations within many data teams. Instead of driving innovation, data engineers often find themselves bogged down with maintenance tasks. On average, engineers spend over half of their time maintaining existing systems rather than developing new solutions.

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.

Non-relational Database

Non-relational Database Relational Database Database Designing

Complete Guide to Data Transformation: Basics to Advanced

Ascend.io

OCTOBER 28, 2024

You might even think of effective data transformation like a powerful magnet that draws the needle from the stack, leaving the hay behind. In this blog post, we’ll explore fundamental concepts, intermediate strategies, and cutting-edge techniques that are shaping the future of data engineering.

Raw Data

Raw Data Datasets Aggregated Data Data Pipeline

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Go to dataengineeringpodcast.com/starburst.

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Building Linked Data Products With JSON-LD

Data Engineering Podcast

SEPTEMBER 17, 2023

Summary: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information.

Building

Building SQL BI Python

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management.

SQL

SQL Data Lake High Quality Data Machine Learning

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Tooling only plays a small part in SLAs and incident management.

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Summary: Building a data platform is a substantial engineering endeavor.

Management

Management Data Lake High Quality Data Machine Learning

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling.

Process

Process Data Lake High Quality Data Machine Learning

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involve storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.

Database

Database Data Lake High Quality Data Data Workflow

6 Ways To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

Another significant challenge is the reactive nature of operations within many data teams. Instead of driving innovation, data engineers often find themselves bogged down with maintenance tasks. On average, engineers spend over half of their time maintaining existing systems rather than developing new solutions.

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

In this episode she shares the practical steps to implementing a data governance practice in your organization, and the pitfalls to avoid. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Data Governance

Data Governance Government Data Lake High Quality Data

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.

Project

Project Data Lake High Quality Data Data Workflow

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.

Building

Building Data Lake High Quality Data Machine Learning

Introducing Snowflake Notebooks, an End-to-End Interactive Environment for Data & AI Teams

Snowflake

JUNE 6, 2024

You can now use Snowflake Notebooks to simplify the process of connecting to your data and to amplify your data engineering, analytics and machine learning workflows. Faster, easier AI/ML and data engineering workflows Explore, analyze and visualize data using Python and SQL.

SQL

SQL Python Machine Learning Data Workflow

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.

Database

Database Technology Data Lake High Quality Data

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the open questions today in technical scalability of data engines?

Data Process

Data Process Process Data Lake High Quality Data

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Andrei Tserakhau has dedicated his career to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

Systems

Systems Designing Data Lake SQL

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!

Programming

Programming Data Lake High Quality Data Machine Learning

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

Although they take quite different approaches, Microsoft Fabric and Snowflake, two of the top players in the current data landscape, both provide strong capabilities. The company wants to combine its sales, inventory, and customer data in order to facilitate real-time reporting and predictive analytics. Office 365, Power BI, Azure).

BI

BI Pipeline-centric Data Lake Google Cloud

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Go to dataengineeringpodcast.com/dagster today to get started.

Building

Building Data Lake High Quality Data Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Summary: Sharing data is a simple concept, but complicated to implement well.

Data Lake

Data Lake High Quality Data Government Machine Learning
