Data Engineer, Data Workflow and SQL - Data Engineering Digest

Data Engineering Weekly #198

Data Engineering Weekly

NOVEMBER 24, 2024

Editor’s Note: Launching Data & Gen-AI courses in 2025. I can’t believe DEW will soon reach its 200th edition. What I started as a fun hobby has become one of the top-rated newsletters in the data engineering industry. The blog walks through a few examples of pipe syntax alongside the equivalent standard SQL queries.

Data Engineering

Data Engineering Data Engineer Engineering Insurance

The Emerging Role of AI Data Engineers - The New Strategic Role for AI-Driven Success

Data Engineering Weekly

JANUARY 15, 2025

The Critical Role of AI Data Engineers in a Data-Driven World. How does a chatbot seamlessly interpret your questions? The answer lies in unstructured data processing, a field that powers modern artificial intelligence (AI) systems. How does a self-driving car understand a chaotic street scene?

Data Engineering

Data Engineering Data Engineer Unstructured Data Engineering

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?
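A SQL-first streaming engine keeps query results up to date as events arrive instead of rerunning the query over all data. The sketch below illustrates that idea for a running `GROUP BY` in plain Python; it is a minimal toy under that assumption, not RisingWave's implementation or API, and the class and method names are hypothetical.

```python
from collections import defaultdict


class IncrementalGroupBy:
    """Toy incrementally maintained view, roughly equivalent to
    SELECT key, COUNT(*), SUM(amount) FROM events GROUP BY key.
    Hypothetical illustration only; not RisingWave's implementation."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)
        self.sums = defaultdict(float)

    def on_insert(self, key: str, amount: float) -> None:
        # Apply the delta for one new row instead of rescanning every row.
        self.counts[key] += 1
        self.sums[key] += amount

    def on_delete(self, key: str, amount: float) -> None:
        # Retractions are applied as negative deltas.
        self.counts[key] -= 1
        self.sums[key] -= amount
        if self.counts[key] == 0:
            # Drop groups that no longer have any rows.
            del self.counts[key]
            del self.sums[key]

    def snapshot(self) -> dict:
        # Current contents of the "view": key -> (count, sum).
        return {k: (self.counts[k], self.sums[k]) for k in self.counts}


view = IncrementalGroupBy()
for key, amount in [("eu", 10.0), ("us", 5.0), ("eu", 2.5)]:
    view.on_insert(key, amount)
view.on_delete("us", 5.0)
print(view.snapshot())
```

Each event touches only the group it belongs to, which is why incremental maintenance stays cheap as the event history grows.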

SQL

SQL Data Lake High Quality Data Machine Learning

Data Engineering Weekly #214

Data Engineering Weekly

MARCH 30, 2025

Editor’s Note: Data Council 2025, Apr 22-24, Oakland, CA. Data Council has always been one of my favorite events to connect with and learn from the data engineering community. Data Council 2025 is set for April 22-24 in Oakland, CA. BVP: Roadmap: Data 3.0

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka Data Lake High Quality Data SQL

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease

Snowflake

APRIL 17, 2024

In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. Dive in to experience how the enhanced Python API streamlines your data workflows and unlocks the full potential of Python within Snowflake.

Data Pipeline

Data Pipeline Python Data Engineering Data Engineer

Scale Unstructured Text Analytics with Batch LLM Inference

Snowflake

MARCH 6, 2025

In this post, you will gain insight into common business use cases for large-scale text data analytics. You’ll also discover why deploying batch LLM pipelines can be challenging and how Snowflake has optimized Snowflake Cortex AI for batch inference via SQL functions. What are common batch LLM inference jobs?

Unstructured Data

Unstructured Data Medical Media Data Workflow

Building Linked Data Products With JSON-LD

Data Engineering Podcast

SEPTEMBER 17, 2023

Summary: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information.

Building

Building SQL BI Python

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Summary: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization.

Data Lake

Data Lake High Quality Data BI Data Workflow

Introducing the dbt MCP Server – Bringing Structured Data to AI Workflows and Agents

dbt Developer Hub

APRIL 20, 2025

We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data. We are committed to building the data control plane that enables AI to reliably access structured data from across your entire data lineage.

Structured Data

Structured Data SQL BI Project

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines and the strategies for designing a non-relational database.

Non-relational Database

Non-relational Database Relational Database Database Designing

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team.

Architecture

Architecture Data Lake High Quality Data SQL

Introducing Snowflake Notebooks, an End-to-End Interactive Environment for Data & AI Teams

Snowflake

JUNE 6, 2024

Snowflake Notebooks aim to provide a convenient, easy-to-use interactive environment that seamlessly blends Python, SQL and Markdown, as well as integrations with key Snowflake offerings, like Snowpark ML, Streamlit, Cortex and Iceberg tables. Discover valuable business insights through exploratory data analysis.

SQL

SQL Python Machine Learning Data Workflow

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Andrei Tserakhau has dedicated his career to building data transfer systems that scale, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud.

Systems

Systems Designing Data Lake SQL

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL.
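One generic way to reconcile a source and a destination table is to normalize each row, hash it by primary key on both sides, and compare the digests; once rows are fetched into Python values, the SQL dialect no longer matters. The sketch below assumes that approach. It is not a description of how Datafold works, and the helper names and string-based normalization rule are hypothetical (real tools must also handle type differences such as `1` vs `1.0`).

```python
import hashlib
from typing import Iterable, Mapping

NULL_MARKER = "\x00"  # hypothetical sentinel so NULL and "" hash differently


def row_digest(row: Mapping[str, object], columns: list[str]) -> str:
    # Serialize values in a fixed column order so equivalent rows hash identically.
    canonical = "\x1f".join(NULL_MARKER if row[c] is None else str(row[c]) for c in columns)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def reconcile(source: Iterable[Mapping[str, object]],
              target: Iterable[Mapping[str, object]],
              key: str, columns: list[str]) -> dict[str, list]:
    # Map primary key -> row digest for each side.
    src = {r[key]: row_digest(r, columns) for r in source}
    dst = {r[key]: row_digest(r, columns) for r in target}
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


source_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}, {"id": 3, "amount": 30}]
target_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 25}, {"id": 4, "amount": 40}]
print(reconcile(source_rows, target_rows, key="id", columns=["id", "amount"]))
```

At scale, the same idea is usually applied to hashes of key ranges first, drilling down only into ranges whose aggregate hashes differ.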

Database

Database Data Lake High Quality Data Data Workflow

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

In this episode Pete Hunt, CEO of Dagster Labs, outlines the new declarative and collaborative orchestration capabilities in Dagster+, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

How to Become a Data Engineer in 2024?

Knowledge Hut

DECEMBER 26, 2023

Data engineering is typically a software engineering role that focuses deeply on data: data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. The guide also covers questions such as: What is data science? What are the roles and responsibilities of a data engineer?
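To make the ETL (Extract, Transform, Load) process concrete, here is a minimal sketch using only the Python standard library. The CSV layout, the `orders` table, and the cleaning rules are hypothetical, chosen just to show one step of each phase.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing, a missing amount.
RAW_CSV = """order_id,customer,amount
1, alice ,10.50
2,bob,
3,Carol,7.25
"""


def extract(text: str) -> list[dict]:
    # Extract: parse raw CSV rows into dictionaries of strings.
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    # Transform: drop rows without an amount, normalize names, cast types.
    cleaned = []
    for r in rows:
        if not r["amount"].strip():
            continue
        cleaned.append((int(r["order_id"]), r["customer"].strip().lower(), float(r["amount"])))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    # Load: write the cleaned rows into the target table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT customer, amount FROM orders ORDER BY order_id").fetchall())
```

Keeping the three phases as separate functions makes each one testable on its own, which is the same separation orchestration tools schedule as distinct pipeline steps.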

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Summary: The dbt project has become overwhelmingly popular across analytics and data engineering teams.

Project

Project Data Lake SQL High Quality Data

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

PostgreSQL

PostgreSQL Data Lake SQL High Quality Data

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

What are the open questions today in technical scalability of data engines?

Data Process

Data Process Process Data Lake High Quality Data

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Software Engineering

Software Engineering Software Engineer Engineering Data Lake

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Database

Database Technology Data Lake High Quality Data

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization.

Programming

Programming Data Lake High Quality Data Machine Learning

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Data Lake

Data Lake Building High Quality Data AWS

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Project

Project Data Lake High Quality Data Data Workflow

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.

Building

Building Data Lake High Quality Data Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Summary: Sharing data is a simple concept, but complicated to implement well. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.

Data Lake

Data Lake High Quality Data Government Machine Learning

Complete Guide to Data Transformation: Basics to Advanced

Ascend.io

OCTOBER 28, 2024

You might even think of effective data transformation like a powerful magnet that draws the needle from the stack, leaving the hay behind. In this blog post, we’ll explore fundamental concepts, intermediate strategies, and cutting-edge techniques that are shaping the future of data engineering.

Raw Data

Raw Data Datasets Aggregated Data Data Pipeline

Microsoft Fabric Tutorial for Beginners

Edureka

MAY 27, 2025

You won’t have to deal with siloed systems, jump between tools, or write endless lines of code to make data useful. With its ability to seamlessly integrate data engineering, analytics, and business intelligence, Microsoft Fabric stands out as the all-in-one superhero in a world where data is abundant but insights are scarce.

BI

BI Data Pipeline Business Intelligence Data Engineering

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Building

Building Data Lake High Quality Data Machine Learning

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector.

Designing

Designing Data Lake High Quality Data SQL

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Data Lake

Data Lake SQL High Quality Data Architecture

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git.
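As a toy model of the Git-like semantics described above, a catalog can keep named branches that point at immutable commits, where each commit maps table names to table snapshot identifiers; work committed on a branch stays invisible to `main` until it is merged. This is a hypothetical sketch, not Nessie's API or data model, and it supports only fast-forward merges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True, eq=False)
class Commit:
    # Immutable commit: table name -> snapshot id (a hypothetical metadata pointer).
    tables: dict
    parent: Optional["Commit"] = None


class ToyCatalog:
    """Hypothetical Git-like catalog for illustration; not Nessie's API."""

    def __init__(self) -> None:
        self.branches = {"main": Commit(tables={})}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A branch is only a new pointer to an existing commit; no data is copied.
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, table: str, snapshot_id: str) -> None:
        # Record a new table snapshot on one branch; other branches are unaffected.
        head = self.branches[branch]
        self.branches[branch] = Commit(tables={**head.tables, table: snapshot_id}, parent=head)

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: succeeds if the target head is an ancestor of the source head.
        node: Optional[Commit] = self.branches[source]
        target_head = self.branches[target]
        while node is not None and node is not target_head:
            node = node.parent
        if node is None:
            raise ValueError("branches have diverged; a real system needs a three-way merge")
        self.branches[target] = self.branches[source]

    def read(self, branch: str, table: str) -> Optional[str]:
        return self.branches[branch].tables.get(table)


catalog = ToyCatalog()
catalog.commit("main", "orders", "snap-1")
catalog.create_branch("etl")
catalog.commit("etl", "orders", "snap-2")  # isolated work on the branch
print(catalog.read("main", "orders"), catalog.read("etl", "orders"))
catalog.merge("etl")  # publish the branch's state to main in one pointer swap
print(catalog.read("main", "orders"))
```

Because a merge here is just moving a branch pointer, readers of `main` see either all of the branch's changes or none of them.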

Data Lake

Data Lake High Quality Data Architecture Machine Learning

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value.

Data Lake

Data Lake High Quality Data SQL Architecture

Data Engineering Weekly #191

Data Engineering Weekly

SEPTEMBER 29, 2024

Google: SQL Has Problems. We Can Fix Them: Pipe Syntax In SQL. It was a good weekend read about the proposed pipe syntax in SQL, which resembles Unix pipes in its core concept: sequential data flow and transformation. Unix pipes typically represent a physical flow of data between processes.
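In pipe syntax, a query reads as a linear chain where each operator takes a table and returns a table, written roughly as `FROM orders |> WHERE amount > 10 |> AGGREGATE SUM(amount) AS total GROUP BY customer |> ORDER BY customer`. The sketch below mimics that left-to-right flow in Python with a hypothetical `Table` wrapper; it illustrates the concept only and is not Google's implementation.

```python
from collections import defaultdict
from typing import Callable, Iterable


class Table:
    """Hypothetical wrapper: every method returns a new Table,
    so steps chain left to right like |> operators."""

    def __init__(self, rows: Iterable[dict]) -> None:
        self.rows = list(rows)

    def where(self, predicate: Callable[[dict], bool]) -> "Table":
        # |> WHERE: keep only rows matching the predicate.
        return Table(r for r in self.rows if predicate(r))

    def aggregate_sum(self, column: str, group_by: str, alias: str) -> "Table":
        # |> AGGREGATE SUM(column) AS alias GROUP BY group_by
        totals = defaultdict(float)
        for r in self.rows:
            totals[r[group_by]] += r[column]
        return Table({group_by: k, alias: v} for k, v in totals.items())

    def order_by(self, column: str) -> "Table":
        # |> ORDER BY column
        return Table(sorted(self.rows, key=lambda r: r[column]))


orders = [
    {"customer": "a", "amount": 5},
    {"customer": "b", "amount": 20},
    {"customer": "a", "amount": 15},
    {"customer": "b", "amount": 30},
]

result = (
    Table(orders)
    .where(lambda r: r["amount"] > 10)
    .aggregate_sum("amount", group_by="customer", alias="total")
    .order_by("customer")
)
print(result.rows)
```

The point of the design is that the code order matches the data flow order, whereas standard SQL starts with `SELECT` even though it is evaluated after `FROM`, `WHERE`, and `GROUP BY`.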

Data Engineering

Data Engineering Data Engineer Engineering SQL

Accelerate Development Of Enterprise Analytics With The Coalesce Visual Workflow Builder

Data Engineering Podcast

APRIL 3, 2022

Summary: The flexibility of software-oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases it adds significant complexity. In this episode Satish Jayanthi explains how he is building a framework to allow enterprises to move quickly while maintaining guardrails for data workflows.

Data Warehouse

Data Warehouse Data Workflow Data Architecture SQL

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in implementing data contracts in your data workflows.
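As a rough illustration of enforcing data-contract constraints in code, the sketch below validates a batch of records against a small contract of required columns, types, and nullability. It is plain Python, not the Great Expectations API; the contract format and the `orders` field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnRule:
    name: str
    dtype: type
    nullable: bool = False


# Hypothetical contract for an "orders" feed agreed between producer and consumer.
ORDERS_CONTRACT = [
    ColumnRule("order_id", int),
    ColumnRule("customer", str),
    ColumnRule("discount_code", str, nullable=True),
]


def validate(records: list[dict], contract: list[ColumnRule]) -> list[str]:
    """Return human-readable violations; an empty list means the batch conforms."""
    violations = []
    for i, rec in enumerate(records):
        for rule in contract:
            if rule.name not in rec:
                violations.append(f"row {i}: missing column '{rule.name}'")
                continue
            value = rec[rule.name]
            if value is None:
                if not rule.nullable:
                    violations.append(f"row {i}: '{rule.name}' must not be null")
            elif not isinstance(value, rule.dtype):
                violations.append(
                    f"row {i}: '{rule.name}' expected {rule.dtype.__name__}, got {type(value).__name__}"
                )
    return violations


batch = [
    {"order_id": 1, "customer": "alice", "discount_code": None},
    {"order_id": "2", "customer": None},
]
for problem in validate(batch, ORDERS_CONTRACT):
    print(problem)
```

Running a check like this at the producer's boundary, before data is published, is what turns a contract from documentation into an enforced agreement.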

Metadata

Metadata Business Intelligence Data Lake BI

What Is Data Engineering And What Does A Data Engineer Do?

Meltano

OCTOBER 5, 2022

Interested in becoming a data engineer? The need for data experts in the U.S. job market is expected to grow by 22% in this decade, and according to LinkedIn’s 2020 report, data engineer is listed as the 8th fastest-growing job today. But what is data engineering exactly, and what does a data engineer do?

Data Engineering

Data Engineering Data Engineer Engineering Raw Data

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight.

Systems

Systems Data Lake High Quality Data Google Cloud

A Reflection On The Data Ecosystem For The Year 2021

Data Engineering Podcast

JANUARY 1, 2022

When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Data Warehouse

Data Warehouse Hadoop SQL Data Lake

Data Engineering Weekly #114

Data Engineering Weekly

JANUARY 15, 2023

Data Engineering Weekly is brought to you by RudderStack. RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Pipelines for data in motion can quickly turn into DAG hell.

Data Engineering

Data Engineering Data Engineer Engineering Metadata
