One of the biggest changes for PySpark has been the DataFrame API. It greatly reduces the JVM-to-PVM communication overhead and improves performance. However, it also complicates the code. Some of you have probably already seen, written, or worked with code like this.
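The excerpt ends just before the code it refers to. As a stand-in, here is a minimal, hypothetical sketch of the style of DataFrame API code the author likely means; the table and column names are invented:

```python
# A minimal, hypothetical sketch of typical DataFrame API code: expressive
# and JVM-efficient, but it gets hard to read as the chain grows.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("dataframe-api-demo").getOrCreate()

orders = spark.createDataFrame(
    [(1, "US", 120.0), (2, "DE", 80.0), (3, "US", 40.0)],
    ["order_id", "country", "amount"],
)

# Chained column expressions run inside the JVM, avoiding per-row
# Python callbacks (the JVM-to-PVM overhead mentioned above).
summary = (
    orders
    .withColumn("tier", F.when(F.col("amount") >= 100, "high").otherwise("low"))
    .groupBy("country", "tier")
    .agg(F.sum("amount").alias("revenue"), F.count("*").alias("orders"))
    .orderBy("country")
)
summary.show()
```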
In this episode of Unapologetically Technical, I interview Shane Murray, Field CTO at Monte Carlo Data. Shane shares his compelling journey from studying math and finance in Sydney, Australia, to leading AI strategy at a major data observability company in New York. We explore his early work in choice modeling and pioneering online multivariate experimentation long before A/B testing became mainstream, including fascinating examples from cruise lines, American Express, and even cultural surprises…
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You'll learn how to:
- Create a standardized process for debugging to quickly diagnose errors in your DAGs
- Identify common issues with DAGs, tasks, and connections
- Distinguish between Airflow-related and non-Airflow-related issues
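As one concrete starting point for a standardized debugging process, here is a minimal sketch (not from the guide itself) that reproduces task errors locally with dag.test(), available in Airflow 2.5+; the DAG and its tasks are hypothetical:

```python
# A minimal debugging loop: run the whole DAG in-process with dag.test()
# to get full tracebacks quickly, with no scheduler or webserver needed.
from datetime import datetime
from airflow.decorators import dag, task


@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def debug_me():
    @task
    def extract():
        return [1, 2, 3]

    @task
    def transform(rows):
        # A latent bug to hunt for: this divides by zero on empty input.
        return sum(rows) / len(rows)

    transform(extract())


dag_object = debug_me()

if __name__ == "__main__":
    # Executes every task in the current process, Airflow 2.5+.
    dag_object.test()
```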
Goodbye resource leaks! Learn how the FixrLeak framework leverages GenAI and AST-level analysis to automatically detect and fix resource leaks in large-scale Java applications at Uber.
Meta and Quansight have improved key libraries in the Python ecosystem. There is plenty more to do, and we invite the community to help with our efforts. We'll look at two key efforts in Python's packaging ecosystem to make packages faster and easier to use: unlocking performance wins for developers through free-threaded Python, where we leverage Python 3.13's support for concurrent programming (made possible by removing the Global Interpreter Lock (GIL)).
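To make the free-threading claim concrete, here is a small, illustrative sketch; on a free-threaded 3.13 build (python3.13t) the CPU-bound threads below can run in parallel, while on a standard GIL build they serialize. The workload and thread count are arbitrary:

```python
# Sketch: CPU-bound threads only speed up on a free-threaded build.
import sys
import threading
import time


def burn(n: int) -> int:
    # Pure-Python CPU work; under the GIL, only one thread runs at a time.
    total = 0
    for i in range(n):
        total += i * i
    return total


def main() -> None:
    # sys._is_gil_enabled() exists on CPython 3.13; fall back gracefully.
    print("GIL enabled:", getattr(sys, "_is_gil_enabled", lambda: True)())
    start = time.perf_counter()
    threads = [threading.Thread(target=burn, args=(10_000_000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"4 CPU-bound threads took {time.perf_counter() - start:.2f}s")


if __name__ == "__main__":
    main()
```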
Data providers want their data available to their customers, no matter where in the world or on which cloud service provider the customer is located. However, egress costs can contribute up to 70% of total data transfer costs. Providers have historically had to balance the desire to increase the availability of their data to any relevant Snowflake regions with the need to manage egress costs.
Migrating tech stacks at Uber's scale isn't easy. Learn how we migrated our stateless container orchestration platform to Kubernetes and operate it at a scale of 3 million cores, with 1.5 million pod launches daily.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Delivering seamless, personalized experiences for customers across channels continues to be a priority for organizations across industries. To make this goal a reality, they seek out powerful customer communications management (CCM) solutions. However, there's often a debate about whether to build a custom in-house solution or purchase an enterprise-grade platform.
1. Introduction
2. Step-by-step process to solve any SQL interview question
   2.1. Define what the input data is and how the inputs are related
   2.2. Understand the input tables' grain, foreign keys, and how they relate to each other
   2.3. Define the dimensions and metrics required for the output
   2.4. Filter/Join/Group by input columns to get the output dimensions and metrics
3. …
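To ground the process, here is a hypothetical worked example of steps 2.1-2.4 using sqlite3; the tables, grain, and metric are invented for illustration, not taken from the post:

```python
# Worked example: input grain is one row per order; customers is the
# dimension table joined via the customer_id foreign key.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'AMER');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 90.0);
    """
)

# Output: dimension = region, metric = total revenue (filter/join/group by).
for row in con.execute(
    """
    SELECT c.region, SUM(o.amount) AS revenue
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.amount > 0
    GROUP BY c.region
    ORDER BY revenue DESC
    """
):
    print(row)
```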
Why migrate from Netezza to Databricks? The limitations of traditional enterprise data warehouse (EDW) appliances like Netezza are becoming increasingly apparent.
When you shop online, do you ever find yourself pondering how your tastes get turned into product suggestions uniquely suited to you? Or how self-driving cars navigate very complicated situations with amazing accuracy? These are ways that data engineering improves our lives in the real world. The field of data engineering turns raw data into insights that can be used to change businesses and our lives.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Artificial intelligence services have been a hot topic for the last decade. It is hard to find an area or industry nowadays that hasn't at least tried to use this relatively new tool in its work. However, there is one thing that makes it possible for AI to exist: data. Without high-quality data… The post AI Data Collection Guide appeared first on InData Labs.
Better decision-making, innovation, and compliance all hinge on one common factor: trusted data. And today, we're working with more data than ever. But here's a fundamental challenge that many organizations face, and one I've encountered in countless conversations with customers: they don't fully understand what data they have, let alone whether they can trust it.
Sol Rashidi has built AI, data, and digital strategies inside some of the world's biggest companies, and she's seen the same mistakes play out again and again. In this episode, she unpacks why AI initiatives often stall, how executives misread what transformation really requires, and why the future of AI success isn't technical, it's cultural.
The imperative for modernization: traditional database solutions like SQL Server have struggled to keep up with the demands of modern data workloads due to a…
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
Consumption-based pricing helps Snowflake balance the power of true cloud elasticity with clear visibility into usage and spending. According to The State of Usage-Based Pricing: Second Edition, three out of five SaaS companies now incorporate some form of usage-based pricing into their offerings. That's why Snowflake Ventures is reinvesting in our partner RightRev, a cloud-based revenue recognition management platform created to facilitate improved financial reporting processes, specifically…
Running Dagster: Our Open Platform. We're pulling back the curtain. Join us on May 13 for a live deep dive into how Dagster Labs runs Dagster in production. One of our lead data engineers will walk through our real-world implementation, architecture decisions, and the lessons we've learned scaling the platform. Register now.
Editor's Note: OpenXData Conference 2025 is a free virtual event on open data architectures (Iceberg, Hudi, lakehouses, query engines…)
Written by Josh Xi & Rakesh Kumar at Lyft. From real-time rider pricing and driver incentives to long-term budget allocation and strategic planning, forecasting at Lyft plays a pivotal role in providing foresight into our market conditions for efficient operations and facilitating millions of rides daily across North America. This article explores the real-time spatial-temporal forecasting models and system designs used for predicting market conditions, focusing on how their complexity and rapid…
Introduction: This recipe shows how you can build a data pipeline to read data from ServiceNow and write to BigQuery. Striim's ServiceNow Reader will first read the existing tables from the configured ServiceNow dataset and then write them to the target BigQuery project using the BigQuery Writer, a process called "initial load" in Striim and "historical sync" or "initial snapshot" by others.
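For readers who want the concept without Striim, here is a heavily simplified sketch of what an initial load amounts to, using ServiceNow's REST Table API and the BigQuery Python client. This is not Striim's implementation, and the instance, table, and credential values are placeholders:

```python
# Conceptual "initial load": bulk-copy existing ServiceNow rows to BigQuery.
# NOT Striim's implementation; names below are hypothetical placeholders.
# Requires: pip install requests google-cloud-bigquery
import requests
from google.cloud import bigquery

SN_INSTANCE = "example"                       # hypothetical ServiceNow instance
SN_TABLE = "incident"
BQ_TABLE = "my-project.servicenow.incident"   # hypothetical BigQuery target


def fetch_servicenow_rows(limit: int = 1000) -> list[dict]:
    # ServiceNow Table API: GET /api/now/table/{table}
    resp = requests.get(
        f"https://{SN_INSTANCE}.service-now.com/api/now/table/{SN_TABLE}",
        params={"sysparm_limit": limit},
        auth=("user", "password"),  # placeholder credentials
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]


def initial_load() -> None:
    client = bigquery.Client()
    rows = fetch_servicenow_rows()
    # Stream the snapshot into BigQuery; the target schema must exist.
    errors = client.insert_rows_json(BQ_TABLE, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")


if __name__ == "__main__":
    initial_load()
```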
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features, with plenty of example code. You'll learn how to:
- Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to
- Write DAGs that adapt to your data at runtime and set up alerts and notifications
- Scale your…
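As a taste of those building blocks, here is a minimal, illustrative DAG showing tasks, dependencies, and a schedule; the DAG id and commands are invented:

```python
# Building blocks: operators as tasks, >> for dependencies, a daily schedule.
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.empty import EmptyOperator

with DAG(
    dag_id="building_blocks",
    schedule="@daily",                 # run exactly when you want it to
    start_date=datetime(2024, 1, 1),
    catchup=False,
) as dag:
    start = EmptyOperator(task_id="start")
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    # Combine the blocks into a pipeline.
    start >> extract >> load
```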
The automotive industry has undergone a seismic transformation over the last several decades. In 2005, most vehicles were mechanically sophisticated but had limited digital capabilities and integrations. Today, the automobile is a computer on wheels: software-defined, cloud-connected, increasingly autonomous and continuously evolving through over-the-air (OTA) updates.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
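For reference, here is a short, illustrative sketch of the two features named, dynamic task mapping via .expand() and data-driven scheduling via Dataset; the dataset URI and file names are hypothetical:

```python
# Sketch of dynamic task mapping and data-driven scheduling (Airflow 2.4+).
from datetime import datetime
from airflow.datasets import Dataset
from airflow.decorators import dag, task

reports = Dataset("s3://example-bucket/reports")  # hypothetical URI


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def producer():
    @task(outlets=[reports])
    def publish():
        print("reports updated")

    publish()


# Data-driven scheduling: this DAG runs whenever the dataset is updated.
@dag(schedule=[reports], start_date=datetime(2024, 1, 1), catchup=False)
def consumer():
    @task
    def list_files() -> list[str]:
        return ["a.csv", "b.csv", "c.csv"]

    @task
    def process(path: str) -> None:
        print(f"processing {path}")

    # Dynamic task mapping: one mapped task instance per file at runtime.
    process.expand(path=list_files())


producer()
consumer()
```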
After working in data for over a decade, one thing that remains the same is the need to create data pipelines. Whether you call them ETLs/ELTs or something else, companies need to move and process data for analytics. The question becomes how companies are actually building their data pipelines. What ETL tools are they actually… The post 6 Real-World ETL Use Cases with Estuary Flow appeared first on Seattle Data Guy.
In today's complex enterprise environments, managing data security is a daunting challenge. Organizations are grappling with a growing number of data stores, expanding data sharing, increasing use of data for AI, and increasingly sophisticated threats. This complexity necessitates a shift toward automated, AI-driven solutions that simplify security governance and accelerate threat detection.
Meta and NVIDIA collaborated to accelerate vector search on GPUs by integrating NVIDIA cuVS into Faiss v1.10, Meta's open-source library for similarity search. This new implementation of cuVS will be more performant than classic GPU-accelerated search in some areas. For inverted file (IVF) indexing, NVIDIA cuVS outperforms classical GPU-accelerated IVF build times by up to 4.7x, and search latency is reduced by as much as 8.1x.
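For orientation, here is a minimal CPU-side sketch of the IVF index type those benchmarks refer to, using Faiss's standard API; it does not show the cuVS-specific build path, and the data sizes are illustrative:

```python
# Minimal IVF example with Faiss: train (the "build" step GPUs accelerate),
# add vectors, then search a handful of inverted lists.
import faiss
import numpy as np

d, nb, nq, nlist, k = 64, 10_000, 5, 100, 4
rng = np.random.default_rng(0)
xb = rng.standard_normal((nb, d)).astype("float32")  # database vectors
xq = rng.standard_normal((nq, d)).astype("float32")  # query vectors

quantizer = faiss.IndexFlatL2(d)               # coarse quantizer
index = faiss.IndexIVFFlat(quantizer, d, nlist)
index.train(xb)                                # IVF build step
index.add(xb)

index.nprobe = 8                               # inverted lists to scan per query
distances, ids = index.search(xq, k)
print(ids)
```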
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds) and enables non-LLM evaluation…
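As one way to picture the reproducibility setup described, here is a hedged sketch using the OpenAI Python client with temperature 0 and a fixed seed; the model name is illustrative, and seeded determinism is best-effort rather than guaranteed:

```python
# Sketch of reproducible LLM calls: temperature 0 plus a fixed seed.
# Model name is illustrative; determinism via seed is best-effort only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def reproducible_completion(prompt: str, seed: int = 42) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",           # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                 # remove sampling randomness
        seed=seed,                     # pin best-effort determinism
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    a = reproducible_completion("Summarize: retries beat timeouts.")
    b = reproducible_completion("Summarize: retries beat timeouts.")
    print("identical:", a == b)  # expected True for fixed seed + temp 0
```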