Process and Systems - Data Engineering Digest

How to Build and Monitor Systems Using Airflow?

Analytics Vidhya

FEBRUARY 3, 2023

Are you looking for a way to automate and simplify the process? Imagine scheduling your ML tasks to run automatically without the need for manual […] Airflow can help you manage your workflow and make your life easier with its monitoring and notification features.

Systems

Systems Building Machine Learning Management

Redefining AIOps IT Workflows with Legacy System Visibility

Precisely

DECEMBER 16, 2024

Modern IT environments require comprehensive data for successful AIOps, and that includes incorporating data from legacy systems like IBM i and IBM Z into ITOps platforms. AIOps presents enormous promise, but many organizations face hurdles in its implementation: Complex ecosystems made of multiple, fragmented systems that lack interoperability.

Systems

Systems IT Machine Learning Insurance

Inside Facebook’s video delivery system

Engineering at Meta

DECEMBER 10, 2024

We're explaining the end-to-end systems the Facebook app leverages to deliver relevant content to people. At Facebook's scale, the systems built to support and overcome these challenges require extensive trade-off analyses, focused optimizations, and architecture built to allow our engineers to push for the same user and business outcomes.

Systems

Systems Architecture Engineering Data Pipeline

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Summary Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer the process becomes more challenging. As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?

Systems

Systems Data Lake High Quality Data Google Cloud

LLMs in Production: Tooling, Process, and Team Structure

Speaker: Dr. Greg Loughnane and Chris Alexiuk

Join Dr. Greg Loughnane and Chris Alexiuk in this exciting webinar to learn all about: How to design and implement production-ready systems with guardrails, active monitoring of key evaluation metrics beyond latency and token count, managing prompts, and understanding the process for continuous improvement Best practices for setting up the proper mix of open- (..)

Process

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.

Architecture

Architecture Systems Data Lake Google Cloud

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data Engineering Podcast

OCTOBER 15, 2023

Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. Go to materialize.com today and get 2 weeks free!

Process

Process Building SQL BI

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Summary Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. How have the requirements of generative AI shifted the demand for streaming data systems? Can you describe how Datorios is implemented?

Process

Process Data Lake High Quality Data Machine Learning

Data and Process Automation Adoption: Challenges, Maturity, and Business Impact

Precisely

MARCH 3, 2025

Data and process automation used to be seen as a luxury, but those days are gone. Let's explore the top challenges to data and process automation adoption in more detail. Almost half of respondents (47%) reported a medium level of automation adoption, meaning they currently have a mix of automated and manual SAP processes.

Process

Process Government Data Finance

Monetizing Analytics Features: Why Data Visualizations Will Never Be Enough

Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.

Data

Unlocking Data Team Success: Are You Process-Centric or Data-Centric?

DataKitchen

MARCH 20, 2025

Unlocking Data Team Success: Are You Process-Centric or Data-Centric? We’ve identified two distinct types of data teams: process-centric and data-centric. Process-centric data teams focus their energies predominantly on orchestrating and automating workflows. They work in and on these pipelines.

Pipeline-centric

Pipeline-centric Database-centric Process Data

Ransomware Attacks: 3 Keys to Resilience for Your IBM i Systems

Precisely

NOVEMBER 7, 2024

Key Takeaways: In the face of ransomware attacks, a resilience strategy for IBM i systems must include measures for prevention, detection, and recovery. No platform is immune, not even the reliable and secure IBM i systems. So, how can you keep your IBM i systems resilient even as ransomware risks are on the rise?

Systems

Systems Accessibility Accessible Programming

What is System Hacking? Types and Prevention

Edureka

APRIL 10, 2025

When you hear the term System Hacking, it might bring to mind shadowy figures behind computer screens and high-stakes cyber heists. In this blog, we’ll explore the definition, purpose, process, and methods of prevention related to system hacking, offering a detailed overview to help demystify the concept.

Systems

Systems Education Banking Accessibility

Data Engineering Interview Series #2: System Design

Start Data Engineering

JANUARY 20, 2025

Guide the interviewer through the process 2.1. Introduction 2. [Requirements gathering] Make sure you clearly understand the requirements & business use case 2.2. [Understand source data] Know what you have to work with 2.3. [Model your data] Define data models for historical analytics 2.4.

Designing

Designing Systems Data Engineering Data Engineer

Best Practices for Real-Time Stream Processing

Striim

MARCH 21, 2025

What is Real-Time Stream Processing? To access real-time data, organizations are turning to stream processing. There are two main data processing paradigms: batch processing and stream processing.

Process

Process Data Warehouse Kafka Data Pipeline

How Meta discovers data flows via lineage at scale

Engineering at Meta

JANUARY 22, 2025

It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta's systems. It enhances the traceability of data flows within systems, ultimately empowering developers to swiftly implement privacy controls and create innovative products.

Data Warehouse

Data Warehouse SQL Programming Language Data

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. What are the experimental methods that you are using to gain understanding in the opportunities and practical limits of those systems? What do you have planned for the future of your academic research?

Data Process

Data Process Process Data Lake High Quality Data

Real-Time AI for Crisis Management: Responding Faster with Smarter Systems

Striim

JANUARY 30, 2025

When integrated effectively, AI and machine learning (ML) models can process data streams at near-zero latency, empowering teams to make split-second decisions. Systems must be capable of handling high-velocity data without bottlenecks. That's where real-time artificial intelligence (AI) can help.

Systems

Systems Management Hospitality Healthcare

The Roots of Today's Modern Backend Engineering Practices

The Pragmatic Engineer

NOVEMBER 21, 2023

If you had a continuous deployment system up and running around 2010, you were ahead of the pack: but today it’s considered strange if your team would not have this for things like web applications. Avoiding downtime was nerve-wracking, and the notion of a 'rollback' was as much a relief as a technical process.

Engineering

Engineering Bytes Cloud Computing AWS

Netflix’s Distributed Counter Abstraction

Netflix Tech

NOVEMBER 12, 2024

Failures in a distributed system are a given, and having the ability to safely retry requests enhances the reliability of the service. Implementing idempotency would likely require using an external system for such keys, which can further degrade performance or cause race conditions.
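The idea of safe retries in the excerpt above can be illustrated with a minimal sketch. This is not Netflix's implementation: the `CounterService` class, its method names, and the in-memory set of keys are all hypothetical, and the set merely stands in for the external key store the excerpt mentions.

```python
import uuid


class CounterService:
    """Toy in-memory counter that deduplicates increments by idempotency key.

    Hypothetical illustration only; a real service would keep the keys in an
    external store, which is where the extra latency mentioned above comes from.
    """

    def __init__(self):
        self.counts = {}        # counter name -> current value
        self.seen_keys = set()  # idempotency keys that were already applied

    def increment(self, counter, delta, idempotency_key):
        # A retried request carries the same key, so it is applied at most once.
        if idempotency_key in self.seen_keys:
            return self.counts.get(counter, 0)
        self.seen_keys.add(idempotency_key)
        self.counts[counter] = self.counts.get(counter, 0) + delta
        return self.counts[counter]


service = CounterService()
key = str(uuid.uuid4())                     # one key per logical request
first = service.increment("views", 1, key)
retry = service.increment("views", 1, key)  # simulated retry after a timeout
```

Here the retry returns the current count without applying the increment a second time, which is what makes the request safe to repeat.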

Datasets

Datasets Computer Science Systems Kafka

Scaling Beyond Postgres: How to Choose a Real-Time Analytical Database

Simon Späti

MARCH 11, 2025

Therefore, you’ve probably come across terms like OLAP (Online Analytical Processing) systems, data warehouses, and, more recently, real-time analytical databases. But data volumes grow, analytical demands become more complex, and Postgres stops being enough.

Database

Database Data Warehouse Data Engineer Data Engineering

Unapologetically Technical Episode 17 – Semih Salihoglu

Jesse Anderson

FEBRUARY 11, 2025

Semih is a researcher and entrepreneur with a background in distributed systems and databases. He then pursued his doctoral studies at Stanford University, delving into the complexities of database systems. Don't forget to subscribe to my YouTube channel to get the latest on Unapologetically Technical!

Computer Science

Computer Science Database Design Software Engineering Software Engineer

Paying down tech debt: further learnings

The Pragmatic Engineer

SEPTEMBER 19, 2024

In the early 90’s, DOS programs like the ones my company made had its own Text UI screen rendering system. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. By doing so, I got to see every screen of the system.

Recruitment

Recruitment Java Coding Project

Introducing the dbt MCP Server – Bringing Structured Data to AI Workflows and Agents

dbt Developer Hub

APRIL 20, 2025

Both AI agents and business stakeholders will then operate on top of LLM-driven systems hydrated by the dbt MCP context. Today's system is not a full realization of the vision in the posts shared above, but it is a meaningful step towards safely integrating your structured enterprise data into AI workflows. Why does this matter?

Structured Data

Structured Data SQL BI Project

Interesting startup idea: benchmarking cloud platform pricing

The Pragmatic Engineer

OCTOBER 17, 2024

We recently covered how CockroachDB joins the trend of moving from open source to proprietary and why Oxide decided to keep using it with self-support, regardless Web hosting: Netlify: chosen thanks to their super smooth preview system with SSR support. Internal comms: Chat: Slack Coordination / project management: Linear 3.

Cloud

Cloud AWS Metadata Cloud Computing

Gen AI in Action: Customers’ Cortex AI Stories and Outcomes

Snowflake

NOVEMBER 6, 2024

Alberta Health Services ER doctors automate note-taking to treat 15% more patients The integrated health system of Alberta, Canada’s third-most-populous province, with 4.5 But getting a handle on all the emails, calls and support tickets had historically been a tedious and largely manual process. Cortex is doing a great job for us.”

Hospitality

Hospitality Medical Government Software Engineer

An educational side project

The Pragmatic Engineer

JUNE 1, 2023

Juraj included system monitoring parts which monitor the server’s capacity he runs the app on: The monitoring page on the Rides app And it doesn’t end here. Juraj created a systems design explainer on how he built this project, and the technologies used: The systems design diagram for the Rides application The app uses: Node.js

Education

Education Project PostgreSQL Software Engineering

They Handle 500B Events Daily. Here’s Their Data Engineering Architecture.

Monte Carlo

NOVEMBER 12, 2024

A data engineering architecture is the structural framework that determines how data flows through an organization – from collection and storage to processing and analysis. And who better to learn from than the tech giants who process more data before breakfast than most companies see in a year?

Architecture

Architecture Data Engineering Data Engineer Engineering

Foundation Model for Personalized Recommendation

Netflix Tech

MARCH 28, 2025

By Ko-Jen Hsiao, Yesu Feng and Sudarshan Lamkhede Motivation Netflix's personalized recommender system is a complex system, boasting a variety of specialized machine learned models each catering to distinct needs including Continue Watching and Today's Top Picks for You. (Refer to our recent overview for more details).

Metadata

Metadata Bytes Data Mining Entertainment

Building cost effective data pipelines with Python & DuckDB

Start Data Engineering

MAY 28, 2024

Use DuckDB to process data, not for multiple users to access data 4.2. Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing 4.3. Processing data less than 100GB? Distributed systems are scalable, resilient to failures, & designed for high availability 4.5. Project demo 3. Use DuckDB 4.4.

Data Pipeline

Data Pipeline Python Building Data

Most Essential 2023 Interview Questions on Data Engineering

Analytics Vidhya

FEBRUARY 7, 2023

Introduction Data engineering is the field of study that deals with the design, construction, deployment, and maintenance of data processing systems. The goal of this domain is to collect, store, and process data efficiently so that it can be used to support business decisions and power data-driven applications.

Data Engineering

Data Engineering Data Engineer Engineering Data

Data News — Week 25.02

Christophe Blefari

JANUARY 11, 2025

AI companies are aiming for the moon—AGI—promising it will arrive once OpenAI develops a system capable of generating at least $100 billion in profits. Meaning: a YAML configuration system for ingestion and transformations, and now, visualisation with BI-as-code. Meanwhile, the AI landscape remains unpredictable.

Data

Data Data Warehouse Coding Programming Language

Cloud Data Warehouse Migrations: Success Stories from WHOOP and Nexon

Snowflake

NOVEMBER 26, 2024

A consolidated data system to accommodate a big(ger) WHOOP When a company experiences exponential growth over a short period, it’s easy for its data foundation to feel a bit like it was built on the fly. Processing some 90,000 tables per day, the team oversees the ingestion of more than 100 terabytes of data from upward of 8,500 events daily.

Data Warehouse

Data Warehouse Cloud PostgreSQL Hadoop

How Meta understands data at scale

Engineering at Meta

APRIL 28, 2025

Meta’s vast and diverse systems make it particularly challenging to comprehend its structure, meaning, and context at scale. Specifically, we have adopted a “shift-left” approach, integrating data schematization and annotations early in the product development process.

Metadata

Metadata Data Utilities Data Warehouse

Getting Started with Apache Arrow

Analytics Vidhya

MARCH 4, 2025

But processing large-scale data across different systems is often slow. Constant format conversions add processing time and memory overhead. Data is at the core of everything, from business decisions to machine learning. Traditional row-based storage formats struggle to keep up with modern analytics.

Machine Learning

Machine Learning Systems Process Data

YARN for Large Scale Computing: Beginner’s Edition

Analytics Vidhya

JANUARY 31, 2023

It is a powerful resource management system for a horizontal server environment. It is designed to be more flexible and generic than the original Hadoop MapReduce system, making it an attractive choice for companies looking to implement Hadoop. Introduction YARN stands for Yet Another Resource Negotiator.

Hadoop

Hadoop Designing Systems Management

Scale Unstructured Text Analytics with Batch LLM Inference

Snowflake

MARCH 6, 2025

Customer intelligence teams analyze reviews and forum comments to identify sentiment trends, while support teams process tickets to uncover product issues and inform gaps in a product roadmap. As data volumes grow and AI automation expands, cost efficiency in processing with LLMs depends on both system architecture and model flexibility.

Unstructured Data

Unstructured Data Medical Media Data Workflow

Data logs: The latest evolution in Meta’s access tools

Engineering at Meta

FEBRUARY 4, 2025

Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand. We also considered caching data logs in an online system capable of supporting a range of indexed per-user queries. What are data logs?

Accessible

Accessible Accessibility Raw Data Data Warehouse

The Future of Data Management Is Agentic AI

Snowflake

APRIL 13, 2025

Agentic AI refers to AI systems that act autonomously on behalf of their users. These systems make decisions, learn from interactions and continuously improve without constant human intervention. Manual processes can be time-consuming and error-prone. What is agentic AI? Scalability As businesses grow, so does their data.

Data Management

Data Management Management Consulting Unstructured Data

Simplifying Multimodal Data Analysis with Snowflake Cortex AI

Snowflake

APRIL 16, 2025

Process all your data where it already lives Fragmented data environments and complex cloud architectures impede efficiency and innovation. For streamlining manual processes : Online retailers and food delivery platforms use Cortex AI to automate image descriptions for meals and groceries, reducing manual effort.

Data Analysis

Data Analysis Unstructured Data Manufacturing Retail

Snowflake Startup Challenge 2025: Meet the Top 10

Snowflake

APRIL 9, 2025

KAWA Analytics Digital transformation is an admirable goal, but legacy systems and inefficient processes hold back many companies' efforts. PTA Robotics PTA Robotics' AI-powered vineyard disease prediction system leverages drone imagery, Internet of Things data and weather insights to detect vineyard disease risks before symptoms appear.

Pharmaceutical

Pharmaceutical Manufacturing Data Ingestion SQL

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

NOVEMBER 13, 2024

Additionally, multiple copies of the same data locked in proprietary systems contribute to version control issues, redundancies, staleness, and management headaches. This guarantees data quality and automates the laborious, manual processes required to maintain data reliability.

Metadata

Metadata Management Data Governance Government

Happy Leap Day!

The Pragmatic Engineer

FEBRUARY 29, 2024

But first, a few current cases of systems whose developers didn’t: In Sweden, card payments are down at a leading supermarket chain. Airline Avianca printed tickets dated as 3/1 instead of 2/29, thanks to their system not accounting for the leap day. The system was almost fully restored before noon.”
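The kind of bug described in the excerpt above, a ticket dated 3/1 instead of 2/29, can be reproduced in a few lines. This is a hypothetical sketch, not the code of any system named in the excerpt: `naive_next_day` is an invented example of hand-rolled date arithmetic that hard-codes 28 days for February, and `next_day` shows the standard-library alternative.

```python
import calendar
from datetime import date, timedelta

# Hard-coded month lengths: the hypothetical source of the bug.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def naive_next_day(d):
    """Return (month, day) of the following day, ignoring leap years."""
    if d.day < DAYS_IN_MONTH[d.month - 1]:
        return (d.month, d.day + 1)
    return (d.month % 12 + 1, 1)


def next_day(d):
    """Return the following day using the standard library's calendar rules."""
    return d + timedelta(days=1)


feb_28 = date(2024, 2, 28)
buggy = naive_next_day(feb_28)    # skips the leap day
correct = next_day(feb_28)        # lands on February 29
is_leap = calendar.isleap(2024)
```

Delegating the arithmetic to `datetime` avoids keeping a private table of month lengths that silently goes wrong once every four years.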

Software Engineering

Software Engineering Software Engineer Banking Engineering
