Building and Systems - Data Engineering Digest

How to Build and Monitor Systems Using Airflow?

Analytics Vidhya

FEBRUARY 3, 2023

Imagine scheduling your ML tasks to run automatically without the need for manual […] Airflow can help you manage your workflow and make your life easier with its monitoring and notification features.

Systems

Systems Building Machine Learning Management

Building a Question-Answering System Using RAG

WeCloudData

APRIL 9, 2025

The ability to extract information from vast amounts of text has made question-answering (QA) systems essential in the modern era of AI-driven apps. RAG-based question-answering systems use large language models to generate human-like responses to user queries.

Systems

Systems Building IT Data Science

Getting Started with Building RAG Systems Using Haystack

KDnuggets

JANUARY 3, 2025

Retrieval augmented generation (RAG) is altering the way we use large language models, but building these systems can be hectic. In this article, you will learn how to build RAG systems using Haystack.

Systems

Systems Building

Building cost effective data pipelines with Python & DuckDB

Start Data Engineering

MAY 28, 2024

Sections include: Introduction; Project demo; Building efficient data pipelines with DuckDB; Use DuckDB to process data, not for multiple users to access data; Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing; Distributed systems are scalable, resilient to failures, & designed for high availability.

Data Pipeline

Data Pipeline Python Building Data

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API.

Systems

Build Compound AI Systems Faster with Databricks Mosaic AI

databricks

OCTOBER 1, 2024

Many of our customers are shifting from monolithic prompts with general-purpose models to specialized compound AI systems to achieve the quality needed for.

Systems

Systems Building Data Science Engineering

Inside Facebook’s video delivery system

Engineering at Meta

DECEMBER 10, 2024

We're explaining the end-to-end systems the Facebook app leverages to deliver relevant content to people. At Facebook's scale, the systems built to support and overcome these challenges require extensive trade-off analyses, focused optimizations, and architecture built to allow our engineers to push for the same user and business outcomes.

Systems

Systems Architecture Engineering Data Pipeline

Building an Automatic Speech Recognition System with PyTorch & Hugging Face

KDnuggets

MARCH 26, 2025

Check out this step-by-step guide to building a speech-to-text system with PyTorch & Hugging Face.

Systems

Systems Building

Mosaic AI: Build and deploy production-quality Compound AI Systems

databricks

JUNE 12, 2024

Over the last year, we have seen a surge of commercial and open-source foundation models showing strong reasoning abilities on general knowledge tasks.

Systems

Systems Building Data Science Data

LLMs in Production: Tooling, Process, and Team Structure

Speaker: Dr. Greg Loughnane and Chris Alexiuk

However, during development – and even more so once deployed to production – best practices for operating and improving generative AI applications are less understood. Register today to save your seat! December 6th, 2023 at 11:00am PST, 2:00pm EST, 7:00pm GMT

Process

Establishing a Large Scale Learned Retrieval System at Pinterest

Pinterest Engineering

JANUARY 31, 2025

Modern large-scale recommendation systems usually include multiple stages, where retrieval aims at retrieving candidates from billions of candidate pools, and ranking predicts which item a user tends to engage with from the trimmed candidate set retrieved in earlier stages [2]. General multi-stage recommendation system design in Pinterest.

Systems

Systems Metadata Machine Learning Architecture

Building Linked Data Products With JSON-LD

Data Engineering Podcast

SEPTEMBER 17, 2023

Summary: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products. Hex brings everything together.

Building

Building SQL BI Python

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Summary: Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?

Systems

Systems Data Lake High Quality Data Google Cloud

Building ETL Pipeline with Snowpark

Cloudyard

DECEMBER 24, 2024

In this blog, we'll explore Building an ETL Pipeline with Snowpark by simulating a scenario where commerce data flows through distinct data layers: RAW, SILVER, and GOLDEN. These tables form the foundation for insightful analytics and robust business intelligence. Create a fact table to summarize daily sales. Develop a VIEW in Semantic Layer.

Building

Building Raw Data Scala Business Intelligence

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale.

Software Engineer

Building Meta’s GenAI Infrastructure

Engineering at Meta

MARCH 12, 2024

By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s. RSC has accelerated our open and responsible AI research by helping us build our first generation of advanced AI models.

Building

Building Portfolio Utilities Data Storage

Building Holiday Finds: How Pinterest Engineers Reimagined Gift Discovery

Pinterest Engineering

MARCH 26, 2025

Unified Logging System: We implemented comprehensive engagement tracking that helps us understand how users interact with gift content differently from standard Pins. Personalization Stack: Building a Gift-Optimized Recommendation System. The success of Holiday Finds hinges on our ability to surface the right gift ideas at the right time.

Building

Building Engineering Algorithm Systems

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. These systems are built on open standards and offer immense analytical and transactional processing flexibility. These formats are transforming how organizations manage large datasets.

Architecture

Architecture Systems Data Lake Google Cloud

Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data Engineering Podcast

OCTOBER 15, 2023

Summary: Building streaming applications has gotten substantially easier over the past several years. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. How can you get the best results for your use case?

Process

Process Building SQL BI

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Systems

Ransomware Attacks: 3 Keys to Resilience for Your IBM i Systems

Precisely

NOVEMBER 7, 2024

Key Takeaways: In the face of ransomware attacks, a resilience strategy for IBM i systems must include measures for prevention, detection, and recovery. No platform is immune, not even the reliable and secure IBM i systems. So, how can you keep your IBM i systems resilient even as ransomware risks are on the rise?

Systems

Systems Accessibility Accessible Programming

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

In this episode he explains his approach to building AI in a more human-like fashion and the emphasis on learning rather than statistical prediction. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.

Building

Building Data Lake High Quality Data Machine Learning

Agents of Change: Navigating 2025 with AI and Data Innovation

Data Engineering Weekly

DECEMBER 28, 2024

Enterprises are encouraged to experiment with AI, build numerous small-scale agents, learn from each, and expand their agent infrastructure over time. Investment in an Agent Management System (AMS) is crucial, as it offers a framework for scaling, monitoring, and refining AI agents.

Unstructured Data

Unstructured Data Metadata Data Government

The “10x engineer:” 50 years ago and now

The Pragmatic Engineer

MARCH 12, 2024

Brooks agrees with this observation, and suggests a radical solution: have as few senior programmers as possible, and build a team around each one – a bit like how a hospital surgeon leads a whole team. Brooks discusses software in the context of producing operating systems, pre-internet. A most interesting addition!

Engineering

Engineering Programming Language Hospitality Programming

Entity Resolution: Your Guide to Deciding Whether to Build It or Buy It

This will help you decide whether to build an in-house entity resolution system or utilize an existing solution like the Senzing® API for entity resolution. This guide will walk you through the requirements and challenges of implementing entity resolution.

IT

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free! Data lakes are notoriously complex.

Building

Building Data Lake High Quality Data Machine Learning

The Future of Reliable Data + AI—Observing the Data, System, Code, and Model

Monte Carlo

MARCH 28, 2025

And it's not sufficient to simply build these data + AI applications – as in any other technological discipline, you have to do it reliably, too. System: Data + AI applications rely on a complex and interconnected web of tools and systems to deliver insights, models and automations.

Coding

Coding Systems Data Pipeline ETL Tools

Data News — Week 25.02

Christophe Blefari

JANUARY 11, 2025

Over the past four weeks, I took a break from blogging and LinkedIn to focus on building nao. AI companies are aiming for the moon—AGI—promising it will arrive once OpenAI develops a system capable of generating at least $100 billion in profits. Meanwhile, the AI landscape remains unpredictable.

Data

Data Data Warehouse Coding Programming Language

Flask Python: A Comprehensive Guide to Building Web Applications

Edureka

JANUARY 21, 2025

As an example: systems for authenticating users; dashboards and tools for showing info; a small e-commerce site with shopping carts, payment methods, and the ability to browse products. Building APIs: Flask is often used to make RESTful APIs that let different apps talk to each other. Steeper learning curve for larger projects.

Python

Python Building Certification Database

Monetizing Analytics Features: Why Data Visualizations Will Never Be Enough

Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.

Data

How Meta discovers data flows via lineage at scale

Engineering at Meta

JANUARY 22, 2025

It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta's systems. It enhances the traceability of data flows within systems, ultimately empowering developers to swiftly implement privacy controls and create innovative products.

Data Warehouse

Data Warehouse SQL Programming Language Data

Interesting startup idea: benchmarking cloud platform pricing

The Pragmatic Engineer

OCTOBER 17, 2024

A €150K ($165K) grant, three people, and 10 months to build it. We recently covered how CockroachDB joins the trend of moving from open source to proprietary and why Oxide decided to keep using it with self-support, regardless. Web hosting: Netlify, chosen thanks to their super smooth preview system with SSR support.

Cloud

Cloud AWS Metadata Cloud Computing

Continuously Improving Developer Productivity at Snowflake

Snowflake

JANUARY 27, 2025

Consequently, over the years, our test collateral grew unchecked, the development environment became increasingly intricate and build and test times slowed down significantly, negatively impacting developer productivity. Transparency helps build customer trust and keeps feedback flowing.

Programming Language

Programming Language Coding Cloud Systems

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Summary: Monitoring and auditing IT systems for security events requires the ability to quickly analyze massive volumes of unstructured log data. The majority of products that are available either require too much effort to structure the logs, or aren't fast enough for interactive use cases.

Data Lake

Data Lake Building High Quality Data AWS

The Roots of Today's Modern Backend Engineering Practices

The Pragmatic Engineer

NOVEMBER 21, 2023

If you had a continuous deployment system up and running around 2010, you were ahead of the pack: but today it’s considered strange if your team would not have this for things like web applications. He then worked at the casual games company Zynga, building their in-game advertising platform.

Engineering

Engineering Bytes Cloud Computing AWS

Introducing the dbt MCP Server – Bringing Structured Data to AI Workflows and Agents

dbt Developer Hub

APRIL 20, 2025

We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data. We are committed to building the data control plane that enables AI to reliably access structured data from across your entire data lineage.

Structured Data

Structured Data SQL BI Project

An educational side project

The Pragmatic Engineer

JUNE 1, 2023

Juraj included system monitoring parts which monitor the capacity of the server he runs the app on. [Figure: The monitoring page on the Rides app] And it doesn’t end here. Juraj created a systems design explainer on how he built this project, and the technologies used. [Figure: The systems design diagram for the Rides application] The app uses: Node.js

Education

Education Project PostgreSQL Software Engineering

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

The simple idea was: hey, how can we get more value from the transactional data in our operational systems spanning finance, sales, customer relationship management, and other siloed functions? Data integration best practices are required to build and train the LLM or SLM with the necessary information and context.

Data Integration

Data Integration Hadoop Data Warehouse Data Lake

Unapologetically Technical Episode 17 – Semih Salihoglu

Jesse Anderson

FEBRUARY 11, 2025

Semih is a researcher and entrepreneur with a background in distributed systems and databases. He then pursued his doctoral studies at Stanford University, delving into the complexities of database systems. Don't forget to subscribe to my YouTube channel to get the latest on Unapologetically Technical!

Computer Science

Computer Science Database Design Software Engineering Software Engineer

Going from Developer to CEO: Chronosphere

The Pragmatic Engineer

OCTOBER 10, 2023

He’s solved interesting engineering challenges along the way, too – like building observability for Amazon’s EC2 offering, and being one of the first engineers on Uber’s observability platform. I wrote code for drivers on Windows, and started to put a basic observability system in place.

Software Engineering

Software Engineering Software Engineer Architecture Media

Paying down tech debt: further learnings

The Pragmatic Engineer

SEPTEMBER 19, 2024

In the early 90’s, DOS programs like the ones my company made had their own Text UI screen rendering system. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. By doing so, I got to see every screen of the system.

Recruitment

Recruitment Java Coding Project

Title Launch Observability at Netflix Scale

Netflix Tech

JANUARY 6, 2025

Part 2: Navigating Ambiguity. By: Varun Khaitan. With special thanks to my stunning colleagues: Mallika Rao, Esmir Mesic, Hugo Marques. Building on the foundation laid in Part 1, where we explored the what behind the challenges of title launch observability at Netflix, this post shifts focus to the how.

Metadata

Metadata Algorithm Systems Building

Datadog’s $65M/year customer mystery solved

The Pragmatic Engineer

MAY 11, 2023

A very popular open-source solution for systems and services monitoring. Prometheus is part of the Cloud Native Computing Foundation, membership of which indicates that it’s safe to build on top of Prometheus, as it’s actively maintained and will continue to be. It evaluates rules and can trigger alerts. But why is this?

AWS

AWS Software Engineering Software Engineer Google Cloud

Snowflake Startup Spotlight: DeepTempo

Snowflake

MARCH 18, 2025

Welcome to Snowflake's Startup Spotlight, where we learn about awesome companies building businesses on Snowflake. Building relationships of trust with a team and an extended network of believers and even critics is all-important. Aging rules-based systems are failing to detect attacks effectively.

Deep Learning

Deep Learning Banking Government Systems
