I’d like to share a story about an educational side project which could prove fruitful for a software engineer seeking a new job. Juraj included system monitoring components that track the capacity of the server he runs the app on: the monitoring page on the Rides app. And it doesn’t end there. Persistence.
We’re explaining the end-to-end systems the Facebook app leverages to deliver relevant content to people. At Facebook’s scale, the systems built to support and overcome these challenges require extensive trade-off analyses, focused optimizations, and architecture built to allow our engineers to push for the same user and business outcomes.
Summary Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. Sriram Panyam has been involved in several projects that required migration of large volumes of data in high traffic environments.
Buck2 is a from-scratch rewrite of Buck, a polyglot, monorepo build system that was developed and used at Meta (Facebook), and shares a few similarities with Bazel. As you may know, the Scalable Builds Group at Tweag has a strong interest in such scalable build systems. Meta recently announced they have made Buck2 open-source.
Provide those dependencies such that the build system can find them. Run the build system. This allows for irreproducible builds, and makes it easy to forget to explicitly list a system dependency if it happens to be installed on the author’s system. Also, the templates are suited to building opam projects.
Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. With Datafold, you can seamlessly plan, translate, and validate data across systems, massively accelerating your migration project.
In the early ’90s, DOS programs like the ones my company made had their own Text UI screen rendering system. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. By doing so, I got to see every screen of the system.
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. These systems are built on open standards and offer immense analytical and transactional processing flexibility. 2019 - Delta Lake: Databricks released Delta Lake as an open-source project.
Buck2, our new open source, large-scale build system, is now available on GitHub. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In particular, we support Sapling-based file systems. Why rebuild Buck?
In particular, we expect both Business Intelligence and Data Engineering will be driven by AI operating on top of the context defined in your dbt Projects. Both AI agents and business stakeholders will then operate on top of LLM-driven systems hydrated by the dbt MCP context. Why does this matter? MCP addresses this challenge.
The title of the book takes aim at the “myth” that software development can be measured in “man months,” which Brooks disproves in the pages that follow: “Cost [of the software project] does indeed vary as the product of the number of men and the number of months. Progress does not. Two secretaries.
Summary Data engineering is all about building workflows, pipelines, systems, and interfaces to provide stable and reliable data. Datafold has invested a lot of time into integrating with the workflow of dbt projects to add early verification that the changes you are making are correct. What are the parallels to that in data projects?
You can find the online PMP exam application on the Project Management Institute (PMI)® website. Check the Project Management Professional preparation course to get started with your PMP preparation. Work Experience Your project management experience is assessed in the next area of the online form.
If you had a continuous deployment system up and running around 2010, you were ahead of the pack: but today it’s considered strange if your team would not have this for things like web applications. We dabbled in network engineering, database management, system administration, and hand-rolled C code.
To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. Can you start by outlining what elements of observability are most relevant for dbt projects? Over the past ~3 years there were numerous data observability systems/products created. How is Elementary designed/implemented?
Bun has no such constraint. It begins with a clean slate, and can ship something that works for, say, 90% of existing Node projects, and break the remaining 10%. I tip my hat to all volunteer open source contributors and maintainers, both for Node and for other projects. If you are one of these people: thank you!
Corporate conflict recap: Automattic is the creator of the open source WordPress content management system (CMS), and WordPress powers an incredible 43% of webpages and 65% of CMSes. This event is shameful and unprecedented in the history of open source on the web. OpenAI’s impossible business projections.
Say Harish and Lisa are two people working on the same project but on two different systems (say, Windows and […] If you read the above quote, you must think, what does this all mean? Well, my friend, this is what Docker is. The post Getting Started with The Basics of Docker appeared first on Analytics Vidhya.
Introduction Apache Cassandra is a NoSQL database management system that is open-source and distributed. Facebook created Cassandra, which ultimately became an Apache Software Foundation project. It is well-known for its rapid write […] The post Top 6 Cassandra Interview Questions appeared first on Analytics Vidhya.
Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
We recently covered how CockroachDB joins the trend of moving from open source to proprietary, and why Oxide decided to keep using it with self-support regardless. Web hosting: Netlify, chosen thanks to their super smooth preview system with SSR support. Internal comms: Chat: Slack. Coordination / project management: Linear.
There are multiple ways to start a new year: with new projects, new ideas, new resolutions, or by just keeping on making the same music. AI companies are aiming for the moon (AGI), promising it will arrive once OpenAI develops a system capable of generating at least $100 billion in profits. I hope you will enjoy 2025.
1. Introduction
2. Project demo
3. Distributed systems are scalable, resilient to failures, & designed for high availability
4. Building efficient data pipelines with DuckDB
4.1. Use DuckDB to process data, not for multiple users to access data
4.2. Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing
Senior Engineers are not only expected to lead significant projects in their teams; they also have a say in whether a feature is worth building at all. The SDE3 level expects leadership on projects in which this engineer is involved. It’s not a checklist, but some expectations that could be considered: lead a complex project.
WordPress is the most popular content management system (CMS), estimated to power around 43% of all websites; a staggering number! In response, cloud providers started the Valkey project, which could become the “new and still permissive Redis.”
Tommy has built his own video games, consulted on a wide variety of game projects, and for a decade has taught game development at various universities. Each project typically takes several years to create, with shifting hardware specifications and emerging competitors and trends to anticipate and react to, during the process.
Beyond working with well-structured data in a data warehouse, modern AI systems can use deep learning and natural language processing to work effectively with unstructured and semi-structured data in data lakes and lakehouses. AI projects should not be about “the latest” or “the best.” Leadership will be the antidote to AI exhaustion.
From Sella’s status page: “Following the installation of an update to the operating system and related firmware, which led to an unstable situation.” Still, I’m puzzled by how long the system has been down. If it was an update to Oracle, or to the operating system, then why not roll back the update?
That’s why we are announcing that SnowConvert, Snowflake’s high-fidelity code conversion solution to accelerate data warehouse migration projects, is now available for download for prospects, customers and partners free of charge. And today, we are announcing expanded support for code conversions from Amazon Redshift to Snowflake.
The project showed that smaller, empowered teams achieve higher impact than larger ones. These small, cross-functional teams ensured that members were deeply involved in the project operations, the technical setup, and the feedback cycle, leading to fewer delays, fewer bottlenecks, and faster decision-making.
Are you looking for an easier way to manage files across different storage systems? fsspec is a Python library that simplifies file handling by providing a unified interface for file management.
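The appeal of fsspec is that one call signature covers local disk, in-memory storage, and remote object stores alike. A minimal sketch using fsspec's built-in `memory://` backend (the same `fsspec.open` call works with `s3://` or `gcs://` paths once the matching backend package is installed; the path below is illustrative):

```python
import fsspec

# One unified open() call, regardless of where the file actually lives.
with fsspec.open("memory://demo/hello.txt", "w") as f:
    f.write("hello from fsspec")

# Filesystem-level operations go through the same abstract interface.
fs = fsspec.filesystem("memory")
print(fs.cat("memory://demo/hello.txt").decode())
```

Swapping the backend is then a matter of changing the URL scheme, not the code around it.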
We present a high level overview of our in-house phone masking system and dive into the details of the engineering challenge of optimizing the usage of proxy phone number resources at Yelp’s scale. Background Every year, millions of requests for quotes, consultations or other messages are sent to businesses on Yelp.
Multiple open source projects and vendors have been working together to make this vision a reality. What are the pain points that are still prevalent in lakehouse architectures as compared to warehouse or vertically integrated systems? If you've learned something or tried out a project from the show then tell us about it!
Willem Spruijt is a software engineer with whom I worked on the same team at Uber in Amsterdam, building payments systems. So we had a quarterly planning process to ensure all project dependencies were incorporated into each team’s roadmap. We cover one out of four topics in today’s subscriber-only The Pulse issue.
Learn more about Datafold by visiting dataengineeringpodcast.com/datafold Data projects are notoriously complex. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects.
They called it Office 365, and in 2010, this was a really exciting project to work on. I wrote code for drivers on Windows, and started to put a basic observability system in place. EC2 had no observability system back then: people would spin up EC2 instances but have no idea whether or not they worked.
Document RAG preparation : Ingesting, cleaning and chunking documents before embedding them into vector representations, enabling efficient retrieval and enhanced LLM responses in retrieval-augmented generation (RAG) systems. An efficient batch processing system scales in a cost-effective manner to handle growing volumes of unstructured data.
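The chunking step can be as simple as a sliding character window with overlap, so adjacent chunks share context at their boundaries. A minimal illustration (the function name and parameters are hypothetical, not from the article; production chunkers usually split on tokens or sentences instead):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # advance by less than a full window
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

doc = "word " * 100  # stand-in for a cleaned document
chunks = chunk_text(doc, chunk_size=120, overlap=20)
print(len(chunks), len(chunks[0]))
```

Each chunk would then be embedded and stored in a vector index for retrieval.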
A few days ago, on 27 February, Klarna shared progress, a month after the project went live. “The two companies have worked together ever since.” What is the system prompt for Klarna’s bot? With clever-enough probing, this system prompt can be revealed. “Translate to English if needed.”
Additionally, multiple copies of the same data locked in proprietary systems contribute to version control issues, redundancies, staleness, and management headaches. This dampens confidence in the data and hampers access, in turn impacting the speed to launch new AI and analytic projects.
I still remember being in a meeting where a Very Respected Engineer was explaining how they are building a project, and they said something along the lines of "and, of course, idempotency is non-negotiable." I was sceptical that any system would automatically reject resumes, because I never saw this as a hiring manager.
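Idempotency here means that applying the same request twice leaves the system in the same state as applying it once. A minimal sketch with a hypothetical payment handler deduplicating on a client-supplied idempotency key (the names and the in-memory store are illustrative; a real system would persist the keys):

```python
processed: dict[str, str] = {}  # idempotency key -> stored result

def charge(key: str, amount_cents: int) -> str:
    """Apply a charge at most once per idempotency key."""
    if key in processed:       # a retry: return the stored result,
        return processed[key]  # do not apply the side effect again
    result = f"charged {amount_cents} cents"  # stand-in for the real side effect
    processed[key] = result
    return result

first = charge("req-42", 500)
retry = charge("req-42", 500)  # e.g. a network retry with the same key
print(first == retry)
```

The point of making this "non-negotiable" is that clients can then retry freely after a timeout without risking double charges.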
To do this, we devised a novel way to simulate the projected traffic weeks ahead of launch by building upon the traffic migration framework described here. Replay traffic enabled us to test our new systems and algorithms at scale before launch, while also making the traffic as realistic as possible.
System metrics, such as inference latency and throughput, are available as Prometheus metrics. Users can manage all of their models and applications on the Cloudera AI Inference service with common CI/CD systems using Cloudera service accounts, also known as machine users.
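Prometheus scrapes metrics in a simple line-oriented text exposition format. A minimal pure-Python rendering of one gauge, to show what a scrape target returns (the metric name and label are illustrative, not the service's actual metric names; real services typically use a client library rather than formatting by hand):

```python
def render_gauge(name: str, help_text: str, value: float,
                 labels: dict[str, str]) -> str:
    """Render one gauge in the Prometheus text exposition format."""
    label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
    return (
        f"# HELP {name} {help_text}\n"   # human-readable description
        f"# TYPE {name} gauge\n"         # metric type declaration
        f"{name}{{{label_str}}} {value}" # the sample itself
    )

print(render_gauge(
    "model_inference_latency_seconds",
    "Observed inference latency.",
    0.042,
    {"model": "demo-llm"},
))
```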