Unlocking Data Team Success: Are You Process-Centric or Data-Centric? We’ve identified two distinct types of data teams: process-centric and data-centric. They work in and on these pipelines.
This blog post focuses on the scope and the goals of the recommendation system, and explores some of the most recent changes the Rider team has made to better serve Lyft’s riders. Introduction: Scope of the Recommendation System. The recommendation system covers user experiences throughout the ride journey.
With Astro, you can build, run, and observe your data pipelines in one place, ensuring your mission-critical data is delivered on time. This blog captures the current state of Agent adoption, emerging software engineering roles, and the use case category. Save Your Spot → Chirag Shah & Ryen W.
Foundation Capital: A System of Agents brings Service-as-Software to life: software is no longer simply a tool for organizing work; software becomes the worker itself, capable of understanding, executing, and improving upon traditionally human-delivered services. 60+ speakers from LinkedIn, Shopify, Amazon, Lyft, Grammarly, Mistral, et al.
This foundational layer is a repository for various data types, from transaction logs and sensor data to social media feeds and system logs. We have also seen a fourth layer, the Platinum layer, in companies’ proposals that extend the data pipeline to OneLake and Microsoft Fabric.
One thing that stands out to me: as AI-driven data workflows increase in scale and become more complex, modern data stack tools such as drag-and-drop ETL solutions are too brittle, expensive, and inefficient for dealing with the higher volume and scale of today’s pipeline and orchestration approaches. We all bet on 2025 being the year of Agents.
The challenge is that most companies have a multitude of systems that contain fragments of the customer's interactions and stitching that together is complex and time consuming. Segment created the Unify product to reduce the burden of building a comprehensive view of customers and synchronizing it to all of the systems that need it.
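Stitching identity fragments from many systems is, at its core, a graph-connectivity problem. The sketch below is purely illustrative (it is not Segment Unify's actual implementation): identifiers observed together in one event are unioned into the same profile via a union-find structure; the channel names and IDs are made up for the example.

```python
# Illustrative identity stitching with union-find: identifiers seen
# together in one event are assumed to belong to the same person.
class IdentityGraph:
    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def link(self, *ids):
        # All identifiers carried by one event map to one profile.
        root = self._find(ids[0])
        for other in ids[1:]:
            self.parent[self._find(other)] = root

    def profile(self, some_id):
        root = self._find(some_id)
        return sorted(x for x in self.parent if self._find(x) == root)

graph = IdentityGraph()
graph.link(("email", "a@example.com"), ("device", "d1"))  # web event
graph.link(("device", "d1"), ("crm", "cust-42"))          # CRM sync
graph.link(("email", "b@example.com"), ("device", "d2"))  # someone else

unified = graph.profile(("crm", "cust-42"))
```

Real systems add fuzzy matching, merge rules, and un-merge support on top, but the transitive-linking core is the expensive part the snippet above compresses.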
The blog is an excellent summarization of the common patterns emerging in GenAI platforms. Adopting LLMs in SQL-centric workflows is particularly interesting, since companies increasingly try text-to-SQL to boost data usage. A key highlight for me is the pipeline breakpoint feature from Maestro.
This introductory blog focuses on an overview of our journey. Future blogs will provide deeper dives into each service, sharing insights and lessons learned from this process.
Workflow Optimization : Decomposing complex tasks into smaller, manageable steps and prioritizing deterministic workflows can enhance the reliability and performance of LLM-based systems. The resulting solution was SnowPatrol, an OSS app that alerts on anomalous Snowflake usage, powered by ML Airflow pipelines.
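The decomposition idea above can be sketched without any LLM at all: each step is a plain, deterministic function, and the workflow is just ordered composition, so every intermediate value can be inspected and tested. The step names and data below are invented for illustration; in a real system, only the steps that genuinely need an LLM would call one.

```python
# Minimal sketch of "decompose into small deterministic steps":
# each step is a pure function; the workflow is ordered composition.
def extract_numbers(text):
    return [int(tok) for tok in text.split() if tok.isdigit()]

def keep_even(numbers):
    return [n for n in numbers if n % 2 == 0]

def summarize(numbers):
    return {"count": len(numbers), "total": sum(numbers)}

def run_workflow(value, steps):
    for step in steps:  # deterministic and inspectable at every hop
        value = step(value)
    return value

result = run_workflow("3 4 7 10 12", [extract_numbers, keep_even, summarize])
# result == {"count": 3, "total": 26}
```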
on Cisco UCS S3260 M5 Rack Server with Apache Ozone as the distributed file system for CDP. Look at details of volumes/buckets/keys/containers/pipelines/datanodes. Given a file, find out which nodes/pipelines it is part of. Cloudera will publish separate blog posts with results of performance benchmarks.
The OpenAI system card explained that, during a cybersecurity challenge (a CTF), the model was able to understand a failing Docker environment (due to infra) and still find the flag. Lots of stories about exceptional things the model can do have been published today. How does UK football rely so heavily on data?
These external partnerships along with our internal fashion specialists and labellers were fundamental in helping us design the experience from both a technical and human-centric perspective. To learn more about engineering at Pinterest, check out the rest of our Engineering Blog and visit our Pinterest Labs site.
Next, it needed to enhance the company’s customer-centric approach for a needs-based alignment of products and services. “We are positive that our continuing partnership with Cloudera and Blutech Consulting will be foundational to our customer-centric approach, considerably improving our customer responsiveness,” he said.
It involves many moving parts, from data preparation to building indexing and query pipelines. It also requires both systems to always be available, so no maintenance windows are possible. Distributed transactions are very hard to implement successfully, which is why we’ll introduce a log-inspired system such as Apache Kafka ®.
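The log-based decoupling described above can be illustrated without a broker: an append-only log (a plain list standing in for an Apache Kafka topic in this sketch) lets the indexing side consume writes at its own pace, tracking its own offset, while new writes keep arriving; no distributed transaction between the two systems is needed.

```python
# Toy append-only log standing in for a Kafka topic: producers append,
# each consumer tracks its own read offset independently.
class Log:
    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)

    def read_from(self, offset):
        return self.records[offset:]

log = Log()
log.append({"op": "upsert", "doc": "a"})
log.append({"op": "upsert", "doc": "b"})

# The indexing pipeline catches up from its last committed offset.
index_offset = 0
search_index = set()
for rec in log.read_from(index_offset):
    search_index.add(rec["doc"])
index_offset = len(log.records)

# Writes keep flowing while the index lags; nothing blocks.
log.append({"op": "upsert", "doc": "c"})
```

The real win of the log is exactly this: producers and consumers never coordinate directly, so either side can be down for maintenance without losing writes.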
Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. The blog is a good overview of various components in a typical data stack. Get Guide → Marc Olson: Continuous reinvention: A brief history of block storage at AWS.
Webster’s dictionary defines Entropy in thermodynamics as a measure of the unavailable energy in a closed thermodynamic system that is also usually considered to be a measure of the system’s disorder. Data engineers spend countless hours troubleshooting broken pipelines. More can be found in this blog.
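The thermodynamic definition has an information-theoretic cousin that is easy to compute: Shannon entropy, sometimes borrowed as a rough "disorder" score for a column of data. This is purely an illustrative analogy, not a formula from the article: a constant (orderly) column scores zero bits, a uniformly scattered (messy) one scores high.

```python
import math

# Shannon entropy of a value distribution, in bits: a crude
# "disorder" score for a column (illustrative analogy only).
def shannon_entropy(values):
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

orderly = shannon_entropy(["ok"] * 8)            # constant column: 0.0 bits
messy = shannon_entropy(["a", "b", "c", "d"])    # uniform spread: 2.0 bits
```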
Here is the agenda: 1) Data Application Lifecycle Management - Harish Kumar (PayPal): hear from the PayPal team on how they build their data product lifecycle management (DPLM) system. 3) DataOps at AstraZeneca: the AstraZeneca team talks about the data ops best practices established internally, and what worked and what didn’t!
This discipline also integrates specialization around the operation of so-called “big data” distributed systems, along with concepts around the extended Hadoop ecosystem, stream processing, and computation at scale. Those systems have been taught to normalize the data for storage on their own.
The Otezla team built a system with tens of thousands of automated tests checking data and analytics quality. The data pipelines must contend with a high level of complexity: over seventy data sources and a variety of cadences, including daily/weekly updates and builds. That’s the power of DataOps automation. It’s that simple.
This blog discusses quantifications, types, and implications of data. The activity in the field of learning with limited data is reflected in a variety of courses, workshops, reports, blogs and a large number of academic papers (a curated list of which can be found here).
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. Data Engineers are engineers responsible for uncovering trends in data sets and building algorithms and data pipelines to make raw data beneficial for the organization.
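The Extract-Transform-Load shape mentioned above fits in a few lines of code. The sketch below is a toy, with an in-memory "source" and "warehouse" and invented field names, so the three stages are visible end to end: pull raw records, cast and aggregate them, then write the curated result.

```python
# Toy ETL pass: extract raw rows, transform (cast + aggregate), load.
source_rows = [
    {"user": "ada", "amount": "19.99"},
    {"user": "alan", "amount": "5.00"},
    {"user": "ada", "amount": "30.01"},
]

def extract(rows):
    return list(rows)  # pull raw records from the "source"

def transform(rows):
    totals = {}
    for row in rows:   # cast string amounts, aggregate per user
        totals[row["user"]] = totals.get(row["user"], 0.0) + float(row["amount"])
    return totals

def load(totals, warehouse):
    warehouse.update(totals)  # write curated output to the "warehouse"
    return warehouse

warehouse = load(transform(extract(source_rows)), {})
```

Production pipelines add incremental loads, retries, and schema checks around this skeleton, but the stage boundaries stay the same.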
It is amusing for a human being to write an article about artificial intelligence at a time when AI systems, powered by machine learning (ML), are generating their own blog posts. “I frequently check Pipeline Runs and Sensor Ticks, but often verify with Dagit.”
In this blog post, we will see the top automation testing tools used in the software industry. Supports major operating systems: Windows, Linux, and Mac. TestComplete is essentially a Windows-based application and thus cannot run on Linux/Unix systems. A WebDriver-based tool called Appium can be used for mobile applications.
This blog is for anyone who was interested but unable to attend the conference, or anyone interested in a quick summary of what happened there. Use cases such as fraud monitoring, real-time supply chain insight, IoT-enabled fleet operations, real-time customer intent, and modernizing analytics pipelines are driving development activity.
Treating data as a product is more than a concept; it’s a paradigm shift that can significantly elevate the value that business intelligence and data-centric decision-making have on the business. Data pipelines Data integrity Data lineage Data stewardship Data catalog Data product costing Let’s review each one in detail.
A curated list of the top 9 must-read blogs on data. At the end of 2022 we decided to collect the blogs we enjoyed the most over the year. The data world is in turmoil and lots of exciting things happen every day, week and year. Happy reading!
We have heard news of machine learning systems outperforming seasoned physicians on diagnosis accuracy, chatbots that present recommendations depending on your symptoms , or algorithms that can identify body parts from transversal image slices , just to name a few. What makes a good Data Pipeline?
Kubernetes (sometimes shortened to K8s with the 8 standing for the number of letters between the “K” and the “s”) is an open-source system to deploy, scale, and manage containerized applications anywhere. Kubernetes is a container-centric management software that allows the creation and deployment of containerized applications with ease.
Editor’s Note: 🔥 DEW is thrilled to announce a developer-centric Data Eng & AI conference in the tech hub of Bengaluru, India, on October 12th! LinkedIn writes about Hoptimator for auto-generated Flink pipelines with multiple stages of systems. See how it works today.
Data Engineers create a system that gathers, handles, and transforms unprocessed data into useful information that data researchers and Data Analysts may use to evaluate it in several contexts. Pipeline-centric: Pipeline-centric Data Engineers collaborate with data researchers to maximize the use of the info they gather.
Data lineage tools provide a visual representation of your data’s journey across multiple systems and transformations. This feature is particularly useful in complex data architectures, where data may pass through multiple systems and transformations.
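Under the hood, lineage is a directed graph over datasets, and the most common question it answers is impact analysis: "if this source changes, which downstream tables are affected?" The sketch below uses hypothetical table names and a plain edge list; real lineage tools extract the edges from SQL parsing or orchestration metadata.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream_table, downstream_table).
edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "mart.revenue"),
    ("staging.orders", "mart.churn"),
    ("raw.users", "mart.churn"),
]

downstream = defaultdict(list)
for src, dst in edges:
    downstream[src].append(dst)

def affected_by(node):
    # BFS over the lineage graph: everything reachable downstream.
    seen, queue = set(), deque([node])
    while queue:
        for nxt in downstream[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

impact = affected_by("raw.orders")
# impact == {"staging.orders", "mart.revenue", "mart.churn"}
```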
by Christos G. Bampis, Chao Chen, Anush K. Moorthy and Zhi Li. Introduction: Measuring video quality at scale is an essential component of the Netflix streaming pipeline. The coupling problem: Until recently, video quality measurements were generated as part of our Reloaded production system. We call this system Cosmos.
Meta: Presto - A Decade of SQL Analytics at Meta Presto and Kafka are the two systems that greatly impacted data infrastructure in the last decade. As with any good system, Presto went through many optimizations. There are some interesting threads on Twitter, but the highlight for me is the design of the Tweet search system.
Data Engineering Weekly Is Brought to You by RudderStack RudderStack provides data pipelines that make collecting data from every application, website, and SaaS platform easy, then activating it in your warehouse and business tools. Watch On-demand Niels Claeys: Use dbt and DuckDB instead of Spark in data pipelines
As a result, a less senior team member was made responsible for modifying a production pipeline. Make Trusted Data Products with Reusable Modules : “Many organizations are operating monolithic data systems and processes that massively slow their data delivery time.” Build analytic data systems that have modular, reusable components.
It provides familiar APIs for various data-centric tasks, including data preparation, cleansing, preprocessing, model training, and deployment tasks. In the warehouse model, users can seamlessly run and operationalize data pipelines, ML models, and data applications with user-defined functions (UDFs) and stored procedures (sprocs).
These are particularly frustrating because, while they are breaking data pipelines constantly, it’s not their fault. In fact, most of the time they are unaware of these data quality challenges. Tight coupling: upstream data quality challenges are oftentimes a result of tight coupling between systems. Image courtesy of Andrew Jones.
Chapin shared that even though GE had embraced agile practices since 2013, the company still struggled with massive amounts of legacy systems. “[It provides the ability] to incrementally and constantly improve the system.” Be business-centric. Success Requires Focus on Business Outcomes, Benchmarking.
In this blog post, we’ll review the core data mesh principles, highlight how both organizations and modern data platforms are putting those principles into action, and demonstrate just how achievable a secure and efficient data mesh architecture can be.
In this blog, we’d like to give you a glimpse into some of the major developments in Picnic Tech in 2023. This approach not only helps in maintaining system stability but also in predicting potential issues, enabling proactive measures. July: Introduction of a new Transport Planning System. Join us and have a read!
In this blog, we’ll discuss DevOps release management, its process, best practices, and the advantages of a release manager in DevOps. It encompasses the planning, scheduling, and controlling of software builds and delivery pipelines. This includes unit tests, integration tests, system tests, and acceptance tests.
He is also an open-source developer at The Apache Software Foundation and the author of Hysterical , a popular blog on tech careers and topics like data, coding, and engineering. Brian shares advice regularly on his Medium blog and GitHub , as well as on LinkedIn, focusing on topics like data science, data engineering, data strategy, and SQL.
This blog outlines best practices from customers I have helped migrate from Elasticsearch to Rockset, reducing risk and avoiding common pitfalls. Elasticsearch has become ubiquitous as an index-centric datastore for search and rose in tandem with the popularity of the internet and Web 2.0.