Unlocking Data Team Success: Are You Process-Centric or Data-Centric? We’ve identified two distinct types of data teams: process-centric and data-centric. They work in and on these pipelines.
Some departments used IBM Db2, while others relied on VSAM files or IMS databases, creating complex data governance processes and costly data pipeline maintenance. By implementing data replication from the IBM Z, the team was able to build data pipelines to distributed targets, ensuring that each application use case could be supported.
Generative AI demands the processing of vast amounts of diverse, unstructured data (e.g., meeting recordings and videos), which contrasts with traditional SQL-centric systems for structured data. With Astro, you can build, run, and observe your data pipelines in one place, ensuring your mission-critical data is delivered on time.
How does the focus on data assets/data products shift your approach to observability as compared to a table/pipeline-centric approach?
1. Ideal for: Fabric = business-centric workflows; Snowflake = environments with a lot of developers and data engineers. 2. Ideal for: Fabric = Microsoft-centric organizations; Snowflake = multi-cloud flexibility seekers. 3. Cloud support: Microsoft Fabric works only on Microsoft Azure.
This traditional SQL-centric approach often challenged data engineers working in a Python environment, requiring context-switching and limiting the full potential of Python’s rich libraries and frameworks. The post Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease appeared first on Snowflake.
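A minimal sketch of what such a Python-native pipeline step might look like with Snowpark (the connection parameters, table names, and columns here are hypothetical, not from the post):

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Hypothetical connection config -- substitute real account credentials.
session = Session.builder.configs({
    "account": "my_account",
    "user": "my_user",
    "password": "***",
    "warehouse": "my_wh",
    "database": "my_db",
    "schema": "public",
}).create()

# Filter and aggregate entirely in Python; Snowpark pushes the work
# down to Snowflake as SQL rather than pulling rows to the client.
events = session.table("raw_events")
daily_counts = (
    events.filter(col("status") == "ok")
          .group_by("event_date")
          .count()
)

# Persist the result as a table for downstream consumers.
daily_counts.write.mode("overwrite").save_as_table("daily_event_counts")
```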
We have also seen a fourth layer, the Platinum layer, in companies’ proposals that extend the data pipeline to OneLake and Microsoft Fabric. However, this architecture is not without its challenges. The need to copy data across layers, manage different schemas, and address data latency issues can complicate data pipelines.
CDP Data Engineering offers an all-inclusive toolset that enables data pipeline orchestration, automation, advanced monitoring, visual profiling, and comprehensive management for streamlining ETL processes and making complex data actionable across your analytic teams. A key aspect of ETL or ELT pipelines is automation.
The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. (impactdatasummit.com) Thumbtack: What we learned building an ML infrastructure team at Thumbtack. Thumbtack shares valuable insights from building its ML infrastructure team.
The list of Top 10 semi-finalists is a perfect example: we have use cases for cybersecurity, gen AI, food safety, restaurant chain pricing, quantitative trading analytics, geospatial data, sales pipeline measurement, marketing tech and healthcare. Our sincere thanks go out to everyone who participated in this year’s competition.
One thing that stands out to me: as AI-driven data workflows increase in scale and become more complex, modern data stack tools such as drag-and-drop ETL solutions are too brittle, expensive, and inefficient for dealing with the higher volume and scale of pipelines and orchestration.
However, that's also something we're re-thinking with our warehouse-centric strategy. [RudderStack]([link]): RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.
Although I wasn’t aware of all the hype, the Data-Centric AI Community promptly came to the rescue. There is nothing worse for a data flow than wrong typesets, especially within a data-centric AI paradigm. In the new release, users can rest assured that their pipelines won’t break if they’re using pandas 2.0.
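As a small illustration of why explicit typesets matter, pandas 2.0 lets you opt into Arrow-backed dtypes so column types are declared up front rather than silently inferred (the CSV content here is made up):

```python
import io
import pandas as pd

csv_data = io.StringIO(
    "user_id,amount,signup_date\n"
    "1,9.99,2023-01-05\n"
    "2,12.50,2023-02-11\n"
)

# pandas 2.0+: dtype_backend="pyarrow" gives Arrow-backed dtypes,
# including a proper nullable representation for missing values.
df = pd.read_csv(csv_data, dtype_backend="pyarrow", parse_dates=["signup_date"])

# Columns come back as Arrow-backed dtypes (e.g. int64[pyarrow]) instead
# of loosely inferred NumPy object/float columns.
print(df.dtypes)
```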
At EVOLVE in Singapore, the Manila Electric Company, Meralco , won the Cloudera 2024 Data Impact Award in the Leadership and Transformation category for its customer-centric and data-driven transformation. But, what is the ultimate impact of all this effort and investment on each of us in our daily lives?
Adopting LLMs in SQL-centric workflows is particularly interesting, since companies increasingly try text-to-SQL to boost data usage. Swiggy recently wrote about its internal platform, Hermes, a text-to-SQL solution. Pipeline breakpoint feature. I like testing people on their practical knowledge rather than artificial coding challenges.
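Hermes’ internals aren’t shown here, but a generic text-to-SQL step usually amounts to prompting a model with the schema and the question, then guarding what comes back; a minimal sketch under those assumptions (model choice, schema, and guardrail are illustrative, not Hermes’ actual design):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SCHEMA = """
orders(order_id INT, customer_id INT, amount DECIMAL, ordered_at TIMESTAMP)
customers(customer_id INT, city VARCHAR)
"""

def text_to_sql(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical model choice
        messages=[
            {"role": "system",
             "content": "Translate the question into one SQL SELECT query "
                        f"against this schema. Return only SQL.\n{SCHEMA}"},
            {"role": "user", "content": question},
        ],
    )
    sql = resp.choices[0].message.content.strip().strip("`")
    # Simple guardrail: never pass anything but a read-only query downstream.
    if not sql.lower().lstrip().startswith("select"):
        raise ValueError(f"Refusing non-SELECT statement: {sql!r}")
    return sql

print(text_to_sql("Total order amount per city last month?"))
```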
When a user wants to leverage Data Mesh to move and transform data, they start by creating a new Data Mesh pipeline. The pipeline is composed of individual “Processors” that are connected by Kafka topics. Furthermore, many pipelines needed to be composed of multiple Processors. Overview of the SQL Processor workflow.
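The Processor abstraction itself isn’t a public API, but the consume-transform-produce shape it describes looks roughly like this with plain Kafka clients (topic names and the transform are made up):

```python
import json
from kafka import KafkaConsumer, KafkaProducer

# A "Processor": reads one topic, transforms each event, writes another.
consumer = KafkaConsumer(
    "orders.raw",                      # hypothetical upstream topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for record in consumer:
    event = record.value
    # Example transform: keep only completed orders, project two fields.
    if event.get("status") == "completed":
        producer.send("orders.completed", {   # hypothetical downstream topic
            "order_id": event["order_id"],
            "amount": event["amount"],
        })
```

Chaining several such Processors via intermediate topics is what composes a longer pipeline.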
Summary: How much time do you spend maintaining your data pipeline? How does the data-centric approach of DataCoral differ from the way that other platforms think about processing information?
The first response has been frustration because of the chaos a breach like this causes: At a scaleup I talked with, infrastructure teams shut down all pipelines in order to replace secrets. Our customers are some of the most innovative, engineering-centric businesses on the planet, and helping them do great work will continue to be our focus.”
You are starting to become an operations- or technology-centric data team. How to build Data Products, or never call me Data Pipeline any more: there is this interesting schema in Zhamak Dehghani’s second article on Data Mesh: “Data mesh introduces the concept of data product as its architectural quantum.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. Data stacks are becoming more and more complex.
For modern data engineers using Apache Spark, DE offers an all-inclusive toolset that enables data pipeline orchestration, automation, advanced monitoring, visual troubleshooting, and a comprehensive management toolset for streamlining ETL processes and making complex data actionable across your analytic teams. Job Deployment Made Simple.
NVIDIA released Eagle, a vision-centric multimodal LLM. Look at the example in the GitHub repo: given an image and a user input, the LLM is able to answer things like "Describe the image in detail" or "Which car in the picture is more aerodynamic" based on a drawing.
The resulting solution was SnowPatrol, an OSS app that alerts on anomalous Snowflake usage, powered by ML Airflow pipelines. [link] Adevinta: How we moved from local scripts and spreadsheets shared by email to Data Products. Data product thinking: shaping data management to build a reliable, customer-centric data application.
At the same time, Maxime Beauchemin wrote a post about Entity-Centric data modeling. In the first part he covers the history of modeling and the main concepts; I hope he will fill the gaps. When it comes to modeling it's hard not to mention dbt. This week I discovered SQLMesh, an all-in-one data pipelines tool.
Next, it needed to enhance the company’s customer-centric approach for a needs-based alignment of products and services. We are positive that our continuing partnership with Cloudera and Blutech Consulting will be foundational to our customer-centric approach, considerably improving our customer responsiveness,” he said.
One conversation quickly coming to the forefront is first-party data. Acadia, a digital media agency, wanted to accelerate the end-to-end pipeline for its clients while also enhancing security for clients’ PII. Sometimes they need feedback on touchpoints very quickly, while other pipelines don’t need as much acceleration.
Key Themes: Data-Driven Decision-Making: Learn how to build a data-centric culture that drives better outcomes. If you're attending, be sure to stop by Booth #219 to learn more about how data observability can enable your team to build AI-ready pipelines. It's a unique blend of business and technical expertise under one roof.
The Netflix video processing pipeline went live with the launch of our streaming service in 2007. By integrating with studio content systems, we enabled the pipeline to leverage rich metadata from the creative side and create more engaging member experiences like interactive storytelling.
Look at details of volumes/buckets/keys/containers/pipelines/datanodes. Given a file, find out which nodes/pipelines it is part of. Seamlessly scale the architecture to thousands of nodes with a single pane of glass management using Cisco Application Centric Infrastructure (ACI).
To enable LGIM to better utilize its wealth of data, LGIM required a centralized platform that made internal data discovery easy for all teams and could securely integrate external partners and third-party outsourced data pipelines. To realize this cohesive data vision, LGIM adopted Cloudera Data Platform (CDP) Public Cloud.
Since it’s all part of Snowflake’s single platform, data engineers and developers can also perform inference by programmatically calling the built-in or fine-tuned models, like in pipelines with Streams and Tasks or in applications. Learn more about how Snowflake is building a data-centric platform for generative AI and LLMs.
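As one hedged illustration, Snowflake exposes functions such as SNOWFLAKE.CORTEX.COMPLETE that can be invoked from a Snowpark pipeline step; the model name and tables here are assumptions, not from the post:

```python
from snowflake.snowpark import Session

def summarize_tickets(session: Session) -> None:
    """Run LLM inference inside Snowflake as part of a pipeline step,
    e.g. the body of a Task that fires when a Stream has new rows."""
    session.sql("""
        CREATE OR REPLACE TABLE ticket_summaries AS
        SELECT
            ticket_id,
            SNOWFLAKE.CORTEX.COMPLETE(
                'mistral-large',   -- hypothetical model choice
                'Summarize this support ticket: ' || body
            ) AS summary
        FROM support_tickets
    """).collect()
```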
Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated software-defined process. Here, the bank loan business division has essentially become software.
The data pipelines must contend with a high level of complexity – over seventy data sources and a variety of cadences, including daily/weekly updates and builds. Perhaps more importantly, data engineers and scientists may change any part of the automated pipelines related to data at any time. That’s the power of DataOps automation.
2) Why High-Quality Data Products Beats Complexity in Building LLM Apps - Ananth Packildurai. I will walk through the evolution of model-centric to data-centric AI and how data products and DPLM (Data Product Lifecycle Management) systems are vital for an organization's system.
It involves many moving parts, from data preparation to building indexing and query pipelines. Building an indexing pipeline at scale with Kafka Connect. Always keep an eye on their performance and make sure they run in the expected time to allow your pipeline to function properly. Scaling indexing. Interested in more?
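A Kafka Connect sink is configured rather than coded; a minimal sketch of registering one through Connect’s REST API (the connector class and settings follow the widely used Confluent Elasticsearch sink, while host names and the topic are hypothetical):

```python
import requests

# Register a sink connector that indexes a topic into Elasticsearch.
connector = {
    "name": "search-indexing-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "documents.to_index",           # hypothetical topic
        "connection.url": "http://elasticsearch:9200",
        "key.ignore": "false",
        "schema.ignore": "true",
        "tasks.max": "4",   # parallelism: up to 4 tasks across partitions
    },
}

resp = requests.post("http://connect:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print(resp.json())
```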
A data scientist is only as good as the data they have access to. This is where data engineers come in — they build pipelines that transform that data into formats that data scientists can use. Roughly, the operations in a data pipeline consist of the following phases: Ingestion — this involves gathering the needed data.
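To make those phases concrete, here is a toy pipeline with one function per phase (the file names and transformation are invented for illustration):

```python
import pandas as pd

def ingest(path: str) -> pd.DataFrame:
    """Ingestion: gather the raw data from its source."""
    return pd.read_csv(path)

def transform(raw: pd.DataFrame) -> pd.DataFrame:
    """Transformation: clean and reshape into an analysis-ready format."""
    out = raw.dropna(subset=["user_id"])
    out["amount"] = out["amount"].astype(float)
    return out.groupby("user_id", as_index=False)["amount"].sum()

def load(df: pd.DataFrame, path: str) -> None:
    """Loading: persist where data scientists can pick it up."""
    df.to_parquet(path, index=False)

if __name__ == "__main__":
    load(transform(ingest("raw_orders.csv")), "user_totals.parquet")
```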
Related to the neglect of data quality, it has been observed that much of the effort in AI has been model-centric, that is, mostly devoted to developing and improving models, given fixed data sets. Data Cascades are said to be pervasive, to lack immediate visibility, but to eventually impact the world in a negative manner.
These external partnerships along with our internal fashion specialists and labellers were fundamental in helping us design the experience from both a technical and human-centric perspective. The resulting structured dataset becomes the foundation to train and evaluate the machine learning model known as the body type signal.
But this article is not about the pricing, which can be very subjective depending on the context — what is $1,200 for dev tooling when you pay them more than $150k per year? Yes, it's US-centric, but relevant. But before sending your code to production you still want to validate some stuff, static or not, in the CI/CD pipelines.
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. Data Engineers are engineers responsible for uncovering trends in data sets and building algorithms and data pipelines to make raw data beneficial for the organization.
These limited-term databases can be generated as needed from automated recipes (orchestrated pipelines and qualification tests) stored and managed within the process hub. The data pipelines must contend with a high level of complexity – over seventy data sources and various cadences, including daily/weekly updates and builds.
Data engineers spend countless hours troubleshooting broken pipelines. Every “minor” change upstream results in mayhem. Data plays a central role in modern organisations; the centricity here is not just a figure of speech, as data teams often sit between traditional IT and different business functions.
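One common defence against those upstream surprises is a lightweight contract check at the head of the pipeline; a minimal sketch (the expected columns and source file are of course made up):

```python
import pandas as pd

# Hypothetical contract: the columns and dtypes this pipeline depends on.
EXPECTED = {"user_id": "int64", "amount": "float64", "event_date": "object"}

def check_contract(df: pd.DataFrame) -> None:
    """Fail fast, before a 'minor' upstream change corrupts downstream outputs."""
    missing = set(EXPECTED) - set(df.columns)
    if missing:
        raise ValueError(f"Upstream dropped columns: {sorted(missing)}")
    for col_name, dtype in EXPECTED.items():
        actual = str(df[col_name].dtype)
        if actual != dtype:
            raise TypeError(f"{col_name}: expected {dtype}, got {actual}")

df = pd.read_csv("upstream_export.csv")  # hypothetical source
check_contract(df)  # raises before any downstream step runs
```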
Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. The author writes about integrating uwheel with DataFusion (a query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format).