Unlocking Data Team Success: Are You Process-Centric or Data-Centric? We’ve identified two distinct types of data teams: process-centric and data-centric. They work in and on these pipelines.
Some departments used IBM Db2, while others relied on VSAM files or IMS databases, creating complex data governance processes and costly data pipeline maintenance. With near real-time data synchronization, the solution ensures that databases stay in sync for reporting, analytics, and data warehousing.
The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. (impactdatasummit.com) Thumbtack: What we learned building an ML infrastructure team at Thumbtack. Thumbtack shares valuable insights from building its ML infrastructure team.
Bronze layers can also be the raw database tables. We have also seen a fourth layer, the Platinum layer, in companies’ proposals that extend the data pipeline to OneLake and Microsoft Fabric. The need to copy data across layers, manage different schemas, and address data latency issues can complicate data pipelines.
Adopting LLMs in SQL-centric workflows is particularly interesting, since companies increasingly try text-to-SQL to boost data usage. Pipeline breakpoint feature. The blog highlights the 2024 SIGMOD paper Understanding the Performance Implications of the Design Principles in Storage-Disaggregated Databases.
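As an illustration of the text-to-SQL idea mentioned above, here is a minimal, hedged sketch of prompting an LLM to translate a question into SQL. The model name, schema, and helper function are invented for illustration and are not the approach from the referenced blog; it assumes the openai Python package (>= 1.0) and an OPENAI_API_KEY in the environment.

```python
# Hypothetical text-to-SQL sketch: give the model the table schema and a
# natural-language question, and ask for a single SQL statement back.
from openai import OpenAI

client = OpenAI()

SCHEMA = "orders(order_id INT, customer_id INT, amount DECIMAL, created_at DATE)"

def text_to_sql(question: str) -> str:
    """Translate a natural-language question into SQL (illustrative only)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name
        messages=[
            {"role": "system",
             "content": f"You translate questions into SQL for this schema: {SCHEMA}. "
                        "Reply with one SQL statement and nothing else."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(text_to_sql("Total order amount per customer in 2024?"))
```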
Summary: How much time do you spend maintaining your data pipeline? Managing and auditing access to your servers and databases is a problem that grows in difficulty alongside the growth of your teams. How does the data-centric approach of DataCoral differ from the way that other platforms think about processing information?
Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated software-defined process. Here, the bank loan business division has essentially become software.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. Data stacks are becoming more and more complex.
Sponsored: DoubleCloud - More than just ClickHouse. ClickHouse is the fastest, most resource-efficient OLAP database, which queries billions of rows in milliseconds and is trusted by thousands of companies for real-time analytics. The author highlights the structured approach to building data infrastructure, data management, and metrics.
At the same time Maxime Beauchemin wrote a post about Entity-Centric data modeling. This week I discovered SQLMesh, an all-in-one data pipelines tool. If you want to go deeper: to me, Dozer looks like Materialize or Popsink but with a different vision, offering more of an API as a serving layer than a database. Roboto AI raises $4.8m
The first response has been frustration because of the chaos a breach like this causes: At a scaleup I talked with, infrastructure teams shut down all pipelines in order to replace secrets. Our customers are some of the most innovative, engineering-centric businesses on the planet, and helping them do great work will continue to be our focus.”
Sometimes they need feedback on touchpoints very quickly, while other pipelines don’t need as much acceleration. Acadia, a digital media agency, wanted to accelerate the end-to-end pipeline for its clients while also enhancing security for clients’ PII. One conversation quickly coming to the forefront is first-party data.
As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search, or even machine learning—your relational database might not be enough.
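To make the "SQL might be enough" point concrete, here is a small sketch using SQLite's built-in FTS5 full-text index (the table and rows are invented, and this assumes your SQLite build includes FTS5, as most do). It handles basic keyword search well; synonyms, multilingual analysis, and ML-driven ranking are where dedicated search engines take over.

```python
# Minimal full-text search in plain SQL, using SQLite's FTS5 extension.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(title, body)")
conn.executemany(
    "INSERT INTO docs (title, body) VALUES (?, ?)",
    [
        ("Intro to pipelines", "Building reliable data pipelines with SQL"),
        ("Search basics", "Relational databases can handle simple search"),
    ],
)

# MATCH runs a full-text query against the index.
for (title,) in conn.execute("SELECT title FROM docs WHERE docs MATCH 'pipelines'"):
    print(title)
```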
Storage and compute are cheaper than ever, and with the advent of distributed databases that scale out linearly, the scarcer resource is engineering time. The use of natural, human-readable keys and dimension attributes in fact tables is becoming more common, reducing the need for costly joins that can be heavy on distributed databases.
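As a hedged illustration of that trade-off (table and column names invented, not from the original post), compare a classic surrogate-key star schema, where filtering the fact table requires a join, with a fact table that carries the human-readable attribute directly:

```python
# Contrast surrogate-key joins with natural, human-readable attributes
# stored directly on the fact table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Classic star schema: fact rows reference a dimension by surrogate key.
    CREATE TABLE dim_customer (customer_sk INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE fact_sales_sk (customer_sk INTEGER, amount REAL);

    -- Denormalized variant: the readable attribute lives on the fact table.
    CREATE TABLE fact_sales_nat (customer_name TEXT, amount REAL);

    INSERT INTO dim_customer VALUES (1, 'acme');
    INSERT INTO fact_sales_sk VALUES (1, 9.99);
    INSERT INTO fact_sales_nat VALUES ('acme', 9.99);
""")

# Surrogate keys force a join that can be expensive on distributed engines.
join_query = """
    SELECT SUM(f.amount) FROM fact_sales_sk f
    JOIN dim_customer d ON d.customer_sk = f.customer_sk
    WHERE d.customer_name = 'acme'
"""

# With a natural attribute on the fact table, the filter needs no join.
direct_query = "SELECT SUM(amount) FROM fact_sales_nat WHERE customer_name = 'acme'"

print(conn.execute(join_query).fetchone(), conn.execute(direct_query).fetchone())
```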
To illustrate that, let’s take Cloud SQL from the Google Cloud Platform, which is a “Fully managed relational database service for MySQL, PostgreSQL, and SQL Server.” It looks like this when you want to create an instance. You are starting to become an operations- or technology-centric data team.
Most companies store their data in a variety of formats across databases and text files. This is where data engineers come in — they build pipelines that transform that data into formats that data scientists can use. You’ll have a few different data stores: the database that backs your main app, a ride database, and so on.
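As a toy illustration of that kind of pipeline work (file names and fields invented), a data engineer might normalize a raw CSV export into a tidy file that analysts can load, as in this minimal sketch (it assumes a raw_rides.csv exists in the working directory):

```python
# Toy ETL step: read a raw CSV export, normalize the fields a data
# scientist cares about, and write a clean file.
import csv

def transform(in_path: str, out_path: str) -> None:
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["ride_id", "fare_usd"])
        writer.writeheader()
        for row in csv.DictReader(src):
            writer.writerow({
                "ride_id": row["RIDE ID"].strip(),
                # Normalize "$12.50" style strings into plain numbers.
                "fare_usd": row["Fare"].replace("$", ""),
            })

transform("raw_rides.csv", "clean_rides.csv")
```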
Structured data can be defined as data that can be stored in relational databases, and unstructured data as everything else. Related to the neglect of data quality, it has been observed that much of the effort in AI has been model-centric, that is, mostly devoted to developing and improving models, given fixed data sets.
But this article is not about the pricing, which can be very subjective depending on the context: what is $1,200 for dev tooling when you pay an engineer more than $150k per year? Yes, that’s US-centric, but relevant. But before sending your code to production you still want to validate some things, static or not, in the CI/CD pipelines.
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. Data Engineers are engineers responsible for uncovering trends in data sets and building algorithms and data pipelines to make raw data beneficial for the organization.
For modern data engineers using Apache Spark, DE offers an all-inclusive toolset that enables data pipeline orchestration, automation, advanced monitoring, visual troubleshooting, and a comprehensive management toolset for streamlining ETL processes and making complex data actionable across your analytic teams. Job Deployment Made Simple.
Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. The author writes an overview of the performance implications of disaggregated systems compared to traditional monolithic databases.
2) Why High-Quality Data Products Beat Complexity in Building LLM Apps - Ananth Packildurai. I will walk through the evolution of model-centric to data-centric AI and how data products and DPLM (Data Product Lifecycle Management) systems are vital for an organization's systems.
The DataKitchen Platform serves as a process hub that builds temporary analytic databases for daily and weekly ad hoc analytics work. These limited-term databases can be generated as needed from automated recipes (orchestrated pipelines and qualification tests) stored and managed within the process hub. The DataOps Advantage
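That idea can be sketched generically. The following is not DataKitchen's implementation, just a minimal illustration of a limited-term database built from a "recipe" of steps plus a qualification test, with all names invented:

```python
# Minimal sketch of a limited-term analytic database: build it from a
# "recipe" of SQL steps, run a qualification test, then throw it away.
import sqlite3

RECIPE = [
    "CREATE TABLE daily_sales (day TEXT, amount REAL)",
    "INSERT INTO daily_sales VALUES ('2024-01-01', 100.0)",
]

def run_recipe() -> None:
    conn = sqlite3.connect(":memory:")  # temporary, lives for this job only
    for step in RECIPE:
        conn.execute(step)
    # Qualification test: the table must not be empty before analysts use it.
    (count,) = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()
    assert count > 0, "qualification test failed"
    conn.close()  # the ad hoc database disappears with the connection

run_recipe()
```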
SQL – With strong SQL skills, a data engineer can build a data warehouse on top of a database, combine it with other technologies, and analyze the data for business purposes. Pipeline-centric: Pipeline-centric data engineers collaborate with data researchers to maximize the use of the data they gather.
The Netflix video processing pipeline went live with the launch of our streaming service in 2007. By integrating with studio content systems, we enabled the pipeline to leverage rich metadata from the creative side and create more engaging member experiences like interactive storytelling.
Retrieval augmented generation (RAG) is an architecture framework introduced by Meta in 2020 that connects your large language model (LLM) to a curated, dynamic database. Data retrieval: Based on the query, the RAG system searches the database to find relevant data. A RAG flow in Databricks can be visualized like this.
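The original visualization is not reproduced in this excerpt; as a stand-in, here is a minimal, framework-agnostic sketch of the retrieval step. The embedding function and documents are invented toys, and this is not Databricks' API; a real system would call an embedding model and a vector store.

```python
# Bare-bones RAG retrieval: embed the query, rank stored documents by
# cosine similarity, and prepend the best match to the LLM prompt.
import math

DOCS = {
    "refunds": "Refunds are processed within 5 business days.",
    "shipping": "Orders ship from our warehouse within 24 hours.",
}

def embed(text: str) -> list[float]:
    # Toy embedding: character-frequency vector (a real system would
    # call an embedding model here).
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query: str) -> str:
    q = embed(query)
    return max(DOCS.values(), key=lambda doc: cosine(q, embed(doc)))

question = "How long do refunds take?"
context = retrieve(question)
prompt = f"Answer using this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```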
Kubernetes is a container-centric management software that allows the creation and deployment of containerized applications with ease. Here is a sample YAML file used to create a pod with the postgres database. To read more about Kubernetes and deployment, you can refer to the Best Kubernetes Course Online.
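The sample manifest referenced above is not reproduced in this excerpt; a minimal pod definition along those lines might look like the following, where the names, image tag, and password are placeholder values:

```yaml
# Hypothetical pod manifest running a single Postgres container.
apiVersion: v1
kind: Pod
metadata:
  name: postgres-pod
  labels:
    app: postgres
spec:
  containers:
    - name: postgres
      image: postgres:16           # placeholder version tag
      ports:
        - containerPort: 5432      # default Postgres port
      env:
        - name: POSTGRES_PASSWORD
          value: example-password  # demo only; use a Secret in practice
```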
Use cases such as fraud monitoring, real-time supply chain insight, IoT-enabled fleet operations, real-time customer intent, and modernizing analytics pipelines are driving development activity. Their core value proposition is that streaming databases are inherently faster than Flink due to in-memory processing and state management.
With OneLake serving as a primary multi-cloud repository, Fabric is designed with an open, lake-centric architecture. Mirroring (a data replication capability): Access and manage any database or warehouse from Fabric without switching database clients; Mirroring will be available for Azure Cosmos DB, Azure SQL DB, Snowflake, and MongoDB.
Treating data as a product is more than a concept; it’s a paradigm shift that can significantly elevate the value that business intelligence and data-centric decision-making have on the business. Data pipelines, data integrity, data lineage, data stewardship, data catalog, and data product costing: let’s review each one in detail.
In large organizations, data engineers concentrate on analytical databases, operate data warehouses that span multiple databases, and are responsible for developing table schemas. Data engineering builds data pipelines for core professionals like data scientists, consumers, and data-centric applications.
In addition, they are responsible for developing pipelines that turn raw data into formats that data consumers can use easily. Pipeline-Centric Engineer: These data engineers prefer to work on distributed systems and the more challenging data science projects, usually within a midsize data analytics team.
He compared the SQL + Jinja approach to the early PHP era… […] “If you take the dataframe-centric approach, you have much more “proper” objects, and programmatic abstractions and semantics around datasets, columns, and transformations.
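To make the quoted contrast concrete, here is a hedged sketch in pandas (not the tooling the quote refers to, and with invented data): transformations become named, composable, testable functions over real column objects rather than strings assembled with Jinja templates.

```python
# Dataframe-centric style: datasets, columns, and transformations are
# programmatic objects you can name, compose, and unit-test.
import pandas as pd

def add_gross_margin(orders: pd.DataFrame) -> pd.DataFrame:
    """A reusable, testable transformation over typed columns."""
    return orders.assign(margin=orders["revenue"] - orders["cost"])

def top_customers(orders: pd.DataFrame, n: int = 1) -> pd.DataFrame:
    return (orders.groupby("customer", as_index=False)["margin"].sum()
                  .nlargest(n, "margin"))

orders = pd.DataFrame({
    "customer": ["a", "a", "b"],
    "revenue": [10.0, 20.0, 15.0],
    "cost": [4.0, 12.0, 5.0],
})

# Transformations compose like ordinary functions.
print(top_customers(add_gross_margin(orders)))
```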
Data engineers who previously worked only with relational database management systems and SQL queries need training to take advantage of Hadoop. Apache HBase, a NoSQL database on top of HDFS, is designed to store huge tables, with millions of columns and billions of rows. Complex programming environment. Data storage options.
We can test all three layers of an application (the interface layer, the service layer, and the database layer) from a single UFT console, as it provides a graphical user interface. The seamless integration of this automation testing tool with CI/CD pipelines makes creating extremely complex automated tests easy without writing a single line of code.
Editor’s Note: 🔥 DEW is thrilled to announce a developer-centric Data Eng & AI conference in the tech hub of Bengaluru, India, on October 12th! LinkedIn writes about Hoptimator for auto-generated Flink pipelines with multiple stages of systems. Can’t we use the vector feature in existing databases?
One paper suggests that there is a need for a re-orientation of the healthcare industry to be more "patient-centric". Furthermore, clean and accessible data, along with data driven automations, can assist medical professionals in taking this patient-centric approach by freeing them from some time-consuming processes.
It aims to explain how we transformed our development practices with a data-centric approach and offers recommendations to help your teams address similar challenges in your software development lifecycle. Step 3: Implementing a data pipeline. To automate the data collection and processing, we integrated a Jenkins job that runs hourly.
At its core, Hexagonal Architecture is a domain-centric approach. The primary goal is to make the core domain of an application independent of technical details like APIs or databases. Adapters : Implementations of ports that connect the domain with external systems, such as databases, APIs, and user interfaces.
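As a minimal illustration of ports and adapters (names invented, not from the original article), the domain core depends only on an abstract port, and a concrete adapter plugs in from the outside:

```python
# Hexagonal architecture in miniature: the domain depends only on the
# UserRepository port; adapters implement it for concrete backends.
from abc import ABC, abstractmethod

class UserRepository(ABC):
    """Port: the contract the domain needs, free of technical detail."""
    @abstractmethod
    def find_name(self, user_id: int) -> str: ...

class InMemoryUserRepository(UserRepository):
    """Adapter: one concrete implementation (a SQL or REST adapter
    would implement the same port)."""
    def __init__(self) -> None:
        self._users = {1: "Ada"}

    def find_name(self, user_id: int) -> str:
        return self._users[user_id]

def greet(repo: UserRepository, user_id: int) -> str:
    # Domain logic: works with any adapter that honors the port.
    return f"Hello, {repo.find_name(user_id)}!"

print(greet(InMemoryUserRepository(), 1))
```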
News on Hadoop - September 2016: HPE adapts Vertica analytical database to world with Hadoop, Spark (TechTarget.com, September 1, 2016). HPE has expanded its analytical database support for Apache Hadoop and Spark integration and also to enhance the Apache Kafka management pipeline. To compete in a field of diverse data tools, Vertica 8.0
For example, to power Ripple's product capabilities, Ripple's Payments team ingests millions of transactional records into databases and performs analytics to generate invoices, reports, and other related payment operations. A lack of a centralized system makes building a single source of high-quality data difficult.
It then gathers and relocates information to a centralized hub in the cloud using the Copy Activity within data pipelines. Manage Workflow: ADF manages these processes through time-sliced, scheduled pipelines. Therefore, only authorized personnel can access and manipulate data pipelines and data stores.
A star-studded baseball team is analogous to an optimized “end-to-end data pipeline” — both require strategy, precision, and skill to achieve success. Just as every play and position in baseball is key to a win, each component of a data pipeline is integral to effective data management.