Unlocking Data Team Success: Are You Process-Centric or Data-Centric? Over years of working with data analytics teams at companies large and small, we have been fortunate enough to observe hundreds of organizations. We've identified two distinct types of data teams: process-centric and data-centric.
The typical pharmaceutical organization faces many challenges that slow down the data team: raw, barely integrated data sets require engineers to perform manual, repetitive, error-prone work to create analyst-ready data sets. Cloud computing has made it much easier to integrate data sets, but that's only the beginning.
With Astro, you can build, run, and observe your data pipelines in one place, ensuring your mission-critical data is delivered on time. Generative AI demands the processing of vast amounts of diverse, unstructured data.
The Race For Data Quality In A Medallion Architecture The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. By systematically moving data through these layers, the Medallion architecture enhances the data structure in a data lakehouse environment.
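To make the layering concrete, here is a minimal sketch of a bronze-to-silver-to-gold flow in PySpark; the table names, storage path, and cleansing rules are illustrative assumptions, not prescriptions from the pattern itself.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: land the raw data as-is (illustrative path and schema).
bronze = spark.read.json("s3://lake/raw/orders/")
bronze.write.mode("overwrite").saveAsTable("bronze.orders")

# Silver: cleanse and conform -- dedupe, enforce types, drop bad rows.
silver = (
    spark.table("bronze.orders")
    .dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .filter(F.col("amount") > 0)
)
silver.write.mode("overwrite").saveAsTable("silver.orders")

# Gold: aggregate into an analyst-ready, business-level table.
gold = (
    spark.table("silver.orders")
    .groupBy(F.to_date("order_ts").alias("order_date"))
    .agg(F.sum("amount").alias("daily_revenue"))
)
gold.write.mode("overwrite").saveAsTable("gold.daily_revenue")
```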
Snowflake is completely managed, but its main focus is on the data warehouse layer, and users need to integrate with other tools for BI, ML, or ETL. Ideal for: Fabric suits business-centric workflows; Snowflake suits environments with a lot of developers and data engineers.
The list of Top 10 semi-finalists is a perfect example: we have use cases for cybersecurity, gen AI, food safety, restaurant chain pricing, quantitative trading analytics, geospatial data, sales pipeline measurement, marketing tech and healthcare. Our sincere thanks go out to everyone who participated in this year’s competition.
“The Snowflake Native App Framework really helps them give their customers the reassurance that their data is not traveling across the internet, and that they’re able to do all of their data processing within their own environment.” One conversation quickly coming to the forefront is first-party data.
Who attends: Expect to meet a diverse crowd of top-level executives, seasoned data scientists, technology vendors, and rising innovators. Key themes include data-driven decision-making: learn how to build a data-centric culture that drives better outcomes. It's a unique blend of business and technical expertise under one roof.
The fact that ETL tools evolved to expose graphical interfaces seems like a detour in the history of data processing, and would certainly make for an interesting blog post of its own. Sure, there’s a need to abstract the complexity of data processing, computation and storage.
Its flexibility allows it to operate on single-node machines and large clusters, serving as a multi-language platform for executing data engineering, data science, and machine learning tasks. Before diving into the world of Spark, we suggest you get acquainted with data engineering in general. Big data processing.
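As a taste of that flexibility, the following toy PySpark job runs unchanged on a single laptop (`local[*]`) or on a cluster; only the master/deploy configuration differs. The device readings are made up for the example.

```python
from pyspark.sql import SparkSession, functions as F

# The same code runs locally ("local[*]") or on YARN/Kubernetes;
# only the master and deploy configuration change.
spark = SparkSession.builder.master("local[*]").appName("spark-intro").getOrCreate()

df = spark.createDataFrame(
    [("sensor-1", 21.5), ("sensor-1", 22.1), ("sensor-2", 19.8)],
    ["device", "temperature"],
)

# A typical data engineering transform: aggregate readings per device.
df.groupBy("device").agg(F.avg("temperature").alias("avg_temp")).show()
```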
Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated software-defined process. Here, the bank loan business division has essentially become software.
It involves many moving parts, from data preparation to building indexing and query pipelines. Luckily, this task looks a lot like the way we tackle problems that arise when connecting data. Building an indexing pipeline at scale with Kafka Connect. It is a natural evolution from the initial application-centric setup.
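A hedged sketch of what registering such an indexing sink might look like: this posts a hypothetical Elasticsearch sink connector config to a Kafka Connect worker's REST API. The topic name, index URL, and worker address are all placeholders.

```python
import requests

# Hypothetical sink connector: stream documents from a Kafka topic
# into a search index via Kafka Connect's REST API.
connector = {
    "name": "docs-index-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "documents",
        "connection.url": "http://elasticsearch:9200",
        "key.ignore": "true",
        "schema.ignore": "true",
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",  # Kafka Connect worker REST endpoint
    json=connector,
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```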
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is the role of a Data Engineer? They are required to have deep knowledge of distributed systems and computer science.
A data scientist is only as good as the data they have access to. Most companies store their data in a variety of formats across databases and text files. This is where data engineers come in — they build pipelines that transform that data into formats that data scientists can use.
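A minimal illustration of that bridging role: a tiny pipeline that reads a raw CSV and emits normalized JSON records a data scientist could load directly. The file names and cleaning rules are assumptions for the example.

```python
import csv
import json
from datetime import datetime

def transform(row: dict) -> dict:
    """Normalize one raw record into an analyst-friendly shape (illustrative rules)."""
    return {
        "user_id": int(row["user_id"]),
        "signup_date": datetime.strptime(row["signup"], "%m/%d/%Y").date().isoformat(),
        "plan": row["plan"].strip().lower(),
    }

with open("raw_users.csv", newline="") as src, open("users.jsonl", "w") as dst:
    for raw in csv.DictReader(src):
        dst.write(json.dumps(transform(raw)) + "\n")
```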
Data Engineers must be proficient in Python to create complicated, scalable algorithms. This language provides a solid basis for big data processing and is effective, flexible, and ideal for text analytics. Data Engineers focused on pipelines require a solid understanding of decentralized technology and computer engineering.
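For instance, a few lines of standard-library Python already cover basic text analytics; this toy helper tokenizes and counts terms, with no claims about any particular team's stack.

```python
import re
from collections import Counter

def top_terms(text: str, n: int = 5) -> list[tuple[str, int]]:
    """Toy text-analytics helper: lowercase, tokenize, and count terms."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)

print(top_terms("Logs arrive as text; Python makes text wrangling cheap. Text wins."))
```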
Treating data as a product is more than a concept; it’s a paradigm shift that can significantly elevate the value that business intelligence and data-centric decision-making have on the business. Data pipelines, data integrity, data lineage, data stewardship, data catalog, data product costing — let’s review each one in detail.
A star-studded baseball team is analogous to an optimized “end-to-end data pipeline” — both require strategy, precision, and skill to achieve success. Just as every play and position in baseball is key to a win, each component of a data pipeline is integral to effective data management.
Hadoop and Spark are the two most popular platforms for Big Data processing. They both enable you to deal with huge collections of data no matter its format — from Excel tables to user feedback on websites to images and video files. Obviously, Big Data processing involves hundreds of computing units.
An Azure Data Engineer is a professional responsible for designing, implementing, and managing data solutions using Microsoft's Azure cloud platform. They work with various Azure services and tools to build scalable, efficient, and reliable data pipelines, data storage solutions, and data processing systems.
This article presents the challenges associated with Build Analytics and the measures we adopted to enhance the efficiency of build processes at ThoughtSpot. This realization led us to explore alternatives and develop a custom analytics pipeline integrated with the ThoughtSpot application development process.
ADF connects to various data sources, including on-premises systems, cloud services, and SaaS applications. It then gathers and relocates information to a centralized hub in the cloud using the Copy Activity within data pipelines. Transform and Enhance the Data: Once centralized, data undergoes transformation and enrichment.
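A minimal sketch of kicking off such a copy from Python, assuming a Data Factory pipeline with a Copy Activity already exists; the subscription, resource group, factory, and pipeline names below are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",  # placeholder
)

# Trigger the (pre-existing) pipeline that copies source data
# into the centralized cloud store.
run = client.pipelines.create_run(
    resource_group_name="rg-analytics",   # placeholder
    factory_name="adf-central-hub",       # placeholder
    pipeline_name="copy-to-lake",         # placeholder
)
print(run.run_id)
```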
A data engineer is a key member of an enterprise data analytics team and is responsible for handling, leading, optimizing, evaluating, and monitoring the acquisition, storage, and distribution of data across the enterprise. Data engineers are involved in the whole data process, from data management to analysis.
In the modern world of data engineering, two concepts often find themselves in a semantic tug-of-war: data pipeline and ETL. Fast forward to the present day, and we now have data pipelines. Data Ingestion: Data ingestion is the first step of both ETL and data pipelines.
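To ground the distinction, here is a toy ETL job in plain Python: ingestion (extract) is the shared first step, while the transform and load stages are what make this particular pipeline an ETL. The table and file names are invented for the sketch.

```python
import sqlite3

def extract() -> list[dict]:
    # Ingestion: the first step in both an ETL job and a broader data pipeline.
    return [{"id": 1, "amount": "19.99"}, {"id": 2, "amount": "5.00"}]

def transform(rows: list[dict]) -> list[tuple]:
    # Transform: coerce strings into typed, load-ready tuples.
    return [(r["id"], float(r["amount"])) for r in rows]

def load(rows: list[tuple]) -> None:
    # Load: persist into a (toy) warehouse table.
    with sqlite3.connect("warehouse.db") as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

load(transform(extract()))
```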
This capability is particularly useful in complex data landscapes, where data may pass through multiple systems and transformations before reaching its final destination. Impact analysis: When changes are made to data sources or data processing systems, it’s critical to understand the potential impact on downstream processes and reports.
Snowpark is our secure deployment and processing of non-SQL code, consisting of two layers: Familiar client-side libraries – Snowpark brings deeply integrated, DataFrame-style programming and OSS-compatible APIs to the languages data practitioners like to use. Previously, tasks could be executed as frequently as every minute.
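A small Snowpark for Python sketch of that DataFrame style, with placeholder connection parameters; the filter and projection below are pushed down to Snowflake rather than executed client-side, and the table names are assumptions.

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col

# Placeholder connection parameters.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# DataFrame-style logic that executes inside Snowflake.
orders = session.table("ORDERS")
big_orders = orders.filter(col("AMOUNT") > 1000).select("ORDER_ID", "AMOUNT")
big_orders.write.save_as_table("BIG_ORDERS", mode="overwrite")
```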
The key aspect of any business-centric team in delivering products and features is to make critical decisions on ensuring low latency, high throughput, cost-effective storage, and highly efficient infrastructure. Multiple data processing systems also make building detailed dashboards and monitoring very difficult.
This provided a nice overview of the breadth of topics that are relevant to data engineering including data warehouses/lakes, pipelines, metadata, security, compliance, quality, and working with other teams. Open question: how to seed data in a staging environment? Test system with A/A test. Be adaptable.
Leveraging Striim’s real-time data integration and streaming capabilities allows your airline to consistently deliver timely, personalized services that enhance customer satisfaction. American Airlines Uses Real-Time Data to Supercharge Customer Journeys Want to see Striim’s impact in action?
Data Engineering Weekly Is Brought to You by RudderStack RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Pipelines for data in motion can quickly turn into DAG hell.
Organisations are constantly looking for robust and effective platforms to manage and derive value from their data in the constantly changing landscape of data analytics and processing. These platforms provide strong capabilities for data processing, storage, and analytics, enabling companies to fully use their data assets.
The demand for data-related professions, including data engineering, has indeed been on the rise due to the increasing importance of data-driven decision-making in various industries. Becoming an Azure Data Engineer in this data-centric landscape is a promising career choice.
Real-time data ingestion pulls data from various origins, performs cleaning, validation, and preprocessing, and finally stores it in the required format, structured or unstructured. As real-time insights gain popularity, real-time data ingestion remains vital for companies worldwide.
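A toy version of that clean-validate-preprocess-store loop for a single event; the field names and validation rules are assumptions for illustration.

```python
from datetime import datetime, timezone

def ingest(event: dict) -> dict | None:
    """Clean, validate, and normalize one incoming event (illustrative rules)."""
    if "device_id" not in event or "reading" not in event:
        return None  # validation: drop malformed events
    try:
        reading = float(event["reading"])  # cleaning: coerce types
    except (TypeError, ValueError):
        return None
    return {
        "device_id": str(event["device_id"]),
        "reading": reading,
        "ingested_at": datetime.now(timezone.utc).isoformat(),  # preprocessing
    }

print(ingest({"device_id": 7, "reading": "21.4"}))
```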
This is the world that data orchestration tools aim to create. Data orchestration tools minimize manual intervention by automating the movement of data within data pipelines. According to one Redditor on r/dataengineering, “Seems like 99/100 data engineering jobs mention Airflow.”
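Given how often Airflow comes up, here is a minimal TaskFlow-style DAG (Airflow 2.4+ syntax) that wires two steps together; the task bodies are stand-ins for real extract and load logic.

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def nightly_pipeline():
    @task
    def extract() -> list[int]:
        # Stand-in for real extraction logic.
        return [1, 2, 3]

    @task
    def load(rows: list[int]) -> None:
        # Stand-in for real load logic.
        print(f"loaded {len(rows)} rows")

    load(extract())  # Airflow infers the dependency from the data flow

nightly_pipeline()
```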
Data engineers can find one for almost any need, from data extraction to complex transformations, ensuring that they’re not reinventing the wheel by writing code that’s already been written. PySpark, for instance, optimizes distributed data operations across clusters, ensuring faster data processing.
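One small example of not reinventing the wheel: preferring a built-in PySpark function over a hand-rolled Python UDF, since built-ins execute inside the JVM and benefit from Catalyst optimization. The data is fabricated for the demo.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("builtin-vs-udf").getOrCreate()
df = spark.createDataFrame([("2024-01-15",), ("2024-02-02",)], ["day"])

# The built-in month()/to_date() functions give the same result as a custom
# Python UDF would, but run natively in the executors.
df.select(F.month(F.to_date("day")).alias("month")).show()
```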
Databricks runs on an optimized Spark version and gives you the option to select GPU-enabled clusters, making it more suitable for complex data processing. The platform’s massively parallel processing (MPP) architecture empowers you with high-performance querying of even massive datasets. But it doesn’t stop there.
Slow Response to New Information: Legacy data systems often lack the computation power necessary to run efficiently and can be cost-inefficient to scale. This typically results in long-running ETL pipelines that cause decisions to be made on stale or old data.
Application Management: Application management expertise is crucial in an Azure-centric ecosystem. Experience with Azure Kubernetes Service (AKS), Azure Container Instances (ACI), and Azure DevOps pipelines helps build this skill.
Follow Eric on LinkedIn 10) Brian Femiano Senior Data Engineer at Apple Brian is a senior data engineer with nearly two decades of experience at companies like Booz Allen Hamilton, Magnetic, Pandora, and, most recently, Apple. He is also a member of The Apache Software Foundation.
Seth champions exponential change by combining existing technologies with data science to create industrial-scale processes, including innovative automation, IT systems, and the analysis pipelines that support them.
Storing events in a stream and connecting streams via stream processors provide a generic, data-centric, distributed application runtime that you can use to build ETL, event streaming applications, applications for recording metrics and anything else that has a real-time data requirement. How quickly are payments processed?
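A hedged sketch of that pattern using the confluent-kafka Python client: record a payment event in a stream, then consume it downstream. Broker address, topic, and group id are placeholders.

```python
from confluent_kafka import Producer, Consumer

# Record a payment metric as an event in a stream.
producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("payments", key="order-42", value="19.99")
producer.flush()

# A downstream processor consumes the same stream.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "payment-metrics",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["payments"])
msg = consumer.poll(timeout=5.0)
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())  # e.g. b'order-42' b'19.99'
consumer.close()
```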
Organizations leveraging real-time data can make faster, data-driven decisions, optimize processes, and accelerate time-to-market. Your ability to deliver seamless, personalized, and timely experiences is key to success in our modern customer-centric landscape.