Unlocking Data Team Success: Are You Process-Centric or Data-Centric? Over years of working with data analytics teams at companies large and small, we have been fortunate enough to observe hundreds of them. We've identified two distinct types of data teams: process-centric and data-centric.
The typical pharmaceutical organization faces many challenges that slow down the data team: raw, barely integrated data sets require engineers to perform manual, repetitive, error-prone work to create analyst-ready data sets. Cloud computing has made it much easier to integrate data sets, but that's only the beginning.
The Race For Data Quality In A Medallion Architecture The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. By systematically moving data through these layers, the Medallion architecture enhances the data structure in a data lakehouse environment.
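As a rough illustration of the layering, here is a minimal pandas sketch; the table contents, cleaning rules, and sample columns are assumptions for the example, not the pattern's only form.

```python
import pandas as pd

# Bronze layer: raw data landed as-is from the source (hypothetical sample).
bronze = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount": ["10.5", "20.0", "20.0", None],
    "country": ["us", "US", "US", "de"],
})

# Silver layer: cleaned and conformed -- dedupe, fix types, standardize values.
silver = (
    bronze.drop_duplicates(subset="order_id")
          .dropna(subset=["amount"])
          .assign(
              amount=lambda df: df["amount"].astype(float),
              country=lambda df: df["country"].str.upper(),
          )
)

# Gold layer: business-level aggregates ready for analysts.
gold = silver.groupby("country", as_index=False)["amount"].sum()
print(gold)
```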
Part 2: Types of graph intelligence for combating fraud. There are two graph algorithms for gaining fraud-fighting intelligence from a graph. Type 1: Vertex-centric intelligence. Vertex-centric graph intelligence helps us quantify the likelihood that a user is a bad actor.
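A minimal sketch of one such vertex-centric signal, using networkx; the account/device graph and the scoring rule are invented for illustration.

```python
import networkx as nx

# Hypothetical bipartite graph: user accounts linked to devices they log in from.
G = nx.Graph()
G.add_edges_from([
    ("user_a", "device_1"), ("user_b", "device_1"),
    ("user_c", "device_1"), ("user_c", "device_2"),
    ("user_d", "device_3"),
])

# Vertex-centric signal: a user sharing devices with many other users
# is more likely to be part of a fraud ring.
def shared_device_score(user):
    other_users = {u for d in G[user] for u in G[d] if u != user}
    return len(other_users)

for user in ["user_a", "user_c", "user_d"]:
    print(user, shared_device_score(user))
```

A real system would combine several per-vertex features like this (degree, centrality, velocity) into an overall fraud score.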
The fact that ETL tools evolved to expose graphical interfaces seems like a detour in the history of data processing, and would certainly make for an interesting blog post of its own. Sure, there's a need to abstract the complexity of data processing, computation and storage.
Of course, this is not to imply that companies will become only software (there are still plenty of people in even the most software-centric companies), just that the full scope of the business is captured in an integrated software-defined process. Here, the bank loan business division has essentially become software.
For those reasons, it is not surprising that it has taken over most of the modern data stack: infrastructure, databases, orchestration, data processing, AI/ML and beyond. That's without mentioning the fact that for a cloud-native company, Tableau's Windows-centric approach at the time didn't work well for the team.
Big data processing: It allows data scientists to analyze large datasets and interactively run jobs on them from the R shell. Distributed: RDDs are distributed across the network, enabling them to be processed in parallel. Here are some of the possible use cases.
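A minimal PySpark sketch of that parallelism (app name, data size, and partition count are arbitrary): the RDD is split into partitions, each processed by a separate task.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# An RDD partitioned across the cluster; each partition is processed in parallel.
rdd = sc.parallelize(range(1_000_000), numSlices=8)

# Transformations are lazy; the action (sum) triggers distributed execution.
total = rdd.map(lambda x: x * x).sum()
print(total)

spark.stop()
```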
As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search, or even machine learning—your relational database might not be enough.
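To make the trade-off concrete, here is a minimal sqlite3 sketch with invented table contents: a LIKE query covers simple substring search, but synonyms, multilingual text, and relevance ranking are where a relational database starts to fall short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "Scaling data pipelines"), (2, "Pipeline monitoring"),
     (3, "ETL best practices")],
)

# Simple substring search: enough for basic use cases ...
rows = conn.execute(
    "SELECT id, title FROM articles WHERE title LIKE ?", ("%pipeline%",)
).fetchall()
print(rows)
# ... but matching a synonym like "workflow", handling multilingual text,
# or ranking by relevance is where a dedicated search engine earns its keep.
```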
The example we’ll walk you through will mirror a typical LLM application workflow you’d run to populate a vector database with some text knowledge. Specifically, we’ll cover pulling data from the web, creating text embeddings (vectors) and pushing them to a vector store. The application will receive a small data input (e.g.,
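A minimal sketch of that workflow, with everything hedged: the URL is illustrative, embed_text is a hash-based stand-in for a real embedding model or API, and the vector store is just an in-memory list.

```python
import hashlib

import requests

def embed_text(text: str, dim: int = 8) -> list[float]:
    # Placeholder embedding: a real app would call an embedding model/API here.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255 for b in digest[:dim]]

# 1. Pull data from the web (illustrative URL).
html = requests.get("https://example.com").text

# 2. Chunk the text and create embeddings (vectors).
chunks = [html[i:i + 500] for i in range(0, len(html), 500)]
vectors = [(chunk, embed_text(chunk)) for chunk in chunks]

# 3. Push to a "vector store" -- here, simply an in-memory list.
vector_store = []
vector_store.extend(vectors)
print(f"stored {len(vector_store)} vectors of dim {len(vector_store[0][1])}")
```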
Hadoop and Spark are the two most popular platforms for Big Data processing. They both enable you to deal with huge collections of data no matter its format, from Excel tables to user feedback on websites to images and video files. Obviously, Big Data processing involves hundreds of computing units.
“The Snowflake Native App Framework really helps them give their customers the reassurance that their data is not traveling across the internet, and that they’re able to do all of their data processing within their own environment.” One conversation quickly coming to the forefront is first-party data.
Data Engineers are skilled professionals who lay the foundation of databases and architecture. Using database tools, they create a robust architecture and then implement the process to develop the database from scratch. Let us now understand the basic responsibilities of a Data Engineer.
Like Business Intelligence (BI), Process Mining is no longer a new phenomenon; almost all larger companies now conduct this kind of data-driven process analysis in their organizations. This aspect applies well to Process Mining, hand in hand with BI and AI.
But what about data engineers? A data scientist is only as good as the data they have access to. Most companies store their data in a variety of formats across databases and text files. This is where data engineers come in: they build pipelines that transform that data into formats that data scientists can use.
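In miniature, such a pipeline is extract, transform, load; this sketch invents a tiny CSV export and an in-memory "warehouse" to show the shape of the work.

```python
import csv
import io

# Extract: a raw export as it might arrive from an operational system.
raw = "user_id,signup_date,plan\n42,2024-01-05,PRO\n43,,free\n"

def transform(rows):
    # Normalize values and drop records analysts can't use.
    for row in rows:
        if not row["signup_date"]:
            continue  # incomplete record
        row["plan"] = row["plan"].lower()
        yield row

# Load: append analyst-ready rows to a destination table (a list here).
warehouse_table = list(transform(csv.DictReader(io.StringIO(raw))))
print(warehouse_table)  # [{'user_id': '42', 'signup_date': '2024-01-05', 'plan': 'pro'}]
```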
When organizing vast amounts of data, Data Engineering skills are paramount. Data must be comprehensive and cohesive, and Data Engineers, with their skill set, are best suited to this task. Skills Required To Be A Data Engineer: Data Engineers must be proficient in Python to create complex, scalable algorithms.
Data Engineers are involved in the whole data process, from data management to analysis. Engineers work with Data Scientists to help make the most of the data they collect and have deep knowledge of distributed systems and computer science. Who Is a Data Engineer, and What Do They Do?
Big Data NoSQL databases were pioneered by top internet companies like Amazon, Google, LinkedIn and Facebook to overcome the drawbacks of RDBMS. An RDBMS is not always the best solution for every situation, as it cannot keep up with the rapid growth of unstructured data.
Ripple's Payments team, for example, ingests millions of transactional records into databases and performs analytics to generate invoices, reports, and other payment operations. The lack of a centralized system makes building a single source of high-quality data difficult.
Data Types: Big Data vs. Data Mining. Big data refers to robust and complicated datasets that require a high level of expertise and tools for managing, processing, or analyzing; traditional data processing techniques cannot be used.
Organisations are constantly looking for robust and effective platforms to manage and derive value from their data in the ever-changing landscape of data analytics and processing. These platforms provide strong capabilities for data processing, storage, and analytics, enabling companies to make full use of their data assets.
An Azure Data Engineer is a professional responsible for designing, implementing, and managing data solutions using Microsoft's Azure cloud platform. They work with various Azure services and tools to build scalable, efficient, and reliable data pipelines, data storage solutions, and data processing systems.
This article presents the challenges associated with Build Analytics and the measures we adopted to enhance the efficiency of build processes at ThoughtSpot. This pipeline is designed to capture detailed data, process it efficiently, and provide actionable insights through ThoughtSpot’s powerful analytics features.
Data engineers can find one for almost any need, from data extraction to complex transformations, ensuring that they're not reinventing the wheel by writing code that's already been written. PySpark, for instance, optimizes distributed data operations across clusters, ensuring faster data processing.
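For instance, a minimal PySpark DataFrame job (the data and column names are made up): Spark plans the aggregation and executes it in parallel across the cluster rather than on a single machine.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

df = spark.createDataFrame(
    [("books", 12.0), ("books", 8.5), ("games", 30.0)],
    ["category", "revenue"],
)

# Spark distributes this aggregation across executors automatically.
summary = df.groupBy("category").agg(F.sum("revenue").alias("total_revenue"))
summary.show()

spark.stop()
```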
In this article, we’ll break down the intricacies of an end-to-end data pipeline and highlight its importance in today’s landscape. A visual maze: The tangled web of disparate tools commonly used in fragmented data pipelines. Playing the Field – Data Transformation: This is where the action happens.
Treating data as a product is more than a concept; it’s a paradigm shift that can significantly elevate the value that business intelligence and data-centric decision-making have on the business. It is the stage where data truly becomes a product, delivering tangible value to its end users.
The demand for data-related professions, including data engineering, has indeed been on the rise due to the increasing importance of data-driven decision-making in various industries. Becoming an Azure Data Engineer in this data-centric landscape is a promising career choice.
Variety: Refers to the varied formats of data, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions. Some examples of Big Data: 1. However, big data analytics and using big data tools must be learned.
This capability is particularly useful in complex data landscapes, where data may pass through multiple systems and transformations before reaching its final destination. Impact analysis: When changes are made to data sources or data processing systems, it's critical to understand the potential impact on downstream processes and reports.
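At its simplest, impact analysis is a downstream traversal of the lineage graph; this sketch uses a small, hand-written lineage map as a stand-in for real metadata.

```python
from collections import deque

# Hypothetical lineage: each asset maps to the assets that consume it.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.daily_revenue", "mart.customer_ltv"],
    "mart.daily_revenue": ["dashboard.exec_kpis"],
    "mart.customer_ltv": [],
    "dashboard.exec_kpis": [],
}

def downstream_impact(asset):
    """Everything that could break if `asset` changes."""
    seen, queue = set(), deque([asset])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(downstream_impact("raw.orders"))
```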
36. Give Data Products a Frontend with Latent Documentation: Document more to help everyone.
37. How Data Pipelines Evolve: Build ELT at mid-range and move to data lakes when you need scale.
38. How to Build Your Data Platform like a Product: PM your data with business. Increase visibility. How fast are queries?
There are three types: Azure IR (fully managed serverless compute), Self-Hosted IR (for private network data stores), and Azure-SSIS IR (for running SSIS packages). Azure Data Factory Data Migration Overview. Cross-Region Source & Sink Setup: Configure the data source (storage accounts, databases) in both regions.
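As a rough decision aid (a simplification for illustration, not official Microsoft guidance), the choice between the three runtime types can be summarized like this:

```python
def choose_integration_runtime(private_network: bool, runs_ssis: bool) -> str:
    """Simplified chooser for the three ADF integration runtime types."""
    if runs_ssis:
        return "Azure-SSIS IR"   # lift-and-shift of existing SSIS packages
    if private_network:
        return "Self-Hosted IR"  # data stores behind a firewall / on-premises
    return "Azure IR"            # fully managed serverless compute

print(choose_integration_runtime(private_network=True, runs_ssis=False))
```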
Over the last three geospatial-centric blog posts, we've covered the basics of what geospatial data is, how it works in the broader world of data and how it specifically works in Snowflake, based on our native support for GEOGRAPHY, GEOMETRY and H3.
Studying data in deeper detail can help identify inefficiencies, bottlenecks or anomalies, enabling quick action that leads to more efficient operations and reduced costs. Improved Customer Experience: The industry focuses on customer-centric approaches to enhance the overall customer experience.
Data Transformation: Because of the many variations among source systems, the data collected during the ingestion phase is often raw, messy, and unstructured. In the ETL world, data transformation is intended to change the structure of the source data to match a specific target database schema, usually in the context of a data warehouse.
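A minimal sketch of that restructuring step, mapping a raw source record onto a hypothetical target warehouse schema (field names and formats are invented):

```python
from datetime import datetime

# Raw record as ingested from a source system (illustrative).
source_row = {"CustID": "1001", "OrderTS": "2024-03-01 14:22:00", "Amt": "49.90"}

# Target warehouse schema: renamed columns with proper types.
def to_target_schema(row):
    return {
        "customer_id": int(row["CustID"]),
        "ordered_at": datetime.strptime(row["OrderTS"], "%Y-%m-%d %H:%M:%S"),
        "amount_usd": float(row["Amt"]),
    }

print(to_target_schema(source_row))
```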
Not only that, but they are also responsible for working on web applications, content management systems, databases, and operating systems. I had earlier chosen KnowledgeHut’s training in Project Management to understand such processes efficiently.
Application Management: Application management expertise is crucial in an Azure-centric ecosystem. Microsoft Certification: Azure Data Fundamentals. Azure Data Fundamentals is designed for individuals who want to gain knowledge of data principles & core concepts related to Azure data services.
First up, let's dive into the foundation of every Modern Data Stack, a cloud-based data warehouse. Central Source of Truth for Analytics: A Cloud Data Warehouse (CDW) is a type of database that provides analytical data processing and storage capabilities within a cloud-based infrastructure.
Akka Streams then changed tack as streaming became the core mechanism to drive processors in a more data-centric manner. We think of streams and events much like database tables and rows; they are the basic building blocks of a data platform. This is what is meant by "turning the database inside out."
Storing events in a stream and connecting streams via stream processors provide a generic, data-centric, distributed application runtime that you can use to build ETL, event streaming applications, applications for recording metrics and anything else that has a real-time data requirement. Instrumentation plane.
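In miniature, a stream processor is a transformation from one event stream to another; this generator-based sketch (the event shapes are made up) chains two processors into a tiny real-time pipeline.

```python
# Source stream: raw events (in a real system, a Kafka topic or similar).
def page_views():
    yield {"user": "a", "path": "/home"}
    yield {"user": "b", "path": "/pricing"}
    yield {"user": "a", "path": "/pricing"}

# Processor 1: filter to the events we care about.
def pricing_views(events):
    return (e for e in events if e["path"] == "/pricing")

# Processor 2: derive a running-metric stream from the filtered stream.
def counts(events):
    total = 0
    for _ in events:
        total += 1
        yield {"metric": "pricing_views", "value": total}

for event in counts(pricing_views(page_views())):
    print(event)
```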