Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data being generated continue to double, requiring further advancements in platform capabilities to keep up. What do you have planned for the future of your academic research?
When most people think of master data management, they first think of customers and products. But master data encompasses so much more than data about customers and products. Challenges of Master Data Management A decade ago, master data management (MDM) was a much simpler proposition than it is today.
In recent years, Meta’s data management systems have evolved into a composable architecture that creates interoperability, promotes reusability, and improves engineering efficiency. Data is at the core of every product and service at Meta.
With so much riding on the efficiency of ETL processes for data engineering teams, it is essential to take a deep dive into the complex world of ETL on AWS to take your data management to the next level. This is particularly useful for companies that need to process data in near-real-time.
Data Management A tutorial on how to use VDK to perform batch data processing. Versatile Data Kit (VDK) is an open-source data ingestion and processing framework designed to simplify data management complexities.
In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any re-work. What are the techniques/technologies that teams might use to optimize or scale out their data processing workflows?
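As a rough illustration of the "no re-work" idea, Bodo's documented entry point is a JIT decorator applied to ordinary pandas code; the file path and column names below are hypothetical, so treat this as a sketch rather than a definitive usage pattern.

```python
# Minimal sketch: Bodo compiles plain pandas code for parallel execution.
import bodo
import pandas as pd

@bodo.jit
def daily_totals(path):
    # The read and the groupby are parallelized across cores/nodes by Bodo.
    df = pd.read_parquet(path)
    return df.groupby("day")["amount"].sum()

print(daily_totals("sales.parquet"))  # hypothetical dataset
```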
This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable. Meta’s Data Infrastructure teams have been rethinking how data management systems are designed.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
In this episode Wes McKinney shares the ways that Arrow and its related projects are improving the efficiency of data systems and driving their next stage of evolution. Can you describe what you are building at Voltron Data and the story behind it?
Secure, Real-Time Insights: Combine robust governance with real-time analytics for efficient, secure data management and AI-driven insights. For example: An AWS customer using Cloudera for hybrid workloads can now extend analytics workflows to Snowflake, gaining deeper insights without moving data across infrastructures.
We’ll also introduce OpenHouse’s control plane, specifics of the deployed system at LinkedIn including our managed Iceberg lakehouse, and the impact and roadmap for future development of OpenHouse, including a path to open source. Managed Iceberg Lakehouse At LinkedIn, OpenHouse tables are persisted on HDFS in Iceberg table format.
To overcome these hurdles, CTC moved its processing off of managed Spark and onto Snowflake, where it had already built its data foundation. Thanks to the reduction in costs, CTC now maximizes data to further innovate and increase its market-making capabilities.
AI-powered data engineering solutions make it easier to streamline the data management process, which helps businesses find useful insights with little to no manual work. Real-time data processing has emerged. The demand for real-time data handling is expected to increase significantly in the coming years.
Examples include “reduce data processing time by 30%” or “minimize manual data entry errors by 50%.” Deploy DataOps DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. How effective are your current data workflows?
Looking for an efficient tool for streamlining and automating your data processing workflows? Let's consider an example of a data processing pipeline that involves ingesting data from various sources, cleaning it, and then performing analysis. Airflow operators hold the data processing logic.
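A minimal sketch of that ingest-clean-analyze pipeline as an Airflow DAG (assuming Airflow 2.x), with a PythonOperator holding each step's logic; the DAG id, task names, and function bodies are placeholders, not a definitive implementation.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():   # e.g., pull raw records from source systems
    ...

def clean():    # e.g., drop duplicates, normalize types
    ...

def analyze():  # e.g., compute aggregates for reporting
    ...

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Each operator wraps one stage's processing logic.
    t1 = PythonOperator(task_id="ingest", python_callable=ingest)
    t2 = PythonOperator(task_id="clean", python_callable=clean)
    t3 = PythonOperator(task_id="analyze", python_callable=analyze)
    t1 >> t2 >> t3  # run the stages in order
```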
Learn all about Azure ETL Tools in minutes with this quick guide, showcasing the top 7 Azure tools with their key features, pricing, and pros/cons for your data processing needs. Many are turning to Azure ETL tools for their simplicity and efficiency, offering a seamless experience for easy data extraction, transformation, and loading.
Summary The customer data platform is a category of services that was developed early in the evolution of the current era of cloud services for data processing. Can you describe what you mean by a "composable CDP"? What are some of the key ways that it differs from the ways that we think of a CDP today?
Snowflake Data Marketplace gives users rapid access to various third-party data sources. Moreover, numerous sources offer unique third-party data that is instantly accessible when needed. Snowflake's machine learning partners transfer most of their automated feature engineering down into Snowflake's cloud data platform.
According to the Data Management Body of Knowledge, a Data Architect "provides a standard common business vocabulary, expresses strategic requirements, outlines high-level integrated designs to meet those requirements, and aligns with enterprise strategy and related business architecture."
The Snowflake Native App Framework enables us to develop and deploy data-intensive applications directly within the Snowflake ecosystem. This integration allows us to leverage Snowflake's robust data processing and storage features, enabling our AI-driven compliance and quality management tools to operate efficiently and at scale.
This article will explore the top seven data warehousing tools that simplify the complexities of data storage, making it more efficient and accessible. So, read on to discover these essential tools for your data management needs. Table of Contents What are Data Warehousing Tools? Why Choose a Data Warehousing Tool?
Summary Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. Data observability has been gaining adoption for a number of years now, with a large focus on data warehouses.
Internally, banks are using AI to reduce the burden of data management, including data lineage and data quality controls, or drive efficiencies with business intelligence, particularly in call centers. Commercially, we heard AI use cases around treasury services, fraud detection and risk analytics.
Since 5G networks began rolling out commercially in 2019, telecom carriers have faced a wide range of new challenges: managing high-velocity workloads, reducing infrastructure costs, and adopting AI and automation. From customer service to network management, AI-driven automation will transform the way carriers run their businesses.
Understanding this framework offers valuable insights into team efficiency, operational excellence, and data quality. Process-centric data teams focus their energies predominantly on orchestrating and automating workflows. The path to better data management is accessible and rewarding, regardless of your starting point.
The role of an ETL developer is to extract data from multiple sources, transform it into a usable format and load it into a data warehouse or any other destination database. ETL developers are the backbone of a successful data management strategy as they ensure that the data is consistent and accurate for data-driven decision-making.
It employs Snowpark Container Services to build scalable AI/ML models for satellite data processing and Snowflake AI/ML functions to enable advanced analytics and predictive insights for satellite operators.
Advanced Data Transformation Techniques For data engineers ready to push the boundaries, advanced data transformation techniques offer the tools to tackle complex data challenges and drive innovation. Automated testing and validation steps can also streamline transformation processes, ensuring reliable outcomes.
Summary Real-time data processing has steadily been gaining adoption due to advances in the accessibility of the technologies involved. To bring streaming data in reach of application engineers Matteo Pelati helped to create Dozer. What was your decision process for building Dozer as open source?
In this blog, you’ll build a complete ETL pipeline in Python to perform data extraction from the Spotify API, followed by data manipulation and transformation for analysis. You’ll walk through each stage of the data processing workflow, similar to what’s used in production-grade systems.
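For a sense of what the extract and transform stages might look like, here is a hedged sketch using the requests and pandas libraries against Spotify's recently-played endpoint; obtaining the OAuth token is elided, and the column selection is illustrative.

```python
import requests
import pandas as pd

TOKEN = "..."  # obtained separately via Spotify's OAuth flow

# Extract: pull recently played tracks from the Spotify Web API.
resp = requests.get(
    "https://api.spotify.com/v1/me/player/recently-played",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

# Transform: flatten the nested JSON payload into a tabular frame.
rows = [
    {
        "track": item["track"]["name"],
        "artist": item["track"]["artists"][0]["name"],
        "played_at": item["played_at"],
    }
    for item in resp.json()["items"]
]
df = pd.DataFrame(rows)
df["played_at"] = pd.to_datetime(df["played_at"])
print(df.head())
```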
He recently wrote a book on effective patterns for Pandas code, and in this episode he shares advice on how to write efficient data processing routines that will scale with your data volumes, while being understandable and maintainable. What are the main tasks that you have seen Pandas used for in a data engineering context?
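One example of the kind of efficiency pattern in that toolbox (not necessarily one from the book itself): converting repeated strings to pandas' categorical dtype and chaining operations, which cuts memory use while keeping the pipeline readable. The dataset and column names here are made up.

```python
import pandas as pd

# A synthetic frame with one million rows of repetitive string data.
df = pd.DataFrame({
    "city": ["NYC", "SF", "NYC", "LA"] * 250_000,
    "sales": range(1_000_000),
})

result = (
    df.astype({"city": "category"})   # categories are far smaller than object strings
      .groupby("city", observed=True)["sales"]
      .sum()
)
print(result)
```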
To do that, they must use data visualization tools such as Microsoft Power BI, Tableau, etc., to create visualizations that narrate the characteristics of the data at hand. Real-time data processing frameworks are used to process data streams and handle data as it is generated.
Apache Hive and Apache Spark are the two popular Big Data tools available for complex data processing. To effectively utilize the Big Data tools, it is essential to understand the features and capabilities of the tools. Spark SQL, for instance, enables structured data processing with SQL.
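To make the Spark SQL point concrete, here is a small self-contained PySpark example of querying a DataFrame with plain SQL; the table and column names are invented for illustration.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("spark-sql-demo").getOrCreate()

# Register an in-memory DataFrame as a temporary SQL view.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.createOrReplaceTempView("people")

# Structured processing with ordinary SQL over the view.
spark.sql("SELECT name FROM people WHERE age > 30").show()
```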
This emphasis on simplicity and ease of use in workload management streamlines operations and reduces complexity. Teradata Block File System (BFS) enhances data domain isolation by providing a high-performance, scalable storage solution that supports efficient data management and retrieval.
Data Engineers are critical hires at Amazon. They must have a good command of SQL and Python to work on complex datasets, along with experience working on big data processing frameworks like Apache Spark, Hadoop, and cloud technologies. Amazon Data Engineer Data Management Questions Q9.
For instance, a cloud engineer is responsible for automating manual processes, architecting distributed systems and data stores, and building data processing systems and resilient streaming analytics systems. Moreover, you must pass a two-hour exam to get certified as a Google Data Engineer.
And many such customers are enabling their business owners or data stewards who are closest to the data and processes as citizen developers of automation solutions for those business areas. For example, SAP ERP master data processes are complex and often highly data intensive.
Its Thrift interface acts as a bridge for third-party tools to access Hive metadata, enhancing data management capabilities. Hive Query Language (HiveQL) HiveQL is a query language in Apache Hive designed for querying and analyzing structured data stored in Hadoop, especially in HDFS.
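As a brief illustration, here is a HiveQL aggregation issued from Python via the PyHive client; the host, table, and partition column are placeholders, and the same statement could just as well run in the Hive CLI or Beeline.

```python
from pyhive import hive

# Connect to a HiveServer2 instance (hostname is hypothetical).
conn = hive.connect(host="hive-server.example.com", port=10000)
cur = conn.cursor()

# A typical HiveQL query over partitioned data in HDFS.
cur.execute(
    "SELECT page, COUNT(*) AS hits "
    "FROM web_logs WHERE dt = '2024-01-01' "
    "GROUP BY page ORDER BY hits DESC LIMIT 10"
)
for row in cur.fetchall():
    print(row)
```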
The data is there, it’s just not FAIR: Findable, Accessible, Interoperable and Reusable. Defining FAIR data and its applications for life sciences FAIR was a term coined in 2016 to help define good data management practices within the scientific realm. The principles emphasize machine-actionability, i.e., the capacity of computational systems to find, access, interoperate with, and reuse data with minimal human intervention.
Snowflake, on the other hand, has not only been serverless since our founding but also provides a fully managed service that is truly easy, connected across your data estate and trusted by thousands of customers. We want our data engineers to spend their time innovating and solving hard problems, not maintaining platforms.
A data warehouse acts as a single source of truth for an organization’s data, providing a unified view of its operations and enabling data-driven decision-making. A data warehouse enables advanced analytics, reporting, and business intelligence. Data integrations and pipelines can also impact latency.
It offers specialized ETL data extractions tailored to the needs of IT developers. Talend ETL Products Below are Talend’s four powerful open-source tools that help businesses level up their big data management and ETL activities. Talend is an open-source tool that supports data integration and management.
Apache Iceberg is an open-source table format designed to handle petabyte-scale analytical datasets efficiently on cloud object stores and distributed data systems. Apache Iceberg tables thus represent a fundamental shift in how structured and unstructured data is managed in the cloud.
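A minimal sketch of what working with an Iceberg table looks like from PySpark, assuming the Iceberg Spark runtime jar is on the classpath; the catalog name, warehouse path, and table schema are illustrative.

```python
from pyspark.sql import SparkSession

# Register a Hadoop-type Iceberg catalog named "demo" (names are made up).
spark = (
    SparkSession.builder.appName("iceberg-demo")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# Create, populate, and read back an Iceberg table via plain SQL.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, ts TIMESTAMP) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp())")
spark.sql("SELECT * FROM demo.db.events").show()
```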