Summary: Data processing technologies have dramatically improved in sophistication and raw throughput. Unfortunately, the volumes of data being generated continue to double, requiring further advances in platform capabilities to keep up.
In this edition, we talk to Richard Meng, co-founder and CEO of ROE AI, a startup that empowers data teams to extract insights from unstructured, multimodal data including documents, images and web pages using familiar SQL queries. What inspires you as a founder? First, Snowflake has enabled us to strengthen user trust in our app.
The Critical Role of AI Data Engineers in a Data-Driven World How does a chatbot seamlessly interpret your questions? The answer lies in unstructured data processing—a field that powers modern artificial intelligence (AI) systems. Commitment to promoting fairness and transparency in AI data workflows.
Examples include “reduce data processing time by 30%” or “minimize manual data entry errors by 50%.” Deploy DataOps. DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. How effective are your current data workflows?
AI-powered data engineering solutions make it easier to streamline the data management process, which helps businesses find useful insights with little to no manual work. Real-time data processing has emerged. The demand for real-time data handling is expected to increase significantly in the coming years.
Advanced Data Transformation Techniques For data engineers ready to push the boundaries, advanced data transformation techniques offer the tools to tackle complex data challenges and drive innovation.
Summary: Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. Data lakes are notoriously complex.
The data, originating from different formats and sources, requires consolidation into Snowflake tables for comprehensive analysis. Therefore, Snowpark, with its capabilities in simplifying complex data workflows, becomes instrumental in achieving this objective. The journey begins with customer invoice data stored in a CSV file.
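The consolidation step described above can be sketched without a live Snowflake connection. The following standard-library example shows the normalization that would happen before rows are loaded into a table; the column names (`invoice_id`, `amount`) are hypothetical, and in the article's setup the output would be written via Snowpark rather than returned in memory.

```python
# Minimal sketch: parse raw invoice CSV text and coerce amounts to floats
# so the rows are ready for loading into a warehouse table.
import csv
import io

def normalize_invoices(csv_text: str) -> list[dict]:
    """Parse invoice CSV and normalize each row for loading."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append({
            "invoice_id": row["invoice_id"].strip(),
            "amount": float(row["amount"]),
        })
    return rows

raw = "invoice_id,amount\nINV-001, 120.50\nINV-002,75\n"
print(normalize_invoices(raw))
```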
Schedule data ingestion, processing, model training and insight generation to enhance efficiency and consistency in your data processes. That’s why we partner with Hex, to provide data teams with access to best-in-class tools to drive more decisions for the business and more value from their data.
I finally found a good critique that discusses its flaws, such as multi-hop architecture, inefficiencies, high costs, and difficulties maintaining data quality and reusability. The article advocates for a "shift left" approach to data processing, improving data accessibility, quality, and efficiency for operational and analytical use cases.
Matt Harrison is a Python expert with a long history of working with data who now spends his time on consulting and training. What are some of the utility features that you have found most helpful for data processing? Pandas is a tool that spans data processing and data science.
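As an illustration of the kind of pandas utility features the excerpt alludes to, method chaining with `assign()` and `pipe()` keeps multi-step processing readable. The column names below are invented for the example.

```python
# Chaining utilities in pandas: pipe() lets you insert a named processing
# step into a chain, and assign() adds a derived column without mutation.
import pandas as pd

def add_totals(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(total=df["price"] * df["qty"])

orders = pd.DataFrame({"price": [2.0, 3.0], "qty": [3, 1]})
result = orders.pipe(add_totals)
print(result["total"].tolist())  # → [6.0, 3.0]
```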
What are the different concerns that need to be included in a stack that supports fully automated data workflows? There was recently an interesting article suggesting that the "left-to-right" approach to data workflows is backwards.
To prevent this issue, we built verification in the post-processing stage to ensure that the user ID column in the data matches the identifier for the user whose logs we are generating.
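The verification step described above can be sketched in a few lines. The field names (`user_id`, `expected_user`) are assumptions for illustration; the idea is simply that every row in the generated logs must belong to the expected user.

```python
# Minimal sketch of a post-processing verification: raise if any row's
# user ID column does not match the user whose logs are being generated.
def verify_user_ids(rows: list[dict], expected_user: str) -> None:
    mismatched = [r for r in rows if r.get("user_id") != expected_user]
    if mismatched:
        raise ValueError(f"{len(mismatched)} rows belong to a different user")

logs = [{"user_id": "u42", "event": "login"}, {"user_id": "u42", "event": "click"}]
verify_user_ids(logs, "u42")  # passes silently
```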
Since all of Fabric’s tools run natively on OneLake, real-time performance without data duplication is possible in Direct Lake mode. Because of the architecture’s ability to abstract infrastructure complexity, users can focus solely on data workflows.
In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Pipelines After Psyberg Let’s explore how different modes of Psyberg could help with a multistep data pipeline. Stay tuned for a new post on this!
DataOps improves the robustness, transparency and efficiency of data workflows through automation. For example, DataOps can be used to automate data integration. Previously, the consulting team had been using a patchwork of ETL to consolidate data from disparate sources into a data lake.
This methodology emphasizes automation, collaboration, and continuous improvement, ensuring faster, more reliable data workflows. With data workflows growing in scale and complexity, data teams often struggle to keep up with the increasing volume, variety, and velocity of data. Let’s dive in!
Managing and orchestrating data workflows efficiently is crucial in today’s data-driven world. As the amount of data constantly increases with each passing day, so does the complexity of the pipelines handling such data processes.
This blog explores the world of open source data orchestration tools, highlighting their importance in managing and automating complex data workflows. From Apache Airflow to Google Cloud Composer, we’ll walk you through ten powerful tools to streamline your data processes, enhance efficiency, and scale your growing needs.
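At their core, orchestrators like Apache Airflow run tasks in dependency order. This toy sketch only resolves order with the standard library's `graphlib`; real tools add scheduling, retries, and distributed execution, so this is an illustration of the concept, not any tool's API.

```python
# Toy orchestrator: execute callables in an order that respects their
# declared dependencies (deps maps task -> set of prerequisite tasks).
from graphlib import TopologicalSorter

def run_pipeline(tasks: dict, deps: dict) -> list[str]:
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        tasks[name]()
    return order

ran = []
tasks = {n: (lambda n=n: ran.append(n)) for n in ("extract", "transform", "load")}
deps = {"transform": {"extract"}, "load": {"transform"}}
print(run_pipeline(tasks, deps))  # → ['extract', 'transform', 'load']
```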
Start small, then scale. With data workflows growing in scale and complexity, data teams often struggle to keep up with the increasing volume, variety, and velocity of data. This is where DataOps comes in: a methodology designed to streamline and automate data workflows, ensuring faster and more reliable data delivery.
From there, you can address more complex use cases, such as creating a 360-degree view of customers by integrating systems across CRM, ERP, marketing applications, social media handles and other data sources.
AI-driven data quality workflows deploy machine learning to automate data cleansing, detect anomalies, and validate data. Integrating AI into data workflows ensures reliable data and enables smarter business decisions. Data quality is the backbone of successful data engineering projects.
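A minimal stand-in for the automated anomaly detection mentioned above: real AI-driven checks use learned models, but a z-score rule is the simplest illustration of flagging values that deviate too far from the rest of a column.

```python
# Flag values more than `threshold` sample standard deviations from the mean.
from statistics import mean, stdev

def flag_anomalies(values: list[float], threshold: float = 3.0) -> list[float]:
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]

readings = [10.0, 10.2, 9.9, 10.1, 10.0, 55.0]
print(flag_anomalies(readings, threshold=1.5))
```

In a data quality workflow this check would run after ingestion and either quarantine the flagged rows or fail the pipeline run.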
DataOps, short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. Accelerated Data Analytics DataOps tools help automate and streamline various data processes, leading to faster and more efficient data analytics.
Furthermore, Striim also supports real-time data replication and real-time analytics, which are both crucial for your organization to maintain up-to-date insights. By efficiently handling data ingestion, this component sets the stage for effective data processing and analysis. Are we using all the data or just a subset?
Hadoop and Spark are the two most popular platforms for Big Data processing. They both enable you to deal with huge collections of data no matter its format — from Excel tables to user feedback on websites to images and video files. Obviously, Big Data processing involves hundreds of computing units.
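Both Hadoop and Spark build on the map/shuffle/reduce pattern. This pure-Python word count mimics that pattern on one machine to show the idea; on a real cluster the map and reduce phases are spread across many nodes.

```python
# MapReduce-style word count: map each line to partial counts, then
# reduce the partial counts into a single total.
from collections import Counter
from functools import reduce

def map_phase(line: str) -> Counter:
    return Counter(line.split())

def reduce_phase(a: Counter, b: Counter) -> Counter:
    return a + b

lines = ["big data big", "data pipelines"]
counts = reduce(reduce_phase, map(map_phase, lines))
print(dict(counts))  # → {'big': 2, 'data': 2, 'pipelines': 1}
```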
DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows.
Data orchestration is the process of efficiently coordinating the movement and processing of data across multiple, disparate systems and services within a company. It automates and optimizes data processes, reducing manual effort and the likelihood of errors. But let’s step back for a second.
RETRY LAST: In modern data workflows, tasks are often interdependent, forming complex task chains. Ensuring the reliability and resilience of these workflows is critical, especially when dealing with production data pipelines.
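The retry idea behind resilient task chains can be sketched in plain Python. Note this is only an analogy: Snowflake's RETRY LAST operates on task graphs in the warehouse, while the helper below simply reruns a single failing step a bounded number of times.

```python
# Retry a flaky task up to `attempts` times, re-raising on final failure.
import time

def run_with_retries(task, attempts: int = 3, delay: float = 0.0):
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == attempts:
                raise
            time.sleep(delay)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(run_with_retries(flaky))  # → ok (after two failed attempts)
```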
The article details how bypassing intermediate storage steps reduces latency and improves data processing speed. The approach highlights the importance of streamlining data workflows for faster machine learning model training and deployment.
Additionally, they offer a way to keep your transformed data up to date without redoing the entire transformation process. By using DBT materializations, you can streamline your data workflows and focus on gaining insights rather than spending excessive time on repetitive calculations.
Composable Analytics — A DataOps Enterprise Platform with built-in services for data orchestration, automation, and analytics.
Reflow — A system for incremental data processing in the cloud.
Dagster / ElementL — A data orchestrator for machine learning, analytics, and ETL.
IPS provides incremental processing support with data accuracy, data freshness, and backfill for users, and addresses many of the challenges in workflows. IPS enables users to continue using their existing data processing patterns with minimal changes. Note that backfill support is skipped in this blog.
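The core of incremental processing as described above can be sketched with a high-water mark: each run handles only records newer than the last processed timestamp instead of reprocessing everything. The field name `ts` and the tuple return shape are assumptions for illustration, not IPS's actual interface.

```python
# Minimal incremental-processing sketch: filter to records past the
# watermark, then advance the watermark for the next run.
def process_incrementally(records: list[dict], watermark: int) -> tuple[list[dict], int]:
    new = [r for r in records if r["ts"] > watermark]
    new_watermark = max((r["ts"] for r in new), default=watermark)
    return new, new_watermark

data = [{"ts": 1}, {"ts": 2}, {"ts": 3}]
batch, wm = process_incrementally(data, watermark=1)
print(len(batch), wm)  # → 2 3
```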
In the same way, a DataOps engineer designs the data assembly line that enables data scientists to derive insights from data analytics faster and with fewer errors. DataOps engineers improve the speed and quality of the data development process by applying DevOps principles to data workflows, an approach known as DataOps.
These technologies are increasingly automating processes like ETL, improving data quality management, and evolving the landscape of data tools. Integrating AI into data workflows is not just a trend but a paradigm shift, making data processes more efficient and intelligent.
Overcoming Challenges to Achieve Data Timeliness Despite the clear benefits of timely data, organizations often encounter several hurdles: Complex Data Pipelines: Intricate data pipelines that involve multiple stages and dependencies can hinder smooth data flow.
Evolution of Data Lake Technologies The data lake ecosystem has matured significantly in 2024, particularly in table formats and storage technologies. Query Optimization and Cost Attribution By optimizing their most expensive pipelines , Medium's engineering team demonstrated significant cost savings in their Snowflake environment.
Snowflake’s Data Marketplace: Enriches data pipelines with external data sources, providing access to a diverse range of datasets and services that can be seamlessly integrated into your analytics and data processing workflows and combined to create custom data workflows.
DuckDB’s parallel execution capabilities can help DBAs improve the performance of data processing tasks. Researchers : Academics and researchers working with large volumes of data use DuckDB to process and analyze their data more efficiently. What makes DuckDB different?
Can you talk about some of the technology that helps make managing live streaming data possible? Cloudera DataFlow offers the capability for Edge to cloud streaming data processing. This type of end-to-end data processing that starts at the Edge and ends in the cloud is made possible by using Apache NiFi.
As an Azure Data Engineer, you will be expected to design, implement, and manage data solutions on the Microsoft Azure cloud platform. You will be in charge of creating and maintaining data pipelines, data storage solutions, data processing, and data integration to enable data-driven decision-making inside a company.
It enhances data quality, governance, and optimization, making data retrieval more efficient and enabling powerful automation in data engineering processes. As practitioners using metadata to fuel data teams, we at Ascend understand the critical role it plays in organizing, managing, and optimizing data workflows.
At Ascend, we recognize the immense potential in integrating these advanced AI capabilities into our platform, enabling smarter applications and more efficient data workflows. Snowflake’s investment in expanding data engineering capabilities is a game-changer.
The Five Use Cases in Data Observability: Mastering Data Production (#3) Introduction Managing the production phase of data analytics is a daunting challenge. Overseeing multi-tool, multi-dataset, and multi-hop data processes ensures high-quality outputs.