Summary Data processing technologies have dramatically improved in sophistication and raw throughput. Unfortunately, the volume of data being generated continues to double, requiring further advances in platform capabilities to keep up.
The Critical Role of AI Data Engineers in a Data-Driven World: How does a chatbot seamlessly interpret your questions? The answer lies in unstructured data processing, a field that powers modern artificial intelligence (AI) systems. Develop modular, reusable components for end-to-end AI pipelines.
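To make the idea of modular, reusable pipeline components concrete, here is a minimal sketch in Python; it is not taken from the article, and the `Pipeline` class, step functions, and sample records are illustrative assumptions.

```python
from typing import Callable, Iterable, List

# A pipeline step is any callable that takes records and returns records.
Step = Callable[[Iterable[dict]], Iterable[dict]]

class Pipeline:
    """Chain reusable steps into an end-to-end flow (illustrative sketch)."""
    def __init__(self, steps: List[Step]):
        self.steps = steps

    def run(self, records: Iterable[dict]) -> List[dict]:
        for step in self.steps:
            records = step(records)
        return list(records)

# Example reusable components for unstructured text (hypothetical).
def clean_text(records):
    for r in records:
        yield {**r, "text": r["text"].strip().lower()}

def drop_empty(records):
    return (r for r in records if r["text"])

if __name__ == "__main__":
    pipeline = Pipeline([clean_text, drop_empty])
    print(pipeline.run([{"text": "  Hello AI  "}, {"text": "   "}]))
```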
Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. We’ll answer the question, “What are data pipelines?”
As we look towards 2025, it’s clear that data teams must evolve to meet the demands of evolving technology and opportunities. In this blog post, we’ll explore key strategies that data teams should adopt to prepare for the year ahead. How effective are your current data workflows?
Tools like Python’s requests library or ETL/ELT tools can facilitate data enrichment by automating the retrieval and merging of external data (see the sketch below). Read more: Discover how to build a data pipeline in 6 steps. Data integration involves combining data from different sources into a single, unified view.
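As a rough illustration of the enrichment pattern mentioned above, the sketch below calls an external API with Python's requests library and merges the response into each record. The endpoint URL, parameters, and field names are placeholders, not details from the article.

```python
import requests

def enrich_customers(customers, api_url="https://api.example.com/companies"):
    """Fetch external attributes for each record and merge them in.
    The endpoint and response fields here are hypothetical placeholders."""
    enriched = []
    for customer in customers:
        resp = requests.get(api_url, params={"domain": customer["domain"]}, timeout=10)
        extra = resp.json() if resp.ok else {}
        # Merge external attributes without overwriting existing keys.
        enriched.append({**extra, **customer})
    return enriched

# Usage (hypothetical input): enrich_customers([{"id": 1, "domain": "acme.io"}])
```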
AI-powered data engineering solutions make it easier to streamline the data management process, which helps businesses find useful insights with little to no manual work. Real-time data processing has emerged: the demand for real-time data handling is expected to increase significantly in the coming years.
When implemented effectively, smart data pipelines seamlessly integrate data from diverse sources, enabling swift analysis and actionable insights. They empower data analysts and business users alike by providing critical information while protecting sensitive production systems. What is a Smart Data Pipeline?
Summary Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. Data lakes are notoriously complex.
Faster, easier AI/ML and data engineering workflows: explore, analyze, and visualize data using Python and SQL. Discover valuable business insights through exploratory data analysis. Develop scalable data pipelines and transformations for data engineering.
I finally found a good critique that discusses its flaws, such as multi-hop architecture, inefficiencies, high costs, and difficulties maintaining data quality and reusability. The article advocates for a "shift left" approach to data processing, improving data accessibility, quality, and efficiency for operational and analytical use cases.
Just as a watchmaker meticulously adjusts every tiny gear and spring in harmonious synchrony for flawless performance, modern data pipeline optimization requires a similar level of finesse and attention to detail. Learn how cost, processing speed, resilience, and data quality all contribute to effective data pipeline optimization.
In the previous installments of this series, we introduced Psyberg and delved into its core operational modes: Stateless and Stateful Data Processing. Now, let’s explore the state of our pipelines after incorporating Psyberg. This ensures that the next instance of the workflow will pick up newer updates.
These engineering functions are almost exclusively concerned with data pipelines, spanning ingestion, transformation, orchestration, and observation — all the way to data product delivery to the business tools and downstream applications. Pipelines need to grow faster than the cost to run them.
This not only jeopardizes the integrity and robustness of production environments but also compounds challenges for both data scientists and engineers. This article delves into the reasons behind our assertion: data science notebooks are not your best choice for production data pipelines. What Are Jupyter Notebooks?
In the modern world of data engineering, two concepts often find themselves in a semantic tug-of-war: data pipeline and ETL. Fast forward to the present day, and we now have data pipelines. Data ingestion is the first step of both ETL and data pipelines.
Matt Harrison is a Python expert with a long history of working with data who now spends his time on consulting and training. Today’s episode is sponsored by Prophecy.io – the low-code data engineering platform for the cloud. The only thing worse than having bad data is not knowing that you have it.
In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing: What is data pipeline architecture? Why is data pipeline architecture important?
For each data logs table, we initiate a new worker task that fetches the relevant metadata describing how to correctly query the data. Once we know what to query for a specific table, we create a task for each partition that executes a job in Dataswarm (our data pipeline system).
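Dataswarm is Meta's internal pipeline system, so the sketch below is only a generic stand-in for the same fan-out pattern: one worker per logs table to fetch metadata, then one task per partition, using Python's concurrent.futures. All function and field names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_table_metadata(table):
    """Placeholder: look up how to query this logs table (schema, partitions)."""
    return {"table": table, "partitions": [f"{table}/dt=2024-01-0{i}" for i in range(1, 4)]}

def process_partition(partition):
    """Placeholder for the per-partition job the pipeline system would run."""
    return f"processed {partition}"

def run(tables):
    with ThreadPoolExecutor() as pool:
        # One worker task per table to fetch metadata...
        metadata = list(pool.map(fetch_table_metadata, tables))
        # ...then one task per partition of every table.
        partitions = [p for m in metadata for p in m["partitions"]]
        return list(pool.map(process_partition, partitions))

print(run(["requests_log", "errors_log"]))
```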
Airflow — An open-source platform to programmatically author, schedule, and monitor data pipelines. Apache Oozie — An open-source workflow scheduler system to manage Apache Hadoop jobs. DBT (Data Build Tool) — A command-line tool that enables data analysts and engineers to transform data in their warehouse more effectively.
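For readers new to Airflow, a minimal DAG that programmatically authors and schedules two dependent tasks looks roughly like this sketch; the DAG id, schedule, and task bodies are illustrative, and exact parameter names vary slightly across Airflow versions.

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling raw records")

def transform():
    print("cleaning and loading records")

# Follows the Airflow 2.x API; the names below are illustrative only.
with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task  # transform runs only after extract succeeds
```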
Since all of Fabric’s tools run natively on OneLake, real-time performance without data duplication is possible in Direct Lake mode. Because of the architecture’s ability to abstract infrastructure complexity, users can focus solely on data workflows.
Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder. The only thing worse than having bad data is not knowing that you have it. Bigeye lets data teams measure, improve, and communicate the quality of their data to company stakeholders.
This methodology emphasizes automation, collaboration, and continuous improvement, ensuring faster, more reliable data workflows. With data workflows growing in scale and complexity, data teams often struggle to keep up with the increasing volume, variety, and velocity of data. Let’s dive in!
IPS provides incremental processing support with data accuracy, data freshness, and backfill for users, and addresses many of the challenges in workflows. IPS enables users to continue using their data processing patterns with minimal changes. Note that backfill support is skipped in this blog.
DataOps improves the robustness, transparency and efficiency of data workflows through automation. For example, DataOps can be used to automate data integration. Previously, the consulting team had been using a patchwork of ETL to consolidate data from disparate sources into a data lake.
Start small, then scale. With data workflows growing in scale and complexity, data teams often struggle to keep up with the increasing volume, variety, and velocity of data. This is where DataOps comes in: a methodology designed to streamline and automate data workflows, ensuring faster and more reliable data delivery.
DataOps, short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. Each type of tool plays a specific role in the DataOps process, helping organizations manage and optimize their data pipelines more effectively.
Managing and orchestrating data workflows efficiently is crucial in today’s data-driven world. As the amount of data constantly increases with each passing day, so does the complexity of the pipelines handling such data processes.
AI-driven data quality workflows deploy machine learning to automate data cleansing, detect anomalies, and validate data. Integrating AI into data workflows ensures reliable data and enables smarter business decisions. Data quality is the backbone of successful data engineering projects.
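As one hedged example of what such a workflow can look like, the sketch below flags anomalous rows with scikit-learn's IsolationForest; the column names, sample data, and contamination setting are assumptions rather than anything from the article.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

def flag_anomalies(df: pd.DataFrame, numeric_cols):
    """Mark rows whose numeric values look anomalous relative to the rest."""
    # contamination is the expected share of anomalous rows (placeholder value).
    model = IsolationForest(contamination=0.2, random_state=42)
    df = df.copy()
    df["is_anomaly"] = model.fit_predict(df[numeric_cols]) == -1
    return df

# Hypothetical usage on a small orders table; the last row should stand out.
orders = pd.DataFrame({"amount": [10, 12, 11, 9, 10_000], "items": [1, 2, 1, 1, 300]})
print(flag_anomalies(orders, ["amount", "items"]))
```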
From there, you can address more complex use cases, such as creating a 360-degree view of customers by integrating systems across CRM, ERP, marketing applications, social media handles and other data sources. This smooths out workflows and helps teams swiftly mitigate potential issues.
Here’s the deal: for data to truly drive your business forward, you need a reliable and scalable system to keep it moving without hiccups. In other words, you need data orchestration. In this article, we’ll break down what data orchestration is, its significance, and how it differs from data pipeline orchestration.
DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows.
RETRY LAST: In modern data workflows, tasks are often interdependent, forming complex task chains. Ensuring the reliability and resilience of these workflows is critical, especially when dealing with production data pipelines.
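The post concerns Snowflake task graphs, but the idea generalizes: retry a dependent chain from the last failed step instead of rerunning everything. The sketch below illustrates that pattern in plain Python with hypothetical step names; it is not the article's Snowflake syntax.

```python
def run_chain(steps, completed=None):
    """Run dependent steps in order, resuming from the first one not yet completed."""
    completed = set(completed or [])
    for name, fn in steps:
        if name in completed:
            continue  # already succeeded on a previous attempt; skip it
        fn()
        completed.add(name)
    return completed

# Hypothetical chain: if "load" fails, a retry resumes at "load", not at "extract".
steps = [("extract", lambda: print("extract")),
         ("transform", lambda: print("transform")),
         ("load", lambda: print("load"))]
done = run_chain(steps)
```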
Overcoming Challenges to Achieve Data Timeliness Despite the clear benefits of timely data, organizations often encounter several hurdles: Complex Data Pipelines: Intricate data pipelines that involve multiple stages and dependencies can hinder smooth data flow.
In the same way, a DataOps engineer designs the data assembly line that enables data scientists to derive insights from data analytics faster and with fewer errors. DataOps engineers improve the speed and quality of the data development process by applying DevOps principles to data workflows, an approach known as DataOps.
It emphasizes the importance of collaboration between different teams, such as data engineers, data scientists, and business analysts, to ensure that everyone has access to the right data at the right time. This includes data ingestion, processing, storage, and analysis.
That’s what we call a data pipeline. It could just as well be ‘ELT for Snowflake.’ The key takeaway is that these terms are representative of the actual activity being undertaken: the construction and management of data pipelines within the Snowflake environment.
It enhances data quality, governance, and optimization, making data retrieval more efficient and enabling powerful automation in data engineering processes. As practitioners using metadata to fuel data teams, we at Ascend understand the critical role it plays in organizing, managing, and optimizing data workflows.
Data Engineering is typically a software engineering role that focuses deeply on data – namely, data workflows, data pipelines, and the ETL (Extract, Transform, Load) process. What is the role of a Data Engineer? They are also accountable for communicating data trends.
At Ascend, we recognize the immense potential in integrating these advanced AI capabilities into our platform, enabling smarter applications and more efficient data workflows. Snowflake’s investment in expanding data engineering capabilities is a game-changer. Curious to see Cortex in action?
DuckDB’s parallel execution capabilities can help DBAs improve the performance of data processing tasks. Researchers: Academics and researchers working with large volumes of data use DuckDB to process and analyze their data more efficiently. What makes DuckDB different?
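A small sketch of how that looks from Python follows; the thread count and Parquet file name are placeholders. Once threads are configured, DuckDB parallelizes the scan and aggregation across cores on its own.

```python
import duckdb

con = duckdb.connect()           # in-memory database
con.execute("SET threads TO 8")  # let DuckDB use 8 cores (placeholder value)

# Aggregate a large Parquet file directly; the scan and group-by run in parallel.
result = con.execute("""
    SELECT category, COUNT(*) AS rows, AVG(amount) AS avg_amount
    FROM read_parquet('events.parquet')   -- hypothetical file
    GROUP BY category
""").df()
print(result)
```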
As an Azure Data Engineer, you will be expected to design, implement, and manage data solutions on the Microsoft Azure cloud platform. You will be in charge of creating and maintaining data pipelines, data storage solutions, data processing, and data integration to enable data-driven decision-making inside a company.
We have now migrated the data from our transactional database to the Snowflake data warehouse. Now we can easily create data streams and build a CDC mechanism in Snowflake. The operation succeeded, so let’s now verify this change in our Snowflake table.
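As a hedged sketch of that CDC setup, Snowflake's CREATE STREAM tracks row-level changes on a table so downstream steps can consume only the deltas. The connection parameters, table, and stream names below are placeholders.

```python
import snowflake.connector

# Connection parameters are placeholders; substitute your own account and credentials.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="my_wh", database="my_db", schema="public",
)
cur = conn.cursor()

# Track row-level changes (inserts/updates/deletes) on the migrated table.
cur.execute("CREATE OR REPLACE STREAM orders_stream ON TABLE orders")

# Later, consume only the changed rows captured since the stream was last read.
cur.execute("SELECT * FROM orders_stream")
for row in cur.fetchall():
    print(row)
```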