These tools can be called by LLM systems to learn about your data and metadata. As with any AI workflow, take appropriate caution before giving them access to production systems and data. Note that the MCP includes functionality for both dbt Cloud and dbt Core users.
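To make the wiring a bit more concrete, here is a rough, hypothetical sketch of how an MCP client is typically pointed at a local MCP server. The launcher command, package name, and environment variable names below are placeholders, not the actual dbt MCP configuration.

```python
import json
from pathlib import Path

# Hypothetical example: many MCP clients read a JSON config that maps a server
# name to the command used to launch it. Every name and env var here is a placeholder.
mcp_config = {
    "mcpServers": {
        "dbt": {
            "command": "uvx",                         # placeholder launcher
            "args": ["your-dbt-mcp-server"],          # placeholder package name
            "env": {
                "DBT_PROJECT_DIR": "/path/to/dbt/project",  # placeholder
                "DBT_TOKEN": "<service-token>",             # placeholder
            },
        }
    }
}

# Write the snippet wherever your MCP client expects its configuration.
Path("mcp_config.json").write_text(json.dumps(mcp_config, indent=2))
print(json.dumps(mcp_config, indent=2))
```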
TL;DR After setting up and organizing the teams, we describe four topics that make data mesh a reality. How can we interoperate between the data domains? How do we govern all these data products and domains? We illustrate each with our technical choices and the Google Cloud Platform services we use.
Deploy DataOps. DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration, and improve the agility of data teams. How effective are your current data workflows?
Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information.
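As a rough illustration of what tightly coupling metadata to raw information looks like with linked data technologies, here is a small RDF sketch using rdflib. The namespace, identifiers, and property names are invented for the example.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS, XSD

# Hypothetical namespace and identifiers, invented for the illustration.
EX = Namespace("http://example.org/")

g = Graph()
g.bind("ex", EX)

reading = URIRef(EX["reading/42"])

# The raw value and its metadata (unit, source, description) live in the same
# graph, so the semantics travel with the data instead of in a separate catalog.
g.add((reading, RDF.type, EX.TemperatureReading))
g.add((reading, EX.value, Literal(21.7, datatype=XSD.decimal)))
g.add((reading, EX.unit, Literal("celsius")))
g.add((reading, EX.recordedBy, EX["sensor/buildingA-07"]))
g.add((reading, RDFS.comment, Literal("Lobby sensor, 5-minute average")))

print(g.serialize(format="turtle"))
```

Any consumer that understands the vocabulary can resolve what the value means without consulting a separate system, which is the "tight coupling" the episode describes.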
In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in applying these constraints to your data workflows. Atlan is the metadata hub for your data ecosystem. Missing data? Stale dashboards?
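As a minimal sketch of what such constraints look like in practice, here is a Great Expectations example against a toy pandas frame. The Great Expectations API has changed considerably across versions; this assumes the classic pandas-wrapping interface, and the data and expectation choices are illustrative only.

```python
import great_expectations as ge
import pandas as pd

# Toy data standing in for a pipeline output.
df = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "amount":   [19.99, 5.00, 42.50, 7.25],
})

# Wrap the frame so expectations can be declared against it (classic API).
batch = ge.from_pandas(df)

# Declare constraints ("expectations") the data must satisfy.
batch.expect_column_values_to_not_be_null("order_id")
batch.expect_column_values_to_be_between("amount", min_value=0, max_value=10_000)

# Validate and fail loudly if any expectation is broken.
results = batch.validate()
assert results["success"], results
```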
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.
Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. Materialize ([link]).
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. RudderStack helps you build a customer data platform on your warehouse or data lake.
Summary The life sciences industry has seen incredible growth in scale and sophistication, along with the advances in data technology that make it possible to analyze massive amounts of genomic information. Today’s episode is sponsored by Prophecy.io – the low-code data engineering platform for the cloud.
Today’s episode is sponsored by Prophecy.io – the low-code data engineering platform for the cloud. Prophecy provides an easy-to-use visual interface to design & deploy data pipelines on Apache Spark & Apache Airflow. You can observe your pipelines with built-in metadata search and column-level lineage.
It builds your customer data warehouse and your identity graph on your data warehouse, with support for Snowflake, Google BigQuery, Amazon Redshift, and more. Their SDKs and plugins make event streaming easy, and their integrations with cloud applications like Salesforce and ZenDesk help you go beyond event streaming.
Summary The data industry is changing rapidly, and one of the most active areas of growth is automation of data workflows. Taking cues from the DevOps movement of the past decade, data professionals are orienting around the concept of DataOps.
A data catalog as a passive web portal to display metadata requires significant rethinking to fit modern data workflows, not just adding “modern” as a prefix. I know that is an expensive statement to make 😊 To be fair, I’m a big fan of data catalogs, or metadata management, to be precise.
Grab’s Metasense , Uber’s DataK9 , and Meta’s classification systems use AI to automatically categorize vast data sets, reducing manual efforts and improving accuracy. Beyond classification, organizations now use AI for automated metadata generation and data lineage tracking, creating more intelligent data infrastructures.
Unity Catalog is Databricks’ governance solution; it integrates with Databricks workspaces and provides a centralized platform for managing metadata, data access, and security. Improved Data Discovery: The tagging and documentation features in Unity Catalog facilitate better data discovery.
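A minimal sketch of the tagging and documentation features mentioned above, assuming a Databricks notebook where `spark` is already provided and Unity Catalog is enabled. The three-level table name, column name, and tag values are placeholders.

```python
# Minimal sketch: documenting and tagging a table in Unity Catalog from a
# Databricks notebook. Assumes `spark` is the SparkSession supplied by the
# Databricks runtime; names below are placeholders.

table = "main.sales.orders"  # catalog.schema.table (placeholder)

# Human-readable documentation that surfaces in data discovery.
spark.sql(f"COMMENT ON TABLE {table} IS 'Orders placed through the web store'")

# Tags used to filter and find data (e.g. sensitivity, domain ownership).
spark.sql(f"ALTER TABLE {table} SET TAGS ('domain' = 'sales', 'contains_pii' = 'true')")
spark.sql(f"ALTER TABLE {table} ALTER COLUMN customer_email SET TAGS ('pii' = 'email')")
```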
An HDFS master node, called a NameNode, keeps metadata with critical information about system files (like their names, locations, number of data blocks in the file, etc.) and keeps track of storage capacity, the volume of data being transferred, etc. Among the solutions facilitating data management is the Apache Hadoop ecosystem.
It facilitates data synchronisation, replication, real-time analytics, and event-driven processing, empowering data-driven decision-making and operational efficiency. These additional columns store metadata like timestamps, user IDs, and change types, ensuring granular change tracking and auditability.
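A rough pandas sketch of the change-tracking columns described above; the column names and values are illustrative rather than any specific tool's convention.

```python
from datetime import datetime, timezone

import pandas as pd

# Incoming change events from a source system (illustrative data).
changes = pd.DataFrame({
    "order_id": [101, 102, 101],
    "status":   ["created", "created", "shipped"],
})

# Append metadata columns so every row carries when, by whom, and how it changed.
changes["_change_type"] = ["INSERT", "INSERT", "UPDATE"]
changes["_changed_at"] = datetime.now(timezone.utc)
changes["_changed_by"] = "replication-service"

# Downstream consumers can audit or replay changes in order.
print(changes.sort_values("_changed_at"))
```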
dbt Cloud v1.1.36 - v1.1.37 Changelog + docs located here. The new model timing dashboard in the run detail page helps you quickly assess job composition, order, and duration to optimize your workflows and cut costs. The Model Timing tab in dbt Cloud highlights models taking particularly long to run. Want to know why?
One is data at rest, for example in a data lake, warehouse, or cloud storage; from there they can do analytics on this data, which is predominantly about what has already happened or about how to prevent something from happening in the future.
“Disruption slows as cloud and nonrelational technology take their place beside traditional approaches, the leaders extend their lead, and distributed data approaches solidify their place as a best practice for DMSA.” Cloudera believes disruption persists around multi-cloud. Why multi-cloud?
At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files. Table formats incorporate aspects like columns, rows, data types, and relationships, but can also include information about the structure of the data itself.
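To make the idea concrete, here is a deliberately simplified, invented sketch of the kind of metadata a table format keeps about its underlying files. Real formats such as Apache Iceberg, Delta Lake, or Hudi track far more detail, but the overall shape is similar.

```python
# Simplified, invented illustration of a table format's metadata layer:
# a schema plus a manifest of the data files that make up the current snapshot.
table_metadata = {
    "table": "analytics.orders",
    "schema": [
        {"name": "order_id",  "type": "long",          "nullable": False},
        {"name": "amount",    "type": "decimal(10,2)", "nullable": False},
        {"name": "placed_at", "type": "timestamp",     "nullable": False},
    ],
    "current_snapshot": {
        "snapshot_id": 3,
        "files": [
            {"path": "s3://bucket/orders/part-000.parquet", "rows": 120_000},
            {"path": "s3://bucket/orders/part-001.parquet", "rows": 98_500},
        ],
    },
}

# Query engines consult this layer to plan reads instead of listing raw files.
files = table_metadata["current_snapshot"]["files"]
total_rows = sum(f["rows"] for f in files)
print(f"{table_metadata['table']}: {total_rows} rows across {len(files)} files")
```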
Data orchestration is the process of efficiently coordinating the movement and processing of data across multiple, disparate systems and services within a company. So, why is data orchestration a big deal? Agility and Adaptability: As businesses grow and evolve, their data needs change.
DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows.
Editor’s Note: The current state of the Data Catalog. The results are out for our poll on the current state of data catalogs. The highlights are that 59% of folks think data catalogs are sometimes helpful. We saw in the Data Catalog poll how far it has to go to be helpful and active within a data workflow.
Disadvantages of a data lake are: it can easily become a data swamp; data has no versioning; the same data with incompatible schemas is a problem without versioning; it has no metadata associated; it is difficult to join the data. A data warehouse stores processed data, mostly structured data.
DataOps tools should provide a comprehensive data cataloging solution that allows organizations to create a centralized repository of their data assets, complete with metadata, data lineage information, and data samples. Genie manages and allocates resources for big data jobs.
Contents: The Basics; How Azure Data Factory Works: Quick Summary; Top Features of Azure Data Factory; Key Components of Azure Data Factory; Azure Data Factory Data Migration: Overview; Azure Data Factory: Top Use Cases; FAQs; Conclusion. What is Azure Data Factory?
As the volume and complexity of data continue to grow, organizations seek faster, more efficient, and cost-effective ways to manage and analyze data. In recent years, cloud-based data warehouses have revolutionized data processing with their advanced massively parallel processing (MPP) capabilities and SQL support.
Why Should You Get an Azure Data Engineer Certification? Becoming an Azure data engineer allows you to seamlessly blend the roles of a data analyst and a data scientist. One of the pivotal responsibilities is managing data workflows and pipelines, a core aspect of a data engineer's role.
One of the key elements of Azure Data Factory that permits data integration between various network environments is Integration Runtime. It offers the infrastructure needed to transfer data safely between cloud and on-site data storage. The three primary varieties are Azure, Azure-SSIS, and Self-hosted.
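As a rough sketch of how a self-hosted Integration Runtime might be registered programmatically, here is an example assuming the azure-mgmt-datafactory and azure-identity packages. The subscription, resource group, and factory names are placeholders, and exact model/class names can vary between SDK versions, so treat this as a guide rather than a definitive recipe.

```python
# Rough sketch: registering a self-hosted Integration Runtime via the Azure SDK.
# Resource names are placeholders; verify model names against your SDK version.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

subscription_id = "<subscription-id>"   # placeholder
resource_group = "my-rg"                # placeholder
factory_name = "my-data-factory"        # placeholder

client = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

ir = IntegrationRuntimeResource(
    properties=SelfHostedIntegrationRuntime(
        description="Runtime for moving data between on-site storage and the cloud"
    )
)

client.integration_runtimes.create_or_update(
    resource_group, factory_name, "self-hosted-ir", ir
)
```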
Follow Ravit on LinkedIn 5) Priya Krishnan Head of Product Management, Data and AI at IBM Priya is an innovative, customer-focused, data-driven product executive with over 16 years of experience in global product management, strategy, and GTM roles to commercialize and monetize in-demand enterprise solutions.
In the data world, this disruption manifested in the form of cloud computing with technologies such as Redshift, Snowflake, and Spark. When issues arise, there’s no need to mine and transfer diagnostic metadata or switch between tools and interfaces. Second, it enhances governance and security.
Here’s how Prefect, a Series B startup and creator of the popular data orchestration tool, harnessed the power of data observability to preserve headcount, improve data quality, and reduce time to detection and resolution for data incidents. This left Dylan’s team with a gap to fill.
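For readers unfamiliar with the tool itself, here is a minimal Prefect 2-style flow with retries and logging; the task bodies and flow name are made up for the sketch.

```python
from prefect import flow, task, get_run_logger


@task(retries=2, retry_delay_seconds=30)
def extract() -> list[dict]:
    # Stand-in for pulling records from a source system.
    return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]


@task
def load(records: list[dict]) -> int:
    logger = get_run_logger()
    logger.info("Loading %d records", len(records))
    # Stand-in for writing to a warehouse.
    return len(records)


@flow(name="daily-ingest")
def daily_ingest():
    records = extract()
    return load(records)


if __name__ == "__main__":
    daily_ingest()
```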
Data pipeline architecture typically consisted of hardcoded pipelines that cleaned, normalized, and transformed the data prior to loading into a database using an ETL pattern. With cost and physical compute/storage limitations largely lifted, data engineers started to optimize data pipeline architecture for speed and agility.
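A compact sketch of that classic hardcoded ETL pattern: clean and normalize first, then load into a database. The source file, column names, and target table are invented for the example.

```python
import sqlite3

import pandas as pd

# Extract: read a raw export (file and column names are illustrative).
raw = pd.read_csv("raw_orders.csv")

# Transform: clean and normalize before the data ever reaches the database.
raw = raw.dropna(subset=["order_id"])
raw["amount"] = raw["amount"].astype(float).round(2)
raw["country"] = raw["country"].str.upper().str.strip()

# Load: write the cleaned result into the target table.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("orders", conn, if_exists="replace", index=False)
```

The point of the contrast in the excerpt is that everything here is decided up front in code; modern ELT-style architectures instead load raw data first and defer transformation to the warehouse.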
Here’s how Gartner officially defines the category of data observability tools: “Data observability tools are software applications that enable organizations to understand the state and health of their data, data pipelines, data landscapes, data infrastructures, and the financial operational cost of the data across distributed environments.”
Accessible via a unified API, these new features enhance search relevance and are available on Elastic Cloud. The Elastic Stack: Elasticsearch is integral within analytics stacks, collaborating seamlessly with other tools developed by Elastic to manage the entire data workflow — from ingestion to visualization.
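A small sketch of the ingest-to-search loop with the official Python client (8.x-style API). The cluster URL, index name, and document fields are placeholders.

```python
from elasticsearch import Elasticsearch

# Connect to a cluster (URL and credentials are placeholders).
es = Elasticsearch("http://localhost:9200")

# Ingest: index a document.
es.index(
    index="articles",
    id="1",
    document={"title": "Observability for data pipelines", "views": 120},
)

# Make the document visible to search immediately (refresh is normally periodic).
es.indices.refresh(index="articles")

# Search: run a full-text query against the same index.
resp = es.search(index="articles", query={"match": {"title": "pipelines"}})
for hit in resp["hits"]["hits"]:
    print(hit["_source"]["title"])
```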
Kubernetes. Problem: As cloud applications arose, the infrastructure of virtual machines became more and more similar. Since containers are decoupled from a company’s architecture, they can move data from OS to OS and from cloud to cloud. Value: Catching data problems in real time avoids costly reruns and delays.
Secret Management: Securely handling credentials and sensitive information is non-negotiable in modern data engineering. Pipelines frequently need access to databases, APIs, and cloud services, which means handling API keys, passwords, and other secrets. Did yesterday’s data load complete successfully? The solution? Automation.
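A minimal sketch of keeping credentials out of pipeline code: read them from the environment, with an optional fallback to a dedicated secret manager. The environment variable and secret names are placeholders, and the fallback assumes boto3 is installed and an AWS Secrets Manager secret already exists.

```python
import json
import os


def get_database_password(secret_name: str = "prod/warehouse/password") -> str:
    """Prefer the environment; fall back to a dedicated secret manager."""
    # 1. Environment variable injected by the scheduler/CI system (placeholder name).
    password = os.environ.get("WAREHOUSE_PASSWORD")
    if password:
        return password

    # 2. Fallback: AWS Secrets Manager (assumes boto3 and an existing secret;
    #    both are assumptions for this sketch).
    import boto3

    client = boto3.client("secretsmanager")
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])["password"]


if __name__ == "__main__":
    # Never print real secrets; this only shows the lookup succeeded.
    print("Got a password of length", len(get_database_password()))
```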
DevOps tasks — for example, creating scheduled backups and restoring data from them. Airflow is especially useful for orchestrating Big Data workflows. Airflow is not a data processing tool by itself but rather an instrument to manage multiple components of data processing. One of Airflow’s own core components is its metadata database.
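A minimal Airflow 2.x-style DAG sketch showing how steps of such a workflow are declared and ordered; the schedule, task IDs, and shell commands are placeholders standing in for real backup tooling.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Airflow schedules and orders the steps; the heavy lifting happens in the
# tools each task invokes. Commands below are placeholders.
with DAG(
    dag_id="nightly_backup",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    dump = BashOperator(
        task_id="dump_database",
        bash_command="echo 'pg_dump ... (placeholder)'",
    )
    upload = BashOperator(
        task_id="upload_to_storage",
        bash_command="echo 'aws s3 cp ... (placeholder)'",
    )

    dump >> upload  # run the upload only after the dump succeeds
```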
AI-embedded PCs and devices with Neural Processing Units (NPUs) enable offline AI operations and improve data privacy. Meanwhile, innovations like Google’s Edge TPU will accelerate the shift toward energy-efficient edge computing, reducing dependency on centralized cloud infrastructures.