Fortunately, Teradata offers integrations with many modular tools that facilitate routine processes, allowing data engineers to focus on high-value tasks such as governance, data quality, and efficiency. `schema.yml`: a YAML file defining metadata, tests, and descriptions for the models in this directory.
Customer intelligence teams analyze reviews and forum comments to identify sentiment trends, while support teams process tickets to uncover product issues and surface gaps in the product roadmap. Meanwhile, operations teams use entity extraction on documents to automate workflows and enable metadata-driven analytical filtering.
Enhanced screenshot generation (#32193): improved Celery-based screenshot generation by fixing cache-key synchronization between frontend and worker processes, resolving dashboard screenshot caching issues. If you want to learn more about what's coming up in Theming, sign up for the webinar!
Generative AI Essentials for Business and Technical Decision Makers Source: aws.amazon.com/events/webinars/ Designed for business leaders and technical decision-makers, the Generative AI Essentials for Business and Technical Decision Makers certification provides a strategic overview of generative AI's role in driving organizational innovation.
Please reach out to me or any other Superset Committer on Slack to provide feedback on how we can improve this resource or this process. Nielsen was one of the early pioneers of building custom visualizations, and even joined us for a webinar on how they deploy these very plugins. We hope to see it flourish!
It covers nine categories: storage systems, data lake platforms, processing, integration, orchestration, infrastructure, ML/AI, metadata management, and analytics. [link] Sponsored: Webinar - The State of Airflow 2025. We asked 5,000+ data engineers how Airflow is shaping the modern DataOps landscape. The results?
This lack of awareness leads to undetected issues, reactive data cleansing, and costly downstream impacts (see the webinar: [link]). The presentation highlights common challenges organizations face when dealing with data quality. The result is a broken, reactive process that fails to prevent data quality issues at their source.
This journey must include a strong data governance framework to align people, processes, and technology, and enable them to understand and trust their data and metadata to achieve their business objectives. How do we align this critical spatial data to our business goals, objectives, metrics, and processes?
Behind the scenes, Snowpark ML parallelizes data processing operations by taking advantage of Snowflake’s scalable computing platform. For Snowpark ML Operations, the Snowpark Model Registry allows customers to securely manage and execute models in Snowflake, regardless of origin.
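As a rough sketch of the registry workflow described above (assuming the `snowflake-ml-python` package and an existing Snowpark `session`; the database name, model name, and data variables are hypothetical):

```python
# Sketch: logging and invoking a model with the Snowpark Model Registry.
# Assumes snowflake-ml-python is installed and `session` is an existing
# Snowpark session; database/schema and model names are hypothetical.
from snowflake.ml.registry import Registry

registry = Registry(session=session, database_name="MODEL_DB", schema_name="PUBLIC")

# Log a trained model; Snowflake stores and versions it regardless of origin.
model_version = registry.log_model(
    model=trained_model,              # e.g., a fitted scikit-learn estimator
    model_name="churn_classifier",
    version_name="v1",
    sample_input_data=X_sample,       # used to infer the model signature
)

# Execute the model inside Snowflake, close to the data.
predictions = model_version.run(X_new, function_name="predict")
```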
Webinar Summary: DataOps and Data Mesh Chris Bergh, CEO of DataKitchen, delivered a webinar on two themes – Data Products and Data Mesh. They describe five interfaces to a domain: the width (data), the where (location), the what (description), the how (process), and the who (team). Watch the webinar today!
This year, we expanded our partnership with NVIDIA, enabling your data teams to dramatically speed up compute processes for data engineering and data science workloads with no code changes using RAPIDS AI. The script walks through loading the RAPIDS libraries, then leverages them to load and process a data file.
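For illustration, a minimal sketch of the pandas-like cuDF API (assuming a CUDA-capable GPU and the RAPIDS `cudf` package; the file name and column names are hypothetical):

```python
# Sketch: loading and aggregating a file on the GPU with RAPIDS cuDF.
# cuDF mirrors much of the pandas API, so existing code needs few changes.
import cudf

gdf = cudf.read_csv("transactions.csv")          # parsed on the GPU
summary = gdf.groupby("customer_id")["amount"].sum()
print(summary.head())
```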
Ensono, a managed service provider and technology adviser, joined the initial preview phase of the Snowflake Connector for ServiceNow and began using it as part of its customer portal and data warehouse modernization project (watch their Show Me Your Architecture webinar here ).
Snowflake’s Global Head of Manufacturing Tim Long says smart manufacturing — the use of advanced technologies to improve the efficiency of traditional processes — is a “huge area of interest” that industry leaders can supercharge with data and AI. Data collaboration is the process of gathering and sharing data from various sources.
Cloudera Unveils Industry’s First Enterprise Data Cloud in Webinar. Over 2000 customers and partners joined us in this live webinar featuring a first-look at our upcoming cloud-native CDP services. Cloudera received extraordinary interest in CDP from participants, exemplified by the more than 300 questions posed throughout the webinar.
The data product lifecycle includes the following stages: discovery, design, development, and deployment. Let's take a look at what they entail and the roles that lead and support each of them. Discovery starts the process. A prioritization matrix can help formalize this step (see the sketch below). High-cost products that deliver high value are OK.
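A prioritization matrix can be as simple as bucketing candidate products by cost and value. A toy sketch of that idea (the scoring scale and threshold are hypothetical):

```python
# Sketch: classifying candidate data products into a cost/value matrix.
def prioritize(value: float, cost: float, threshold: float = 5.0) -> str:
    """Place a product (scored 0-10 on value and cost) into a quadrant."""
    if value >= threshold and cost < threshold:
        return "quick win: build first"
    if value >= threshold:
        return "strategic: high cost but high value, still OK"
    if cost < threshold:
        return "low priority: cheap but low value"
    return "avoid: high cost, low value"

print(prioritize(value=8, cost=3))  # quick win: build first
print(prioritize(value=9, cost=8))  # strategic: high cost but high value, still OK
```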
Compute engines in these CDP data services can access and process data sets in the Iceberg tables concurrently, with shared security and governance provided by our unique Cloudera Shared Data Experience (SDX). Other highlights: amazingly fast table migration (only metadata is regenerated), metadata management, and ORC open file format support.
As a result, operational processes are evolving to accommodate increased content production and distribution channels, including streaming platforms, social media networks, podcast platforms and virtual reality environments. A streamlined media supply chain helps fuel commercial success.
Solutions that support MDAs are purpose-built for data collection, processing, and sharing. “Integration, metadata and governance capabilities glue the individual components together.” A data mesh supports distributed, domain-specific data consumers and views data as a product, with each domain handling its own data pipelines.
While NiFi nodes can be added to an existing cluster, it is a multi-step process that requires organizations to set up constant monitoring of resource usage, detect when there is enough demand to scale, automate the provisioning of a new node with the required software, and set up the security configuration.
We populated the tables using INSERT-SELECT statements by reading from text-format source tables, but they can be populated through any ETL process. The snapshotIds of the source tables involved in the materialized view are also maintained in the metadata. Such a query pattern is quite common in BI queries.
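To illustrate the idea of snapshot-based bookkeeping, here is a sketch of a staleness check using the `pyiceberg` client (not Hive's internal implementation; the catalog, table name, and stored snapshot ID are hypothetical):

```python
# Sketch: deciding whether a materialized view is stale by comparing the
# source table's current Iceberg snapshot with the one recorded when the
# MV was built, mimicking the bookkeeping described above.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")
source = catalog.load_table("db.source_table")

recorded_snapshot_id = 4358109269163025558   # stored in the MV metadata (hypothetical)
current = source.current_snapshot()

if current is not None and current.snapshot_id != recorded_snapshot_id:
    print("Source table changed since the MV was built: rebuild required")
else:
    print("Materialized view is up to date")
```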
It added metadata that described the logical and physical layout of the data, enabling cost-based optimizers, dynamic partition pruning, and a number of key performance improvements targeted at SQL analytics. The first generation of the Hive Metastore attempted to address the performance considerations to run SQL efficiently on a data lake.
This allows developers to make changes to their processing logic on the fly while running some test data through their flow and validating that their changes work as intended. Once you have retrieved the data, NiFi stores it in a queue, which allows you to explore the content and metadata attributes of the events.
In this blog, we will discuss a performance improvement that Cloudera has contributed to the Apache Iceberg project with regard to Iceberg metadata reads, and we'll showcase the performance benefit using Apache Impala as the query engine. Impala can access Hive table metadata fast because HMS is backed by an RDBMS, such as MySQL or PostgreSQL.
The recent Stack Overflow survey echoes the finding that AI tools are gaining popularity in the development process. The need to adopt software development practices in the ETL process is much higher, as the success of AI-driven applications depends on data quality. [link] Sponsored: Airflow 2.10 has landed!
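One concrete way to bring software development practices into ETL is to unit-test transformations like any other code. A minimal sketch (the transformation and test are hypothetical, runnable with pytest):

```python
# Sketch: a plain unit test for an ETL transformation.
def normalize_emails(rows: list[dict]) -> list[dict]:
    """Lowercase and strip email addresses; drop rows without one."""
    return [
        {**row, "email": row["email"].strip().lower()}
        for row in rows
        if row.get("email")
    ]

def test_normalize_emails():
    rows = [{"email": "  Alice@Example.COM "}, {"email": None}]
    assert normalize_emails(rows) == [{"email": "alice@example.com"}]
```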
The table metadata is stored next to the data files under a metadata directory, which allows multiple engines to use the same table simultaneously. CDW separates the compute (Virtual Warehouses) and metadata (DB catalogs) by running them in independent Kubernetes pods. Read why the future of data lakehouses is open.
Learn about Cube, the universal semantic layer, in an upcoming technical webinar. Register for our webinar to explore Cube Cloud and learn about the convenient UI for easier data modeling. I believe the data ownership problem is much deeper than simple metadata management.
This process of placing shapes over a real or imaginary map in a game has parallels to the real world: we call it creating a spatial grid. This grid, along with the metadata that represents the details of how the grid is constructed and used, is referred to as a spatial index.
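A fixed-size latitude/longitude grid is the simplest form of this. A toy sketch of computing a grid cell ID (the 0.1-degree cell size is an arbitrary assumption; real systems often use libraries such as H3 or S2):

```python
# Sketch: assigning a point to a cell in a regular lat/lng grid.
import math

CELL_SIZE_DEG = 0.1  # arbitrary cell size for illustration

def grid_cell(lat: float, lng: float) -> tuple[int, int]:
    """Return the (row, col) of the grid cell containing the point."""
    return (math.floor(lat / CELL_SIZE_DEG), math.floor(lng / CELL_SIZE_DEG))

# Points in the same cell share an index and can be joined cheaply.
print(grid_cell(40.7128, -74.0060))   # (407, -741)
```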
The biggest challenge is broken data pipelines due to highly manual processes. Figure 1: Example data pipeline with manual processes. There are numerous challenges with this process, as described below. There’s a fear of making changes to the process as it might break production. When can you declare it done?
It could be much easier to simply stop all those jobs rather than allowing them to continue during the migration process. They simply read the underlying data (not even a full read; they just read the Parquet headers) and create the corresponding Iceberg metadata files. Hive creates Iceberg's metadata files for the same exact table.
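On CDP, this in-place migration is exposed as a single DDL statement in Hive and Impala. A sketch of invoking it from Python via the `impyla` client (the host, port, and table name are hypothetical; check your platform's docs for the exact syntax supported by your version):

```python
# Sketch: triggering an in-place Hive-to-Iceberg table migration.
# Only metadata files are written; the existing data files are reused.
from impala.dbapi import connect

conn = connect(host="impala-coordinator.example.com", port=21050)
cursor = conn.cursor()
cursor.execute("ALTER TABLE sales.orders CONVERT TO ICEBERG")
```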
In this blog post (and the accompanying webinar recording) I provide a high-level overview of lineage and how it applies in practice. Lineage is history: what is the change log for any element of metadata? LINEAGE IN CONTEXT: Data lineage doesn't exist in a vacuum; it is one of many tools one can use during the data engineering process.
The solution to this massive data challenge embedded the Aspire Content Processing Framework into the Cloudera Enterprise Data Hub as a Cloudera Parcel, a binary distribution format containing the program files, along with additional metadata used by Cloudera Manager.
The Rockset console is where you can set up the Confluent Cloud integration. Real-time updates and metadata filtering in Rockset: while Confluent delivers the real-time data for AI applications, the other half of the AI equation is a serving layer capable of handling stringent latency and scale requirements.
In a recent webinar with IBM, we dug into why data observability is so important, what's needed for data observability, and how Databand can help. Identifying a run duration issue with Databand and DataStage: next, we have a process that ran in DataStage for which we had previously set an alert that fires if the run exceeds 120 seconds.
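The underlying idea is a simple threshold check on run duration. A generic sketch of that pattern in plain Python (this is not Databand's actual API; all names are hypothetical):

```python
# Sketch: flagging pipeline runs that exceed a duration threshold,
# analogous to the 120-second alert described above.
import time

MAX_RUN_SECONDS = 120

def run_with_alert(job, *args):
    start = time.monotonic()
    result = job(*args)
    elapsed = time.monotonic() - start
    if elapsed > MAX_RUN_SECONDS:
        print(f"ALERT: {job.__name__} took {elapsed:.1f}s (limit {MAX_RUN_SECONDS}s)")
    return result
```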
This article (based on the webinar below) explores how data engineering teams can leverage AI and automation to enhance productivity and tackle current challenges. Initially, data teams focused their efforts on scaling storage and processing capabilities. [Learn how we use metadata to automate 90% of manual data pipeline maintenance.]
Google: Croissant- a metadata format for ML-ready datasets Google Research introduced Croissant, a new metadata format designed to make datasets ML-ready by standardizing the format, facilitating easier use in machine learning projects. Data engineers build the systems that store and process sensitive information.
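Croissant builds on schema.org's Dataset vocabulary in JSON-LD. A rough sketch of the general shape as a Python dict (the field values are hypothetical, and fields beyond the schema.org basics are illustrative rather than the full spec):

```python
# Sketch: the rough JSON-LD shape of a Croissant dataset description.
# Values are hypothetical; see the Croissant spec for the full format.
croissant_metadata = {
    "@context": {"@vocab": "https://schema.org/"},
    "@type": "Dataset",
    "name": "example_reviews",
    "description": "Customer reviews with sentiment labels.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "distribution": [
        {"@type": "FileObject", "contentUrl": "https://example.com/reviews.csv"}
    ],
}
```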
The author discusses the need for richer metadata to support complex data lineage and evolving privacy requirements. Upsolver SQLake lets you process fast-moving data by simply writing a SQL query. [link] Barr Moses: What’s Next for Data Engineering in 2023? Pipelines for data in motion can quickly turn into DAG hell.
This case study is based on information shared in recent Snowflake webinars and Summit presentations. At JetBlue, we use dimension tracking to monitor the health of the data attributes we would expect to see in our business process. It monitors the distribution of values and is really useful. You can proactively receive notifications.
That is what JetBlue did as described by data scientist Derrick Olson in a recent Snowflake webinar. For a real Monte Carlo example, one of our production models makes use of a “seconds since last metadata refresh” feature. But we know that this value should never be negative, otherwise we’d be somehow measuring data from the future.
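Guarding against such impossible values is a one-line check. A minimal sketch (the function is hypothetical; the feature name mirrors the example above):

```python
# Sketch: validating that a freshness feature can never be negative.
def seconds_since_refresh(now: float, last_refresh: float) -> float:
    delta = now - last_refresh
    if delta < 0:
        # A negative value would mean "data from the future": fail loudly
        # instead of silently feeding a bad feature to the model.
        raise ValueError(f"seconds_since_metadata_refresh is negative: {delta}")
    return delta
```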
The Data Engineering Weekly even published a special Metadata Edition focusing on the historical development of the Data Catalog. [link] It has been almost two years since we published the metadata edition, but I keep thinking back. I'm one of the early advocates for Data Catalogs and am excited about their possibilities.
The Azure Data Engineer Certification exam evaluates one's ability to design and implement data processing, security, and storage, as well as to monitor and optimize data processing and storage. Why Should You Get an Azure Data Engineer Certification?
The contemporary world is experiencing huge growth in cloud implementations, consequently leading to a rise in demand for data engineers and IT professionals who are well equipped with a wide range of application and process expertise. Kafka is great for ETL and offers memory buffers that give processes reliability and resilience.
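As a minimal sketch of using Kafka as an ETL buffer (assuming the `confluent-kafka` Python client; the broker address and topic name are hypothetical):

```python
# Sketch: producing records into a Kafka topic that buffers an ETL pipeline.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "broker.example.com:9092"})

for record in (b"row-1", b"row-2"):
    producer.produce("etl-events", value=record)  # buffered in memory, sent async

producer.flush()  # block until all buffered messages are delivered
```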
She’s trained thousands on how to strategically use the power of data visualization to enhance the decision-making process. Beyond her hands-on work, Colleen is determined to make engineering organizations better for both humans and business through mentoring, leadership, and streamlining processes.