Demo and Metadata - Data Engineering Digest

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Iceberg tables become interoperable while maintaining ACID compliance by adding a layer of metadata to the data files in a user's object storage. An external catalog tracks the latest table metadata and helps ensure consistency across multiple readers and writers. Put simply: Iceberg is metadata.
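
The catalog's consistency role can be sketched in a few lines. This is a minimal, hypothetical illustration, not Iceberg's or Snowflake's API. It assumes the catalog stores one pointer per table to the current metadata file, and that a commit succeeds only if that pointer has not moved since the writer read it (an atomic compare-and-swap). `InMemoryCatalog`, `CommitConflict`, and the `s3://bucket/...` paths are invented for the example.

```python
import threading
from dataclasses import dataclass, field
from typing import Dict, Optional


class CommitConflict(Exception):
    """Hypothetical error: another writer committed first."""


@dataclass
class InMemoryCatalog:
    # Maps table name -> location of the table's current metadata file.
    pointers: Dict[str, str] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def current(self, table: str) -> Optional[str]:
        return self.pointers.get(table)

    def commit(self, table: str, expected: Optional[str], new: str) -> None:
        # Compare-and-swap: advance the pointer only if it still equals the
        # metadata location this writer started from.
        with self._lock:
            found = self.pointers.get(table)
            if found != expected:
                raise CommitConflict(f"{table}: expected {expected!r}, found {found!r}")
            self.pointers[table] = new


catalog = InMemoryCatalog()
catalog.commit("db.events", None, "s3://bucket/events/metadata/v1.json")

# Two writers start from the same version of the table.
base = catalog.current("db.events")
catalog.commit("db.events", base, "s3://bucket/events/metadata/v2-a.json")  # writer A wins
try:
    catalog.commit("db.events", base, "s3://bucket/events/metadata/v2-b.json")  # writer B is stale
    conflict = False
except CommitConflict:
    conflict = True
```

A rejected writer would re-read the latest metadata, reapply its changes, and retry; that optimistic-concurrency loop is what lets multiple writers share one table safely.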

Metadata Management And Integration At LinkedIn With DataHub

Data Engineering Podcast

AUGUST 24, 2020

The key to those solutions is a robust and flexible metadata management system. LinkedIn has gone through several iterations on the most maintainable and scalable approach to metadata, leading them to their current work on DataHub. What were you using at LinkedIn for metadata management prior to the introduction of DataHub?

The Best Data Dictionary Tools in 2025

Monte Carlo

APRIL 28, 2025

Next, look for automatic metadata scanning. It has real-time metadata updates, deep data lineage, and it's flexible if you want to customize or extend it for your team's specific needs. Then there's OpenMetadata, which is kind of like the Swiss Army knife of metadata tools.

Announcing Open Source DataOps Data Quality TestGen 3.0

DataKitchen

FEBRUARY 20, 2025

Better Metadata Management: Add descriptions and Data Product tags to tables and columns in the Data Catalog for improved governance. Smarter Profiling & Test Generation: Improved logic reduces false positives, making test results more accurate and actionable.

AI-Driven Data Integrity Innovations to Solve Your Top Data Management Challenges

Precisely

FEBRUARY 26, 2025

Automated metadata management: AI-generated catalog asset descriptions significantly reduce manual effort and improve metadata quality, enabling teams to focus on more strategic tasks. Take the next steps in your data integrity journey today: learn more about the Data Integrity Suite and schedule a personalized demo.

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

Data Engineering Podcast

JUNE 19, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

An instant demo of data lineage is worth a thousand words

Datakin

AUGUST 10, 2021

Written by Ross Turk on August 10, 2021. They say that a picture is worth a thousand words. That’s why we have made an instant demo of Datakin available at demo.datakin.com. After that, you can explore a collection of metadata from a fictional data pipeline.

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Data Engineering Podcast

JUNE 26, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

Snowflake Expands Partnership with Microsoft to Improve Interoperability Through Apache Iceberg

Snowflake

MAY 21, 2024

As Snowflake operates on the tables and writes data, OneLake will automatically convert the Iceberg metadata to Delta Lake format, without rewriting the Parquet files, so that Fabric engines can query the same tables. Similarly, Fabric OneLake enables Snowflake to read all OneLake data in Iceberg format, for consumption by Snowflake’s engine.

Making The Total Cost Of Ownership For External Data Manageable With Crux

Data Engineering Podcast

JULY 17, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

Accelerate Your Machine Learning Workflows in Snowflake with Snowpark ML

Snowflake

JANUARY 23, 2024

Snowpark ML Operations: Model management. The path to production from model development starts with model management, which is the ability to track versioned model artifacts and metadata in a scalable, governed manner. The Snowpark Model Registry API provides simple catalog and retrieval operations on models.
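
The idea of tracking versioned artifacts together with their metadata can be illustrated with a toy registry. This is a hypothetical in-memory sketch, not the Snowpark Model Registry API: `ToyModelRegistry`, `register`, `fetch`, and the `auc` metadata values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact: Any             # e.g. a serialized model object
    metadata: Dict[str, Any]  # metrics, training-data reference, owner, ...


@dataclass
class ToyModelRegistry:
    _versions: Dict[str, Dict[int, ModelVersion]] = field(default_factory=dict)

    def register(self, name: str, artifact: Any, metadata: Dict[str, Any]) -> ModelVersion:
        # Each registration creates a new, sequentially numbered version.
        versions = self._versions.setdefault(name, {})
        mv = ModelVersion(name, len(versions) + 1, artifact, dict(metadata))
        versions[mv.version] = mv
        return mv

    def fetch(self, name: str, version: Optional[int] = None) -> ModelVersion:
        # Without an explicit version, return the latest one.
        versions = self._versions[name]
        return versions[version if version is not None else max(versions)]


registry = ToyModelRegistry()
registry.register("churn", artifact="model-bytes-v1", metadata={"auc": 0.81})
registry.register("churn", artifact="model-bytes-v2", metadata={"auc": 0.84})
latest = registry.fetch("churn")
first = registry.fetch("churn", version=1)
```

Keeping old versions addressable is what makes a rollback or an audit of "which model produced this prediction" possible.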

Why Spatial Data Governance is Critical to Your Business Strategy

Precisely

NOVEMBER 14, 2023

This journey must include a strong data governance framework to align people, processes, and technology, and enable them to understand and trust their data and metadata to achieve their business objectives. Does our organization’s data governance service include visibility and transparency of our spatial data and their metadata?

Snowflake Cortex Search: State-of-the-Art Hybrid Search for RAG Applications

Snowflake

JULY 25, 2024

It supports “fuzzy” search — the service takes in natural language queries and returns the most relevant text results, along with associated metadata. For document- or chunk-level access controls, you can use metadata filtering to ensure that the service only returns the results that the client is authorized to view.
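
Metadata filtering for access control can be sketched as a filter applied before ranking. The following is a hypothetical illustration, not the Cortex Search API: it assumes each indexed chunk carries an `allowed_groups` metadata field and that the caller's groups are known; the scores stand in for whatever relevance score the service computes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float                    # relevance score from the search step
    allowed_groups: FrozenSet[str]  # access-control metadata set at indexing time


def search_with_acl(chunks: Iterable[Chunk], user_groups: Set[str], k: int = 3) -> List[Chunk]:
    # Filter on metadata first so unauthorized chunks never appear in the top k.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:k]


index = [
    Chunk("Q3 revenue summary", 0.92, frozenset({"finance"})),
    Chunk("Sales playbook", 0.88, frozenset({"sales", "finance"})),
    Chunk("Public FAQ", 0.40, frozenset({"everyone", "sales"})),
]
results = search_with_acl(index, {"sales"}, k=2)
```

Filtering before truncating to `k` also means a caller still gets up to `k` authorized results, instead of fewer because hidden chunks occupied top slots.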

A Breakthrough AI-Powered SQL Assistant

Snowflake

APRIL 11, 2024

Not only do we have a unique vantage point into the challenges faced by data analysts, we also possess rich metadata that feeds into Snowflake’s dedicated text-to-SQL model that Copilot leverages in combination with Mistral’s technology. Or, experience Copilot firsthand at our free Dev Day event on June 6th in the Demo Zone!

The Scoop: Turmoil at Twitter

The Pragmatic Engineer

NOVEMBER 3, 2022

Printing it loses all this metadata. On 29 Oct, Saturday morning, a new project kicked off, with the first demo due on Monday morning. The first demo was scheduled for Monday, 31 October, in the morning: only two days later. He indirectly mandated working over the weekend by setting a Monday morning demo deadline, on a Saturday.

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

DECEMBER 16, 2019

The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools.
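
Once a metadata repository records which datasets each dataset is built from, lineage questions become graph traversals. A minimal sketch, assuming lineage is stored as a simple child-to-parents mapping; the dataset names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def upstream(lineage: Dict[str, List[str]], dataset: str) -> Set[str]:
    """Return every dataset that `dataset` transitively depends on (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        for parent in lineage.get(node, []):
            if parent not in seen:  # avoid revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    return seen


lineage = {
    "reports.revenue": ["warehouse.orders", "warehouse.fx_rates"],
    "warehouse.orders": ["raw.orders_db"],
    "warehouse.fx_rates": ["raw.fx_feed"],
}
deps = upstream(lineage, "reports.revenue")
```

Inverting the mapping gives the downstream view, which answers "what breaks if this source is late?"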

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Metadata and evolution support: We’ve added structured-type schema evolution for flexibility as source systems or business reporting needs change. Get better Iceberg ecosystem interoperability with Primary Key information added to Iceberg table metadata. In the meantime, you can get hands-on with Iceberg today!

Hire And Scale Your Data Team With Intention

Data Engineering Podcast

JUNE 12, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking all of that information into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Go to dataengineeringpodcast.com/atlan today to learn more about how you can take advantage of active metadata and escape the chaos.

Data News — Week 24.28

Christophe Blefari

JULY 13, 2024

kyutai released Moshi, a "voice-enabled AI". The team at kyutai developed the model audio-first, using an audio language model, which makes conversations with the AI feel more real (demo at 5:00 min): it can interrupt you or kinda "think" (meaning predict the next audio segment) while it speaks.

Charting the Path of Riskified's Data Platform Journey

Data Engineering Podcast

JULY 10, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

Unifying Iceberg Tables on Snowflake

Snowflake

AUGUST 31, 2023

Catalog Integration: Our newly developed Catalog Integration feature allows you to seamlessly plug Snowflake into other Iceberg catalogs tracking table metadata. For metadata management, you can configure Snowflake to manage your Iceberg data or use an external Iceberg catalog. Have a Snowflake account and want to try this out?

Unlocking The Power of Data Lineage In Your Platform with OpenLineage

Data Engineering Podcast

MAY 18, 2021

If you need to deliver unprecedented speed, cost savings, and simplified access to large scale, real-time data, visit dataengineeringpodcast.com/molecula and request a demo. What is the current state of the ecosystem for generating and sharing metadata between systems? What are your goals for the OpenLineage effort?

The View From The Lakehouse Of Architectural Patterns For Your Data Platform

Data Engineering Podcast

JULY 3, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

From Apache Kafka to Amazon S3: Exactly Once

Confluent

APRIL 11, 2019

For this demo, I’ll use one of my favorite public feeds: real-time reservations in Meetup groups around the world. Given that events are produced in real time and at a reasonable pace, the Meetup feed is pretty handy for demos with real data. How about we take it for a spin? Try it yourself!

Your Enterprise Data Needs an Agent

Snowflake

FEBRUARY 12, 2025

Additionally, Cortex Search supports date-range filtering on metadata columns. Watch the demo: See Cortex Agents in action. Improved customizability: Cortex Search now provides the ability to select the vector embedding model for semantic search. This includes two multilingual models, snowflake-arctic-embed-l-v2.0 ai and Seek AI.

Monitoring Data Replication in Multi-Datacenter Apache Kafka Deployments

Confluent

APRIL 10, 2019

For example, when Confluent Monitoring Interceptors are configured on Kafka clients, they write metadata to a Kafka topic called _confluent-monitoring. In one command, this demo environment brings up an active-active multi-datacenter environment with Confluent Replicator copying data bidirectionally. Try it out yourself.

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

Data Engineering Podcast

NOVEMBER 20, 2022

Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. What are some of the data modeling considerations that need to be considered when pushing metadata to Sifflet?

Build and deploy ML with ease Using Snowpark ML, Snowflake Notebooks, and Snowflake Feature Store

Snowflake

NOVEMBER 1, 2023

It consists of Python APIs accessible through the Snowpark ML library, and SQL interfaces for defining, managing and retrieving features, along with managed infrastructure for feature metadata management and continuous feature processing. Check out the Snowpark ML demo from Snowday to see the latest launches in action.

How DataOS Nails Gartner’s Magic Quadrant for Data Integration

The Modern Data Company

JANUARY 22, 2024

Data products contain data and its metadata, code, and infrastructure so that each product is self-contained and usable independently. These features elevate the data integration process, leveraging extensive metadata, AI, and ML algorithms. Operational data integration leadership DataOS supports a broad spectrum of use cases.

ConsoleMe: A Central Control Plane for AWS Permissions and Access

Netflix Tech

MARCH 10, 2021

Access the AWS console (docs, talk, demo): ConsoleMe allows users to access the AWS console through the use of temporary IAM role credentials. Utilize ConsoleMe’s native policy editors for advanced requests (docs, talk, demo): ConsoleMe offers a native policy editor for popular resource types. Where can I learn more?

Simplify Data Security For Sensitive Information With The Skyflow Data Privacy Vault

Data Engineering Podcast

JUNE 5, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking all of that information into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Go to dataengineeringpodcast.com/atlan today to learn more about how you can take advantage of active metadata and escape the chaos.

Introducing Vector Search on Rockset: How to run semantic search with OpenAI and Rockset

Rockset

APRIL 18, 2023

To highlight these new capabilities, we built a search demo using OpenAI to create embeddings for Amazon product descriptions and Rockset to generate relevant search results. In the demo, you’ll see how Rockset delivers search results in 15 milliseconds over thousands of documents. What does this mean for search?
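
The ranking step in such a demo can be sketched as nearest-neighbor search by cosine similarity. This sketch uses tiny made-up 3-dimensional vectors instead of real OpenAI embeddings and scores every document exhaustively; it makes no claim about how Rockset indexes or searches vectors.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Sequence[float], docs: Dict[str, Sequence[float]], k: int = 2) -> List[str]:
    # Score every document against the query, then keep the k most similar ids.
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


docs = {
    "kettle": [0.9, 0.1, 0.0],
    "teapot": [0.8, 0.2, 0.1],
    "laptop": [0.0, 0.1, 0.9],
}
hits = top_k([1.0, 0.0, 0.0], docs)
```

Exhaustive scoring is linear in the number of documents, which is why large deployments generally turn to approximate nearest-neighbor indexes.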

Change Data Capture (CDC): What it is and How it Works

Striim

MARCH 21, 2025

Schedule a demo and we’ll give you a personalized walkthrough, or try Striim at production scale for free! This would potentially require you to change your database log parsing logic with each new database release. Small data volumes, or hoping to get hands-on quickly? At Striim we also offer a free developer version.
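
Downstream of log parsing, a CDC pipeline replays change events onto a target in the order the source committed them. A minimal sketch, assuming a hypothetical event format with `op`, `key`, and `row` fields (this is not Striim's event schema); it covers only the replay step.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_changes(replica: Dict[int, Row], events: List[Dict[str, Any]]) -> Dict[int, Row]:
    """Replay ordered change events onto a primary-key -> row replica."""
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            replica[key] = event["row"]  # upsert the latest row image
        elif op == "delete":
            replica.pop(key, None)       # tolerate deletes of keys never seen
        else:
            raise ValueError(f"unknown operation {op!r}")
    return replica


events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "plan": "free"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "plan": "pro"}},
    {"op": "update", "key": 1, "row": {"name": "Ada", "plan": "pro"}},
    {"op": "delete", "key": 2},
]
replica = apply_changes({}, events)
```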

Snowflake Horizon Advances Industry-Leading Governance with Simplified Internal Marketplaces and AI Innovations

Snowflake

JUNE 5, 2024

Governed internal collaboration with better discoverability and AI-powered object metadata: The Internal Marketplace (private preview) introduces a new way for customers to boost secure collaboration, through a single directory of all data products specifically curated for use within an organization.

Data News — Week 23.24

Christophe Blefari

JUNE 16, 2023

Malloy's Near Term Roadmap: I recently shared a Malloy demo, which was awesome. Masthead does not run SQL on your data (which would add costs); instead it reads logs and metadata to identify anomalies. You can put spaces in BigQuery column names: the editors of blef.fr (me) have no comment.

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Request a demo at dataengineeringpodcast.com/metis-machine to learn more about how Metis Machine is operationalizing data science.

How DataOS Nails Gartner’s Magic Quadrant for Data Integration

The Modern Data Company

JANUARY 22, 2024

Data products contain data and its metadata, code, and infrastructure so that each product is self-contained and usable independently. Augmented data integration, self-service data preparation, metadata support, and data governance are key strengths. Operational data integration leadership DataOS supports a broad spectrum of use cases.

Our product vision for analytics in the age of AI

ThoughtSpot

JANUARY 31, 2024

Not only does ThoughtSpot not store your sample data or metadata, or use this information for model training, but we are also investing in bring-your-own and host-your-own model capabilities for both generative AI and machine learning models.

Beyond Legacy Detection: How AI-Driven Data Governance Surpasses Traditional Methods

Striim

MARCH 4, 2025

It logs each event with AI-enhanced metadata for effective tracking and auditing, while its adaptive design accommodates evolving data sources through schema evolution. Try Striim today with a free trial or book a demo to see it in action.

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

The snapshotId of each source table involved in the materialized view is also maintained in the metadata. A note on the Iceberg materialized view specification: currently, the metadata needed for materialized views is maintained in Hive Metastore, and it builds upon the materialized view metadata previously supported for Hive ACID tables.
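
Recording source snapshot IDs makes a staleness check straightforward: compare the IDs stored with the view against each source table's current snapshot. A hypothetical sketch of that comparison, not Hive's implementation; the table names and snapshot IDs are invented.

```python
from typing import Dict, List


def changed_sources(recorded: Dict[str, int], current: Dict[str, int]) -> List[str]:
    """Return the source tables whose snapshot moved since the view was built."""
    return sorted(t for t, snap in recorded.items() if current.get(t) != snap)


recorded = {"db.orders": 101, "db.customers": 55}  # stored with the materialized view
current = {"db.orders": 102, "db.customers": 55}   # latest snapshot of each source table
stale = changed_sources(recorded, current)
```

An empty result suggests the view still reflects its sources; a non-empty one names the tables whose changes the view has not yet absorbed.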

A Flexible and Efficient Storage System for Diverse Workloads

Cloudera

SEPTEMBER 15, 2022

Apache Ozone achieves this significant capability through some novel architectural choices, introducing a bucket type in the metadata namespace server. It provides high-performance namespace metadata operations similar to HDFS. The BucketLayout Feature Demo describes the ozone shell, ozoneFS, and AWS CLI operations.

Building a Customer 360 in the Snowflake Data Cloud with RudderStack

Snowflake

OCTOBER 2, 2023

To achieve a viable customer 360 solution at scale, a data team must: build and maintain an identity graph; compute complex user traits; represent funnels as part of the journey; and maintain metadata for predictive traits that change over time. This work is complex and laborious. Click here to request a demo of Profiles.
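
The first of those tasks, maintaining an identity graph, is commonly modeled as connected components over identifiers observed together. A minimal union-find sketch under that assumption; it is not how RudderStack Profiles is implemented, and the identifier strings are hypothetical.

```python
from typing import Dict, List, Set


class IdentityGraph:
    """Union-find over identifiers (emails, device ids, cookies) seen together."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def _find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        # Path halving: point nodes at their grandparent to keep trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        # Record that two identifiers belong to the same customer.
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def same_customer(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def groups(self) -> List[Set[str]]:
        clusters: Dict[str, Set[str]] = {}
        for x in list(self.parent):
            clusters.setdefault(self._find(x), set()).add(x)
        return list(clusters.values())


graph = IdentityGraph()
graph.link("email:ann@example.com", "device:abc")
graph.link("device:abc", "cookie:123")
graph.link("email:bob@example.com", "device:xyz")
```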

12 Times Faster Query Planning With Iceberg Manifest Caching in Impala

Cloudera

JULY 13, 2023

In this blog, we will discuss a performance improvement that Cloudera has contributed to the Apache Iceberg project with regard to Iceberg metadata reads, and we’ll showcase the performance benefit using Apache Impala as the query engine. Impala can access Hive table metadata fast because HMS is backed by an RDBMS, such as MySQL or PostgreSQL.
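
Because Iceberg manifest files are immutable once written, a reader can safely cache parsed manifests by path and skip repeated object-store reads during planning. A simplified sketch of that idea using Python's `functools.lru_cache`; `read_manifest` is a hypothetical stand-in and does not reflect Impala's or Iceberg's actual caching code.

```python
from functools import lru_cache
from typing import Iterable, List, Tuple

reads = {"count": 0}  # counts simulated object-store fetches


@lru_cache(maxsize=128)
def read_manifest(path: str) -> Tuple[str, ...]:
    # Stand-in for fetching and parsing a manifest; returns the data-file
    # entries it lists. Caching by path is safe only because manifests never change.
    reads["count"] += 1
    return (f"{path}#entry-0", f"{path}#entry-1")


def plan_query(manifest_paths: Iterable[str]) -> List[str]:
    # Planning reads every manifest in the snapshot; repeated plans hit the cache.
    return [entry for p in manifest_paths for entry in read_manifest(p)]


manifests = ["s3://bucket/t/metadata/m1.avro", "s3://bucket/t/metadata/m2.avro"]
first_plan = plan_query(manifests)
second_plan = plan_query(manifests)
```

The second plan triggers no new reads; only manifests added by later snapshots would miss the cache.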

Mapbox Snowflake Native App Opens Geospatial Analytics to New Audiences

Snowflake

MARCH 12, 2024

It also provides metadata on what assumptions and fixes were used to inform the address match. Use our Quickstart to learn how to geocode address data with Mapbox’s native app, or watch a quick two-minute demo to see how Mapbox used the Snowflake Native App Framework to create its native app.
