article thumbnail

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

It leverages knowledge graphs to keep track of all the data sources and data flows, using AI to fill the gaps so you have the most comprehensive metadata management solution. Together, Cloudera and Octopai will help reinvent how customers manage their metadata and track lineage across all their data sources.

article thumbnail

Materialized Views in Hive for Iceberg Table Format

Cloudera

The snapshotId of the source tables involved in the materialized view are also maintained in the metadata. A Note on Iceberg materialized view specification Currently, the metadata needed for materialized views is maintained in Hive Metastore and it builds upon the materialized views metadata previously supported for Hive ACID tables.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

A Flexible and Efficient Storage System for Diverse Workloads

Cloudera

Apache Ozone achieves this significant capability through the use of some novel architectural choices by introducing bucket type in the metadata namespace server. It removes the need to port data from an object store to a file system so analytics applications can read it. FILE_SYSTEM_OPTIMIZED Bucket (“FSO”). LEGACY Bucket.

Systems 87
article thumbnail

Demystifying Modern Data Platforms

Cloudera

Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability. Are there things they should keep in mind?

article thumbnail

Ozone Write Pipeline V2 with Ratis Streaming

Cloudera

It enables cloud-native applications to store and process mass amounts of data in a hybrid multi-cloud environment and on premises. These could be traditional analytics applications like Spark, Impala, or Hive, or custom applications that access a cloud object store natively. This results in write amplification.

article thumbnail

Turning Streams Into Data Products

Cloudera

For governance and security teams, the questions revolve around chain of custody, audit, metadata, access control, and lineage. Building real-time data analytics pipelines is a complex problem, and we saw customers struggle using processing frameworks such as Apache Storm, Spark Streaming, and Kafka Streams. .

Kafka 88
article thumbnail

The Evolution of Table Formats

Monte Carlo

At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files. For example, a single table named ‘Customers’ is actually an aggregation of metadata that manages and references several data files, ensuring that the table behaves as a cohesive unit.