Remove Analytics Application Remove Blog Remove Metadata
article thumbnail

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

It leverages knowledge graphs to keep track of all the data sources and data flows, using AI to fill the gaps so you have the most comprehensive metadata management solution. Together, Cloudera and Octopai will help reinvent how customers manage their metadata and track lineage across all their data sources.

article thumbnail

Materialized Views in Hive for Iceberg Table Format

Cloudera

Overview This blog post describes support for materialized views for the Iceberg table format. Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. The snapshotId of the source tables involved in the materialized view are also maintained in the metadata.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

A Flexible and Efficient Storage System for Diverse Workloads

Cloudera

In this blog post, we will talk about a single Ozone cluster with the capabilities of both Hadoop Core File System (HCFS) and Object Store (like Amazon S3). Please refer to our earlier Cloudera blog for more details about Ozone’s performance benefits and atomicity guarantees. FILE_SYSTEM_OPTIMIZED Bucket (“FSO”). LEGACY Bucket.

Systems 87
article thumbnail

Ozone Write Pipeline V2 with Ratis Streaming

Cloudera

It enables cloud-native applications to store and process mass amounts of data in a hybrid multi-cloud environment and on premises. These could be traditional analytics applications like Spark, Impala, or Hive, or custom applications that access a cloud object store natively. This results in write amplification.

article thumbnail

Demystifying Modern Data Platforms

Cloudera

Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability. The post Demystifying Modern Data Platforms appeared first on Cloudera Blog.

article thumbnail

Discover and Explore Data Faster with the CDP DDE Template

Cloudera

It is designed to simplify deployment, configuration, and serviceability of Solr-based analytics applications. DDE also makes it much easier for application developers or data workers to self-service and get started with building insight applications or exploration services based on text or other unstructured data (i.e.

article thumbnail

How to Update Documents in Elasticsearch

Rockset

When building applications on change data capture (CDC) data using Elasticsearch, you’ll want to architect the system to handle frequent updates or modifications to the existing documents in an index. In this blog, we’ll walk through the different options available for updates including full updates, partial updates and scripted updates.