The combined platform will integrate data, from wherever it originates and wherever it is stored (cloud or on premises), to deliver the real-time insights required for faster decision-making and predictive generative AI applications for personalized customer experiences.
Summary: Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. To level up its value, a new trend of active metadata is being adopted, enabling use cases such as keeping BI reports up to date, auto-scaling your warehouses, and automating data governance.
Many companies have adopted the public cloud, but very few organizations will ever move everything to the cloud, or to a single cloud. The future for most data teams will be multi-cloud and hybrid, and for that future to become a reality, data teams must shift their attention to metadata, the new turf war for data.
In August, we wrote about how, in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to maximizing the value of data, analytics, and AI. It is a key feature for delivering unified access to data in distributed, multi-engine architectures.
The AI Forecast: Data and AI in the Cloud Era, sponsored by Cloudera, aims to take an objective look at the impact of AI on business, industry, and the world at large. Specifically, I was reading one of your blog posts recently that talked about the dark ages of data. It could be metadata that you weren't capturing before.
Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers' data. Review the Upgrade documentation topic for the supported upgrade paths.
The object store is readily available alongside HDFS in CDP (Cloudera Data Platform) Private Cloud Base 7.1.3+. Learn more about the impacts of global data sharing in the blog The Ethics of Data Exchange. Ozone namespace overview: data ingestion through 's3'. As described above, Ozone introduces volumes to the world of S3.
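As an illustration of that S3-style ingestion, the sketch below points the standard boto3 client at an Ozone S3 Gateway. This is a hedged sketch, not from the original post: the endpoint URL, credentials, bucket, and key are hypothetical placeholders, and the note about the S3 volume reflects typical recent Ozone behavior rather than a specific CDP release.

    import boto3

    # Point the standard S3 client at the Ozone S3 Gateway (placeholder endpoint).
    s3 = boto3.client(
        "s3",
        endpoint_url="http://ozone-s3g.example.com:9878",  # hypothetical gateway address
        aws_access_key_id="ACCESS_KEY",                     # placeholder credentials
        aws_secret_access_key="SECRET_KEY",
    )

    # Buckets created through the S3 interface typically land under Ozone's S3 volume.
    s3.create_bucket(Bucket="demo-bucket")
    s3.put_object(Bucket="demo-bucket", Key="ingest/sample.csv", Body=b"id,value\n1,42\n")

    # List what was ingested.
    for obj in s3.list_objects_v2(Bucket="demo-bucket").get("Contents", []):
        print(obj["Key"], obj["Size"])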
There are many reasons to deploy a hybrid cloud architecture — not least cost, performance, reliability, security, and control of infrastructure. But increasingly at Cloudera, our clients are looking for a hybrid cloud architecture in order to manage compliance requirements.
With the release of CDP Private Cloud (PvC) Base 7.1.7, the Atlas/Kafka integration provides metadata collection for Kafka producers and consumers, so that users can manage, govern, and monitor Kafka metadata and metadata lineage in the Atlas UI. We expand on this feature later in this blog.
CDP Public Cloud is now available on Google Cloud. The addition of support for Google Cloud enables Cloudera to deliver on its promise to offer its enterprise data platform at a global scale. CDP Public Cloud is already available on Amazon Web Services and Microsoft Azure.
We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. In this blog post we’re revisiting the challenges that come with running Apache NiFi at scale before we take a closer look at the architecture and core features of CDF-PC.
This will allow a data office to implement access policies over metadata management assets, such as tags or classifications, business glossaries, and data catalog entities, laying the foundation for comprehensive data access control. First, a set of initial metadata objects is created by the data steward.
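As a rough sketch of what creating such an initial metadata object can look like, the snippet below calls the Apache Atlas REST API to define a classification (tag) that access policies could later reference. The host, credentials, and classification name are hypothetical, and the payload should be checked against the Atlas typedefs documentation.

    import requests

    ATLAS_URL = "http://atlas.example.com:21000"   # hypothetical Atlas endpoint
    AUTH = ("admin", "admin")                      # placeholder credentials

    # Define a simple classification (tag) as an initial metadata object.
    classification_def = {
        "classificationDefs": [
            {
                "name": "PII",
                "description": "Columns containing personally identifiable information",
                "attributeDefs": [],
                "superTypes": [],
            }
        ]
    }

    resp = requests.post(
        f"{ATLAS_URL}/api/atlas/v2/types/typedefs",
        json=classification_def,
        auth=AUTH,
    )
    resp.raise_for_status()
    print("Created classification:", resp.json())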
Our previous tech blog, Packaging award-winning shows with award-winning technology, detailed our packaging technology deployed on the streaming side. As an example, cloud-based post-production editing and collaboration pipelines demand a complex set of functionalities, including the generation and hosting of high-quality proxy content.
The release of the Cloudera Data Platform (CDP) Private Cloud Base edition provides customers with a next-generation hybrid cloud architecture. The Private Cloud Base overview covers the storage layer for CDP Private Cloud (including object storage), traditional data clusters for workloads not yet ready for the cloud, and edge or gateway nodes.
From the start, the Snowflake platform has been delivered as a service, consisting of optimized storage, elastic multi-cluster compute, and cloud services. To learn more about how Snowflake supports the architecture patterns described in this blog post, visit our pages for data warehouse, data lake, data lakehouse, and data mesh.
This blog post expands on that insightful conversation, offering a critical look at Iceberg's potential and the hurdles organizations face when adopting it. This ecosystem includes catalogs: services that manage metadata about Iceberg tables. If not handled correctly, managing this metadata can become a bottleneck.
In this article, I will focus on the contribution that a multi-cloud strategy makes towards these value drivers and address a question that I regularly get from clients: is there a quantifiable benefit to a multi-cloud deployment? The first area is infrastructure cost optimization.
Snowflake and Databricks have the same goal: both are selling a cloud on top of the classic cloud vendors. Both companies have added data and AI to their slogans; Snowflake used to be The Data Cloud and is now The AI Data Cloud. At Snowflake Summit, Snowflake took the lead, setting the tone.
With this public preview, those external catalog options are either GLUE, where Snowflake retrieves table metadata snapshots from the AWS Glue Data Catalog, or OBJECT_STORE, where Snowflake retrieves metadata snapshots directly from the specified cloud storage location. Among these options, which one should you use?
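For context, catalog integrations of each kind might be declared roughly as follows. This is a hedged sketch run through the Snowflake Python connector: the account details, role ARN, catalog ID, and names are placeholders, and the exact parameters should be verified against Snowflake's catalog integration documentation.

    import snowflake.connector

    # Placeholder connection details.
    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="***", warehouse="COMPUTE_WH"
    )
    cur = conn.cursor()

    # Option 1: pull Iceberg table metadata snapshots from the AWS Glue Data Catalog.
    cur.execute("""
        CREATE CATALOG INTEGRATION glue_catalog_int
          CATALOG_SOURCE = GLUE
          CATALOG_NAMESPACE = 'analytics_db'
          TABLE_FORMAT = ICEBERG
          GLUE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake-glue-role'
          GLUE_CATALOG_ID = '123456789012'
          ENABLED = TRUE
    """)

    # Option 2: read Iceberg metadata files directly from the cloud storage location.
    cur.execute("""
        CREATE CATALOG INTEGRATION object_store_catalog_int
          CATALOG_SOURCE = OBJECT_STORE
          TABLE_FORMAT = ICEBERG
          ENABLED = TRUE
    """)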
We’re excited to share that Gartner has recognized Cloudera as a Visionary among all vendors evaluated in the 2023 Gartner® Magic Quadrant for Cloud Database Management Systems. Download the complimentary 2023 Gartner Magic Quadrant for Cloud Database Management Systems report.
Cloud data warehouses allow users to run analytic workloads with greater agility, better isolation and scale, and lower administrative overhead than ever before. DW1 is an anonymized cloud data warehouse running on AWS, and DW2 is an anonymized data warehouse running on GCP. Impala's use of KRPC is covered in a dedicated blog post.
Cloud has given us hope: with public clouds at our disposal, we now have virtually infinite resources. But they come at a different cost. Using the cloud means we may be creating yet another series of silos, which also creates new, hard-to-measure risks in the security and traceability of our data.
We are excited to announce the release of Confluent Cloud Schema Registry in general availability (GA), available in Confluent Cloud, our fully managed event streaming service based on Apache Kafka®. Before we dive into Confluent Cloud Schema Registry, let's recap what Confluent Schema Registry is and does.
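As a quick refresher on what Schema Registry does, the sketch below registers an Avro schema for a topic's value subject using the confluent-kafka Python client. The registry URL, API key, schema, and topic name are placeholders, not values from the announcement.

    from confluent_kafka.schema_registry import SchemaRegistryClient, Schema

    # Placeholder Confluent Cloud Schema Registry endpoint and API credentials.
    client = SchemaRegistryClient({
        "url": "https://psrc-xxxxx.us-east-2.aws.confluent.cloud",
        "basic.auth.user.info": "SR_API_KEY:SR_API_SECRET",
    })

    order_schema = Schema(
        schema_str="""
        {
          "type": "record",
          "name": "Order",
          "fields": [
            {"name": "order_id", "type": "string"},
            {"name": "amount", "type": "double"}
          ]
        }
        """,
        schema_type="AVRO",
    )

    # Register the schema under the topic's value subject; registering a
    # compatible new version later returns a new schema id.
    schema_id = client.register_schema("orders-value", order_schema)
    print("Registered schema id:", schema_id)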
With the launch of Cloudera Public Cloud 7.2.12, Streams Messaging for Data Hub deployments has gotten some interesting new features! In CDP Public Cloud 7.2.8, an Atlas hook was provided that, once configured, allows Kafka metadata to be collected.
CDP Private Cloud Base is an on-premises version of Cloudera Data Platform (CDP). It provides a consistent experience across public cloud, multi-cloud, and private cloud deployments. One of our previous blogs discussed the four paths to get from legacy platforms to CDP Private Cloud Base, such as upgrading from HDP 3.1.5.
This blog is a collection of those insights, but for the full trendbook, we recommend downloading the PDF. Organizations also need a better understanding of how LLMs are trained, especially with external vendors or public cloud environments. One quote: "Data governance is going to play a large role in what data can go into an LLM."
In this blog, we will discuss: What is the open table format (OTF)? Why should we use it? Then, we add another column called HASHKEY, add more data, and locate the S3 file containing metadata for the Iceberg table. In the screenshot below, we can see that the metadata file for the Iceberg table retains the snapshot history.
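That flow can be sketched with Spark SQL and Iceberg as shown below. This is an assumption-laden illustration rather than the original walkthrough: the catalog, database, and table names are hypothetical, the table is assumed to already exist with two columns (an id and an event string), and the Spark session is assumed to be configured with an Iceberg catalog named demo.

    from pyspark.sql import SparkSession

    # Assumes spark.sql.catalog.demo is already configured as an Iceberg catalog.
    spark = SparkSession.builder.appName("iceberg-metadata-demo").getOrCreate()

    # Add the new column, then append more data; each write produces a new snapshot.
    spark.sql("ALTER TABLE demo.db.events ADD COLUMN hashkey STRING")
    spark.sql("INSERT INTO demo.db.events VALUES (2, 'click', 'abc123')")

    # Iceberg exposes table metadata, including snapshot history, as queryable tables.
    spark.sql(
        "SELECT snapshot_id, committed_at, operation FROM demo.db.events.snapshots"
    ).show()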
Making a decision on a cloud data warehouse is a big deal. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.
As organizations look to optimize the speed and cost of their cloud journey in today’s rapidly evolving economy, Cloudera is delighted to announce the availability of Cloudera Data Platform (CDP) Public Cloud in AWS Marketplace. It also includes AWS credits for free CDP trials and the mitigation of migration costs.
Today, we are thrilled to share some new advancements in Cloudera's integration of Apache Iceberg in CDP to help accelerate your multi-cloud open data lakehouse implementation. Multi-cloud deployment with CDP Public Cloud: multi-cloud capability is now available for Apache Iceberg in CDP, along with advanced capabilities.
Customers can now seamlessly automate migration to Cloudera's Hybrid Data Platform, Cloudera Data Platform (CDP), to dynamically auto-scale cloud services with Cloudera Data Engineering (CDE) integration with Modak Nabu. The result is cloud speed and scale; enterprises can also tap into new technologies like Kubernetes and an integrated security model.
In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Data and metadata: data inputs and data outputs produced based on the application logic.
We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. Cloudera has been recognized in this cloud DBMS report since its inception in 2020. We do it today when data is even bigger and hybrid, and clouds are expensive. This is unique.
Gartner® recognized Cloudera in three recent reports – Magic Quadrant for Cloud Database Management Systems (DBMS), Critical Capabilities for Cloud Database Management Systems for Analytical Use Cases and Critical Capabilities for Cloud Database Management Systems for Operational Use Cases.
Many Cloudera customers are making the transition from being completely on-premises to the cloud, either by backing up their data in the cloud or by running multi-functional analytics on CDP Public Cloud in AWS or Azure. Configure the required ports to enable connectivity from CDH to CDP Public Cloud (see the docs for details).
The Snowflake AI Data Cloud is an end-to-end platform that supports all types of data, compute, use cases and personas across an entire organization. Before Snowflake starts executing the query, we look at the metadata of the partitions to determine whether the contents of a given partition are likely to end up in the final result.
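The idea behind that metadata check is min/max pruning: partitions whose recorded value ranges cannot satisfy the query filter are skipped without being read. The toy sketch below illustrates the concept only; it is not Snowflake's implementation, and all names and values are invented.

    # Toy illustration of min/max partition pruning (not Snowflake internals).
    partitions = [
        {"name": "p0", "min_amount": 0,   "max_amount": 99},
        {"name": "p1", "min_amount": 100, "max_amount": 499},
        {"name": "p2", "min_amount": 500, "max_amount": 999},
    ]

    def prune(parts, lower_bound):
        """Keep only partitions whose max value could satisfy amount >= lower_bound."""
        return [p for p in parts if p["max_amount"] >= lower_bound]

    # A query filtering on amount >= 450 only needs to scan p1 and p2.
    print([p["name"] for p in prune(partitions, 450)])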
Apache Ozone is a distributed, scalable, and high-performance object store, available with Cloudera Data Platform Private Cloud. CDP Private Cloud uses Ozone to separate storage from compute, which enables it to handle billions of objects on-premises, akin to public cloud deployments that benefit from the likes of S3.
How do you optimize your enterprise-wide infrastructure (mostly cloud) and application expenditures? CDP includes Cloudera Shared Data eXperience (SDX), a centralized set of security, governance, and management capabilities that make it possible to use cloud resources without sacrificing data privacy or creating compliance risks.
Cloudera Unveils Industry’s First Enterprise Data Cloud in Webinar. How do you take a mission-critical on-premises workload and rapidly burst it to the cloud? Can you instantly auto-scale resources as demand requires and just as easily pause your work so you don’t run up your cloud bill? First-of-its-kind enterprise data cloud.
The manifest of a web app, the configuration of an Apache virtual host, an Infrastructure-as-Code (IaC) cloud deployment (Terraform, Kubernetes, etc.): configuration is easy to dismiss as an afterthought. That's a dangerous mistake: with the advent of IaC for the cloud, configuration has become an important aspect of modern software systems, and a critical point of failure.
In the last blog with Deloitte's Marc Beierschoder, we talked about what the hybrid cloud is, why it can benefit a business, and what the key blockers often are in implementation. When building your data foundation, how can you prioritize innovation within a hybrid cloud strategy? You can read it here.
In the public cloud, these cost management issues are compounded by consumption rates, where compute is often overused due to a lack of visibility into optimization opportunities. You can tap into insights such as where to optimize for the biggest gains, what you can do to fix workloads that don’t run, and how you can save money in the cloud.
Today, we are announcing a private technical preview (TP) release of Iceberg for CDP Data Services in the public cloud, including Cloudera Data Warehousing (CDW) and Cloudera Data Engineering (CDE). Apache Iceberg is a new open table format targeted for petabyte-scale analytic datasets. Key design goals include multi-function analytics.
Performance is one of the key criteria, if not the most important, in choosing a cloud data warehouse service. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure, using the TPC-DS 2.9 benchmark.