Cloudera, together with Octopai, will make it easier for organizations to understand, access, and leverage all the data in their entire data estate – including data outside of Cloudera – to power the most robust data, analytics, and AI applications.
We believe Eventador will accelerate innovation in our Cloudera DataFlow streaming platform and deliver more business value to our customers in their real-time analytics applications. The post Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds appeared first on Cloudera Blog.
Every now and then, your application experiences a slow start-up time, affecting user experience. Discover all there is to know about AWS Lambda cold starts with our in-depth guide. From understanding the delays to implementing effective solutions, dive into practical strategies for optimizing serverless performance in this blog.
In this blog, we will explore the roles of data engineers and data architects and the key differences between them. After reading this blog, you'll have a better understanding of who builds the data castle and how they do it. Data engineers are responsible for integrating and cleaning data for usage in analytics applications.
This blog post is intended to provide guidance to Ozone administrators and application developers on the optimal usage of bucket layouts for different applications. Bucket layouts in Apache Ozone enable interoperability between the FS and S3 APIs: users can store their data in Apache Ozone and access it with multiple protocols.
With over 24K customers worldwide and 2K GitHub repositories, DynamoDB is one of the most popular NoSQL databases available today, allowing developers to focus on building applications without worrying about maintaining the underlying infrastructure. MongoDB, by contrast, fully supports secondary indexes, ensuring fast access to data by any field.
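To make the secondary-index point above concrete, here is a minimal, self-contained sketch of the idea (pure Python, not MongoDB's actual implementation): a secondary index is an auxiliary map from a non-key field to matching records, so a query by that field becomes a dictionary lookup instead of a full scan. The record shapes and field names are hypothetical.

```python
# Illustrative sketch of a secondary index over an in-memory collection.
from collections import defaultdict

records = [
    {"id": 1, "city": "Austin", "name": "Ana"},
    {"id": 2, "city": "Boston", "name": "Ben"},
    {"id": 3, "city": "Austin", "name": "Cal"},
]

# Primary index: lookup by the key field, "id".
primary = {r["id"]: r for r in records}

# Secondary index: lookup by a non-key field, "city".
by_city = defaultdict(list)
for r in records:
    by_city[r["city"]].append(r)

def find_by_city(city):
    # O(1) average-case lookup instead of an O(n) scan over all records.
    return by_city.get(city, [])
```

The same trade-off applies at database scale: each secondary index speeds up reads on its field at the cost of extra storage and extra work on every write.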
This AWS vs. GCP blog compares the two major cloud platforms to help you choose the best one. Popular instances where GCP is used widely are machine learning analytics, application modernization, security, and business collaboration. Let's get started!
However, in the typical enterprise, only a small team has the core skills needed to gain access to and create value from streams of data. Contrast that with the skills honed over decades for gaining access, building data warehouses, performing ETL, and creating reports and applications using structured query language (SQL).
This blog is all about that—specifically, the top 10 data pipeline tools that data engineers worldwide rely on. From the initial extraction of raw data to its eventual loading into a data warehouse or analytical platform, data pipelines play a pivotal role in shaping the information narrative within an organization.
From reducing storage costs to improving data accessibility and enhancing security, the advantages of cloud storage solutions are endless. This blog will help you take a closer look at Azure Blob Storage and explore its key features, benefits, and use cases. Table of Contents What is Microsoft Azure Blob Storage?
Its integration with other Azure services and support for real-time analytics and machine learning make it a valuable tool for many businesses. Read this blog to understand the benefits of using Apache Spark on Azure, the various Azure services available for Spark, and a few suitable use case scenarios for Spark on Azure.
With AWS DevOps, data scientists and engineers can access a vast range of resources to help them build and deploy complex data processing pipelines, machine learning models, and more. This blog will explore 15 exciting AWS DevOps project ideas that can help you gain hands-on experience with these powerful tools and services.
The ability to manage how the data flows and transforms during the first mile of the data pipeline and control the data distribution can accelerate the performance of all analytic applications. The post Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics appeared first on Cloudera Blog.
In this blog post, we will talk about a single Ozone cluster with the capabilities of both Hadoop Core File System (HCFS) and an object store (like Amazon S3): interoperability of the same data for several workloads through multi-protocol access. Ranger policies enable authorized access to Ozone resources (volume, bucket, and key).
Becoming a successful AWS data engineer demands that you learn AWS for data engineering and leverage its various services to build efficient business applications. It is useful to learn about the different cloud services AWS offers for the first step of any data analytics process, i.e., data engineering on AWS!
The blog crossed the 2000 members mark (❤️) and I won the best data science newsletter award. Introducing ADBC: Database Access for Apache Arrow — When I see "minimal-overhead alternative to JDBC/ODBC for analytical applications" I'm instantly in.
Introduction to Big Data Big data combines structured, semi-structured, and unstructured data collected by organizations to glean valuable insights and information using machine learning, predictive modeling, and other advanced analytical applications. This will give you an overview of the theory around big data analytics.
If you are still wondering whether or why you need to master SQL for data engineering, read this blog to take a deep dive into the world of SQL for data engineering and how it can take your data engineering skills to the next level. Your SQL skills as a data engineer are crucial for data modeling and analytics tasks.
Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability. The post Demystifying Modern Data Platforms appeared first on Cloudera Blog.
This unified data environment eliminates the need for maintaining separate data silos and facilitates seamless access to data for AI and analytics applications. Learn more about the Cloudera Open Data Lakehouse here. The post Unify your data: AI and Analytics in an Open Lakehouse appeared first on Cloudera Blog.
There are several big data and business analytics companies that offer novel big data innovation through unprecedented personalization and efficiency at scale. Which big data analytics companies are believed to have the biggest potential?
It is designed to simplify deployment, configuration, and serviceability of Solr-based analytics applications. DDE also makes it much easier for application developers or data workers to self-serve and get started with building insight applications or exploration services based on text or other unstructured data.
When choosing the right analytics tool, a million questions run through one's mind. This blog will help you determine which data analysis tool best fits your organization by exploring the top data analysis tools on the market with their key features, pros, and cons. Well, this blog will answer all these questions in one go!
Time Series and Event Analytics Specialized RTDW: an analytics storage engine for huge volumes of fast-arriving data, offering mutability, random access, fast scans, and interactive queries, with optimized access to both full-fidelity raw data and aggregations, and to both current and historical data.
In this blog, we will cover the most popular ideas for predictive financial modeling projects you need to explore. Having access to a repository of solved projects can save you time and effort. But before we start, let us discuss a few real-life examples of predictive modeling.
The open-source KNIME Analytics Platform allows anyone to analyze data and develop data science workflows and reusable elements. The KNIME Server is a commercial platform that allows you to automate, manage, and deploy data science workflows as analytical applications and services.
Discover the perfect synergy between Kubernetes and data science as we unveil a treasure trove of innovative Data Science Kubernetes projects in this blog. By practicing Kubernetes projects, data scientists can learn how to effectively deploy and scale data processing and analytics applications. Say hello to Kubernetes!
This blog is your go-to guide to the top 21 big data tools, their key features, and some interesting project ideas that leverage these big data tools and technologies to gain hands-on enterprise experience. Many developers have access to it due to its integration with Python IDEs like PyCharm. Starting a career in Big Data?
This blog aims to answer two questions, as illustrated in the diagram below: How have stream processing requirements and use cases evolved as more organizations shift to "streaming first" architectures and attempt to build streaming analytics pipelines?
This blog invites you to explore the best cloud computing projects that will inspire you to explore the power of cloud computing and take your big data skills to the next level. Create a Kinesis Data Analytics application and utilize Glue and Athena to define the Partition Key. But why go to lengths and work on such projects?
Explore the world of data analytics with the top AWS databases! Check out this blog to discover your ideal database and uncover the power of scalable and efficient solutions for all your data analytical requirements. Developers can access data without complex configurations, ensuring increased productivity.
A machine learning engineer performs the following tasks: implementing statistical analysis and machine learning in highly available, high-performance production systems to provide users with ease of access; automating the feature engineering, model training, and evaluation process; and enriching machine learning frameworks and libraries.
SDX, which is an integral part of CDP, delivers uniform data security and governance, coupled with data visualization capabilities, enabling quick onboarding of data and data platform consumers and access to insights across all of CDP on hybrid clouds at no extra cost (per a benchmarking study conducted by an independent third party).
In this blog, we'll dive into some of the most commonly asked big data interview questions and provide concise and informative answers to help you ace your next big data job interview. The Hadoop MapReduce architecture has a Distributed Cache feature that allows applications to cache files. HDFS stores and processes big data.
Real-time analytics is all about deriving insights and taking action as soon as data is produced. When broken down into its core requirements, real-time analytics means two things: access to fresh data and fast responses to queries. Learn how Rockset is 1.67 times faster than Druid in the latest performance blog post.
It enables cloud-native applications to store and process massive amounts of data in a hybrid multi-cloud environment and on premises. These could be traditional analytics applications like Spark, Impala, or Hive, or custom applications that access a cloud object store natively.
That’s why JetBlue innovates with real-time analytics and AI, using over 15 machine learning applications in production today for dynamic pricing, customer personalization, alerting applications, chatbots and more. Rockset provides the speed and scale required of ML applications accessed daily by over 2,000 employees at JetBlue.
This leads to extra cost, effort, and risk to stitch together a sub-optimal platform for multi-disciplinary, cloud-based analytics applications. Because metadata is always associated with your data, you can open up self-service access to more diverse users and apps without those apps becoming data silos in the cloud.
In 2023, Rockset announced a new cloud architecture for search and analytics that separates compute-storage and compute-compute. With this architecture, users can separate ingestion compute from query compute, all while accessing the same real-time data. This is a game changer in disaggregated, real-time architectures.
We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! More application code not only takes more time to create, but it almost always results in slower queries.
Analytical queries could be accelerated by caching heavily accessed read-only data in RAM or SSDs. Get faster analytics on fresher data, at lower costs, by exploiting indexing over brute-force scanning.
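The caching idea above can be sketched in a few lines. This is an illustrative in-process example, assuming an expensive, side-effect-free aggregation over read-only data; it is not any specific product's cache, and the data and function names are hypothetical.

```python
# Minimal sketch of caching heavily accessed read-only data in RAM:
# memoize an expensive aggregation so repeated analytical queries
# over the same key skip the recomputation.
from functools import lru_cache

CALLS = {"count": 0}

@lru_cache(maxsize=1024)
def total_sales(region):
    # Stand-in for an expensive scan/aggregation over read-only data.
    CALLS["count"] += 1
    data = {"east": [100, 250, 75], "west": [300, 40]}
    return sum(data.get(region, []))

first = total_sales("east")   # computed: scans the data
second = total_sales("east")  # served from the in-RAM cache
```

Caching like this only stays correct because the underlying data is read-only; once the data mutates, the cache must be invalidated, which is exactly why fresh-data analytics tends to favor indexing over caching.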
A typical approach that we have seen in customers’ environments is that ETL applications pull data with a frequency of minutes and land it in HDFS storage as an extra Hive table partition file. In this way, the analytic applications are able to turn the latest data into instant business insights.
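As a hedged illustration of the landing pattern described above (the warehouse root, table name, and partition columns are assumptions, not any customer's actual layout), minute-frequency ETL jobs conventionally write each micro-batch into a Hive-style partition directory keyed by time:

```python
# Sketch of Hive-style partition paths for frequent ETL landings.
from datetime import datetime

def partition_path(root, table, ts, part_no):
    # Hive convention: one directory per partition column value,
    # e.g. .../dt=2024-01-15/hr=09/, with part files inside.
    return (f"{root}/{table}"
            f"/dt={ts:%Y-%m-%d}/hr={ts:%H}"
            f"/part-{part_no:05d}")

p = partition_path("/warehouse", "events", datetime(2024, 1, 15, 9, 30), 3)
```

Because each batch lands as a new partition, downstream analytic queries can prune to just the latest `dt`/`hr` directories instead of rescanning the whole table.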
In the end, we want all of DTCC’s data securely accessible to our internal and external stakeholders. Forward-Looking Statements This blog contains express and implied forward-looking statements, including statements regarding Snowflake and DTCC’s products, services, and technology offerings that are under development.
Apache HBase® is one of many analytics applications that benefit from the capabilities of Intel Optane DC persistent memory. HBase is a distributed, scalable NoSQL database that enterprises use to power applications that need random, real-time read/write access to semi-structured data.