Analytics Application, Blog and Cloud - Data Engineering Digest

Azure Databricks: A Comprehensive Guide

Analytics Vidhya

FEBRUARY 28, 2023

Introduction Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform that is built on top of the Microsoft Azure cloud. In this blog post, we will take a closer look at Azure Databricks, its key features, […]

Big Data

Big Data Machine Learning Cloud Data Process

Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds

Cloudera

OCTOBER 12, 2020

We are thrilled to announce that Cloudera has acquired Eventador, a provider of cloud-native services for enterprise-grade stream processing. We believe Eventador will accelerate innovation in our Cloudera DataFlow streaming platform and deliver more business value to our customers in their real-time analytics applications.

Cloud

Cloud Process Scala Kafka

Octopai Acquisition Enhances Metadata Management to Trust Data Across Entire Data Estate

Cloudera

NOVEMBER 13, 2024

Combining Octopai capabilities with Cloudera’s AI powered hybrid data platform provides deeper data understanding, enhanced security, and robust data governance – essential for driving AI and analytics success. This will also accelerate deployment of new data products for AI, gen AI, and analytics applications.

Metadata

Metadata Management Data Governance Government

5 Streaming Cloud Integration Use Cases: Whiteboard Wednesdays

Striim

MARCH 21, 2025

Today we're going to talk about five streaming cloud integration use cases. Streaming cloud integration moves data continuously in real time between heterogeneous databases, with in-flight data processing. Use Case #1: Online Migration/Cloud Adoption. Let's start with the first one. This unlimited testing minimizes your risks.

Cloud

Cloud Database Architecture BI

Handling Bursty Traffic in Real-Time Analytics Applications

Rockset

MAY 12, 2022

We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! Finally, the database must be cloud native, so all scaling is automatic and hidden from developers and users. For more details, read my blog post on ALT and why it beats the Lambda architecture for real-time analytics.

Analytics Application

Analytics Application Lambda Architecture Hadoop Database

Apache Ozone – A Multi-Protocol Aware Storage System

Cloudera

NOVEMBER 7, 2023

Navigating this intricate maze of data can be challenging, and that’s why Apache Ozone has become a popular, cloud-native storage solution that spans any data use case with the performance needed for today’s data architectures. Most traditional analytics applications like Hive, Spark, Impala, YARN etc.

Systems

Systems Hadoop Unstructured Data Media

A Cost-Effective Data Warehouse Solution in CDP Public Cloud – Part1

Cloudera

FEBRUARY 9, 2021

A typical approach that we have seen in customers’ environments is that ETL applications pull data with a frequency of minutes and land it into HDFS storage as an extra Hive table partition file. In this way, the analytic applications are able to turn the latest data into instant business insights. Cost-Effective.

Data Warehouse

Data Warehouse Cloud Kafka Cloud Storage

Why Modernizing the First Mile of the Data Pipeline Can Accelerate all Analytics

Cloudera

AUGUST 13, 2021

A global oil and gas company collects, transforms, and distributes hundreds of terabytes of desktop, server, and application log data to their SIEM per day. As the company evolves into a hybrid and multi-cloud strategy, they need to start collecting application, server, and network logs from the cloud.

Data Pipeline

Data Pipeline Data Lake ETL Tools Unstructured Data

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability. Ramsey International Modern Data Platform Architecture. What is a data mesh?

Data Lake

Data Lake Analytics Application Cloud Storage Architecture

Discover and Explore Data Faster with the CDP DDE Template

Cloudera

SEPTEMBER 1, 2020

DDE is a new template flavor within CDP Data Hub in Cloudera’s public cloud deployment option (CDP PC). It is designed to simplify deployment, configuration, and serviceability of Solr-based analytics applications. For the examples presented in this blog, we assume you have a CDP account already. What does DDE entail?

Cloud Storage

Cloud Storage Unstructured Data AWS Analytics Application

Data News — Week 23.01

Christophe Blefari

JANUARY 7, 2023

The blog crossed the 2000 members mark (❤️) and I won the best data science newsletter award. Introducing ADBC: Database Access for Apache Arrow — When I see "minimal-overhead alternative to JDBC/ODBC for analytical applications" I'm instantly in. I think this is even relevant to data world.

Data Science

Data Science Data BI Kafka

Altus SDX: Shared services for cloud-based analytics

Cloudera

MARCH 6, 2018

People are gravitating to the analytics services of the large public cloud providers because the “house-brand” offerings seem to be the easiest choice. This leads to extra cost, effort, and risk to stitch together a sub-optimal platform for multi-disciplinary, cloud-based analytics applications.

Cloud

Cloud Metadata Big Data Analytics Application

Addressing the Three Scalability Challenges in Modern Data Platforms

Cloudera

NOVEMBER 22, 2021

Explosion of data availability from a variety of sources, including on-premises data stores used by enterprise data warehousing / data lake platforms, data on cloud object stores typically produced by heterogenous, cloud-only processing technologies, or data produced by SaaS applications that have now evolved into distinct platform ecosystems (e.g.,

Hadoop

Hadoop Government Data Security Cloud

The Future of Cloud-based Analytics (Part 3)

Cloudera

NOVEMBER 13, 2017

As the market moves toward cloud-based big data and analytics, three qualities emerge as vital for success. Cloud IaaS facilitates resource self-service provisioning, eliminating the hassles of procurement and deployment on-premises. Make sure any cloud-based analytics service meets these criteria.

Cloud

Cloud Big Data Metadata Machine Learning

Materialized Views in Hive for Iceberg Table Format

Cloudera

FEBRUARY 8, 2024

Overview This blog post describes support for materialized views for the Iceberg table format. Apache Iceberg is a high-performance open table format for petabyte-scale analytic datasets. Starting from the CDW Public Cloud DWX-1.6.1 In a future blog, we will evaluate the incremental versus full rebuild performance.

Metadata

Metadata Data Warehouse BI AWS

A Flexible and Efficient Storage System for Diverse Workloads

Cloudera

SEPTEMBER 15, 2022

In this blog post, we will talk about a single Ozone cluster with the capabilities of both Hadoop Core File System (HCFS) and Object Store (like Amazon S3). Please refer to our earlier Cloudera blog for more details about Ozone’s performance benefits and atomicity guarantees. The same data can be read as an object, or a file.

Systems

Systems Hadoop Metadata Telecommunication

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

This blog aims to answer two questions as illustrated in the diagram below: How have stream processing requirements and use cases evolved as more organizations shift to “streaming first” architectures and attempt to build streaming analytics pipelines? Better yet, it works in any cloud environment. Not to worry.

Kafka

Kafka Manufacturing Data Lake SQL

How to Use Kafka for Event Streaming in a Microservices Architecture?

Workfall

JUNE 27, 2023

In this blog, we’ll explore how to harness the power of Kafka to streamline event streaming within a microservices architecture and unlock its full potential for building scalable and responsive systems. Conclusion In this blog, we demonstrated how we can introduce Kafka as a message broker into a microservices architecture.

Kafka

Kafka Architecture AWS Transportation

You Can’t Hit What You Can’t See

Cloudera

DECEMBER 1, 2022

For analytic applications to properly leverage a hybrid, multi-cloud ecosystem to support modern data architectures, data observability has become even more important.

Data Lake

Data Lake Data Pipeline Analytics Application Data Governance

Ozone Write Pipeline V2 with Ratis Streaming

Cloudera

NOVEMBER 8, 2022

It enables cloud-native applications to store and process mass amounts of data in a hybrid multi-cloud environment and on premises. These could be traditional analytics applications like Spark, Impala, or Hive, or custom applications that access a cloud object store natively. Conclusion.

Metadata

Metadata Algorithm Hadoop Cloud

An Overview of Real Time Data Warehousing on Cloudera

Cloudera

NOVEMBER 2, 2020

Cloudera offers a platform, Cloudera Data Platform (CDP), for building end-to-end data applications in both the public and private cloud. In addition, we have a webinar and blog explaining how you can use Apache Kudu and Apache Impala to create a time series application within CDP. Building an RTDW with Cloudera.

Data Warehouse

Data Warehouse Kafka Lambda Architecture Telecommunication

Rockset Ushers in the New Era of Search and AI with a 30% Lower Price

Rockset

JANUARY 30, 2024

In 2023, Rockset announced a new cloud architecture for search and analytics that separates compute-storage and compute-compute. It also unlocks ways to make it easier and cheaper to build applications on Rockset. Rockset’s cloud-native architecture contrasts with the tightly coupled architecture of Elasticsearch.

Data Ingestion

Data Ingestion Utilities Architecture SQL

Object-centric Process Mining on Data Mesh Architectures

Data Science Blog: Data Engineering

NOVEMBER 15, 2023

The database for Process Mining is also establishing itself as an important hub for Data Science and AI applications, as process traces are very granular and informative about what is really going on in the business processes. DATANOMIQ Data Mesh Cloud Architecture – This image is animated! Click to enlarge!

Architecture

Architecture Database-centric Process BI

How Snowflake Native Apps Help DTCC Bring Hypothetical Market Scenarios to Customers

Snowflake

MAY 4, 2023

Expanding the DTCC ecosystem with Snowflake Native Apps To explain how DTCC is leveraging Snowflake Native Apps, I first need to paint the broader picture of the DTCC Data Cloud on Snowflake. Snowflake Native Apps allow my team to manage the application layer in much the same way that we manage our Data Cloud layer.

Portfolio

Portfolio Cloud Analytics Application Data Security

JetBlue Scales Real-Time AI on Rockset

Rockset

OCTOBER 26, 2023

That’s why JetBlue innovates with real-time analytics and AI, using over 15 machine learning applications in production today for dynamic pricing, customer personalization, alerting applications, chatbots and more. Rockset took that time down to days due to the ease of converting a SQL query into a REST API.”

Machine Learning

Machine Learning Data Science Architecture Database

Building a Self-Managed Shared Data Experience

Cloudera

DECEMBER 7, 2017

Cloud promises many advantages as an environment for machine learning and analytics. Cloud makes it fast and easy to spin up resources for new applications. Cloud offers elasticity of those resources to efficiently support transient analytics workloads and data pipelines.

Building

Building Management Government BI

Benchmarking Elasticsearch and Rockset: Rockset achieves up to 4X faster streaming data ingestion

Rockset

MAY 3, 2023

In scenarios involving analytics on massive data streams, we’re often asked the maximum throughput and lowest data latency Rockset can achieve and how it stacks up to other databases. In this blog, we’ll walk through the benchmark framework, configuration and results. Rockset achieved up to 4x higher throughput and 2.5x

Data Ingestion

Data Ingestion Kafka Database Architecture

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

OCTOBER 21, 2022

A single cluster can span across multiple data centers and cloud facilities. cloud data warehouses, for example Snowflake, Google BigQuery, and Amazon Redshift. Depending on the type of deployment (cloud or on-premise), cluster size, and the number of integrations, the deployment may take days to weeks to even months.

Kafka

Kafka Hadoop Big Data ETL Tools

SQL and Complex Queries Are Needed for Real-Time Analytics

Rockset

MAY 17, 2022

We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! The truth is that modern cloud native SQL databases support all of the key features necessary for real-time analytics, including: Mutable data for incredibly fast data ingestion and smooth handling of late-arriving events.

SQL

SQL NoSQL Hadoop MongoDB

Intel and Cloudera collaborate to bring improved performance to customers with Optane DC Persistent Memory

Cloudera

APRIL 2, 2019

Cloudera customers who want more flexibility in how and where they run their applications can benefit from Intel Optane DC persistent memory as well. A key characteristic of an enterprise data cloud is its ability to run multiple workloads on shared data without encountering “noisy neighbor” problems.

NoSQL

NoSQL Google Cloud Hadoop Machine Learning

Delivering a Shared Multidisciplinary Analytics Experience Anywhere With SDX and Altus

Cloudera

SEPTEMBER 10, 2018

Uniquely, Cloudera’s machine learning and analytics platform has a fundamental characteristic called the Shared Data Experience (SDX) that provides just that. When transient cloud infrastructures are used to complement existing on-premises investments, establishing and capturing this data context is essential for success.

Data Warehouse

Data Warehouse Metadata Cloud Retail

Making Sense of Real-Time Analytics on Streaming Data, Part 1: The Landscape

Rockset

FEBRUARY 24, 2023

Introduction Let’s get this out of the way at the beginning: understanding effective streaming data architectures is hard, and understanding how to make use of streaming data for analytics is really hard. A few noteworthy points: Self-managed Kafka can be deployed on-premises or in the cloud. Kafka or Kinesis?

Kafka

Kafka AWS Amazon Web Services Programming Language

Why Mutability Is Essential for Real-Time Data Analytics

Rockset

MARCH 10, 2022

We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! If you want to see all of the key requirements of real-time analytics databases, watch my recent talk at the Hive on Designing the Next Generation of Data Systems for Real-Time Analytics, available below.

Data Analytics

Data Analytics Data Warehouse MySQL Medical

Why Real-Time Analytics Requires Both the Flexibility of NoSQL and Strict Schemas of SQL Systems

Rockset

JULY 6, 2022

We'll be publishing more posts in the series in the near future, so subscribe to our blog so you don't miss them! And when it comes to the cloud and developers, that means wasted money. Take the Hive analytics database that is part of the Hadoop stack. Fixing and rerunning the queries is a time-wasting hassle.

NoSQL

NoSQL SQL Systems PostgreSQL

AWS vs GCP - Which One to Choose in 2023?

ProjectPro

SEPTEMBER 6, 2021

Are you confused about choosing the best cloud platform for your next data engineering project? This AWS vs. GCP blog compares the two major cloud platforms to help you choose the best one. So, are you ready to explore the differences between the two cloud giants, AWS and Google Cloud? Let’s get started!

AWS

AWS Amazon Web Services Google Cloud Cloud Storage

Elasticsearch or Rockset for Real-Time Analytics: Real-Time Ingestion and Indexing

Rockset

MARCH 15, 2021

The Demands of Real-Time Analytics Real-time analytics applications have specific demands (i.e., and your solution will only be able to provide valuable real-time analytics if you are able to meet them. Indexing Efficiency Indexing data is another crucial requirement for real-time analytics applications.

MongoDB

MongoDB Data Ingestion Analytics Application Kafka

Data Mesh Architecture: Revolutionizing Event Streaming with Striim

Striim

NOVEMBER 8, 2023

Data Mesh is revolutionizing event streaming architecture by enabling organizations to quickly and easily integrate real-time data, streaming analytics, and more. Striim is a cloud-native Data Mesh platform that offers features such as automated data mapping, real-time data integration, streaming analytics, and more.

Architecture

Architecture Generalist Government Datasets

What is AWS Kinesis (Amazon Kinesis Data Streams)?

Edureka

AUGUST 23, 2024

The AWS training will prepare you to become a master of the cloud, storing, processing, and developing applications for the cloud data. This blog will explore the AWS Amazon Kinesis and how this managed platform can revamp data analytics. As of 2024, about 73% of enterprises have deployed a hybrid cloud.

AWS

AWS Kafka Amazon Web Services Medical

How to Update Documents in Elasticsearch

Rockset

JANUARY 23, 2024

When building applications on change data capture (CDC) data using Elasticsearch, you’ll want to architect the system to handle frequent updates or modifications to the existing documents in an index. In this blog, we’ll walk through the different options available for updates including full updates, partial updates and scripted updates.

Metadata

Metadata Coding Analytics Application Python

Understanding Zero-Code Development Life Cycle in Matillion

phData: Data Engineering

MAY 11, 2023

Practices centered on software engineering principles can create a barrier to entry for teams with skilled data wranglers looking to take their infrastructure to the next level with cloud-native tools like Matillion for the Snowflake Data Cloud. When Is ZDLC Better Than SDLC?

Coding

Coding Software Engineer Software Engineering Project

Top 8 Data Engineering Books [Beginners to Advanced]

Knowledge Hut

JUNE 30, 2023

With helpful illustrations and thorough explanations, it assists readers in comprehending how to use Spark for big data processing and analytics applications. Continuously Learn and Stay Curious To broaden your knowledge and skills, read books, follow blogs, join online communities, and engage in data engineering initiatives.

Data Engineer

Data Engineer Data Engineering Engineering Data Warehouse

What Data Engineers Think About - Variety, Volume, Velocity and Real-Time Analytics

Rockset

DECEMBER 9, 2019

It continuously ingests raw data from multiple sources--data lakes, data streams, databases--into its storage layer and allows fast SQL access from both visualisation tools and analytic applications. And if you are planning on copying huge amounts of data to Rockset, this also isn’t a problem.

Data Engineer

Data Engineer Data Engineering Engineering Raw Data

The Role of Database Applications in Modern Business Environments

Knowledge Hut

JULY 26, 2023

Database applications also help in data-driven decision-making by providing data analysis and reporting tools. In this blog, we will deep dive into database system applications in DBMS, and their components and look at a list of database applications. What are Database Applications? Spatial Database (e.g.-

Database

Database NoSQL MongoDB Telecommunication
