Sat.Sep 26, 2020 - Fri.Oct 02, 2020

How Real-Time Stream Processing Works with ksqlDB, Animated

Confluent

ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a […].

Process 145

Upgrade Journey: The Path from CDH to CDP Private Cloud

Cloudera

Cloudera delivers an enterprise data cloud that enables companies to build end-to-end data pipelines for hybrid cloud, spanning edge devices to public or private cloud, with integrated security and governance underpinning it to protect customers' data. Cloudera has found that customers have spent many years investing in their big data assets and want to continue to build on that investment by moving towards a more modern architecture that helps leverage the multiple form factors.

Cloud 132

Data Engineering Project: Stream Edition

Start Data Engineering

Table of Contents Introduction Project description and requirements Infrastructure overview Apache Flink Apache Kafka Design Detect fraudulent accounts Log account actions Prerequisites Code Defining dependencies Inheritance Server logs generator Defining data flow in Apache Flink Create a streaming environment Creating a consumer to read events from Apache Kafka Detecting fraud and generating alert events Writing server logs to a PostgreSQL DB Fraud detection logic Open proces

Speed Up And Simplify Your Streaming Data Workloads With Red Panda

Data Engineering Podcast

Summary: Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. Despite its widespread popularity, there are numerous accounts of the difficulty that operators face in keeping it reliable and performant, or trying to scale an installation. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine.

Kafka 100

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Introducing Confluent Platform 6.0

Confluent

Each month, we’ve announced a set of Confluent features organized around what we think are the key foundational traits of cloud-native data systems as part of Project Metamorphosis. Data systems […].

Project 143

UK Government: From cloud first to cloud appropriate?

Cloudera

Since 2013, the UK Government’s flagship ‘Cloud First’ policy has been at the forefront of enabling departments to shed their legacy IT architecture in order to meaningfully embrace digital transformation. The policy states that the cloud (and specifically, public cloud) should be the default position for any new services, unless it can be demonstrated that other alternatives offer better value for money.

More Trending

Three Insights Into Delivering Value at Scale From Smart Factory Investments

Teradata

Industry 4.0 has promised productivity gains, but has not yet delivered. A large part of this has to do with the challenge of deploying analytics at scale. Find out more.

ksqlDB Meets Java: An IoT-Inspired Demo of the Java Client for ksqlDB

Confluent

Stream processing applications, including streaming ETL pipelines, materialized caches, and event-driven microservices, are made easy with ksqlDB. Until recently, your options for interacting with ksqlDB were limited to its command-line […].

Java 122

Cloudera Data Warehouse outperforms Azure HDInsight in TPC-DS benchmark

Cloudera

Performance is one of the key deciding criteria, if not the most important, in choosing a Cloud Data Warehouse service. In today’s fast-changing world, enterprises have to make data-driven decisions quickly, and for that they rely heavily on their data warehouse service. In this blog post, we compare Cloudera Data Warehouse (CDW) on Cloudera Data Platform (CDP) using Apache Hive-LLAP to Microsoft HDInsight (also powered by Apache Hive-LLAP) on Azure using the TPC-DS 2.9 benchmark.

Scala 3: Traits Quickly Explained

Rock the JVM

This article delves into Scala 3's advanced trait functionalities, building on our previous explorations of the language's new features.

Scala 52

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Demystifying the Business Continuity Space: A Two Part Series

Teradata

In part 1 of this 2 part topic, we will define some of the commonly used (& misused) terms in the business continuity space & help you navigate what they mean to your organization.

ksqlDB 0.12.0 Introduces Real-Time Query Upgrades and Automatic Query Restarts

Confluent

The ksqlDB team is pleased to announce ksqlDB 0.12.0. This release continues to improve upon the usability of ksqlDB and aims to reduce administration time. Highlights include query upgrades, which […].

Process 98

How to enable Cloudera Data Visualization in CDW

Cloudera

In our previous blog post, we introduced Cloudera Data Visualization in Cloudera Data Warehouse (CDW), available in tech preview in CDP Public Cloud. This blog will help you get started with Cloudera Data Visualization, so you can start building interesting and powerful applications on all types of data. Before you start, make sure that you have a CDP account set up (for instance, you may use our trial experience).

Enums in Scala 3: Quickly Explained

Rock the JVM

Scala 3 Introduces Enums: A Major Update with Significant Implications

Scala 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Demystifying the Business Continuity Space: A Three Part Series

Teradata

In part 1 of this 3 part series, we will define some of the commonly used (& misused) terms in the business continuity space & help you navigate what they mean to your organization.

Build a Slack Dashboard (Part 2): Loading Into Postgres & Creating Basic Charts

Preset

Build a beautiful Slack dashboard using open source tools Meltano and Superset. Part 2 of 3.

Coffee with Cloudera: Meet Ali Bajwa, Partner Solutions – Engineer by Day, Rockstar by Night!

Cloudera

Meet Ali Bajwa, Director of Partner Solutions Engineering at Cloudera. For the past 6 years, Ali has been front and center in many partner field deployments, training, and discussions; he is a rockstar in the Cloudera Partner Ecosystem! We hope this interview helps you get to know the after-hours Ali. If you get a chance, follow Ali on Twitter: @abajwa_hdp.

How to Solve the “You’re Using THAT Table?!” Problem

Monte Carlo

As companies increasingly rely on data to power decision making and drive innovation, it’s important that this data is timely, accurate, and reliable. When you consider that only a small fraction of the over 7.5 septillion (7,700,000,000,000,000,000,000) GB of data generated worldwide every day is usable, keeping tabs on what data assets are important has only gotten harder.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Break Out of the Data Silo!

Teradata

Marketing might be the best place to start operationalizing a bank-wide data strategy. But, to be effective, the CMO needs to dissolve data silos & create a model for data orchestration.

Banking 52

Building a Real-Time Customer 360 on Kafka, MongoDB and Rockset

Rockset

Users interact with services in real time. They log in to websites, like and share posts, purchase goods and even converse, all in real time. So why is it that whenever you have a problem using a service and reach a customer support representative, they never seem to know who you are or what you’ve been doing recently? This is likely because they haven’t built a customer 360 profile, and if they have, it certainly isn’t real-time.

MongoDB 40

PowerBI distribution and sharing

FreshBI

Spotlight: The PowerBI Service. Lately we have been getting a lot of questions surrounding licensing and release strategy in PowerBI. This guide should serve as an internal, quick reference manual. The following is a list of topics covered in this guide, each containing a summary of how it works and what the use case is. Licensing PowerBI Desktop / Free Who uses this?

BI 52