While not every company needs to process millions of events per second, understanding these advanced architectures helps us make better decisions about our own data infrastructure, whether we’re handling user recommendations, ride-sharing logistics, or simply figuring out which meeting rooms are actually being used.
As a big data architect or developer working with microservices-based systems, you might often face a dilemma over whether to use Apache Kafka or RabbitMQ for messaging. RabbitMQ vs. Kafka: which one is the better message broker?
Streaming: Use tools like Kafka or event-driven APIs to ingest data continuously. The key goals are to store data in a format that supports fast querying and scalability and to enable real-time or near-real-time access for decision-making. Use ingestion tools such as Airbyte, Fivetran, Kafka, or custom connectors.
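As a rough illustration of the continuous-ingestion pattern described above, here is a minimal sketch using the kafka-python client. The broker address, topic name, and event fields are illustrative assumptions, not details from the excerpt.

```python
# Minimal sketch of continuous event ingestion into Kafka (kafka-python client).
# Broker address, topic name, and the event schema are illustrative assumptions.
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                     # assumed local broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def emit_clickstream_event(user_id: str, page: str) -> None:
    """Send one event; a real pipeline would call this per user action."""
    event = {"user_id": user_id, "page": page, "ts": time.time()}
    producer.send("clickstream-events", value=event)        # hypothetical topic name

if __name__ == "__main__":
    emit_clickstream_event("user-42", "/home")
    producer.flush()  # ensure buffered events reach the broker before exit
```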
Tudum offers exclusive first looks, behind-the-scenes content, talent interviews, live events, guides, and interactive experiences. In this case, Tudum needs to serve personalized experiences to our beloved fans while accessing only the latest version of our content. As a result, content edits would eventually appear on tudum.com.
This is particularly useful in environments where multiple applications need to access and process the same data. This configuration ensures that if the host goes down due to an EC2® event or any other reason, it will be automatically reprovisioned.
By: Rajiv Shringi , Oleksii Tkachuk , Kartik Sathyanarayanan Introduction In our previous blog post, we introduced Netflix’s TimeSeries Abstraction , a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
Your search for Apache Kafka interview questions ends right here! Let us now dive directly into the Apache Kafka interview questions and answers and help you get started with your Big Data interview preparation! What are topics in Apache Kafka? A stream of messages that belong to a particular category is called a topic in Kafka.
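To make the notion of a topic concrete, here is a small sketch using the kafka-python admin and consumer APIs. The topic name, partition and replication settings, and broker address are assumptions chosen for illustration.

```python
# Sketch: create a Kafka topic and read from it (kafka-python).
# Topic name, partition/replication settings, and broker address are assumed.
from kafka import KafkaConsumer
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")
admin.create_topics([
    NewTopic(name="payment-events", num_partitions=3, replication_factor=1)
])

# A topic groups messages of one category; consumers subscribe by topic name.
consumer = KafkaConsumer(
    "payment-events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",   # read the topic from the beginning
)
for record in consumer:
    print(record.partition, record.offset, record.value)
```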
Zerobus provides a simple solution for these cases.
Kafka Topics are your trusty companions. Learn how Kafka Topics simplify the complex world of big data processing in this comprehensive blog. More than 80% of Fortune 100 companies trust and use Kafka. The meteoric rise of Apache Kafka's popularity is no accident, as it plays a crucial role in data engineering.
Fluss addresses many of Kafka's challenges in analytical infrastructure. The combination of Kafka and Flink is not a perfect fit for real-time analytics, and the integration of Kafka with the Lakehouse is very shallow. How do you compare Fluss with Apache Kafka? Fluss and Kafka differ fundamentally in design principles.
Explore the full potential of AWS Kafka with this ultimate guide. Elevate your data processing skills with Amazon Managed Streaming for Apache Kafka, making real-time data streaming a breeze. According to IDC , the worldwide streaming market for event-streaming software, such as Kafka, is likely to reach $5.3
Data ingestion systems such as Kafka offer a seamless and quick ingestion process while also allowing data engineers to locate appropriate data sources, analyze them, and ingest data for further processing. Kafka can also access structured and unstructured data from various sources.
Today, Kafka is used by thousands of companies, including over 80% of the Fortune 100. Kafka's popularity is skyrocketing, and for good reason—it helps organizations manage real-time data streams and build scalable data architectures. As a result, there's a growing demand for highly skilled professionals in Kafka.
If you’re looking for everything a beginner needs to know about using Apache Kafka for real-time data streaming, you’ve come to the right place. This blog post explores the basics about Apache Kafka and its uses, the benefits of utilizing real-time data streaming, and how to set up your data pipeline. Let's dive in.
Built by the original creators of Apache Kafka, Confluent provides a data streaming platform designed to help businesses harness the continuous flow of information from their applications, websites, and systems. The primary appeal of Confluent lies in its promise to tame the complexity of Apache Kafka.
What began with an engineering plan to pave the path towards our first Live comedy special, Chris Rock: Selective Outrage , has since led to hundreds of Live events ranging from the biggest comedy shows and NFL Christmas Games to record-breaking boxing fights and becoming the home of WWE.
Data pipelines are crucial in managing the information lifecycle, ensuring its quality, reliability, and accessibility. Check out the following insightful post by Leon Jose , a professional data analyst, shedding light on the pivotal role of data pipelines in ensuring data quality, accessibility, and cost savings for businesses.
Prepare for your next big data job interview with Kafka interview questions and answers. Types of Data Ingestion: 1. When this processing takes place is decided based on an interval, such as a specific time every day or 12-hour intervals, or based on a given condition via event-trigger functions.
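As a sketch of the interval-driven ingestion described above, the following hypothetical Airflow DAG runs a batch ingestion task every 12 hours. The DAG id, task body, and schedule are illustrative assumptions; the excerpt does not name a specific scheduler.

```python
# Sketch: interval-based batch ingestion scheduled with Apache Airflow.
# DAG id, schedule, and the ingestion logic are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_batch():
    # Placeholder: pull the latest batch from a source system and land it
    # in the warehouse or data lake. Real logic depends on the pipeline.
    print("running 12-hourly batch ingestion")

with DAG(
    dag_id="batch_ingestion_every_12h",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(hours=12),  # "12-hour intervals" from the text
    catchup=False,
) as dag:
    PythonOperator(task_id="ingest", python_callable=ingest_batch)
```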
Uber: The Evolution of Uber's Search Platform. Uber writes about the evolution of its search infrastructure from Elasticsearch to the in-house Sia engine and its upcoming cloud-native variant; Sia was built to support NRT semantics, gRPC/Protobuf, Kafka-based ingestion, and active-active deployment.
Enter Amazon EventBridge, a fully managed, serverless event bus service that makes it easier to build event-driven applications using data from your AWS services, custom applications, or SaaS providers. It allows applications to communicate with each other through events.
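A minimal sketch of publishing a custom application event to EventBridge with boto3 is shown below. The bus name, event source, and detail payload are hypothetical values, not taken from the excerpt.

```python
# Sketch: publishing a custom application event to Amazon EventBridge via boto3.
# Bus name, source, and detail payload are illustrative assumptions.
import json
import boto3

events = boto3.client("events")

response = events.put_events(
    Entries=[
        {
            "EventBusName": "orders-bus",          # hypothetical custom event bus
            "Source": "com.example.orders",        # assumed source identifier
            "DetailType": "OrderPlaced",
            "Detail": json.dumps({"order_id": "o-123", "amount": 42.5}),
        }
    ]
)
print(response["FailedEntryCount"])  # 0 means the event was accepted
```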
Collecting Raw Impression Events As Netflix members explore our platform, their interactions with the user interface spark a vast array of raw events. These events are promptly relayed from the client side to our servers, entering a centralized event processing queue.
Running Dagster: Event-Driven Pipelines at Dagster. We process millions of events a day from Dagster+. This event data is used to track credit usage, power tools like Insights, and provide a peek into how organizations are using our platform.
The system addresses the significant bottlenecks that financial analysts faced with traditional data access methods, such as manually searching multiple platforms, writing complex SQL queries, or submitting lengthy data requests, which caused delays in decision-making.
A lack of access to real-time information can result in billions of dollars in lost revenue. The post covers Apache Spark Streaming use cases, the Spark Streaming architecture (Discretized Streams), a Spark Streaming example in Java, Spark Streaming vs. Structured Streaming, and what Kafka Streaming is.
These collectors send the data to a central location, typically a message broker like Kafka. You can use tools such as Flume or Kafka Connect to transfer the data from Kafka to HDFS. Event Processing and Analytics Layer: this layer focuses on performing real-time analytics and deriving insights from the processed data.
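As a concrete example of the Spark-plus-Kafka pairing discussed above, here is a minimal Spark Structured Streaming job that reads a Kafka topic. The broker address, topic name, and console sink are assumptions; running it also requires the Spark-Kafka connector package on the classpath.

```python
# Sketch: reading a Kafka topic with Spark Structured Streaming.
# Broker address, topic name, and output sink are illustrative assumptions.
# Requires the spark-sql-kafka connector package to be available to Spark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-demo").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "clickstream-events")      # hypothetical topic
    .load()
    .select(col("key").cast("string"), col("value").cast("string"))
)

# Write each micro-batch to the console; a real job would target a table or sink.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```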
This data warehouse is accessible to data analysts and scientists and helps them perform data science tasks like data visualization , statistical analysis, machine learning model creation, etc. An ETL pipeline can help with the following tasks- Centralizes and standardizes data, making it more accessible to analysts and decision-makers.
Infrastructure provisioning and management are not necessary because everything is accessible through a single portal. The storage is standardized on the Delta Lake format, and it supports Direct Lake access in Power BI (for real-time performance), so all workloads can read and write natively to the lake.
PySpark is used to process real-time data with Kafka and Spark Streaming, and it exhibits low latency. The PySpark architecture consists of components such as SparkConf, RDDs, SparkContext, and DataFrames.
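The tiny job below wires together the PySpark components named above (SparkConf, SparkContext/SparkSession, an RDD, and a DataFrame). The application name, master setting, and sample data are illustrative assumptions.

```python
# Sketch: SparkConf, SparkContext/SparkSession, RDDs, and DataFrames in one job.
# App name, master setting, and sample data are illustrative assumptions.
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = SparkConf().setAppName("pyspark-architecture-demo").setMaster("local[*]")
spark = SparkSession.builder.config(conf=conf).getOrCreate()
sc = spark.sparkContext                      # the SparkContext behind the session

rdd = sc.parallelize([("alice", 3), ("bob", 5)])            # low-level RDD API
df = spark.createDataFrame(rdd, ["user", "events"])         # higher-level DataFrame
df.groupBy("user").sum("events").show()

spark.stop()
```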
When any particular project is open-sourced, it makes the source code accessible to anyone. Support for stream and batch processing, comprehensive state management, event-time processing semantics, and consistency guarantee for the state are just a few of Flink's capabilities.
Prepare for your next big data job interview with Kafka interview questions and answers. 2. It makes data more accessible. Factless fact tables have only dimensional keys, and they capture events that occur only at the information level, not at the computation level (just information about an event that happens over a period).
It can also be made accessible as an API and distributed to stakeholders. The big data pipeline must process data in large volumes concurrently because, in reality, multiple big data events are likely to occur at once or relatively close together. The transformed data is then placed into the destination data warehouse or data lake.
There will be no network latency concerns because the computer is part of the cluster, and the cluster's maintenance is already taken care of, so there is no need to be concerned in the event of a failure. In the event that the RDDs are too large to fit in memory, the partitions are not cached and must be recomputed as needed.
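The caching behavior described above can be made concrete with a short RDD example: persisted partitions are reused across actions, and partitions that do not fit in memory are either recomputed or spilled, depending on the storage level. The data and storage level below are illustrative choices, not from the excerpt.

```python
# Sketch: caching an RDD so repeated actions reuse it instead of recomputing.
# With MEMORY_ONLY, partitions that don't fit in memory are recomputed on demand;
# MEMORY_AND_DISK spills them to disk instead. Values are illustrative.
from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[*]", "rdd-cache-demo")

numbers = sc.parallelize(range(1_000_000))
squares = numbers.map(lambda x: x * x).persist(StorageLevel.MEMORY_AND_DISK)

print(squares.count())   # first action materializes and caches the partitions
print(squares.sum())     # second action reuses cached partitions where available

sc.stop()
```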
Over the next few years, the team assembled a streamlined data stack, including Kafka and AWS Kinesis for streaming, Databricks for Spark processing, Tableau for reporting, Amplitude for product analytics, MLflow for machine learning, Unity Catalog for discoverability and access control, and Monte Carlo for data observability.
Better Business Capabilities: Cloud data warehousing offers capabilities such as disaster recovery, scalability, flexibility, security, and accessibility. It also helps with historical data analysis and understanding what events occurred and when.
Key Features of RapidMiner: RapidMiner integrates with your current systems, is easily scalable to meet any demand, can be deployed anywhere, encrypts your data, and gives you complete control over who may access projects. Many developers have access to it due to its integration with Python IDEs like PyCharm.
Use Kafka for real-time data ingestion, preprocess with Apache Spark, and store data in Snowflake. Use Power BI to create visual dashboards showcasing top-performing players, team strengths, and key match events. In this architecture, simulated sensor data is ingested from MQTT into Kafka, as sketched below.
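A minimal sketch of the MQTT-to-Kafka leg of that architecture, using paho-mqtt and kafka-python, is shown below. The broker addresses, the MQTT topic filter, and the Kafka topic are hypothetical; the excerpt does not specify them.

```python
# Sketch: relay simulated sensor readings from an MQTT broker into Kafka.
# Broker addresses, MQTT topic filter, and Kafka topic are assumptions.
# Note: paho-mqtt 2.x additionally expects a CallbackAPIVersion argument to Client().
import paho.mqtt.client as mqtt
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")

def on_message(client, userdata, msg):
    # Forward each MQTT message payload into a Kafka topic as-is.
    producer.send("sensor-readings", value=msg.payload)

mqtt_client = mqtt.Client()
mqtt_client.on_message = on_message
mqtt_client.connect("localhost", 1883)        # assumed local MQTT broker
mqtt_client.subscribe("stadium/sensors/#")    # hypothetical MQTT topic filter
mqtt_client.loop_forever()                    # relay messages until interrupted
```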
Security and Data Privacy Big Data Developers work closely with data protection officers to implement robust security measures, encryption, and access controls to safeguard data. Apache Kafka: Kafka is a distributed event streaming platform. These tools are the backbone of Big Data processing and analytics.
Real-time AI applications need instantaneous data access, yet most pipelines were built for overnight batch processing. A recommendation engine processing user interactions might need to handle sudden traffic spikes during sales events. Balancing accessibility with security creates constant tension.
Learn to Create Delta Live Tables in Azure Databricks Databricks Real-Time Streaming with Event Hubs and Snowflake 3. The platform prioritizes security, offering features such as data encryption, identity and access management, and compliance certifications.
NiFi provides a web-based user interface for designing data flows, making it user-friendly and accessible for developers and non-developers. Critical health information, such as abnormal vital signs or emergency events, is prioritized for real-time data analysis and immediate attention. What is Apache NiFi Used For?
This refinement encompasses tasks like data cleaning , integration, and optimizing storage efficiency, all essential for making data easily accessible and dependable. This article will explore the top seven data warehousing tools that simplify the complexities of data storage, making it more efficient and accessible.
In the event of a failure, Aurora automatically fails over to a standby instance without data loss. Developer Productivity: Amazon Redshift offers simplified data access and integration from various programming languages and platforms. Developers can access data without complex configurations, ensuring increased productivity.
The importance of such a pipeline lies in its ability to handle massive volumes of data — Netflix processes around 500 billion events and 1.3 PB per day — and its capability to provide near-real-time insights. For instance, during peak hours, Netflix handles around 8 million events and 24 GB per second.