Apache Kafka ships with Kafka Streams, a powerful yet lightweight client library for Java and Scala to implement highly scalable and elastic applications and microservices that process and analyze data […].
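To make that concrete, here is a minimal Kafka Streams word-count sketch in Java; the application ID, broker address, and topic names (text-input, word-counts) are placeholders, not anything taken from the excerpt above.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.Produced;

    import java.util.Arrays;
    import java.util.Properties;

    public class WordCountSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch");   // hypothetical app id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed local broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            KStream<String, String> lines = builder.stream("text-input");         // hypothetical input topic
            lines.flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\W+")))
                 .groupBy((key, word) -> word)          // repartition by word
                 .count()                               // stateful, fault-tolerant count
                 .toStream()
                 .to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }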
As part of this, we are also supporting Snowpipe Streaming as an ingestion method for our Snowflake Connector for Kafka. Now we are able to ingest our data in near real time directly from Kafka topics to a Snowflake table, drastically reducing the cost of ingestion and improving our SLA from 15 minutes to within 60 seconds.
tl;dr When a client wants to send or receive a message from Apache Kafka®, there are two types of connection that must succeed: The initial connection to a broker (the […].
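As a rough illustration of those two connection types, consider a minimal Java consumer configuration; the broker address and group ID below are assumptions, not values from the post.

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    import java.util.Properties;

    public class ConnectionSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Connection type 1: the bootstrap connection, used only to fetch cluster metadata.
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092");   // assumed address
            // Connection type 2 happens implicitly afterwards: the client connects to each
            // broker's advertised listener from the metadata, so those hostnames must resolve too.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // hypothetical group
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // subscribing and polling would trigger both connection types
            }
        }
    }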
Spark Streaming vs. Kafka Streams: now that we understand at a high level what these tools are, it's natural to be curious about the differences between them. In Spark Streaming, data received from live input streams is divided into micro-batches for processing, and Spark Streaming runs as a standalone framework; Kafka Streams, by contrast, processes records one at a time and ships as a client library.
How cool would it be to build your own burglar alarm system that can alert you before the actual event takes place simply by using a few network-connected cameras and analyzing the camera images with Apache Kafka®, Kafka Streams, and TensorFlow? Uploading your images into Kafka. Setting up your burglar alarm.
I’ve written an event sourcing bank simulation in Clojure (a Lisp built for the Java Virtual Machine, or JVM) called open-bank-mark, which you are welcome to read about in my previous blog post explaining the story behind this open source example. The schemas are also useful for generating specific Java classes. The bank application.
When it was first created, Apache Kafka® had a client API for just Scala and Java. Since then, the Kafka client API has been developed for many other programming languages, which enables you to pick the language you want. At Confluent, we have an engineering team dedicated to the development of these Kafka clients.
As a distributed system for collecting, storing, and processing data at scale, Apache Kafka® comes with its own deployment complexities. To simplify all of this, different providers have emerged to offer Apache Kafka as a managed service. Before Confluent Cloud was announced, a managed service for Apache Kafka did not exist.
Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. This article describes how to instrument Kafka-based applications with distributed tracing capabilities in order to make dataflows between event-based components more visible.
In the early days, many companies simply used Apache Kafka® for data ingestion into Hadoop or another data lake. However, Apache Kafka is more than just messaging. Some Kafka and Rockset users have also built real-time e-commerce applications, for example, using Rockset’s Java, Node.js
Confluent’s clients for Apache Kafka ® recently passed a major milestone—the release of version 1.0. Magnus Edenhill first started developing librdkafka about seven years ago, later joining Confluent in the very early days to help foster the community of Kafka users outside the Java ecosystem. Leading up to the 1.0
One of the most common integrations that people want to do with Apache Kafka® is getting data in from a database. The existing data in a database, and any changes to that data, can be streamed into a Kafka topic. Here, I’m going to dig into one of the options available—the JDBC connector for Kafka Connect. Introduction.
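For flavor, a minimal standalone-mode properties file for the JDBC source connector might look like the sketch below; the connection URL, column name, and topic prefix are all hypothetical.

    name=jdbc-source-sketch
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    tasks.max=1
    connection.url=jdbc:postgresql://localhost:5432/inventory
    mode=incrementing
    incrementing.column.name=id
    topic.prefix=postgres-

In incrementing mode the connector tracks the highest id it has seen and streams only new rows into topics named with the given prefix.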
Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. Integrating Kafka with external systems like MongoDB is best done through the use of Kafka Connect. The official MongoDB Connector for Apache Kafka is developed and supported by MongoDB engineers. Getting started.
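As a sketch of that integration, a minimal sink configuration for the official MongoDB connector could look like this; the topic, URI, database, and collection names are made up:

    name=mongo-sink-sketch
    connector.class=com.mongodb.kafka.connect.MongoSinkConnector
    topics=orders
    connection.uri=mongodb://localhost:27017
    database=demo
    collection=orders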
As discussed in part 2, I created a GitHub repository with Docker Compose functionality for starting a Kafka and Confluent Platform environment, as well as the code samples mentioned below. We used Groovy instead of Java to write our UDFs, so we’ve applied the groovy plugin. ./gradlew composeUp. Note: When executing ./gradlew
Using Jaeger tracing, I’ve been able to answer an important question that nearly every Apache Kafka® project that I’ve worked on posed: how is data flowing through my distributed system? Distributed tracing with Apache Kafka and Jaeger. Example of a Kafka project with Jaeger tracing. What does this all mean?
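Production setups typically use ready-made OpenTracing/Jaeger instrumentation for the Kafka clients, but the underlying idea can be sketched by hand: propagate a trace identifier in a record header so consumer-side spans can be joined to producer-side spans. The topic, key, and header names here are hypothetical.

    import org.apache.kafka.clients.producer.ProducerRecord;

    import java.nio.charset.StandardCharsets;
    import java.util.UUID;

    public class TraceHeaderSketch {
        public static void main(String[] args) {
            // Attach a trace ID as a header; a consumer (or a tracer like Jaeger)
            // can read it back and stitch the producer and consumer spans together.
            ProducerRecord<String, String> record =
                new ProducerRecord<>("events", "order-1", "created");   // hypothetical topic/key/value
            record.headers().add("trace-id",
                UUID.randomUUID().toString().getBytes(StandardCharsets.UTF_8));
            // producer.send(record) would carry the header through Kafka unchanged
        }
    }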
The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers. How is ksqlDB architected?
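To show the flavor of those familiar SQL constructs, here is a small hypothetical ksqlDB example; the stream, topic, and column names are invented:

    -- Declare a stream over an existing Kafka topic
    CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

    -- A continuously updating aggregate, durably backed by Kafka
    CREATE TABLE views_per_user AS
      SELECT user_id, COUNT(*) AS views
      FROM pageviews
      GROUP BY user_id
      EMIT CHANGES;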
In anything but the smallest deployment of Apache Kafka®, there are often going to be multiple clusters of Kafka Connect and KSQL. Kafka Connect rebalances when connectors are added/removed, and this can impact the performance of other connectors on the same cluster. Streaming data into Kafka with Kafka Connect.
Only a little more than one month after the first release, we are happy to announce another milestone for our Kafka integration. Today, you can grab the Kafka Connect Neo4j Sink from Confluent Hub. Neo4j extension – Kafka sink refresher. Testing the Kafka Connect Neo4j Sink. curl -X POST [link]. jar -f AVRO -e 100000.
The Kafka Streams API boasts a number of capabilities that make it well suited for maintaining the global state of a distributed system. At Imperva, we took advantage of Kafka Streams to build shared state microservices that serve as fault-tolerant, highly available single sources of truth about the state of objects in our system.
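A minimal sketch of that shared-state pattern in Java might materialize a topic into a queryable local store; the topic, store, and key names below are invented, not Imperva's actual setup.

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StoreQueryParameters;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.Materialized;
    import org.apache.kafka.streams.state.QueryableStoreTypes;
    import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

    import java.util.Properties;

    public class SharedStateSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "shared-state-sketch");
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // Materialize the latest value per key into a local, Kafka-backed store.
            builder.table("object-state", Materialized.as("state-store"));

            KafkaStreams streams = new KafkaStreams(builder.build(), props);
            streams.start();

            // Interactive query: serve reads straight from the store, no external DB.
            // (A real service would wait until the instance is RUNNING before querying.)
            ReadOnlyKeyValueStore<String, String> store = streams.store(
                StoreQueryParameters.fromNameAndType("state-store", QueryableStoreTypes.keyValueStore()));
            System.out.println(store.get("object-42"));   // hypothetical key
        }
    }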
Apache Kafka® and its surrounding ecosystem, which includes Kafka Connect, Kafka Streams, and KSQL, have become the technology of choice for integrating and processing these kinds of datasets. Microservices, Apache Kafka, and Domain-Driven Design (DDD) covers this in more detail. Example: Severstal. High throughput.
Following part 1 and part 2 of the Spring for Apache Kafka Deep Dive blog series, here in part 3 we will discuss another project from the Spring team: Spring Cloud Data Flow, which focuses on enabling developers to easily develop, deploy, and orchestrate event streaming pipelines based on Apache Kafka®. Command Line Shell.
Previously in 3 Ways to Prepare for Disaster Recovery in Multi-Datacenter Apache Kafka Deployments , we provided resources for multi-datacenter designs, centralized schema management, prevention of cyclic repetition of messages, and automatic consumer offset translation to automatically resume applications.
Kafka can be added to the list of brand names that have become generic terms for an entire type of technology. In this article, we’ll explain why businesses choose Kafka and what problems they face when using it. What is Kafka?
This tutorial describes how to set up a sample Spring Boot application in Pivotal Application Service (PAS), which consumes and produces events to an Apache Kafka® cluster running in Pivotal Container Service (PKS). With this tutorial, you can set up your PAS and PKS configurations so that they work with Kafka. Methodology.
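The heart of such a Spring Boot app is usually just a listener bean. A hedged sketch, assuming Spring for Apache Kafka is on the classpath; the topic and group names are invented:

    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.kafka.core.KafkaTemplate;
    import org.springframework.stereotype.Component;

    @Component
    public class EventBridge {

        private final KafkaTemplate<String, String> template;

        public EventBridge(KafkaTemplate<String, String> template) {
            this.template = template;
        }

        // Consume from one topic and produce a transformed event to another.
        @KafkaListener(topics = "orders-in", groupId = "pas-sketch")   // hypothetical names
        public void onMessage(String payload) {
            template.send("orders-out", payload.toUpperCase());
        }
    }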
[link] Confluent: Guide to Consumer Offsets - Manual Control, Challenges, and the Innovations of KIP-1094. The article provides a comprehensive guide to Kafka consumer offsets, explaining their role in tracking consumption progress and the importance of manual offset control for reliability and exactly-once semantics (EOS).
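The manual-control side of that discussion boils down to disabling auto-commit and committing only after processing, roughly like this Java sketch; the broker, group, and topic names are placeholders:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    public class ManualCommitSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");   // assumed broker
            props.put("group.id", "payments-sketch");           // hypothetical group
            props.put("enable.auto.commit", "false");           // take manual control of offsets
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("payments"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("processing %s%n", record.value());   // stand-in for real work
                    }
                    consumer.commitSync();   // commit only after the batch has been processed
                }
            }
        }
    }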
Obviously Benoit prefers Kestra, at the expense of writing YAML and running a Java application. Unlocking Kafka's potential: tackling tail latency with eBPF. New Apache Arrow engines — Arrow has become one of the most used libraries when it comes to building in-memory engines.
Using this data, Apache Kafka® and Confluent Platform can provide the foundations for both event-driven applications as well as an analytical platform. With tools like KSQL and Kafka Connect, the concept of streaming ETL is made accessible to a much wider audience of developers and data engineers. Ingesting the data.
When managing Apache Kafka® clusters at scale, tasks that are simple on small clusters turn into significant burdens. Relatedly, KIP-226 enabled dynamic broker reconfiguration since Apache Kafka 1.1. See the documentation (or, if you please, the Apache Kafka wiki) for a complete list of which parameters this applies to.
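For example, a dynamic change of this kind is applied with the kafka-configs tool rather than a rolling restart; a hedged one-liner, where the broker ID and value are purely illustrative:

    bin/kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type brokers --entity-name 0 \
      --alter --add-config log.cleaner.threads=2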
Following on from How to Work with Apache Kafka in Your Spring Boot Application, which shows how to get started with Spring Boot and Apache Kafka®, here I will demonstrate how to enable usage of Confluent Schema Registry and the Avro serialization format in your Spring Boot applications. Initial revision. Prerequisites. Avro SerDes.
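In Spring Boot terms that usually comes down to a few properties. A sketch assuming Confluent's Avro serializers are on the classpath and a Schema Registry runs locally; the URL is a placeholder:

    spring.kafka.bootstrap-servers=localhost:9092
    spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
    spring.kafka.producer.value-serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
    spring.kafka.consumer.value-deserializer=io.confluent.kafka.serializers.KafkaAvroDeserializer
    spring.kafka.properties.schema.registry.url=http://localhost:8081
    spring.kafka.properties.specific.avro.reader=true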
Here in part 4 of the Spring for Apache Kafka Deep Dive blog series, we will cover: Common event streaming topology patterns supported in Spring Cloud Data Flow. Create and manage event streaming pipelines, including a Kafka Streams application using Spring Cloud Data Flow. java -jar spring-cloud-dataflow-shell-2.1.0.RELEASE.jar.
The blog posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka® ecosystem as a central, scalable and mission-critical nervous system. For now, we’ll focus on Kafka.
A central component of data ingestion infrastructure at Pinterest is our PubSub stack, and the Logging Platform team currently runs deployments of Apache Kafka and MemQ. Given that around 50% of Java clients at Pinterest are on Flink, PSC integration with Flink was key to achieving our platform goals of fully migrating Java clients to PSC.
This is the first installment in a short series of blog posts about security in Apache Kafka. Secured Apache Kafka clusters can be configured to enforce authentication using different methods, including the following: SSL – TLS client authentication. We use the kafka-console-consumer for all the examples below.
In the previous posts in this series, we have discussed Kerberos, LDAP and PAM authentication for Kafka. In this post we will look into how to configure a Kafka cluster and client to use TLS client authentication. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.
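On the client side, TLS client authentication is mostly a matter of pointing the Kafka client at a keystore holding its certificate. A minimal Java sketch with placeholder paths and passwords; note the broker must also be configured to require client certificates (ssl.client.auth=required):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TlsClientSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9093");                      // assumed TLS listener
            props.put("security.protocol", "SSL");                               // TLS for encryption and auth
            props.put("ssl.truststore.location", "/path/to/truststore.jks");     // trusts the broker certs
            props.put("ssl.truststore.password", "changeit");                    // placeholder
            props.put("ssl.keystore.location", "/path/to/client.keystore.jks");  // holds the client cert
            props.put("ssl.keystore.password", "changeit");                      // placeholder
            props.put("ssl.key.password", "changeit");                           // placeholder
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // the TLS handshake, including the client certificate, happens on connect
            }
        }
    }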
What was the process for adding full Java support in addition to SQL? What are the problems that customers are trying to solve when they come to Decodable? When you launched, your focus was on SQL transformations of streaming data.
The goal of this post is to illustrate PUSH to web from Apache Kafka® with a hands-on example. Our business users are always wanting their data faster so they can […].
Introduction. Apache Kafka is a well-known event streaming platform used in many organizations worldwide. The focus of this article is to provide a better understanding of how Kafka works under the hood to better design and tune your client applications. Environment Setup. First, we want to have a Kafka cluster up and running.
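As a taste of the client-side tuning that understanding enables, here is a hedged Java producer configuration that trades a little latency for larger, cheaper batches; all values are illustrative, not recommendations from the article:

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    import java.util.Properties;

    public class TunedProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Batching knobs: wait up to 20 ms to fill bigger batches before sending.
            props.put(ProducerConfig.LINGER_MS_CONFIG, "20");
            props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(64 * 1024));
            props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");
            props.put(ProducerConfig.ACKS_CONFIG, "all");   // durability over raw throughput
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            producer.close();
        }
    }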
In part 1, we discussed an event streaming architecture that we implemented for a customer using Apache Kafka®, KSQL from Confluent, and Kafka Streams. In part 3, we’ll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices. ./gradlew composeUp. The KSQL pipeline flow.
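For context, a KSQL UDF in Java is just an annotated class that the Gradle build packages into a JAR for the KSQL extension directory. A hypothetical example, with the function name and logic invented:

    import io.confluent.ksql.function.udf.Udf;
    import io.confluent.ksql.function.udf.UdfDescription;

    @UdfDescription(name = "mask_last4", description = "Masks all but the last four characters")
    public class MaskLast4Udf {

        @Udf(description = "Replaces every character except the final four with '*'")
        public String maskLast4(final String input) {
            if (input == null || input.length() <= 4) {
                return input;
            }
            StringBuilder masked = new StringBuilder();
            for (int i = 0; i < input.length() - 4; i++) {
                masked.append('*');
            }
            return masked.append(input.substring(input.length() - 4)).toString();
        }
    }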
This typically involved a lot of coding with Java, Scala or similar technologies. The DataFlow platform has established a leading position in the data streaming market by unlocking the combined value and synergies of Apache NiFi, Apache Kafka and Apache Flink.
Since all the flows were simple event processing, the NiFi flows were built out in a matter of hours (drag-and-drop) instead of months (coding in Java). They asked, “Can NiFi keep up with the same throughput as Storm?” Setting the context, why would a customer want to use Apache NiFi, Apache Kafka, and Apache HBase? NiFi Flows.
JSON workflow definition gives flexibility to build DSLs in higher-level languages like Python & Java. A key highlight for me from Maestro is the pipeline breakpoint feature. [link] Uber: Introduction to Kafka Tiered Storage at Uber. The effectiveness of Kafka Tiered Storage is a widely discussed topic.
Includes free forever Confluent Platform on a single Apache Kafka® broker, improved Control Center functionality at scale, and hybrid cloud streaming. the event streaming platform built by the original creators of Apache Kafka. in order to bring our C/C++, Python, Go, and .NET clients closer to parity with the Java client.