As part of this, we are also supporting Snowpipe Streaming as an ingestion method for our Snowflake Connector for Kafka. Now we are able to ingest our data in near real time directly from Kafka topics to a Snowflake table, drastically reducing the cost of ingestion and improving our SLA from 15 minutes to within 60 seconds.
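For illustration, a Snowpipe Streaming sink could be configured along these lines; the property names come from the Snowflake Connector for Kafka, while the topic, account, and credential values are placeholders:

```properties
name=snowflake-streaming-sink
connector.class=com.snowflake.kafka.connector.SnowflakeSinkConnector
topics=events
# Selects the Snowpipe Streaming path instead of file-based Snowpipe
snowflake.ingestion.method=SNOWPIPE_STREAMING
snowflake.url.name=https://<account>.snowflakecomputing.com:443
snowflake.user.name=<connector-user>
snowflake.private.key=<private-key>
snowflake.database.name=<database>
snowflake.schema.name=<schema>
snowflake.role.name=<role>
# Flush often enough to meet a sub-minute SLA
buffer.flush.time=10
```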
How cool would it be to build your own burglar alarm system that can alert you before the actual event takes place simply by using a few network-connected cameras and analyzing the camera images with Apache Kafka®, Kafka Streams, and TensorFlow? I will show how to implement this use case in this blog post.
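The streaming half of that idea might look roughly like the sketch below; the topic names and the ImageClassifier wrapper around the TensorFlow model are hypothetical, not taken from the post:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class AlarmTopology {

    // Hypothetical wrapper around a trained TensorFlow model.
    public interface ImageClassifier {
        Double intrusionProbability(byte[] jpegFrame);
    }

    public static StreamsBuilder build(ImageClassifier classifier) {
        StreamsBuilder builder = new StreamsBuilder();
        // Camera frames arrive keyed by camera ID; byte[] and Double serdes
        // would be configured on the application's properties.
        KStream<String, byte[]> frames = builder.stream("camera-frames");
        frames.mapValues(classifier::intrusionProbability) // score each frame
              .filter((cameraId, probability) -> probability > 0.9)
              .to("intrusion-alerts"); // downstream services react to alerts
        return builder;
    }
}
```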
On the heels of part one in this blog series, Spring for Apache Kafka – Part 1: Error Handling, Message Conversion and Transaction Support, here in part two we’ll focus on another project that enhances the developer experience when building streaming applications on Kafka: Spring Cloud Stream. What is Spring Cloud Stream?
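As a taste of what that looks like, here is a minimal sketch in the functional style (class and function names are invented; the input and output Kafka topics would be wired up through spring.cloud.stream bindings configuration):

```java
import java.util.function.Function;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class UppercaseProcessorApp {
    public static void main(String[] args) {
        SpringApplication.run(UppercaseProcessorApp.class, args);
    }

    // With the Kafka binder on the classpath, this function is bound to an
    // input and an output topic; no Kafka client code appears in the app.
    @Bean
    public Function<String, String> uppercase() {
        return String::toUpperCase;
    }
}
```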
As a distributed system for collecting, storing, and processing data at scale, Apache Kafka® comes with its own deployment complexities. To simplify all of this, different providers have emerged to offer Apache Kafka as a managed service. Before Confluent Cloud was announced, a managed service for Apache Kafka did not exist.
Using Jaeger tracing, I’ve been able to answer an important question that nearly every Apache Kafka® project that I’ve worked on posed: how is data flowing through my distributed system?
I’ve written an event sourcing bank simulation in Clojure (a Lisp built for the Java Virtual Machine, or JVM) called open-bank-mark, which you are welcome to read about in my previous blog post explaining the story behind this open source example. The schemas are also useful for generating specific Java classes.
Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. This article describes how to instrument Kafka-based applications with distributed tracing capabilities in order to make dataflows between event-based components more visible.
In the early days, many companies simply used Apache Kafka® for data ingestion into Hadoop or another data lake. However, Apache Kafka is more than just messaging. Some Kafka and Rockset users have also built real-time e-commerce applications, for example, using Rockset’s Java and Node.js client libraries.
Together, MongoDB and Apache Kafka® make up the heart of many modern data architectures today. Integrating Kafka with external systems like MongoDB is best done through the use of Kafka Connect. The official MongoDB Connector for Apache Kafka is developed and supported by MongoDB engineers.
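To give a flavor of getting started, here is a hypothetical source configuration using the connector’s documented property names (connection details and names are placeholders):

```properties
name=mongo-source
connector.class=com.mongodb.kafka.connect.MongoSourceConnector
connection.uri=mongodb://localhost:27017
database=inventory
collection=orders
# Change events are published to a topic derived from this prefix,
# e.g. mongo.inventory.orders
topic.prefix=mongo
```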
Following part 1 and part 2 of the Spring for Apache Kafka Deep Dive blog series, here in part 3 we will discuss another project from the Spring team: Spring Cloud Data Flow, which focuses on enabling developers to easily develop, deploy, and orchestrate event streaming pipelines based on Apache Kafka®.
As discussed in part 2, I created a GitHub repository with Docker Compose functionality for starting a Kafka and Confluent Platform environment, as well as the code samples mentioned below. We used Groovy instead of Java to write our UDFs, so we’ve applied the groovy plugin. The environment is brought up by running ./gradlew composeUp.
Apache Kafka® and its surrounding ecosystem, which includes Kafka Connect, Kafka Streams, and KSQL, have become the technology of choice for integrating and processing these kinds of datasets. The blog post Microservices, Apache Kafka, and Domain-Driven Design (DDD) covers this in more detail.
The Kafka Streams API boasts a number of capabilities that make it well suited for maintaining the global state of a distributed system. At Imperva, we took advantage of Kafka Streams to build shared state microservices that serve as fault-tolerant, highly available single sources of truth about the state of objects in our system.
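A minimal sketch of that pattern (topic and store names are hypothetical, not Imperva’s actual code): Kafka Streams materializes the latest value per key into a fault-tolerant, queryable state store.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class SharedStateApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "shared-state-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Each record overwrites the previous state for its key; the store is
        // backed by a changelog topic, so it survives instance failures.
        KTable<String, String> objectState =
                builder.table("object-updates", Materialized.as("object-state-store"));

        new KafkaStreams(builder.build(), props).start();
    }
}
```

Any instance can then serve reads from the store via interactive queries, which is what makes this pattern usable as a single source of truth.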
Previously in 3 Ways to Prepare for Disaster Recovery in Multi-Datacenter Apache Kafka Deployments , we provided resources for multi-datacenter designs, centralized schema management, prevention of cyclic repetition of messages, and automatic consumer offset translation to automatically resume applications.
Kafka may well join the list of brand names that have become generic terms for an entire type of technology. In this article, we’ll explain why businesses choose Kafka and what problems they face when using it. What is Kafka?
A central component of data ingestion infrastructure at Pinterest is our PubSub stack, and the Logging Platform team currently runs deployments of Apache Kafka and MemQ. In the years since our previous blog post, PSC has been battle-tested at large scale at Pinterest, with notably positive feedback and results.
Here in part 4 of the Spring for Apache Kafka Deep Dive blog series, we will cover common event streaming topology patterns supported in Spring Cloud Data Flow, and how to create and manage event streaming pipelines, including a Kafka Streams application, using Spring Cloud Data Flow. The shell is launched with java -jar spring-cloud-dataflow-shell-2.1.0.RELEASE.jar.
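Once the shell is running, streams are defined in its pipe-style DSL; a hypothetical session might look like:

```sh
dataflow:> stream create --name http-ingest --definition "http | log" --deploy
```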
Swiggy recently wrote about its internal platform, Hermes, a text-to-SQL solution; the blog is an excellent summary of the common patterns emerging in GenAI platforms. The blog post Prompt Engineering for a Better SQL Code Generation With LLMs is a good guide to applying prompt engineering to improve productivity.
The blog posts How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka and Using Apache Kafka to Drive Cutting-Edge Machine Learning describe the benefits of leveraging the Apache Kafka® ecosystem as a central, scalable, and mission-critical nervous system. For now, we’ll focus on Kafka.
Since all the flows were simple event processing, the NiFi flows were built out in a matter of hours (drag-and-drop) instead of months (coding in Java). As you’ll see in this blog, NiFi is not only keeping up with Storm; it beats Storm with 4x the throughput.
This is the first installment in a short series of blog posts about security in Apache Kafka. Secured Apache Kafka clusters can be configured to enforce authentication using different methods, including SSL/TLS client authentication. We use the kafka-console-consumer for all the examples below.
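As a concrete illustration (broker address, topic, and config file name are placeholders, not from the original post), a consumer against a TLS-secured listener might be invoked as:

```sh
# The client security settings live in client-ssl.properties
kafka-console-consumer \
  --bootstrap-server broker1.example.com:9093 \
  --topic test-topic \
  --from-beginning \
  --consumer.config client-ssl.properties
```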
In the previous posts in this series, we have discussed Kerberos, LDAP, and PAM authentication for Kafka. In this post we will look into how to configure a Kafka cluster and client to use TLS client authentication. TLS is assumed to be enabled for the Apache Kafka cluster, as it should be for every secure cluster.
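A minimal client-ssl.properties sketch for mutual TLS, using standard Kafka client property names (all paths and passwords below are placeholders):

```properties
# Client-side mutual TLS; the broker must also set ssl.client.auth=required
security.protocol=SSL
ssl.truststore.location=/etc/kafka/client.truststore.jks
ssl.truststore.password=changeit
# The keystore holds the client certificate presented during TLS authentication
ssl.keystore.location=/etc/kafka/client.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
```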
In part 1, we discussed an event streaming architecture that we implemented for a customer using Apache Kafka®, KSQL from Confluent, and Kafka Streams. In part 3, we’ll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices.
Part 1 of this blog series by Gwen Shapira explained the benefits of schemas, contracts between services, and compatibility checking for schema evolution. Actually, we recommend that you consider an alternative to self-managing Schema Registry, and the next blog post in this series reveals what that alternative is!
Distributed transactions are very hard to implement successfully, which is why we’ll introduce a log-inspired system such as Apache Kafka®. The post covers building an indexing pipeline at scale with Kafka Connect and moving data into Apache Kafka with the JDBC connector; for this use case, we are going to use it as a source connector.
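As an illustration, a JDBC source connector sketch (property names from the Confluent JDBC connector; the database, table, and credentials are invented):

```properties
name=jdbc-source
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
connection.url=jdbc:postgresql://localhost:5432/shop
connection.user=shop_reader
connection.password=<password>
# Poll for new rows using a monotonically increasing ID column
mode=incrementing
incrementing.column.name=id
table.whitelist=products
topic.prefix=pg-
```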
In the above scenario, we would have to update all five services to connect to Apache Kafka®, create the event in all the appropriate places inside each service, and then produce that event to a Kafka topic. The first point to consider is whether connecting to Kafka from all services is even possible.
In part 1 of this blog we discussed how Cloudera DataFlow for the Public Cloud (CDF-PC), the universal data distribution service powered by Apache NiFi, can make it easy to acquire data from wherever it originates and move it efficiently to make it available to other applications in a streaming fashion.
In the second blog of the Universal Data Distribution blog series, we explored how Cloudera DataFlow for the Public Cloud (CDF-PC) can help you implement use cases like data lakehouse and data warehouse ingest, cybersecurity, and log optimization, as well as IoT and streaming data collection.
The idea in this blog post is to mix information coming from two distinct channels: the RSS feeds of sport-related newspapers and Twitter feeds of the FIFA Women’s World Cup. Ingesting Twitter data is very easy with Kafka Connect, a framework for connecting Kafka with external systems.
You could write the same pipeline in Java, in Scala, in Python, in SQL, etc. I won’t delve into every announcement here, but for more details, SELECT has written a blog covering the 28 announcements and takeaways from the Summit. According to the press, Snowflake and Confluent (Kafka) were also trying to buy Tabular.
In 2010, they introduced Apache Kafka, a pivotal Big Data ingestion backbone for LinkedIn’s real-time infrastructure. To transition from batch-oriented processing and respond to Kafka events within minutes or seconds, they built an in-house distributed stream processing framework, Apache Samza (reducing processing time from hours to 25 minutes).
Apache Kafka came in 2011 and gave the industry a much better way to move real-time data. Apache Kafka has its architectural limitations, though, and Apache Pulsar was released in 2016. Apache Flink came in 2011 and gave us our first real streaming engine. At various times, the language of choice has been Java, Scala, and Python.
Why do data scientists prefer Python over Java? Java vs. Python for data science: which is better? Which has a better future, Python or Java, in 2021? This blog aims to answer all of these questions on how Java and Python compare for data science and which should be the programming language of your choice in 2021.
Some of these improvements and features are: Flink JAR submission (for Java UDFs), and the ability to run on your desktop or a cloud node, connecting to Kafka or other sources/sinks via API calls to their respective clusters. We plan future blog posts on this workflow.
The profile service will publish the changes in profiles, including address changes, to an Apache Kafka® topic, and the quote service will subscribe to the updates from the profile changes topic, calculate a new quote if needed, and publish the new quote to a Kafka topic so other services can subscribe to the updated quote event.
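A bare-bones sketch of the publishing side of that flow, using the plain Java producer API (the topic name, key, and payload are hypothetical):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProfileChangePublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by customer ID so all changes for one customer land in the same
            // partition and are consumed in order by the quote service.
            producer.send(new ProducerRecord<>("profile-changes", "customer-42",
                    "{\"field\":\"address\",\"new\":\"1 Main St\"}"));
        }
    }
}
```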
The data sharding strategy in Elasticsearch was updated to provide low search latency (as described in the blog post), and new Cassandra reverse indices were designed to support different sets of queries. For asynchronous processing, events are sent to Apache Kafka topics to be processed; an Apache Kafka topic is configured as a message broker.
The developers must understand lower-level languages like Java and Scala and be familiar with the streaming APIs. Streams Messaging, powered by Apache Kafka, buffers and scales massive volumes of data streams for streaming analytics.
In this particular blog post, we explain how Druid has been used at Lyft and what led us to adopt ClickHouse for our sub-second analytics system. Real-time ingestion: events from our real-time analytics pipeline were configured to be sent into our internal Flink application, streamed to Kafka, and written into Druid.
The previous blog post How to Build a UDF and/or UDAF in KSQL 5.0 covered this topic; creating custom KSQL functions is even easier when you leverage Maven, a tool for building and managing dependencies in Java projects. For this example, we’ll assume there’s a topic named api_logs in our Kafka cluster.
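For reference, a KSQL UDF is a plain Java class marked with annotations that KSQL scans for at startup; here is a minimal hypothetical example:

```java
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

@UdfDescription(name = "multiply", description = "Multiplies two numbers")
public class MultiplyUdf {

    @Udf(description = "Multiply two longs")
    public long multiply(final long v1, final long v2) {
        return v1 * v2;
    }
}
```

Packaged as an uber JAR with Maven and placed in the KSQL extensions directory, the function becomes callable as MULTIPLY(a, b) in queries.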
In SSB, we now support JavaScript (JS) and Java UDFs, which can be used as functions over your data. With UDFs, you can really enhance the capabilities of your queries. Let’s assume we have already set up such a table, based on a Kafka topic that has ADS-B data streaming through it, and we have named it airplanes.
We’re introducing a new Rockset Integration for Apache Kafka that offers native support for Confluent Cloud and Apache Kafka, making it simpler and faster to ingest streaming data for real-time analytics. With the Kafka Integration, users no longer need to build, deploy or operate any infrastructure component on the Kafka side.
In a previous blog of this series, Turning Streams Into Data Products, we talked about the increased need to reduce the latency between data generation/ingestion and producing analytical results and insights from this data. This is what we call the first-mile problem. This blog will be published in two parts.
When it comes to the emerging serverless world, it makes sense to validate how Apache Kafka® fits in, considering that it is mission-critical in 90 percent of companies. By persisting the streams in Kafka, we then have a record of all system activity (a source of truth) and also a mechanism to drive reactions.
If you are new to Cloudera Operational Database, see this blog post. In this blog post, we’ll look at Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database. To learn more about Apache HBase region splitting and merging, see the blog post here: [link].