Kafka and Scala - Data Engineering Digest

A Detailed Guide of Interview Questions on Apache Kafka

Analytics Vidhya

APRIL 28, 2023

Introduction: Apache Kafka is an open-source publish-subscribe messaging system initially developed at LinkedIn and open-sourced in early 2011. It is a widely used data processing tool, written in Scala and Java, that offers low latency, high throughput, and a unified platform for handling data in real time.

Kafka

Kafka Scala Coding Data Process

Getting Started with Scala and Apache Kafka

Confluent

DECEMBER 8, 2020

If you’re getting started with Apache Kafka® and event streaming applications, you’ll be pleased to see the variety of languages available to start interacting with the event streaming platform. It […].

Kafka

Kafka Scala IT

How to Tune RocksDB for Your Kafka Streams Application

Confluent

MARCH 10, 2021

Apache Kafka ships with Kafka Streams, a powerful yet lightweight client library for Java and Scala to implement highly scalable and elastic applications and microservices that process and analyze data […].

Kafka

Kafka Scala Java Process

What’s New in Apache Kafka 2.5

Confluent

APRIL 16, 2020

On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 2.5.0. The community has created another exciting release. We are making progress […].

Kafka

Kafka Scala IT

12 Programming Languages Walk into a Kafka Cluster…

Confluent

APRIL 23, 2019

When it was first created, Apache Kafka® had a client API for just Scala and Java. Since then, the Kafka client API has been developed for many other programming languages, which lets you pick the language you want. At Confluent, we have an engineering team dedicated to the development of these Kafka clients.

Programming Language

Programming Language Kafka Programming Scala

Apache Kafka Vs Apache Spark: Know the Differences

Knowledge Hut

MAY 3, 2024

Spark Streaming vs. Kafka Streams: Now that we understand at a high level what these tools do, it is natural to ask how they differ. In Spark Streaming, data received from live input data streams is divided into micro-batches for processing. Spark Streaming is a standalone framework.

Kafka

Kafka Scala Java Amazon Web Services

Bust the Burglars – Machine Learning with TensorFlow and Apache Kafka

Confluent

JULY 16, 2019

How cool would it be to build your own burglar alarm system that can alert you before the actual event takes place simply by using a few network-connected cameras and analyzing the camera images with Apache Kafka®, Kafka Streams, and TensorFlow? Uploading your images into Kafka. Receiving burglar alerts from Kafka.

Machine Learning

Machine Learning Kafka Java Datasets

The Rise of Managed Services for Apache Kafka

Confluent

SEPTEMBER 20, 2019

As a distributed system for collecting, storing, and processing data at scale, Apache Kafka® comes with its own deployment complexities. To simplify all of this, different providers have emerged to offer Apache Kafka as a managed service. Before Confluent Cloud was announced, a managed service for Apache Kafka did not exist.

Kafka

Kafka Management Cloud AWS

Scala In Demand Technologies Built On Scala

Knowledge Hut

MAY 20, 2024

The name Scala comes from “scalable language,” meaning that Scala grows with you. In recent times, Scala has attracted developers because it lets them deliver things faster with less code. Developers are now much more interested in Scala training to excel in the big data field.

Scala

Scala Technology Kafka Hadoop

What’s New in Apache Kafka 2.4

Confluent

DECEMBER 16, 2019

On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 2.4.0. This release includes a number of key new features and improvements […].

Kafka

Kafka Scala IT

The Good and the Bad of Apache Kafka Streaming Platform

AltexSoft

OCTOBER 21, 2022

Kafka can continue the list of brand names that became generic terms for the entire type of technology. In this article, we’ll explain why businesses choose Kafka and what problems they face when using it. What is Kafka?

Kafka

Kafka Hadoop Big Data ETL Tools

Cloudera acquires Eventador to accelerate Stream Processing in Public & Hybrid Clouds

Cloudera

OCTOBER 12, 2020

This typically involved a lot of coding with Java, Scala or similar technologies. The DataFlow platform has established a leading position in the data streaming market by unlocking the combined value and synergies of Apache NiFi, Apache Kafka and Apache Flink.

Cloud

Cloud Process Scala Kafka

Ranking Websites in Real-time with Apache Kafka’s Streams API

Confluent

NOVEMBER 29, 2022

Learn how Zalando, Europe’s largest online fashion retailer, uses Apache Kafka and the Kafka Streams API with Scala on AWS for real-time fashion insights.

Scala

Scala Kafka Retail AWS

Reliable, Fast Access to On-Chain Data Insights

Confluent

JUNE 7, 2019

How we use Apache Kafka and the Confluent Platform. Apache Kafka® is the central data hub of our company. At TokenAnalyst, we’re using Kafka for ingestion of blockchain data, which is directly pushed from our cluster of Bitcoin and Ethereum nodes, to different streams of transformation and loading processes.

Accessible

Accessible Accessibility Kafka Scala

Optimizing Kafka Clients: A Hands-On Guide

Rock the JVM

JANUARY 21, 2023

Introduction: Apache Kafka is a well-known event streaming platform used in many organizations worldwide. The focus of this article is to provide a better understanding of how Kafka works under the hood to better design and tune your client applications. Environment Setup: First, we want to have a Kafka cluster up and running.

Kafka

Kafka Java Scala Coding

Accelerated integration of Eventador with Cloudera – SQL Stream Builder

Cloudera

MARCH 29, 2021

It offers a slick user interface for writing SQL queries to run against real-time data streams in Apache Kafka or Apache Flink. They no longer have to depend on any skilled Java or Scala developers to write special programs to gain access to such data streams. SQL Stream Builder continuously runs SQL via Flink.

SQL

SQL Scala Manufacturing Java

Scala For Big Data Engineering – Why should you care?

Advancing Analytics: Data Engineering

APRIL 23, 2020

The thought of learning Scala fills many with fear; its very name often causes feelings of terror. The truth is Scala can be used for many things, from a simple web application to complex ML (Machine Learning). The name Scala stands for “scalable language.” So what companies are actually using Scala?

Scala

Scala Big Data Data Engineering Data Engineer

A Comprehensive Guide to Choosing the Best Scala Course

Rock the JVM

MAY 22, 2023

This article is all about choosing the right Scala course for your journey. How should I get started with Scala? Do you have any tips to learn Scala quickly? How to Learn Scala as a Beginner: Scala is not necessarily aimed at first-time programmers. Which course should I take?

Scala

Scala Java Programming Language Programming

Fraud Detection With Cloudera Stream Processing Part 2: Real-Time Streaming Analytics

Cloudera

JULY 18, 2022

In this blog we will explore how we can use Apache Flink to get insights from data at a lightning-fast speed, and we will use Cloudera SQL Stream Builder GUI to easily create streaming jobs using only SQL language (no Java/Scala coding required). It provides flexible and expressive APIs for Java and Scala. Use case recap. Apache Flink.

Process

Process Kafka Scala SQL

The Alooma Data Pipeline With CTO Yair Weinberger - Episode 33

Data Engineering Podcast

MAY 27, 2018

Links Alooma Convert Media Data Integration ESB (Enterprise Service Bus) Tibco Mulesoft ETL (Extract, Transform, Load) Informatica Microsoft SSIS OLAP Cube S3 Azure Cloud Storage Snowflake DB Redshift BigQuery Salesforce Hubspot Zendesk Spark The Log: What every software engineer should know about real-time data’s unifying abstraction by Jay (..)

Data Pipeline

Data Pipeline MongoDB Google Cloud Scala

Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

Data Engineering Podcast

NOVEMBER 18, 2018

How does Flink compare to other streaming engines such as Spark, Kafka, Pulsar, and Storm? Can you start by describing what Flink is and how the project got started? What are some of the primary ways that Flink is used? How is Flink architected?

Process

Process Google Cloud Scala Kafka

Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

Data Engineering Podcast

DECEMBER 9, 2018

How does it compare to some of the other streaming frameworks such as Flink, Kafka, or Storm? What are some of the problems that Spark is uniquely suited to address? Who uses Spark? What are the tools offered to Spark users?

MySQL

MySQL Scala Kafka Hadoop

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

In addition, AI data engineers should be familiar with programming languages such as Python, Java, Scala, and more for data pipeline, data lineage, and AI model development.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Databricks, Snowflake and the future

Christophe Blefari

JUNE 21, 2024

You could write the same pipeline in Java, in Scala, in Python, in SQL, etc. […] Here is what Databricks brought this year: Spark 4.0: (1) PySpark erases the differences with the Scala version, creating a first-class experience for Python users. […] (2) Databricks sells a toolbox, you don't buy any UX. […] (3) Spark 4.0 […]

Metadata

Metadata Data Warehouse BI MySQL

Data News — Week 23.02

Christophe Blefari

JANUARY 14, 2023

History repeats itself; we've seen it with Scala, Go, or even Julia at some scale. Analysis of Confluent buying Immerok: Jesse Anderson analyses last week's news of Confluent (Kafka) buying Immerok (Flink) and what it implies for the competition among the real-time low-level technologies Kafka, Flink, and Spark.

Python

Python Kafka Data Scala

User Analytics In Depth At Heap with Dan Robinson - Episode 36

Data Engineering Podcast

JUNE 17, 2018

Kafka Scala Citus React MobX Redshift Heap SQL BigQuery Webhooks Drip Data Virtualization DNS PII SOC2 The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast Summary Web and mobile analytics are an important part of any business, and difficult to get right.

Scala

Scala Kafka SQL Architecture

Using SQL to democratize streaming data

Cloudera

MARCH 2, 2021

This data engineering skillset typically consists of Java or Scala programming skills paired with deep DevOps acumen. But although data streaming technologies like Apache Kafka and Apache Flink have evolved, only recently have SQL interfaces become deeply integrated. A rare breed.

SQL

SQL Java Data Lake Scala

What is Streaming Analytics?

Cloudera

APRIL 20, 2021

The developers must understand lower-level languages like Java and Scala and be familiar with the streaming APIs. Streams Messaging, powered by Apache Kafka, buffers and scales massive volumes of data streams for streaming analytics.

Kafka

Kafka Hospitality Retail Data Ingestion

A Recipe for Kafka Lag Monitoring

Zalando Engineering

DECEMBER 4, 2017

A closer look at the ingredients needed for ultimate stability. This is part of a series of posts on Kafka. See Ranking Websites in Real-time with Apache Kafka’s Streams API for the first post in the series. Remora is a small application to track the monitoring of Kafka. Some use cloud infrastructure such as AWS Kinesis or SQS.

Kafka

Kafka Scala Java AWS

Kafka vs RabbitMQ - A Head-to-Head Comparison for 2023

ProjectPro

JULY 21, 2021

As a big data architect or a big data developer working with microservices-based systems, you might often face a dilemma over whether to use Apache Kafka or RabbitMQ for messaging. RabbitMQ vs. Kafka: which one is the better message broker? Table of Contents: Kafka vs. RabbitMQ - An Overview. What is RabbitMQ? What is Kafka?

Kafka

Kafka Big Data Java Architecture

Fundamentals of Apache Spark

Knowledge Hut

MAY 3, 2024

Spark offers over 80 high-level operators that make it easy to build parallel apps and one can use it interactively from the Scala, Python, R, and SQL shells. The core is the distributed execution engine and the Java, Scala, and Python APIs offer a platform for distributed ETL application development.

Hadoop

Hadoop Scala Healthcare Big Data

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

SEPTEMBER 11, 2022

Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.

Data Pipeline

Data Pipeline Building MongoDB MySQL

An Exploration Of The Expectations, Ecosystem, and Realities Of Real-Time Data Applications

Data Engineering Podcast

AUGUST 21, 2022

Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.

Lambda Architecture

Lambda Architecture MongoDB MySQL Scala

Turning Streams Into Data Products

Cloudera

JUNE 16, 2022

In 2015, Cloudera became one of the first vendors to provide enterprise support for Apache Kafka, which marked the genesis of the Cloudera Stream Processing (CSP) offering. Today, CSP is powered by Apache Flink and Kafka and provides a complete, enterprise-grade stream management and stateful processing solution. Who is affected?

Kafka

Kafka Manufacturing Data Lake SQL

Many-to-Many Relationships Using Kafka

Zalando Engineering

MAY 7, 2018

Real-time joins in event-driven microservices. As discussed in my previous blog post, Kafka is one of the key components of our event-driven microservice architecture in Zalando’s Smart Product Platform. This is where Kafka API comes in handy! We use it for sequencing events and building an aggregated view of data hierarchies.

Kafka

Kafka Scala Java Media

A Candid Exploration Of Timeseries Data Analysis With InfluxDB

Data Engineering Podcast

JUNE 28, 2021

To help other people find the show please leave a review on iTunes and tell your friends and co-workers Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat Links Influx Data Influx DB Search and Information Retrieval Datadog Podcast Episode New Relic StackDriver Scala Cassandra Redis KDB Latent Semantic Indexing TICK (..)

Data Analysis

Data Analysis Scala Data Warehouse Kafka

Low Code And High Quality Data Engineering For The Whole Organization With Prophecy

Data Engineering Podcast

JULY 16, 2021

__init__ Episode Kubernetes Operator Scala Kafka Abstract Syntax Tree Language Server Protocol Amazon Deequ dbt Tecton Podcast Episode Informatica The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA Support Data Engineering Podcast

High Quality Data

High Quality Data Data Engineering Data Engineer Coding

Power Your Real-Time Analytics Without The Headache Using Fivetran's Change Data Capture Integrations

Data Engineering Podcast

SEPTEMBER 25, 2022

Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.

Food

Food MongoDB MySQL Scala

Apache Kafka – Next Generation Distributed Messaging System

ProjectPro

JUNE 28, 2016

Apache Kafka is breaking barriers and eliminating the slow batch processing method that is used by Hadoop. This is just one of the reasons why Apache Kafka was developed at LinkedIn. Kafka was mainly developed to make working with Hadoop easier. Apache Kafka attempts to solve this issue. Where is Kafka heading?

Kafka

Kafka Systems Hadoop Big Data

Optimize Your Machine Learning Development And Serving With The Open Source Vector Database Milvus

Data Engineering Podcast

AUGUST 6, 2022

Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java.

Machine Learning

Machine Learning Database MySQL MongoDB

Fraud Detection with Cloudera Stream Processing Part 1

Cloudera

JUNE 28, 2022

We discussed how Cloudera Stream Processing (CSP) with Apache Kafka and Apache Flink could be used to process this data in real time and at scale. If the fraud score is above a certain threshold, NiFi immediately routes the transaction to a Kafka topic that notification systems subscribe to in order to trigger the appropriate actions.

Process

Process Kafka SQL Machine Learning

Data Engineering Weekly #165

Data Engineering Weekly

MARCH 31, 2024

[link] Databricks: PySpark in 2023 - A Year in Review Can we safely say PySpark killed Scala-based data pipelines? I’m looking forward to playing around with Testing API and Arrow-optimized UDF since UDF is the only reason I write Scala nowadays. The blog is an excellent overview of all the improvements made to PySpark in 2023.

Data Engineering

Data Engineering Data Engineer Engineering Scala

Top 12 Data Engineering Project Ideas [With Source Code]

Knowledge Hut

JUNE 26, 2023

Stock and Twitter Data Extraction Using Python, Kafka, and Spark Project Overview: The rising and falling of GameStop's stock price and the proliferation of cryptocurrency exchanges have made stocks a topic of widespread attention. Source Code: Stock and Twitter Data Extraction Using Python, Kafka, and Spark 2.

Data Engineering

Data Engineering Data Engineer Coding Project

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

It serves as a foundation for the entire data management strategy and consists of multiple components including data pipelines; on-premises and cloud storage facilities – data lakes, data warehouses, data hubs; data streaming and Big Data analytics solutions (Hadoop, Spark, Kafka, etc.);

Data Architect

Data Architect Certification Generalist Big Data
