Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems.
IT industries rely heavily on real-time insights derived from streaming data sources. We know that streaming data is data that is emitted at high volume […] The post Kafka to MongoDB: Building a Streamlined Data Pipeline appeared first on Analytics Vidhya.
Learn how Kafka Connect and CDC provide real-time database synchronization, bridging data silos between all microservice applications. Microservices have numerous benefits, but data silos are incredibly challenging.
Fluss addresses many of Kafka's challenges in analytical infrastructure: the combination of Kafka and Flink is not a perfect fit for real-time analytics, and the integration of Kafka with the lakehouse is very shallow. How do you compare Fluss with Apache Kafka? Fluss and Kafka differ fundamentally in design principles.
Your search for Apache Kafka interview questions ends right here! Let us now dive directly into the Apache Kafka interview questions and answers and help you get started with your Big Data interview preparation! What are topics in Apache Kafka? A stream of messages that belong to a particular category is called a topic in Kafka.
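To make that topic definition concrete, here is a minimal, hypothetical Python sketch (not the real Kafka API) of a topic as a named stream split into partitions, where messages sharing a key always land in the same partition:

```python
# Hypothetical sketch: a Kafka-style "topic" as a named stream of partitions.
# Messages with the same key map to the same partition, preserving per-key order.

class Topic:
    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Kafka's default partitioner hashes the message key; hash() stands in here.
        idx = hash(key) % len(self.partitions)
        self.partitions[idx].append((key, value))
        return idx

orders = Topic("orders")
p1 = orders.produce("customer-42", {"item": "book"})
p2 = orders.produce("customer-42", {"item": "pen"})
assert p1 == p2  # same key -> same partition, so ordering per key holds
```

This is only an illustration of the concept; real Kafka topics are durable, replicated logs managed by brokers, not in-memory lists.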
If you’re looking for everything a beginner needs to know about using Apache Kafka for real-time data streaming, you’ve come to the right place. This blog post explores the basics about Apache Kafka and its uses, the benefits of utilizing real-time data streaming, and how to set up your data pipeline. Let's dive in.
Today, Kafka is used by thousands of companies, including over 80% of the Fortune 100. Kafka's popularity is skyrocketing, and for good reason—it helps organizations manage real-time data streams and build scalable data architectures. As a result, there's a growing demand for professionals highly skilled in Kafka.
Kafka Topics are your trusty companions. Learn how Kafka Topics simplify the complex world of big data processing in this comprehensive blog. More than 80% of Fortune 100 companies trust and use Kafka. The meteoric rise of Apache Kafka's popularity is no accident, as it plays a crucial role in data engineering.
Looking for the ultimate guide on mastering Apache Kafka in 2024? The ultimate hands-on learning guide with secrets on how you can learn Kafka by doing. Discover the key resources to help you master the art of real-time data streaming and building robust data pipelines with Apache Kafka. How Difficult Is It To Learn Kafka?
The volume of data generated in real time from application databases, sensors, and mobile devices continues to grow exponentially. As part of this, we are also supporting Snowpipe Streaming as an ingestion method for our Snowflake Connector for Kafka. How does Snowpipe Streaming work?
Change Data Capture (CDC) is a crucial technology that enables organizations to efficiently track and capture changes in their databases. In this blog post, we’ll explore what CDC is, why it’s important, and our journey of implementing Generic CDC solutions for all online databases at Pinterest. What is Change Data Capture?
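As a rough illustration of the CDC idea described above, the sketch below applies a stream of change events to an in-memory replica. The event shape is hypothetical, loosely modeled on Debezium-style op codes, and is not Pinterest's actual implementation:

```python
# Hypothetical CDC event shape: "c"=create, "u"=update, "d"=delete,
# with "after" carrying the new row image (as Debezium-style events do).

def apply_change(replica, event):
    """Apply one change event to a downstream key->row replica."""
    op, key = event["op"], event["key"]
    if op in ("c", "u"):
        replica[key] = event["after"]   # upsert the new row image
    elif op == "d":
        replica.pop(key, None)          # remove the deleted row
    return replica

replica = {}
events = [
    {"op": "c", "key": 1, "after": {"name": "Ada"}},
    {"op": "u", "key": 1, "after": {"name": "Ada L."}},
    {"op": "d", "key": 1, "after": None},
]
for e in events:
    apply_change(replica, e)

assert replica == {}  # insert, update, then delete leaves the replica empty
```

The point of CDC is that the downstream system only ever consumes this ordered stream of changes, rather than repeatedly re-scanning the source database.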
Build a streaming data pipeline using Formula 1 data, Python, Kafka, RisingWave as the streaming database, and visualize all the real-time data in Grafana.
Explore the full potential of AWS Kafka with this ultimate guide. Elevate your data processing skills with Amazon Managed Streaming for Apache Kafka, making real-time data streaming a breeze. According to IDC, the worldwide streaming market for event-streaming software, such as Kafka, is likely to reach $5.3
Explore the world of data analytics with the top AWS databases! Check out this blog to discover your ideal database and uncover the power of scalable and efficient solutions for all your data analytical requirements. Let’s understand more about AWS Databases in the following section.
Unify transactional and analytical workloads in Snowflake for greater simplicity: Many businesses must maintain two separate databases, one to handle transactional workloads and another for analytical workloads.
At DoorDash, we rely on message queue systems based on Kafka to handle billions of real-time events. Here we will delve into how we set up multi-tenancy in a messaging queue system based on Kafka. While we have achieved this in databases, it also needs to be extended to other infrastructure components.
Goku is our in-house time series database providing cost-efficient and low-latency storage for metrics data. From these Kafka topics, an ingestion service consumes the data points and pushes them into the GokuS cluster(s), with a retry mechanism (via a separate Kafka topic plus a small ingestion service) to handle failures.
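The retry pattern described above, where failed writes are routed through a separate queue rather than blocking the main ingestion path, can be sketched in plain Python; function and field names here are illustrative, not Goku's actual code:

```python
from collections import deque

def ingest(points, write, max_retries=3):
    """Push points to storage; route failures through a retry queue (sketch)."""
    retry_q = deque((p, 0) for p in points)
    failed = []
    while retry_q:
        point, attempts = retry_q.popleft()
        try:
            write(point)                      # attempt the storage write
        except Exception:
            if attempts + 1 < max_retries:
                retry_q.append((point, attempts + 1))  # re-queue for retry
            else:
                failed.append(point)          # give up after max_retries
    return failed

stored = []
def flaky_write(p, calls=[0]):
    calls[0] += 1
    if calls[0] == 1:                         # first call fails once
        raise RuntimeError("transient")
    stored.append(p)

assert ingest([1, 2], flaky_write) == []      # everything lands after one retry
assert sorted(stored) == [1, 2]
```

In the real system the retry queue is a separate Kafka topic, so retries survive process restarts; the in-memory deque here only sketches the control flow.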
NoSQL databases are the new-age solutions for distributed unstructured data storage and processing. The speed, scalability, and failover safety offered by NoSQL databases are needed in the wake of Big Data analytics and data science technologies. The databases are run on a single instance with 2 vCPUs and 8 GB of memory.
How to use Kafka Streams to aggregate change data capture (CDC) messages from a relational database into transactional messages, powering a scalable microservices architecture.
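A hedged sketch of that aggregation idea, in plain Python rather than the Kafka Streams Java API: buffer row-level CDC events by transaction id and emit one combined message once the transaction's final event arrives. The event fields here are hypothetical:

```python
from collections import defaultdict

class TxnAggregator:
    """Combine row-level CDC events into one message per transaction (sketch)."""

    def __init__(self):
        self.buffers = defaultdict(list)

    def on_event(self, event):
        """Return the aggregated transaction when it completes, else None."""
        self.buffers[event["txn_id"]].append(event)
        if len(self.buffers[event["txn_id"]]) == event["total_events"]:
            # Last row event of the transaction: emit everything at once.
            return {"txn_id": event["txn_id"],
                    "changes": self.buffers.pop(event["txn_id"])}
        return None

agg = TxnAggregator()
assert agg.on_event({"txn_id": 7, "total_events": 2, "row": "A"}) is None
done = agg.on_event({"txn_id": 7, "total_events": 2, "row": "B"})
assert done is not None and len(done["changes"]) == 2
```

In Kafka Streams this buffering would live in a fault-tolerant state store keyed by transaction id, rather than an in-process dict.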
Change Data Capture (CDC) is an excellent way to introduce streaming analytics into your existing database, and using Debezium enables you to send your change data through Apache Kafka®. Although […].
Gunnar Morling: What If We Could Rebuild Kafka From Scratch? KIP-1150 ("Diskless Kafka") is one of my most anticipated releases from Apache Kafka. The blog is an excellent compilation of the types of query engines on top of the lakehouse, their internal architecture, and benchmarking across various categories.
Although the Faust library aims to bring Kafka Streaming ideas into the Python ecosystem, it may pose challenges in terms of ease of use. Traditional databases are ill-suited for storing events in high throughput event streams. This document serves as a tutorial and offers best practices for effectively utilizing Faust.
What is stopping you from using Kafka Streams as your data layer for building applications? After all, it comes with fast, embedded RocksDB storage, takes care of redundancy for you, […].
In 2024, the data engineering job market is flourishing, with roles like database administrator and architect projected to grow by 8% and salaries averaging $153,000 annually in the US (per Glassdoor). Use Kafka for real-time data ingestion, preprocess with Apache Spark, and store data in Snowflake.
DoorDash’s Engineering teams revamped Kafka Topic creation by replacing a Terraform/Atlantis based approach with an in-house API, Infra Service. DoorDash’s Real-Time Streaming Platform, or RTSP, team is under the Data Platform organization and manages over 2,500 Kafka Topics across five clusters.
Spark Streaming vs Kafka Streams: Now that we have understood at a high level what these tools are, it's natural to be curious about the differences between them. In Spark Streaming, data received from live input streams is divided into micro-batches for processing, and Spark Streaming runs as a standalone framework.
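The key contrast in that comparison, Spark Streaming's micro-batching, can be illustrated with a tiny Python sketch; Kafka Streams, by contrast, processes records one at a time:

```python
def micro_batches(stream, batch_size):
    """Yield fixed-size micro-batches from a record stream, Spark Streaming-style."""
    batch = []
    for record in stream:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch          # a full micro-batch is processed as one unit
            batch = []
    if batch:
        yield batch              # flush the final partial batch

records = list(range(7))
batches = list(micro_batches(records, 3))
assert batches == [[0, 1, 2], [3, 4, 5], [6]]
```

Real Spark Streaming cuts batches by time interval rather than record count, but the consequence is the same: latency is bounded below by the batch boundary, which is why per-record engines can react faster.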
This involves getting data from an API and storing it in a PostgreSQL database. In the second phase, we’ll develop an application that uses a language model to interact with this database. The second article, which will come later, will delve into creating agents using tools like LangChain to communicate with external databases.
Based on a report, Apache Kafka stores and streams more than 7 trillion real-time messages per day. To tame such complexity, you can use database-connecting tools like Debezium and Kafka […]
Data ingestion systems such as Kafka offer a seamless and quick data ingestion process while also allowing data engineers to locate appropriate data sources, analyze them, and ingest data for further processing. Database tools/frameworks include SQL, NoSQL, etc.
Kafka has joined the list of brand names that became generic terms for an entire type of technology. In this article, we'll explain why businesses choose Kafka and what problems they face when using it. What is Kafka?
Data ingestion is the process of consuming data from multiple sources and transferring it into a destination database or data warehouse where you can perform data transformations and analytics. Prepare for your next big data job interview with Kafka interview questions and answers.
Kafka and vector database support: According to Databricks' State of Data and AI report, the number of companies using SaaS LLM APIs has grown more than 1300% since November 2022, with a nearly 411% increase in the number of AI models put into production during that same period.
Cloudera Operational Database is now available in three different form factors in Cloudera Data Platform (CDP). If you are new to Cloudera Operational Database, see this blog post. Here, we'll look at both Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database.
Data pipelines streamline the movement and transformation of data from various sources to a destination, typically a database or data warehouse. Choose a tool that integrates with existing data sources, storage systems, and analytics platforms, supporting popular databases and formats. How Do Data Pipelines Work?
Summary: The Cassandra database is one of the first open source options for globally scalable storage systems. Since its introduction in 2008 it has been powering systems at every scale. The community recently released a new major version that marks a milestone in its maturity and stability as a project and database.
Postgres creator launches DBOS, a transactional serverless computing platform — Mike sees DBOS as a cloud-native OS that runs on top of the database in order to rethink application development and deployment. Unlocking Kafka's potential: tackling tail latency with eBPF.
TigerGraph is a leading database that offers a highly scalable and performant native graph engine for powering graph analytics and machine learning. How has the ecosystem of graph databases changed in usage and design in recent years?
For machine learning applications, relational models require additional processing to be directly useful, which is why there has been a growth in the use of vector databases.
Some departments used IBM Db2, while others relied on VSAM files or IMS databases, creating complex data governance processes and costly data pipeline maintenance. With near real-time data synchronization, the solution ensures that databases stay in sync for reporting, analytics, and data warehousing.
The post IBM Technology Chooses Cloudera as its Preferred Partner for Addressing Real Time Data Movement Using Kafka appeared first on Cloudera Blog. Learn more about how you can benefit from a well-supported data management platform and ecosystem of products, services and support by visiting the IBM and Cloudera partnership page.
Gartner® recognized Cloudera in three recent reports – Magic Quadrant for Cloud Database Management Systems (DBMS), Critical Capabilities for Cloud Database Management Systems for Analytical Use Cases and Critical Capabilities for Cloud Database Management Systems for Operational Use Cases. Get started with CDP.
I’ve always been vocal about ksqlDB’s and Kafka Streams’ limitations. With this announcement, the future of primarily ksqlDB and, to a lesser extent, Kafka Streams comes into view. Since Kafka Streams is part of the Apache project, I don’t see it going away as quickly.
The customer also wanted to utilize the new features in CDP PvC Base like Apache Ranger for dynamic policies, Apache Atlas for lineage, comprehensive Kafka streaming services and Hive 3 features that are not available in legacy CDH versions. Support Kafka connectivity to HDFS, AWS S3 and Kafka Streams. Kafka, SRM, SMM.
ksqlDB, the event streaming database, is becoming one of the most popular ways to work with Apache Kafka®. Every day, we answer many questions about the project, but here’s a […].