Handling and processing streaming data is some of the hardest work in data analysis. We know that streaming data is data that is emitted at high volume […] The post Kafka to MongoDB: Building a Streamlined Data Pipeline appeared first on Analytics Vidhya.
Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Can you describe your experiences with Kafka?
Build a streaming data pipeline using Formula 1 data, Python, Kafka, RisingWave as the streaming database, and visualize all the real-time data in Grafana.
Fluss addresses many of Kafka's challenges in analytical infrastructure. The combination of Kafka and Flink is not a perfect fit for real-time analytics, and the integration between Kafka and the lakehouse is very shallow. How do you compare Fluss with Apache Kafka? Fluss and Kafka differ fundamentally in design principles.
It requires a skillful blend of data engineering expertise and the strategic use of tools designed to streamline this process. That’s where data pipeline tools come in. This blog is all about that—specifically, the top 10 data pipeline tools that data engineers worldwide rely on.
Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. Table of Contents: What is a Data Pipeline? The Importance of a Data Pipeline. What is an ETL Data Pipeline?
Whether it’s customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way. That’s where data pipeline design patterns come in. Data Mesh Pattern 8.
Snowflake enables organizations to be data-driven by offering an expansive set of features for creating performant, scalable, and reliable data pipelines that feed dashboards, machine learning models, and applications. But before data can be transformed and served or shared, it must be ingested from source systems.
For example, developers can provision Kafka topics, Espresso tables, Venice stores and more via Nuage, our internal cloud-like infra management platform. Data pipelines power foundational parts of LinkedIn's infrastructure, including replication between data centers.
Today, Kafka is used by thousands of companies, including over 80% of the Fortune 100. Kafka's popularity is skyrocketing, and for good reason—it helps organizations manage real-time data streams and build scalable data architectures. As a result, there's a growing demand for highly skilled professionals in Kafka.
Looking for the ultimate guide to mastering Apache Kafka in 2024? Here it is: the ultimate hands-on learning guide, with secrets on how you can learn Kafka by doing. Discover the key resources to help you master the art of real-time data streaming and build robust data pipelines with Apache Kafka.
RudderStack ([link]) provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines.
Explore the full potential of AWS Kafka with this ultimate guide. Elevate your data processing skills with Amazon Managed Streaming for Apache Kafka, making real-time data streaming a breeze. According to IDC , the worldwide streaming market for event-streaming software, such as Kafka, is likely to reach $5.3
Your search for Apache Kafka interview questions ends right here! Let us now dive directly into the Apache Kafka interview questions and answers and help you get started with your Big Data interview preparation! What are topics in Apache Kafka? Kafka stores data in topics that are split into partitions.
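As a quick illustration of that answer, here is a minimal sketch, assuming the kafka-python package and a local broker at localhost:9092; the "orders" topic name is made up for the example. It simply creates a topic split into three partitions.

```python
# Minimal sketch: create a Kafka topic that is split into partitions.
from kafka.admin import KafkaAdminClient, NewTopic

admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# A hypothetical "orders" topic with 3 partitions; records that share a key
# are always routed to the same partition, which preserves per-key ordering.
admin.create_topics([NewTopic(name="orders", num_partitions=3, replication_factor=1)])
admin.close()
```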
If you’re looking for everything a beginner needs to know about using Apache Kafka for real-time data streaming, you’ve come to the right place. This blog post explores the basics of Apache Kafka and its uses, the benefits of utilizing real-time data streaming, and how to set up your data pipeline.
Kafka, while not in the top 5 most in-demand skills, was still the most requested buffer technology, which makes it worthwhile to include. I'll use Python and Spark because they are the top 2 requested skills in Toronto. The remaining tech (stages 3, 4, 7 and 8) are all AWS technologies.
Taming the torrent of data pouring into your systems can be daunting. Kafka Topics are your trusty companions. Learn how Kafka Topics simplify the complex world of big data processing in this comprehensive blog. More than 80% of Fortune 100 companies trust and use Kafka. Table of Contents: What is a Kafka Topic?
In our previous blog, Dima Kalashnikov explained how we configure our Internal services pipeline in the Analytics Platform. In this post, we will explain how our team automates the creation of new data pipeline deployments. Now, we can have a pipeline ready in minutes.
Confluent’s new Stream Designer is the industry’s first visual interface for rapidly building, testing, and deploying streaming data pipelines natively on Apache Kafka.
Building reliable data pipelines is a complex and costly undertaking with many layered requirements. In order to reduce the amount of time and effort required to build pipelines that power critical insights, Manish Jethani co-founded Hevo Data. Data stacks are becoming more and more complex.
Business success is based on how we use continuously changing data. That’s where streaming data pipelines come into play. This article explores what streaming data pipelines are, how they work, and how to build this data pipeline architecture. What is a streaming data pipeline?
Following part 1 and part 2 of the Spring for Apache Kafka Deep Dive blog series, here in part 3 we will discuss another project from the Spring team: Spring Cloud Data Flow , which focuses on enabling developers to easily develop, deploy, and orchestrate event streaming pipelines based on Apache Kafka ®.
This data pipeline is a great example of a use case for Apache Kafka®. The data processing pipeline characterizes these objects, deriving key parameters such as brightness, color, ellipticity, and coordinate location, and broadcasts this information in alert packets. The case for Apache Kafka.
Summary: How much time do you spend maintaining your data pipeline? How much end user value does that provide? Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today? Contact Info: LinkedIn. Links: Datacoral, Yahoo!
In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Get started for free at dataengineeringpodcast.com/hightouch. Can you describe what Decodable is and the story behind it?
The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers. How is ksqlDB architected?
In this third installment of the Universal Data Distribution blog series, we will take a closer look at how CDF-PC’s new Inbound Connections feature enables universal application connectivity and allows you to build hybrid data pipelines that span the edge, your data center, and one or more public clouds.
Project Idea: Use the StatsBomb Open Data to study player and team performances. Build a data pipeline to ingest player and match data, clean it for inconsistencies, and transform it for analysis. Load raw data into Google Cloud Storage, preprocess it using Mage VM, and store results in BigQuery.
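A minimal sketch of the load steps described above, assuming the google-cloud-storage and google-cloud-bigquery client libraries; the bucket, dataset, and local CSV names are placeholders, not part of the original project description.

```python
# Minimal sketch: land a raw file in Google Cloud Storage, then load it into BigQuery.
from google.cloud import storage, bigquery

# Upload the raw file to a (hypothetical) Cloud Storage bucket.
bucket = storage.Client().bucket("statsbomb-raw")
bucket.blob("matches/matches.csv").upload_from_filename("matches.csv")

# Load the uploaded file into a (hypothetical) BigQuery table, letting
# BigQuery auto-detect the schema.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    "gs://statsbomb-raw/matches/matches.csv",
    "analytics.matches",
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        autodetect=True,
    ),
)
job.result()  # block until the load job finishes
```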
The Kafka Summit Program Committee recently published the schedule for the San Francisco event, and there’s quite a bit to look forward to. I remember two to three years back, I spent all my time listening to talks about various ETL architectures in the Pipelines track. Interests evolve over time too. What’s the Time?…and
Unlocking Kafka's potential: tackling tail latency with eBPF. Forward thinking: Dataviz is hierarchical — Malloy, once again, provides an excellent article about a new way to see data visualisations. Coding data pipelines is faster than renting connector catalogs — this is something I've always believed.
Data engineers manage that massive amount of data using various data engineering tools, frameworks, and technologies. Data engineering tools are specialized applications that make building data pipelines and designing algorithms easier and more efficient.
Spark Streaming vs Kafka Streams: Now that we have understood at a high level what these tools are, it's natural to be curious about the differences between the two. The first difference concerns the processing model: in Spark Streaming, data received from live input streams is divided into micro-batches for processing.
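To make the micro-batch model concrete, here is a minimal sketch, assuming PySpark with the spark-sql-kafka connector available, a local broker at localhost:9092, and a hypothetical "events" topic; it illustrates micro-batched reads from Kafka and is not code from the article being excerpted.

```python
# Minimal sketch: Spark reads a Kafka topic and processes it in micro-batches.
# Assumes the Kafka connector is available, e.g. launched with
#   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 this_script.py
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-microbatch-demo").getOrCreate()

# Subscribe to a hypothetical "events" topic on a local broker.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Every 10 seconds Spark collects the records that arrived since the last
# trigger into one micro-batch and processes that batch as a whole.
query = (
    events.selectExpr("CAST(value AS STRING) AS value")
    .writeStream
    .format("console")
    .trigger(processingTime="10 seconds")
    .start()
)
query.awaitTermination()
```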
The first phase focuses on building a data pipeline. This involves getting data from an API and storing it in a PostgreSQL database. Overview: Let’s break down the data pipeline process step-by-step. Data Streaming: Initially, data is streamed from the API into a Kafka topic.
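A minimal sketch of that first step, assuming kafka-python, the requests library, a local broker, and a placeholder API endpoint and topic name (the real pipeline would substitute its own):

```python
# Minimal sketch: pull records from an API and stream them into a Kafka topic.
import json

import requests
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Placeholder endpoint and topic; substitute the pipeline's real ones.
response = requests.get("https://example.com/api/records", timeout=10)
for record in response.json():
    producer.send("raw_records", value=record)

producer.flush()  # make sure everything is delivered before exiting
```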
Data Collection: The first step is to collect real-time data (purchase_data) from various sources, such as sensors, IoT devices, and web applications, using data collectors or agents. These collectors send the data to a central location, typically a message broker like Kafka.
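Once the collectors have published events to the broker, downstream processing reads them back off the topic. A minimal consumer sketch, assuming kafka-python, a local broker, and a hypothetical purchase_data topic:

```python
# Minimal sketch: read purchase events back off the broker for downstream processing.
import json

from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "purchase_data",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    # Each message is one purchase event emitted by a collector/agent.
    print(message.value)
```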
The data generated was as varied as the departments relying on these applications. Some departments used IBM Db2, while others relied on VSAM files or IMS databases, creating complex data governance processes and costly data pipeline maintenance. They chose the Precisely Data Integrity Suite's Data Integration Service.
Table of Contents: What is Data Ingestion in a Data Engineering Project? Why do you need a Data Ingestion Layer in a Data Engineering Project? Types of Data Ingestion. How does Data Ingestion Work in the Data Pipeline? Data Ingestion vs. ETL - How are they different?
On-premise and cloud working together to deliver a data product. Photo by Toro Tseleng on Unsplash. Developing a data pipeline is somewhat similar to playing with Lego: you envision what needs to be achieved (the data requirements), choose the pieces (software, tools, platforms), and fit them together.
In this blog post we will put these capabilities in context and dive deeper into how the built-in, end-to-end data flow life cycle enables self-service data pipeline development. Key requirements for building data pipelines: every data pipeline starts with a business requirement.
Kafka can continue the list of brand names that became generic terms for the entire type of technology. Similar to Google in web browsing and Photoshop in image processing, it became a gold standard in data streaming, preferred by 70 percent of Fortune 500 companies. What is Kafka? What Kafka is used for.
Trains are an excellent source of streaming data—their movements around the network are an unbounded series of events. Using this data, Apache Kafka ® and Confluent Platform can provide the foundations for both event-driven applications as well as an analytical platform. As with any real system, the data has “character.”
This indicates that more businesses will adopt the tools and methodologies useful in big data analytics, including implementing the ETL pipeline. Data engineers are in charge of developing data models, constructing data pipelines, and monitoring ETL (extract, transform, load).
Learn more about how you can benefit from a well-supported data management platform and ecosystem of products, services and support by visiting the IBM and Cloudera partnership page. The post IBM Technology Chooses Cloudera as its Preferred Partner for Addressing Real Time Data Movement Using Kafka appeared first on Cloudera Blog.
A data pipeline is a method for getting data from one system to another, whether for analytics purposes or for storage. Learning the elements that make up this proven architecture […].
Today, every company is a data company. There are many different data pipeline, integration, and ingestion tools in the market, but before you can feed your data analytics needs, data […].