
Kafka Connect Deep Dive – JDBC Source Connector

Confluent

One of the most common integrations that people want to do with Apache Kafka® is getting data in from a database, because relational databases are a rich source of events. The existing data in a database, and any changes to that data, can be streamed into a Kafka topic.
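
A minimal sketch of registering such a connector through the Kafka Connect REST API; the worker address, database connection details, table, and column names below are illustrative assumptions rather than details from the article.

```python
# Sketch: register a JDBC source connector with a Kafka Connect worker.
# The host, credentials, table, and column names are placeholders.
import requests

connector = {
    "name": "jdbc-source-orders",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:mysql://localhost:3306/shop",
        "connection.user": "connect",
        "connection.password": "secret",
        "table.whitelist": "orders",
        "mode": "incrementing",              # only fetch rows with a higher id than last seen
        "incrementing.column.name": "id",
        "topic.prefix": "mysql-",            # rows land in the topic "mysql-orders"
        "poll.interval.ms": "5000",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
print(resp.json())
```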


Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations

Confluent

In part 1, we discussed an event streaming architecture that we implemented for a customer using Apache Kafka®, KSQL from Confluent, and Kafka Streams. In part 3, we’ll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices. This part covers the gradlew composeUp task and the KSQL pipeline flow.
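
As a rough illustration of the kind of statement such a pipeline deploys, here is a sketch that submits a KSQL definition to the KSQL server’s REST endpoint; the server URL, topic name, and stream schema are assumptions for the example, not details from the post.

```python
# Sketch: submit a KSQL statement to a KSQL server's /ksql REST endpoint.
# The server URL, topic, and schema are illustrative placeholders.
import requests

statement = """
CREATE STREAM clickstream (user_id VARCHAR, url VARCHAR, ts BIGINT)
  WITH (KAFKA_TOPIC='clickstream', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
)
resp.raise_for_status()
print(resp.json())
```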



Reflections on Event Streaming as Confluent Turns Five – Part 2

Confluent

When people ask me the very top-level question “why do people use Kafka,” I usually lead with the story in my last post, where I talked about how Apache Kafka® is helping us deliver on the promises the cloud made to us a decade ago. Industry heavyweights like Capital One use event streaming on Kafka for this very task.


97 Things Every Data Engineer Should Know

Grouparoo

39. How to Prevent a Data Mutiny: key trends are modular architecture, declarative configuration, and automated systems.
40. Know the Value per Byte of Your Data: check whether you are actually using your data.
41. Know Your Latencies: key questions include how old the data is.
55. Pipe Dreams: Kafka was good because it supported replaying messages. Increase visibility.


Forge Your Career Path with Best Data Engineering Certifications

ProjectPro

An exabyte is 1000⁶ (10¹⁸) bytes, so to put it into perspective, 463 exabytes is the same as 212,765,957 DVDs. The certifications also expect proficiency in data ingestion, including the ability to import and export data between your cluster and external relational database management systems and to ingest real-time and near-real-time (NRT) streaming data into HDFS.
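
For a concrete feel of that ingestion skill, here is a minimal PySpark sketch that pulls a table from an external relational database over JDBC and lands it on HDFS; the JDBC URL, credentials, table, and path are placeholder assumptions.

```python
# Sketch: copy a table from an external RDBMS into HDFS with Spark.
# The JDBC URL, credentials, table name, and HDFS path are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdbms-to-hdfs").getOrCreate()

orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/shop")
    .option("dbtable", "orders")
    .option("user", "etl")
    .option("password", "secret")
    .load()
)

# Land the snapshot on HDFS as Parquet for downstream processing.
orders.write.mode("overwrite").parquet("hdfs:///warehouse/orders")
spark.stop()
```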


50 PySpark Interview Questions and Answers For 2023

ProjectPro

For input streams that receive data over the network (from sources such as Kafka or Flume), the default persistence level replicates the data on two nodes to achieve fault tolerance. MEMORY_ONLY_SER: the RDD is stored as serialized Java objects, one byte array per partition.
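
A small sketch of choosing a replicated persistence level explicitly in PySpark; the dataset and the particular storage level are just for illustration.

```python
# Sketch: persist an RDD with a replicated storage level in PySpark,
# echoing the two-node replication that streaming receivers use for fault tolerance.
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.appName("storage-level-demo").getOrCreate()
sc = spark.sparkContext

rdd = sc.parallelize(range(10000))
rdd.persist(StorageLevel.MEMORY_ONLY_2)  # keep two in-memory copies across the cluster

print(rdd.count())
spark.stop()
```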


Understanding Change Data Capture (CDC) in MySQL and PostgreSQL: BinLog vs. WAL + Logical Decoding

Towards Data Science

Change Data Capture (CDC) is a powerful and efficient tool for transmitting data changes from relational databases such as MySQL and PostgreSQL. PostgreSQL (Physical Replication): Uses Write-Ahead Logs (WAL), which record low-level changes to the database at the disk block level.
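
As a rough illustration of the MySQL binlog side, here is a sketch that tails row-level change events using the third-party python-mysql-replication package; the connection settings are placeholders, and the server is assumed to run with row-based binary logging enabled.

```python
# Sketch: tail MySQL's binlog for row-level changes (CDC) with the third-party
# `mysql-replication` package. Connection settings are placeholders, and the
# MySQL server is assumed to be configured with binlog_format=ROW.
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent,
    UpdateRowsEvent,
    WriteRowsEvent,
)

stream = BinLogStreamReader(
    connection_settings={"host": "127.0.0.1", "port": 3306, "user": "repl", "passwd": "secret"},
    server_id=100,  # must be unique among replication clients
    only_events=[WriteRowsEvent, UpdateRowsEvent, DeleteRowsEvent],
    resume_stream=True,
    blocking=True,
)

for event in stream:
    for row in event.rows:
        # Each row dict carries before/after values depending on the event type.
        print(event.__class__.__name__, event.table, row)

stream.close()
```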