This involves getting data from an API and storing it in a PostgreSQL database. In the second phase, we’ll develop an application that uses a language model to interact with this database. The second article, which will come later, will delve into creating agents using tools like LangChain to communicate with external databases.
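A minimal sketch of that first phase, assuming the requests and psycopg2 libraries, a placeholder API endpoint, and a hypothetical products table; the real pipeline will differ in schema and error handling.

```python
# Hypothetical sketch: pull records from a JSON API and store them in PostgreSQL.
# The endpoint URL, table name, and columns are made up for illustration.
import requests
import psycopg2

API_URL = "https://api.example.com/v1/products"   # placeholder endpoint
DSN = "dbname=appdb user=app password=secret host=localhost"

def load_products():
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    products = response.json()                     # expect a list of dicts

    with psycopg2.connect(DSN) as conn:            # commits on clean exit
        with conn.cursor() as cur:
            for p in products:
                cur.execute(
                    """
                    INSERT INTO products (id, name, price)
                    VALUES (%s, %s, %s)
                    ON CONFLICT (id) DO UPDATE
                    SET name = EXCLUDED.name, price = EXCLUDED.price
                    """,
                    (p["id"], p["name"], p["price"]),
                )

if __name__ == "__main__":
    load_products()
```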
The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers. How is ksqlDB architected?
Can you refresh our memory about what TimescaleDB is? How has the market for time series databases changed since we last spoke? How have the improvements and new features in the recent releases of PostgreSQL impacted the Timescale product?
Summary: Databases are useful for inspecting the current state of your application, but inspecting the history of that data can get messy without a way to track changes as they happen. How has the tight coupling with Kafka impacted the direction and capabilities of Debezium? What, if any, other substrates does Debezium support?
However, not all databases can be in the […]. Building a Cloud ETL Pipeline on Confluent Cloud shows you how to build and deploy a data pipeline entirely in the cloud.
We’ll also take a look at some performance tests to see if Rust might be a viable alternative for Java applications using Apache Kafka®. In this case, that means a command is created for a particular action, which will be assigned to a Kafka topic specific to that action. On May 15, 2015, the Core Kafka team released version 1.0
One of the most common integrations that people want to do with Apache Kafka® is getting data in from a database. That is because relational databases are a rich source of events. The existing data in a database, and any changes to that data, can be streamed into a Kafka topic. Setting the Kafka message key.
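As a hedged illustration of setting the message key, here is a sketch using the kafka-python client; the topic name, the row shape, and the choice of customer_id as the key are assumptions, and the original post may use Kafka Connect or a different client entirely.

```python
# Minimal sketch, assuming the kafka-python client and a locally reachable broker.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    key_serializer=lambda k: str(k).encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Pretend this row came from a relational table; using the customer id as the
# message key keeps all events for one customer in the same partition, in order.
row = {"customer_id": 42, "email": "jane@example.com", "status": "active"}
producer.send("customers", key=row["customer_id"], value=row)
producer.flush()
```

Keying by a stable identifier keeps all change events for the same row in one partition, which preserves their ordering for downstream consumers.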
Following part 1 and part 2 of the Spring for Apache Kafka Deep Dive blog series, here in part 3 we will discuss another project from the Spring team: Spring Cloud Data Flow, which focuses on enabling developers to easily develop, deploy, and orchestrate event streaming pipelines based on Apache Kafka®.
Learn more about Datafold by visiting dataengineeringpodcast.com/datafold. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.
For machine learning applications relational models require additional processing to be directly useful, which is why there has been a growth in the use of vector databases. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services.
Snowflake is launching native integrations with some of the most popular databases, including PostgreSQL and MySQL. With other ingestion improvements and our new database connectors, we are smoothing out the data ingestion process, making it radically simple and efficient to bring data to Snowflake. In case of errors (e.g.,
For transactional databases, it’s mostly Microsoft SQL Server, but also other databases like PostgreSQL, ScyllaDB and Couchbase. At peak load, Agoda sees around 7.5M queries per second as total load, spread across its managed database-as-a-service (DBaaS). It uses Spark for the data platform.
The landscape of time series databases is extensive and oftentimes difficult to navigate. What impact have the recent releases of PostgreSQL had on the design of the project? Which came first, Timescale the business or Timescale the database, and what is your strategy for ensuring that the open source project and the company around it both maintain their health?
Writing to a database and sending messages to a message bus is not atomic, which means that if one of these operations fails, the state of the application can become inconsistent. It uses a Postgres database as local storage, and Spring Data to handle persistence. The service and the database run in Docker containers.
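A rough sketch of the transactional outbox idea this excerpt describes, translated to Python/psycopg2 for illustration (the original service uses Spring Data); the orders and outbox tables and their columns are assumptions.

```python
# Transactional outbox sketch: the business row and the event row are written
# in the SAME database transaction, so they can never drift apart.
import json
import psycopg2

DSN = "dbname=orders user=app password=secret host=localhost"

def place_order(order_id, customer_id, amount):
    with psycopg2.connect(DSN) as conn:       # commits both inserts or neither
        with conn.cursor() as cur:
            # 1. Business write
            cur.execute(
                "INSERT INTO orders (id, customer_id, amount) VALUES (%s, %s, %s)",
                (order_id, customer_id, amount),
            )
            # 2. Event write in the same transaction
            cur.execute(
                "INSERT INTO outbox (aggregate_id, type, payload) VALUES (%s, %s, %s)",
                (order_id, "OrderPlaced",
                 json.dumps({"orderId": order_id, "amount": amount})),
            )
    # A separate relay (e.g. Debezium or a polling job) reads the outbox table
    # and publishes the events to the message bus.
```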
Using this data, Apache Kafka® and Confluent Platform can provide the foundations for both event-driven applications and an analytical platform. With tools like KSQL and Kafka Connect, the concept of streaming ETL is made accessible to a much wider audience of developers and data engineers. Ingesting the data.
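For a concrete sense of how Kafka Connect lowers the bar, here is a hedged sketch that registers a JDBC source connector over the Connect REST API; the host, credentials, and table are placeholders, and the config keys follow the commonly used Confluent JDBC source connector rather than anything specific to this article.

```python
# Sketch: register a JDBC source connector through the Kafka Connect REST API.
import requests

connector = {
    "name": "pg-orders-source",                  # illustrative connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://localhost:5432/appdb",
        "connection.user": "app",
        "connection.password": "secret",
        "table.whitelist": "orders",             # stream only this table
        "mode": "incrementing",                  # pick up new rows by id
        "incrementing.column.name": "id",
        "topic.prefix": "pg-",                   # rows land on topic "pg-orders"
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=30)
resp.raise_for_status()
print(resp.json())
```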
Cloudera Stream Processing (CSP), powered by Apache Flink and Apache Kafka, provides a complete stream management and stateful processing solution. In CSP, Kafka serves as the storage streaming substrate, and Flink as the core in-stream processing engine that supports SQL and REST interfaces. Apache Kafka and SMM.
As the databases professor at my university used to say, it depends. Using SQL to run your search might be enough for your use case, but as your project requirements grow and more advanced features are needed—for example, enabling synonyms, multilingual search, or even machine learning—your relational database might not be enough.
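Before reaching for a dedicated search engine, plain SQL can get surprisingly far. Below is a minimal sketch using PostgreSQL's built-in full-text search from psycopg2; the articles table and its columns are assumptions.

```python
# "Search in SQL": PostgreSQL full-text search via psycopg2.
import psycopg2

DSN = "dbname=cms user=app password=secret host=localhost"

def search_articles(query):
    with psycopg2.connect(DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, title
                FROM articles
                WHERE to_tsvector('english', title || ' ' || body)
                      @@ plainto_tsquery('english', %s)
                ORDER BY ts_rank(to_tsvector('english', title || ' ' || body),
                                 plainto_tsquery('english', %s)) DESC
                LIMIT 10
                """,
                (query, query),
            )
            return cur.fetchall()
```

Once the requirements grow to synonyms, multilingual analyzers, or learned ranking, a purpose-built search engine becomes the better fit.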
Summary: One of the most complex aspects of managing data for analytical workloads is moving it from a transactional database into the data warehouse. MemSQL is a distributed database built to support concurrent use by transactional, application-oriented workloads and analytical, high-volume workloads on the same hardware.
Andreas Andreakis, Ioannis Papapanagiotou. Overview: Change-Data-Capture (CDC) allows capturing committed changes from a database in real time and propagating those changes to downstream consumers [1][2]. In databases like MySQL and PostgreSQL, transaction logs are the source of CDC events. Triggering repairs at any time.
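As a toy illustration of log-based CDC, the snippet below reads committed changes straight from PostgreSQL's write-ahead log using the logical decoding functions; the slot name and the wal2json output plugin are assumptions, the server must run with wal_level=logical, and production systems such as DBLog or Debezium stream the log rather than polling it like this.

```python
# Toy log-based CDC: consume row-level changes from PostgreSQL's WAL
# through a logical replication slot decoded by wal2json (assumed installed).
import psycopg2

DSN = "dbname=appdb user=app password=secret host=localhost"

conn = psycopg2.connect(DSN)
conn.autocommit = True
cur = conn.cursor()

# One-time setup: create the logical replication slot.
cur.execute("SELECT pg_create_logical_replication_slot('cdc_demo', 'wal2json')")

# ... INSERT/UPDATE/DELETE statements happen elsewhere ...

# Each call returns (and consumes) the changes committed since the previous
# call, encoded as JSON documents describing the row-level operations.
cur.execute("SELECT lsn, data FROM pg_logical_slot_get_changes('cdc_demo', NULL, NULL)")
for lsn, data in cur.fetchall():
    print(lsn, data)
```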
Change Data Capture (CDC) with PostgreSQL and ClickHouse — a nice vendor post about CDC with Kafka as the movement layer (using Debezium). From databases introduction to SQL writing — Marie wrote best practices for establishing complete and reliable data documentation. This is neat, but I missed it.
Can you start by describing what Yellowbrick is and some of the story behind it? How is Yellowbrick architected? How would you characterize Yellowbrick’s position in the database/DWH market?
You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute.
For MaaS, the starting point was co-hosting the web service, relational database (Postgres), and Redis-based caching layer on a server. In the extant deployment model, multiple API workers functioned in tandem without sharing common memory or database connectors. We decided to leverage Kafka as a distributed messaging queue.
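The Redis-based caching layer described here is typically used in a cache-aside pattern; below is a hypothetical sketch with redis-py and psycopg2, where the key format, the five-minute TTL, and the users table are illustrative assumptions.

```python
# Cache-aside sketch for a Redis layer in front of Postgres.
import json
import psycopg2
import redis

cache = redis.Redis(host="localhost", port=6379)
DSN = "dbname=maas user=app password=secret host=localhost"

def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)              # cache hit

    with psycopg2.connect(DSN) as conn:        # cache miss: fall through to Postgres
        with conn.cursor() as cur:
            cur.execute("SELECT id, name FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()

    user = {"id": row[0], "name": row[1]} if row else None
    cache.setex(key, 300, json.dumps(user))    # keep for 5 minutes
    return user
```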
You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. Which database engines do you support and how do you reduce the maintenance burden for supporting different dialects and capabilities?
You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. What are the benefits of using PostgreSQL as the system of record for Marquez? How is the metadata itself stored and managed in Marquez?
Relational databases today are widely known to be suboptimal for supporting high-scale analytical use cases, and are all but certain to run into issues as your production data size and query volume grow.
An MV is a special type of sink that allows us to output data from our query into a tabular format persisted in a PostgreSQL database. Primary key: Every MV requires a primary key, as this will be our primary key in the underlying relational database as well. Why use a materialized view? It is set to five minutes by default.
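The primary-key requirement exists because the sink effectively performs keyed upserts into the relational table; here is a small sketch of what that looks like on the PostgreSQL side, with an assumed sensor_averages table.

```python
# Why the MV needs a primary key: each refreshed row is upserted into the
# PostgreSQL table keyed on that column (which must have a unique constraint).
import psycopg2

DSN = "dbname=analytics user=app password=secret host=localhost"

def upsert_mv_row(conn, sensor_id, avg_reading, window_end):
    with conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO sensor_averages (sensor_id, avg_reading, window_end)
            VALUES (%s, %s, %s)
            ON CONFLICT (sensor_id) DO UPDATE
            SET avg_reading = EXCLUDED.avg_reading,
                window_end  = EXCLUDED.window_end
            """,
            (sensor_id, avg_reading, window_end),
        )

with psycopg2.connect(DSN) as conn:
    upsert_mv_row(conn, "sensor-1", 21.7, "2024-01-01 00:05:00")
```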
We started with PostgreSQL, laying the foundation for structured analytics early on. At the same time, demand for data surged, but our database technology lacked the ability to efficiently isolate workloads, creating performance bottlenecks as more teams relied on the system for insights.
Breaking Bad… Data Silos: We haven’t quite figured out how to avoid using relational databases. Folks have definitely tried, and while Apache Kafka® has become the standard for event-driven architectures, it still struggles to replace your everyday PostgreSQL database instance in the modern application stack.
Initially, we built a quick prototype of the data services - primitive CRUD-type services, with synchronous HTTP APIs, each interacting directly with a simple (dedicated) PostgreSQL database as the operational store for the data. Outbound events were generated after completion of DB updates.
Introduction Managing streaming data from a source system, like PostgreSQL, MongoDB or DynamoDB, into a downstream system for real-time analytics is a challenge for many teams. Rockset, on the other hand, is a cloud-native database, removing a lot of the tooling and overhead required to get data into the system.
Open Source Support: Many Azure services support popular open-source frameworks like Apache Spark, Kafka, and Hadoop, providing flexibility for data engineering tasks. Microsoft Azure SQL Database: The SQL Database service is Microsoft's premier database offering.
Extraction: ChatGPT ETL prompts can be used to help write scripts to extract data from different sources, including databases: "I have a SQL database with a table named employees." Filtering: "In my SQL database, I have a table named orders with columns order_id, customer_name, order_date, and total_amount."
Applications of today often operate on data from multiple sources—databases like MongoDB, streaming platforms, and data lakes. An omni-channel retail personalization application, as an example, may require order data from MongoDB, user activity streams from Kafka, and third-party data from a data lake.
For example, if you want to analyze yesterday’s sales you can — at least in principle — simply query your operational database. Article as well as order data is horizontally sharded over eight PostgreSQL databases, so there is no way to simply fire up some ad hoc SQL to do a quick analysis.
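To make the sharding point concrete, here is a hypothetical sketch of routing queries across eight PostgreSQL shards; the DSNs and the modulo-hash routing are assumptions, but they show why an ad hoc query over all orders has to fan out and merge results.

```python
# Sketch: order data hash-sharded over eight PostgreSQL databases.
import psycopg2

SHARD_DSNS = [f"dbname=orders_{i} host=shard{i}.internal user=app" for i in range(8)]

def shard_for(order_id: int) -> str:
    return SHARD_DSNS[order_id % 8]            # pick one of the eight shards

def fetch_order(order_id: int):
    with psycopg2.connect(shard_for(order_id)) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM orders WHERE id = %s", (order_id,))
            return cur.fetchone()

def count_all_orders():
    # "Yesterday's sales" style questions must fan out across every shard
    # and merge the partial results, which is why analytics moves elsewhere.
    total = 0
    for dsn in SHARD_DSNS:
        with psycopg2.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute("SELECT count(*) FROM orders")
                total += cur.fetchone()[0]
    return total
```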
Added support for standalone NiFi/Kafka clusters. Operational Database. Support for complex x-row/x-table distributed transactions that run TPC-C benchmarks, alongside support for ANSI SQL, makes it easy to migrate from MySQL databases to Operational Database. Operational Database – Apache Phoenix 5.1.
Kafka 3.0.0 – The Apache Software Foundation needed less than one month to go from Kafka version 3.0.0-rc0 to the release of 3.0.0. Druid 0.22.0 – Apache Druid is claimed to be a high-performance analytical database competing with ClickHouse. And of course, PostgreSQL is one of the most popular databases.
This language is used to interact with databases and perform data manipulations and querying. It offers an interactive and user-friendly interface for creating dashboards, reports, and charts from a variety of data sources such as spreadsheets, databases, and cloud-based sources. SQL is also an essential skill for Azure Data Engineers.
Step 4b: Upgrading the RDBMS. CDP supports MariaDB 10.2-10.4, PostgreSQL 10, 11 and 12, and OracleDB 12c, 19c and 19.9. Step 5: Upgrading Cloudera Manager. Backup Cluster Metadata and Databases for CM, Hive and Oozie. Add Atlas Service. Add Kafka Service – Required for Atlas if it’s not already installed. Run Upgrade.
Apache Kafka has made acquiring real-time data more mainstream, but only a small sliver of teams are turning nightly batch analytics into real-time analytical dashboards with alerts and automatic anomaly detection. Rockset is a real-time analytics database in the cloud that uses an indexing approach to deliver low-latency analytics at scale.