A replication slot should be created for each client - so in this case we have blue (upper, denoted 1) and pink (lower, denoted 2) replication slots - and each slot will contain information about the progress of its client through the WAL.
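A minimal sketch of how such per-client slots might be created and inspected, assuming local connection details and the built-in pgoutput decoding plugin (the excerpt does not say which plugin is used); the server must be running with wal_level = logical:

```python
import psycopg2

# Hypothetical connection details; logical slots require
# wal_level = logical on the server.
conn = psycopg2.connect("dbname=appdb user=postgres host=localhost")
conn.autocommit = True
cur = conn.cursor()

# One slot per consumer: the blue and pink clients each get their own.
for slot in ("blue_slot", "pink_slot"):
    cur.execute(
        "SELECT pg_create_logical_replication_slot(%s, 'pgoutput');",
        (slot,),
    )

# Each slot independently records how far its client has read the WAL.
cur.execute(
    "SELECT slot_name, restart_lsn, confirmed_flush_lsn"
    " FROM pg_replication_slots;"
)
for row in cur.fetchall():
    print(row)
```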
Docker for Redis and PostgreSQL: We’ll be using Docker images for Redis and Postgres; to follow along, you’ll need Docker and Docker Compose installed. Skunk for PostgreSQL Integration: In this section, we’ll implement the protocols necessary for interacting with Postgres in our application using Skunk.
This blog post explains which tools to use to serve geospatial data from a database system (PostgreSQL) to your web browser. At Zalando, the open source database system PostgreSQL is used by many teams, and it offers a geospatial component called PostGIS. A vector tile must not consist solely of the geometry; it also needs the feature attributes.
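For illustration, a sketch of a tile query that returns attributes alongside geometry using PostGIS's ST_AsMVT and ST_AsMVTGeom; the buildings table, its columns, and the connection string are hypothetical, and the geometry is assumed to be stored in Web Mercator (SRID 3857):

```python
import psycopg2

def tile(z: int, x: int, y: int) -> bytes:
    """Return one Mapbox Vector Tile carrying attributes plus geometry."""
    conn = psycopg2.connect("dbname=gisdb")  # hypothetical DSN
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            SELECT ST_AsMVT(q, 'buildings')
            FROM (
                SELECT name, height,          -- feature attributes
                       ST_AsMVTGeom(geom,
                                    ST_TileEnvelope(%s, %s, %s)) AS geom
                FROM buildings                -- hypothetical table
                WHERE geom && ST_TileEnvelope(%s, %s, %s)
            ) AS q;
            """,
            (z, x, y, z, x, y),
        )
        return bytes(cur.fetchone()[0])
```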
PostgreSQL and MySQL are among the most popular open-source relational database management systems (RDBMS) worldwide. For all of their similarities, PostgreSQL and MySQL differ from one another in many ways. That’s largely because MySQL is only partially SQL-compliant, while PostgreSQL conforms much more closely to the SQL standard.
In practical terms, this means creating a system where everyone in your organization understands what data they’re handling and how to treat it appropriately, with safeguards if someone accidentally tries to mishandle sensitive information. Step 2: Hunt Down the Sensitive Stuff. Now it’s time to play detective in your database.
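One way to start that hunt is to scan the catalog for suspicious column names. A minimal sketch; the keyword heuristics and connection string are assumptions, not the article's method:

```python
import psycopg2

# Heuristic tokens that often flag personally identifiable data.
SUSPECT = ("ssn", "email", "phone", "dob", "salary", "passport")

conn = psycopg2.connect("dbname=appdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT table_schema, table_name, column_name
        FROM information_schema.columns
        WHERE table_schema NOT IN ('pg_catalog', 'information_schema');
        """
    )
    for schema, table, column in cur.fetchall():
        if any(token in column.lower() for token in SUSPECT):
            print(f"possible PII: {schema}.{table}.{column}")
```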
In the database ecosystem, Postgres is one of the top open-source databases, and one of the most widely used tools for managing PostgreSQL is pgAdmin. To run PostgreSQL instances on the Azure cloud, Azure offers Azure Database for PostgreSQL. What are PostgreSQL Tools? Why Use a GUI Tool?
While Atlas operates as an in-memory graph database for speed and performance, it uses PostgreSQL as its persistent storage layer to ensure durability and long-term data storage. A table named commit stores information about schema commits, including commit number and time. These persisted data objects are referred to as “storables”.
The following sections provide additional details on other aspects of how this is implemented, as well as information on steps to take to set this up for yourself. In addition to AKS and the load balancers mentioned above, this includes VNET, Data Lake Storage, PostgreSQL Azure database, and more. Activating CDW with Private AKS.
We knew we’d be deploying a Docker container to Fargate as well as using an Amazon Aurora PostgreSQL database and Terraform to model our infrastructure as code. Set up a locally running containerized PostgreSQL database. For more information on the Dockerfile spec, you can check out the Docker documentation here.
With the first wave of cloud-era databases the ability to replicate information geographically came at the expense of transactions and familiar query languages. I know that your SQL syntax is PostgreSQL compatible, so is it possible to use existing ORMs unmodified with CockroachDB?
Part 1: Set up the dbt project and database. Step 1: Install project dependencies. Before you can get started, you must have either DuckDB or PostgreSQL installed; choose one, and download and install the database using one of the following links: Download DuckDB / Download PostgreSQL. You must also have Python 3.8
Recently my team and I observed in our PostgreSQL databases a sporadic increase in the execution time of stored procedures (see the graph above). To find answers, we tested how different configurations of PostgreSQL influenced the results of the query planner. PostgreSQL also addresses non-uniform distributions.
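The post tests planner configurations; as one concrete knob, here is a sketch of widening a skewed column's statistics sample so the planner can see a non-uniform distribution. Table and column names are hypothetical:

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb")  # hypothetical DSN
conn.autocommit = True
cur = conn.cursor()

# Widen the most-common-values sample for a skewed column so the
# planner sees the non-uniform distribution (the default target is 100).
cur.execute(
    "ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 1000;"
)
cur.execute("ANALYZE orders;")

# Inspect what the planner now knows about the skew.
cur.execute(
    "SELECT most_common_vals, most_common_freqs FROM pg_stats"
    " WHERE tablename = 'orders' AND attname = 'customer_id';"
)
print(cur.fetchone())
```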
As a data-driven business, extracting meaningful data from various sources and making informed decisions relies heavily on effective data analysis. Unlocking the full potential of your data in PostgreSQL on Google Cloud SQL necessitates data integration with Amazon Aurora.
PostgreSQL is a very flexible database system, and its flexibility derives from the way it stores metadata. There are multiple system catalogs: pg_class stores information about tables, and pg_type describes types.
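A quick illustration of reading those catalogs directly (the connection string is an assumption):

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # pg_class: one row per relation (table, index, sequence, view, ...).
    cur.execute(
        "SELECT relname, relkind FROM pg_class"
        " WHERE relnamespace = 'public'::regnamespace LIMIT 5;"
    )
    print(cur.fetchall())

    # pg_type: one row per data type known to the database.
    cur.execute("SELECT typname, typlen FROM pg_type LIMIT 5;")
    print(cur.fetchall())
```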
This article is part of a project that’s split into two main phases. The first phase focuses on building a data pipeline, which involves getting data from an API and storing it in a PostgreSQL database. Data Processing: a Spark job then takes over, consuming the data from the Kafka topic and transferring it to a PostgreSQL database.
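A minimal PySpark sketch of that Kafka-to-PostgreSQL leg; the topic, table, credentials, and payload handling are assumptions, and the job needs the spark-sql-kafka and PostgreSQL JDBC packages on its classpath:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-postgres").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "api_events")           # hypothetical topic
    .load()
)

def write_batch(batch_df, batch_id):
    # JDBC has no streaming sink, so each micro-batch is written here.
    (batch_df.selectExpr("CAST(value AS STRING) AS payload")
     .write.format("jdbc")
     .option("url", "jdbc:postgresql://localhost:5432/appdb")
     .option("dbtable", "raw_events")            # hypothetical table
     .option("user", "postgres")
     .option("password", "postgres")
     .mode("append")
     .save())

stream.writeStream.foreachBatch(write_batch).start().awaitTermination()
```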
Since early 2020, Netflix has been iteratively developing systems to provide internal stakeholders and business leaders with up-to-date tools and dashboards with the latest information on the pandemic. Use PostgreSQL Composite Types when taking advantage of PostgreSQL Aggregate Functions.
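As an illustration of that tip (not Netflix's actual schema), a composite type lets an aggregate function collect several related fields as one value per group:

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb")  # hypothetical DSN
conn.autocommit = True
cur = conn.cursor()

# A composite type bundles related fields into a single value...
cur.execute("CREATE TYPE case_count AS (region text, cases bigint);")

# ...which an aggregate function can then collect as one unit per group.
cur.execute(
    """
    SELECT day, array_agg(ROW(region, cases)::case_count) AS counts
    FROM daily_counts                 -- hypothetical table
    GROUP BY day;
    """
)
print(cur.fetchall())
```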
What are some of the most interesting or unexpected uses of your platform that you are aware of?
Presto is a distributed SQL engine that allows you to tie all of your information together without having to first aggregate it all into a data warehouse.
Summary: Elasticsearch is a powerful tool for storing and analyzing data, but when using it for logs and other time-oriented information it can become problematic to keep all of your history. What are some of the most interesting or unexpected uses of Chaos Search and access to large amounts of historical log information that you have seen?
The record in the “outbox” table contains information about the event that happened inside the application, as well as some metadata that is required for further processing or routing. Other events such as DELETE can be ignored for now, as they do not contain useful information for our use case. transforms: the name of the transformation.
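For reference, a sketch of a typical outbox table shaped for Debezium's outbox event router; the column names follow the router's documented defaults, while the connection string is an assumption:

```python
import psycopg2

conn = psycopg2.connect("dbname=appdb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # The payload carries the event itself; the rest is routing metadata.
    cur.execute(
        """
        CREATE TABLE outbox (
            id            uuid PRIMARY KEY,
            aggregatetype text  NOT NULL,  -- routed to a topic by this
            aggregateid   text  NOT NULL,  -- becomes the message key
            type          text  NOT NULL,  -- event type, e.g. OrderCreated
            payload       jsonb NOT NULL   -- the event body
        );
        """
    )
```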
In this episode ThreatStack’s director of operations, Pete Cheslock, and senior infrastructure security engineer, Patrick Cable, discuss the data infrastructure that supports their platform, how they capture and process the data from client systems, and how that information can be used to keep your systems safe from attackers.
Also called GDPR 2.0, the AI Act is meant to regulate the usage of AI in tomorrow's world; I'm not informed enough yet, so I'll wait before giving my opinion on it. Change Data Capture (CDC) with PostgreSQL and ClickHouse — this is a nice vendor post about CDC with Kafka as the movement layer (using Debezium). This is neat.
As organizations grow and data sources proliferate it becomes difficult to keep track of everything, particularly for analysts and data scientists who are not involved with the collection and management of that information. How does the information in Amundsen get populated and what is the process for keeping it up to date?
SurrealDB is the solution for database administration, which includes general admin and user management, enforcing data security and control, performance monitoring, maintaining data integrity, dealing with concurrency transactions, and recovering information in the event of an unexpected system failure. What is Jamstack?
What is overlooked in that characterization is the level of complexity and effort that are required to collect and present that information, and the opportunities for providing those insights in other contexts. How are you approaching schema design and evolution in the storage layer?
In addition to storing schema information you allow users to store information about the transformations being performed. How can users populate information about their transformations in an automated fashion? How do you approach evolution and versioning of schema information? How is that represented?
What are the downstream challenges or complications that application designers or systems architects have to deal with to make use of that information?
In this episode co-founder Martin Sahlen explains the impact that easy access to lineage information can have on the work of data engineers and analysts, and how he and his team have designed their platform to offer that information to engineers and stakeholders in the places that they interact with data.
For a substantial number of use cases, the optimal format for storing and querying that information is as a graph, however databases architected around that use case have historically been difficult to use at scale or for serving fast, distributed queries. How does the query interface and data storage in DGraph differ from other options?
In this episode Eric Kansa describes how they process, clean, and normalize the data that they host, the challenges that they face with scaling ETL processes which require domain specific knowledge, and how the information contained in connections that they expose is being used for interesting projects.
In order to handle the volume and variety of information that they use to run and improve the business the data team has to build a platform that analysts and data scientists can use in a self-service manner. What secondary or third party sources of information do you rely on? What do you do with that information?
What is a typical lifecycle of information in ksqlDB? Typically a database is considered a long-term storage location for data, whereas Kafka is a streaming layer with a bounded amount of durable storage.
What are the benefits of using PostgreSQL as the system of record for Marquez? What are some of the interesting questions that can be answered from the information stored in Marquez? How is the metadata itself stored and managed in Marquez?
In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses.
See our latest 10-Q for more information. Snowflake’s native connectors, including the existing Snowflake Connector for Kafka and for ServiceNow, are built with scalability, cost efficiency and lower latency. Getting data ingested now only takes a few clicks, and the data is encrypted.
There was reliance on an unmanaged data layer: Redis (for caching) and PostgreSQL (as primary datastore) served as single points of failure for this product. Managing data replication in PostgreSQL could have been more robust. It required an engineer (with access to this keystore) to be logged in at the time of deployment.
Intuit provides insightful information about performing a set of tests called machine learning model sanity checks in a pre-production environment. WTTJ Tech: From PostgreSQL to Snowflake - A data migration story. WTTJ Tech has an interesting story to share about data migration.
Two of the most recognized formats for API information are XML and JSON. Knowledge of Databases: When working on a project, you must realize that data storage is essential, since databases contain a lot of information. Therefore, developers employ MySQL, SQL, PostgreSQL, MongoDB, etc., to manage their databases.
The result will be put on another topic, where a failed response contains a reason for the failure and a successful response might contain additional information. The blue parts represent PostgreSQL databases, and turquoise is an Nginx web server. The command handler has two external connections, to Kafka and PostgreSQL.
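A minimal sketch of such a command-handler loop; the topic names, table, and message shape are assumptions, not the article's actual protocol:

```python
import json
import psycopg2
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("commands", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")
conn = psycopg2.connect("dbname=appdb")  # hypothetical DSN

for msg in consumer:
    command = json.loads(msg.value)
    try:
        # Apply the command against PostgreSQL (hypothetical table).
        with conn, conn.cursor() as cur:
            cur.execute(
                "INSERT INTO accounts (id, balance) VALUES (%s, %s);",
                (command["id"], command["balance"]),
            )
        reply = {"status": "ok", "id": command["id"]}
    except Exception as exc:
        # Failed responses carry a reason, as described above.
        reply = {"status": "failed", "reason": str(exc)}
    producer.send("command-results", json.dumps(reply).encode())
```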
It is important because you want to ensure that you are collecting appropriate information from users, and this is achieved by ensuring that they enter correct details. One way to ensure that users give the appropriate information is through form handling. Let’s understand this with an example.
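A minimal sketch of such form handling; the field names and validation rules are illustrative, not taken from the article:

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def validate_signup(form: dict) -> dict:
    """Return a dict of field errors; empty means the form is acceptable."""
    errors = {}
    if not form.get("name", "").strip():
        errors["name"] = "Name is required."
    if not EMAIL_RE.match(form.get("email", "")):
        errors["email"] = "Enter a valid email address."
    if len(form.get("password", "")) < 8:
        errors["password"] = "Password must be at least 8 characters."
    return errors

print(validate_signup({"name": "Ada", "email": "ada@example.com",
                       "password": "correcthorse"}))
```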
However, managing data can be a challenging task, especially when dealing with large amounts of information. A database management system (DBMS) is a software system that helps organize, store and manage information efficiently. The system can be designed to track employee information such as hours worked, wages and taxes.
Storage traffic: Includes traffic from microservices to stateful systems such as Aurora PostgreSQL, CockroachDB, Redis, and Kafka. The EDS resource includes pod IP addresses and their AZ information.
Joe went on to define data modeling as follows: A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions. I have provided links for informational purposes and do not suggest endorsement.
Section 1: Connecting to the Source Database. The first step in this integration journey is connecting Striim to a PostgreSQL source database that contains raw machine learning data. In this blog, we will focus on a PostgreSQL database. Inside this database, we have an iris_dataset table with the following column structure.
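The excerpt's column listing is cut off here; purely as an assumption, the canonical iris schema would look something like the following (connection string also hypothetical), which may differ from Striim's actual example:

```python
import psycopg2

conn = psycopg2.connect("dbname=mldb")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Assumed columns, based on the standard iris dataset.
    cur.execute(
        """
        CREATE TABLE iris_dataset (
            id           serial PRIMARY KEY,
            sepal_length real,
            sepal_width  real,
            petal_length real,
            petal_width  real,
            species      text
        );
        """
    )
```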