Recently, I’ve encountered a few projects that used AWS DMS almost like an ELT solution, whether moving data from a local database instance to S3 or to some other data storage layer. It was interesting to see AWS DMS used in this manner, but it’s not what DMS was built for.
13 June 2023: AWS. The largest AWS region (us-east-1) degraded heavily for 3 hours, impacting 104 AWS services. We did a deep dive into this incident earlier, in “AWS’s us-east-1 outage.” We’ll also learn how this article contributed to AWS publishing its first public postmortem in two years!
Jia Zhan, Senior Staff Software Engineer, Pinterest; Sachin Holla, Principal Solution Architect, AWS. Summary: Pinterest is a visual search engine that serves over 550 million monthly active users globally. Pinterest’s infrastructure runs on AWS and leverages Amazon EC2 instances for its compute fleet.
There is an increasing number of cloud providers offering the ability to rent virtual machines, the largest being AWS, GCP, and Azure. A startup called Spare Cores helps compare prices across AWS, GCP, Azure, and Hetzner by monitoring their offerings in near real time. Each benchmarking task is evaluated sequentially.
RDS: AWS RDS is a managed service provided by AWS to run a relational database. We will see how to set up a PostgreSQL instance using AWS RDS. Log in to your AWS account, go to Services -> RDS, and click Create Database. In the Create Database prompt, choose the Standard Create option with PostgreSQL as the engine type.
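For readers who prefer scripting over the console, here is a minimal sketch of the same provisioning step using boto3; the instance identifier, credentials, and sizing are illustrative assumptions, not values from the original walkthrough.

```python
# A minimal sketch of creating a PostgreSQL RDS instance via boto3;
# identifier, credentials, and sizing below are illustrative assumptions.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

response = rds.create_db_instance(
    DBInstanceIdentifier="demo-postgres",   # hypothetical name
    Engine="postgres",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                    # GiB
    MasterUsername="demo_admin",            # illustrative credentials
    MasterUserPassword="change-me-please",
)
print(response["DBInstance"]["DBInstanceStatus"])  # e.g. "creating"
```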
Understanding the AWS Shared Responsibility Model is essential for aligning security and compliance obligations. The model delineates the division of labor between AWS and its customers in securing cloud infrastructure and applications. Let us begin by defining the Shared Responsibility Model and its core purpose in the AWS ecosystem.
After Zynga, he rejoined Amazon and was the General Manager (GM) for Compute services at AWS, and later chief of staff and advisor to AWS executives like Charlie Bell and Andy Jassy (Amazon’s current CEO). We dabbled in network engineering, database management, and system administration.
The company racked up huge bills for the likes of AWS, Snowflake, and Datadog. A quick summary of the technologies involved: Prometheus, a time series database, and a fast, open-source column-oriented database management system that is a popular choice for log management. And so, the $65M bill was for Datadog, for 2021.
As backend developers, we needed to stay unblocked while the infrastructure — in this case AWS resources — was being created. We knew we’d be deploying a Docker container to Fargate, as well as using an Amazon Aurora PostgreSQL database and Terraform to model our infrastructure as code.
There is a clear shortage of professionals certified in Amazon Web Services (AWS). As far as AWS certifications are concerned, there is always some debate surrounding them. An AWS certification can help you reach new heights in your career, with improved pay and job opportunities. What is AWS?
Unify transactional and analytical workloads in Snowflake for greater simplicity Many businesses must maintain two separate databases: one to handle transactional workloads and another for analytical workloads.
Introduction: S3 is Amazon Web Services’ (AWS) cloud-based object storage service. It stores and retrieves large amounts of data, including photos, videos, documents, and other files, in a durable, accessible, and scalable manner.
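As a quick illustration of that store-and-retrieve workflow, here is a minimal boto3 sketch; the bucket and object names are assumptions for the example.

```python
# A minimal sketch of storing and retrieving an object in S3 with boto3;
# the bucket and key names are illustrative assumptions.
import boto3

s3 = boto3.client("s3")

# Upload a local file as an object.
s3.upload_file("report.pdf", "demo-bucket", "docs/report.pdf")

# Download it back.
s3.download_file("demo-bucket", "docs/report.pdf", "report_copy.pdf")
```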
AWS Glue is here to put an end to all your worries! Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry. In 2023, more than 5,140 businesses worldwide had started using AWS Glue as a big data tool.
Introduction: Amazon Athena is an interactive query service supplied by Amazon Web Services (AWS) that lets you use standard SQL to analyze data stored in Amazon S3. Athena is serverless, so there are no servers to operate, and you pay only for the queries you run.
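A minimal sketch of that pay-per-query flow with boto3 might look like the following; the database, table, and results bucket are assumptions for illustration.

```python
# A minimal sketch of running SQL over S3 data via Athena with boto3;
# the database, table, and output bucket are illustrative assumptions.
import time
import boto3

athena = boto3.client("athena", region_name="us-east-1")

qid = athena.start_query_execution(
    QueryString="SELECT page, COUNT(*) AS hits FROM logs GROUP BY page LIMIT 10",
    QueryExecutionContext={"Database": "demo_db"},
    ResultConfiguration={"OutputLocation": "s3://demo-athena-results/"},
)["QueryExecutionId"]

# Poll until the query finishes, then fetch results.
while True:
    state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    for row in athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]:
        print([col.get("VarCharValue") for col in row["Data"]])
```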
Using the Operational Database Replication Plugin: the plugin is available both as a standalone plugin and installed automatically via Cloudera Replication Manager, and it uses PAM authentication to validate machine user credentials.
Summary: The database is the core of any system because it holds the data that drives your entire experience. Andy Pavlo researches autonomous database systems, and out of that research he created OtterTune to find the optimal set of parameters to use for your specific workload. How does it relate to your work with NoisePage?
But, instead of GCP, we’ll be using AWS. AWS is by far the most popular cloud computing platform, with an absurd number of products to solve every specific problem you can imagine. So join me in this post to develop a full data pipeline from scratch using some pieces of the AWS toolset. S3 is AWS’s blob storage.
CDP Operational Database (COD) is a real-time, auto-scaling operational database powered by Apache HBase and Apache Phoenix. COD is easy to provision and autonomous, which means developers can provision a new database instance within minutes and start creating prototypes quickly.
Deliver multimodal analytics with familiar SQL syntax. Database queries are the underlying force that runs the insights across organizations and powers data-driven experiences for users. Expanded multimodal support enriches responses for diverse tasks such as summarization, classification, and entity extraction across various media types.
When we talk of top cloud computing providers, there are two names ruling the market right now: AWS and Google Cloud. Hosting sites at AWS and Google Cloud has become fairly easy. When it comes to public cloud adoption, AWS is still the leader. All the traffic between the data centers is now encrypted by default.
The CDP Operational Database (COD) builds on the foundation of existing operational database capabilities that were available with Apache HBase and/or Apache Phoenix in legacy CDH and HDP deployments. It aligns with cloud provider standards (AWS and Azure), reducing cost and complexity and mitigating risk in HA scenarios, with a savings opportunity on AWS.
While KVStore was the client-facing abstraction, we also built a storage service called Rockstorewidecolumn: a wide-column, schemaless NoSQL database built using RocksDB. Additionally, the last section explains how this new database supports a key platform in the product. All names, addresses, and phone numbers are illustrative, not real.
Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn’t have to throw away the database to build with fast-changing data. With Materialize, you can keep it: Materialize is the only true SQL streaming database built from the ground up to meet the needs of modern data products.
Managing the operational concerns for your database can be complex and expensive, especially if you need to scale to large volumes of data, high traffic, or geographically distributed usage. No more shipping and praying: you can now know exactly what will change in your database! Can you describe how Planetscale is implemented?
Goku is our in-house time series database providing cost-efficient and low-latency storage for metrics data. Once the data becomes immutable (i.e., data before the last 2 hours, since GokuS allows only 2 hours of backfill for old data in most cases), it stores a copy of the finalized data on AWS EFS (deep persistent storage).
Change Data Capture (CDC) is a crucial technology that enables organizations to efficiently track and capture changes in their databases. In this blog post, we’ll explore what CDC is, why it’s important, and our journey of implementing Generic CDC solutions for all online databases at Pinterest. What is Change Data Capture?
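As a rough illustration of the idea (not Pinterest’s actual implementation), here is a minimal sketch of applying Debezium-style change events to a downstream copy; the event shape and key names are assumptions.

```python
# A minimal sketch of applying Debezium-style CDC events to a downstream copy;
# the event shape and table are simplified assumptions, not Pinterest's design.
replica = {}  # downstream materialized copy, keyed by primary key

def apply_change(event: dict) -> None:
    op, before, after = event["op"], event.get("before"), event.get("after")
    if op in ("c", "r", "u"):          # create, snapshot read, update
        replica[after["id"]] = after
    elif op == "d":                    # delete
        replica.pop(before["id"], None)

# Example stream of captured changes:
for event in [
    {"op": "c", "after": {"id": 1, "name": "pin"}},
    {"op": "u", "before": {"id": 1, "name": "pin"}, "after": {"id": 1, "name": "board"}},
    {"op": "d", "before": {"id": 1, "name": "board"}},
]:
    apply_change(event)

print(replica)  # {} -- the row was created, updated, then deleted
```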
To eliminate this impedance mismatch, Edo Liberty founded Pinecone to build a database that works natively with vectors. Mention that you’re a Data Engineering Podcast listener, and they’ll send you a free t-shirt.
For machine learning applications, relational models require additional processing to be directly useful, which is why there has been a growth in the use of vector databases. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services.
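To make that concrete, here is a minimal sketch of the operation a vector database is built around: nearest-neighbor search over embeddings. A production system would use an approximate index such as HNSW rather than this brute-force scan.

```python
# A minimal sketch of nearest-neighbor search over embeddings, the core
# operation a vector database optimizes. Brute-force for illustration only.
import numpy as np

def top_k(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    # Cosine similarity between the query and every stored vector.
    sims = vectors @ query / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(query))
    return np.argsort(-sims)[:k]   # indices of the k most similar vectors

rng = np.random.default_rng(0)
store = rng.normal(size=(10_000, 128))   # 10k stored 128-dim embeddings
print(top_k(rng.normal(size=128), store))
```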
Singlestore aims to cut down on the number of database engines that you need to run so that you can reduce the amount of copying that is required. By supporting fast, in-memory row-based queries and columnar on-disk representation, it lets your transactional and analytical workloads run in the same database.
Summary The database market has seen unprecedented activity in recent years, with new options addressing a variety of needs being introduced on a nearly constant basis. Despite that, there are a handful of databases that continue to be adopted due to their proven reliability and robust features.
Building a Semantic Book Search: Scale an Embedding Pipeline with Apache Spark and AWS EMR Serverless. Using OpenAI’s CLIP model to support natural language search on a collection of 70k book covers. In a previous post I did a little PoC to see if I could use OpenAI’s CLIP model to build a semantic book search.
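For a sense of what the embedding step might look like, here is a minimal sketch using the Hugging Face implementation of CLIP; the model checkpoint and query text are assumptions, not the post’s exact pipeline.

```python
# A minimal sketch of generating a CLIP text embedding with Hugging Face
# Transformers; model name and query are illustrative assumptions.
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed a natural-language query into the same space as the book-cover images.
inputs = processor(text=["a mystery novel set in Venice"], return_tensors="pt", padding=True)
with torch.no_grad():
    text_embedding = model.get_text_features(**inputs)
print(text_embedding.shape)  # torch.Size([1, 512])
```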
In an era where cloud technology is not just an option but a necessity for competitive business operations, the collaboration between Precisely and Amazon Web Services (AWS) has set a new benchmark for mainframe and IBM i modernization. Precisely brings data integrity to the AWS cloud.
To deploy high-performance applications at scale, a rugged operational database is essential. Cloudera Operational Database (COD) is a high-performance and highly scalable operational database designed for powering the biggest data applications on the planet at any scale. We tested two cloud storage options, AWS S3 and Azure ABFS.
TL;DR: The database-per-service pattern in the microservices world brings overhead in operating database instances and observing their health status and anomalies. Often, microservices are implemented with a datastore following the database-per-service design pattern, where each service deploys its own database instances.
Astasia Myers: The three components of the unstructured data stack. LLMs and vector databases have significantly improved the ability to process and understand unstructured data. I never thought of PDF as a self-contained document database, but that seems to be a reality we can’t deny.
Databricks clusters and AWS EC2: In today’s landscape, big data, which is data too large to fit into a single-node machine, is transformed and managed by clusters. M6GD instances are general-purpose EC2 instances equipped with AWS Graviton2 processors and local NVMe-based SSD storage, offering a balanced mix of compute, memory, and storage.
I built a serverless architecture for my simulated credit card complaints stream using AWS S3, AWS Lambda, and AWS Kinesis; the picture above gives a high-level view of the data flow. Instead of running database queries over stored data, stream processing applications process data continuously in real time, even before it is stored.
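As an illustration of the Lambda-plus-Kinesis leg of such an architecture, here is a minimal handler sketch; the payload fields are assumptions, not the actual complaint schema.

```python
# A minimal sketch of a Lambda handler consuming a Kinesis stream, in the
# spirit of the architecture described; payload fields are assumptions.
import base64
import json

def handler(event, context):
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        if payload.get("product") == "credit_card":
            print(f"complaint from {payload.get('state')}: {payload.get('issue')}")
    return {"processed": len(event["Records"])}
```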
Apache HBase has long been the database of choice for business-critical applications across industries. This is primarily because HBase provides scale, performance, and fault tolerance that few other databases can come close to. It’s a cloud-native data service that is available on AWS, Azure, and GCP.
Enter Amazon EventBridge, a fully managed serverless event bus service that makes it easier to build event-driven applications using data from your AWS services, custom applications, or SaaS providers. Overall, Amazon EventBridge is a foundational service for anyone looking to embrace modern, event-driven architecture on AWS.
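A minimal sketch of publishing a custom event with boto3 might look like this; the bus name, source, and detail payload are illustrative assumptions.

```python
# A minimal sketch of publishing a custom event to EventBridge with boto3;
# the bus name, source, and detail payload are illustrative assumptions.
import json
import boto3

events = boto3.client("events", region_name="us-east-1")

response = events.put_events(
    Entries=[{
        "EventBusName": "demo-bus",               # hypothetical custom bus
        "Source": "demo.orders",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": "1234", "total": 42.50}),
    }]
)
print(response["FailedEntryCount"])  # 0 on success
```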
In the modern data-centric world, efficient data transfer and management are essential to staying competitive. AWS offers robust tools to facilitate this, including the AWS Database Migration Service (DMS). In 2024, over 11,441 companies […]
Key Takeaways: Enhance capabilities through partnerships: AWS, Confluent, and Precisely accelerate mainframe modernization efforts, providing you with essential tools for success. To explore this topic, experts from AWS, Confluent, and Precisely came together to discuss the challenges and opportunities of this migration.
There are numerous stream processing engines, near-real-time database engines, streaming SQL systems, etc. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer.
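Because RisingWave speaks the PostgreSQL wire protocol, interacting with it from Python can be sketched as below; the connection defaults and the clicks table are assumptions for a local setup.

```python
# A minimal sketch of talking to RisingWave over its PostgreSQL-compatible
# protocol; host, port, user, and the clicks table are assumptions.
import psycopg2

conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
conn.autocommit = True

with conn.cursor() as cur:
    # Continuously maintained aggregate over a hypothetical clicks table.
    cur.execute("""
        CREATE MATERIALIZED VIEW clicks_per_page AS
        SELECT page, COUNT(*) AS clicks
        FROM clicks
        GROUP BY page
    """)
    cur.execute("SELECT * FROM clicks_per_page ORDER BY clicks DESC LIMIT 5")
    print(cur.fetchall())
```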