Modern IT environments require comprehensive data for successful AIOps, which includes incorporating data from legacy systems like IBM i and IBM Z into ITOps platforms. AIOps presents enormous promise, but many organizations face hurdles in its implementation: complex ecosystems made up of multiple, fragmented systems that lack interoperability.
If you had a continuous deployment system up and running around 2010, you were ahead of the pack; today it’s considered strange if your team does not have this for things like web applications. We dabbled in network engineering, database management, system administration, and hand-rolled C code.
Summary: Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?
Summary: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Can you describe what constitutes a NoSQL database? If you were to start from scratch today, what database would you build?
Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.
Traditionally, answering this question would require expensive GIS (Geographic Information Systems) software or complex database setups. Today, DuckDB offers a simpler, more accessible approach for data engineers to tackle spatial problems without specialized infrastructure.
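Below is a minimal sketch of the kind of spatial query the article has in mind, run from Python with DuckDB’s spatial extension; the table, column names, and coordinates are made up for illustration, and real work would project coordinates into a metric CRS before measuring distances.

```python
# A minimal sketch, not from the article: nearest-store lookup with DuckDB's
# spatial extension. Table, columns, and coordinates are illustrative only.
import duckdb

con = duckdb.connect()           # in-memory database
con.execute("INSTALL spatial;")  # fetches the extension on first run
con.execute("LOAD spatial;")

# Hypothetical table of store locations (longitude, latitude).
con.execute("""
    CREATE TABLE stores AS
    SELECT * FROM (VALUES
        ('Downtown', -122.4194, 37.7749),
        ('Airport',  -122.3790, 37.6213)
    ) AS t(name, lon, lat);
""")

# Distance from a reference point to each store, nearest first.
# (Planar distance in degrees; reproject for real measurements.)
rows = con.execute("""
    SELECT name,
           ST_Distance(ST_Point(lon, lat), ST_Point(-122.40, 37.77)) AS dist
    FROM stores
    ORDER BY dist;
""").fetchall()
print(rows)
```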
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. These systems are built on open standards and offer immense analytical and transactional processing flexibility. These formats are transforming how organizations manage large datasets.
Summary: A significant portion of data workflows involves storing and processing information in database engines. Your host is Tobias Macey, and today I'm welcoming back Gleb Mezhanskiy to talk about how to reconcile data in database environments. Interview: How did you get involved in the area of data management?
Summary: Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems, there are a number of common components that are shared across implementations. This was the core of your recent rewrite of the InfluxDB engine.
Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data.
It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta’s systems. It enhances the traceability of data flows within systems, ultimately empowering developers to swiftly implement privacy controls and create innovative products. (Hack, C++, Python, etc.)
The startup was able to start operations thanks to an EU grant called NGI Search. The current database includes 2,000 server types in 130 regions and 340 zones. Results are stored in git and their database, together with benchmarking metadata. Each benchmarking task is evaluated sequentially.
These are all big questions about the accessibility, quality, and governance of data being used by AI solutions today. The simple idea was: hey, how can we get more value from the transactional data in our operational systems spanning finance, sales, customer relationship management, and other siloed functions?
Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
In the early ’90s, DOS programs like the ones my company made had their own text UI screen rendering system. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. By doing so, I got to see every screen of the system.
From Sella’s status page: “Following the installation of an update to the operating system and related firmware, which led to an unstable situation.” The changes messed up all major databases in some unexpected way. Still, I’m puzzled by how long the system has been down.
Summary: The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it.
When you hear the term System Hacking, it might bring to mind shadowy figures behind computer screens and high-stakes cyber heists. In this blog, we’ll explore the definition, purpose, process, and methods of prevention related to system hacking, offering a detailed overview to help demystify the concept.
In 2020, anticipating the growing needs of the business and seeking to simplify our storage offerings, we decided to consolidate our different key-value systems in the company into a single unified service called KVStore. Additionally, the last section explains how this new database supports a key platform in the product.
Introduction: HDFS (Hadoop Distributed File System) is not a traditional database but a distributed file system designed to store and process big data. It provides high-throughput access to data and is optimized for […]
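As a rough illustration of working with HDFS programmatically (not something the post itself shows), here is a small sketch using the third-party `hdfs` Python package over WebHDFS; the namenode URL, user, and paths are placeholders.

```python
# A rough sketch, assuming the third-party `hdfs` package (WebHDFS client).
# Namenode URL, user, and paths are placeholders.
from hdfs import InsecureClient

client = InsecureClient("http://namenode.example.com:9870", user="hadoop")

# Write a small file, list its directory, then read it back.
client.write("/data/raw/events.csv", data="id,value\n1,42\n", overwrite=True)
print(client.list("/data/raw"))

with client.read("/data/raw/events.csv") as reader:
    print(reader.read().decode("utf-8"))
```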
Today’s organizations have access to more data than ever before, and consequently are faced with the challenge of determining how to transform this tremendous stream of real-time information into actionable insights. Encryption, access controls, and regulatory compliance (HIPAA, GDPR, etc.) matter most for sensitive information such as patient records or geolocation data.
A consolidated data system to accommodate a big(ger) WHOOP: When a company experiences exponential growth over a short period, it’s easy for its data foundation to feel a bit like it was built on the fly. This blog post is the second in a three-part series on migrations. … million in cost savings annually.
Unify transactional and analytical workloads in Snowflake for greater simplicity: Many businesses must maintain two separate databases: one to handle transactional workloads and another for analytical workloads. Sensitive data can have enormous value but is oftentimes locked down due to privacy requirements.
Change Data Capture (CDC) is a crucial technology that enables organizations to efficiently track and capture changes in their databases. In this blog post, we’ll explore what CDC is, why it’s important, and our journey of implementing Generic CDC solutions for all online databases at Pinterest. What is Change Data Capture?
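To make the idea concrete, here is a generic, hedged sketch of consuming CDC events from a Kafka topic in Python; it is not Pinterest’s implementation, and the topic name and Debezium-style event layout ("op", "before", "after") are assumptions.

```python
# Generic sketch only -- not Pinterest's system. Assumes CDC events are
# published to Kafka in a Debezium-style envelope with "op", "before", "after".
import json
from kafka import KafkaConsumer  # kafka-python

consumer = KafkaConsumer(
    "cdc.users",                          # hypothetical CDC topic
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    event = message.value
    op = event.get("op")                  # "c"=create, "u"=update, "d"=delete
    if op in ("c", "u"):
        print("upsert:", event.get("after"))
    elif op == "d":
        print("delete:", event.get("before"))
```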
Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data.
Agentic AI refers to AI systems that act autonomously on behalf of their users. These systems make decisions, learn from interactions and continuously improve without constant human intervention. Many enterprises face overwhelming data sources, from structured databases to unstructured social media feeds. What is agentic AI?
ERP and CRM systems are designed and built to fulfil a broad range of business processes and functions. Your first step might be to locate the orders. Then you begin researching database objects and find a couple of views, but there are some inconsistencies between them, so you do not know which one to use. Does it sound familiar?
A quick summary of these technologies: Prometheus: a time series database and a very popular open-source solution for systems and services monitoring. A fast and open-source column-oriented database management system, which is a popular choice for log management. It evaluates rules and can trigger alerts.
ThoughtSpot prioritizes the high availability and minimal downtime of our systems to ensure a seamless user experience. In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance. What is Atlas?
Our hope is that making salary ranges more accessible on Comprehensive.io … For AI, we’ve built a system to efficiently use GPT-4 for this purpose, including auto-crafting prompts and performing pre- and post-processing. “… on the backend, and Postgres for database storage.” How does Comprehensive.io …
For transactional databases, it’s mostly Microsoft SQL Server, but also other databases like PostgreSQL, ScyllaDB, and Couchbase. At peak load, Agoda sees around 7.5M queries per second as total load, spread across its managed database-as-a-service (DBaaS). It uses Spark for the data platform.
This involves getting data from an API and storing it in a PostgreSQL database. In the second phase, we’ll develop an application that uses a language model to interact with this database. The second article, which will come later, will delve into creating agents using tools like LangChain to communicate with external databases.
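A minimal sketch of that first phase might look like the following; the API endpoint, credentials, and table schema are placeholders, not the article’s actual code.

```python
# A minimal sketch of the first phase, with placeholder endpoint, credentials,
# and schema -- not the article's actual pipeline.
import requests
import psycopg2
from psycopg2.extras import Json

API_URL = "https://api.example.com/v1/records"  # hypothetical endpoint

def fetch_records():
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()
    return response.json()                      # assume a list of dicts with an "id"

def load_records(records):
    conn = psycopg2.connect(host="localhost", dbname="appdb",
                            user="app", password="secret")
    with conn, conn.cursor() as cur:             # commits on success
        cur.execute("""
            CREATE TABLE IF NOT EXISTS records (
                id      INTEGER PRIMARY KEY,
                payload JSONB
            );
        """)
        for rec in records:
            cur.execute(
                """
                INSERT INTO records (id, payload) VALUES (%s, %s)
                ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload;
                """,
                (rec["id"], Json(rec)),
            )
    conn.close()

if __name__ == "__main__":
    load_records(fetch_records())
```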
Optimize performance and cost with a broader range of model options: Cortex AI provides easy access to industry-leading models via LLM functions or REST APIs, enabling you to focus on driving generative AI innovations. We offer a broad selection of models in various sizes, context window lengths and levels of language support.
Meta’s vast and diverse systems make it particularly challenging to comprehend their structure, meaning, and context at scale. We discovered that a flexible and incremental approach was necessary to onboard the wide variety of systems and languages used in building Meta’s products. We believe that privacy drives product innovation.
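As a hedged illustration (not an official Snowflake sample), a Cortex LLM function such as SNOWFLAKE.CORTEX.COMPLETE can be called from Python through the Snowflake connector; the account details and model name below are placeholders, and model availability varies by region.

```python
# Hedged sketch: account, credentials, and model name are placeholders, and
# model availability varies by account and region.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_wh",
)
cur = conn.cursor()
cur.execute(
    "SELECT SNOWFLAKE.CORTEX.COMPLETE('mistral-large', %s)",
    ("Summarize why unifying transactional and analytical workloads helps.",),
)
print(cur.fetchone()[0])
cur.close()
conn.close()
```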
Furthermore, most vendors require valuable time and resources for cluster spin-up and spin-down, disruptive upgrades, code refactoring or even migrations to new editions to access features such as serverless capabilities and performance improvements.
Astasia Myers: The three components of the unstructured data stack. LLMs and vector databases significantly improved the ability to process and understand unstructured data. I never thought of PDF as a self-contained document database, but that seems a reality that we can’t deny. What are you waiting for?
KAWA Analytics: Digital transformation is an admirable goal, but legacy systems and inefficient processes hold back many companies’ efforts. It connects structured and unstructured databases across sources and uses a no-code UI or Python for advanced and predictive analytics.
A “Knowledge Management System” (KMS) allows businesses to collate this information in one place, but not necessarily to search through it accurately. The interface allows for accurate, business-wide querying that is quick and easy to scale, with access to data sets provided through Cloudera’s platform.
Analytics Engineers deliver these insights by establishing deep business and product partnerships; translating business challenges into solutions that unblock critical decisions; and designing, building, and maintaining end-to-end analytical systems. Enter DataJunction (DJ).
Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?
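As a rough sketch of the SQL-first model (assumed details, not taken from the episode): RisingWave speaks the PostgreSQL wire protocol, so a client like psycopg2 can define a streaming source and a continuously maintained materialized view; the connection settings, Kafka topic, and DDL options below vary by RisingWave version.

```python
# Assumed details, not from the episode: connection settings, topic, schema,
# and DDL options vary by RisingWave version.
import psycopg2

conn = psycopg2.connect(host="localhost", port=4566, user="root", dbname="dev")
conn.autocommit = True
cur = conn.cursor()

# Ingest a Kafka topic as a streaming source.
cur.execute("""
    CREATE SOURCE IF NOT EXISTS page_views (
        user_id BIGINT,
        url     VARCHAR,
        ts      TIMESTAMP
    ) WITH (
        connector = 'kafka',
        topic = 'page_views',
        properties.bootstrap.server = 'kafka:9092'
    ) FORMAT PLAIN ENCODE JSON;
""")

# The view is maintained incrementally as new events arrive.
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS views_per_user AS
    SELECT user_id, COUNT(*) AS view_count
    FROM page_views
    GROUP BY user_id;
""")

cur.execute("SELECT * FROM views_per_user ORDER BY view_count DESC LIMIT 10;")
print(cur.fetchall())
```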
This basically means the tool updates itself by pulling in changes to data structures from your systems. Finally, access control helps keep things organized. It supports a ton of connectors, from SQL databases to machine learning models, so if you’re juggling different tools and platforms, this one can help bring everything together.
In practical terms, this means creating a system where everyone in your organization understands what data they’re handling and how to treat it appropriately, with safeguards if someone accidentally tries to mishandle sensitive information. And most importantly, who really needs access to this data? Want even tighter security?
If the underlying data is incomplete, inconsistent, or delayed, even the most advanced AI models and business intelligence systems will produce unreliable insights. Many organizations struggle with: Inconsistent data formats: Different systems store data in varied structures, requiring extensive preprocessing before analysis.
This architecture is valuable for organizations dealing with large volumes of diverse data sources, where maintaining accuracy and accessibility at every stage is a priority. This foundational layer is a repository for various data types, from transaction logs and sensor data to social media feeds and system logs.