Especially when working with databases, it is often considered good practice to follow a design pattern. A pattern is not actual code but a template that can be used to solve problems in different situations. This ensures easy […] From the post “What are Data Access Object and Data Transfer Object in Python?”
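To make the two roles concrete, here is a minimal Python sketch, assuming a hypothetical users table (the UserDTO/UserDAO names are illustrative, not from the post):

```python
# A DTO is a plain data carrier; a DAO hides persistence behind simple methods.
import sqlite3
from dataclasses import dataclass

@dataclass
class UserDTO:
    # Data Transfer Object: carries data between layers, no behavior.
    id: int
    name: str

class UserDAO:
    # Data Access Object: encapsulates all database access for users.
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def get(self, user_id: int) -> UserDTO | None:
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return UserDTO(*row) if row else None
```

The rest of the application works with UserDTO objects and never touches SQL directly, which is what makes the pattern easy to test and to swap out.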
Introduction SQL injection is an attack in which a malicious user inserts arbitrary SQL code into a web application’s query, allowing them to gain unauthorized access to a database. Attackers can use this to steal sensitive information or make unauthorized changes to the data stored in the database.
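To illustrate the attack and the standard defense, here is a minimal sketch using sqlite3 (the table and inputs are made up for demonstration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

malicious = "' OR '1'='1"

# Vulnerable: splicing input into the SQL string lets the attacker rewrite
# the query; this returns every row instead of none.
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# Safe: a parameterized query passes the value separately from the SQL text,
# so the input is treated as data, never as code.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(unsafe)  # [('alice',), ('bob',)]
print(safe)    # []
```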
Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Can you describe what constitutes a NoSQL database? If you were to start from scratch today, what database would you build?
SQL2Fabric Mirroring is a new fully managed service offered by Striim to mirror on-premises SQL databases. It’s a collaborative service between Striim and Microsoft, based on Fabric Open Mirroring, that enables real-time data replication from on-premises SQL Server databases to Azure Fabric OneLake.
Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.
Summary A significant portion of data workflows involve storing and processing information in database engines. Your host is Tobias Macey, and today I’m welcoming back Gleb Mezhanskiy to talk about how to reconcile data in database environments. Interview introduction: How did you get involved in the area of data management?
Summary Building a database engine requires a substantial amount of engineering effort and time investment. In this episode the guest explains how he used the combination of Apache Arrow, Flight, DataFusion, and Parquet to lay the foundation of the newest version of his time-series database.
Traditionally, answering such questions would require expensive GIS (Geographic Information Systems) software or complex database setups. Today, DuckDB offers a simpler, more accessible approach for data engineers to tackle spatial problems without specialized infrastructure.
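For a flavor of how lightweight this can be, here is a small sketch using DuckDB’s spatial extension from Python (the coordinates are made up; exact function availability depends on the extension version):

```python
import duckdb

con = duckdb.connect()  # in-process database: no server, no GIS stack
con.install_extension("spatial")
con.load_extension("spatial")

# Planar distance between two points; real workflows would load GeoParquet
# or shapefiles and pick geodesic functions as needed.
dist = con.execute(
    "SELECT ST_Distance(ST_Point(0, 0), ST_Point(3, 4)) AS d"
).fetchone()[0]
print(dist)  # 5.0
```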
We didn’t build our applications in neat containers, but in bulky monoliths which commingled business, database, backend, and frontend logic. We dabbled in network engineering, database management, and system administration. Our deployments were initially manual. What was the other driver of adoption?
Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
The startup was able to start operations thanks to an EU grant called NGI Search. The current database includes 2,000 server types in 130 regions and 340 zones. Results are stored in Git and in their database, together with benchmarking metadata. Each benchmarking task is evaluated sequentially.
Unify transactional and analytical workloads in Snowflake for greater simplicity. Many businesses must maintain two separate databases: one to handle transactional workloads and another for analytical workloads. Sensitive data can have enormous value but is oftentimes locked down due to privacy requirements.
These are all big questions about the accessibility, quality, and governance of data being used by AI solutions today. And then a wide variety of business intelligence (BI) tools popped up to provide last-mile visibility, with much easier end-user access to insights housed in these data warehouses (DWs) and data marts.
A model’s access can be private, protected or public. Private means the model is accessible only within the same group (a model can be in only one group). Once you run dbt build on the core project, a manifest.json will be generated and tables will be created in the database. Or even more: versioning models.
One solution could be to store the accuracies in a database and fetch them back in the task choosing_model with a SQL request. Keep in mind that Airflow stores XComs in its metadata database. To access XComs, go to the user interface, then Admin, then XComs. First things first: xcom_push is accessible only from a task instance object.
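A minimal sketch of that XCom flow, assuming a hypothetical DAG where a training task pushes an accuracy that choosing_model pulls back (task IDs and keys are illustrative; the schedule argument is Airflow 2.4+ syntax):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def train_model(**context):
    accuracy = 0.93  # stand-in for a real training run
    # xcom_push is only available on the task instance ("ti") in the context.
    context["ti"].xcom_push(key="accuracy", value=accuracy)

def choose_model(**context):
    # Pull the value the upstream task pushed, by task_id and key.
    accuracy = context["ti"].xcom_pull(task_ids="train_model", key="accuracy")
    print(f"accuracy from upstream: {accuracy}")

with DAG("xcom_demo", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    choose = PythonOperator(task_id="choosing_model", python_callable=choose_model)
    train >> choose
```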
Introduction HDFS (Hadoop Distributed File System) is not a traditional database but a distributed file system designed to store and process big data. It provides high-throughput access to data and is optimized for […] The post A Dive into the Basics of Big Data Storage with HDFS appeared first on Analytics Vidhya.
Furthermore, most vendors require valuable time and resources for cluster spin-up and spin-down, disruptive upgrades, code refactoring or even migrations to new editions to access features such as serverless capabilities and performance improvements.
Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn’t have to throw away the database to build with fast-changing data. Materialize is the only true SQL streaming database built from the ground up to meet the needs of modern data products. With Materialize, you can!
Before it migrated to Snowflake in 2022, WHOOP was using a catalog of tools — Amazon Redshift for SQL queries and BI tooling, Dremio for a data lake, PostgreSQL databases and others — that had ultimately become expensive to manage and difficult to maintain, let alone scale. The move delivered […] million in cost savings annually.
Change Data Capture (CDC) is a crucial technology that enables organizations to efficiently track and capture changes in their databases. In this blog post, we’ll explore what CDC is, why it’s important, and our journey of implementing Generic CDC solutions for all online databases at Pinterest. What is Change Data Capture?
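To make the concept concrete before the deep dive, here is an illustrative sketch of consuming one row-level change event; the Debezium-style event shape is an assumption for illustration, not Pinterest’s actual format:

```python
# One CDC event per row change, tailed from the database's change log.
change_event = {
    "op": "u",                       # c = create, u = update, d = delete
    "before": {"id": 42, "status": "pending"},
    "after":  {"id": 42, "status": "shipped"},
    "source": {"table": "orders", "ts_ms": 1700000000000},
}

def apply_change(event: dict, target: dict) -> None:
    # Replay the change against a downstream store keyed by primary key.
    if event["op"] == "d":
        target.pop(event["before"]["id"], None)
    else:
        target[event["after"]["id"]] = event["after"]

replica: dict = {}
apply_change(change_event, replica)
print(replica)  # {42: {'id': 42, 'status': 'shipped'}}
```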
With advanced encryption, strict access controls and strong data governance, Snowflake helps us ensure the confidentiality and protection of our clients’ information. We chose Snowflake for its robust, scalable and secure data infrastructure, perfectly suited for handling complex regulatory and quality data efficiently.
Data lineage refers to the process of tracing the journey of data as it moves through various systems, illustrating how data transitions from one data asset, such as a database table (the source asset), to another (the sink asset). In this blog, we will delve into an early stage in PAI implementation: data lineage.
At peak load, Agoda sees around 7.5M queries per second as total load, spread across its managed database-as-a-service (DBaaS). For transactional databases, it’s mostly Microsoft SQL Server, but also other databases like PostgreSQL, ScyllaDB and Couchbase. It uses Spark for the data platform.
It’s hard to get too much out of these vague reports, but here’s my attempt at decrypting what might have happened: an Oracle version was updated, and/or database schema changes were made. The changes messed up all major databases in some unexpected way. Sella needs Oracle’s help to figure things out.
Optimize performance and cost with a broader range of model options. Cortex AI provides easy access to industry-leading models via LLM functions or REST APIs, enabling you to focus on driving generative AI innovations. We offer a broad selection of models in various sizes, context window lengths and language support.
However, they faced a growing challenge: integrating and accessing data across a complex environment. Some departments used IBM Db2, while others relied on VSAM files or IMS databases, creating complex data governance processes and costly data pipeline maintenance. The result?
It’s the difference between knowing which documents can be shared in a public Slack channel versus which ones need encrypted storage and limited access. And most importantly: who really needs access to this data? Step 2: Hunt Down the Sensitive Stuff. Now it’s time to play detective in your database. Databases change.
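As a toy example of that detective work, a first pass might flag column names that commonly hold sensitive data; a real scanner would also sample values and rerun on a schedule, precisely because databases change (the names and patterns below are illustrative):

```python
import re

# Naive name-based heuristic; production scanners inspect data, not just names.
SENSITIVE = re.compile(r"ssn|email|phone|dob|salary|address", re.IGNORECASE)

def flag_sensitive_columns(schema: dict[str, list[str]]) -> list[str]:
    # schema maps table name -> list of column names
    return [
        f"{table}.{column}"
        for table, columns in schema.items()
        for column in columns
        if SENSITIVE.search(column)
    ]

print(flag_sensitive_columns({"users": ["id", "email", "home_address"]}))
# ['users.email', 'users.home_address']
```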
Our hope is that making salary ranges more accessible on Comprehensive.io […] on the backend, and Postgres for database storage. We are super excited to start playing with vector databases: ones that store and index vector embeddings we get from natural language processing models like OpenAI embeddings.
A quick summary of these technologies: Prometheus, a time-series database, and ClickHouse, a fast, open-source, column-oriented database management system that is a popular choice for log management. Ukraine is one of the few countries for which we have access to nationwide data, through the job site Djinni.
In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance. This article highlights the performance optimizations implemented to initialize Atlas, our in-house graph database, in less than two minutes.
Metric definitions are often scattered across various databases, documentation sites, and code repositories, making it difficult for analysts and data scientists to find reliable information quickly. Enter DataJunction (DJ).
SQL is the essential data science language due to its universal database accessibility, efficient data-cleaning capabilities, seamless integration with other languages, and the fact that most data science jobs require it.
It connects structured and unstructured databases across sources and uses a no-code UI or Python for advanced and predictive analytics. Users can work with the data by defining business concepts instead of writing database queries, and data structures can be reoptimized without major infrastructure changes as business needs evolve.
Summary Building a data platform that is enjoyable and accessible for all of its end users is a substantial challenge. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn’t have to throw away the database to build with fast-changing data. With Materialize, you can!
We developed our search and analytics database, taking full advantage of the cloud, to eliminate the complexity inherent in the data infrastructure needed for these apps. With this acquisition, what we’ve developed over the years will help make AI accessible to all in a safe and beneficial way.
At ISO-NE, the electricity price publishing system was a pile of Bash, Perl, PHP, and C. These scripts mixed database access, HTML generation, and logic in unexpected ways. Sometimes a script would generate another script. If it isn’t working, then there might be problems delivering electricity.
Introduction S3 is Amazon Web Services’ (AWS) cloud-based object storage service. It stores and retrieves large amounts of data, including photos, movies, documents, and other files, in a durable, accessible, and scalable manner. S3 […] The post Top 6 Amazon S3 Interview Questions appeared first on Analytics Vidhya.
However, as we were migrating our wide-column database, we saw significant performance degradation across many clusters, especially for our bulk-update workloads. For these use cases, datasets are typically generated offline in batch jobs and bulk-uploaded from S3 to the database running on EC2.
ingestr — ingestr is a CLI tool to copy data between any databases with a single command, seamlessly. It’s built on top of dlt. Currently it supports around 250 sources, which is a subset of all Airbyte sources (only the ones written in Python), and it seems it does not support connecting to classic databases. Written in Go.
Astasia Myers: The three components of the unstructured data stack. LLMs and vector databases significantly improved the ability to process and understand unstructured data. I never thought of PDF as a self-contained document database, but that seems to be a reality we can’t deny. What are you waiting for?
Every database built for real-time analytics has a fundamental limitation. When you deconstruct the core database architecture, deep in the heart of it you will find a single component that is performing two distinct competing functions: real-time data ingestion and query serving. So they are not suitable for real-time analytics.
Declaratively manage database objects: embrace a declarative approach for defining and managing Snowflake objects, using Python or SQL, with Database Change Management. A simple pip install snowflake grants developers access, eliminating the need to juggle between SQL and Python or wrestle with cumbersome syntax.
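A minimal sketch of what that Python-first style can look like with the snowflake package (connection parameters are placeholders; exact class and argument names may differ across versions):

```python
from snowflake.core import CreateMode, Root
from snowflake.core.database import Database
from snowflake.snowpark import Session

# Placeholder credentials; substitute real account/user/auth settings.
connection_parameters = {"account": "<account>", "user": "<user>", "password": "<password>"}
session = Session.builder.configs(connection_parameters).create()
root = Root(session)

# Declare the desired object; create it only if it does not already exist.
root.databases.create(Database(name="analytics_db"), mode=CreateMode.if_not_exists)
```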
Data Versioning and Time Travel Open Table Formats empower users with time-travel capabilities, allowing them to access previous dataset versions. This feature is essential in environments where multiple users or applications access, modify, or analyze the same data simultaneously, typically on cloud object storage (Amazon S3, Azure Data Lake, or Google Cloud Storage).
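As one concrete flavor of time travel, here is a hedged sketch using Delta Lake’s PySpark reader (the table path is hypothetical; Iceberg and Hudi expose similar snapshot- or timestamp-based reads):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel-demo").getOrCreate()

# Read the table as it existed at an earlier version number...
df_v0 = (spark.read.format("delta")
         .option("versionAsOf", 0)
         .load("s3://my-bucket/warehouse/orders"))  # hypothetical path

# ...or as of a point in time.
df_then = (spark.read.format("delta")
           .option("timestampAsOf", "2024-01-01")
           .load("s3://my-bucket/warehouse/orders"))
```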
Meanwhile, customers are responsible for protecting resources within the cloud, including operating systems, applications, data, and the configuration of security controls such as Identity and Access Management (IAM) and security groups.