Introduction Managing complicated, interrelated information is more important than ever in today’s data-driven society. Traditional databases, while still valuable, often falter when it comes to handling highly connected data. Enter the unsung heroes of the data world: graph databases.
Introduction Data is the new oil of this century, and the database is a core element of any data science project. To generate actionable insights, data must be centralized and organized efficiently. So, we are […] The post How to Normalize Relational Databases With SQL Code?
Many data engineers and analysts start their journey with Postgres. It’s the Swiss Army knife of databases, and for many applications it’s more than sufficient. But as data volumes grow and analytical demands become more complex, Postgres stops being enough.
Introduction Data normalization is the process of building a database according to what is known as a canonical form, where the final product is a relational database with no data redundancy. More specifically, normalization involves organizing data according to attributes assigned as part of a larger data model.
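To make the idea concrete, here is a minimal, hypothetical sketch (not taken from the post) that normalizes a toy denormalized orders table using Python's built-in sqlite3 module; all table and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Denormalized: customer details repeat on every order row (redundancy).
cur.execute("""
    CREATE TABLE orders_flat (
        order_id       INTEGER PRIMARY KEY,
        customer_name  TEXT,
        customer_email TEXT,
        product        TEXT,
        amount         REAL
    )
""")

# Normalized: customer attributes live in one place; orders reference them.
cur.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name  TEXT,
        email TEXT UNIQUE
    )
""")
cur.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        product     TEXT,
        amount      REAL
    )
""")

# Migrate: one row per distinct customer, then re-point each order at it.
cur.execute("""
    INSERT INTO customers (name, email)
    SELECT DISTINCT customer_name, customer_email FROM orders_flat
""")
cur.execute("""
    INSERT INTO orders (order_id, customer_id, product, amount)
    SELECT f.order_id, c.customer_id, f.product, f.amount
    FROM orders_flat f
    JOIN customers c ON c.email = f.customer_email
""")
conn.commit()
```

Each customer fact now exists exactly once, which is the redundancy-elimination the canonical forms are after.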
Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.
Introduction In the bustling arena of database management systems, two heavyweight contenders emerge, each carrying its arsenal of features and capabilities. In one corner, we have the suave and sophisticated Microsoft SQL Server (MSSQL), donned in the elegance of enterprise-level prowess.
Introduction SQL injection is an attack in which a malicious user can insert arbitrary SQL code into a web application’s query, allowing them to gain unauthorized access to a database. Attackers can use this to steal sensitive information or make unauthorized changes to the data stored in the database.
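As an illustration (not from the article), here is a toy Python/sqlite3 sketch contrasting a vulnerable string-built query with a parameterized one; the table, data, and payload are all invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 1), ('bob', 0)")

user_input = "' OR '1'='1"  # classic injection payload

# VULNERABLE: user input is concatenated straight into the SQL string,
# so the payload rewrites the WHERE clause and matches every row.
query = "SELECT * FROM users WHERE name = '%s'" % user_input
print(conn.execute(query).fetchall())  # leaks all users

# SAFE: a parameterized query treats the input as a value, never as SQL.
print(conn.execute(
    "SELECT * FROM users WHERE name = ?", (user_input,)
).fetchall())  # returns [] - no user is literally named "' OR '1'='1"
```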
Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult.
Planning out your data infrastructure in 2025 can feel wildly different than it did even five years ago. Everyone is talking about AI, chatbots, LLMs, vector databases, and whether your data stack is “AI-ready.” The ecosystem is louder, flashier, and more fragmented.
Why do some embedded analytics projects succeed while others fail? We surveyed 500+ application teams embedding analytics to find out which analytics features actually move the needle. Read the 6th annual State of Embedded Analytics Report to discover new best practices. Brought to you by Logi Analytics.
The Data News is here to stay; the format might vary during the year, but here we are for another year. We published videos about the Forward Data Conference: you can watch Hannes, DuckDB co-creator, give a keynote about Changing Large Tables. Happy new year ✨ I wish you the best for 2025.
Summary A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
Introduction Data replication, also known as database replication, is the process of copying data to ensure that all information remains consistent across all data resources in real time. Data replication is like a safety net that keeps your information from disappearing or falling through the cracks.
Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.
Every data-driven project calls for a review of your data architecture—and that includes embedded analytics. Before you add new dashboards and reports to your application, you need to evaluate your data architecture with analytics in mind. Expert guidelines for a high-performance, analytics-ready modern data architecture.
The database landscape has reached 394 ranked systems across multiple categories: relational, document, key-value, graph, search engine, time series, and the rapidly emerging vector databases. As AI applications multiply quickly, vector technologies have become a frontier that data engineers must explore.
SQL2Fabric Mirroring is a new fully managed service offered by Striim to mirror on-premises SQL databases. It’s a collaborative service between Striim and Microsoft, based on Fabric Open Mirroring, that enables real-time data replication from on-premises SQL Server databases to Azure Fabric OneLake. Striim automates the rest.
Cloud databases have made it easier and cheaper to develop enterprise-level applications, offering flexibility, convenience, and standard database functionality. See what KDnuggets recommends.
Summary Building a database engine requires a substantial amount of engineering effort and time investment. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database.
Many organizations today are unlocking the power of their data by using graph databases to feed downstream analytics, enhance visualizations, and more. Watch this essential video with Senzing CEO Jeff Jonas on how adding entity resolution to a graph database condenses network graphs to improve analytics and save your analysts time.
Data lineage is an instrumental part of Meta’s Privacy Aware Infrastructure (PAI) initiative, a suite of technologies that efficiently protect user privacy. It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta’s systems.
Especially while working with databases, it is often considered good practice to follow a design pattern. A pattern is not actual code but a template that can be used to solve problems in different situations. This ensures easy […] The post What are Data Access Object and Data Transfer Object in Python?
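As a rough sketch of how the two patterns fit together (a hypothetical sqlite3-backed example; the UserDTO/UserDAO names are invented, not from the post): the DTO is a dumb data carrier, while the DAO hides all persistence details.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# DTO: a plain data carrier with no behavior.
@dataclass
class UserDTO:
    user_id: int
    name: str

# DAO: encapsulates how DTOs are stored and retrieved.
class UserDAO:
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (user_id INTEGER PRIMARY KEY, name TEXT)"
        )

    def save(self, user: UserDTO) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO users VALUES (?, ?)", (user.user_id, user.name)
        )

    def find(self, user_id: int) -> Optional[UserDTO]:
        row = self.conn.execute(
            "SELECT user_id, name FROM users WHERE user_id = ?", (user_id,)
        ).fetchone()
        return UserDTO(*row) if row else None

dao = UserDAO(sqlite3.connect(":memory:"))
dao.save(UserDTO(1, "alice"))
print(dao.find(1))  # UserDTO(user_id=1, name='alice')
```

Swapping SQLite for another store only requires a new DAO; callers keep passing and receiving the same DTOs.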
Looking to learn SQL and databases to level up your data science skills? Learn SQL, database internals, and much more with these free university courses.
Welcome back to Week 2 of KDnuggets’ "Back to Basics" series. This week, we delve into the vital world of Databases, SQL, Data Management, and Statistical Concepts in Data Science.
She said mainly that "Sora is a tool to extend creativity." Last point: Mira has been mocked and criticised online because, as a CTO, she wasn't able to say which public / licensed data Sora has been trained on. Pandera, a data validation library for dataframes, now supports Polars.
Does the LLM capture all the relevant data and context required for it to deliver useful insights? (Not to mention the crazy stories about Gen AI making up answers without the data to back them up!) Are we allowed to use all the data, or are there copyright or privacy concerns? But simply moving the data wasn't enough.
As we turn the corner into 2025, we're excited to announce that for the 7th quarter in a row, Monte Carlo has been named G2's #1 Data Observability Platform, as well as #1 in the Data Quality category. Knowing our products are helping our customers achieve their data goals means everything to us.
Building more efficient AI. TL;DR: Data-centric AI can create more efficient and accurate models. I experimented with data pruning on MNIST to classify handwritten digits. What if I told you that using just 50% of your training data could achieve better results than using the full dataset?
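The article's exact pruning method isn't reproduced here; as an illustrative stand-in, this sketch scores training examples with a cheap proxy model and keeps only the hardest half, using scikit-learn's small digits dataset in place of MNIST:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A cheap proxy model scores how confidently each training point is classified.
proxy = LogisticRegression(max_iter=2000).fit(X_train, y_train)
conf = proxy.predict_proba(X_train)[np.arange(len(y_train)), y_train]

# Keep the 50% hardest examples (lowest confidence): they carry the most signal.
keep = np.argsort(conf)[: len(conf) // 2]
pruned = LogisticRegression(max_iter=2000).fit(X_train[keep], y_train[keep])

print("full data accuracy:  ", proxy.score(X_test, y_test))
print("pruned data accuracy:", pruned.score(X_test, y_test))
```

Whether the pruned model matches or beats the full one depends on the dataset and the scoring heuristic; the point of data-centric AI is that the selection of examples, not just the model, is a tunable lever.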
Here’s where leading futurist and investor Tomasz Tunguz thinks data and AI stand at the end of 2024, plus a few predictions of my own. 2025 data engineering trends incoming, among them: small data is the future of AI (Tomasz), and the lines are blurring for analysts and data engineers (Barr).
Introduction In the era of data-driven decision-making, having accurate data modeling tools is essential for businesses aiming to stay competitive. As a new developer, a robust data modeling foundation is crucial for effectively working with databases.
Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
Summary Data systems are inherently complex and often require integration of multiple technologies. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity.
Three Zero-Cost Solutions That Take Hours, Not Months. In my career, data quality initiatives have usually meant big changes. What's more, fixing data quality issues this way often leads to new problems. Create a custom dashboard for your specific data quality problem.
Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we’re focusing on customers who migrated from a cloud data warehouse to Snowflake and some of the benefits they saw, including […] million in cost savings annually.
It’s easy these days for an organization’s data infrastructure to begin looking like a maze, with an accumulation of point solutions here and there. Snowflake is committed to helping customers simplify how they architect their data infrastructure by continually adding new features. Here’s a closer look.
Liang Mou, Staff Software Engineer, Logging Platform; Elizabeth (Vi) Nguyen, Software Engineer I, Logging Platform. In today’s data-driven world, businesses need to process and analyze data in real-time to make informed decisions. What is Change Data Capture? Support highly distributed database setup.
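For readers new to the concept, here is a deliberately simplified, hypothetical illustration of change data capture; production systems (e.g. Debezium) tail the database's transaction log rather than polling, but the core idea of emitting only changed rows downstream is the same:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      INTEGER PRIMARY KEY,
        balance REAL,
        version INTEGER   -- monotonically increasing change counter
    )
""")

def poll_changes(conn, since):
    """Return rows modified after `since`, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, balance, version FROM accounts WHERE version > ?", (since,)
    ).fetchall()
    high = max((v for _, _, v in rows), default=since)
    return rows, high

last_seen = 0
conn.execute("INSERT INTO accounts VALUES (1, 100.0, 1)")
changes, last_seen = poll_changes(conn, last_seen)
print(changes)  # [(1, 100.0, 1)] -> ship these change events downstream
```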
The current database includes 2,000 server types in 130 regions and 340 zones. Storing data: data collected is stored to allow for historical comparisons. Results are stored in git and their database, together with benchmarking metadata. Visualizing the data: the frontend that allows querying of live and historic data.
Until now, sharing data between enterprise systems often meant complex pipelines, duplication, and lock-in. With Oracle’s support for Delta Sharing, that’s no longer the case.
Dagster Components is now here. Components provides a modular architecture that enables data practitioners to self-serve while maintaining engineering quality. Understanding this fact will help data tools break new ground with the advancement of AI agents.
Semih is a researcher and entrepreneur with a background in distributed systems and databases. He then pursued his doctoral studies at Stanford University, delving into the complexities of database systems. Don’t forget to subscribe to my YouTube channel to get the latest on Unapologetically Technical!
Whether it was moving data from a local database instance to S3 or some other data storage layer, it was interesting to see AWS DMS used in this manner. But it’s not what DMS was built for. As… The post What Is AWS DMS And Why You Shouldn’t Use It As An ELT appeared first on Seattle Data Guy.
Introduction Apache Cassandra is a NoSQL database management system that is open-source and distributed. It is meant to handle massive volumes of data across many commodity servers while maintaining high availability with no single point of failure.
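As a minimal sketch (assuming a node reachable at 127.0.0.1 and the DataStax cassandra-driver package; the keyspace and table names are invented), the replication_factor is what spreads each row across multiple nodes so there is no single point of failure:

```python
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# Each row in this keyspace is stored on 3 nodes.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.events (
        sensor_id text,
        ts        timestamp,
        reading   double,
        PRIMARY KEY (sensor_id, ts)   -- partition by sensor, cluster by time
    )
""")
session.execute(
    "INSERT INTO demo.events (sensor_id, ts, reading) "
    "VALUES (%s, toTimestamp(now()), %s)",
    ("sensor-1", 21.5),
)
```

The partition key (sensor_id) determines which nodes own a row, which is how Cassandra spreads massive volumes of data across many commodity servers.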