Database, Document and Systems - Data Engineering Digest

Surveying The Market Of Database Products

Data Engineering Podcast

OCTOBER 29, 2023

Summary: Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. What are the aspects of the database market that keep you interested as a VP of product?

Database

Database BI SQL Machine Learning

Vector Technologies for AI: Extending Your Existing Data Stack

Simon Späti

MARCH 28, 2025

The database landscape has reached 394 ranked systems across multiple categories: relational, document, key-value, graph, search engine, time series, and the rapidly emerging vector databases. What fundamental differences exist between AI-focused vector databases and analytical vector engines like DuckDB or DataFusion?

Technology

Technology PostgreSQL MySQL Database

Azure SQL Database: The Future of Cloud Data Management

ProjectPro

JUNE 6, 2025

What makes Azure SQL Database so popular for OLTP applications? What features of Microsoft Azure SQL Database give it an edge over its competitors? To get answers to all these questions, read our ultimate guide on Azure SQL Database!

Database

Database SQL Cloud Data Management

An educational side project

The Pragmatic Engineer

JUNE 1, 2023

Juraj included system monitoring parts, which monitor the capacity of the server he runs the app on. And it doesn’t end here. Juraj created a systems design explainer on how he built this project, and the technologies used. The app uses: Node.js

Education

Education Project PostgreSQL Software Engineer

A Beginner’s Guide to Graph Databases

ProjectPro

JUNE 6, 2025

Traditional databases often struggle to capture these intricate relationships, leaving you with a fragmented view of your data. This is where graph databases come in: they’re like having a high-definition map that reveals every connection.

Database

Database Database-centric Relational Database MongoDB

Chroma DB - Vector Database to Store Large-Scale Embeddings

ProjectPro

JUNE 6, 2025

Imagine you're a detective trying to identify a suspect from a database of millions of mugshots. Chroma DB is an open-source vector database designed to store and manage vector embeddings—numerical representations of complex data types like text, images, and audio. Each movie in your database has a description or review.

Database

Database Metadata Medical Recruitment

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Summary: Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Can you describe what constitutes a NoSQL database? document, K/V, graph) change that calculus?

Non-relational Database

Non-relational Database Relational Database Database Designing

FAISS Vector Database: A High-Performance AI Similarity Search

ProjectPro

JUNE 6, 2025

Similarity search plays a crucial role by enabling systems to find items similar to a given query item. Want to find similar images in a massive database? Need to analyze text documents and find ones with similar content? This blog explores the FAISS Vector Database, a versatile tool applicable to various applications.
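As a rough illustration of what "finding items similar to a given query item" means, here is a minimal brute-force sketch using cosine similarity over toy vectors. It does not use FAISS or its index structures; the item names, the three-dimensional vectors, and the `top_k` helper are all hypothetical and exist only for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Score every item against the query and return the k best (name, score) pairs."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings: real ones would come from an embedding model
# and have hundreds of dimensions.
ITEMS = {
    "cat_photo": [0.9, 0.1, 0.0],
    "dog_photo": [0.8, 0.2, 0.1],
    "invoice_pdf": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for name, score in top_k([1.0, 0.0, 0.0], ITEMS, k=2):
        print(name, round(score, 3))
```

Scoring every item like this takes time proportional to the collection size, which is why dedicated similarity-search libraries rely on specialized indexes for large datasets.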

Database

Database Algorithm Datasets Hadoop

DynamoDB vs. MongoDB- Battle of The Best NoSQL Databases

ProjectPro

JUNE 6, 2025

With a CAGR of 30%, the NoSQL Database Market is likely to surpass USD 36.50 Two of the most popular NoSQL database services available in the industry are AWS DynamoDB and MongoDB. This blog compares these two popular databases- DynamoDB vs. MongoDB- to help you choose the best one for your data engineering projects.

NoSQL

NoSQL MongoDB Database Amazon Web Services

How Does AWS DocumentDB Simplify Database Management?

ProjectPro

JUNE 6, 2025

Ever wished for a database that's as easy to use as your favorite app? Its scalability and performance-oriented database ensures that your applications operate smoothly and efficiently. AWS DocumentDB is a fully managed NoSQL database service provided by Amazon Web Services (AWS).

AWS

AWS Database MongoDB Management

Streamline RAG with New Document Preprocessing Features

Snowflake

OCTOBER 15, 2024

As organizations increasingly seek to enhance decision-making and drive operational efficiencies by making knowledge in documents accessible via conversational applications, a RAG-based application framework has quickly become the most efficient and scalable approach. Until now, document preparation (e.g.

SQL

SQL Electronics Data Preparation Cloud Storage

How to Use Pinecone Vector Database in your AI Projects?

ProjectPro

JUNE 6, 2025

This blog will align with that vision by exploring what Pinecone Vector Database is and how to use it, and by walking through a comprehensive Pinecone Vector Database tutorial with a simple example. Pinecone is helpful in this situation.

Database

Database Project Metadata Unstructured Data

How To Choose Right AWS Databases for Your Needs

ProjectPro

JUNE 6, 2025

Explore the world of data analytics with the top AWS databases! Check out this blog to discover your ideal database and uncover the power of scalable and efficient solutions for all your data analytical requirements. Let’s understand more about AWS Databases in the following section.

AWS

AWS Database Amazon Web Services MySQL

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Summary: Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?

Systems

Systems Data Lake High Quality Data Google Cloud

Azure Cosmos DB: The Future of Database Management

ProjectPro

JUNE 6, 2025

Are you ready to join the database revolution? "Data is the new oil" has become the mantra of the digital age, and in this era of rapidly increasing data volumes, the need for robust and scalable database management solutions has never been more critical. With such mind-boggling data growth, traditional databases won't cut it anymore.

Database

Database Management MongoDB NoSQL

Exploring Vector Databases: A Guide to Their Role in AI Tech

ProjectPro

JUNE 6, 2025

It's the magic of vector databases! Researchers have developed vector databases that allow users to run similarity searches over vectors. With this blog, you will discover how these innovative databases can revolutionize storage, retrieval, and analysis, amplifying the potential of artificial intelligence (AI) applications.

Database

Database Algorithm Machine Learning Metadata

Scale Unstructured Text Analytics with Batch LLM Inference

Snowflake

MARCH 6, 2025

Unstructured text is everywhere in business: customer reviews, support tickets, call transcripts, documents. Meanwhile, operations teams use entity extraction on documents to automate workflows and enable metadata-driven analytical filtering.

Unstructured Data

Unstructured Data Media Medical Data Workflow

10 MongoDB Mini Projects Ideas for Beginners with Source Code

ProjectPro

JUNE 6, 2025

MongoDB Inc offers an amazing database technology that is utilized mainly for storing data in key-value pairs. Such flexibility offered by MongoDB enables developers to utilize it as a user-friendly file-sharing system if and when they wish to share the stored data. MongoDB offers several advantageous features to store your data.

MongoDB

MongoDB Coding Project NoSQL

Snowflake Unistore: Hybrid Tables Now Generally Available

Snowflake

NOVEMBER 12, 2024

As part of Snowflake Unistore, Hybrid Tables unify both transactional and analytical workloads on a single database to simplify architectures as well as governance and security. Larger capacity limits: 1 TB database sizes are the default, with larger sizes available upon request.

Food

Food Metadata Education Data Architect

Amazon Aurora: The Future of Cloud Database Technology

ProjectPro

JUNE 6, 2025

Say goodbye to database downtime, and hello to Amazon Aurora! A detailed study report by Market Research Future (MRFR) projects that the cloud database market value will likely reach USD 38.6

Database

Database Technology Cloud PostgreSQL

How to Build a Knowledge Graph for RAG Applications?

ProjectPro

JUNE 6, 2025

Explore how to implement Graph RAG using Knowledge Graphs and Vector Databases with practical insights, hands-on resources, and advanced techniques for enhanced information retrieval. Knowledge Graph vs Vector Database for RAG How to implement Graph RAG using Knowledge Graphs and Vector Databases?

Building

Building Unstructured Data Database Datasets

AWS Glue-Unleashing the Power of Serverless ETL Effortlessly

ProjectPro

JUNE 6, 2025

This serverless data integration service can automatically and quickly discover structured or unstructured enterprise data when stored in data lakes in Amazon S3, data warehouses in Amazon Redshift, and other databases that are a component of the Amazon Relational Database Service. Establish a crawler schedule.

AWS

AWS Scala Metadata Data Lake

Snowflake Startup Challenge 2025: Meet the Top 10

Snowflake

APRIL 9, 2025

Its Snowflake Native App, Digityze AI, is an AI-powered document intelligence platform that transforms unstructured biomanufacturing documentation into structured, actionable data and manages the document lifecycle.

Pharmaceutical

Pharmaceutical Manufacturing Data Ingestion SQL

Enable Image Analysis with Cloudera’s New Accelerator for Machine Learning Projects Based on Anthropic Claude

Cloudera

NOVEMBER 15, 2024

Enterprise organizations collect massive volumes of unstructured data, such as images, handwritten text, documents, and more. Unlike other OCR systems, which can often miss context or require multiple steps to clean the data, Claude 3 enables customers to perform complex document understanding tasks directly.

Machine Learning

Machine Learning Unstructured Data Project Database

Accelerate AI Development with Snowflake

Snowflake

NOVEMBER 11, 2024

Conversational apps: Creating reliable, engaging responses for user questions is now simpler, opening the door to powerful use cases such as self-service analytics and document search via chatbots. For instance, if your documents are in multiple languages, an LLM with strong multilingual capabilities is key.

Unstructured Data

Unstructured Data SQL AWS Healthcare

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Unify transactional and analytical workloads in Snowflake for greater simplicity Many businesses must maintain two separate databases: one to handle transactional workloads and another for analytical workloads. This helps you optimize storage while maintaining regulatory compliance in an easy, scalable way.

Data Architecture

Data Architecture Architecture Data Lake Kafka

10 Best CrewAI Projects You Must Build in 2025

ProjectPro

JUNE 6, 2025

The CrewAI project landscape consists of a wide range of applications, from simple task automation to complex decision-making systems. The CrewAI framework offers a unique approach to building agentic AI systems by allowing multiple specialized agents to work together, mimicking human team dynamics.

Project

Project Building Recruitment Media

The Future of Reliable Data + AI—Observing the Data, System, Code, and Model

Monte Carlo

MARCH 28, 2025

Gemini can polish Google documents for research teams. Data + AI applications are complex. But code takes on new weight in the data + AI system.

Coding

Coding Systems Data Pipeline ETL Tools

Is Cache Augmented Generation a good alternative to RAG?

ProjectPro

JUNE 6, 2025

However, it comes with drawbacks like retrieval latency, document selection errors, and increased system complexity. CAG addresses RAG’s limitations by removing real-time retrieval, reducing latency, and simplifying system architecture.

Manufacturing

Manufacturing Database Datasets Systems

What is Retrieval Augmented Generation (RAG) Architecture?

ProjectPro

JUNE 6, 2025

A RAG architecture is one that combines the strengths of two powerful tools: information retrieval systems and generative models. In a RAG system architecture, this retrieved information is then passed to the generative model, which uses it to create accurate, context-aware responses.
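The two-stage flow described above (retrieve first, then pass the retrieved text to a generative model) can be sketched as follows. This is only an illustration under stated assumptions: retrieval here is naive word overlap rather than a real search index, `generate` is a placeholder standing in for an actual model call, and the sample documents and function names are hypothetical.

```python
def retrieve(question, documents, k=1):
    """Rank documents by how many lowercase words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, context_docs):
    """Combine the retrieved context and the question into a single prompt string."""
    context = "\n".join(f"- {doc}" for doc in context_docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


def generate(prompt):
    """Placeholder for a generative model call (assumption: swap in a real client)."""
    return f"[model response to a prompt of {len(prompt)} characters]"


# Hypothetical knowledge source.
DOCS = [
    "A vector database stores embeddings for similarity search.",
    "Airflow schedules data pipelines as DAGs.",
]

if __name__ == "__main__":
    question = "what is a vector database"
    prompt = build_prompt(question, retrieve(question, DOCS, k=1))
    print(prompt)
    print(generate(prompt))
```

Keeping retrieval, prompt assembly, and generation as separate functions mirrors the split between the retrieval system and the generative model, so either side can be replaced independently.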

Architecture

Architecture Data Ingestion Google Cloud AWS

The Future of Data Management Is Agentic AI

Snowflake

APRIL 13, 2025

Agentic AI refers to AI systems that act autonomously on behalf of their users. These systems make decisions, learn from interactions and continuously improve without constant human intervention. Many enterprises face overwhelming data sources, from structured databases to unstructured social media feeds. What is agentic AI?

Data Management

Data Management Management Consulting Unstructured Data

30+ AWS Projects Ideas for Beginners to Practice in 2025

ProjectPro

JUNE 6, 2025

Rapid Document Conversion This project aims to quickly and accurately convert the document to the desired format as selected by the user. Many of the document converters, such as PDF to word converters and others, are available online. You must have experienced the need to convert an HTML page/document into PDF format.

AWS

AWS Project Food Cloud Computing

Handling Network Throttling with AWS EC2 at Pinterest

Pinterest Engineering

APRIL 7, 2025

In recent years, while managing Pinterest's EC2 infrastructure, particularly for our essential online storage systems, we identified a significant challenge: the lack of clear insights into EC2's network performance and its direct impact on our applications' reliability and performance. 4xl with up to 12.5

AWS

AWS Bytes Data Ingestion Database

Snowflake’s Fully Managed Service: Beyond Serverless

Snowflake

FEBRUARY 13, 2025

When you read the documentation on platform as a service (PaaS) offerings, you'll often see references to features that are not supported in certain versions of the service, along with outage windows for planned maintenance; none of these are an issue with Snowflake. While this system worked, it came with fairly high cost and overhead.

Management

Management Government Cloud Unstructured Data

How to Build Generative AI Applications?

ProjectPro

JUNE 6, 2025

You will build an intelligent FAQ retrieval system by using SQLite for database storage, FastEmbed for text embeddings, and Groq for generating AI-powered responses. Here is a quick overview of the tutorial: Steps 1-6: Data Preparation (FAQs, vector embeddings, and storage). Step 7: Create and store vector embeddings in a database.

Building

Building Banking SQL Deep Learning

What is Retrieval-Augmented Generation (RAG)?

Edureka

JANUARY 21, 2025

Retrieval: This is the act of getting data from somewhere outside the computer, usually a database, knowledge base, or document store. In RAG, retrieval is the process of looking for useful data (like text or documents) based on what the user or system asks for or types in.

Healthcare

Healthcare Education Medical Database

20 Best Open Source Big Data Projects to Contribute on GitHub

ProjectPro

JUNE 6, 2025

According to the survey, big data (35 percent), cloud computing (39 percent), operating systems (33 percent), and the Internet of Things (31 percent) are all expected to be impacted by open source shortly. It allows the creation of tables and databases at runtime, loading data, and running queries without reconfiguring or restarting the server.

Big Data

Big Data Project Metadata Programming Language

Part 1: A Survey of Analytics Engineering Work at Netflix

Netflix Tech

DECEMBER 17, 2024

Analytics Engineers deliver these insights by establishing deep business and product partnerships; translating business challenges into solutions that unblock critical decisions; and designing, building, and maintaining end-to-end analytical systems.

Engineering

Engineering Entertainment Amazon Web Services Utilities

9 Retrieval Augmented Generation Project Ideas for Practice

ProjectPro

JUNE 6, 2025

Discover projects like Customized Question Answering Systems, Contextual Chatbots, and Text Summarization. It's designed to enhance the capabilities of language models by incorporating a retriever module that can access and retrieve relevant information from a large external knowledge source, like a database or a collection of documents.

Project

Project Python Database PostgreSQL

AWS Glue vs. EMR- Which is Right For Your Big Data Project?

ProjectPro

JUNE 6, 2025

As a result, the database can be operated more quickly, improving the entire system's performance. On the other hand, due to serverless infrastructure, AWS Glue does not allow you to store temporary or executable files locally, which subsequently impacts the system's performance. It also comes with extensive documentation.

Big Data

Big Data AWS Amazon Web Services Project

The Ultimate 101 Guide to Apache Airflow DAGS

ProjectPro

JUNE 6, 2025

Managing complex data pipelines can be challenging, requiring coordination between multiple systems and teams. E.g., the Python operator executes Python code, and the Snowflake operator executes a query against the Snowflake database.

Data Pipeline

Data Pipeline PostgreSQL Python Database

The Best Data Dictionary Tools in 2025

Monte Carlo

APRIL 28, 2025

Not every solution out there is built the same, and if you've ever tried to wrangle documentation from scratch, you know how painful a clunky tool can be. This basically means the tool updates itself by pulling in changes to data structures from your systems. It's like a time machine for your documentation. Made a mistake?

Metadata

Metadata Hadoop Data SQL

What Is LangChain and How to Use It

Edureka

FEBRUARY 12, 2025

A lot of people use LangChain to do things like chatbots, answering questions, analyzing documents, and automating logic. Integration with External Data : LangChain lets LLMs talk to APIs, databases, and other data sources. Data Retrieval LangChain facilitates integration with: Vector databases (e.g., Why is LangChain important?

IT

IT Database Google Cloud Coding

100 Data Modelling Interview Questions To Prepare For In 2025

ProjectPro

JUNE 6, 2025

Physical data model: The physical data model includes all necessary tables, columns, relationship constraints, and database attributes for physical database implementation. A physical model's key parameters include database performance, indexing approach, and physical storage. What is the definition of a foreign key constraint?

Data Warehouse

Data Warehouse NoSQL PostgreSQL Relational Database
