Summary Databases come in a variety of formats for different use cases. The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.
Summary A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
Summary Building a database engine requires a substantial amount of engineering effort and time investment. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database.
Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack. You shouldn't have to throw away the database to build with fast-changing data. With Materialize, you can! It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.
We've always focused on delivering exceptional customer success and improving data quality across the entire data stack, and it's rewarding to know that hard work continues to translate to meaningful outcomes for our customers.
In order to build high-quality data lineage, we developed different techniques to collect data flow signals across different technology stacks: static code analysis for different languages, runtime instrumentation, input and output data matching, etc.
RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.
Summary Databases are the core of most applications, but they are often treated as inscrutable black boxes. When an application is slow, there is a good probability that the database needs some attention.
Many organizations struggle with: Inconsistent data formats: Different systems store data in varied structures, requiring extensive preprocessing before analysis. Siloed storage: Critical business data is often locked away in disconnected databases, preventing a unified view.
I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.
Summary Databases and analytics architectures have gone through several generational shifts. A substantial amount of the data that is being managed in these systems is related to customers and their interactions with an organization.
If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold
Summary Artificial intelligence applications require substantial high-quality data, which is provided through ETL pipelines.
SQL Server version upgrade) Section 2: Types of Migrations for Infrastructure Focus Storage migration: Moving data between systems (HDD to SSD, SAN to NAS, etc.)
What is Data Quality, and Why is it Important? Data Quality refers to the degree to which data is accurate, reliable, consistent, and relevant for its intended purpose. High-quality data is essential for organizations to derive meaningful insights, make informed decisions, and meet regulatory requirements.
The foundational skills of traditional data engineers and AI data engineers are similar, with AI data engineers more heavily focused on machine learning data infrastructure, AI-specific tools, vector databases, and LLM pipelines. Let’s dive into the tools necessary to become an AI data engineer.
Process-centric data teams focus their energies predominantly on orchestrating and automating workflows. They have demonstrated that robust, well-managed data processing pipelines inevitably yield reliable, high-quality data.
Kafka and Vector Database support: According to Databricks’ State of Data and AI report, the number of companies using SaaS LLM APIs has grown more than 1300% since November 2022, with a nearly 411% increase in the number of AI models put into production during that same period. Both integrations will be available in early 2024.
Data engineers are the ones who are responsible for ingesting raw data from multiple sources and processing it to serve clean datasets to Data Scientists and Data Analysts so they can run machine learning models and data analytics, respectively. The destination is the landing area where the processed data is delivered.
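The ingest-process-serve flow described above can be sketched as a minimal extract-transform-load pipeline. This is a hedged illustration: the in-memory source list, the dict-based destination, and all function names are hypothetical stand-ins for real source systems and warehouses.

```python
# Minimal ETL sketch: pull raw records, clean them, land them in a destination.

def extract(source):
    """Pull raw records from the (hypothetical) source system."""
    return list(source)

def transform(records):
    """Clean records: drop rows missing an id, normalize name casing."""
    cleaned = []
    for rec in records:
        if rec.get("id") is None:
            continue  # reject rows that cannot be keyed
        cleaned.append({"id": rec["id"], "name": rec.get("name", "").strip().title()})
    return cleaned

def load(records, destination):
    """Land cleaned records in the destination, keyed by id."""
    for rec in records:
        destination[rec["id"]] = rec
    return destination

raw = [{"id": 1, "name": "  ada lovelace "}, {"id": None, "name": "bad row"}]
warehouse = load(transform(extract(raw)), {})
# warehouse now holds only the clean, keyed record
```

In a real pipeline each stage would talk to external systems (APIs, object storage, a warehouse), but the shape of the flow is the same.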
It is crucial to have the data in a design that supports the application, putting it in motion and providing meaningful information while the data is at rest. Data modeling is essential because it enables businesses to visualize these operations and design, build, and deploy high-quality data assets.
TensorFlow) Strong communication and presentation skills Data Scientist Salary According to Payscale, Data Scientists earn an average of $97,680. Employ automated techniques to extract data from primary and secondary data sources; analyze data and present it in the form of graphs and reports.
Whereas data engineers focus on data extraction, transformation, and loading, data architects consider how data should be structured and arranged. Together, data engineers and architects can provide high-quality data useful for executive decisions. Data Engineer vs Data Architect - Who Does What?
Step 1: Collecting and Preparing Data The first step in any AI project, including generative AI, is gathering and preparing high-quality data. The quality of the data significantly impacts the performance of your model and the quality of AI-generated content.
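As one small piece of that preparation step, a text corpus is often deduplicated and filtered before training. The sketch below is an assumption-laden toy (the whitespace normalization and the minimum-length threshold are arbitrary choices, not a prescribed recipe):

```python
# Toy corpus-preparation step: normalize whitespace, drop duplicates
# and documents too short to be useful. Thresholds are illustrative only.

def prepare_corpus(docs, min_words=3):
    seen, cleaned = set(), []
    for doc in docs:
        text = " ".join(doc.split())  # collapse runs of whitespace
        key = text.lower()
        if key in seen or len(text.split()) < min_words:
            continue  # skip duplicates and near-empty documents
        seen.add(key)
        cleaned.append(text)
    return cleaned

corpus = prepare_corpus(["The quick brown fox", "the  quick brown fox", "hi"])
# only one copy of the sentence survives; "hi" is filtered out
```

Production pipelines add many more filters (language detection, PII scrubbing, quality scoring), but they follow the same pattern of small, composable cleaning passes.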
Use case (Retail): As an example, imagine a retail company has a customer database with names and addresses, but many records are missing full address information. The solution: They use a data appending process to match their existing data with a third-party database that contains full street addresses. Plan for it.
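The matching step in that retail example can be sketched in a few lines. This is a hypothetical illustration: the customer records, the third-party lookup table, and the name-based join key are all invented for the example (real appending services match on fuzzier keys than exact names).

```python
# Data-appending sketch: fill missing addresses from a third-party lookup.

customers = [
    {"name": "Acme Corp", "address": None},          # missing address
    {"name": "Globex", "address": "12 Main St"},     # already complete
]

# Assumed third-party dataset, keyed by customer name for simplicity.
third_party = {"Acme Corp": "99 Elm Ave"}

def append_addresses(records, lookup):
    """Fill in missing address fields from the external lookup."""
    for rec in records:
        if not rec["address"] and rec["name"] in lookup:
            rec["address"] = lookup[rec["name"]]
    return records

enriched = append_addresses(customers, third_party)
```

Note that existing values are never overwritten; appending only fills gaps, which is usually the safer default.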
This was a great conversation about the complexities of working in a niche domain of data analysis and how to build a pipeline of high-quality data from collection to analysis. The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology.
Data normalization is the process of organizing and transforming data to improve its structural integrity, accuracy, and consistency. Data normalization is also an important part of database design, since it helps ensure that data remains consistent.
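A tiny example makes the idea concrete: in a denormalized orders table the customer's details are repeated on every row, so a change to one copy can leave the others inconsistent. Normalizing splits the data into separate customer and order records. The sketch below is illustrative only (using the customer name as a stable key is an assumption a real schema would replace with a surrogate id):

```python
# Toy normalization: split a denormalized orders table into
# a customers table and an orders table.

denormalized = [
    {"order_id": 1, "customer": "Ada", "email": "ada@example.com", "total": 10},
    {"order_id": 2, "customer": "Ada", "email": "ada@example.com", "total": 25},
]

def normalize(rows):
    customers, orders = {}, []
    for row in rows:
        cust_id = row["customer"]  # assumption: name works as a key here
        customers[cust_id] = {"name": row["customer"], "email": row["email"]}
        orders.append({"order_id": row["order_id"],
                       "customer_id": cust_id,
                       "total": row["total"]})
    return customers, orders

customers, orders = normalize(denormalized)
# the email now lives in exactly one place; orders reference it by key
```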
Schema Enforcement and Evolution Delta Lake will enforce the schema when writing data to storage. Thus, columns and their data types are maintained, preventing data corruption and achieving data reliability and high-quality data. This data will be available downstream for analytics and reporting.
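The core idea of schema enforcement can be shown without Spark. The plain-Python sketch below is not the Delta Lake API (Delta Lake does this inside its Spark writers); it just demonstrates the principle of rejecting writes whose columns or types do not match the table's declared schema:

```python
# Conceptual schema-enforcement check: reject writes that do not
# match the declared column set and types. Not the real Delta Lake API.

schema = {"id": int, "amount": float}  # hypothetical table schema

def enforce_schema(rows, schema):
    for row in rows:
        if set(row) != set(schema):
            raise ValueError(f"column mismatch: {sorted(row)}")
        for col, typ in schema.items():
            if not isinstance(row[col], typ):
                raise ValueError(f"bad type for column {col!r}")
    return rows

ok = enforce_schema([{"id": 1, "amount": 9.5}], schema)   # accepted
# enforce_schema([{"id": "x", "amount": 9.5}], schema)    # would raise
```

Schema *evolution* is the complementary feature: an explicit opt-in that widens the declared schema instead of rejecting the write.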
Data modeling is changing Typical data modeling techniques — like the star schema — which defined our approach to data modeling for the analytics workloads typically associated with data warehouses, are less relevant than they once were. Those systems have been taught to normalize the data for storage on their own.
However, simply having high-quality data does not, of itself, ensure that an organization will find it useful. Data observability: Prevent business disruption and costly downstream data and analytics issues using intelligent technology that proactively alerts you to data anomalies and outliers.
Data quality refers to the degree of accuracy, consistency, completeness, reliability, and relevance of the data collected, stored, and used within an organization or a specific context. High-quality data is essential for making well-informed decisions, performing accurate analyses, and developing effective strategies.
This function also allows dbt to automatically handle schema changes and cross-database functionality, improving manageability within your project. This file also enables project-wide settings for database connections, materialization strategies, and configurations, allowing for uniformity and streamlined changes across the project.
Great reads on modeling, processes, and leadership Photo by Emil Widlund on Unsplash At the very start of my journey in data, I thought I was going to be a data scientist, and my first foray into data was centered on studying statistics and linear algebra, not software engineering or database management.