High Quality Data and SQL - Data Engineering Digest

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. In this episode Yingjun Wu explains how RisingWave is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.

SQL

SQL, Data Lake, High Quality Data, Data Pipeline
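Streaming SQL engines such as RisingWave keep query results fresh by incrementally maintaining materialized views: each change record updates the stored result instead of rerunning the query over all data. The sketch below illustrates that general idea (incremental view maintenance) for a grouped count and sum. It is a hypothetical Python illustration, not RisingWave's implementation; the class, query, and sample change log are assumptions.

```python
from collections import defaultdict


class GroupedSumView:
    """Incrementally maintained result of the (hypothetical) query
    SELECT key, COUNT(*), SUM(amount) FROM events GROUP BY key.
    Assumes every delete matches a row that was previously inserted."""

    def __init__(self) -> None:
        self.counts = defaultdict(int)
        self.sums = defaultdict(int)

    def apply(self, op: str, key: str, amount: int) -> None:
        # A change-log record is an insert ("+") or a delete ("-").
        sign = 1 if op == "+" else -1
        self.counts[key] += sign
        self.sums[key] += sign * amount
        if self.counts[key] == 0:
            # A group with no remaining rows disappears, as it would from GROUP BY.
            del self.counts[key]
            del self.sums[key]

    def snapshot(self) -> dict:
        return {k: (self.counts[k], self.sums[k]) for k in sorted(self.counts)}


def recompute(rows: list) -> dict:
    """Batch evaluation of the same query, used to check the incremental result."""
    out = {}
    for key, amount in rows:
        count, total = out.get(key, (0, 0))
        out[key] = (count + 1, total + amount)
    return out


changes = [("+", "eu", 10), ("+", "us", 5), ("+", "eu", 7), ("-", "us", 5), ("+", "us", 3)]
view = GroupedSumView()
live_rows = []
for op, key, amount in changes:
    view.apply(op, key, amount)  # constant work per change, no rescan
    if op == "+":
        live_rows.append((key, amount))
    else:
        live_rows.remove((key, amount))

print(view.snapshot())  # {'eu': (2, 17), 'us': (1, 3)}
assert view.snapshot() == recompute(live_rows)
```

Retractions work here because COUNT and SUM can be undone by subtraction; aggregates such as MIN or MAX cannot be retracted that way without keeping extra state, which is part of what makes incremental engines harder to build than batch ones.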

A Breakthrough AI-Powered SQL Assistant

Snowflake

APRIL 11, 2024

Data is the lifeblood of modern businesses, but unlocking its true insights often requires complex SQL queries. At Snowflake, we believe in making the power of data accessible to all. That’s why we prioritize simplicity, governance and quality in everything we build – including our AI-powered tools.

SQL

SQL, AWS, High Quality Data, Data Analysis

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka, Data Lake, High Quality Data, SQL

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake

Data Lake, High Quality Data, Data Warehouse, Google Cloud

No Python, No SQL Templates, No YAML: Why Your Open Source Data Quality Tool Should Generate 80% Of Your Data Quality Tests Automatically

DataKitchen

FEBRUARY 17, 2025

As a data engineer, ensuring data quality is both essential and overwhelming.

SQL

SQL, Python, Government, Data Engineer

Low Code And High Quality Data Engineering For The Whole Organization With Prophecy

Data Engineering Podcast

JULY 16, 2021

The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL.

High Quality Data

High Quality Data, Data Engineer, Data Engineering, Coding

How Meta discovers data flows via lineage at scale

Engineering at Meta

JANUARY 22, 2025

In order to build high-quality data lineage, we developed different techniques to collect data flow signals across different technology stacks: static code analysis for different languages (Hack, C++, Python, etc.), runtime instrumentation, and input and output data matching.

Data Warehouse

Data Warehouse, SQL, Programming Language, Data
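The post names input and output data matching as one lineage signal. One simple way to picture such a signal, assuming you can sample distinct values from each column, is to propose a lineage edge when an input column and an output column share most of their values (Jaccard similarity at or above a threshold). This is a hypothetical Python sketch, not Meta's implementation; the function names, table names, and the 0.8 threshold are all assumptions.

```python
from itertools import product


def jaccard(a: set, b: set) -> float:
    """Fraction of distinct values shared by two columns (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def infer_lineage(inputs: dict, outputs: dict, threshold: float = 0.8) -> list:
    """Propose (input_column, output_column, score) edges whose sampled value
    sets overlap at or above `threshold`. Keys are "table.column" strings."""
    edges = []
    for (src, src_vals), (dst, dst_vals) in product(inputs.items(), outputs.items()):
        score = jaccard(set(src_vals), set(dst_vals))
        if score >= threshold:
            edges.append((src, dst, round(score, 2)))
    return sorted(edges)


# Hypothetical samples from one job's input table and output table.
inputs = {
    "raw_orders.user_id": [1, 2, 3, 4],
    "raw_orders.country": ["US", "DE", "FR", "US"],
}
outputs = {
    "daily_users.user_id": [1, 2, 3, 4, 4],
    "daily_users.region": ["EMEA", "AMER"],
}
print(infer_lineage(inputs, outputs))
# [('raw_orders.user_id', 'daily_users.user_id', 1.0)]
```

Value overlap alone can produce false positives for low-cardinality columns such as booleans or status codes, so a signal like this would presumably need to be combined with the static analysis and runtime instrumentation signals the post also describes.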

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake, High Quality Data, Data Pipeline, Government

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data Lake

Data Lake, High Quality Data, Hadoop, Data Pipeline

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Architecture

Architecture, Data Lake, High Quality Data, Java

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Materialize is the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Project

Project, Data Lake, High Quality Data, SQL

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

PostgreSQL

PostgreSQL, Data Lake, High Quality Data, SQL

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Data Process

Data Process, Process, Data Lake, High Quality Data

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Data Lake

Data Lake, High Quality Data, BI, Data Workflow

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Software Engineer

Software Engineer, Software Engineering, Engineering, Data Lake

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Data Lake

Data Lake, Building, High Quality Data, AWS

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Programming

Programming, Data Lake, High Quality Data, Data Pipeline

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Database

Database, Technology, Data Lake, High Quality Data

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Building

Building, Data Lake, High Quality Data, Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Data Lake

Data Lake, High Quality Data, Government, Data Pipeline

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Data Lake

Data Lake, High Quality Data, NoSQL, Data Warehouse

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Non-relational Database

Non-relational Database, Relational Database, Database, Designing

Building ETL Pipelines With Generative AI

Data Engineering Podcast

OCTOBER 1, 2023

Summary: Artificial intelligence applications require substantial high-quality data, which is provided through ETL pipelines.

Building

Building, BI, SQL, Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Building

Building, Data Lake, High Quality Data, Machine Learning

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Designing

Designing, Data Lake, High Quality Data, SQL

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL.

Database

Database, Data Lake, High Quality Data, Data Workflow
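A common way to reconcile two copies of a table is to hash each row keyed on its primary key, after normalizing values so that representation differences between systems (for example a number returned as a string, or trailing whitespace) are not reported as changes. The sketch below is a hypothetical in-memory version of that approach, not Datafold's API; the function names, normalization rules, and sample rows are assumptions.

```python
import hashlib


def row_hash(row: dict, columns: list) -> str:
    """Hash a row's values in a fixed column order after light normalization,
    so that e.g. 10 vs "10" or "ann " vs "ann" are not reported as changes."""
    normalized = "|".join(str(row[c]).strip() for c in columns)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def diff_tables(source: list, target: list, key: str, columns: list) -> dict:
    """Compare two extracts of the same table, keyed on `key`."""
    src = {r[key]: row_hash(r, columns) for r in source}
    dst = {r[key]: row_hash(r, columns) for r in target}
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "changed": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


source = [
    {"id": 1, "amount": 10, "name": "ann"},
    {"id": 2, "amount": 20, "name": "bob"},
    {"id": 3, "amount": 30, "name": "cy"},
]
target = [
    {"id": 1, "amount": "10", "name": "ann "},  # same values, different representation
    {"id": 2, "amount": 25, "name": "bob"},     # real change
    {"id": 4, "amount": 40, "name": "di"},      # row only in target
]
print(diff_tables(source, target, key="id", columns=["amount", "name"]))
# {'missing_in_target': [3], 'extra_in_target': [4], 'changed': [2]}
```

Pulling every row into one process does not scale to large tables; at volume the hashing would need to be pushed down into each database so that only checksums have to be compared.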

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Data Lake

Data Lake, High Quality Data, SQL, Architecture

Databricks Delta Lake: A Scalable Data Lake Solution

ProjectPro

JUNE 6, 2025

With schema enforcement, columns and their data types are maintained, preventing data corruption and ensuring data reliability and high-quality data. Delta Lake also allows the schema to be modified safely when schema evolution is explicitly enabled, which accommodates the dynamic nature of data. How do you access Delta Lake on Azure Databricks?

Data Lake

Data Lake, Data Warehouse, Metadata, Unstructured Data
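The schema enforcement and explicit schema evolution described above can be illustrated with a small validation function. This is a hypothetical Python sketch of those rules, not Delta Lake's API (in Delta itself, evolution is requested with the `mergeSchema` write option): type changes are rejected, and new columns are accepted only when evolution is explicitly enabled. Schemas are assumed to be plain dicts mapping column names to type names.

```python
class SchemaMismatchError(ValueError):
    """Raised when a write does not match the table schema."""


def enforce_schema(table_schema: dict, batch_schema: dict, merge_schema: bool = False) -> dict:
    """Return the table schema after accepting a batch with `batch_schema`.

    Schemas map column name -> type name, e.g. {"id": "bigint"}. Columns missing
    from the batch are allowed in this sketch (they would be written as NULL).
    """
    for col, typ in batch_schema.items():
        if col in table_schema and table_schema[col] != typ:
            # Type changes are rejected even when evolution is enabled.
            raise SchemaMismatchError(f"column {col!r} is {typ}, table expects {table_schema[col]}")
    new_cols = [c for c in batch_schema if c not in table_schema]
    if new_cols and not merge_schema:
        raise SchemaMismatchError(f"unexpected columns {new_cols}; enable schema evolution to add them")
    merged = dict(table_schema)
    for col in new_cols:
        merged[col] = batch_schema[col]  # evolution: append the new columns
    return merged


table = {"id": "bigint", "amount": "double"}

for batch, merge in [
    ({"id": "bigint", "amount": "string"}, False),   # wrong type -> rejected
    ({"id": "bigint", "country": "string"}, False),  # new column, evolution off -> rejected
    ({"id": "bigint", "country": "string"}, True),   # new column, evolution on -> accepted
]:
    try:
        print(enforce_schema(table, batch, merge_schema=merge))
    except SchemaMismatchError as err:
        print("rejected:", err)
```

Making evolution opt-in keeps accidental upstream changes (a renamed or retyped column) from silently reshaping the table, while still allowing deliberate additions.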

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Systems

Systems, Designing, Data Lake, High Quality Data

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

You shouldn't have to throw away the database to build with fast-changing data.

Data Lake

Data Lake, High Quality Data, SQL, Architecture

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Data Lake

Data Lake, High Quality Data, Architecture, Data Pipeline

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Project

Project, Data Lake, High Quality Data, Data Workflow

A Complete Guide on How to Build Effective Data Quality Checks

ProjectPro

JUNE 6, 2025

What is Data Quality, and Why is it Important? Data Quality refers to the degree to which data is accurate, reliable, consistent, and relevant for its intended purpose. High-quality data is essential for organizations to derive meaningful insights, make informed decisions, and meet regulatory requirements.

Building

Building, High Quality Data, Datasets, Hadoop
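Several of these quality dimensions translate directly into automated checks. The sketch below runs completeness (no nulls), validity (values in an allowed range or set), and uniqueness (no duplicate keys) checks over a list of dict rows. It is a minimal, hypothetical Python example; the `Check` class, column names, and sample rows are illustrative assumptions, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Check:
    name: str
    column: str
    passes: Callable[[Any], bool]  # returns True when a single value is acceptable


def run_checks(rows: list, checks: list) -> dict:
    """Map each failing check's name to the indexes of the rows that fail it."""
    failures = {}
    for check in checks:
        bad = [i for i, row in enumerate(rows) if not check.passes(row.get(check.column))]
        if bad:
            failures[check.name] = bad
    return failures


def duplicate_rows(rows: list, column: str) -> list:
    """Uniqueness check: indexes of rows whose value already appeared earlier."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        value = row.get(column)
        if value in seen:
            dupes.append(i)
        seen.add(value)
    return dupes


rows = [
    {"order_id": 1, "amount": 25.0, "status": "paid"},
    {"order_id": 2, "amount": None, "status": "paid"},
    {"order_id": 2, "amount": -5.0, "status": "shipped"},
    {"order_id": 3, "amount": 12.5, "status": "unknown"},
]
checks = [
    Check("amount_not_null", "amount", lambda v: v is not None),            # completeness
    Check("amount_non_negative", "amount", lambda v: v is None or v >= 0),  # validity (range)
    Check("status_allowed", "status", lambda v: v in {"paid", "shipped", "refunded"}),  # validity (set)
]
print(run_checks(rows, checks))
# {'amount_not_null': [1], 'amount_non_negative': [2], 'status_allowed': [3]}
print(duplicate_rows(rows, "order_id"))  # [2]
```

Keeping each check narrow (the range check ignores nulls, leaving them to the completeness check) means a single bad value is reported once, under the dimension it actually violates.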

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

… SQL Server version upgrade) Section 2: Types of Migrations for Infrastructure Focus. Storage migration: moving data between systems (HDD to SSD, SAN to NAS, etc.).

Systems

Systems, Data Lake, High Quality Data, Google Cloud

15 of the Best Data Science Roles to pursue Right Now

ProjectPro

JUNE 6, 2025

… TensorFlow). Strong communication and presentation skills. Data Scientist Salary: According to Payscale, data scientists earn an average of $97,680. Ability to write, analyze, and debug SQL queries. Solid understanding of ETL (Extract, Transform, Load) tools, NoSQL, Apache Spark, and relational DBMSs.

Data Science

Data Science, Data Mining, Data Architect, BI

6 Tips For Better SQL Query Optimization

Monte Carlo

MARCH 11, 2025

Knowing how to write effective SQL queries is an essential skill for many data-oriented roles. On one end of the spectrum, writing complex SQL queries can feel like a feat, even if it might feel like it's eating at your soul during the process.

SQL

SQL, Database, Database Design, Datasets
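Indexing is usually the first optimization to reach for when a query filters on a column. One way to see its effect is to compare the query plan before and after creating an index. The sketch below uses Python's built-in `sqlite3` module only because it ships with Python; the `orders` table, its data, and the index name are hypothetical, and the exact plan wording differs between SQLite versions (other databases expose plans through their own `EXPLAIN` variants).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT id, total FROM orders WHERE customer_id = ?"


def plan(sql: str) -> str:
    """Return the 'detail' text of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " / ".join(row[3] for row in rows)


before = plan(QUERY)  # a full scan of orders, e.g. "SCAN orders"
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(QUERY)   # an index lookup on customer_id via idx_orders_customer

print("before:", before)
print("after: ", after)

# The index changes how rows are found, not which rows are returned.
assert len(conn.execute(QUERY, (42,)).fetchall()) == 10
```

The index turns a scan of every row into a lookup on `customer_id`, at the cost of extra storage and slightly slower writes, which is why indexes pay off on frequently filtered columns rather than on every column.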

How to Build Generative AI Applications?

ProjectPro

JUNE 6, 2025

Step 1: Collecting and Preparing Data. The first step in any AI project, including generative AI, is gathering and preparing high-quality data. The quality of the data significantly impacts the performance of your model and the quality of AI-generated content. … books, articles) and image datasets (e.g., …

Building

Building, Banking, SQL, Deep Learning

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Proficiency in Programming Languages: Knowledge of programming languages is a must for AI data engineers and traditional data engineers alike. In addition, AI data engineers should be familiar with programming languages such as Python, Java, Scala, and more for data pipeline, data lineage, and AI model development.

Data Engineer

Data Engineer, Data Engineering, Engineering, Unstructured Data

Top 15 Data Analysis Tools To Become a Data Wizard in 2025

ProjectPro

JUNE 6, 2025

Which data analysis software is suitable for smaller businesses? Do the free tools offer high-quality data analysis?

Data Analysis Tools

Data Analysis Tools, Data Analysis, BI, R (Programming)

Build Your Python Data Processing Your Way And Run It Anywhere With Fugue

Data Engineering Podcast

FEBRUARY 20, 2022

Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. With the Oxylabs scraper APIs you can extract data from even JavaScript-heavy websites. Combined with their residential proxies, you can be sure that you’ll have reliable and high-quality data whenever you need it.

Python

Python, Data Process, IT, Process

The Rise of the Data Engineer

Maxime Beauchemin

JANUARY 20, 2017

Here are a few examples of services that data engineers and data infrastructure engineers may build and operate. Required Skills. SQL mastery: if English is the language of business, SQL is the language of data. SQL/DML/DDL primitives are simple enough that they should hold no secrets to a data engineer.

Data Engineering

Data Engineering, Data Engineer, Engineering, Database-centric

Just Launched: Dremio SQL Query Engine Data Quality Monitoring

Monte Carlo

AUGUST 30, 2024

Ensuring Data Quality In Dremio. Dremio and its SQL query engine efficiently query (but don’t move) data across a diverse set of sources. This helps keep runtimes and costs low, while giving teams flexibility in how they build and deliver data.

SQL

SQL, Engineering, Data Lake, High Quality Data

Microsoft Fabric vs Tableau 2025: Insights and Comparisons

Edureka

MAY 27, 2025

In the world of data analytics, Microsoft Fabric and Tableau stand out as powerful tools, but they have very different strengths. While Microsoft Fabric offers an all-in-one data platform for enterprises deeply integrated with Azure, Tableau focuses on intuitive, high-quality data visualization for users at all levels.

BI

BI, Data Lake, Business Intelligence, Raw Data
