Data Management, Data Workflow and SQL - Data Engineering Digest

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. In this episode Yingjun Wu explains how RisingWave is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.

SQL

SQL Data Lake High Quality Data Data Pipeline

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka Data Lake High Quality Data SQL

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold, a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing


Making The Total Cost Of Ownership For External Data Manageable With Crux

Data Engineering Podcast

JULY 17, 2022

In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third party data.

Data Management

Data Management Management Metadata MongoDB

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold, a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. To start, can you share your definition of what constitutes a "Data Lakehouse"?

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Andrei Tserakhau has dedicated his career to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

Systems

Systems Designing Data Lake SQL

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake

Data Lake High Quality Data Data Pipeline Machine Learning

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Your first 30 days are free!

Project

Project Data Lake High Quality Data Data Workflow

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Building Linked Data Products With JSON-LD

Data Engineering Podcast

SEPTEMBER 17, 2023

In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. With Materialize, you can!

Building

Building SQL BI Python

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Data lakes are notoriously complex. With Materialize, you can!

Architecture

Architecture Data Lake High Quality Data SQL

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Hadoop Data Pipeline

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Designing

Designing Data Lake High Quality Data SQL

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. What are the open questions today in technical scalability of data engines?

Data Process

Data Process Process Data Lake High Quality Data

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Contact Info LinkedIn Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today?

Data Lake

Data Lake Building High Quality Data AWS

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. Data lakes are notoriously complex.

Project

Project Data Lake High Quality Data SQL

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

Data Lake

Data Lake High Quality Data SQL Architecture

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Can you start by giving some context and scope of what we mean by "data sharing" for the purposes of this conversation? Closing Announcements Thank you for listening!

Data Lake

Data Lake High Quality Data Government Data Pipeline

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

PostgreSQL

PostgreSQL Data Lake High Quality Data SQL

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Can you describe what the role of the CDP is in the context of a business's data ecosystem? What do you have planned for the future of ActionIQ?

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. You shouldn't have to throw away the database to build with fast-changing data. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.

Data Lake

Data Lake High Quality Data SQL Architecture

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Data lakes are notoriously complex.

Building

Building Data Lake High Quality Data Machine Learning

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Data lakes are notoriously complex. What do you have planned for the future of your work at VAST Data? Your first 30 days are free!

Programming

Programming Data Lake High Quality Data Data Pipeline

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.

Database

Database Technology Data Lake High Quality Data

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Building

Building Data Lake High Quality Data Machine Learning

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Architecture Data Pipeline

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Data Engineering Weekly #191

Data Engineering Weekly

SEPTEMBER 29, 2024

ngrok: How we built ngrok's data platform. Ngrok shares its experience building a real-time data platform with an event-driven architecture, ensuring scalability and efficient data management. Unix pipes typically represent a physical flow of data between processes. What do you think about Pipe syntax in SQL?

Data Engineer

Data Engineer Data Engineering Engineering SQL

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Can you start by sharing some of your experiences with data migration projects?

Systems

Systems Data Lake High Quality Data Google Cloud

Accelerate Development Of Enterprise Analytics With The Coalesce Visual Workflow Builder

Data Engineering Podcast

APRIL 3, 2022

Summary The flexibility of software oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases it adds significant complexity. In this episode Satish Jayanthi explains how he is building a framework to allow enterprises to move quickly while maintaining guardrails for data workflows.

Data Warehouse

Data Warehouse Data Workflow Data Architecture SQL

A Reflection On The Data Ecosystem For The Year 2021

Data Engineering Podcast

JANUARY 1, 2022

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. No more scripts, just SQL.

Data Warehouse

Data Warehouse SQL Hadoop Data Lake

Unlocking The Value Of Data Across The Organization Through User Friendly Data Tools With Prophecy

Data Engineering Podcast

MAY 22, 2022

With an eye to making data workflows more accessible to everyone in an organization Raj Bains and his team at Prophecy designed a powerful and extensible low-code platform that lets technical and non-technical users scale data flows without forcing everyone into the same layers of abstraction.

Scala

Scala SQL Data Data Engineer

Making Sense Of The Technical And Organizational Considerations Of Data Contracts

Data Engineering Podcast

DECEMBER 18, 2022

In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in implementing these constraints to your data workflows. Can you describe what your conception of a data contract is?

Metadata

Metadata Business Intelligence Data Lake BI

Data Exploration For Business Users Powered By Analytics Engineering With Lightdash

Data Engineering Podcast

OCTOBER 22, 2021

In this episode Oliver Laslett describes why dashboards aren’t sufficient for business analytics, how Lightdash promotes the work that you are already doing in your data warehouse modeling with dbt, and how they are focusing on bridging the divide between data teams and business teams and the requirements that they have for data workflows.

Engineering

Engineering Business Intelligence BI Data Warehouse

Be Confident In Your Data Integration By Quickly Validating Matching Records With data-

Data Engineering Podcast

JULY 3, 2022

In this episode they explain how the utility is implemented to run quickly and how you can start using it in your own data workflows to ensure that your data warehouse isn’t missing any records from your source systems. Can you describe what the data diff tool is and the story behind it?

Data Integration

Data Integration MongoDB Scala MySQL

Doing DataOps For External Data Sources As A Service at Demyst

Data Engineering Podcast

NOVEMBER 27, 2021

If you are having trouble answering questions for your business with the data that you generate and collect internally, then it is definitely worthwhile to explore the information available from external sources. The data you’re looking for is already in your data warehouse and BI tools. No more scripts, just SQL.

Data Warehouse

Data Warehouse Data Lake BI Business Intelligence

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

Data Engineering Podcast

OCTOBER 15, 2021

They also share their ambitions for the near future of adding data observability and data quality management features. The data you’re looking for is already in your data warehouse and BI tools. No more scripts, just SQL. Interview Introduction How did you get involved in the area of data management?

Metadata

Metadata BI Data Warehouse Government

Data Quality Starts At The Source

Data Engineering Podcast

NOVEMBER 14, 2021

In this episode Michael Harper advocates for proactive data quality and starting with the source, rather than being reactive and having to work backwards from when a problem is found. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams.

Data Warehouse

Data Warehouse BI Data Workflow Data

Creating Shared Context For Your Data Warehouse With A Controlled Vocabulary

Data Engineering Podcast

JANUARY 1, 2022

In this episode Emily Riederer shares her work to create a controlled vocabulary for managing the semantic elements of the data managed by her team and encoding it in the schema definitions in her data warehouse. star/snowflake schema, data vault, etc.) What do you have planned for the future of dbtplyr?

Data Warehouse

Data Warehouse BI Data Workflow Data Engineer

Azure Data Engineer Job Description [Roles and Responsibilities]

Knowledge Hut

SEPTEMBER 25, 2023

Skill Requirements for Azure Data Engineer Job Description Here are some important skill requirements that you may find in a job description for Azure Data Engineers: 1. Azure Data Engineers work with these and other solutions. Create and keep up with documentation for database schemas, data models, and SQL code.

Data Engineer

Data Engineer Data Engineering Engineering Data Lake

Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

Data Engineering Podcast

OCTOBER 29, 2021

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Data Warehouse

Data Warehouse BI Data Workflow Data Engineer

Building Data Pipelines That Run From Source To Analysis And Activation With Hevo Data

Data Engineering Podcast

SEPTEMBER 11, 2022

In this episode he shares his journey from building a consumer product to launching a data pipeline service and how his frustrations as a product owner have informed his work at Hevo Data. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability.

Data Pipeline

Data Pipeline Building MongoDB Scala

Unlocking Effective Data Governance with Unity Catalog – Data Bricks

RandomTrees

SEPTEMBER 17, 2024

In the realm of big data and AI, managing and securing data assets efficiently is crucial. Databricks addresses this challenge with Unity Catalog, a comprehensive governance solution designed to streamline and secure data management across Databricks workspaces. Advantages of the Unity Catalog 1.

Data Governance

Data Governance Government Metadata Machine Learning
