Data Governance, Data Management and Data Workflow

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

Summary: Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that goal. Data governance is the binding force between these two parts of the organization. At what point does a lack of an explicit governance policy become a liability?

Data Governance

Data Governance Government Data Lake High Quality Data

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes. Data lakes are notoriously complex. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Kafka

Kafka Data Lake High Quality Data SQL

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Can you describe what Synq is and the story behind it?

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. Closing Announcements: Thank you for listening!

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Data Lake

Data Lake High Quality Data BI Data Workflow

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

How To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

Deploy DataOps: DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration and improve the agility of data teams. How effective are your current data workflows?

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Machine Learning

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Data lakes are notoriously complex. My thanks to the team at Code Comments for their support. Closing Announcements: Thank you for listening!

Process

Process Data Lake High Quality Data Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. What are the open questions today in technical scalability of data engines?

Data Process

Data Process Process Data Lake High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. Data lakes are notoriously complex. My thanks to the team at Code Comments for their support.

Management

Management Data Lake High Quality Data Machine Learning

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Data Lake

Data Lake Building High Quality Data AWS

6 Ways To Prepare Your Data Team for 2025

Ascend.io

DECEMBER 4, 2024

Deploy DataOps: DataOps, or Data Operations, is an approach that applies the principles of DevOps to data management. It aims to streamline and automate data workflows, enhance collaboration and improve the agility of data teams. How effective are your current data workflows?

Data Pipeline

Data Pipeline Metadata Data Workflow Data

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.

Database

Database Data Lake High Quality Data Data Workflow

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. What is the current state of the ecosystem for data sharing protocols/practices/platforms?

Data Lake

Data Lake High Quality Data Government Machine Learning

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Your first 30 days are free!

Project

Project Data Lake High Quality Data Data Workflow

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Data lakes are notoriously complex. What do you have planned for the future of your work at VAST Data? Your first 30 days are free!

Programming

Programming Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.

Database

Database Technology Data Lake High Quality Data

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Unlocking Effective Data Governance with Unity Catalog – Data Bricks

RandomTrees

SEPTEMBER 17, 2024

In the realm of big data and AI, managing and securing data assets efficiently is crucial. Databricks addresses this challenge with Unity Catalog, a comprehensive governance solution designed to streamline and secure data management across Databricks workspaces.

Data Governance

Data Governance Government Metadata Machine Learning

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Data lakes are notoriously complex.

Architecture

Architecture Data Lake High Quality Data SQL

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Andrei Tserakhau has dedicated his career to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

Systems

Systems Designing Data Lake SQL

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Data lakes are notoriously complex.

Building

Building Data Lake High Quality Data Machine Learning

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!

Data Lake

Data Lake High Quality Data Architecture Machine Learning

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Your first 30 days are free!

Building

Building Data Lake High Quality Data Machine Learning

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Can you start by sharing some of your experiences with data migration projects?

Systems

Systems Data Lake High Quality Data Google Cloud

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. Data lakes are notoriously complex.

Project

Project Data Lake SQL High Quality Data

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

PostgreSQL

PostgreSQL Data Lake SQL High Quality Data

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.

Data Lake

Data Lake High Quality Data SQL Architecture

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. That’s three free boards at dataengineeringpodcast.com/miro.

Data Lake

Data Lake SQL High Quality Data Architecture

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

Data Engineering Podcast

OCTOBER 15, 2021

They also share their ambitions for the near future of adding data observability and data quality management features. Interview Introduction How did you get involved in the area of data management? Can you describe what Acryl Data is and the story behind it? How is the governance of DataHub being managed?

Metadata

Metadata BI Data Warehouse Government

The DataOps Vendor Landscape, 2021

DataKitchen

APRIL 13, 2021

We have also included vendors for the specific use cases of ModelOps, MLOps, DataGovOps and DataSecOps, which apply DataOps principles to machine learning, AI, data governance, and data security operations. Piperr.io: Pre-built data pipelines across enterprise stakeholders, from IT to analytics, tech, data science and LoBs.

Consulting

Consulting Machine Learning Data Science Data Pipeline

Better Data Quality Through Observability With Monte Carlo

Data Engineering Podcast

OCTOBER 19, 2020

They also discuss methods for gaining visibility into the flow of data through your infrastructure, how to diagnose and prevent potential problems, and what they are building at Monte Carlo to help you maintain your data’s uptime. If you hand a book to a new data engineer, what wisdom would you add to it?

Machine Learning

Machine Learning Data Engineering Data Engineer Data

Put Your Whole Data Team On The Same Page With Atlan

Data Engineering Podcast

APRIL 5, 2021

She explains how the design of the platform is informed by the needs of managing data projects for large and small teams across her previous roles, how it integrates with your existing systems, and how it can work to bring everyone onto the same page. What portions of the data workflow is Atlan responsible for?

Data Warehouse

Data Warehouse Data Pipeline BI Metadata

DataOps Framework: 4 Key Components and How to Implement Them

Databand.ai

AUGUST 30, 2023

The DataOps framework is a set of practices, processes, and technologies that enables organizations to improve the speed, accuracy, and reliability of their data management and analytics operations. This can be achieved through the use of automated data ingestion, transformation, and analysis tools.

Data Governance

Data Governance Data Pipeline Government Business Analyst

DataOps Architecture: 5 Key Components and How to Get Started

Databand.ai

AUGUST 30, 2023

DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows.

Architecture

Architecture Data Ingestion Data Governance Data Cleanse

Metadata: What Is It and Why it Matters

Ascend.io

JULY 11, 2024

This is what managing data without metadata feels like. Often described as “data about data,” it is the unsung hero in data management that ensures our vast amounts of information are not only stored but easily discoverable, organized, and actionable. Chaos, right? What is Metadata?

Metadata

Metadata IT Government High Quality Data

DataOps Tools: Key Capabilities & 5 Tools You Must Know About

Databand.ai

AUGUST 30, 2023

DataOps, short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. These tools help organizations implement DataOps practices by providing a unified platform for data teams to collaborate, share, and manage their data assets.

Data Cleanse

Data Cleanse Data Pipeline Data Ingestion Data Validation
