Data Governance, Data Lake and High Quality Data

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

Summary Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that goal. Data governance is the binding force between these two parts of the organization. At what point does a lack of an explicit governance policy become a liability?

Data Governance

Data Governance Government Data Lake High Quality Data

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake Building High Quality Data AWS

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. Visit [dataengineeringpodcast.com/data-council]([link] and use code *depod20* to register today!

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka

Kafka Data Lake High Quality Data SQL

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Can you describe how Synq is designed/implemented?

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Want to see Starburst in action? Want to see Starburst in action?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Data lakes are notoriously complex. Starburst Logo]([link] This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Data lakes are notoriously complex.

Management

Management Data Lake High Quality Data Machine Learning

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. Starburst Logo]([link] This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Data lakes are notoriously complex.

Process

Process Data Lake High Quality Data Machine Learning

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data BI Data Workflow

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Process

Data Process Process Data Lake High Quality Data

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Programming

Programming Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Technology Data Lake High Quality Data

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.

Data Lake

Data Lake High Quality Data Government Machine Learning

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. What is involved in integrating Nessie into a given data stack?

Data Lake

Data Lake High Quality Data Architecture Machine Learning

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Non-relational Database

Non-relational Database Relational Database Database Designing

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Architecture

Architecture Data Lake High Quality Data SQL

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Data Lake High Quality Data Data Workflow

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Project

Project Data Lake SQL High Quality Data

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

PostgreSQL

PostgreSQL Data Lake SQL High Quality Data

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold Data lakes are notoriously complex. Paola Graziano by The Freak Fandango Orchestra / CC BY-SA 3.0

Software Engineer

Software Engineer Software Engineering Engineering Data Lake

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Project

Project Data Lake High Quality Data Data Workflow

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats including Apache Iceberg, Hive, and Delta Lake.

Systems

Systems Data Lake High Quality Data Google Cloud

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data SQL Architecture

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake SQL High Quality Data Architecture

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Systems

Systems Designing Data Lake SQL

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

There are dozens of data engineering tools available on the market, so familiarity with a wide variety of these can increase your attractiveness as an AI data engineering candidate. Data Storage Solutions As we all know, data can be stored in a variety of ways.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Gearing Up for Gartner Data & Analytics Summit 2025

Monte Carlo

JANUARY 21, 2025

Key Themes Data-Driven Decision-Making : Learn how to build a data-centric culture that drives better outcomes. Data Governance & Ethics : Understand emerging data regulations and ethical frameworks that shape how organizations collect, store, and use data.

Data Analytics

Data Analytics Pipeline-centric Food Data Lake

Data Engineering Weekly #161

Data Engineering Weekly

MARCH 3, 2024

Here is the agenda, 1) Data Application Lifecycle Management - Harish Kumar( Paypal) Hear from the team in PayPal on how they build the data product lifecycle management (DPLM) systems. 3) DataOPS at AstraZeneca The AstraZeneca team talks about data ops best practices internally established and what worked and what didn’t work!!!

Data Engineering

Data Engineering Data Engineer Engineering Pipeline-centric

Modern Data Management Essentials: Exploring Data Fabric

Precisely

JULY 18, 2024

While data fabric is not a standalone solution, critical capabilities that you can address today to prepare for a data fabric include automated data integration, metadata management, centralized data governance, and self-service access by consumers. Increase metadata maturity.

Data Management

Data Management Management Metadata Database-centric

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

AltexSoft

AUGUST 22, 2022

A data fabric is an architecture design presented as an integration and orchestration layer built on top of multiple disjointed data sources like relational databases , data warehouses , data lakes, data marts , IoT , legacy systems, etc., to provide a unified view of all enterprise data.

Architecture

Architecture Metadata Data Lake Machine Learning

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

These include: Seamless integration A data fabric’s ability to integrate seamlessly with various APIs and software-delivery kits (SDKs) truly differentiates this framework from another type of data management: the ubiquitous data lake. That’s a model worth looking at when it comes to data governance,” says Bob.

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

These include: Seamless integration A data fabric’s ability to integrate seamlessly with various APIs and software-delivery kits (SDKs) truly differentiates this framework from another type of data management: the ubiquitous data lake. That’s a model worth looking at when it comes to data governance,” says Bob.

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

[O’Reilly Book] Chapter 1: Why Data Quality Deserves Attention Now

Monte Carlo

AUGUST 31, 2023

Data pipelines can handle both batch and streaming data, and at a high-level, the methods for measuring data quality for either type of asset are much the same. For instance, in the late 2010s, Uber changed all data analysts’ titles to data scientists after an organizational restructure.

Data Lake

Data Lake Data Pipeline Unstructured Data Data Warehouse

How to become Azure Data Engineer I Edureka

Edureka

FEBRUARY 7, 2023

They should also be proficient in programming languages such as Python , SQL , and Scala , and be familiar with big data technologies such as HDFS , Spark , and Hive. Learn programming languages: Azure Data Engineers should have a strong understanding of programming languages such as Python , SQL , and Scala.

Data Engineering

Data Engineering Data Engineer Engineering Programming Language

Data Observability Tools: Types, Capabilities, and Notable Solutions

Databand.ai

JULY 5, 2023

Data catalog and lineage tools: These tools provide visibility into data lineage by tracking the origin, transformation, and consumption of data across the data pipeline. They help organizations understand the dependencies between data sources, processes, and systems, enabling better data governance and impact analysis.

Data Pipeline

Data Pipeline Data Lake Data Warehouse Datasets

Forge Your Career Path with Best Data Engineering Certifications

ProjectPro

FEBRUARY 21, 2023

GCP Data Engineer Certification The Google Cloud Certified Professional Data Engineer certification is ideal for data professionals whose jobs generally involve data governance, data handling, data processing, and performing a lot of feature engineering on data to prepare it for modeling.

Certification

Certification Data Engineering Data Engineer Engineering

Practical First Steps In Data Governance For Long Term Success

Build A Data Lake For Your Security Logs With Scanner

Webinars

Trending Sources

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Webinars

Troubleshooting Kafka In Production

Stitching Together Enterprise Analytics With Microsoft Fabric

Making Email Better With AI At Shortwave

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Being Data Driven At Stripe With Trino And Iceberg

Tackling Real Time Streaming Data With SQL Using RisingWave

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Release Management For Data Platform Services And Logic

X-Ray Vision For Your Flink Stream Processing With Datorios

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

When And How To Conduct An AI Program

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Build Your Second Brain One Piece At A Time

Data Sharing Across Business And Platform Boundaries

Modern Customer Data Platform Principles

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Version Your Data Lakehouse Like Your Software With Nessie

Designing A Non-Relational Database Engine

Addressing The Challenges Of Component Integration In Data Platform Architectures

Reconciling The Data In Your Databases With Datafold

Designing Data Platforms For Fintech Companies

Unlocking Your dbt Projects With Practical Advice For Practitioners

Shining Some Light In The Black Box Of PostgreSQL Performance

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Migration Strategies For Large Scale Systems

Adding An Easy Mode For The Modern Data Stack With 5X

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Designing Data Transfer Systems That Scale

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Gearing Up for Gartner Data & Analytics Summit 2025

Data Engineering Weekly #161

Modern Data Management Essentials: Exploring Data Fabric

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

Data Fabric: The Future of Data Architecture

Data Fabric: The Future of Data Architecture

[O’Reilly Book] Chapter 1: Why Data Quality Deserves Attention Now

How to become Azure Data Engineer I Edureka

Data Observability Tools: Types, Capabilities, and Notable Solutions

Forge Your Career Path with Best Data Engineering Certifications

Stay Connected