Data Lake, High Quality Data and Machine Learning

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

FEBRUARY 18, 2024

Summary A data lakehouse is intended to combine the benefits of data lakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Data lakes are notoriously complex. The Machine Learning Podcast helps you go from idea to production with machine learning.

Data Lake

Data Lake High Quality Data Data Warehouse Google Cloud

Build A Data Lake For Your Security Logs With Scanner

Data Engineering Podcast

JANUARY 28, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Lake

Data Lake Building High Quality Data AWS

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Kafka

Kafka Data Lake High Quality Data SQL

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Making Email Better With AI At Shortwave

Data Engineering Podcast

APRIL 21, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Machine Learning Data Pipeline

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. The Machine Learning Podcast helps you go from idea to production with machine learning.

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Tackling Real Time Streaming Data With SQL Using RisingWave

Data Engineering Podcast

FEBRUARY 4, 2024

In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

SQL

SQL Data Lake High Quality Data Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Data Engineering Podcast

JUNE 30, 2024

He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Don't forget to check out our other shows.

Pipeline-centric

Pipeline-centric Engineering Data Lake High Quality Data

Release Management For Data Platform Services And Logic

Data Engineering Podcast

MAY 12, 2024

Data lakes are notoriously complex. The Machine Learning Podcast helps you go from idea to production with machine learning. If you've learned something or tried out a project from the show then tell us about it! Data lakes are notoriously complex.

Management

Management Data Lake High Quality Data Machine Learning

X-Ray Vision For Your Flink Stream Processing With Datorios

Data Engineering Podcast

JUNE 9, 2024

Data lakes are notoriously complex. The Machine Learning Podcast helps you go from idea to production with machine learning. If you've learned something or tried out a project from the show then tell us about it! Data lakes are notoriously complex.

Process

Process Data Lake High Quality Data Machine Learning

Build Your Second Brain One Piece At A Time

Data Engineering Podcast

APRIL 28, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Practical First Steps In Data Governance For Long Term Success

Data Engineering Podcast

JUNE 2, 2024

In this episode she shares the practical steps to implementing a data governance practice in your organization, and the pitfalls to avoid. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

Data Governance

Data Governance Government Data Lake High Quality Data

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Data Engineering Podcast

APRIL 7, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data BI Data Workflow

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Data Engineering Podcast

MAY 5, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Building

Building Data Lake High Quality Data Machine Learning

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Data Engineering Podcast

JANUARY 7, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Data Process

Data Process Process Data Lake High Quality Data

When And How To Conduct An AI Program

Data Engineering Podcast

MARCH 3, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Programming

Programming Data Lake High Quality Data Machine Learning

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

FEBRUARY 25, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Technology Data Lake High Quality Data

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

FEBRUARY 11, 2024

In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. The Machine Learning Podcast helps you go from idea to production with machine learning.

Data Lake

Data Lake High Quality Data Government Machine Learning

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. What is involved in integrating Nessie into a given data stack?

Data Lake

Data Lake High Quality Data Architecture Machine Learning

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Non-relational Database

Non-relational Database Relational Database Database Designing

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Architecture

Architecture Data Lake High Quality Data SQL

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Data Engineering Podcast

NOVEMBER 12, 2023

If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold Data lakes are notoriously complex. Can you describe what Tabnine is and the story behind it?

Software Engineering

Software Engineering Software Engineer Engineering Data Lake

Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

MARCH 17, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Database

Database Data Lake High Quality Data Data Workflow

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Project

Project Data Lake SQL High Quality Data

Shining Some Light In The Black Box Of PostgreSQL Performance

Data Engineering Podcast

NOVEMBER 5, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

PostgreSQL

PostgreSQL Data Lake SQL High Quality Data

Designing Data Platforms For Fintech Companies

Data Engineering Podcast

DECEMBER 31, 2023

In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex.

Designing

Designing Data Lake High Quality Data SQL

Data Migration Strategies For Large Scale Systems

Data Engineering Podcast

MAY 26, 2024

In this episode he shares some of the valuable lessons that he learned about how to make those projects successful. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. Don't forget to check out our other shows.

Systems

Systems Data Lake High Quality Data Google Cloud

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Data Engineering Podcast

MARCH 31, 2024

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.

Project

Project Data Lake High Quality Data Data Workflow

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

DECEMBER 17, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data SQL Architecture

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

DECEMBER 10, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake SQL High Quality Data Architecture

Designing Data Transfer Systems That Scale

Data Engineering Podcast

DECEMBER 3, 2023

Learn more about Datafold by visiting dataengineeringpodcast.com/datafold today! Data lakes are notoriously complex. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises.

Systems

Systems Designing Data Lake SQL

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Key Differences Between AI Data Engineers and Traditional Data Engineers While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts. Let’s dive into the tools necessary to become an AI data engineer.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Gearing Up for Gartner Data & Analytics Summit 2025

Monte Carlo

JANUARY 21, 2025

Key Themes Data-Driven Decision-Making : Learn how to build a data-centric culture that drives better outcomes. Advanced Analytics & AI : See how the latest machine learning and artificial intelligence solutions are transforming industries. Its a unique blend of business and technical expertise under one roof.

Data Analytics

Data Analytics Pipeline-centric Food Data Lake

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

Cloudera

OCTOBER 7, 2022

Cloudera’s mission, values, and culture have long centered around using open source engines on open data and table formats to enable customers to build flexible and open data lakes. The Open Data Lakehouse . CDP Public Cloud via Cloudera Machine Learning. dbt-impala . dbt-spark-livy. dbt-spark-cde.

Data Warehouse

Data Warehouse Data Lake Government High Quality Data

5 Layers of Data Lakehouse Architecture Explained

Monte Carlo

JANUARY 5, 2024

Data lakehouse architecture combines the benefits of data warehouses and data lakes, bringing together the structure and performance of a data warehouse with the flexibility of a data lake. A visualization of the flow of data in data lakehouse architecture vs. data warehouse and data lake.

Architecture

Architecture Data Lake Metadata Unstructured Data

Data Lakehouse Architecture Explained: 5 Layers

Monte Carlo

JANUARY 5, 2024

Data lakehouse architecture combines the benefits of data warehouses and data lakes, bringing together the structure and performance of a data warehouse with the flexibility of a data lake. A visualization of the flow of data in data lakehouse architecture vs. data warehouse and data lake.

Architecture

Architecture Data Lake Metadata Unstructured Data

What is Data Observability? 5 Key Pillars To Know

Monte Carlo

AUGUST 10, 2023

Data observability tools should use machine learning models to automatically learn your environment and your data. It minimizes false positives by taking into account not just individual metrics, but a holistic view of your data and the potential impact from any particular issue. It is still relevant today.

Data Pipeline

Data Pipeline Software Engineering Software Engineer Machine Learning

Modern Data Management Essentials: Exploring Data Fabric

Precisely

JULY 18, 2024

A data fabric offers several key benefits that transform your data management: Accelerates analytics and decision-making processes by enhancing data accessibility through seamless data integration and retrieval across diverse environments. Increase metadata maturity.

Data Management

Data Management Management Metadata Database-centric

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

AltexSoft

AUGUST 22, 2022

A data fabric is an architecture design presented as an integration and orchestration layer built on top of multiple disjointed data sources like relational databases , data warehouses , data lakes, data marts , IoT , legacy systems, etc., to provide a unified view of all enterprise data.

Architecture

Architecture Metadata Data Lake Machine Learning

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

DataKitchen

JULY 27, 2023

You can use it for big data analytics and machine learning workloads. Azure Databricks Delta Live Table s: These provide a more straightforward way to build and manage Data Pipelines for the latest, high-quality data in Delta Lake. Azure Blob Storage serves as the data lake to store raw data.

Data Pipeline

Data Pipeline BI Machine Learning Data Preparation

[O’Reilly Book] Chapter 1: Why Data Quality Deserves Attention Now

Monte Carlo

AUGUST 31, 2023

Data pipelines can handle both batch and streaming data, and at a high-level, the methods for measuring data quality for either type of asset are much the same. Data quality can be impacted at any stage of the data pipeline, before ingestion, in production, or even during analysis.

Data Lake

Data Lake Data Pipeline Unstructured Data Data Warehouse

How to become Azure Data Engineer I Edureka

Edureka

FEBRUARY 7, 2023

They should also be proficient in programming languages such as Python , SQL , and Scala , and be familiar with big data technologies such as HDFS , Spark , and Hive. Learn programming languages: Azure Data Engineers should have a strong understanding of programming languages such as Python , SQL , and Scala.

Data Engineering

Data Engineering Data Engineer Engineering Programming Language

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Build A Data Lake For Your Security Logs With Scanner

Webinars

Trending Sources

Troubleshooting Kafka In Production

Webinars

Stitching Together Enterprise Analytics With Microsoft Fabric

Making Email Better With AI At Shortwave

Being Data Driven At Stripe With Trino And Iceberg

Tackling Real Time Streaming Data With SQL Using RisingWave

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Improve Data Quality Through Engineering Rigor And Business Engagement With Synq

Release Management For Data Platform Services And Logic

X-Ray Vision For Your Flink Stream Processing With Datorios

Build Your Second Brain One Piece At A Time

Practical First Steps In Data Governance For Long Term Success

Establish A Single Source Of Truth For Your Data Consumers With A Semantic Layer

Barking Up The Wrong GPTree: Building Better AI With A Cognitive Approach

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

When And How To Conduct An AI Program

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Sharing Across Business And Platform Boundaries

Modern Customer Data Platform Principles

Version Your Data Lakehouse Like Your Software With Nessie

Designing A Non-Relational Database Engine

Addressing The Challenges Of Component Integration In Data Platform Architectures

Enhancing The Abilities Of Software Engineers With Generative AI At Tabnine

Reconciling The Data In Your Databases With Datafold

Unlocking Your dbt Projects With Practical Advice For Practitioners

Shining Some Light In The Black Box Of PostgreSQL Performance

Designing Data Platforms For Fintech Companies

Data Migration Strategies For Large Scale Systems

Adding Anomaly Detection And Observability To Your dbt Projects Is Elementary

Adding An Easy Mode For The Modern Data Stack With 5X

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Designing Data Transfer Systems That Scale

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Gearing Up for Gartner Data & Analytics Summit 2025

Cloudera’s Open Data Lakehouse Supercharged with dbt Core(tm)

5 Layers of Data Lakehouse Architecture Explained

Data Lakehouse Architecture Explained: 5 Layers

What is Data Observability? 5 Key Pillars To Know

Modern Data Management Essentials: Exploring Data Fabric

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure

[O’Reilly Book] Chapter 1: Why Data Quality Deserves Attention Now

How to become Azure Data Engineer I Edureka

Stay Connected