Books, Data Management and Technology - Data Engineering Digest

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Can you describe your experiences with Kafka?

Kafka

Kafka Data Lake High Quality Data SQL

Cloudera wins Risk Markets Technology Award for Data Management Product of the year

Cloudera

FEBRUARY 4, 2021

I am pleased to announce that Cloudera was just named the Risk Data Repository and Data Management Product of the Year in the Risk Markets Technology Awards 2021, recognizing its support for the industry's risk data repository and data management needs. End-to-end Data Lifecycle.

Technology

Technology Data Management Management Insurance

21 Best Machine Learning Books for Beginners and Experts

ProjectPro

JUNE 6, 2025

Learning machine learning is easy and quick, and you can learn through machine learning courses, videos, bootcamps, tutorials, and of course, good machine learning books! Each project and book is recommended by ProjectPro’s industry experts, making them the richest sources of practical knowledge in the world of machine learning.

Machine Learning

Machine Learning Deep Learning Algorithm Computer Science

Writing The Book That Offers A Single Reference For The Fundamentals Of Data Engineering

Data Engineering Podcast

JULY 24, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode. What are your goals with this book?

Data Engineer

Data Engineer Data Engineering Lambda Architecture Engineering

Secrets of Spark to Snowflake Migration Success: Customer Stories

Snowflake

NOVEMBER 19, 2024

How will my data stay secure and governed? A critical part of this decision is determining which foundational technology to build infrastructure on. Will it be easy to use for my entire team? What will costs look like? I see these factors as key reasons why organizations of all sizes and industries make the move to Snowflake.

Data Governance

Data Governance Government Healthcare Management

Making The Total Cost Of Ownership For External Data Manageable With Crux

Data Engineering Podcast

JULY 17, 2022

In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third party data. Tired of deploying bad data?

Data Management

Data Management Management Metadata MongoDB

Data Management Trends From An Investor Perspective

Data Engineering Podcast

JUNE 8, 2020

Summary: The landscape of data management and processing is rapidly changing and evolving. This is a useful conversation to gain a macro perspective on where businesses are looking to improve their capabilities to work with data. If you hand a book to a new data engineer, what wisdom would you add to it?

Data Management

Data Management Management Machine Learning Portfolio

Self Service Data Management From Ingest To Insights With Isima

Data Engineering Podcast

NOVEMBER 16, 2020

This was an interesting and contrarian take on the current state of the data management industry and is worth a listen to gain some additional perspective. If you hand a book to a new data engineer, what wisdom would you add to it? What was your motivation for creating a new platform for data applications?

Data Management

Data Management Management BI Business Intelligence

Datapreneurs - How Today's Business Leaders Are Using Data To Define The Future

Data Engineering Podcast

JULY 16, 2023

Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.

SQL

SQL Machine Learning Data Engineer Data Engineering

Revisiting The Technical And Social Benefits Of The Data Mesh

Data Engineering Podcast

DECEMBER 26, 2021

In this episode Zhamak re-joins the show to discuss the real world benefits that have been seen, the lessons that she has learned while working with her clients and the community, and her vision for the future of the data mesh. Can you start by giving a brief recap of the principles of the data mesh and the story behind it?

BI

BI Data Warehouse Data Engineer Data Engineering

15 Sample GCP Projects Ideas for Beginners to Practice in 2025

ProjectPro

JUNE 6, 2025

The benefits it offers range from data management and manipulation to machine learning tools on the GCP platform. GCP offers 90 services that span computation, storage, databases, networking, operations, development, data analytics, machine learning, and artificial intelligence, to name a few.

Google Cloud

Google Cloud Project Data Lake Healthcare

Designing A Non-Relational Database Engine

Data Engineering Podcast

APRIL 14, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold, a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.

Non-relational Database

Non-relational Database Relational Database Database Designing

Unlocking Your dbt Projects With Practical Advice For Practitioners

Data Engineering Podcast

NOVEMBER 19, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. What was your path to adoption of dbt?

Project

Project Data Lake High Quality Data SQL

Building A Data Mesh Platform At PayPal

Data Engineering Podcast

FEBRUARY 26, 2023

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Are you tired of dealing with the headache that is the 'Modern Data Stack'? It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. We feel your pain.

Building

Building Metadata Machine Learning Data Integration

How to Become A Data Modeler in 2025?

ProjectPro

JUNE 6, 2025

The data modeler builds, implements, and analyzes data architecture and data modeling solutions using relational, dimensional, and NoSQL databases. Data modelers propose innovative data solutions to help in enterprise data management, business intelligence, machine learning, data science, and other business objectives.

NoSQL

NoSQL ETL Tools Certification SQL

Putting Apache Spark Into Action with Jean Georges Perrin - Episode 60

Data Engineering Podcast

DECEMBER 9, 2018

Jean Georges Perrin has been so impressed by the versatility of Spark that he is writing a book for data engineers to hit the ground running. He also discusses what you need to know to get it deployed and keep it running in a production environment and how it fits into the overall data ecosystem. Who is the target audience?

Scala

Scala Kafka MySQL Hadoop

Towards a Data Mesh (part 1): Data Domains and Team Topologies

François Nguyen

MARCH 7, 2021

I will try to answer the question "Could you illustrate your journey towards a Data Mesh?" with three articles: this one about data domains and Team Topologies, a second one devoted to the architecture and the technology, and a last one about change management and the skills needed.

Government

Government Data Governance Data Metadata

A Reflection On Learning A Lot More Than 97 Things Every Data Engineer Should Know

Data Engineering Podcast

JANUARY 30, 2022

In addition to that, the host curated the essays contained in the book "97 Things Every Data Engineer Should Know", using the knowledge and context gained from running the show to inform the selection process. Interview: Introduction. How did you get involved in the area of data management?

Data Engineer

Data Engineer Data Engineering Engineering Data Pipeline

Version Your Data Lakehouse Like Your Software With Nessie

Data Engineering Podcast

MARCH 10, 2024

In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git. Can you describe what Nessie is and the story behind it? What are the core problems/complexities that Nessie is designed to solve?

Data Lake

Data Lake High Quality Data Architecture Data Pipeline

Talend ETL Tool - A Comprehensive Guide [2025]

ProjectPro

JUNE 6, 2025

Contents: Talend ETL Tool Project Ideas for You; Best Books to Learn About Talend ETL Tool; Talend ETL Tool Tutorial; FAQs on Talend ETL Tool. What is Talend ETL? Talend is a leading ETL and big data integration tool with an open-source environment for data planning, integration, processing, and cloud storage.

ETL Tools

ETL Tools Big Data Java Metadata

How And Why To Become Data Driven As A Business

Data Engineering Podcast

OCTOBER 13, 2021

In this episode he discusses his experiences and how he approached the work of distilling them for his book "Fail Fast, Learn Faster". This is an entertaining and enlightening exploration of the business side of data with an industry veteran. Can you start by discussing the focus of the book and what motivated you to write it?

Entertainment

Entertainment Big Data Data Lake Business Intelligence

A to Z Guide for Azure Data Fundamentals DP-900 Certification

ProjectPro

JUNE 6, 2025

According to a similar report by Pearson VUE (Value of IT Certification, 2021), 61% of certified tech professionals report getting promoted, 73% report upskilling to keep up with emerging technology, and 76% report higher job satisfaction. Companies that employ Azure prefer qualified Microsoft professionals over non-certified ones.

Certification

Certification Google Cloud Data Lake SQL

Database Refactoring Patterns with Pramod Sadalage - Episode 22

Data Engineering Podcast

MARCH 11, 2018

Practices such as version controlled migration scripts and iterative schema evolution provide the necessary mechanisms to ensure that your data layer is as agile as your application. What was the state of software and database system development at the time and why did you find it necessary to write a book on this subject?

Database

Database MongoDB NoSQL Insurance

Top 8 Data Engineering Books [Beginners to Advanced]

Knowledge Hut

JUNE 30, 2023

Whether you're a beginner looking to dive into the foundations or an experienced practitioner seeking advanced techniques, the right books can be your guiding light. Books on data engineering serve as essential resources to guide you through the vast terrain of data engineering. What is Data Engineering?

Data Engineer

Data Engineer Data Engineering Engineering Data Warehouse

A Practical Introduction To Graph Data Applications

Data Engineering Podcast

AUGUST 3, 2020

Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. This was an informative and enlightening conversation with two experts on graph data applications that will help you start on the right track in your own projects.

NoSQL

NoSQL Relational Database Database Algorithm

Low Friction Data Governance With Immuta

Data Engineering Podcast

DECEMBER 21, 2020

The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure. What is data governance?

Data Governance

Data Governance Government Data Lake Banking

How to Learn Big Data Step by Step from Scratch in 2025?

ProjectPro

JUNE 6, 2025

Data professionals work in several industry segments, and their contributions apply to all industries. You can work in any sector, including finance, manufacturing, information technology, telecommunications, retail, logistics, and automotive. So now is the right time to choose Big Data as your next career option.

Big Data

Big Data Big Data Skills Scala Hadoop

Charting A Path For Streaming Data To Fill Your Data Lake With Hudi

Data Engineering Podcast

AUGUST 3, 2021

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. You listen to this show to learn about all of the latest tools, patterns, and practices that power data engineering projects across every domain. What is involved in migrating an existing data lake to use Hudi?

Data Lake

Data Lake Data Warehouse Hadoop Kafka

An Introduction To Data And Analytics Engineering For Non-Programmers

Data Engineering Podcast

JANUARY 15, 2022

In this episode Brian McMillan shares his work on the book "Building Data Products" and how he is working to educate business users and data professionals about the combination of technical, economic, and business considerations that need to be blended for these projects to succeed. Who is your target audience?

Engineering

Engineering Electronics ETL Tools Data Pipeline

Best MLOps Certifications To Boost Your Career In 2025

ProjectPro

JUNE 6, 2025

Overall career growth: The increasing adoption of ML technologies across industries has generated a high demand for professionals with MLOps expertise. Job listings for data/software engineering roles often prioritize candidates with GCP Professional Machine Learning Engineer certification, indicating its impact on the hiring process.

Certification

Certification Google Cloud AWS Machine Learning

Best Data Science Books for Beginners and Experienced [2024]

Knowledge Hut

DECEMBER 26, 2023

In recent years, with the advent of technology, data has come to be considered a valuable asset in both large-scale and small-scale organizations. As a resource, data requires skilled professionals to collect, interpret, and store it safely. Here you will find a consolidated list of the best books to learn data science.

Data Science

Data Science Programming Language Scala R (Programming)

AWS Certified Solutions Architect Associate Books for 2024

Knowledge Hut

JANUARY 2, 2024

If you fit any of these descriptions, all you need is the right book to learn it all. So, without any delay, let us delve into the AWS certified solutions architect professional books for you to refer to and excel in this career field. These books serve as valuable companions in your quest to master the AWS ecosystem.

AWS

AWS Amazon Web Services Certification Architecture

Clean Up Your Data Using Scalable Entity Resolution And Data Mastering With Zingg

Data Engineering Podcast

NOVEMBER 6, 2022

In this episode she shares the story behind the project, the details of how it is implemented, and how you can use it for your own data projects. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Who is the target audience for Zingg?

MongoDB

MongoDB Scala MySQL Data Lake

Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams

Data Engineering Podcast

DECEMBER 28, 2022

In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.

Data Lake

Data Lake Data Warehouse Data Pipeline MongoDB

Data Science for IoT: Challenges, Technologies, Importance

Knowledge Hut

NOVEMBER 19, 2023

With advances in technology, wearable devices now provide some insight into your health. How do they collect, process, and analyze that data for you? This is called the IoT serving pattern for downstream use cases in the book Fundamentals of Data Engineering by Joe Reis and Matt Housley. Data Science for IoT has its own share.

Data Science

Data Science Technology Machine Learning Big Data Skills

Combining The Simplicity Of Spreadsheets With The Power Of Modern Data Infrastructure At Canvas

Data Engineering Podcast

JUNE 19, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode.

Metadata

Metadata Unstructured Data MongoDB MySQL

Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet

Data Engineering Podcast

NOVEMBER 20, 2022

In this episode CEO and founder Salma Bakouk shares her views on the causes and impacts of "data entropy" and how you can tame it before it leads to failures. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.

Data Lake

Data Lake MongoDB Data Ingestion Scala

data.world with Bryon Jacob - Episode 9

Data Engineering Podcast

DECEMBER 2, 2017

Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. The platform that you have built provides hosting for a large variety of data sizes and types.

Data Pipeline

Data Pipeline Data Engineer Data Engineering Architecture

Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems

Data Engineering Podcast

DECEMBER 25, 2022

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline, or want to test out the projects you hear about on the show, you'll need somewhere to deploy it, so check out our friends at Linode.

Machine Learning

Machine Learning Systems Data Lake Data Warehouse

Connecting To The Next Frontier Of Computing With Quantum Networks

Data Engineering Podcast

APRIL 17, 2022

Summary: The next paradigm shift in computing is coming in the form of quantum technologies. In this episode Prineha Narang, co-founder and CTO of Aliro, explains how these systems work, the capabilities that they can offer, and how you can start preparing for a post-quantum future for your data systems. what limitations does it remove?)

Data Warehouse

Data Warehouse SQL Data Engineer Data Engineering

Beginners Guide to Azure Synapse Analytics for Data Engineers

ProjectPro

JUNE 6, 2025

Microsoft's Azure Synapse Analytics (formerly SQL Data Warehouse) is a cloud data warehouse that combines data integration, data exploration, enterprise data warehousing, and big data analytics to offer a unified workspace for creating end-to-end analytics solutions.

Data Engineer

Data Engineer Data Engineering Engineering Data Lake

Analytics Engineering Without The Friction Of Complex Pipeline Development With Optimus and dbt

Data Engineering Podcast

OCTOBER 30, 2022

Summary: One of the most impactful technologies for data analytics in recent years has been dbt. It's hard to have a conversation about data engineering or analysis without mentioning it. Despite its widespread adoption, there are still rough edges in its workflow that cause friction for data analysts.

Engineering

Engineering MongoDB Scala MySQL

Bring Geospatial Analytics Across Disparate Datasets Into Your Toolkit With The Unfolded Platform

Data Engineering Podcast

JUNE 26, 2022

In this episode Isaac Brodsky explains how the Unfolded platform is architected, their experience joining the team at Foursquare, and how you can start using it for analyzing your spatial data today. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.

Datasets

Datasets Unstructured Data Metadata MongoDB

Confluent Schema Registry with Ewen Cheslack-Postava - Episode 10

Data Engineering Podcast

DECEMBER 10, 2017

Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today.

Kafka

Kafka Data Pipeline Data Engineer Data Engineering
