Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Can you describe your experiences with Kafka?
I am pleased to announce that Cloudera was just named the Risk Data Repository and Data Management Product of the Year in the Risk Markets Technology Awards 2021. Supporting the industry’s risk data depository and data management needs. End-to-end Data Lifecycle.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. What are your goals with this book?
How will my data stay secure and governed? A critical part of this decision is determining which foundational technology to build infrastructure on. Will it be easy to use for my entire team? What will costs look like? I see these factors as key reasons why organizations of all sizes and industries make the move to Snowflake.
In this episode Crux CTO Mark Etherington discusses the different costs involved in managing external data, how to think about the total return on investment for your data, and how the Crux platform is architected to reduce the toil involved in managing third party data. Tired of deploying bad data?
Summary The landscape of data management and processing is rapidly changing and evolving. This is a useful conversation to gain a macro perspective on where businesses are looking to improve their capabilities to work with data. If you hand a book to a new data engineer, what wisdom would you add to it?
Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.
This was an interesting and contrarian take on the current state of the data management industry and is worth a listen to gain some additional perspective. If you hand a book to a new data engineer, what wisdom would you add to it? What was your motivation for creating a new platform for data applications?
In this episode Zhamak re-joins the show to discuss the real world benefits that have been seen, the lessons that she has learned while working with her clients and the community, and her vision for the future of the data mesh. Can you start by giving a brief recap of the principles of the data mesh and the story behind it?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is brought to you by Datafold, a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains, even simple reports can become unwieldy to maintain. What was your path to adoption of dbt?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Are you tired of dealing with the headache that is the 'Modern Data Stack'? It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. We feel your pain.
Jean George Perrin has been so impressed by the versatility of Spark that he is writing a book for data engineers to hit the ground running. He also discusses what you need to know to get it deployed and keep it running in a production environment and how it fits into the overall data ecosystem. Who is the target audience?
I will try to answer the question “Could you illustrate your journey towards a Data Mesh?” with three articles: this one about data domains and Team Topologies, a second one devoted to architecture and technology, and a last one about change management and the needed skills.
In addition to that, the host curated the essays contained in the book "97 Things Every Data Engineer Should Know", using the knowledge and context gained from running the show to inform the selection process. Interview Introduction How did you get involved in the area of data management?
In this episode he discusses his experiences and how he approached the work of distilling them for his book "Fail Fast, Learn Faster". This is an entertaining and enlightening exploration of the business side of data with an industry veteran. Can you start by discussing the focus of the book and what motivated you to write it?
Whether you're a beginner looking to dive into the foundations or an experienced practitioner seeking advanced techniques, the right books can be your guiding light. Books on data engineering serve as essential resources to guide you through the vast terrain of data engineering. What is Data Engineering?
Practices such as version controlled migration scripts and iterative schema evolution provide the necessary mechanisms to ensure that your data layer is as agile as your application. What was the state of software and database system development at the time and why did you find it necessary to write a book on this subject?
In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git. Can you describe what Nessie is and the story behind it? What are the core problems/complexities that Nessie is designed to solve?
Graph data models and the applications built on top of them are perfect for representing relationships and finding emergent structures in your information. This was an informative and enlightening conversation with two experts on graph data applications that will help you start on the right track in your own projects.
The team at Immuta has built a platform that aims to tackle that problem in a flexible and maintainable fashion so that data teams can easily integrate authorization, data masking, and privacy enhancing technologies into their data infrastructure. What is data governance?
In this episode Brian McMillan shares his work on the book "Building Data Products" and how he is working to educate business users and data professionals about the combination of technical, economical, and business considerations that need to be blended for these projects to succeed. Who is your target audience?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. You listen to this show to learn about all of the latest tools, patterns, and practices that power data engineering projects across every domain. What is involved in migrating an existing data lake to use Hudi?
In recent years, with the advent of technology, data has come to be considered a valuable asset in both large-scale and small-scale organizations. As a resource, data requires skilled professionals to collect, interpret, and store it safely. Here you will find a consolidated list of the best books to learn data science.
All you require, if you fit into any of these descriptions, is the ideal book to know it all. So, without any delay, let us delve into the AWS certified solutions architect professional books for you to refer to and excel in this career field. These books serve as valuable companions in your quest to master the AWS ecosystem: 1.
In this episode she shares the story behind the project, the details of how it is implemented, and how you can use it for your own data projects. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Who is the target audience for Zingg?
In this episode he shares his thoughts on the strategic and tactical elements of moving your work as a data professional from being task-oriented to being product-oriented and the long term improvements in your productivity that it provides. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
With advances in technology, wearable devices do provide some trace of your health. How do they collect, process, and analyze data for you? This is called the IoT serving pattern for downstream use cases in the book Fundamentals of Data Engineering by Joe Reis and Matt Housley. Data Science for IoT has its own share.
In this episode CEO and founder Salma Bakouk shares her views on the causes and impacts of "data entropy" and how you can tame it before it leads to failures. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. The platform that you have built provides hosting for a large variety of data sizes and types.
Summary The next paradigm shift in computing is coming in the form of quantum technologies. In this episode Prineha Narang, co-founder and CTO of Aliro, explains how these systems work, the capabilities that they can offer, and how you can start preparing for a post-quantum future for your data systems. what limitations does it remove?)
Summary One of the most impactful technologies for data analytics in recent years has been dbt. It’s hard to have a conversation about data engineering or analysis without mentioning it. Despite its widespread adoption there are still rough edges in its workflow that cause friction for data analysts.
In this episode Isaac Brodsky explains how the Unfolded platform is architected, their experience joining the team at Foursquare, and how you can start using it for analyzing your spatial data today. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
Continuous delivery lets you get new features in front of your users as fast as possible without introducing bugs or breaking production and GoCD is the open source platform made by the people at Thoughtworks who wrote the book about it. Go to dataengineeringpodcast.com/gocd to download and launch it today.
Summary Machine learning is a class of technologies that promise to revolutionize business. How much of your sales process is spent on educating your clients about what AI or ML are and the benefits that these technologies can provide? What basic technology stack is necessary for putting the first ML models into production?
He recently wrote a book on effective patterns for Pandas code, and in this episode he shares advice on how to write efficient data processing routines that will scale with your data volumes, while being understandable and maintainable. What are the main tasks that you have seen Pandas used for in a data engineering context?
Summary Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud native architectures that take advantage of dynamic scaling and the separation of compute and storage. When is Firebolt the wrong choice?
What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert. In order to bring the DBA into the new era of data management, the team at Upsolver added a SQL interface to their data lake platform. We talked last in November of 2018.
The team at Audio Analytic are working to impart a sense of hearing to our myriad devices with their sound recognition technology. This was a great conversation about the complexities of working in a niche domain of data analysis and how to build a pipeline of high quality data from collection to analysis.
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. What are some of the primary ways that Flink is used?
Advanced Data Transformation Techniques For data engineers ready to push the boundaries, advanced data transformation techniques offer the tools to tackle complex data challenges and drive innovation. Book a Demo to learn how Ascend combines all these best practices natively in our Unified Data Engineering Platform.
If you are struggling to deliver value from big data, or just starting down the path of building the organizational capacity to turn raw information into valuable products then this is a conversation that you don’t want to miss. If you hand a book to a new data engineer, what wisdom would you add to it?