This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
Summary Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that goal. Datagovernance is the binding force between these two parts of the organization. At what point does a lack of an explicit governance policy become a liability?
In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offer the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes. Data lakes are notoriously complex. Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Can you describe what Synq is and the story behind it?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. Closing Announcements Thank you for listening!
In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your dataworkflow, from migration to dbt deployment.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
Deploy DataOps DataOps , or Data Operations, is an approach that applies the principles of DevOps to datamanagement. It aims to streamline and automate dataworkflows, enhance collaboration and improve the agility of data teams. How effective are your current dataworkflows?
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your dataworkflow, from migration to dbt deployment.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is supported by Code Comments, an original podcast from Red Hat. Data lakes are notoriously complex. My thanks to the team at Code Comments for their support. Closing Announcements Thank you for listening!
In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. What are the open questions today in technical scalability of data engines? What are the open questions today in technical scalability of data engines?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is supported by Code Comments, an original podcast from Red Hat. Data lakes are notoriously complex. Data lakes are notoriously complex. My thanks to the team at Code Comments for their support.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Contact Info LinkedIn Parting Question From your perspective, what is the biggest gap in the tooling or technology for datamanagement today?
Deploy DataOps DataOps , or Data Operations, is an approach that applies the principles of DevOps to datamanagement. It aims to streamline and automate dataworkflows, enhance collaboration and improve the agility of data teams. How effective are your current dataworkflows?
Summary A significant portion of dataworkflows involve storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Data lakes are notoriously complex.
In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. What is the current state of the ecosystem for data sharing protocols/practices/platforms?
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Your first 30 days are free!
Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Data lakes are notoriously complex. What do you have planned for the future of your work at VAST Data? Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex.
In the realm of big data and AI, managing and securing data assets efficiently is crucial. Databricks addresses this challenge with Unity Catalog, a comprehensive governance solution designed to streamline and secure datamanagement across Databricks workspaces. Advantages of the Unity Catalog 1.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Data lakes are notoriously complex.
Andrei Tserakhau has dedicated his careeer to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles.
In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Data lakes are notoriously complex. Data lakes are notoriously complex.
In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started. Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Data lakes are notoriously complex. Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData lakes are notoriously complex. Can you start by sharing some of your experiences with data migration projects? Can you start by sharing some of your experiences with data migration projects?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Data lakes are notoriously complex.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement You shouldn't have to throw away the database to build with fast-changing data. Data lakes are notoriously complex. That’s three free boards at dataengineeringpodcast.com/miro.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
They also share their ambitions for the near future of adding data observability and data quality management features. Interview Introduction How did you get involved in the area of datamanagement? Can you describe what Acryl Data is and the story behind it? How is the governance of DataHub being managed?
We have also included vendors for the specific use cases of ModelOps, MLOps, DataGovOps and DataSecOps which apply DataOps principles to machine learning, AI, datagovernance, and data security operations. . Piperr.io — Pre-built data pipelines across enterprise stakeholders, from IT to analytics, tech, data science and LoBs.
They also discuss methods for gaining visibility into the flow of data through your infrastructure, how to diagnose and prevent potential problems, and what they are building at Monte Carlo to help you maintain your data’s uptime. If you hand a book to a new data engineer, what wisdom would you add to it?
She explains how the design of the platform is informed by the needs of managingdata projects for large and small teams across her previous roles, how it integrates with your existing systems, and how it can work to bring everyone onto the same page. What portions of the dataworkflow is Atlan responsible for?
The DataOps framework is a set of practices, processes, and technologies that enables organizations to improve the speed, accuracy, and reliability of their datamanagement and analytics operations. This can be achieved through the use of automated data ingestion, transformation, and analysis tools.
DataOps is a collaborative approach to datamanagement that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various dataworkflows.
This is what managingdata without metadata feels like. Often described as “data about data,” it is the unsung hero in datamanagement that ensures our vast amounts of information are not only stored but easily discoverable, organized, and actionable. Chaos, right? What is Metadata?
DataOps , short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. These tools help organizations implement DataOps practices by providing a unified platform for data teams to collaborate, share, and manage their data assets.
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content