This site uses cookies to improve your experience. To help us insure we adhere to various privacy regulations, please select your country/region of residence. If you do not select a country, we will assume you are from the United States. Select your Cookie Settings or view our Privacy Policy and Terms of Use.
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Used for the proper function of the website
Used for monitoring website traffic and interactions
Cookie Settings
Cookies and similar technologies are used on this website for proper function of the website, for tracking performance analytics and for marketing purposes. We and some of our third-party providers may use cookie data for various purposes. Please review the cookie settings below and choose your preference.
Strictly Necessary: Used for the proper function of the website
Performance/Analytics: Used for monitoring website traffic and interactions
The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the technology underpinning. In the beginning, there was a data warehouse The data warehouse (DW) was an approach to data architecture and structured datamanagement that really hit its stride in the early 1990s.
Summary Modern businesses aspire to be data driven, and technologists enjoy working through the challenge of building data systems to support that goal. Datagovernance is the binding force between these two parts of the organization. At what point does a lack of an explicit governance policy become a liability?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
Summary A data lakehouse is intended to combine the benefits of datalakes (cost effective, scalable storage and compute) and data warehouses (user friendly SQL interface). Datalakes are notoriously complex. To start, can you share your definition of what constitutes a "Data Lakehouse"?
Summary Datagovernance is a term that encompasses a wide range of responsibilities, both technical and process oriented. One of the more complex aspects is that of access control to the data assets that an organization is responsible for managing. What is datagovernance?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Summary Datagovernance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in datamanagement adds additional stress to an already complex endeavor. Closing Announcements Thank you for listening!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex. Datalakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?
He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex. Can you describe what Synq is and the story behind it?
Summary Datalakes are gaining popularity due to their flexibility and reduced cost of storage. Along with the benefits there are some additional complexities to consider, including how to safely integrate new data sources or test out changes to existing pipelines. What do you have planned for the future of the project?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Datalakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
With Hybrid Tables’ fast, high-concurrency point operations, you can store application and workflow state directly in Snowflake, serve data without reverse ETL and build lightweight transactional apps while maintaining a single governance and security model for both transactional and analytical data — all on one platform.
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex.
In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?
In this episode Pete Hunt, CEO of Dagster labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Datalakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is supported by Code Comments, an original podcast from Red Hat. Datalakes are notoriously complex. My thanks to the team at Code Comments for their support. Want to see Starburst in action?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
In this episode Artyom Keydunov, creator of Cube, discusses the evolution and applications of the semantic layer as a component of your data platform, and how Cube provides speed and cost optimization for your data consumers. Datalakes are notoriously complex. Your first 30 days are free!
In an effort to better understand where datagovernance is heading, we spoke with top executives from IT, healthcare, and finance to hear their thoughts on the biggest trends, key challenges, and what insights they would recommend. Get the Trendbook What is the Impact of DataGovernance on GenAI?
Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Datalakes are notoriously complex. What do you have planned for the future of your work at VAST Data? Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Datalakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Dagster offers a new approach to building and running data platforms and data pipelines. Datalakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is supported by Code Comments, an original podcast from Red Hat. Datalakes are notoriously complex. Datalakes are notoriously complex. Want to see Starburst in action?
In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. What is the current state of the ecosystem for data sharing protocols/practices/platforms?
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between datalake and warehouse capabilities is the catalog. Datalakes are notoriously complex. What is involved in integrating Nessie into a given data stack?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement This episode is brought to you by Datafold – a testing automation platform for data engineers that prevents data quality issues from entering every part of your data workflow, from migration to dbt deployment.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Datalakes are notoriously complex.
In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Datalakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex.
In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Datalakes are notoriously complex. Can you start by outlining some of the situations where reconciling data between databases is needed?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementData projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Datalakes are notoriously complex.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagementDatalakes are notoriously complex. Can you start by sharing some of your experiences with data migration projects? Closing Announcements Thank you for listening! Want to see Starburst in action?
In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement You shouldn't have to throw away the database to build with fast-changing data. Datalakes are notoriously complex. That’s three free boards at dataengineeringpodcast.com/miro.
Andrei Tserakhau has dedicated his careeer to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern datamanagement Introducing RudderStack Profiles.
While data warehouses are still in use, they are limited in use-cases as they only support structured data. Datalakes add support for semi-structured and unstructured data, and data lakehouses add further flexibility with better governance in a true hybrid solution built from the ground-up.
Both traditional and AI data engineers should be fluent in SQL for managing structured data, but AI data engineers should be proficient in NoSQL databases as well for unstructured datamanagement. Data Storage Solutions As we all know, data can be stored in a variety of ways.
With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a DataLake? Consistency of data throughout the datalake.
Datalakes are useful, flexible data storage repositories that enable many types of data to be stored in its rawest state. Traditionally, after being stored in a datalake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption.
We organize all of the trending information in your field so you don't have to. Join 37,000+ users and stay up to date on the latest articles your peers are reading.
You know about us, now we want to get to know you!
Let's personalize your content
Let's get even more personalized
We recognize your account from another site in our network, please click 'Send Email' below to continue with verifying your account and setting a password.
Let's personalize your content