Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. Support Data Engineering Podcast. RudderStack also supports real-time use cases.
Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?
Summary A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council today. Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data, there has been a struggle to merge fast, incremental updates with large, historical analysis.
In that time there have been a number of generational shifts in how data engineering is done. Materialize’s PostgreSQL-compatible interface lets users leverage the tools they already use, with unsurpassed simplicity enabled by full ANSI SQL support.
One job that has become increasingly popular across enterprise data teams is the role of the AI data engineer. Demand for AI data engineers has grown rapidly in data-driven organizations. But what does an AI data engineer do?
Summary Data lakes offer a great deal of flexibility and the potential for reduced cost for your analytics, but they also introduce a great deal of complexity. What used to be entirely managed by the database engine is now a composition of multiple systems that need to be properly configured to work in concert.
Learn data engineering, all the references (credits). This is a special edition of the Data News. Right now I'm on holiday, finishing a hiking week in Corsica 🥾, so I wrote this special edition about how to learn data engineering in 2024. Who are the data engineers?
Before it migrated to Snowflake in 2022, WHOOP was using a catalog of tools (Amazon Redshift for SQL queries and BI tooling, Dremio for a data lake, PostgreSQL databases and others) that had ultimately become expensive to manage and difficult to maintain, let alone scale.
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse, data lake and data lakehouse, and distributed patterns such as data mesh.
Summary Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
However, in the typical enterprise, only a small team has the core skills needed to gain access and create value from streams of data. This data engineering skillset typically consists of Java or Scala programming skills mated with deep DevOps acumen. SQL as the democratization enabler. A rare breed.
In addition to free assessments and free table conversions, SnowConvert now supports accurate conversion of database views from Teradata, Oracle or SQL Server for free. Sensitive data can have enormous value but is oftentimes locked down due to privacy requirements.
GetInData writes an excellent summary of adding data quality checks in a Flink streaming pipeline. Fernando Borretti: Composable SQL. One of the biggest challenges in SQL is unit testing. The author highlights three key challenges in SQL.
Summary A data lake can be a highly valuable resource, as long as it is well built and well managed. In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization.
Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. Can you give an overview of the options that are available for someone wanting to use its SQL engine for querying their data? Hudi, Delta Lake, Iceberg, Nessie, LakeFS, etc.).
Data Access API over Data Lake Tables Without the Complexity. Build a robust GraphQL API service on top of your S3 data lake files with DuckDB and Go. This data might be primarily used for internal reporting, but might also be valuable for other services in our organization.
Summary One of the perennial challenges posed by data lakes is how to keep them up to date as new data is collected. In this episode Ori Rafael shares his experiences from Upsolver and building scalable stream processing for integrating and analyzing data, and what the tradeoffs are when coming from a batch-oriented mindset.
In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. Data lakes are notoriously complex. With Materialize, you can!
Summary Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. Data lakes are notoriously complex. Your first 30 days are free!
Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data projects are notoriously complex. Data lakes are notoriously complex.
Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!
Notion: Building and scaling Notion’s data lake. Notion writes about scaling the data lake by bringing critical data ingestion operations in-house. Hudi seems to be a de facto choice for CDC data lake features. Notion migrated the insert-heavy workload from Snowflake to Hudi.
In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database. Data lakes are notoriously complex. How have the requirements and applications of NoSQL engines changed since they first became popular ~15 years ago?
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Support Data Engineering Podcast. Summary Sharing data is a simple concept, but complicated to implement well.
Summary Data lakehouse architectures are gaining popularity due to the flexibility and cost effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. Data lakes are notoriously complex. Go to dataengineeringpodcast.com/dagster today to get started.
Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!
Summary Data engineering is still a relatively new field that is going through a continued evolution as new technologies are introduced and new requirements are understood. In this episode Maxime Beauchemin returns to revisit what it means to be a data engineer and how the role has changed over the past 5 years.
He also explains why he started Decodable to address that limitation and the work that he and his team have done to let data engineers build streaming pipelines entirely in SQL. Missing data? Start trusting your data with Monte Carlo today! No more scripts, just SQL. Struggling with broken pipelines?
If you need to work with data in your cloud data lake, your on-premise database, or a collection of flat files, then give this episode a listen and then try out Presto today. If you hand a book to a new data engineer, what wisdom would you add to it?
In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.