Editor’s Note: Launching Data & Gen-AI courses in 2025. I can’t believe DEW is approaching its 200th edition. What I started as a fun hobby has become one of the top-rated newsletters in the data engineering industry. The blog narrates a few examples of Pipe Syntax in comparison with the equivalent SQL queries.
The Critical Role of AI Data Engineers in a Data-Driven World How does a chatbot seamlessly interpret your questions? The answer lies in unstructured data processing—a field that powers modern artificial intelligence (AI) systems. How does a self-driving car understand a chaotic street scene?
Summary Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. Can you describe what RisingWave is and the story behind it?
Editor’s Note: Data Council 2025, Apr 22-24, Oakland, CA. Data Council has always been one of my favorite events to connect with and learn from the data engineering community. BVP: Roadmap: Data 3.0
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Summary A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Data lakes are notoriously complex. Visit dataengineeringpodcast.com/data-council today. Your first 30 days are free!
In this post, you will gain insight into common business use cases for large-scale text data analytics. You’ll also discover why deploying batch LLM pipelines can be challenging and how Snowflake has optimized Snowflake Cortex AI for batch inference via SQL functions. What are common batch LLM inference jobs?
In today’s data-driven world, developer productivity is essential for organizations to build effective and reliable products, accelerate time to value, and fuel ongoing innovation. Dive in to experience how the enhanced Python API streamlines your data workflows and unlocks the full potential of Python within Snowflake.
We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data. We are committed to building the data control plane that enables AI to reliably access structured data from across your entire data lineage.
Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. With Materialize, you can! Hex brings everything together.
Summary Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization. Your first 30 days are free!
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.
In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform to enable a larger audience to start accessing the data being managed by his team. With Materialize, you can! What is Data Science? What are the roles and responsibilities of a Data Engineer? And many more.
Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data projects are notoriously complex.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the open questions today in technical scalability of data engines?
Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication. Your first 30 days are free!
In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain. Go to dataengineeringpodcast.com/dagster today to get started.
In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Support Data Engineering Podcast. Summary Sharing data is a simple concept, but complicated to implement well.
You might even think of effective data transformation like a powerful magnet that draws the needle from the stack, leaving the hay behind. In this blog post, we’ll explore fundamental concepts, intermediate strategies, and cutting-edge techniques that are shaping the future of data engineering.
In this episode Andrey Korchack, CTO of fintech startup Monite, discusses the complexities of designing and implementing a data platform in that sector. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. You shouldn't have to throw away the database to build with fast-changing data. It’s the only true SQL streaming database built from the ground up to meet the needs of modern data products.
In this episode Alex Merced explains how the branching and merging functionality in Nessie allows you to use the same versioning semantics for your data lakehouse that you are used to from Git. Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!
In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
Google: SQL Has Problems - We Can Fix Them - Pipe Syntax In SQL. It was a good weekend read about the proposed pipe syntax in SQL, which is similar to Unix pipes in terms of its core concept: sequential data flow and transformation. Unix pipes typically represent a physical flow of data between processes.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight.
Summary The flexibility of software-oriented data workflows is useful for fulfilling complex requirements, but for simple and repetitious use cases it adds significant complexity. In this episode Satish Jayanthi explains how he is building a framework to allow enterprises to move quickly while maintaining guardrails for data workflows.
In this episode Abe Gong brings his experiences with the Great Expectations project and community to discuss the technical and organizational considerations involved in applying these constraints to your data workflows. Missing data? Struggling with broken pipelines? Stale dashboards?
Interested in becoming a data engineer? The need for data experts in the U.S. job market is expected to grow by 22% in this decade, and according to LinkedIn’s 2020 report, data engineer is listed as the 8th fastest-growing job today. But what is data engineering exactly and what does a data engineer do?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. Missing data?
Data Engineering Weekly Is Brought to You by RudderStack. RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Pipelines for data in motion can quickly turn into DAG hell.
This demonstrates how in-demand Microsoft Certified Data Engineers are becoming. They are moving their servers and on-premises data to Azure Cloud. What does all of this mean for Data Engineering professionals? Who is an Azure Data Engineer? Azure Data Engineers work with these and other solutions.