5 Free Courses to Master Data Engineering
KDnuggets
NOVEMBER 30, 2023
Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company.
Data Engineering Podcast
MAY 18, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. My thanks to the team at Code Comments for their support.
Data Engineering Podcast
APRIL 14, 2024
The default association with the term "database" is relational engines, but non-relational engines are also used quite widely. In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database.
Data Engineering Podcast
FEBRUARY 18, 2024
In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the virtually unlimited storage and scalability of data lakes.
KDnuggets
DECEMBER 6, 2023
This week on KDnuggets: Discover GitHub repositories from machine learning courses, bootcamps, books, tools, interview questions, cheat sheets, MLOps platforms, and more to master ML and secure your dream job • Data engineers must prepare and manage the infrastructure and tools necessary for the whole data workflow in a data-driven company • And much, (..)
Snowflake
APRIL 17, 2024
This traditional SQL-centric approach often challenged data engineers working in a Python environment, requiring context-switching and limiting the full potential of Python’s rich libraries and frameworks. To get started, explore the comprehensive API documentation, which will guide you through every step.
Data Engineering Podcast
JANUARY 30, 2022
Summary: Pandas is a powerful tool for cleaning, transforming, manipulating, or enriching data, among many other potential uses. As a result, it has become a standard tool for data engineers across a wide range of applications. What are the main tasks that you have seen Pandas used for in a data engineering context?
Data Engineering Podcast
JUNE 30, 2024
Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for transparency and ownership in data systems.
Data Engineering Podcast
DECEMBER 24, 2023
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Data Engineering Weekly
NOVEMBER 24, 2024
Editor’s Note: Launching Data & Gen-AI Courses in 2025. I can’t believe DEW is nearing its 200th edition. What I started as a fun hobby has become one of the top-rated newsletters in the data engineering industry. We are planning many exciting product lines to trial and launch in 2025.
Data Engineering Podcast
APRIL 7, 2024
Summary: Maintaining a single source of truth for your data is the biggest challenge in data engineering. Different roles and tasks in the business need their own ways to access and analyze the data in the organization.
Data Engineering Podcast
NOVEMBER 12, 2023
Generative AI has accelerated the ability of developer tools to provide useful suggestions that speed up the work of engineers. Tabnine is one of the main platforms offering an AI-powered assistant for software engineers. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold.
Data Engineering Podcast
APRIL 21, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.
ThoughtSpot
SEPTEMBER 6, 2023
The pathway from ETL to actionable analytics can often feel disconnected and cumbersome, leading to frustration for data teams and long wait times for business users. And even when we manage to streamline the data workflow, those insights aren’t always accessible to users unfamiliar with antiquated business intelligence tools.
Data Engineering Podcast
AUGUST 28, 2022
Summary: The dream of every engineer is to automate all of their tasks. For data engineers, this is a monumental undertaking. Orchestration engines are one step in that direction, but they are not a complete solution. The only thing worse than having bad data is not knowing that you have it.
Data Engineering Podcast
JUNE 23, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the elements of Fabric that were engineered specifically for the service? What are the benefits of embedding Copilot into the data engine?
Data Engineering Podcast
FEBRUARY 4, 2024
RisingWave is a database engine that was created specifically for stream processing, with S3 as the storage layer. In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable.
Data Engineering Weekly
NOVEMBER 3, 2024
The blog emphasizes the importance of starting with a clear client focus to avoid over-engineering and ensure user-centric development. Sampling is an obvious strategy for managing data size, but the layered approach and the dynamic inclusion of dependencies are key techniques I learned from the case study.
Data Engineering Podcast
MARCH 17, 2024
Summary: A significant portion of data workflows involves storing and processing information in database engines. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data.
Data Engineering Podcast
SEPTEMBER 17, 2023
Summary: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. When is JSON-LD the wrong choice?
Data Engineering Podcast
FEBRUARY 25, 2024
Summary: Building a database engine requires a substantial amount of engineering effort and time investment. When Paul Dix decided to rewrite the InfluxDB engine, he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process.
Data Engineering Podcast
MARCH 24, 2024
In this episode Pete Hunt, CEO of Dagster Labs, outlines these new capabilities, how they reduce the burden on data teams, and the increased collaboration that they enable across teams and business units. Go to dataengineeringpodcast.com/dagster today to get started.
Data Engineering Podcast
JUNE 16, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats, including Apache Iceberg, Hive, and Delta Lake.
Data Engineering Podcast
MARCH 3, 2024
Colleen Tartow has worked across all stages of the data lifecycle, and in this episode she shares her hard-earned wisdom about how to conduct an AI program for your organization. Visit dataengineeringpodcast.com/data-council and use code dataengpod20 to register today!
Data Engineering Podcast
MAY 12, 2024
Summary: Building a data platform is a substantial engineering endeavor. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat.
Data Engineering Podcast
JUNE 9, 2024
Summary: Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling.
Data Engineering Podcast
JANUARY 7, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. What are the open questions today in the technical scalability of data engines?
Data Engineering Podcast
MARCH 31, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Datafold has recently launched data replication testing, providing ongoing validation for source-to-target replication.
Data Engineering Podcast
JANUARY 21, 2024
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Data Engineering Podcast
JUNE 2, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight.
Data Engineering Weekly
SEPTEMBER 29, 2024
Airbnb writes about Sandcastle, an Airbnb-internal prototyping platform that enables data scientists, engineers, and product managers to bring data/AI ideas to life. Grab: Enabling conversational data discovery with LLMs at Grab.
Data Engineering Podcast
JANUARY 28, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. A query engine is useless without data to analyze. What are the data acquisition paths/sources that you are designed to work with?
Data Engineering Podcast
APRIL 28, 2024
In this episode he explains the data collection and preparation process, the collection of model types and sizes that work together to power the experience, and how to incorporate it into your workflow to act as a second brain.
Data Engineering Podcast
FEBRUARY 11, 2024
Summary: Sharing data is a simple concept, but complicated to implement well. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process.
Data Engineering Podcast
MARCH 10, 2024
Summary: Data lakehouse architectures are gaining popularity due to the flexibility and cost-effectiveness that they offer. The link that bridges the gap between data lake and warehouse capabilities is the catalog. What is involved in integrating Nessie into a given data stack?
Data Engineering Podcast
MAY 5, 2024
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. Go to dataengineeringpodcast.com/dagster today to get started.
Data Engineering Podcast
NOVEMBER 26, 2023
In this episode Tobias Macey shares his thoughts on the challenges that he is facing as he prepares to build the next set of architectural layers for his data platform, enabling a larger audience to start accessing the data managed by his team.
Data Engineering Podcast
DECEMBER 3, 2023
Andrei Tserakhau has dedicated his career to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles.
Data Engineering Weekly
DECEMBER 25, 2023
Welcome to another insightful edition of Data Engineering Weekly. As we approach the end of 2023, it's an opportune time to reflect on the key trends and developments that have shaped the field of data engineering this year. In conclusion, 2023 has been a year of significant developments and shifts in data engineering.
Data Engineering Podcast
OCTOBER 22, 2021
One of the driving forces for that change has been the rise of analytics engineering powered by dbt. What have been the most challenging engineering problems that you have dealt with?
Data Engineering Podcast
NOVEMBER 19, 2023
Summary: The dbt project has become overwhelmingly popular across analytics and data engineering teams. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management.
Data Engineering Podcast
NOVEMBER 5, 2023
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Meltano
OCTOBER 5, 2022
Interested in becoming a data engineer? The need for data experts in the U.S. job market is expected to grow by 22% this decade, and according to LinkedIn’s 2020 report, data engineer is the 8th fastest-growing job today. But what is data engineering exactly, and what does a data engineer do?
Data Engineering Weekly
OCTOBER 30, 2022
Data Engineering Weekly is brought to you by RudderStack. RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. The highlight is that 59% of respondents think data catalogs are sometimes helpful.