Summary Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. Can you describe your experiences with Kafka?
It addresses many of Kafka's challenges in analytical infrastructure. The combination of Kafka and Flink is not a perfect fit for real-time analytics; the integration of Kafka and Lakehouse is very shallow. How do you compare Fluss with Apache Kafka? Fluss and Kafka differ fundamentally in design principles.
In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. Can you describe what Meroxa is and the story behind it? How have the focus and goals of the platform and company evolved over the past 2 years?
The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers. How is ksqlDB architected?
Scaling the volume of events that can be processed in real-time can be challenging, so Paul Brebner from Instaclustr set out to see how far he could push Kafka and Cassandra for this use case. Can you start by describing the problem that you were trying to solve and the requirements that you were aiming for?
Summary The landscape of data management and processing is rapidly changing and evolving. This is a useful conversation to gain a macro perspective on where businesses are looking to improve their capabilities to work with data. If you hand a book to a new data engineer, what wisdom would you add to it?
Ingest data more efficiently and manage costs: For data managed by Snowflake, we are introducing features that help you access data easily and cost-effectively. This reduces the overall complexity of getting streaming data ready to use: Simply create an external access integration with your existing Kafka solution.
IBM and Cloudera’s common goal is to accelerate data-driven decision making for enterprise customers, working on defining and executing the best solution for each customer. You can now elevate your data potential and activate AI’s capabilities through the synergic integration between IBM watsonx and Cloudera.
Understand how BigQuery inserts, deletes and updates: Once again Vu took time to deep dive into BigQuery internals, this time to explain how data management is done. Pandera, a data validation library for dataframes, now supports Polars. Unlocking Kafka's potential: tackling tail latency with eBPF.
In this episode David Yaffe and Johnny Graettinger share the story behind the business and technology and how you can start using it today to build a real-time data lake without all of the headache. Can you describe what Estuary is and the story behind it? Stream processing technologies have been around for about a decade.
In this episode Yingjun Wu explains how it is architected to power analytical workflows on continuous data flows, and the challenges of making it responsive and scalable. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex.
The DataFlow platform has established a leading position in the data streaming market by unlocking the combined value and synergies of Apache NiFi, Apache Kafka and Apache Flink. We recently delivered all three of these streaming capabilities as cloud services through Cloudera Data Platform (CDP) Data Hub on AWS and Azure.
Meta: How Meta understands data at scale Meta describes its data management practices as adopting a “shift-left” approach, integrating data schematization and annotations early in product development. TIL about the idle stream problem.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team.
Summary One of the critical components for modern data infrastructure is a scalable and reliable messaging system. Publish-subscribe systems have been popular for many years, and recently stream oriented systems such as Kafka have been rising in prominence.
Summary Kafka has become a de facto standard interface for building decoupled systems and working with streaming data. To make the benefits of the Kafka ecosystem more accessible and reduce the operational burden, Alexander Gallego and his team at Vectorized created the Red Panda engine.
In light of this, we’ll share an emerging machine-to-machine (M2M) architecture pattern in which MQTT, Apache Kafka®, and Scylla all work together to provide an end-to-end IoT solution. The explosive number of devices generating, tracking and sharing data across a variety of networks is overwhelming to most data management solutions.
When you are processing that data in multiple systems it can be difficult to ensure that they all have an accurate representation of that schema, which is why Confluent has built a schema registry that plugs into Kafka. Conversely, what would be involved in using a storage backend other than Kafka?
Pulsar is a well engineered and robust platform for building the core of any system that relies on durable access to easily scalable streams of data. What is Pulsar’s role in the lifecycle of data and where does it fit in the overall ecosystem of data tools? Why is streaming data such an important capability?
Observability and governance of streaming data requires a different approach than batch oriented workflows, and this episode does an excellent job of outlining the complexities involved and how to address them. If you hand a book to a new data engineer, what wisdom would you add to it? Redis and Pulsar)?
The concept of the data mesh architecture is not entirely new; Its conceptual origins are rooted in the microservices architecture, its design principles (i.e., need to integrate multiple “point solutions” used in a data ecosystem) and organization reasons (e.g., The Value Proposition of CDF in Data Mesh Implementations.
If you have ever struggled with implementing your own change data capture pipeline, or understanding when it would be useful, then this episode is for you. You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management.
It lets you describe data in more complex terms and make predictions. AI-powered data engineering solutions make it easier to streamline the data management process, which helps businesses find useful insights with little to no manual work. This will help make better analytics predictions and improve data management.
In order to enable connected manufacturing and emerging IoT use cases, ECC needs a solution that can handle all types of diverse data structures and schemas from the edge, normalize the data, and then share it with any type of data consumer including Big Data applications. STEP 4: Capture data from Apache Kafka streams.
Druid Data Ingestion Our pipeline for the two methods of ingesting data into Druid—the upper process is for batch ingestion, the lower process is for real-time ingestion. Then, they needed to define an ingestion specification which tells Druid how to process the data being ingested. This was our main form of ingestion.
In this episode ThreatStack’s director of operations, Pete Cheslock, and senior infrastructure security engineer, Patrick Cable, discuss the data infrastructure that supports their platform, how they capture and process the data from client systems, and how that information can be used to keep your systems safe from attackers.
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. What is Alooma and what is the origin story? How is the Alooma platform architected?
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. What are some of the primary ways that Flink is used?
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode.
They also explained how it fits in the broad landscape of data tools, the interesting and challenging aspects of the project, and how to build new extensions. Where does it sit in the broader landscape of data tools? How do you manage versioning and backup of data flows, as well as promoting them between environments?
He also discusses what you need to know to get it deployed and keep it running in a production environment and how it fits into the overall data ecosystem. How does it compare to some of the other streaming frameworks such as Flink, Kafka, or Storm? Can you start by explaining what Spark is? Who uses Spark?
Preamble Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out Linode. How do you represent a stream on-disk?
This is a fascinating conversation with a lot of useful lessons for anyone who wants to understand the operational aspects of Pulsar and the benefits that it can provide to data workloads. How do you handle migrating existing projects, particularly if they are using Kafka currently?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Can you start by sharing some of your experiences with data migration projects?
Interview Introduction How did you first get involved in the area of data management? Regulatory challenges of processing other people’s data What does your data pipelining architecture look like? What are the most challenging aspects of building a general purpose data management environment?
In this episode Vinoth shares the history of the project, how its architecture allows for building more frequently updated analytical queries, and the work being done to add a more polished experience to the data lake paradigm. Interview Introduction How did you get involved in the area of data management?
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.
Learn how Confluent’s cloud-native Apache Kafka engine stood out from other data management systems with its uniquely elastic, reliable, and cost-efficient design.
In this episode he describes how Presto is architected, how you can use it for your analytics, and the work that he is doing at Starburst Data. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API you’ve got everything you need to run a bullet-proof data platform.
In this episode Yoni Iny, CTO of Upsolver, discusses the various components that are necessary for a successful data lake project, how the Upsolver platform is architected, and how modern data lakes can benefit your organization. There are a lot of opinions on both sides of the data lake argument.
In this episode founders DeVaris Brown and Ali Hamidi explain how their tenure at Heroku informed their approach to making data integration self service, how the platform is architected, and how they have designed their system to adapt to the continued evolution of the data ecosystem. What do you have planned for the future?