Sat. Feb 29, 2020 - Fri. Mar 06, 2020

Kafka Connect Elasticsearch Connector in Action

Confluent

The Elasticsearch sink connector helps you integrate Apache Kafka® and Elasticsearch with minimum effort. You can take data you’ve stored in Kafka and stream it into Elasticsearch to then be […].

Kafka 121

Analyzing GDPR Fines – who are largest violators?

KDnuggets

Fines under the GDPR have been rolling in since its inception in 2018. This article investigates the largest penalty recipients by country, the amounts, and fines against private individuals.

IT 114

Easier Stream Processing On Kafka With ksqlDB

Data Engineering Podcast

Summary: Building applications on top of unbounded event streams is a complex endeavor, requiring careful integration of multiple disparate systems that were engineered in isolation. The ksqlDB project was created to address this state of affairs by building a unified layer on top of the Kafka ecosystem for stream processing. Developers can work with the SQL constructs that they are familiar with while automatically getting the durability and reliability that Kafka offers.

Kafka 100

How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience

Netflix Tech

By Ben Sykes. Continue reading on Netflix TechBlog ».

Kafka 98

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Mock APIs vs. Real Backends – Getting the Best of Both Worlds

Confluent

When building API-driven web applications, there is one key metric that engineering teams should minimize: the blocked factor. The blocked factor measures how much time developers spend in the following […].

Top February Stories: The Death of Data Scientists – will AutoML replace them?

KDnuggets

Also: Learning from 3 big Data Science career mistakes; Leaders, Changes, and Trends in Gartner 2020 MQ Data Science and Machine Learning Platforms; Why Did I Reject a Data Scientist Job; Free Mathematics Courses for Data Science & Machine Learning.

More Trending

Introducing Dispatch

Netflix Tech

By Kevin Glisson, Marc Vilanova, Forest Monsen. Netflix is pleased to announce the open-source release of our crisis management orchestration framework: Dispatch! Okay, but what is Dispatch? Put simply, Dispatch is: All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!

On Spark, Hive, and Small Files: An In-Depth Look at Spark Partitioning Strategies

Airbnb Tech

One of the most common ways to store results from a Spark job is by writing the results to a Hive table stored on HDFS. While in theory, managing the output file count from your jobs should be simple, in reality, it can be one of the more complex parts of your pipeline. Author: Zachary Ennenga. Background: At Airbnb, our offline data processing ecosystem contains many mission-critical, time-sensitive jobs — it is essential for us to maximize the stabilit

Best Practices for Analyzing Kafka Event Streams

Rockset

Apache Kafka has seen broad adoption as the streaming platform of choice for building applications that react to streams of data in real time. In many organizations, Kafka is the foundational platform for real-time event analytics, acting as a central location for collecting event data and making it available in real time. While Kafka has become the standard for event streaming, we often need to analyze and build useful applications on Kafka data to unlock the most value from event streams.

Kafka 40

How to Repurpose Successful Database Techniques inside Teradata Vantage

Teradata

Learn how Teradata's hashing algorithm is used to enhance the performance and ease-of-use of the Advanced SQL Engine.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Open-Sourcing riskquant, a library for quantifying risk

Netflix Tech

Netflix has a program in our Information Security department for quantifying the risk of deliberate (attacker-driven) and accidental… Continue reading on Netflix TechBlog ».