Sat. May 04, 2019 - Fri. May 10, 2019

Apache Kafka Data Access Semantics: Consumers and Membership

Confluent

Every developer who uses Apache Kafka® has used a Kafka consumer at least once. Although it is the simplest way to subscribe to and access events from Kafka, behind the scenes, Kafka consumers handle tricky distributed systems challenges like data consistency, failover and load balancing. Luckily, Kafka’s consuming model is quite easy to understand.

Kafka 111

Using FoundationDB As The Bedrock For Your Distributed Systems

Data Engineering Podcast

Summary: The database market continues to expand, offering systems that are suited to virtually every use case. But what happens if you need something customized to your application? FoundationDB is a distributed key-value store that provides the primitives that you need to build a custom database platform. In this episode Ryan Worl explains how it is architected, how to use it for your applications, and provides examples of system design patterns that can be built on top of it.

Systems 100

8 Places to Visit in Denver While Attending Teradata Universe 2019

Teradata

Heading to Teradata Universe 2019? Camille Schmidt lists the "8 Places to Visit in Denver" while attending the flagship conference.

81

A 5D model to assess your IoT readiness

Cloudera

The number one challenge enterprises face with their IoT implementations is the inability to measure whether they are succeeding. Most enterprises start an IoT initiative without first assessing whether they have the capability to complete it. Even if they complete it, they lack the ability to identify success metrics and correlate them with key business goals.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Journey to Event Driven – Part 4: Four Pillars of Event Streaming Microservices

Confluent

So far in this series, we have recognized that by going back to first principles, we have a new foundation to work with. Event-first thinking enables us to build a new atomic unit: the event. Storing events in a stream and connecting streams via stream processors provide a generic, data-centric, distributed application runtime that you can use to build ETL, event streaming applications, applications for recording metrics and anything else that has a real-time data requirement.

Kafka 94

Back-Pressure Strategy for a Sharded Akka Cluster

Zalando Engineering

AWS SQS polling from a sharded Akka Cluster running on Kubernetes. NOTE: This blog post requires the reader to have prior knowledge of AWS SQS, Akka Actors and Akka Cluster Sharding. My last post introduced Akka Cluster Sharding as a Distributed Cache running on Kubernetes. As that proof of concept (PoC) proved promising, we started building a high-throughput and low-latency system based on the experience gained.

AWS 52

More Trending

OCR Algorithm: Improve and Automate Business Processes

InData Labs

Mid-size and large businesses have massive amounts of printed documents in daily use, among them invoices, receipts, corporate documents, reports and media releases. Millions of them may be handwritten, which makes them understandable for humans but difficult for machines to read. Basic Concept of OCR: Optical character recognition (OCR) algorithms allow computers.

Dawn of Kafka DevOps: Managing Multi-Cluster Kafka Connect and KSQL with Confluent Control Center

Confluent

In anything but the smallest deployment of Apache Kafka®, there are often going to be multiple clusters of Kafka Connect and KSQL. Kafka Connect is used for building event streaming data pipelines between upstream and downstream systems with Kafka, and KSQL is used for building stream processing applications declared in a SQL-like language. People run multiple clusters of these for various reasons, including: Resource isolation.

Kafka 89