Sat. May 09, 2020 - Fri. May 15, 2020

Change Data Capture Using Debezium Kafka and Pg

Start Data Engineering

Change data capture is a software design pattern used to capture changes to data and take a corresponding action based on each change. The change is usually a create, update, or delete. The corresponding action usually occurs in another system, in response to the change made in the source system.
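As a rough illustration of the pattern described above, the sketch below applies a stream of change events from a source system to a copy kept in another system. The event shape (`op`, `key`, `after`) and the `apply_change` helper are hypothetical, chosen for this example only; they are not the format produced by Debezium or any other specific tool.

```python
from typing import Any, Dict

# Hypothetical event shape, for illustration only:
#   op    -> "c" (create), "u" (update) or "d" (delete)
#   key   -> primary key of the changed row
#   after -> row contents after the change (None for deletes)
ChangeEvent = Dict[str, Any]


def apply_change(replica: Dict[int, Dict[str, Any]], event: ChangeEvent) -> None:
    """Apply one change event from the source system to a downstream copy."""
    op = event["op"]
    key = event["key"]
    if op in ("c", "u"):
        # Creates and updates both overwrite the row with its latest state.
        replica[key] = event["after"]
    elif op == "d":
        # Deletes remove the row; a missing key is ignored.
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op!r}")


# A small, made-up sequence of changes captured from the source system.
events = [
    {"op": "c", "key": 1, "after": {"name": "Ada"}},
    {"op": "u", "key": 1, "after": {"name": "Ada L."}},
    {"op": "c", "key": 2, "after": {"name": "Bob"}},
    {"op": "d", "key": 2, "after": None},
]

replica: Dict[int, Dict[str, Any]] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

In a real pipeline the events would typically arrive from a log or message broker rather than an in-memory list, and the downstream action could be anything from updating a cache to writing to a warehouse.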

Kafka 246

Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency

Confluent

Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].

Kafka 145

Drafting Your Data Pipelines

Team Data Science

With careful consideration and some research into your market, the choices you need to make become narrower and clearer. I can now begin drafting my data ingestion/streaming pipeline without being overwhelmed. For a quick recap: you can find the first blog post here, where I learned which tech is most in demand in Toronto: [link]. The second blog post is here, where I learned which Toronto industries need data engineers the most: [link]. The pipeline proposal: I'll be creating several pipelines in

COVID-19: The Perfect Storm

Teradata

The COVID-19 pandemic has brought with it a Perfect Storm of disruption that impacts all of us -- from our health to the economy to the supply chain. Read more.

IT 105

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Data Engineering Podcast

Summary: There have been several generations of platforms for managing streaming data, each with its own strengths, weaknesses, and areas of focus. Pulsar is a recent entrant that has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community.

Cloud 100

Building a Telegram Bot Powered by Apache Kafka and ksqlDB

Confluent

Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, […].

Kafka 141

More Trending

How China is Using Advanced Analytics During the COVID-19 Pandemic

Teradata

Learn how advanced analytics are being used in China amidst the COVID-19 pandemic to help combat the spread of coronavirus now and in the future.

59

Getting Started - Installing Apache Superset

Preset

Setting up Apache Superset for the first time can be difficult. Here's an easy way to accomplish the task!

40

From Eager to Smarter in Apache Kafka Consumer Rebalances

Confluent

Everyone wants their infrastructure to be highly available, and ksqlDB is no different. But crucial properties like high availability don’t come without a thoughtful, rigorous design. We thought hard about […].

Kafka 111

How Akka Typed Incentivizes Writing Good Code

Rock the JVM

Explore how Akka Typed integrates good practices directly into the API

Coding 52

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Emulate Your Heroes with Data… and Vantage on AWS

Teradata

Teradata's top-ranked analytic software capabilities, which market-leading companies have been using for years, are available right now on Amazon Web Services (AWS). Learn more.

AWS 52

Announcing ksqlDB 0.9.0

Confluent

We’re pleased to announce the release of ksqlDB 0.9.0! This version includes support for multi-join statements, enhanced LIKE expressions, and a host of usability improvements. We’ll go through a few […].

Process 96