Change data capture is a software design pattern used to capture changes to data and take corresponding action based on that change. The change to data is usually an insert, update, or delete. The corresponding action usually occurs in another system in response to the change that was made in the source system.
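A minimal sketch of the polling flavor of this pattern, assuming a hypothetical orders table with an updated_at watermark column (production log-based CDC tools such as Debezium tail the database's transaction log instead):

```python
import sqlite3
import time

def poll_changes(conn, last_seen_at):
    """Return rows changed since the last poll, using an updated_at watermark.
    Table and column names (orders, updated_at) are hypothetical."""
    cur = conn.execute(
        "SELECT id, status, updated_at FROM orders WHERE updated_at > ? ORDER BY updated_at",
        (last_seen_at,),
    )
    return cur.fetchall()

def propagate(row):
    # Stand-in for the "corresponding action" in the target system,
    # e.g. updating a search index or emitting an event.
    print(f"syncing change: {row}")

conn = sqlite3.connect("source.db")
watermark = "1970-01-01 00:00:00"
while True:
    for row in poll_changes(conn, watermark):
        propagate(row)
        watermark = max(watermark, row[2])  # advance past the newest change seen
    time.sleep(5)  # polling interval; log-based CDC avoids this lag
```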
Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].
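For context, this is roughly the kind of metadata in question — topics, partitions, and their leaders — viewed from a client, sketched here with the confluent-kafka Python AdminClient (the broker address is an assumption):

```python
# Inspect the cluster metadata Kafka tracks: topics, partitions, leaders.
from confluent_kafka.admin import AdminClient

admin = AdminClient({"bootstrap.servers": "localhost:9092"})
metadata = admin.list_topics(timeout=10)

for name, topic in metadata.topics.items():
    for pid, partition in topic.partitions.items():
        # The partition-to-leader mapping is part of the metadata that,
        # pre-KRaft, was persisted in ZooKeeper rather than in Kafka itself.
        print(f"topic={name} partition={pid} leader={partition.leader}")
```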
Should you do a master's degree in data science in Germany? Why not, but keep the following in mind. In general, studying in Germany is very practical because it doesn't cost much, unlike in the USA, for example. So if you are interested, you should first take a close look at what the corresponding master's programme actually covers.
Advanced analytics & AI techniques can help in curtailing the COVID-19 pandemic. This post describes an analytics prototype to build an early warning system for COVID-19.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Summary Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals involved and the myriad ways that they interface with the business. Dreamdata integrates data from the multitude of platforms that are used by these organizations so that they can get a comprehensive view of their customer lifecycle.
How Netflix is able to enrich VPC Flow Logs at hyper scale to provide network insight, by Hariharan Ananthakrishnan and Angela Ho. The cloud network infrastructure that Netflix utilizes today is a large distributed ecosystem that consists of specialized functional tiers and services such as DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc.
Contents: Introduction; Approach; Project overview; Engineering Design; Airflow Primer: Setup; Code and explanation (Stage 1: pg -> file -> s3; Stage 2: file -> s3 -> EMR -> s3; Stage 3: movie_review_stage, user_purchase_stage -> Redshift table -> quality check); Data Monitoring; ETL Design Review; Common Scenarios; Next Steps; Conclusion. Introduction: Starting out in data engineering can be a little intimidating, especially because data engineering involves a lot of moving parts.
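As a rough sketch of how those three stages might hang together in Airflow (assuming Airflow 2.4+; task names and bodies are illustrative placeholders, not the article's code):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def pg_to_s3(): ...        # Stage 1: dump a Postgres table to a file, upload to S3
def s3_to_emr_to_s3(): ... # Stage 2: run an EMR job over the raw files, write back to S3
def s3_to_redshift(): ...  # Stage 3: COPY staged data into Redshift, then quality-check

with DAG(
    dag_id="user_behaviour",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    stage1 = PythonOperator(task_id="pg_to_s3", python_callable=pg_to_s3)
    stage2 = PythonOperator(task_id="s3_to_emr_to_s3", python_callable=s3_to_emr_to_s3)
    stage3 = PythonOperator(task_id="s3_to_redshift", python_callable=s3_to_redshift)
    stage1 >> stage2 >> stage3  # linear dependency: each stage feeds the next
```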
Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, […].
Lots of people like notebooks, and so do I. Jupyter notebooks, for instance, are great for quickly exploring some data or trying something out. If you want to bring code into production, however, you should (and most likely have to) write standalone scripts. For anything that will actually run in production, Jupyter notebooks are not ideal.
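A minimal sketch of that notebook-to-script refactor, with invented file and column names: the exploratory cells become functions, and an explicit entry point makes the file runnable and testable.

```python
import argparse
import pandas as pd

def load(path: str) -> pd.DataFrame:
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Whatever you prototyped in the notebook, captured as a pure function.
    return df.dropna().assign(total=lambda d: d["price"] * d["quantity"])

def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    args = parser.parse_args()
    transform(load(args.input)).to_csv(args.output, index=False)

if __name__ == "__main__":
    main()
```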
Supply Chain organizations need visibility now to leverage data for making decisions and taking action, both in times of crisis and in relative stability.
Summary The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choice for high performance analytics. Swarm64 aims to change that by adding support for advanced hardware capabilities like FPGAs and optimized usage of modern SSDs. In this episode CEO and co-founder Thomas Richter discusses his motivation for creating an extension to optimize Postgres hardware usage, the benefits of running your analytics on the same […]
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
A few weeks ago when we talked about our new fundraising, we also announced we’d be kicking off Project Metamorphosis. What is Project Metamorphosis? Let me try to explain. I […].
I love working with Zeppelin notebooks. It's so simple, and you can just try things out. Working with dataframes and SparkSQL especially is a blast. What is Zeppelin? Zeppelin is a notebook tool, just like Jupyter. You can run it on a server, or on your Hadoop cluster, or wherever, and it can run Spark jobs in the background.
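The dataframe-plus-SparkSQL workflow looks roughly like this in PySpark (sample data invented; in Zeppelin the query would live in a %sql paragraph):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("zeppelin-style-demo").getOrCreate()

# Build a small dataframe and register it so SQL can see it.
df = spark.createDataFrame([("alice", 34), ("bob", 29)], ["name", "age"])
df.createOrReplaceTempView("people")

# In Zeppelin this would be a %sql paragraph; here it's spark.sql().
spark.sql("SELECT name FROM people WHERE age > 30").show()
```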
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
Teradata's Board of Directors has selected the company's next President and Chief Executive Officer: Steve McMillan. Read more from interim President and CEO, Vic Lund.
Summary There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different areas of focus. Pulsar is one of the recent entrants which has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community.
There’s an underlying pattern prevalent today in many digital marketing tools that is causing problems. Wasted time, overpaying, slow velocity, and privacy issues for your customers are some of the results of this pattern. The problem is the over-reliance on Events. Specifically, the problem is that many marketing tools live in a world where they expect to be “pushed” data, when it would be so much better if they were “pulling” data when they needed it.
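To make the push/pull distinction concrete, here is a toy sketch (the SQLite "warehouse" and event names are invented, not taken from the article):

```python
import sqlite3

# Stand-in "source of truth" a tool could pull from on demand.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
warehouse.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("u1", "signup"), ("u1", "purchase"), ("u2", "signup")],
)

# Push model: the tool passively accumulates whatever happens to be sent,
# so its view is only as complete as the events it actually received.
pushed_counts: dict[str, int] = {}
def on_event(user_id: str, event: str) -> None:  # webhook-style handler
    pushed_counts[user_id] = pushed_counts.get(user_id, 0) + 1

on_event("u2", "signup")  # one event arrived; the purchase never did

# Pull model: query the source of truth exactly when the answer is needed.
def purchasers() -> list[str]:
    rows = warehouse.execute(
        "SELECT DISTINCT user_id FROM events WHERE event = 'purchase'"
    )
    return [r[0] for r in rows]

print(purchasers())  # ['u1'] -- fresh and complete, no event plumbing
```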
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features with plenty of example code. You'll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your […]
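As a taste of the basics such an eBook covers, a minimal DAG might look like this, assuming Airflow 2.x with the TaskFlow API (the task logic is placeholder):

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(start_date=datetime(2024, 1, 1), schedule="@hourly", catchup=False)
def example_pipeline():
    @task
    def extract() -> list[int]:
        return [1, 2, 3]  # placeholder for a real extraction step

    @task
    def load(records: list[int]) -> None:
        print(f"loaded {len(records)} records")

    load(extract())  # the dependency is inferred from the data handoff

example_pipeline()
```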
Recently, I’ve been looking at what’s possible with streams of Wi-Fi packet capture (pcap) data. I was prompted after initially setting up my Raspberry Pi to capture pcap data and […].
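As a flavor of what poking at such captures can look like, here is a rough scapy sketch (the pcap filename is hypothetical, this is not the article's actual code, and real 802.11 analysis needs monitor-mode captures):

```python
from collections import Counter
from scapy.all import rdpcap
from scapy.layers.dot11 import Dot11

packets = rdpcap("wifi_capture.pcap")  # hypothetical capture file

# Count Wi-Fi frames by transmitter MAC address (Dot11 addr2).
sources = Counter(
    pkt[Dot11].addr2
    for pkt in packets
    if pkt.haslayer(Dot11) and pkt[Dot11].addr2
)

for mac, count in sources.most_common(5):
    print(f"{mac}: {count} frames")
```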
Should companies go for a full-blown big data/data science platform right away? In my opinion, you should first look at the stage you are in. Are you in the proof-of-concept phase, where you are just working with offline data and proving your concepts? Or are you in the MVP phase, or creating an MVP, where you are bringing in the first users, the first customers?
Operationalizing world class analytics into day-to-day processes can help solve some of the greatest challenges in the telecommunications industry. Find out more.
Summary Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a platform built to provide a unified set of tooling for managing the full lifecycle of data in large businesses. By reducing the barrier to entry with a graphical interface for defining data transformations and analysis, it makes it easier to bring the domain experts into the process.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Python programming is a critical skill for data engineers. When it comes to working with data, there’s a powerful library that can increase your code’s efficiency dramatically, especially when you’re working with large datasets: NumPy. That’s why we’ve added a NumPy for Data Engineers course to our Data Engineering path!
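For a flavor of what that efficiency looks like in practice, here is a small, self-contained comparison (the array size and the 1.2 multiplier are arbitrary):

```python
import numpy as np

prices = np.random.default_rng(0).uniform(1.0, 100.0, size=1_000_000)

# Pure-Python loop: one interpreter round-trip per element.
total_loop = 0.0
for p in prices:
    total_loop += p * 1.2

# Vectorized: one call over the whole array, with the loop pushed into C.
total_vec = (prices * 1.2).sum()

assert np.isclose(total_loop, total_vec)  # same result, far less time
```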
Using a powerful, event-driven application can help you unlock insights contained in the event streams of your business. Before we get into the technology, let’s go over some questions you […].
What are the job opportunities in the field of Data Science? Several, of course! They can be mapped out well along the four phases of a Data Science project. In this blog post, only two of the four phases will be discussed. But let's start from the beginning. The four phases are: proof of concept, MVP, validation, and scaling. The proof-of-concept (PoC) phase: starting here, you could say: okay, I'm bringing in a research data scientist.
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
The COVID-19 pandemic has brought with it a Perfect Storm of disruption that impacts all of us -- from our health to the economy to the supply chain. Read more.
Today’s the day! There’s much buzz & excitement as we FINALLY get to see Azure Synapse Analytics in public preview, ready for us all to get our hands on it. There’s a raft of other announcements that come hand in hand with it too. What’s that? You thought Azure Synapse Analytics was already available? You’ve been using it all year and don’t see what the fuss is about?
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
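For readers unfamiliar with the two features named, here is a minimal sketch of what they look like, assuming Airflow 2.4+ (the Dataset URI and task logic are placeholders, not the webinar's code):

```python
from datetime import datetime
from airflow import Dataset
from airflow.decorators import dag, task

raw_files = Dataset("s3://example-bucket/raw/")  # hypothetical dataset URI

# Data-driven scheduling: this DAG runs when the dataset is updated
# by a producer task, rather than on a fixed clock.
@dag(start_date=datetime(2024, 1, 1), schedule=[raw_files], catchup=False)
def process_when_data_lands():
    @task
    def list_files() -> list[str]:
        return ["a.csv", "b.csv", "c.csv"]  # placeholder discovery step

    @task
    def process(path: str) -> None:
        print(f"processing {path}")

    # Dynamic task mapping: one mapped task instance per file, decided at runtime.
    process.expand(path=list_files())

process_when_data_lands()
```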