May, 2020

article thumbnail

Change Data Capture Using Debezium Kafka and Pg

Start Data Engineering

Change data capture is a software design pattern used to capture changes to data and take corresponding action based on that change. The change to data is usually one of read, update or delete. The corresponding action usually is supposed to occur in another system in response to the change that was made in the source system.

Kafka 246
article thumbnail

Apache Kafka Needs No Keeper: Removing the Apache ZooKeeper Dependency

Confluent

Currently, Apache Kafka® uses Apache ZooKeeper™ to store its metadata. Data such as the location of partitions and the configuration of topics are stored outside of Kafka itself, in a […].

Kafka 145
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Tips on Data Science Masters in Germany

Team Data Science

Should you do a masters degree in data science in Germany? Why not, but keep the following in mind! In general, it is very, very practical in Germany because it doesn't cost a lot of money to study. Not like for example in the USA or something like that. So if you are interested in it, you should first think about what the corresponding Master's programme is about.

article thumbnail

COVID-19: Risk Analytics for Building an Early Warning System

Teradata

Advanced analytics & AI techniques can help in curtailing the COVID-19 pandemic. This post describes an analytics prototype to build an early warning system for COVID-19.

Systems 118
article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Mapping The Customer Journey For B2B Companies At Dreamdata

Data Engineering Podcast

Summary Gaining a complete view of the customer journey is especially difficult in B2B companies. This is due to the number of different individuals involved and the myriad ways that they interface with the business. Dreamdata integrates data from the multitude of platforms that are used by these organizations so that they can get a comprehensive view of their customer lifecycle.

article thumbnail

Hyper Scale VPC Flow Logs enrichment to provide Network Insight

Netflix Tech

How Netflix is able to enrich VPC Flow Logs at Hyper Scale to provide Network Insight By Hariharan Ananthakrishnan and Angela Ho The Cloud Network Infrastructure that Netflix utilizes today is a large distributed ecosystem that consists of specialized functional tiers and services such as DirectConnect, VPC Peering, Transit Gateways, NAT Gateways, etc.

AWS 61

More Trending

article thumbnail

Building a Telegram Bot Powered by Apache Kafka and ksqlDB

Confluent

Imagine you’ve got a stream of data; it’s not “big data,” but it’s certainly a lot. Within the data, you’ve got some bits you’re interested in, and of those bits, […].

Kafka 141
article thumbnail

Jupyter Notebooks or Standalone Scripts?

Team Data Science

Lot's of people like notebooks and so do I. Jupyter Notebooks for instance, are great to quickly explore some data or try something out. If you want to bring code into production however, you should or most likely, have to write standalone scripts. If you want to create something for production and then do it in production, Jupiter notebooks are not ideal.

Coding 130
article thumbnail

How to Balance Efficiency and Risk in Your Supply Chain

Teradata

Supply Chain organizations need visibility now to leverage data for making decisions and taking action, both in times of crisis and in relative stability.

Data 111
article thumbnail

Power Up Your PostgreSQL Analytics With Swarm64

Data Engineering Podcast

Summary The PostgreSQL database is massively popular due to its flexibility and extensive ecosystem of extensions, but it is still not the first choice for high performance analytics. Swarm64 aims to change that by adding support for advanced hardware capabilities like FPGAs and optimized usage of modern SSDs. In this episode CEO and co-founder Thomas Richter discusses his motivation for creating an extension to optimize Postgres hardware usage, the benefits of running your analytics on the same

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

An Introduction to Monads in Scala

Rock the JVM

A Scala tutorial on Monads that starts with practical needs and builds up from scratch: derive the monad patterns (laws) with no assumptions

Scala 52
article thumbnail

What Does It Mean for a Column to Be Indexed

Start Data Engineering

When optimizing queries on a database table, most developers tend to just create an index on the field to be queried.

IT 130
article thumbnail

Project Metamorphosis Part 1: Elastic Apache Kafka Clusters in Confluent Cloud

Confluent

A few weeks ago when we talked about our new fundraising, we also announced we’d be kicking off Project Metamorphosis. What is Project Metamorphosis? Let me try to explain. I […].

Project 124
article thumbnail

How to develop Spark applications with Zeppelin notebooks

Team Data Science

I love working with Zeppelin notebooks. Its so simple and you can just try something out. Especially working with dataframes and SparkSQL is a blast. What is a Zeppelin? A Zeppelin is a tool, a notebook tool, just like Jupiter. You can run it on a server and you can run it on your Hadoop cluster or whatever. And it can run Spark jobs in the background.

Hadoop 130
article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

Introducing Teradata’s Incoming CEO Steve McMillan

Teradata

Teradata's Board of Directors has selected the company's next President and Chief Executive Officer: Steve McMillan. Read more from interim President and CEO, Vic Lund.

108
108
article thumbnail

StreamNative Brings Streaming Data To The Cloud Native Landscape With Pulsar

Data Engineering Podcast

Summary There have been several generations of platforms for managing streaming data, each with their own strengths and weaknesses, and different areas of focus. Pulsar is one of the recent entrants which has quickly gained adoption and an impressive set of capabilities. In this episode Sijie Guo discusses his motivations for spending so much of his time and energy on contributing to the project and growing the community.

Cloud 100
article thumbnail

Pull the Data you Actually Want

Grouparoo

There’s an underlying pattern prevalent today in many digital marketing tools that is causing problems. Wasted time, overpaying, slow velocity, and privacy issues for your customers are some of the results of this pattern. The problem is the over-reliance on Events. Specifically, the problem is that many marketing tools live in a world where they expect to be “pushed” data, when it would be so much better if they were “pulling” data when they needed it.

Data 52
article thumbnail

Thank You

Start Data Engineering

Thank you for contacting us. We will get back to you shortly.

100
100
article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

Learning All About Wi-Fi Data with Apache Kafka and Friends

Confluent

Recently, I’ve been looking at what’s possible with streams of Wi-Fi packet capture (pcap) data. I was prompted after initially setting up my Raspberry Pi to capture pcap data and […].

Kafka 117
article thumbnail

Build a Full Big Data Platform Right Away?

Team Data Science

Should companies go full blowing big data/data science platform right away? In my opinion, you should first look at the different stages you are in. Are you in the Proof-of-Concept phase, where you are just working with offline data, where you are proving your concepts? Or are you in the MVP phase or in the creation of an MVP, where you are bringing in the first users, the first customers?

Big Data 130
article thumbnail

How to Operationalize Enterprise Analytics in the Telco Industry

Teradata

Operationalizing world class analytics into day-to-day processes can help solve some of the greatest challenges in the telecommunications industry. Find out more.

article thumbnail

Enterprise Data Operations And Orchestration At Infoworks

Data Engineering Podcast

Summary Data management is hard at any scale, but working in the context of an enterprise organization adds even greater complexity. Infoworks is a platform built to provide a unified set of tooling for managing the full lifecycle of data in large businesses. By reducing the barrier to entry with a graphical interface for defining data transformations and analysis, it makes it easier to bring the domain experts into the process.

Hadoop 100
article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

New Course: NumPy for Data Engineers

Dataquest

Python programming is a critical skill for data engineers. When it comes to working with data, there’s a powerful library that can increase your code’s efficiency dramatically, especially when you’re working with large datasets: NumPy. That’s why we’ve added a NumPy for Data Engineers course to our Data Engineering path !

article thumbnail

5 Fun Code Expressiveness Tricks in Scala

Rock the JVM

Scala is an incredibly expressive language: discover hidden features that even experienced developers might miss!

Scala 52
article thumbnail

Building a Clickstream Dashboard Application with ksqlDB and Elasticsearch

Confluent

Using a powerful, event-driven application can help you unlock insights contained in the event streams of your business. Before we get into the technology, let’s go over some questions you […].

Building 114
article thumbnail

Job Opportunities For Data Science Proof Of Concepts and MVPs

Team Data Science

What are the job opportunities in the field of Data Science? Several, of course! Based on the 4 phases of a Data Science project, the possibilities can be worked out well. In this blog post only two of the four phases will be discussed. But now from the beginning. The four phases are: Proof-of-Concept, MVP, Validation and Scaling. The Proof of Concept Phase (PoC) Starting at the PoC phase, you could say: okay, I'm getting a research data scientist here.

article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

COVID-19: The Perfect Storm

Teradata

The COVID-19 pandemic has brought with it a Perfect Storm of disruption that impacts all of us -- from our health to the economy to the supply chain. Read more.

IT 105
article thumbnail

Azure Synapse Analytics - Microsoft's Flagship Lakehouse Now in Preview

Advancing Analytics: Data Engineering

Today’s the day! There’s much buzz & excitement as we FINALLY get to see Azure Synapse Analytics in public preview, ready for us all to get our hands on it. There’s a raft of other announcements that come hand & hand with it too. What’s that? You thought Azure Synapse Analytics was already available? You’ve been using all year and don’t see what the fuss is about??

article thumbnail

Getting Started - Installing Additional Drivers

Preset

Now that you have Apache Superset installed locally, here's how to hook it up to your favorite database.

article thumbnail

Create Your Own Custom String Interpolator

Rock the JVM

Discover how to create your own custom string interpolator that feels like a native feature of Scala

Scala 52
article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!