Top Data Engineering Digest Data Workflow Data Architect Content for Week of Sep 18

Sat.Sep 18, 2021 - Fri.Sep 24, 2021

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber Engineering

SEPTEMBER 23, 2021

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we …

Kafka

Kafka Process Systems Engineering

What’s New in Apache Kafka 3.0.0

Confluent

SEPTEMBER 21, 2021

I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].

Kafka

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Airflow Trigger Rules: All you need to know!

Marc Lamberti

SEPTEMBER 21, 2021

By default, your tasks get executed once all the parent tasks succeed. This behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently depending on whether a task succeeds, fails or even gets skipped?
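To make the idea concrete, here is a minimal sketch in plain Python of how a trigger rule could decide whether a task may run given the states of its parents. The rule names mirror Airflow's documented trigger rules (`all_success`, `one_success`, `one_failed`, `all_done`, `none_failed`), but the function `should_run` and the state strings are assumptions made for illustration; this is a simplified model, not Airflow's scheduler logic.

```python
from typing import Iterable

# Parent states considered "finished" in this simplified model (an assumption).
FINISHED = {"success", "failed", "skipped"}


def should_run(rule: str, parent_states: Iterable[str]) -> bool:
    """Return True if a task with the given trigger rule may run now.

    Hypothetical helper: it only looks at the current parent states.
    """
    states = list(parent_states)
    if rule == "all_success":   # the default: every parent succeeded
        return all(s == "success" for s in states)
    if rule == "one_success":   # at least one parent has already succeeded
        return any(s == "success" for s in states)
    if rule == "one_failed":    # at least one parent has already failed
        return any(s == "failed" for s in states)
    if rule == "all_done":      # every parent finished, whatever the outcome
        return all(s in FINISHED for s in states)
    if rule == "none_failed":   # every parent succeeded or was skipped
        return all(s in {"success", "skipped"} for s in states)
    raise ValueError(f"unknown trigger rule: {rule}")


if __name__ == "__main__":
    parents = ["success", "running"]
    print(should_run("all_success", parents))  # one parent is still running
    print(should_run("one_success", parents))  # one parent already succeeded
```

In a real DAG the rule is set per task through the operator's `trigger_rule` argument; the sketch only illustrates how the rules differ from each other.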

Data Pipeline

Data Pipeline IT Management Data

Apache Kafka Deployments and Systems Reliability – Part 1

Cloudera

SEPTEMBER 20, 2021

There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache Zookeeper, and Kafka
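As a rough illustration of serial versus parallel systems reliability, the sketch below uses the standard textbook formulas, assuming component failures are independent. The function names and the example numbers are hypothetical and are not taken from the Cloudera post.

```python
import math
from typing import Sequence


def serial_reliability(components: Sequence[float]) -> float:
    """All components must work: multiply the individual reliabilities."""
    return math.prod(components)


def parallel_reliability(components: Sequence[float]) -> float:
    """At least one component must work: 1 minus the chance that all fail."""
    return 1.0 - math.prod(1.0 - r for r in components)


if __name__ == "__main__":
    # Hypothetical figures: two components in series, each 99% reliable.
    print(serial_reliability([0.99, 0.99]))
    # Hypothetical figures: two redundant components, each 90% reliable.
    print(parallel_reliability([0.9, 0.9]))
```

Chaining components in series lowers overall reliability below that of the weakest part, while redundancy in parallel raises it above that of any single part, which is why deployment topology matters for the reliability of a cluster.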

Kafka

Kafka Systems Utilities Bytes

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

SEPTEMBER 24, 2021

Summary: Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information. There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any

Data Process

Data Process Python Process Data Lake

Announcing ksqlDB 0.21.0

Confluent

SEPTEMBER 24, 2021

We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].

Bytes

Bytes Data Process

Unilever

Teradata

SEPTEMBER 19, 2021

Teradata Vantage on Azure supports 27 business services across supply chain, sales, finance, HR, and more.

Finance

More Trending

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Cloudera

SEPTEMBER 21, 2021

Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture.

Python

Python Cloud Accessible Accessibility

An Exploration Of The Data Engineering Requirements For Bioinformatics

Data Engineering Podcast

SEPTEMBER 19, 2021

Summary: Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.

Data Engineer

Data Engineer Data Engineering Engineering Data Lake

Start DataOps Today with ‘Lean DataOps’

DataKitchen

SEPTEMBER 20, 2021

Data organizations don’t always have the budget or schedule required for DataOps when conceived as a top-to-bottom, enterprise-wide transformational change. An essential part of the DataOps methodology is Agile Development, which breaks development into incremental steps. DataOps can and should be implemented in small steps that complement and build upon existing workflows and data pipelines.

Data Pipeline

Data Pipeline Process Data Cleanse Architecture

Netflix Cloud Packaging in the Terabyte Era

Netflix Tech

SEPTEMBER 24, 2021

By Xiaomei Liu, Rosanna Lee, Cyril Concolato. Introduction: Behind the scenes of the beloved Netflix streaming service and content, there are many technology innovations in media processing. Packaging has always been an important step in media processing. After content ingestion, inspection and encoding, the packaging step encapsulates encoded video and audio in codec agnostic container formats and provides features such as audio video synchronization, random access and DRM protection.

Cloud

Cloud Bytes Cloud Storage Media

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Telecom Network Analytics: Transformation, Innovation, Automation

Cloudera

SEPTEMBER 24, 2021

One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance based monitoring phase; the second was an open-source expansion phase; and the third – that we are in right now – is a hybrid-data-cloud and governance phase.

Data Architect

Data Architect Government NoSQL Big Data

Declarative Machine Learning Without The Operational Overhead Using Continual

Data Engineering Podcast

SEPTEMBER 19, 2021

Summary: Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it’s not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put it into production or see the value. Tristan Zajonc recognized the complexity that acts as a barrier to adoption and created the Continual platform in response.

Machine Learning

Machine Learning Data Warehouse Banking Metadata

Data Warehousing Basics

Data Science Blog: Data Engineering

SEPTEMBER 24, 2021

Data Warehousing is applied Big Data Management and a key success factor in almost every company. Without a data warehouse, no company today can control its processes and make the right decisions on a strategic level as there would be a lack of data transparency for all decision makers. Bigger companies even have multiple data warehouses for different purposes.

Data Warehouse

Data Warehouse Data Lake Data Transparency Database

Datakin is now open to all!

Datakin

SEPTEMBER 24, 2021

Written by Laurent Paris on Sep 24, 2021. This is it! We’re officially out of beta and excited to announce the general availability of Datakin. Our story began with the creation of Marquez over two years ago. We believed then, and still believe now, that a new approach to data lineage was essential to support today’s pipelines.

PostgreSQL

PostgreSQL Datasets Data Pipeline Data Process

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Speed Up Your Data Flow for Business Results

Cloudera

SEPTEMBER 23, 2021

A slow car has never won a Formula One race. The Olympics doesn’t reward slow times in swimming, track or any other clock-timed sport. Likewise, slow data speeds don’t win over customers or colleagues in the real-time business world. Microsoft’s own research once reported that a person visiting a website on a connected device is likely to wait no more than 10 seconds to see it before moving to a competitor’s site.

Data

Data Cloud Data Collection Designing

6 Automated Data Capture Methods For Business Development

InData Labs

SEPTEMBER 23, 2021

Today, digitization penetrates all spheres of business. 2.5 quintillion bytes of data that people create every day is predominantly unstructured data. Whether it is audio, video or text, big data – if meticulously collected, recognized, and processed – can generate business value through leveraging state-of-the-art technologies. But no matter how intelligent machines may be, they.

Bytes

Bytes Unstructured Data Big Data Data

Data Observability: Five Quick Ways to Improve the Reliability of Your Data

Monte Carlo

SEPTEMBER 23, 2021

If your data breaks, does it make a sound? Odds are, the answer is yes. But will you hear it? Probably not. Nowadays, organizations ingest large amounts of data across increasingly complex ecosystems, and very often their data breaks silently, and as a result data teams are left in the dark – until it’s too late. But, if said data is a report used by your Chief Revenue Officer to determine next quarter’s forecast, chances are this data will make a very, very large sound.

Data

Data BI Metadata Data Pipeline

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

AltexSoft

SEPTEMBER 23, 2021

The larger the company, the more data it has to generate actionable insights. Yet, more often than not, businesses can’t make use of their most valuable asset: information. Why? Because it is scattered across disparate systems, hardly available for analytical apps. Evidently, common storage solutions fail to provide a unified data view and meet the needs of companies for seamless data flow.

Architecture

Architecture Data Lake Unstructured Data Data Warehouse

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Partnerships that Enrich Solutions: a Spotlight Interview with Dell Enterprise Germany’s General Manager, Benjamin Krebs

Cloudera

SEPTEMBER 22, 2021

During this Partner Perspective interview, Cloudera’s Alvin Heib seizes the opportunity to speak with Benjamin Krebs, General Manager of Technology Enterprise in Germany. The pair discuss Benjamin’s role at Dell, the importance of partnerships in his region, how the pandemic has altered Dell’s working landscape and finally, some predictions Benjamin has on Dell’s future.

Management

Management Banking Finance Government

Event Streaming in Apache Pulsar with Scala

Rock the JVM

SEPTEMBER 22, 2021

Apache Pulsar is a cloud-native, distributed messaging and streaming platform handling hundreds of billions of events daily: discover its strengths and see how to use Scala with the pulsar4s client library to interact with it

Scala

Scala Cloud IT

High Quality, Dynamic Images in Power BI

FreshBI

SEPTEMBER 22, 2021

Power BI has an awesome feature where you can define column category types. This allows you to define all values in a column as image URLs. From there, you can use publicly hosted image URLs to populate that column to dynamically view images in Power BI. An example of what this would look like is one column for fruit names and the other for image URLs, using the fruit name column as a slicer to dynamically pick which fruit is displayed.

BI Consulting Business Intelligence Python

Bob Muglia, former Snowflake CEO, to Speak at IMPACT, the World’s First Data Observability Summit

Monte Carlo

SEPTEMBER 22, 2021

Today, we’re thrilled to announce that Bob Muglia, entrepreneur, Fivetran board member, and former CEO of Snowflake, and DJ Patil, the first U.S. Chief Data Scientist, will speak at IMPACT: The Data Observability Summit. Muglia’s fireside chat with Monte Carlo CEO Barr Moses will cap off the event, and touch on such topics as the rise of data in the cloud, challenges and opportunities in the current tooling landscape, and his vision for the future of data engineering and analytics.

Data Lake

Data Lake Kafka Data Science Cloud

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Data Engineering

AWS Kinesis Firehose and Teradata Vantage

Teradata

SEPTEMBER 22, 2021

Many Teradata customers are interested in integrating Vantage with Amazon AWS First Party Services. This Getting Started Guide will help you to connect Vantage with AWS Kinesis service.

AWS

How We Improved the Concurrency and Scalability of Our Redis Rate Limiting System

Rockset

SEPTEMBER 21, 2021

Background: Rate limiting is a technique used to protect services from overload. In addition, it can be used to prevent starvation of a multi-tenant resource by a few very large customers. At Rockset, we primarily use rate limiting to protect our metadata store from overload caused by too many API requests, our log store from filling up due to mismatched input and output rates, and our control plane from too many state transitions.
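For readers new to the technique, the following is a minimal sketch of one common rate limiting approach, a token bucket. It is an in-memory, single-process illustration only: the excerpt concerns a Redis-based system, and nothing here is taken from Rockset's implementation. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical in-memory token bucket (not a Redis-backed limiter)."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.clock = clock        # injectable so tests can control time
        self.tokens = capacity    # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill based on elapsed time, then try to spend `cost` tokens."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate=5.0, capacity=10.0)
    accepted = sum(bucket.allow() for _ in range(20))
    print(f"accepted {accepted} of 20 back-to-back requests")
```

A shared limiter for a multi-tenant service would keep the token count and timestamp in a shared store and update them atomically, which is where concurrency and scalability concerns such as those named in the article title come in.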

Systems

Systems Java Coding Metadata

Custom Pattern Matching in Scala

Rock the JVM

SEPTEMBER 20, 2021

Pattern matching is one of Scala's most powerful features: discover how to customize it and create your own patterns in this article

Scala

Scala IT

The Data Janitor Letters - August 2021

Pipeline Data Engineering

SEPTEMBER 20, 2021

Data engineering salon. News and interesting reads about the world of data. From Data Driven to Driving Data — The dysfunctions of Data Engineering MrTrustworthy Many “data driven” initiatives are failing even though they had the best engineers on the task and picked the “best” stack of technologies. What's an OLAP cube? ? Claire Carroll, Analytics Engineer, analyticsengineers.club OLAP cubes were this intimidating concept, and the more they read, the less they understood, but it turns out that

Hadoop

Hadoop Software Engineer Software Engineering AWS

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Tracing SRE’s journey in Zalando - Part II

Zalando Engineering

SEPTEMBER 20, 2021

Welcome to the second part of our journey establishing SRE in Zalando. You’ll find the first part here. Don’t miss out on the third and final post in one week. 2018 - The Return of SRE In our previous blog post we left it with the plans for Site Reliability Engineering (SRE) in Zalando having to change. So, what were those changes and what were the challenges we faced in this new iteration?

Consulting

Consulting Programming Engineering Project

Flexibility and Resiliency Across the Supply Chain

Teradata

SEPTEMBER 20, 2021

The supply chain is not just the sum of its parts. Each function, organization, decision & action are connected & have an effect on each part of the supply chain. Find out more.

Streaming Events From Salesforce for Lead Enrichment With RudderStack’s Webhook Source

RudderStack

SEPTEMBER 23, 2021

How to use a webhook to stream new ‘lead created’ events from Salesforce through RudderStack for lead enrichment with Clearbit data, then back to Salesforce.

Data

What is an A/B Test?

Netflix Tech

SEPTEMBER 22, 2021

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This is the second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. See here for Part 1: Decision Making at Netflix. Subsequent posts will go into more details on the statistics of A/B tests, experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, and the importance of the culture

Entertainment

Entertainment Media Building IT

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
