Top Data Engineering Digest Big Data Tools Data Analytics Content for September, 2021

September, 2021

Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Uber Engineering

SEPTEMBER 29, 2021

Introduction. The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions …

Google Cloud

Google Cloud Cloud Building Database

What’s New in Apache Kafka 3.0.0

Confluent

SEPTEMBER 21, 2021

I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].

Kafka

Trending Sources

dbt(Data Build Tool) Tutorial

Start Data Engineering

SEPTEMBER 29, 2021

1. Introduction
2. Dbt, the T in ELT
3. Project
   3.1. Prerequisites
   3.2. Configurations and connections
      3.2.1. profiles.yml
      3.2.2. dbt_project.yml
   3.3. Data flow
      3.3.1. Source
      3.3.2. Snapshots
      3.3.3. Staging
      3.3.4. Marts
         3.3.4.1. Core
         3.3.4.2. Marketing
   3.4. dbt run
   3.5. dbt test
   3.6. dbt docs
   3.7. Scheduling
4. Conclusion
5. Further reading
6. References

Building

Building Data Project Engineering

How to Take Notes in 2021?

Simon Späti

SEPTEMBER 28, 2021

Taking notes helps you remember things, express yourself, brainstorm your thoughts, research a topic, and much more. I have taken notes all my life. Maybe it’s because I’m Swiss; they say we are well organised. I used OneNote for 10+ years. I have notebooks for my bachelor studies and every workplace I have worked at.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

Airflow Trigger Rules: All you need to know!

Marc Lamberti

SEPTEMBER 21, 2021

By default, your tasks get executed once all the parent tasks succeed. This behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently depending on whether a task succeeds, fails, or even gets skipped?
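
The idea behind trigger rules can be sketched in plain Python without Airflow itself. This is only an illustrative model, not Airflow's implementation: the `State` enum and the `should_run` helper are hypothetical names, and the sketch assumes every parent has already finished. The rule names `all_success`, `one_success`, `one_failed`, and `all_done` match rule names that Airflow provides, with `all_success` being the default.

```python
from enum import Enum
from typing import Sequence


class State(Enum):
    """Simplified final states of a parent task (hypothetical model)."""
    SUCCESS = "success"
    FAILED = "failed"
    SKIPPED = "skipped"


def should_run(rule: str, parent_states: Sequence[State]) -> bool:
    """Return True if a task with the given trigger rule may run.

    Assumes all parents have finished; real schedulers also handle
    parents that are still running.
    """
    if rule == "all_success":   # default: every parent succeeded
        return all(s is State.SUCCESS for s in parent_states)
    if rule == "one_success":   # at least one parent succeeded
        return any(s is State.SUCCESS for s in parent_states)
    if rule == "one_failed":    # at least one parent failed
        return any(s is State.FAILED for s in parent_states)
    if rule == "all_done":      # parents finished, whatever the outcome
        return True
    raise ValueError(f"unknown trigger rule: {rule}")


parents = [State.SUCCESS, State.FAILED]
for rule in ("all_success", "one_success", "one_failed", "all_done"):
    print(rule, should_run(rule, parents))
```

In an actual DAG the rule is set per task through the operator's `trigger_rule` argument; the sketch only shows how the same parent outcomes lead to different decisions under different rules.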

Data Pipeline

Data Pipeline IT Management Data

Apache Kafka Deployments and Systems Reliability – Part 1

Cloudera

SEPTEMBER 20, 2021

There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we gave a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made, along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache ZooKeeper, and Kafka

Kafka

Kafka Systems Utilities Bytes

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber Engineering

SEPTEMBER 23, 2021

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we …

Kafka

Kafka Process Systems Engineering

More Trending

Kafka Summit Americas 2021 Recap

Confluent

SEPTEMBER 16, 2021

The full inventory of three online Kafka Summits in 2021 is now complete. Kafka Summit Americas wrapped just yesterday. Being a part of the event team and the Program Committee, […].

Kafka

Kafka Programming

How to Scale Your Data Pipelines

Start Data Engineering

SEPTEMBER 16, 2021

1. Introduction
2. What is scaling & why do we need it?
3. Types of scaling
4. Choose your scaling strategy
5. Conclusion
6. Further reading
7. References

1. Introduction
Choosing tools/frameworks to scale your data pipelines can be confusing. If you have struggled with data pipelines that randomly crash, or with finding guides on how to scale your data pipelines from the ground up, then this post is for you.

Data Pipeline

Data Pipeline Data IT

Delivering Your Personal Data Cloud With Prifina

Data Engineering Podcast

SEPTEMBER 29, 2021

Summary: The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience.

Cloud

Cloud Data Lake Business Intelligence Data

Unilever

Teradata

SEPTEMBER 19, 2021

Teradata Vantage on Azure supports 27 business services across supply chain, sales, finance, HR, and more.

Finance

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

When Data Redefines Companies

Cloudera

SEPTEMBER 1, 2021

The more an enterprise wants to know about itself and its business prospects, the more data it needs to collect and analyze. Additionally, the more data it collects and stores, the better its ability to know customers, to find new ones, and to provide more of what they want to buy. Sounds simple, but a surprising majority of U.S. companies (about two-thirds, according to CIO.com) are only now getting tuned in to become fully functioning data-driven enterprises by starting new initiatives, scali

Hadoop

Hadoop Data Utilities Consulting

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Uber Engineering

SEPTEMBER 2, 2021

Introduction. Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One …

AWS

AWS Programming Engineering Designing

Announcing ksqlDB 0.21.0

Confluent

SEPTEMBER 24, 2021

We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].

Bytes

Bytes Data Process

Start DataOps Today with ‘Lean DataOps’

DataKitchen

SEPTEMBER 20, 2021

Data organizations don’t always have the budget or schedule required for DataOps when conceived as a top-to-bottom, enterprise-wide transformational change. An essential part of the DataOps methodology is Agile Development, which breaks development into incremental steps. DataOps can and should be implemented in small steps that complement and build upon existing workflows and data pipelines.

Data Pipeline

Data Pipeline Process Data Cleanse Architecture

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Digging Into Data Reliability Engineering

Data Engineering Podcast

SEPTEMBER 25, 2021

Summary: The accuracy and availability of data has become critically important to the day-to-day operation of businesses. Similar to the practice of site reliability engineering as a means of ensuring consistent uptime of web services, there has been a new trend of building data reliability engineering practices in companies that rely heavily on their data.

Engineering

Engineering Metadata Data Data Engineering

Groupon

Teradata

SEPTEMBER 12, 2021

Groupon is modernizing with Vantage on AWS to better match its data & analytics with demands of its global business. The Cloud allows Groupon to better leverage infrastructure dollars, support more technology projects and capture opportunity.

AWS

AWS Cloud Technology Project

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Cloudera

SEPTEMBER 21, 2021

Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture.

Python

Python Cloud Accessibility Accessible

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber Engineering

SEPTEMBER 9, 2021

Problem. Uber deploys a few storage technologies to store business data based on their application model. One such technology is called Schemaless, which enables the modeling of related entries in one single row of multiple columns, as well as …

Systems

Systems Data Technology Engineering

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

How to Securely Connect Confluent Cloud with Services on AWS, Azure, and GCP

Confluent

SEPTEMBER 30, 2021

The rise of fully managed cloud services fundamentally changed the technology landscape and introduced benefits like increased flexibility, accelerated deployment, and reduced downtime. Confluent offers a portfolio of fully managed […].

Cloud

Cloud AWS Portfolio Technology

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Netflix Tech

SEPTEMBER 3, 2021

By Alex Borysov, Ricky Gardiner. Background: At Netflix, we heavily use gRPC for backend-to-backend communication. When we process a request, it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, and some can require remote calls to other services.

Designing

Designing Java Bytes Utilities

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

SEPTEMBER 24, 2021

Summary: Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information. There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any

Data Process

Data Process Python Process Data Lake

What Should Enterprises Do to Offset Future Technology Disruption?

DataKitchen

SEPTEMBER 14, 2021

Technology

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Telecom Network Analytics: Transformation, Innovation, Automation

Cloudera

SEPTEMBER 24, 2021

One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance-based monitoring phase; the second was an open-source expansion phase; and the third – that we are in right now – is a hybrid-data-cloud and governance phase.

Data Architect

Data Architect Government NoSQL Big Data

Building a Remote-First Culture

Datakin

SEPTEMBER 14, 2021

Written by Amanda Bulger on Sep 15, 2021. This morning I was planning an offsite for our team – our first one since Datakin was founded during the pandemic – and I had a realization: I haven’t met most of these people in person yet! We’ve been working together for months and months, solving interesting problems and planning social events, but we have been restricted to knowing each other through a tiny box on a screen.

Building

Building Recruitment Management IT

Getting Started with GraphQL and Apache Kafka

Confluent

SEPTEMBER 15, 2021

GraphQL and Apache Kafka® are sometimes troubled with misconceptions. One of the reasons for this is that people are often familiar with one but not the other. GraphQL is mostly […].

Kafka

How We Build Micro Frontends With Lattice

Netflix Tech

SEPTEMBER 28, 2021

Written by Michael Possumato, Nick Tomlin, Jordan Andree, Andrew Shim, and Rahul Pilani. As we continue to grow here at Netflix, the needs of Revenue and Growth Engineering are rapidly evolving; and our tools must also evolve just as rapidly. The Revenue and Growth Tools (RGT) team decided to set off on a journey to build tools in an abstract manner to have solutions readily available within our organization.

Building

Building Metadata Designing Coding

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

An Exploration Of The Data Engineering Requirements For Bioinformatics

Data Engineering Podcast

SEPTEMBER 19, 2021

Summary: Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.

Data Engineer

Data Engineer Data Engineering Engineering Data Lake

DataOps is the Factory that Supports Your Data Mesh

DataKitchen

SEPTEMBER 17, 2021

Below is our final post (5 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a data mesh decentralized architecture. We see a DataOps process hub like the DataKitchen Platform playing a central supporting role in successfully implementing a data mesh. DataOps excels at the type of workflow automation that can coordinate interdependent domains, manage order-of-operations issues and handle inter-domain communication.

Architecture

Architecture Data Architecture Government Raw Data

Value Proposition of the Cloudera Operational Database over Legacy Apache HBase Deployments

Cloudera

SEPTEMBER 9, 2021

The CDP Operational Database (COD) builds on the foundation of existing operational database capabilities that were available with Apache HBase and/or Apache Phoenix in legacy CDH and HDP deployments. Within the context of a broader data and analytics platform implemented in the Cloudera Data Platform (CDP), COD will function as a highly scalable relational and non-relational transactional database, allowing users to leverage big data in operational applications as well as the backbone of the a

Database

Database AWS Relational Database Cloud

MLOps vs DevOps! Here's How They Fit Together

ProjectPro

SEPTEMBER 30, 2021

The term machine learning is buzzing so loudly that almost every IT professional has heard it by now. With time, machine learning has become more applied, and every industry is leveraging it. Most software applications today have sophisticated machine learning algorithms in action behind the scenes. Welcome to the world of MLOps, which makes these ML models successful in production.

Machine Learning

Machine Learning Algorithm Software Engineer Software Engineering

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineering
