September, 2021

article thumbnail

What’s New in Apache Kafka 3.0.0

Confluent

I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].

Kafka 145
article thumbnail

Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework

Uber Engineering

Introduction. Uber’s GSS (Global Scaled Solutions) team runs scaled programs for diverse products and businesses, including but not limited to Eats, Rides, and Freight. The team transforms Uber’s ideas into agile, global solutions by designing and implementing scalable solutions. One … The post Streaming Real-Time Analytics with Redis, AWS Fargate, and Dash Framework appeared first on Uber Engineering Blog.

AWS 141
article thumbnail

dbt(Data Build Tool) Tutorial

Start Data Engineering

1. Introduction 2. Dbt, the T in ELT 3. Project 3.1. Prerequisites 3.2. Configurations and connections 3.2.1. profiles.yml 3.2.2. dbt_project.yml 3.3 Data flow 3.3.1. Source 3.3.2. Snapshots 3.3.3. Staging 3.3.4. Marts 3.3.4.1. Core 3.3.4.2. Marketing 3.4. dbt run 3.5. dbt test 3.6. dbt docs 3.7. Scheduling 4. Conclusion 5. Further reading 6. References 1.

Building 130
article thumbnail

How to Take Notes in 2021?

Simon Späti

Taking notes helps you not to forget things, teaches you to express yourself, brainstorms your thoughts, research a topic, and so many more things. I used to take notes all my life. Maybe it’s because I’m Swiss, they say we are well organised. I used to write in OneNote for 10+ years. I have notebooks for my bachelor studies and every workplace I worked.

IT 130
article thumbnail

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

article thumbnail

Airflow Trigger Rules: All you need to know!

Marc Lamberti

By default, your tasks get executed once all the parent tasks succeed. this behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently according to if a task succeeds, fails or event gets skipped?

article thumbnail

Living on the Edge: How to Accelerate Your Business with Real-time Analytics

Cloudera

Leveraging the Internet of Things (IoT) allows you to improve processes and take your business in new directions. But it requires you to live on the edge. That’s where you find the ability to empower IoT devices to respond to events in real time by capturing and analyzing the relevant data. Edge computing relies on squeezing the power and functionality of a data center into a micro site as close to data sources as possible to enable real-time tasks.

Medical 125

More Trending

article thumbnail

Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner

Uber Engineering

Introduction. The Fulfillment Platform is a foundational Uber domain that enables the rapid scaling of new verticals. The platform handles billions of database transactions each day, ranging from user actions (e.g., a driver starting a trip) and system actions … The post Building Uber’s Fulfillment Platform for Planet-Scale using Google Cloud Spanner appeared first on Uber Engineering Blog.

article thumbnail

How to Scale Your Data Pipelines

Start Data Engineering

1. Introduction 2. What is scaling & why do we need it? 3. Types of scaling 4. Choose your scaling strategy 5. Conclusion 6. Further reading 7. References 1. Introduction Choosing tools/frameworks to scale your data pipelines can be confusing. If you have struggled with Data pipelines that randomly crash Finding guides on how to scale your data pipelines from the ground up Then this post is for you.

article thumbnail

Delivering Your Personal Data Cloud With Prifina

Data Engineering Podcast

Summary The promise of online services is that they will make your life easier in exchange for collecting data about you. The reality is that they use more information than you realize for purposes that are not what you intended. There have been many attempts to harness all of the data that you generate for gaining useful insights about yourself, but they are generally difficult to set up and manage or require software development experience.

Cloud 100
article thumbnail

Unilever

Teradata

Teradata Vantage on Azure supports 27 business services across supply chain, sales, finance, HR, and more.

Finance 98
article thumbnail

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

Apache Kafka Deployments and Systems Reliability – Part 1

Cloudera

There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache Zookeeper, and Kafka

Kafka 121
article thumbnail

Announcing ksqlDB 0.21.0

Confluent

We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].

Bytes 140
article thumbnail

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber Engineering

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we … The post Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot appeared first on Uber Engineering Blog.

Kafka 135
article thumbnail

What is an A/B Test?

Netflix Tech

Martin Tingley with Wenjing Zheng , Simon Ejdemyr , Stephanie Lane , and Colin McFarland This is the second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. See here for Part 1: Decision Making at Netflix. Subsequent posts will go into more details on the statistics of A/B tests, experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, and the importance of the culture

article thumbnail

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

article thumbnail

Digging Into Data Reliability Engineering

Data Engineering Podcast

Summary The accuracy and availability of data has become critically important to the day-to-day operation of businesses. Similar to the practice of site reliability engineering as a means of ensuring consistent uptime of web services, there has been a new trend of building data reliability engineering practices in companies that rely heavily on their data.

article thumbnail

Groupon

Teradata

Groupon is modernizing with Vantage on AWS to better match its data & analytics with demands of its global business. The Cloud allows Groupon to better leverage infrastructure dollars, support more technology projects and capture opportunity.

AWS 98
article thumbnail

When Data Redefines Companies

Cloudera

The more an enterprise wants to know about itself and its business prospects, the more data it needs to collect and analyze. Additionally, the more data it collects and stores, the better its ability to know customers, to find new ones, and to provide more of what they want to buy. Sounds simple, but a surprising majority of U.S. companies (about two-thirds, according to CIO.com ) are only now getting tuned in to become fully functioning data-driven enterprises by starting new initiatives, scali

Hadoop 106
article thumbnail

How to Securely Connect Confluent Cloud with Services on AWS, Azure, and GCP

Confluent

The rise of fully managed cloud services fundamentally changed the technology landscape and introduced benefits like increased flexibility, accelerated deployment, and reduced downtime. Confluent offers a portfolio of fully managed […].

Cloud 122
article thumbnail

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

article thumbnail

Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System

Uber Engineering

Problem. Uber deploys a few storage technologies to store business data based on their application model. One such technology is called Schemaless , which enables the modeling of related entries in one single row of multiple columns, as well as … The post Jellyfish: Cost-Effective Data Tiering for Uber’s Largest Storage System appeared first on Uber Engineering Blog.

Systems 118
article thumbnail

Start DataOps Today with ‘Lean DataOps’

DataKitchen

Data organizations don’t always have the budget or schedule required for DataOps when conceived as a top-to-bottom, enterprise-wide transformational change. An essential part of the DataOps methodology is Agile Development , which breaks development into incremental steps. DataOps can and should be implemented in small steps that complement and build upon existing workflows and data pipelines.

article thumbnail

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

Summary Python has beome the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information.There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any

article thumbnail

Practical API Design at Netflix, Part 1: Using Protobuf FieldMask

Netflix Tech

By Alex Borysov , Ricky Gardiner Background At Netflix, we heavily use gRPC for the purpose of backend to backend communication. When we process a request it is often beneficial to know which fields the caller is interested in and which ones they ignore. Some response fields can be expensive to compute, some fields can require remote calls to other services.

article thumbnail

Prepare Now: 2025s Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

article thumbnail

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Cloudera

Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture. .

Python 105
article thumbnail

Getting Started with GraphQL and Apache Kafka

Confluent

GraphQL and Apache Kafka® are sometimes troubled with misconceptions. One of the reasons for this is that people are often familiar with one but not the other. GraphQL is mostly […].

Kafka 122
article thumbnail

Building a Remote-First Culture

Datakin

Blog Building a Remote-First Culture Written by Amanda Bulger on Sep 15, 2021 This morning I was planning an offsite for our team – our first one since Datakin was founded during the pandemic – and I had a realization: I haven’t met most of these people in person yet! We’ve been working together for months and months, solving interesting problems and planning social events, but we have been restricted to knowing each other through a tiny box on a screen.

article thumbnail

What Should Enterprises Do to Offset Future Technology Disruption?

DataKitchen

The post What Should Enterprises Do to Offset Future Technology Disruption? first appeared on DataKitchen.

article thumbnail

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

article thumbnail

An Exploration Of The Data Engineering Requirements For Bioinformatics

Data Engineering Podcast

Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and compuational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.

article thumbnail

How We Build Micro Frontends With Lattice

Netflix Tech

Written by Michael Possumato , Nick Tomlin , Jordan Andree , Andrew Shim , and Rahul Pilani. As we continue to grow here at Netflix, the needs of Revenue and Growth Engineering are rapidly evolving; and our tools must also evolve just as rapidly. The Revenue and Growth Tools (RGT) team decided to set off on a journey to build tools in an abstract manner to have solutions readily available within our organization.

article thumbnail

Telecom Network Analytics: Transformation, Innovation, Automation

Cloudera

One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance based monitoring phase; the second was an open-source expansion phase; and the third – that we are in right now – is a hybrid-data-cloud and governance phase.

article thumbnail

Confluent Unlocks the Full Power of Event Streams with Stream Governance

Confluent

Data governance initiatives aim to manage the availability, integrity, and security of data used across an organization. With the explosion in volume, variety, and velocity of data powering the modern […].

article thumbnail

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.