Sat. Sep 18, 2021 - Fri. Sep 24, 2021

Airflow Trigger Rules: All you need to know!

Marc Lamberti

By default, your tasks get executed once all the parent tasks succeed. This behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently depending on whether a task succeeds, fails, or even gets skipped?
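
To make the idea concrete, here is a minimal pure-Python sketch of how a few trigger rules could be evaluated against parent task states. It is not Airflow's implementation: the rule names match Airflow's documented trigger rules, but the function `should_run`, the state strings, and the simplified handling of unfinished parents are assumptions made for illustration.

```python
from typing import Iterable

# States in which a parent task is considered finished (simplified assumption).
FINISHED = {"success", "failed", "skipped"}


def should_run(trigger_rule: str, parent_states: Iterable[str]) -> bool:
    """Decide whether a task may run, given the states of its parents.

    Hypothetical helper: a simplified model of trigger-rule semantics,
    not Airflow's scheduler logic.
    """
    states = list(parent_states)
    all_done = all(s in FINISHED for s in states)

    if trigger_rule == "all_success":   # the default: every parent succeeded
        return all(s == "success" for s in states)
    if trigger_rule == "all_failed":    # every parent failed
        return all(s == "failed" for s in states)
    if trigger_rule == "all_done":      # every parent finished, whatever the outcome
        return all_done
    if trigger_rule == "one_success":   # fire as soon as one parent succeeds
        return any(s == "success" for s in states)
    if trigger_rule == "one_failed":    # fire as soon as one parent fails
        return any(s == "failed" for s in states)
    if trigger_rule == "none_failed":   # all finished, each succeeded or was skipped
        return all_done and all(s != "failed" for s in states)
    raise ValueError(f"unsupported trigger rule: {trigger_rule}")
```

For example, `should_run("one_success", ["success", "running"])` is true even though one parent is still running, while the default `all_success` rule would keep waiting.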

What’s New in Apache Kafka 3.0.0

Confluent

I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].

Kafka 145

Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot

Uber Engineering

Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we …

Kafka 135

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

Summary Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information. There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Apache Kafka Deployments and Systems Reliability – Part 1

Cloudera

There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache Zookeeper, and Kafka

Kafka 115

Announcing ksqlDB 0.21.0

Confluent

We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].

Bytes 140

More Trending

An Exploration Of The Data Engineering Requirements For Bioinformatics

Data Engineering Podcast

Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.

Supercharge your Airflow Pipelines with the Cloudera Provider Package

Cloudera

Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With 100s of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture.

Python 99

What is an A/B Test?

Netflix Tech

Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland. This is the second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. See here for Part 1: Decision Making at Netflix. Subsequent posts will go into more details on the statistics of A/B tests, experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, and the importance of the culture

Unilever

Teradata

Teradata Vantage on Azure supports 27 business services across supply chain, sales, finance, HR, and more.

Finance 98

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Declarative Machine Learning Without The Operational Overhead Using Continual

Data Engineering Podcast

Summary Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it’s not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put it into production or see the value. Tristan Zajonc recognized the complexity that acts as a barrier to adoption and created the Continual platform in response.

Telecom Network Analytics: Transformation, Innovation, Automation

Cloudera

One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance based monitoring phase; the second was an open-source expansion phase; and the third – that we are in right now – is a hybrid-data-cloud and governance phase.

Netflix Cloud Packaging in the Terabyte Era

Netflix Tech

By Xiaomei Liu, Rosanna Lee, and Cyril Concolato. Behind the scenes of the beloved Netflix streaming service and content, there are many technology innovations in media processing. Packaging has always been an important step in media processing. After content ingestion, inspection and encoding, the packaging step encapsulates encoded video and audio in codec agnostic container formats and provides features such as audio video synchronization, random access and DRM protection.

Cloud 96

Data Warehousing Basics

Data Science Blog: Data Engineering

Data Warehousing is applied Big Data Management and a key success factor in almost every company. Without a data warehouse, no company today can control its processes and make the right decisions on a strategic level as there would be a lack of data transparency for all decision makers. Bigger companies even have multiple data warehouses for different purposes.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Datakin is now open to all!

Datakin

Written by Laurent Paris on Sep 24, 2021. This is it! We’re officially out of beta and excited to announce the general availability of Datakin. Our story began with the creation of Marquez over two years ago. We believed then, and still believe now, that a new approach to data lineage was essential to support today’s pipelines.

Speed Up Your Data Flow for Business Results

Cloudera

A slow car has never won a Formula One race. The Olympics doesn’t reward slow times in swimming, track or any other clock-timed sport. Likewise, slow data speeds don’t win over customers or colleagues in the real-time business world. Microsoft’s own research once reported that a person visiting a website on a connected device is likely to wait no more than 10 seconds to see it before moving to a competitor’s site.

Data 81

Event Streaming in Apache Pulsar with Scala

Rock the JVM

Apache Pulsar is a cloud-native, distributed messaging and streaming platform handling hundreds of billions of events daily: discover its strengths and see how to use Scala with the pulsar4s client library to interact with it.

Scala 52

6 Automated Data Capture Methods For Business Development

InData Labs

Today, digitization penetrates all spheres of business. The 2.5 quintillion bytes of data that people create every day are predominantly unstructured. Whether it is audio, video or text, big data – if meticulously collected, recognized, and processed – can generate business value through leveraging state-of-the-art technologies. But no matter how intelligent machines may be, they.

Bytes 52

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Data Observability: Five Quick Ways to Improve the Reliability of Your Data

Monte Carlo

If your data breaks, does it make a sound? Odds are, the answer is yes. But will you hear it? Probably not. Nowadays, organizations ingest large amounts of data across increasingly complex ecosystems, and very often their data breaks silently, and as a result data teams are left in the dark – until it’s too late. But, if said data is a report used by your Chief Revenue Officer to determine next quarter’s forecast, chances are this data will make a very, very large sound.

BI 52

Partnerships that Enrich Solutions: a Spotlight Interview with Dell Enterprise Germany’s General Manager, Benjamin Krebs

Cloudera

During this Partner Perspective interview, Cloudera’s Alvin Heib seizes the opportunity to speak with Benjamin Krebs, General Manager of Technology Enterprise in Germany. The pair discuss Benjamin’s role at Dell, the importance of partnerships in his region, how the pandemic has altered Dell’s working landscape and finally, some predictions Benjamin has on Dell’s future.

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

AltexSoft

The larger the company, the more data it has to generate actionable insights. Yet, more often than not, businesses can’t make use of their most valuable asset: information. Why? Because it is scattered across disparate systems, hardly available for analytical apps. Evidently, common storage solutions fail to provide a unified data view and meet the needs of companies for seamless data flow.

AWS Kinesis Firehose and Teradata Vantage

Teradata

Many Teradata customers are interested in integrating Vantage with Amazon AWS First Party Services. This Getting Started Guide will help you to connect Vantage with the AWS Kinesis service.

AWS 52

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Bob Muglia, former Snowflake CEO, to Speak at IMPACT, the World’s First Data Observability Summit

Monte Carlo

Today, we’re thrilled to announce that Bob Muglia, entrepreneur, Fivetran board member, and former CEO of Snowflake, and DJ Patil, the first U.S. Chief Data Scientist, will speak at IMPACT: The Data Observability Summit. Muglia’s fireside chat with Monte Carlo CEO Barr Moses will cap off the event, and touch on such topics as the rise of data in the cloud, challenges and opportunities in the current tooling landscape, and his vision for the future of data engineering and analytics.

How We Improved the Concurrency and Scalability of Our Redis Rate Limiting System

Rockset

Rate limiting is a technique used to protect services from overload. In addition, it can be used to prevent starvation of a multi-tenant resource by a few very large customers. At Rockset, we primarily use rate limiting to protect our metadata store from overload caused by too many API requests, our log store from filling up due to mismatched input and output rates, and our control plane from too many state transitions.
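
As a rough illustration of the general technique, the sketch below implements a simple in-process token bucket. It is not Rockset's Redis-based design, which the teaser does not describe; the class name `TokenBucket`, its parameters, and the injectable clock are assumptions chosen for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical in-memory token bucket: `rate` tokens are added per second,
    up to `capacity`; each allowed request spends `cost` tokens."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so the bucket can be tested deterministically
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        # Refill in proportion to the time elapsed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would typically keep one bucket per protected resource or per tenant and reject or queue any request for which `allow()` returns false.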

Systems 52

The Data Janitor Letters - August 2021

Pipeline Data Engineering

Data engineering salon. News and interesting reads about the world of data. From Data Driven to Driving Data — The dysfunctions of Data Engineering MrTrustworthy Many “data driven” initiatives are failing even though they had the best engineers on the task and picked the “best” stack of technologies. What's an OLAP cube? Claire Carroll, Analytics Engineer, analyticsengineers.club OLAP cubes were this intimidating concept, and the more they read, the less they understood, but it turns out that

Hadoop 52

Flexibility and Resiliency Across the Supply Chain

Teradata

The supply chain is not just the sum of its parts. Each function, organization, decision & action are connected & have an effect on each part of the supply chain. Find out more.

IT 52

Business Intelligence 101: How To Make The Best Solution Decision For Your Organization

Speaker: Evelyn Chou

Choosing the right business intelligence (BI) platform can feel like navigating a maze of features, promises, and technical jargon. With so many options available, how can you ensure you’re making the right decision for your organization’s unique needs? 🤔 This webinar brings together expert insights to break down the complexities of BI solution vetting.

Tracing SRE’s journey in Zalando - Part II

Zalando Engineering

Welcome to the second part of our journey establishing SRE in Zalando. You’ll find the first part here. Don’t miss out on the third and final post in one week. 2018 - The Return of SRE: In our previous blog post, we left off with the plans for Site Reliability Engineering (SRE) in Zalando having to change. So, what were those changes and what were the challenges we faced in this new iteration?

Custom Pattern Matching in Scala

Rock the JVM

Pattern matching is one of Scala's most powerful features: discover how to customize it and create your own patterns in this article.

Scala 52

Streaming Events From Salesforce for Lead Enrichment With RudderStack’s Webhook Source

RudderStack

How to use a webhook to stream new ‘lead created’ events from Salesforce through RudderStack for lead enrichment with Clearbit data, then back to Salesforce.

Data 40

ACID vs BASE Concepts

Data Science Blog: Data Engineering

Understanding databases for storing, updating and analyzing data requires the understanding of two concepts: ACID and BASE. This is the first article of the article series Data Warehousing Basics. The ACID properties are applied to databases in order to fulfill enterprise requirements of reliability and consistency. ACID is an acronym, and stands for: Atomicity – Each transaction either executes completely or does not happen at all.
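
Atomicity can be seen in a small sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` and `balances` helpers, and the sample balances are assumptions made for the example; the point is only that a transaction failing halfway leaves no partial change behind.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from `src` to `dst` as one transaction (hypothetical helper)."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This statement violates the CHECK constraint when funds are
            # insufficient, so the credit above is rolled back as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` on these sample balances fails on the debit step, and the already-applied credit to `bob` is undone, so both balances stay as they were.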

NoSQL 52

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.