What’s New in Apache Kafka 3.0.0
Confluent
SEPTEMBER 21, 2021
I’m pleased to announce the release of Apache Kafka 3.0 on behalf of the Apache Kafka® community. Apache Kafka 3.0 is a major release in more ways than one. Apache […].
Uber Engineering
SEPTEMBER 23, 2021
Uber recently launched a new capability: Ads on UberEats. With this new ability came new challenges that needed to be solved at Uber, such as systems for ad auctions, bidding, attribution, reporting, and more. This article focuses on how we … The post Real-Time Exactly-Once Ad Event Processing with Apache Flink, Kafka, and Pinot appeared first on Uber Engineering Blog.
Marc Lamberti
SEPTEMBER 21, 2021
By default, your tasks get executed once all the parent tasks succeed. This behaviour is what you expect in general. But what if you want something more complex? What if you would like to execute a task as soon as one of its parents succeeds? Or maybe you would like to execute a different set of tasks if a task fails? Or act differently depending on whether a task succeeds, fails, or even gets skipped?
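The questions above can be made concrete with a small sketch. The rule names below mirror Airflow's `trigger_rule` values (`all_success`, `one_success`, `one_failed`, `all_done`), but `should_run` is a hypothetical helper written only for illustration. It is a simplified model of the semantics, not Airflow's scheduler logic, and it ignores states such as `upstream_failed`.

```python
# Simplified model of trigger-rule semantics (illustrative only).
# Parent states are plain strings: "success", "failed", "skipped", "running".
FINISHED = {"success", "failed", "skipped"}


def should_run(rule, parent_states):
    """Return True if a task with the given rule may start now."""
    states = list(parent_states)
    if rule == "all_success":   # the default: every parent succeeded
        return all(s == "success" for s in states)
    if rule == "one_success":   # fire as soon as any parent succeeds
        return any(s == "success" for s in states)
    if rule == "one_failed":    # fire as soon as any parent fails
        return any(s == "failed" for s in states)
    if rule == "all_done":      # every parent finished, whatever the outcome
        return all(s in FINISHED for s in states)
    raise ValueError(f"unknown rule: {rule}")
```

In an actual DAG the rule would be set per task through the operator's `trigger_rule` argument; the helper only shows how the different rules react to the same set of parent states.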
Cloudera
SEPTEMBER 20, 2021
There are many ways that Apache Kafka has been deployed in the field. In our Kafka Summit 2021 presentation, we took a brief overview of many different configurations that have been observed to date. In this blog series, we will discuss each of these deployments and the deployment choices made along with how they impact reliability. In Part 1, the discussion is related to: Serial and Parallel Systems Reliability as a concept, Kafka Clusters with and without Co-Located Apache Zookeeper, and Kafka
Confluent
SEPTEMBER 24, 2021
We’re pleased to announce ksqlDB 0.21.0! This release includes a major upgrade to ksqlDB’s foreign-key joins, the new data type BYTES, and a new ARRAY_CONCAT function. All of these features […].
Data Engineering Podcast
SEPTEMBER 24, 2021
Summary Python has become the de facto language for working with data. That has brought with it a number of challenges having to do with the speed and scalability of working with large volumes of information. There have been many projects and strategies for overcoming these challenges, each with their own set of tradeoffs. In this episode Ehsan Totoni explains how he built the Bodo project to bring the speed and processing power of HPC techniques to the Python data ecosystem without requiring any
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Cloudera
SEPTEMBER 21, 2021
Many customers looking at modernizing their pipeline orchestration have turned to Apache Airflow, a flexible and scalable workflow manager for data engineers. With hundreds of open source operators, Airflow makes it easy to deploy pipelines in the cloud and interact with a multitude of services on premise, in the cloud, and across cloud providers for a true hybrid architecture.
Netflix Tech
SEPTEMBER 22, 2021
Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, and Colin McFarland This is the second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. See here for Part 1: Decision Making at Netflix. Subsequent posts will go into more details on the statistics of A/B tests, experimentation across Netflix, how Netflix has invested in infrastructure to support and scale experimentation, and the importance of the culture
Data Engineering Podcast
SEPTEMBER 19, 2021
Summary Biology has been gaining a lot of attention in recent years, even before the pandemic. As an outgrowth of that popularity, a new field has grown up that pairs statistics and computational analysis with scientific research, namely bioinformatics. This brings with it a unique set of challenges for data collection, data management, and analytical capabilities.
DataKitchen
SEPTEMBER 20, 2021
Data organizations don’t always have the budget or schedule required for DataOps when conceived as a top-to-bottom, enterprise-wide transformational change. An essential part of the DataOps methodology is Agile Development, which breaks development into incremental steps. DataOps can and should be implemented in small steps that complement and build upon existing workflows and data pipelines.
Cloudera
SEPTEMBER 24, 2021
One of the most substantial big data workloads over the past fifteen years has been in the domain of telecom network analytics. Where does it stand today? What are its current challenges and opportunities? In a sense, there have been three phases of network analytics: the first was an appliance based monitoring phase; the second was an open-source expansion phase; and the third – that we are in right now – is a hybrid-data-cloud and governance phase.
Netflix Tech
SEPTEMBER 24, 2021
By Xiaomei Liu, Rosanna Lee, Cyril Concolato Introduction Behind the scenes of the beloved Netflix streaming service and content, there are many technology innovations in media processing. Packaging has always been an important step in media processing. After content ingestion, inspection and encoding, the packaging step encapsulates encoded video and audio in codec agnostic container formats and provides features such as audio video synchronization, random access and DRM protection.
Data Engineering Podcast
SEPTEMBER 19, 2021
Summary Building, scaling, and maintaining the operational components of a machine learning workflow are all hard problems. Add the work of creating the model itself, and it’s not surprising that a majority of companies that could greatly benefit from machine learning have yet to either put it into production or see the value. Tristan Zajonc recognized the complexity that acts as a barrier to adoption and created the Continual platform in response.
Data Science Blog: Data Engineering
SEPTEMBER 24, 2021
Data Warehousing is applied Big Data Management and a key success factor in almost every company. Without a data warehouse, no company today can control its processes and make the right decisions on a strategic level as there would be a lack of data transparency for all decision makers. Bigger companies even have multiple data warehouses for different purposes.
Cloudera
SEPTEMBER 23, 2021
A slow car has never won a Formula One race. The Olympics doesn’t reward slow times in swimming, track or any other clock-timed sport. Likewise, slow data speeds don’t win over customers or colleagues in the real-time business world. Microsoft’s own research once reported that a person visiting a website on a connected device is likely to wait no more than 10 seconds to see it before moving to a competitor’s site.
Datakin
SEPTEMBER 24, 2021
Blog Datakin is now open to all! Written by Laurent Paris on Sep 24, 2021 This is it! We’re officially out of beta and excited to announce the general availability of Datakin. Our story began with the creation of Marquez over two years ago. We believed then, and still believe now, that a new approach to data lineage was essential to support today’s pipelines.
InData Labs
SEPTEMBER 23, 2021
Today, digitization penetrates all spheres of business. 2.5 quintillion bytes of data that people create every day is predominantly unstructured data. Whether it is audio, video or text, big data – if meticulously collected, recognized, and processed – can generate business value through leveraging state-of-the-art technologies. But no matter how intelligent machines may be, they.
Monte Carlo
SEPTEMBER 23, 2021
If your data breaks, does it make a sound? Odds are, the answer is yes. But will you hear it? Probably not. Nowadays, organizations ingest large amounts of data across increasingly complex ecosystems, and very often their data breaks silently, and as a result data teams are left in the dark – until it’s too late. But, if said data is a report used by your Chief Revenue Officer to determine next quarter’s forecast, chances are this data will make a very, very large sound.
Cloudera
SEPTEMBER 22, 2021
During this Partner Perspective interview, Cloudera’s Alvin Heib seizes the opportunity to speak with Benjamin Krebs, General Manager of Technology Enterprise in Germany. The pair discuss Benjamin’s role at Dell, the importance of partnerships in his region, how the pandemic has altered Dell’s working landscape and finally, some predictions Benjamin has on Dell’s future.
AltexSoft
SEPTEMBER 23, 2021
The larger the company, the more data it has to generate actionable insights. Yet, more often than not, businesses can’t make use of their most valuable asset: information. Why? Because it is scattered across disparate systems, hardly available for analytical apps. Evidently, common storage solutions fail to provide a unified data view and meet the needs of companies for seamless data flow.
Rock the JVM
SEPTEMBER 22, 2021
Apache Pulsar is a cloud-native, distributed messaging and streaming platform handling hundreds of billions of events daily: discover its strengths and see how to use Scala with the pulsar4s client library to interact with it
Monte Carlo
SEPTEMBER 22, 2021
Today, we’re thrilled to announce that Bob Muglia, entrepreneur, Fivetran board member, and former CEO of Snowflake, and DJ Patil, the first U.S. Chief Data Scientist, will speak at IMPACT: The Data Observability Summit. Muglia’s fireside chat with Monte Carlo CEO Barr Moses will cap off the event, and touch on such topics as the rise of data in the cloud, challenges and opportunities in the current tooling landscape, and his vision for the future of data engineering and analytics.
Teradata
SEPTEMBER 22, 2021
Many Teradata customers are interested in integrating Vantage with Amazon AWS First Party Services. This Getting Started Guide will help you to connect Vantage with AWS Kinesis service.
Pipeline Data Engineering
SEPTEMBER 20, 2021
Data engineering salon. News and interesting reads about the world of data. From Data Driven to Driving Data — The dysfunctions of Data Engineering MrTrustworthy Many “data driven” initiatives are failing even though they had the best engineers on the task and picked the “best” stack of technologies. What's an OLAP cube? Claire Carroll, Analytics Engineer, analyticsengineers.club OLAP cubes were this intimidating concept, and the more they read, the less they understood, but it turns out that
Rock the JVM
SEPTEMBER 20, 2021
Pattern matching is one of Scala's most powerful features: discover how to customize it and create your own patterns in this article
Rockset
SEPTEMBER 21, 2021
Background Rate limiting is a technique used to protect services from overload. In addition, it can be used to prevent starvation of a multi-tenant resource by a few very large customers. At Rockset, we primarily use rate limiting to protect our metadata store from overload caused by too many API requests, our log store from filling up due to mismatched input and output rates, and our control plane from too many state transitions.
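As a rough illustration of what a rate limiter does, here is a minimal token-bucket sketch. The token bucket is one common technique chosen here as an assumption; the excerpt does not say which algorithm Rockset uses, and the class name, parameters, and injectable clock are all hypothetical.

```python
import time


class TokenBucket:
    """Illustrative token-bucket rate limiter (hypothetical names)."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # maximum burst size
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable so behaviour can be tested
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and consume tokens if the request may proceed."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A caller would keep one bucket per protected resource (or per tenant) and reject or delay a request whenever `allow()` returns False. Giving each tenant its own bucket is one way to keep a few very large customers from starving the rest.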
Zalando Engineering
SEPTEMBER 20, 2021
Welcome to the second part of our journey establishing SRE in Zalando. You’ll find the first part here. Don’t miss out on the third and final post in one week. 2018 - The Return of SRE In our previous blog post we left it with the plans for Site Reliability Engineering (SRE) in Zalando having to change. So, what were those changes and what were the challenges we faced in this new iteration?
Teradata
SEPTEMBER 20, 2021
The supply chain is not just the sum of its parts. Each function, organization, decision & action are connected & have an effect on each part of the supply chain. Find out more.
RudderStack
SEPTEMBER 23, 2021
How to use a webhook to stream new ‘lead created’ events from Salesforce through RudderStack for lead enrichment with Clearbit data, then back to Salesforce.
Data Science Blog: Data Engineering
SEPTEMBER 24, 2021
Understanding databases for storing, updating and analyzing data requires the understanding of two concepts: ACID and BASE. This is the first article of the article series Data Warehousing Basics. The properties of ACID are being applied for databases in order to fulfill enterprise requirements of reliability and consistency. ACID is an acronym, and stands for: Atomicity – Each transaction is either properly executed completely or does not happen at all.
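Atomicity can be seen in a short sketch using Python's built-in `sqlite3` module. The `accounts` table, the `transfer` helper, and the sample balances are assumptions made up for this illustration; the point is only that both updates are applied together or not at all.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts atomically; roll back on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: neither of the two updates above will persist.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


# Hypothetical sample data for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` returns False and leaves both balances unchanged, because the debit and the credit are rolled back together.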