Sat, Mar 23, 2019 - Fri, Mar 29, 2019

The Importance of Distributed Tracing for Apache Kafka-Based Applications

Confluent

Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. One result of this is that producers and consumers don’t know about each other, as there is no direct communication between them. This enables choreographed service collaborations, where many components can subscribe to events stored in the event log and react to them asynchronously.
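
To make the decoupling concrete, here is a minimal Python sketch, assuming a toy in-memory log. The `EventLog` class and its `append`/`poll` methods are hypothetical and are not the Kafka client API. The producer only appends; each subscriber keeps its own read offset (loosely mirroring how Kafka consumer groups track their own positions), so neither side references the other.

```python
from collections import defaultdict


class EventLog:
    """Hypothetical in-memory stand-in for a Kafka topic: an append-only list of events."""

    def __init__(self):
        self._events = []
        # Each subscriber tracks its own read position; the producer never sees these.
        self._offsets = defaultdict(int)

    def append(self, event):
        """Producer side: write an event and return its offset. No knowledge of consumers."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, subscriber):
        """Consumer side: return events this subscriber has not seen yet and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._events[start:]
        self._offsets[subscriber] = len(self._events)
        return batch


log = EventLog()
log.append({"type": "OrderPlaced", "id": 1})
log.append({"type": "OrderPlaced", "id": 2})

# Two independent (hypothetical) services read the same events at their own pace.
shipping_seen = log.poll("shipping")
billing_seen = log.poll("billing")
```

Adding a third subscriber requires no change on the producer side, which is the property that makes the choreographed collaborations described above possible. It is also why tracing a request across such services needs extra context carried with the events themselves.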

Building An Enterprise Data Fabric At CluedIn

Data Engineering Podcast

Summary: Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grows. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue first-hand in their previous roles, which led them to build a business offering a managed data fabric for the enterprise.

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Netflix Tech

By Di Lin, Girish Lingappa, and Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker, staring at a metric on a dashboard and about to make a critical business decision, but pausing to ask a question: “Can I run a check myself to understand what data is behind this metric?”

Improving the User Experience with Uber’s Customer Obsession Ticket Routing Workflow and Orchestration Engine

Uber Engineering

Every day, Uber users around the world initiate customer support tickets through our Customer Obsession Platform. To ensure a seamless user experience, each of those tickets must be matched with an agent who speaks the user’s language and who …

Consuming Messages Out of Apache Kafka in a Browser

Confluent

Imagine a fire hose that spews out trillions of gallons of water every day, and part of your job is to withstand every drop coming out of it. This is what it is like to visualize the message throughput of Apache Kafka®. At Confluent, we want to help developers understand how to think about event streaming and the opportunities it can create. Educating people on what an event stream looks like is a daunting task.

Cloudera Altus Director on AWS Marketplace makes cloud deployment and billing easy

Cloudera

Roughly a quarter of Cloudera’s customers have clusters on public cloud, with a majority of them on AWS. These customers often look for cloud infrastructure best practices guidance as they venture into AWS cloud resources for the first time. Some of the questions asked include: How many AMIs do I need? Should I use EBS or S3 for storage? Many of these questions are answered in the Cloudera on AWS reference architecture guide.

More Trending

From Schemaless Ingest to Smart Schema: Enabling SQL on Raw Data

Rockset

You have complex, semi-structured data—nested JSON or XML, for instance, containing mixed types, sparse fields, and null values. It's messy, you don't understand how it's structured, and new fields appear every so often. The application you're implementing needs to analyze this data, combining it with other datasets, to return live metrics and recommended actions.

Managing mortgage risk in an uncertain world

Cloudera

Picture the scene: a hopeful homebuyer sits in the almost deserted lobby of a high street bank, waiting for the appointment she booked with the mortgage consultant a week ago – a week ago! It annoys her that she has had to come to a branch she has not visited for years, all because she could not work out how to apply for a home loan on the bank’s website.

Why You Get Faster Query Results with Teradata’s Adaptive Optimizer

Teradata

Carrie Ballinger explores the capabilities of Teradata’s Adaptive Optimizer and how it builds better query plans for faster answers to analytic queries.

A Story of Rust

Zalando Engineering

Introducing Rust in an Enterprise Environment. Discovery: Sometime in 2013, I stumbled across a new programming language on the internet called Rust. Taking a look at the language, I was impressed by its high-level features. At that time I was a backend Scala developer with a .NET background. When examining Rust, I found most of the features I used every day, like pattern matching, the “newtype pattern”, and a “Scala-like” iterator API.

Introducing Cloudera Edge Management and Cloudera Flow Management

Cloudera

Cloudera’s vision of delivering Edge to AI solutions using the Enterprise Data Cloud will enable enterprises to transform dramatically. In today’s digitally connected enterprises, data originates at the edge, streams into the data center, lands in an Enterprise Data Cloud for downstream processing, including machine learning, and is then served back to the edge for real-time prediction and action.

Learning with Limited Labeled Data

Cloudera

This post was originally published on the Cloudera Fast Forward Labs blog. In recent years, machine learning technologies, especially deep learning, have made breakthroughs that have turned science fiction into reality. Autonomous cars are almost possible, and machines can comprehend language. These technical advances are unprecedented, but they hinge on the availability of vast amounts of data.