March, 2019

The Importance of Distributed Tracing for Apache-Kafka-Based Applications

Confluent

Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. One result of this is that producers and consumers don’t know about each other, as there is no direct communication between them. This enables choreographed service collaborations, where many components can subscribe to events stored in the event log and react to them asynchronously.

Kafka 111

Using Machine Learning to Ensure the Capacity Safety of Individual Microservices

Uber Engineering

Reliability engineering teams at Uber build the tools, libraries, and infrastructure that enable engineers to operate our thousands of microservices reliably at scale. At its essence, reliability engineering boils down to actively preventing outages that affect the mean time between …

Trending Sources

Why Analytics Projects Fail And What To Do About It

Data Engineering Podcast

Summary: Analytics projects fail all the time, resulting in lost opportunities and wasted resources. A number of factors contribute to that failure, and not all of them are under our control. However, many of them are, and as data engineers we can help keep our projects on the path to success. Eugene Khazin is the CEO of PrimeTSR, where he is tasked with rescuing floundering analytics efforts and ensuring that they provide value to the business.

Project 100

Teradata Has Been Named One of the World's Most Ethical Companies 2019

Teradata

Teradata is thrilled to be named one of the World’s Most Ethical Companies for the tenth consecutive year.


Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Netflix Tech

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By Di Lin, Girish Lingappa, and Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard, about to make a critical business decision but pausing to ask a question – “Can I run a check myself to understand what data is behind this metric?

Cloudera Altus Director on AWS Marketplace makes cloud deployment and billing easy

Cloudera

Roughly a quarter of Cloudera’s customers have clusters on public cloud, with a majority of them on AWS. These customers often look for cloud infrastructure best practices guidance as they venture into AWS cloud resources for the first time. Some of the questions asked include: How many AMIs do I need? Should I use EBS or S3 for storage? Many of these questions are answered in the Cloudera on AWS reference architecture guide.

AWS 77

More Trending

Open Sourcing Peloton, Uber’s Unified Resource Scheduler

Uber Engineering

First introduced by Uber in November 2018, Peloton, a unified resource scheduler, manages resources across distinct workloads, combining separate compute clusters. Peloton is designed for web-scale companies like Uber with millions of containers and tens of thousands of nodes. …

Building An Enterprise Data Fabric At CluedIn

Data Engineering Podcast

Summary: Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grows. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue first-hand in their previous roles, leading them to build a business aimed at providing a managed data fabric for the enterprise.

Building 100

What Is Pervasive Data Intelligence?

Teradata

Chris Twogood explains Pervasive Data Intelligence and why it's important to large-scale enterprise businesses.

Data 91

MezzFS – Mounting object storage in Netflix’s media processing platform

Netflix Tech

MezzFS – Mounting object storage in Netflix’s media processing platform. By Barak Alon (on behalf of Netflix’s Media Cloud Engineering team). MezzFS (short for “Mezzanine File System”) is a tool we’ve developed at Netflix that mounts cloud objects as local files via FUSE. It’s used extensively in our media processing platform, which includes services like Archer and runs features like video encoding and title image generation on tens of thousands of Amazon EC2 instances.

Media 87

Managing mortgage risk in an uncertain world

Cloudera

Picture the scene: a hopeful homebuyer sits in the almost deserted lobby of a high street bank, waiting for the appointment she booked with the mortgage consultant a week ago – a week ago! It annoys her that she has had to come to a branch she has not visited for years, all because she could not work out how to apply for a home loan on the bank’s website.

Kafka Streams’ Take on Watermarks and Triggers

Confluent

Back in May 2017, we laid out why we believe that Kafka Streams is better off without a concept of watermarks or triggers, and instead opts for a continuous refinement model. This article explains how we are fundamentally sticking with this model, while also opening the door for use cases that are incompatible with continuous refinement. By continuous refinement, I mean that Kafka Streams emits new results whenever records are updated.

Kafka 105

Improving the User Experience with Uber’s Customer Obsession Ticket Routing Workflow and Orchestration Engine

Uber Engineering

Every day, Uber users around the world initiate customer support tickets through our Customer Obsession Platform. To ensure a seamless user experience, each of those tickets must be matched with an agent who speaks the user’s language and who …

A DataOps vs DevOps Cookoff In The Data Kitchen

Data Engineering Podcast

Summary: Delivering a data analytics project on time and with accurate information is critical to the success of any business. DataOps is a set of practices to increase the probability of success by creating value early and often, and using feedback loops to keep your project on course. In this episode Chris Bergh, head chef of Data Kitchen, explains how DataOps differs from DevOps, how the industry has begun adopting DataOps, and how to adopt an agile approach to building your data platform.

How to Use Analytics to Avoid Problems, Before They Impact Your Business

Teradata

Rob Armstrong uses the metaphor of turbulence when flying to explain how businesses can prepare themselves for, and respond to, previously unforeseen challenges.


Spinnaker Sets Sail to the Continuous Delivery Foundation

Netflix Tech

Author: Andy Glover. Since releasing Spinnaker to the open source community in 2015, the platform has flourished with the addition of new cloud providers, triggers, pipeline stages, and much more. Myriad new features, improvements, and innovations have been added by an ever-growing, actively engaged community. Each new innovation has been a step towards an even better Continuous Delivery platform that facilitates rapid, reliable, safe delivery of flexible assets to pluggable deployment targets.

Introducing Cloudera Edge Management and Cloudera Flow Management

Cloudera

Cloudera’s vision of delivering Edge to AI solutions using the Enterprise Data Cloud will enable enterprises to transform dramatically. In today’s digitally connected enterprises, data originates from the edge, streams into the data center, lands in an Enterprise Data Cloud for downstream processing including Machine Learning and then serves back to the edge for real-time prediction and action.

CloudBank’s Journey from Mainframe to Streaming with Confluent Cloud

Confluent

Cloud is one of the key drivers for innovation. Innovative companies experiment with data to come up with something useful. It usually starts with the opening of a firehose that continuously broadcasts tons of events before they start mining it to create music out of simply noise. Today, companies from all around the world are witnessing an explosion of event generation coming from everywhere, including their own internal systems.

Cloud 88

Serverless Data Management: A SQL Search and Analytics Engine

Rockset

When we started Rockset, we envisioned building a powerful cloud data management system that was really easy to use. Making the data stack simpler is fundamental to making data usable by developers and data scientists. Simplifying the Data Stack: To that end, we incorporated user-friendly features that alleviate the pain we personally experienced as data practitioners.

SQL 52

Brand Identity Issues: How Does Logo Detection Work for Effective Marketing Campaign?

InData Labs

Social media has evolved into the main method of communicating ideas, sharing experience, brand stories, and building communities. The user engagement with ads on Facebook has tripled in the last 2 years, as Hootsuite reports. So far, more than 60% of users discover brands and goods on Instagram, employ such apps as Like2Buy that allows.

Media 52

Adding Cloud to Your Analytic Ecosystem

Teradata

Brian Wood explains what should be considered when adding cloud to your analytic ecosystem.

Cloud 72

How Data Inspires Building a Scalable, Resilient and Secure Cloud Infrastructure At Netflix

Netflix Tech

Netflix’s engineering culture is predicated on Freedom & Responsibility, the idea that everyone (and every team) at Netflix is entrusted with a core responsibility and they are free to operate with freedom to satisfy their mission. This freedom allows teams and individuals to move fast to deliver on innovation and feel responsible for quality and robustness of their delivery.

Cloud 74

Introducing the 2019 Data Heroes – EMEA!

Cloudera

The Data Heroes initiative is one of the ways that we recognize customers who achieve outstanding results with Cloudera technologies. The Data Visionary, Data Scientist, Data Architect, and HCC Community Champion awards are given out to organizations transforming their businesses through Big Data. Data Heroes design modern data architectures that work across hybrid and multi-cloud, and solve complex data management and analytic use cases that span from the Edge to AI.

Consuming Messages Out of Apache Kafka in a Browser

Confluent

Imagine a fire hose that spews out trillions of gallons of water every day, and part of your job is to withstand every drop coming out of it. This is what it is like to visualize the message throughput of Apache Kafka®. At Confluent, we want to help developers understand how to think about event streaming and the opportunities it can create. Educating people on what an event stream looks like is a daunting task.

Kafka 76

Wake up to Pandora with the Clock app from Google

Pandora Engineering

It’s 7 o’clock. I tug my blanket tightly over my face, hoping to whisk the morning away. The sound of my favorite playlist soothes me to a gentle rise. I smile. The alarm worked. Months of hard work collaborating with Google™, including many days at each other’s offices, came to life in that moment. Just a week earlier, we announced the release of the Pandora integration with the Clock app from Google.

Media 52

Advancing Analytics at #SQLBits 2019

Advancing Analytics: Data Engineering

Today is my first day back in the office after attending SQLBits in Manchester last week. SQLBits is the UK's largest Microsoft Data Platform conference. What makes this event special is that it is organised for the community, by the community, and is not for profit: all the proceeds go into funding the event and, in particular, the awesome Friday night party.

Building a Diverse and Inclusive Teradata

Teradata

Teradata CEO Oliver Ratzesberger celebrates International Women's Day.

Design Principles for Mathematical Engineering in Experimentation Platform

Netflix Tech

Design Principles for Mathematical Engineering in Experimentation Platform at Netflix. By Jeffrey Wong, Senior Modeling Architect, Experimentation Platform, and Colin McFarland, Director, Experimentation Platform. At Netflix, we have data scientists coming from many backgrounds such as neuroscience, statistics and biostatistics, economics, and physics; each of these backgrounds has a meaningful contribution to how experiments should be analyzed.

Learning with Limited Labeled Data

Cloudera

This post was originally published on the Cloudera Fast Forward Labs blog. In recent years, machine learning technologies – especially deep learning – have made breakthroughs which have turned science fiction into reality. Autonomous cars are almost possible, and machines can comprehend language. These technical advances are unprecedented, but they hinge on the availability of vast amounts of data.

