May, 2019

Data Lineage For Your Pipelines

Data Engineering Podcast

Summary: Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there’s Pachyderm, the platform for data science that is built to scale. In this episode Joe Doliner, CEO and co-founder, explains how Pachyderm started as an attempt to make data provenance easier to track, how the platform is architected and used today, and examples of how the underlying principles manifest in the workflows of data engineers and data scientists as they collabor

Employing QUIC Protocol to Optimize Uber’s App Performance

Uber Engineering

Uber operates on a global scale across more than 600 cities, with our apps relying entirely on wireless connectivity from over 4,500 mobile carriers. To deliver the real-time performance expected from Uber’s users, our mobile apps require low-latency and highly …

Apache Kafka Data Access Semantics: Consumers and Membership

Confluent

Every developer who uses Apache Kafka® has used a Kafka consumer at least once. Although it is the simplest way to subscribe to and access events from Kafka, behind the scenes, Kafka consumers handle tricky distributed systems challenges like data consistency, failover and load balancing. Luckily, Kafka’s consuming model is quite easy to understand.

Engineering a Studio Quality Experience With High-Quality Audio at Netflix

Netflix Tech

By Guillaume du Pontavice, Phill Williams and Kylee Peña (on behalf of our Streaming Algorithms, Audio Algorithms, and Creative Technologies teams). Remember the epic opening sequence of Stranger Things 2? The thrill of that car chase through Pittsburgh not only introduced a whole new set of mysteries, but it returned us to a beloved and dangerous world alongside Dustin, Lucas, Mike, Will and Eleven.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

What Is the Biggest Challenge Facing CMOs Today? Building, Measuring, and Maintaining Brand Equity.

Teradata

Teradata CMO Martyn Etherington discusses how brands can build, measure, and maintain brand equity. He also explains why customer experience is critical to a brand's success.

Case Study: FULL Uses Rockset with DynamoDB for Live Dashboard to Manage Remote Workforce

Rockset

Remote work affords organizations access to more talent and offers workers greater flexibility in their lives. With a vision for everyone to be able to work from anywhere, FULL Creative runs a contact center service using fully remote teams, tapping into the growing share of employees working remotely. FULL agents answer calls on behalf of 7,000 clients of all sizes, from plumbers to parking garages to legal and medical professionals.

More Trending

Back-Pressure Strategy for a Sharded Akka Cluster

Zalando Engineering

AWS SQS polling from a sharded Akka Cluster running on Kubernetes. NOTE: This blog post requires the reader to have prior knowledge of AWS SQS, Akka Actors and Akka Cluster Sharding. My last post introduced Akka Cluster Sharding as a Distributed Cache running on Kubernetes. As that proof of concept (PoC) proved promising, we started building a high-throughput and low-latency system based on the experience and lessons we gained.

Schemas, Contracts, and Compatibility

Confluent

When you build microservices architectures, one of the concerns you need to address is that of communication between the microservices. At first, you may think to use REST APIs—most programming languages have frameworks that make it very easy to implement REST APIs, so this is a common first choice. REST APIs define the HTTP methods that are used and the request and response payloads that are expected.

Lerner: using RL agents for test case scheduling

Netflix Tech

By Stanislav Kirdey, Kevin Cureton, Scott Rick, and Sankar Ramanathan. Netflix brings delightful customer experiences to homes on a variety of devices that continues to grow each day. The device ecosystem is rich with partners ranging from Silicon-on-Chip (SoC) manufacturers to Original Design Manufacturer (ODM) and Original Equipment Manufacturer (OEM) vendors.

How Air France-KLM Group Uses Cross-Channel Analytics to Smoothly Connect Over 100M Passengers

Teradata

Using Vantage, Air France-KLM Group performs cross-channel analytics of customer data to provide a seamless experience for their passengers.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Converged Index™: The Secret Sauce Behind Rockset's Fast Queries

Rockset

Adding an index to a database is one of those little joys in life. A query takes 10 seconds, you add a good index, and boom: 10 milliseconds! Customers are happy, the manager is happy, the database is happy (according to its CPU graph at least). However, managing indexes gets old quickly. More indexes mean slower writes. There is always another query creeping up on the latency graph.

Using FoundationDB As The Bedrock For Your Distributed Systems

Data Engineering Podcast

Summary: The database market continues to expand, offering systems that are suited to virtually every use case. But what happens if you need something customized to your application? FoundationDB is a distributed key-value store that provides the primitives that you need to build a custom database platform. In this episode Ryan Worl explains how it is architected, how to use it for your applications, and provides examples of system design patterns that can be built on top of it.

OCR Algorithm: Improve and Automate Business Processes

InData Labs

Businesses of mid and large scale have massive amounts of printed documents in daily use. Among them are invoices, receipts, corporate documents, reports, media releases. And millions of them can be handwritten, which makes documents understandable for humans but difficult to read for machines. Basic Concept of OCR Optical character recognition (OCR) algorithms allow computers.

Kafka Summit London 2019 Session Videos

Confluent

Let us cut to the chase: Kafka Summit London session videos are available! If you were there, you know what a great time it was, and you know that you had to make sometimes-agonizing decisions about which sessions to attend and which to miss. Well, now you can make all those tradeoffs right by watching the whole catalog. And if you weren’t there? Well, dig in and start learning!

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Android Rx onError Guidelines

Netflix Tech

By Ed Ballot. “Creating a good API is hard.” - anyone who has created an API used by others. As with any API, wrapping your data stream in an Rx Observable requires consideration for reasonable error handling and intuitive behavior. The following guidelines are intended to help developers create consistent and intuitive APIs. Since we frequently create Rx Observables in our Android app, we needed a common understanding of when to use onNext() and when to use onError() to make the API more consisten

5 Myths You Have Been Told About Industrial AI

Teradata

Cheryl Wiebe explains why AI for industrial use cases is a more complicated road than it appears.

Developer Pulse: 5 Things Developers Love

Rockset

We love a good debate. And we love data. So when the existential question of spaces vs. tabs came up in our team, we just had to run a real-time survey and collect thousands of data points around it. While we were at it, we figured it was time to settle the debate around other equally important developer issues like Hint vs. LaCroix, Vim vs. Emacs, and more.

Docker for Data Science: Getting Started & Installing Docker

Advancing Analytics: Data Engineering

In the last Docker for Data Science blog we looked at where Docker came from and why it is important. In this blog we will get Docker installed and configured on either Windows or Mac. Installing Docker: below are instructions for installing Docker on both Windows and Mac. Before we begin, there are a few different methods for installing Docker on Windows and Mac.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

A 5D model to assess your IoT readiness

Cloudera

The number one challenge enterprises face with their IoT implementations is not being able to measure whether they are succeeding. Most enterprises start an IoT initiative without first assessing whether they have the potential to complete it. Even if they complete it, they lack the ability to identify and correlate the success metrics with key business goals.

Introducing a Cloud-Native Experience for Apache Kafka in Confluent Cloud

Confluent

In the last year, we’ve experienced enormous growth on Confluent Cloud, our fully managed Apache Kafka® service. Confluent Cloud now handles several GB/s of traffic, a 200-fold increase in just six months. As Confluent Cloud has grown, we’ve noticed two gaps that very clearly remain to be filled in managed Apache Kafka services. First, all the Kafka services out there still require you to size and provision a cluster, which inevitably leads to a poor developer experience, over-provisioned capaci

Making our Android Studio Apps Reactive with UI Components & Redux

Netflix Tech

By Juliano Moraes, David Henry, Corey Grunewald & Jim Isaacs. Recently Netflix has started building mobile apps to bring technology and innovation to our Studio Physical Productions, the portion of the business responsible for producing our TV shows and movies. Our very first mobile app is called Prodicle and was built for Android & iOS using the same reactive architecture on both platforms, which allowed us to build 2 apps from scratch in 3 months with 4 software engineers.

Why is a Real Time Interaction Manager (RTIM) Essential to Providing a Superior Customer Experience?

Teradata

Ritu Jain explains the value of the Teradata Real Time Interaction Manager (RTIM) and why personalized customer experiences are so critical for marketers.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Building a Serverless Analytics App to Capture and Query Clickstream Data

Rockset

The best way to answer questions about user behavior is often to gather data. A common pattern is to track user clicks throughout a product, then perform analytical queries on the resulting data, getting a holistic understanding of user behavior. In my case, I was curious to get a pulse of developer preferences on several divisive questions. So, I built a simple survey and gathered tens of thousands of data points from developers on the Internet.

Understanding Redis Background Memory Usage

Zalando Engineering

A closer look at how the Linux kernel influences Redis memory management. Recently, I was talking to a long-time friend, previous university colleague and former boss, who mentioned that Redis was failing to persist data to disk in low-memory conditions. For that reason, he advised never letting a Redis in-memory dataset grow beyond 50% of the system memory.

Cloudera Data Science Workbench: where innovation meets security, compliance and scale on the road to industrialized AI

Cloudera

Gartner states that “By 2022, 75% of new end-user solutions leveraging machine learning (ML) and AI techniques will be built with commercial instead of open source platforms” ¹. Spoiler alert: it’s not because data scientists will stop relying on open source for the latest innovation in ML algorithms and development environments. But rather as businesses look to operationalize machine learning capabilities at scale, they’ll turn increasingly to commercial platforms, with connectors to open so

Scylla and Confluent Integration for IoT Deployments

Confluent

The internet is not just connecting people around the world. Through the Internet of Things (IoT), it is also connecting humans to the machines all around us and directly connecting machines to other machines. In light of this, we’ll share an emerging machine-to-machine (M2M) architecture pattern in which MQTT, Apache Kafka®, and Scylla all work together to provide an end-to-end IoT solution.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations

Confluent

In part 1, we discussed an event streaming architecture that we implemented for a customer using Apache Kafka®, KSQL from Confluent, and Kafka Streams. Now in part 2, we’ll discuss the challenges we faced developing, building, and deploying the KSQL portion of our application and how we used Gradle to address them. In part 3, we’ll explore using Gradle to build and deploy KSQL user-defined functions (UDFs) and Kafka Streams microservices.

Spring for Apache Kafka Deep Dive – Part 3: Apache Kafka and Spring Cloud Data Flow

Confluent

Following part 1 and part 2 of the Spring for Apache Kafka Deep Dive blog series, here in part 3 we will discuss another project from the Spring team: Spring Cloud Data Flow, which focuses on enabling developers to easily develop, deploy, and orchestrate event streaming pipelines based on Apache Kafka®. As a continuation from the previous blog series, this blog post explains how Spring Cloud Data Flow helps you gain developer productivity and manage Apache-Kafka-based event streaming applicati

Journey to Event Driven – Part 4: Four Pillars of Event Streaming Microservices

Confluent

So far in this series, we have recognized that by going back to first principles, we have a new foundation to work with. Event-first thinking enables us to build a new atomic unit: the event. Storing events in a stream and connecting streams via stream processors provide a generic, data-centric, distributed application runtime that you can use to build ETL, event streaming applications, applications for recording metrics and anything else that has a real-time data requirement.

Deploying Kafka Streams and KSQL with Gradle – Part 1: Overview and Motivation

Confluent

Red Pill Analytics was recently engaged by a Fortune 500 e-commerce and wholesale company that is transforming the way they manage inventory. Traditionally, this company has used only a few massive warehouses and shipped out from these locations to all customers regardless of geographic location or delivery style. But these legacy warehouses were slow to ship from and couldn’t keep up with more modern inventory management strategies, including same-day dropshipping or bringing the product physic

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.