Top Data Engineering Digest Data Mining Big Data Ecosystem Content for May, 2019

May, 2019

Employing QUIC Protocol to Optimize Uber’s App Performance

Uber Engineering

MAY 14, 2019

Uber operates on a global scale across more than 600 cities, with our apps relying entirely on wireless connectivity from over 4,500 mobile carriers. To deliver the real-time performance expected from Uber’s users, our mobile apps require low-latency and highly …

Engineering

Engineering Architecture

Schemas, Contracts, and Compatibility

Confluent

MAY 21, 2019

When you build microservices architectures, one of the concerns you need to address is that of communication between the microservices. At first, you may think to use REST APIs—most programming languages have frameworks that make it very easy to implement REST APIs, so this is a common first choice. REST APIs define the HTTP methods that are used and the request and response payloads that are expected.

Kafka

Kafka Insurance Architecture Database

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Engineering a Studio Quality Experience With High-Quality Audio at Netflix

Netflix Tech

MAY 1, 2019

By Guillaume du Pontavice, Phill Williams, and Kylee Peña (on behalf of our Streaming Algorithms, Audio Algorithms, and Creative Technologies teams). Remember the epic opening sequence of Stranger Things 2? The thrill of that car chase through Pittsburgh not only introduced a whole new set of mysteries, but it returned us to a beloved and dangerous world alongside Dustin, Lucas, Mike, Will and Eleven.

Engineering

Engineering Algorithm Media Entertainment

5 Myths You Have Been Told About Industrial AI

Teradata

MAY 14, 2019

Cheryl Wiebe explains why AI for industrial use cases is a more complicated road than it appears.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-related …

Data Pipeline

Data Lineage For Your Pipelines

Data Engineering Podcast

MAY 26, 2019

Summary: Some problems in data are well defined and benefit from a ready-made set of tools. For everything else, there’s Pachyderm, the platform for data science that is built to scale. In this episode Joe Doliner, CEO and co-founder, explains how Pachyderm started as an attempt to make data provenance easier to track and how the platform is architected and used today, and gives examples of how the underlying principles manifest in the workflows of data engineers and data scientists as they collaborate …

Data Science

Data Science Data Pipeline Data Kafka

Women in Big Data Panel at DataWorks Summit 2019

Cloudera

MAY 2, 2019

Last month, I moderated the Women in Big Data panel hosted by DataWorks Summit and sponsored by Women in Big Data. This was a well-attended event with five amazing guest speakers: Hilary Mason, Tina Rosario, Violeta Ciurel, Ana Gillan and Devon Edwards Joseph. The theme for the discussion was “Top technology trends women and men business leaders need to be aware of”.

Big Data

Big Data Data Science Healthcare Technology

Case Study: FULL Uses Rockset with DynamoDB for Live Dashboard to Manage Remote Workforce

Rockset

MAY 24, 2019

Remote work affords organizations access to more talent and offers workers greater flexibility in their lives. With a vision for everyone to be able to work from anywhere, FULL Creative runs a contact center service using fully remote teams, tapping into the growing share of employees working remotely. FULL agents answer calls on behalf of 7,000 clients of all sizes, from plumbers to parking garages to legal and medical professionals.

Management

Management Medical SQL Utilities

More Trending

Kafka Summit London 2019 Session Videos

Confluent

MAY 23, 2019

Let us cut to the chase: Kafka Summit London session videos are available! If you were there, you know what a great time it was, and you know that you had to make sometimes-agonizing decisions about which sessions to attend and which to miss. Well, now you can make all those tradeoffs right by watching the whole catalog. And if you weren’t there? Well, dig in and start learning!

Kafka

Kafka Cloud Database Engineering

Lerner: using RL agents for test case scheduling

Netflix Tech

MAY 21, 2019

By Stanislav Kirdey, Kevin Cureton, Scott Rick, and Sankar Ramanathan. Introduction: Netflix brings delightful customer experiences to homes on a variety of devices, a number that continues to grow each day. The device ecosystem is rich with partners ranging from System-on-Chip (SoC) manufacturers to Original Design Manufacturer (ODM) and Original Equipment Manufacturer (OEM) vendors.

Amazon Web Services

Amazon Web Services Manufacturing Python AWS

3 Easy Ways to Turn Data into Actionable Answers

Teradata

MAY 30, 2019

Rob Armstrong explains three critical ways to get better answers from your data.

Data

Build Your Data Analytics Like An Engineer With DBT

Data Engineering Podcast

MAY 19, 2019

Summary: In recent years the traditional approach to building data warehouses has shifted from transforming records before loading to transforming them afterwards. As a result, the tooling for those transformations needs to be reimagined. The data build tool (dbt) is designed to bring battle-tested engineering practices to your analytics pipelines. By providing an opinionated set of best practices, it simplifies collaboration and boosts confidence in your data teams.

Data Analytics

Data Analytics Building Engineering Data Warehouse

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Announcing the 2019 Data Impact Awards

Cloudera

MAY 22, 2019

We are excited to be launching our first awards program together as the “New Cloudera.” Although the program is technically in its seventh year, as the first joint awards program, this year’s Data Impact Awards will span even more use cases, covering even more advances in IoT, data warehouse, machine learning, and more. The program recognizes organizations that are using Cloudera’s platform and services to unlock the power of data, with massive business and social impact.

Pharmaceutical

Pharmaceutical Recruitment Machine Learning Metadata

Converged Index™: The Secret Sauce Behind Rockset's Fast Queries

Rockset

MAY 23, 2019

Adding an index to a database is one of those little joys in life. A query takes 10 seconds, you add a good index, and boom: 10 milliseconds! Customers are happy, the manager is happy, the database is happy (according to its CPU graph at least). However, managing indexes gets old quickly. More indexes mean slower writes. There is always another query creeping up on the latency graph.

Database

Database Datasets Media Algorithm

The PipelineDB Team Joins Confluent

Confluent

MAY 1, 2019

Some years ago, when I was at LinkedIn, I didn’t really know what Apache Kafka® would become but had an inkling that the next generation of applications would not be islands disconnected from one another, or lashed together with irregular, point-to-point bindings. When we founded Confluent, we took the radical approach of viewing data, and the infrastructure that supported it, as a series of real-time streaming events rather than something kept in static, sedentary data repositories.

Kafka

Kafka Datasets Database Technology

Android Rx onError Guidelines

Netflix Tech

MAY 1, 2019

By Ed Ballot. “Creating a good API is hard.” (anyone who has created an API used by others) As with any API, wrapping your data stream in an Rx observable requires consideration for reasonable error handling and intuitive behavior. The following guidelines are intended to help developers create consistent and intuitive APIs. Since we frequently create Rx Observables in our Android app, we needed a common understanding of when to use onNext() and when to use onError() to make the API more consistent …

Database

Database Coding Systems IT

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

What Is the Biggest Challenge Facing CMOs Today? Building, Measuring, and Maintaining Brand Equity.

Teradata

MAY 1, 2019

Teradata CMO Martyn Etherington discusses how brands can build, measure, and maintain brand equity. He also explains why customer experience is critical to a brand's success.

Building

Using FoundationDB As The Bedrock For Your Distributed Systems

Data Engineering Podcast

MAY 6, 2019

Summary: The database market continues to expand, offering systems that are suited to virtually every use case. But what happens if you need something customized to your application? FoundationDB is a distributed key-value store that provides the primitives that you need to build a custom database platform. In this episode Ryan Worl explains how it is architected and how to use it for your applications, and provides examples of system design patterns that can be built on top of it.

Systems

Systems MongoDB NoSQL Database

A 5D model to assess your IoT readiness

Cloudera

MAY 9, 2019

The number one challenge enterprises face with their IoT implementations is being unable to measure whether they are succeeding. Most enterprises start an IoT initiative without first assessing whether they have the potential to complete it. Even if they complete it, they lack the ability to identify success metrics and correlate them with key business goals.

Manufacturing

Manufacturing Data Ingestion Architecture Data Governance

Building a Serverless Analytics App to Capture and Query Clickstream Data

Rockset

MAY 17, 2019

The best way to answer questions about user behavior is often to gather data. A common pattern is to track user clicks throughout a product, then perform analytical queries on the resulting data, getting a holistic understanding of user behavior. In my case, I was curious to get a pulse of developer preferences on several divisive questions. So, I built a simple survey and gathered tens of thousands of data points from developers on the Internet.

Building

Building SQL NoSQL Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven …

Manufacturing

Introducing a Cloud-Native Experience for Apache Kafka in Confluent Cloud

Confluent

MAY 13, 2019

In the last year, we’ve experienced enormous growth on Confluent Cloud, our fully managed Apache Kafka® service. Confluent Cloud now handles several GB/s of traffic, a 200-fold increase in just six months. As Confluent Cloud has grown, we’ve noticed two gaps that very clearly remain to be filled in managed Apache Kafka services. First, all the Kafka services out there still require you to size and provision a cluster, which inevitably leads to a poor developer experience, over-provisioned capacity …

Kafka

Kafka Cloud Management Building

Making our Android Studio Apps Reactive with UI Components & Redux

Netflix Tech

MAY 30, 2019

By Juliano Moraes, David Henry, Corey Grunewald & Jim Isaacs. Recently Netflix has started building mobile apps to bring technology and innovation to our Studio Physical Productions, the portion of the business responsible for producing our TV shows and movies. Our very first mobile app is called Prodicle and was built for Android & iOS using the same reactive architecture on both platforms, which allowed us to build 2 apps from scratch in 3 months with 4 software engineers.

Architecture

Architecture Coding Software Engineer Software Engineering

How Air France-KLM Group Uses Cross-Channel Analytics to Smoothly Connect Over 100M Passengers

Teradata

MAY 23, 2019

Using Vantage, Air France-KLM Group performs cross-channel analytics of customer data to provide a seamless experience for their passengers.

Data

Back-Pressure Strategy for a Sharded Akka Cluster

Zalando Engineering

MAY 8, 2019

AWS SQS polling from a sharded Akka Cluster running on Kubernetes. NOTE: This blog post requires the reader to have prior knowledge of AWS SQS, Akka Actors and Akka Cluster Sharding. My last post introduced Akka Cluster Sharding as a Distributed Cache running on Kubernetes. As that proof of concept (PoC) proved promising, we started building a high-throughput and low-latency system based on the experience and lessons gained.

AWS

AWS Architecture Systems Process

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

OCR Algorithm: Improve and Automate Business Processes

InData Labs

MAY 4, 2019

Mid-size and large businesses have massive amounts of printed documents in daily use, among them invoices, receipts, corporate documents, reports, and media releases. Millions of them can be handwritten, which makes them understandable for humans but difficult for machines to read. Basic Concept of OCR: Optical character recognition (OCR) algorithms allow computers …

Algorithm

Algorithm Process Media Technology

Case Study: Decore Uses Rockset for Search & Analytics on DynamoDB

Rockset

MAY 6, 2019

Many early adopters of cryptocurrency were individuals at the forefront of this technology, but enterprises are now increasingly getting involved. As using cryptocurrency for business transactions becomes more commonplace, Decore aims to make accounting as streamlined as possible for companies accepting and sending crypto. Conceived as a “QuickBooks for crypto,” Decore provides accounting solutions for companies that have adopted crypto.

Banking

Banking MySQL AWS Architecture

Scylla and Confluent Integration for IoT Deployments

Confluent

MAY 22, 2019

The internet is not just connecting people around the world. Through the Internet of Things (IoT), it is also connecting humans to the machines all around us and directly connecting machines to other machines. In light of this, we’ll share an emerging machine-to-machine (M2M) architecture pattern in which MQTT, Apache Kafka®, and Scylla all work together to provide an end-to-end IoT solution.

Kafka

Kafka Google Cloud NoSQL Entertainment

Docker for Data Science: Getting Started & Installing Docker

Advancing Analytics: Data Engineering

MAY 2, 2019

In the last Docker for Data Science blog we looked at where Docker came from and why it is important. In this blog we will get Docker installed and configured on either Windows or Mac. Installing Docker: Below are instructions for installing Docker on both Windows and Mac. Before we begin, there are a few different methods for installing Docker on Windows and Mac.

Data Science

Data Science Data Accessible Accessibility

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

How to Drive Marketing Personalization in an Increasingly Non-Personal World

Teradata

MAY 28, 2019

Tom Casey discusses marketing personalization and why it's important to the modern customer experience.

Understanding Redis Background Memory Usage

Zalando Engineering

MAY 15, 2019

A closer look at how the Linux kernel influences Redis memory management. Recently, I was talking to a long-time friend, previous university colleague and former boss, who mentioned that Redis was failing to persist data to disk in low-memory conditions. For that reason, he advised never letting a Redis in-memory dataset grow larger than 50% of the system memory.

Datasets

Datasets Systems Python Coding

Cloudera Data Science Workbench: where innovation meets security, compliance and scale on the road to industrialized AI

Cloudera

MAY 28, 2019

Gartner states that “By 2022, 75% of new end-user solutions leveraging machine learning (ML) and AI techniques will be built with commercial instead of open source platforms”¹. Spoiler alert: it’s not because data scientists will stop relying on open source for the latest innovation in ML algorithms and development environments. Rather, as businesses look to operationalize machine learning capabilities at scale, they’ll turn increasingly to commercial platforms, with connectors to open source …

Data Science

Data Science Transportation Machine Learning Algorithm

Using Tableau for Live Dashboards on Event Data

Rockset

MAY 31, 2019

Live dashboards can help organizations make sense of their event data and understand what's happening in their businesses in real time. Marketing managers constantly want to know how many signups there were in the last hour, day, or week. Product managers are always looking to understand which product features are working well and most heavily utilized.

BI Java Data SQL

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale your …

Data Engineering
