Top Data Engineering Digest Data Engineer Data Engineering Content for Week of Apr 27

Sat. Apr 27, 2019 - Fri. May 03, 2019

Running Your Database On Kubernetes With KubeDB

Data Engineering Podcast

APRIL 28, 2019

Summary: Kubernetes is a driving force in the renaissance around deploying and running applications. However, managing the database layer is still a separate concern. The KubeDB project was created as a way of providing a simple mechanism for running your storage system in the same platform as your application. In this episode Tamal Saha explains how the KubeDB project got started, why you might want to run your database with Kubernetes, and how to get started.

Database

Database PostgreSQL MongoDB MySQL

Python at Netflix

Netflix Tech

APRIL 29, 2019

By Pythonistas at Netflix, coordinated by Amjith Ramanujam and edited by Ellen Livengood. As many of us prepare to go to PyCon, we wanted to share a sampling of how Python is used at Netflix. We use Python through the full content lifecycle, from deciding which content to fund all the way to operating the CDN that serves the final video to 148 million members.

Python

Python Amazon Web Services Machine Learning Algorithm

Trending Sources

What Is the Biggest Challenge Facing CMOs Today? Building, Measuring, and Maintaining Brand Equity.

Teradata

MAY 1, 2019

Teradata CMO Martyn Etherington discusses how brands can build, measure, and maintain brand equity. He also explains why customer experience is critical to a brand's success.

Building

Optimizing Kafka Streams Applications

Confluent

APRIL 30, 2019

With the release of Apache Kafka® 2.1.0, Kafka Streams introduced the processor topology optimization framework at the Kafka Streams DSL layer. This framework opens the door for various optimization techniques from the existing data stream management system (DSMS) and data stream processing literature. In what follows, we provide some context around how a processor topology was generated inside Kafka Streams before 2.1, with a focus on stateful operations like aggregations and joins.

Kafka

Kafka Coding Process Bytes

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Docker for Data Science: Getting Started & Installing Docker

Advancing Analytics: Data Engineering

MAY 2, 2019

In the last Docker for Data Science blog we looked at where Docker came from and why it is important. In this blog we will get Docker installed and configured on either Windows or Mac. Installing Docker. Below are instructions for installing Docker on both Windows and on Mac. Before we begin, there are a few different methods for installing Docker on Windows and Mac.

Data Science

Data Science Data Accessible Accessibility

Engineering a Studio Quality Experience With High-Quality Audio at Netflix

Netflix Tech

MAY 1, 2019

By Guillaume du Pontavice, Phill Williams and Kylee Peña (on behalf of our Streaming Algorithms, Audio Algorithms, and Creative Technologies teams). Remember the epic opening sequence of Stranger Things 2? The thrill of that car chase through Pittsburgh not only introduced a whole new set of mysteries, but it returned us to a beloved and dangerous world alongside Dustin, Lucas, Mike, Will and Eleven.

Engineering

Engineering Algorithm Media Entertainment

Why is a Real Time Interaction Manager (RTIM) Essential to Providing a Superior Customer Experience?

Teradata

MAY 2, 2019

Ritu Jain explains the value of the Teradata Real Time Interaction Manager (RTIM) and why personalized customer experiences are so critical for marketers.

Management

More Trending

Dawn of DevOps: Managing Apache Kafka Clusters at Scale with Confluent Control Center

Confluent

MAY 2, 2019

When managing Apache Kafka® clusters at scale, tasks that are simple on small clusters turn into significant burdens. To be fair, a lot of things turn into significant burdens at scale, and it’s Confluent Control Center’s job to ease as many of them as possible. In Confluent Platform 5.2, Control Center has grown a couple of new features that make large deployments a little more pleasant to manage: It has become much better at managing configuration changes among a large number of brokers, and

Kafka

Kafka Management Food Consulting

Analytics on DynamoDB: Comparing Elasticsearch, Athena and Spark

Rockset

APRIL 29, 2019

In this blog post I compare options for real-time analytics on DynamoDB - Elasticsearch, Athena, and Spark - in terms of ease of setup, maintenance, query capability, and latency. There is limited support for SQL analytics with some of these options. I also evaluate which use cases each of them is best suited for. Developers often have a need to serve fast analytical queries over data in Amazon DynamoDB.

NoSQL

NoSQL PostgreSQL AWS SQL

Android Rx onError Guidelines

Netflix Tech

MAY 1, 2019

By Ed Ballot. “Creating a good API is hard.” - anyone who has created an API used by others. As with any API, wrapping your data stream in an Rx observable requires consideration for reasonable error handling and intuitive behavior. The following guidelines are intended to help developers create consistent and intuitive APIs. Since we frequently create Rx Observables in our Android app, we needed a common understanding of when to use onNext() and when to use onError() to make the API more consisten

Database

Database Coding Systems Building
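
The excerpt above does not spell out the guidelines themselves, so the following is only a minimal sketch of the trade-off it alludes to, written in Python rather than RxJava. It relies on one fact from the Rx contract: onError is a terminal event, so nothing is delivered after it. The names SketchObserver, emit_lookups and missing_is_error are hypothetical and are not taken from the article or from any Rx library.

```python
from typing import Dict, Generic, Iterable, List, Optional, TypeVar

T = TypeVar("T")


class SketchObserver(Generic[T]):
    """Hypothetical observer that enforces the Rx terminal-event rule."""

    def __init__(self) -> None:
        self.values: List[Optional[T]] = []
        self.error: Optional[Exception] = None
        self.completed = False

    @property
    def terminated(self) -> bool:
        return self.error is not None or self.completed

    def on_next(self, value: Optional[T]) -> None:
        # Nothing is delivered once a terminal event has been seen.
        if not self.terminated:
            self.values.append(value)

    def on_error(self, error: Exception) -> None:
        if not self.terminated:
            self.error = error

    def on_completed(self) -> None:
        if not self.terminated:
            self.completed = True


def emit_lookups(
    keys: Iterable[str],
    store: Dict[str, T],
    observer: SketchObserver,
    missing_is_error: bool,
) -> None:
    """Emit one value per key (illustrative only).

    missing_is_error=True  -> a missing key ends the stream via on_error.
    missing_is_error=False -> a missing key is reported as None via on_next
                              and the stream keeps going.
    """
    for key in keys:
        if key in store:
            observer.on_next(store[key])
        elif missing_is_error:
            observer.on_error(KeyError(key))
            return  # terminal: later keys are never emitted
        else:
            observer.on_next(None)
    observer.on_completed()
```

Under these assumptions, looking up the keys "a", "b", "c" in a store that lacks "b" delivers only the first value when a missing key is treated as an error, and delivers all three results (with None for "b") when it is treated as an ordinary value. Which behavior is appropriate depends on whether the caller can meaningfully continue after the failure.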

How to Manage Stakeholder Requests in Big Organizations

Zalando Engineering

MAY 2, 2019

An important factor of success in an agile environment is that the team works well together. It is also important for a software engineer to be able to focus for longer periods of time with limited interruptions. Many companies have solved the challenge of focus and dedication for the team by having a designated role, such as Scrum Master or Producer, who is responsible for managing stakeholder requests, prioritizing them and communicating to the development team.

Management

Management Software Engineering Software Engineer Machine Learning

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Data

How We Structure our dbt Projects

dbt Developer Hub

APRIL 30, 2019

As the maintainers of dbt, and analytics consultants, at Fishtown Analytics (now dbt Labs) we build a lot of dbt projects. Over time, we’ve developed internal conventions on how we structure them. This article does not seek to instruct you on how to design a final model for your stakeholders; it won’t cover whether you should denormalize everything into one wide master table, or have many tables that need to be joined together in the BI layer.

Project

Project Database-centric Raw Data Data Warehouse

Secondary Indexes For Analytics On DynamoDB

Rockset

APRIL 29, 2019

In this post I explore how to support analytical queries without encountering prohibitive scan costs, by leveraging secondary indexes in DynamoDB. I also evaluate the pros and cons of this approach in contrast to extracting data to another system like Athena, Spark or Elastic. Rockset recently added support for DynamoDB - which basically means you can run fast SQL on DynamoDB tables without any ETL.

NoSQL

NoSQL SQL AWS Systems
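
To make the scan-cost argument in the excerpt above concrete, here is a minimal in-memory sketch of why a secondary index helps. It is not the DynamoDB API: the table is a plain list of dictionaries, and scan, build_index and query_index are hypothetical helpers written for illustration. The one DynamoDB behavior it mirrors is that items lacking the indexed attribute do not appear in the index.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def scan(items: List[Item], attr: str, value: Any) -> Tuple[List[Item], int]:
    """Full scan: every item is examined, whether or not it matches."""
    examined = 0
    matches: List[Item] = []
    for item in items:
        examined += 1
        if item.get(attr) == value:
            matches.append(item)
    return matches, examined


def build_index(items: List[Item], attr: str) -> Dict[Any, List[Item]]:
    """Group items by the indexed attribute (illustrative only)."""
    index: Dict[Any, List[Item]] = defaultdict(list)
    for item in items:
        if attr in item:  # items without the attribute are left out
            index[item[attr]].append(item)
    return index


def query_index(index: Dict[Any, List[Item]], value: Any) -> Tuple[List[Item], int]:
    """Index lookup: only the matching items are touched."""
    matches = index.get(value, [])
    return matches, len(matches)


if __name__ == "__main__":
    orders = [
        {"order_id": 1, "status": "shipped"},
        {"order_id": 2, "status": "pending"},
        {"order_id": 3, "status": "shipped"},
        {"order_id": 4},  # no status attribute, so it is not indexed
    ]
    by_status = build_index(orders, "status")
    print(scan(orders, "status", "shipped")[1])       # items examined by the scan
    print(query_index(by_status, "shipped")[1])       # items examined via the index
```

The sketch ignores what the index costs to maintain: each write to the table also has to update the index, which is part of the trade-off the post weighs against exporting the data to another system.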

Women in Big Data Panel at DataWorks Summit 2019

Cloudera

MAY 2, 2019

Last month, I moderated The Women in Big Data panel hosted by DataWorks Summit and sponsored by Women in Big Data. This was a well-attended event with five amazing guest speakers: Hilary Mason, Tina Rosario, Violeta Ciurel, Ana Gillan and Devon Edwards Joseph. The theme for the discussion was “Top technology trends women and men business leaders need to be aware of”.

Big Data

Big Data Data Science Healthcare Technology

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Manufacturing

The PipelineDB Team Joins Confluent

Confluent

MAY 1, 2019

Some years ago, when I was at LinkedIn, I didn’t really know what Apache Kafka® would become but had an inkling that the next generation of applications would not be islands disconnected from one another, or lashed together with irregular, point-to-point bindings. When we founded Confluent, we took the radical approach of viewing data (and the infrastructure that supported it) as a series of real-time streaming events rather than something kept in static, sedentary data repositories.

Kafka

Kafka Datasets Database Technology
