Top Data Engineering Digest Structured Data Unstructured Data Content for Week of Oct 27

Sat. Oct 27, 2018 - Fri. Nov 02, 2018

Peloton: Uber’s Unified Resource Scheduler for Diverse Cluster Workloads

Uber Engineering

OCTOBER 30, 2018

Cluster management, a common software infrastructure among technology companies, aggregates compute resources from a collection of physical hosts into a shared resource pool, amplifying compute power and allowing for the flexible use of data center hardware. At Uber, cluster management …

Engineering

Engineering Management Technology Hadoop

Using Notebooks As The Unifying Layer For Data Roles At Netflix with Matthew Seal - Episode 54

Data Engineering Podcast

OCTOBER 28, 2018

Summary Jupyter notebooks have gained popularity among data scientists as an easy way to do exploratory analysis and build interactive reports. However, this can cause difficulties when trying to move the work of the data scientist into a more standard production environment, due to the translation efforts that are necessary. At Netflix they had the crazy idea that perhaps that last step isn’t necessary, and the production workflows can just run the notebooks directly.

Scala

Scala Python Data Engineer Data Engineering

Trending Sources

Netflix MediaDatabase - Media Timeline Data Model

Netflix Tech

OCTOBER 31, 2018

Netflix Media Database - the Media Timeline Data Model. In the previous post in this series, we described some important Netflix business needs as well as traits of the media data system, called “Netflix Media DataBase” (NMDB), that is used to address them. The curious reader might have noticed that a majority of these characteristics relate to properties of the data managed by NMDB.

Media

Media Metadata Data MongoDB

#NoEstimates

Zalando Engineering

OCTOBER 31, 2018

Why I advocate a practice of no estimates as a software engineer. Before I get to the topic, I would like to clarify one thing: I don’t want to ban estimates from software development altogether, as there are good and solid reasons for them. In a nutshell, business needs to be predictable. I want to show a software developer's view on how to reduce or even get rid of endless estimation meetings with doubtful outcomes.

Software Engineer

Software Engineer Software Engineering Coding Building

Doing a 180 on Customer 360 – The Preferred Path to Customer Insights

Cloudera

OCTOBER 30, 2018

451 Research Analyst Sheryl Kingstone and Cloudera’s Steve Totman recently discussed how a growing number of organizations are replacing legacy Customer 360 systems with Customer Insights Platforms (watch the replay here). In this blog post, Sheryl outlines how next-gen CIP applications are delivering a better customer experience, and why businesses are relying on CIPs as their preferred path to customer insights.

Unstructured Data

Unstructured Data Data Lake Algorithm Machine Learning

Why SQL on Raw Data?

Rockset

NOVEMBER 1, 2018

Over a decade after the inception of the Hadoop project, the amount of unstructured data available to modern applications continues to increase. Moreover, despite forecasts to the contrary, SQL remains the lingua franca of data processing; today's NoSQL and Big Data infrastructure platform usage often involves some form of SQL-based querying. This longevity is a testament to the community of analysts and data practitioners who are familiar with SQL as well as the mature ecosystem of tools around

Raw Data

Raw Data SQL Unstructured Data NoSQL

More Trending

Dynamic Typing in SQL

Rockset

NOVEMBER 1, 2018

As Peter Bailis put it in his post, querying unstructured data using SQL is a painful process. Moreover, developers frequently prefer dynamic programming languages, so interacting with the strict type system of SQL is a barrier. We at Rockset have built the first schemaless SQL data platform. In this post and a few others that follow, we'd like to introduce you to our approach.

SQL

SQL NoSQL Programming Language Bytes

Cloud Native: What It Means in the Data World

Rockset

OCTOBER 30, 2018

Prior to Rockset, I spent eight years at Facebook building out their big data infrastructure and online data infrastructure. All the software we wrote was deployed in Facebook's private data centers, so it was not till I started building on the public cloud that I fully appreciated its true potential. Facebook may be the very definition of a web-scale company, but getting hardware still required huge lead times and extensive capacity planning.

Cloud

Cloud IT MongoDB Hadoop

Federated Learning: Machine Learning with Privacy on the Edge

Cloudera

OCTOBER 30, 2018

Federated Learning is a technology that allows you to build machine learning systems when your datacenter can’t get direct access to model training data. The data remains in its original location, which helps to ensure privacy and reduces communication costs. Privacy and reduced communication make federated learning a great fit for smartphones and edge hardware, healthcare and other privacy-sensitive use cases, and industrial applications such as predictive maintenance.
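
To make the idea concrete, here is a minimal sketch of one common federated scheme, federated averaging. The teaser does not name a specific algorithm, so the choice of averaging, the one-parameter model y = w * x, the function names, and the toy datasets below are all assumptions made for illustration. Each client trains on its own data and sends back only a model weight; the raw examples never leave the client.

```python
"""Toy federated averaging: clients fit y = w * x locally and share only w."""


def local_update(w, data, lr=0.1, epochs=5):
    """Run a few gradient-descent steps on one client's private data."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, client_datasets):
    """One round: every client trains locally, the server averages the results."""
    updates = [(local_update(global_w, data), len(data)) for data in client_datasets]
    total = sum(n for _, n in updates)
    # Weight each client's model by the number of examples it trained on.
    return sum(w * n for w, n in updates) / total


def train(client_datasets, rounds=20):
    """Repeat federated rounds starting from an untrained weight of 0.0."""
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, client_datasets)
    return w


# Hypothetical private datasets (assumed for this sketch), generated by y = 3x.
CLIENTS = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(0.5, 1.5), (1.5, 4.5), (1.0, 3.0)],
]

if __name__ == "__main__":
    print(train(CLIENTS))
```

The server only ever sees the per-client weights and example counts, which is where the privacy and communication savings described above come from. Production systems typically add further measures, such as secure aggregation, that this sketch omits.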

Machine Learning

Machine Learning Healthcare Manufacturing Accessible
