Sat. Dec 01, 2018 - Fri. Dec 07, 2018

Apache Zookeeper As A Building Block For Distributed Systems with Patrick Hunt - Episode 59

Data Engineering Podcast

Summary: Distributed systems are complex to build and operate, and most of them rely on the same set of primitives. Rather than re-implementing those capabilities every time, many projects build on top of Apache Zookeeper. In this episode Patrick Hunt explains how the Apache Zookeeper project was started, how it functions, and how it is used as a building block for other distributed systems.
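One primitive that projects commonly build on ZooKeeper is a distributed lock. ZooKeeper's documented lock recipe has each client create an ephemeral, sequential znode under a lock path; the client with the lowest sequence number holds the lock, and every other client watches only its immediate predecessor. The Python sketch below simulates that recipe in memory, assuming no real ensemble: `FakeZooKeeper`, `request_lock`, `holds_lock`, `node_to_watch`, and the `/locks/my-resource` path are hypothetical names, not the ZooKeeper client API.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional

LOCK_DIR = "/locks/my-resource"  # hypothetical lock path


@dataclass
class FakeZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble.

    Models only what the lock recipe needs: sequential znodes, and ephemeral
    znodes that vanish when the session that created them ends.
    """

    nodes: Dict[str, str] = field(default_factory=dict)  # znode path -> session id
    counter: Iterator[int] = field(default_factory=itertools.count)

    def create_ephemeral_sequential(self, prefix: str, session: str) -> str:
        # ZooKeeper appends a monotonically increasing, 10-digit zero-padded suffix.
        path = f"{prefix}{next(self.counter):010d}"
        self.nodes[path] = session
        return path

    def children(self, parent: str) -> List[str]:
        # Zero padding makes lexical order match sequence order.
        return sorted(p for p in self.nodes if p.startswith(parent + "/"))

    def close_session(self, session: str) -> None:
        # Ending a session (clean release or crash) deletes its ephemeral znodes.
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]


def request_lock(zk: FakeZooKeeper, session: str) -> str:
    return zk.create_ephemeral_sequential(LOCK_DIR + "/lock-", session)


def holds_lock(zk: FakeZooKeeper, my_node: str) -> bool:
    # The client owning the lowest-numbered znode holds the lock.
    waiters = zk.children(LOCK_DIR)
    return bool(waiters) and waiters[0] == my_node


def node_to_watch(zk: FakeZooKeeper, my_node: str) -> Optional[str]:
    # A waiter watches only its immediate predecessor, not the whole directory.
    waiters = zk.children(LOCK_DIR)
    idx = waiters.index(my_node)
    return waiters[idx - 1] if idx > 0 else None


zk = FakeZooKeeper()
a = request_lock(zk, "client-a")
b = request_lock(zk, "client-b")
print(holds_lock(zk, a), holds_lock(zk, b))  # True False
print(node_to_watch(zk, b) == a)  # True
zk.close_session("client-a")  # client A releases the lock or its session expires
print(holds_lock(zk, b))  # True
```

Watching only the predecessor, rather than the whole lock directory, means a release wakes a single waiter instead of every client.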

Cache warming: Agility for a stateful service

Netflix Tech

by Deva Jayaraman, Shashi Madappa, Sridhar Enugula, and Ioannis Papapanagiotou. EVCache has been a fundamental part of the Netflix platform (we call it Tier-1), holding petabytes of data. Our caching layer serves multiple use cases, including signup, personalization, search, playback, and more. It consists of thousands of nodes and hundreds of clusters in production, all of which must routinely scale up as our membership grows.
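The excerpt only states that EVCache clusters must routinely scale up. The general idea behind cache warming is to fill a new replica from an existing warm one before it serves reads, so that scaling up does not cause a wave of cache misses. The Python sketch below illustrates that general idea under stated assumptions: `Replica`, `warm`, and the batch size are hypothetical, live writes are assumed to already reach the new replica during the copy, and this is not Netflix's cache warmer.

```python
from typing import Dict, Iterator, List, Tuple

Batch = List[Tuple[str, bytes]]


class Replica:
    """Hypothetical in-memory cache replica; not the EVCache API."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, bytes] = {}
        self.serving = False  # a replica takes reads only once this is True

    def dump(self, batch_size: int) -> Iterator[Batch]:
        # Copy the items first so the source can keep taking writes between batches.
        items = list(self.data.items())
        for i in range(0, len(items), batch_size):
            yield items[i:i + batch_size]

    def load(self, batch: Batch) -> None:
        for key, value in batch:
            # Assumption: live writes already reach this replica, so a value it
            # already holds is at least as new as the copied one; keep it.
            self.data.setdefault(key, value)


def warm(source: Replica, target: Replica, batch_size: int = 2) -> int:
    """Stream all entries from a warm replica into a new one, then let it serve.

    Returns the number of entries read from the source.
    """
    copied = 0
    for batch in source.dump(batch_size):
        target.load(batch)
        copied += len(batch)
    target.serving = True
    return copied


old = Replica("replica-a")
old.data = {"user:1": b"profile-1", "user:2": b"profile-2", "user:3": b"profile-3"}
new = Replica("replica-b")
new.data["user:2"] = b"profile-2-updated"  # a live write that arrived during warming
print(warm(old, new))  # 3
print(new.data["user:2"])  # b'profile-2-updated'
print(new.serving)  # True
```

Batching bounds how much data moves per step; a production warmer might also throttle the copy so the source replica can keep serving traffic.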

One Audio Sequencer to Rule Them All

Pandora Engineering

Last month Pandora announced a public podcast beta in conjunction with the Podcast Genome Project. This rollout introduced many exciting features to our current mobile application offerings, including fully integrated, native podcast support. Ironically, one of the most interesting features, and perhaps our biggest engineering win with this iteration, is something that's transparent to our end users: the inclusion of a new audio playback sequencer used exclusively for…

Open Source: November Review - Maintainer training, new releases and more

Zalando Engineering

Project Highlights: ExternalDNS version 0.5.9 is ready for testing. This project lets you control DNS records dynamically through Kubernetes resources, independent of the DNS provider. ExternalDNS has also been accepted into the Kubernetes Incubator. Check out the list of changes in this new release. Zalando-Incubator welcomed two brand-new open source projects: 1) Darty, a data dependency manager for data science projects…

Announcing my session at #SQLBits - Azure Databricks

Advancing Analytics: Data Engineering

Simon Whiteley and I will be back at #SQLBits 2019, talking about #DataEngineering and #DataScience in Databricks. In this full-day training session we will look at #ApacheSpark, #Python, #Engineering, and #MachineLearning. Register now. Have you looked at Azure Databricks yet? If not, you should. Why, you ask? There are many reasons, and the number one reason is that knowing how to use Apache Spark will earn you more money.

Running SQL on Nested JSON

Rockset

When we surveyed the market, we saw the need for a solution that could perform fast SQL queries on fluid JSON data, including arrays and nested objects: "Best architecture to convert JSON to SQL?" "What are the ways to run SQL on JSON data without predefining schemas?" "I need a database to take JSON and execute SQL. What are my options?" The Challenge of SQL on JSON: Some form of ETL to transform JSON into tables in SQL databases may be workable for basic JSON data with fixed fields that are known up front…
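The excerpt contrasts schemaless SQL on JSON with the conventional route of ETL into fixed tables. A minimal Python sketch of that conventional route, using only the standard library (`json`, `sqlite3`): nested objects are flattened into dotted column names and loaded into an in-memory SQLite table. The `flatten` and `quote` helpers and the sample documents are hypothetical illustrations, not Rockset's implementation.

```python
import json
import sqlite3
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted keys; arrays are stored as JSON text."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        elif isinstance(value, list):
            # An array has no single-column representation, so keep it as text.
            flat[name] = json.dumps(value)
        else:
            flat[name] = value
    return flat


def quote(ident: str) -> str:
    # Double-quote identifiers, since flattened names contain dots.
    return '"' + ident.replace('"', '""') + '"'


docs = [
    {"id": 1, "user": {"name": "ada", "city": "London"}, "tags": ["a", "b"]},
    {"id": 2, "user": {"name": "linus", "city": "Helsinki"}, "tags": []},
]

rows = [flatten(d) for d in docs]
# The schema is fixed up front from the first document: later fields not in
# `columns` are silently dropped, and absent ones become NULL via r.get(c).
columns = sorted(rows[0])

conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE docs ({', '.join(quote(c) for c in columns)})")
conn.executemany(
    f"INSERT INTO docs VALUES ({', '.join('?' for _ in columns)})",
    [tuple(r.get(c) for c in columns) for r in rows],
)

result = conn.execute(
    'SELECT "user.name" FROM docs WHERE "user.city" = ?', ("Helsinki",)
).fetchall()
print(result)  # [('linus',)]
```

Because the column list comes from one sample document, this only holds up when every document shares the same fixed fields, which is the limitation the excerpt points to.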