Sat. Jan 27, 2018 - Fri. Feb 02, 2018

Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16

Data Engineering Podcast

Summary: Sharing data across multiple computers, particularly when it is large and changing, is a difficult problem to solve. The Dat Project was created to provide a simpler way to distribute and version data sets among collaborators. In this episode Danielle Robinson and Joe Hand explain how the project got started, how it functions, and some of the many ways that it can be used.

Data 100

Recap of Hadoop News for January 2018

ProjectPro

News on Hadoop - January 2018. Apache Hadoop 3.0 goes GA, adds hooks for cloud and GPUs. TechTarget.com, January 3, 2018. The latest update to the 11-year-old big data framework, Hadoop 3.0, allows cluster pooling of GPU resources, reduces storage requirements, and adds a novel federation scheme that lets the YARN resource manager and the job scheduler expand the number of nodes that can run within a Hadoop cluster.

Hadoop 52

Breaking through the clouds in Asia Pacific

Cloudera

To quote Sam Walton, Walmart’s founder, “There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else.” This is the lens for our focus here at Cloudera Asia Pacific. It is this unwavering passion and commitment that has driven the team to strive for the very best for our customers and partners, and that lies behind the milestones we have collectively attained since 2015.

Cloud 40

Rabbit in the Cloud

Zalando Engineering

How we deployed RabbitMQ on AWS. In an effort to move away from our legacy monolithic service, we decided to take on the challenge of building a new communication platform based on a microservice architecture, which would be more focused and more easily manageable. The challenge was exciting and big; we had to make crucial decisions early on, decisions that we would have to live with for the foreseeable future.

Cloud 40

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.