March, 2017

ScyllaDB with Eyal Gutkind - Episode 4

Data Engineering Podcast

Summary: If you like the features of Cassandra DB but wish it ran faster with fewer resources, then ScyllaDB is the answer you have been looking for. In this episode, Eyal Gutkind explains how Scylla was created and how it differentiates itself in the crowded database market. Preamble: Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Database 100

Hudi: Uber Engineering’s Incremental Processing Framework on Apache Hadoop

Uber Engineering

With the evolution of storage formats like Apache Parquet and Apache ORC and query engines like Presto and Apache Impala, the Hadoop ecosystem has the potential to become a general-purpose, unified serving layer for workloads that can tolerate latencies …

Hadoop 94

Deep Learning in Production for Predicting Consumer Behavior

Zalando Engineering

At the Zalando adtech lab in Hamburg, machine learning drives many of our production systems to build great user experiences. Our most recent product requires precise estimates of the future interests of Zalando consumers based on their history of interacting with the fashion platform. For example, we want to predict a consumer's interest in ordering selected fashion articles.

Recap of Hadoop News for February 2017

ProjectPro

News on Hadoop: February 2017. Big data brings breast cancer research forwards by 'decades'. ScienceDaily.com, February 1, 2017. Researchers analysed data from more than 28,000 different genes and millions of images of 300,000 breast cancer cells and found that any cell shape changes caused by physical pressures on the tumours are converted into gene activity.

Hadoop 40

Defining Data Engineering with Maxime Beauchemin - Episode 3

Data Engineering Podcast

Summary: What exactly is data engineering? How has it evolved in recent years and where is it going? How do you get started in the field? In this episode, Maxime Beauchemin joins me to discuss these questions and more. Transcript provided by CastSource. Preamble: Hello and welcome to the Data Engineering Podcast, the show about modern data infrastructure. Go to dataengineeringpodcast.com to subscribe to the show, sign up for the newsletter, read the show notes, and get in touch.

Linting and ESLint: Write Better Code

Zalando Engineering

Since joining Zalando, I have had the opportunity to dive into some open source projects like ESLint, a pluggable JavaScript linter. Here is my take on what ESLint is, a brief description of linting in general, and why it is so important. What is linting? Generally speaking, linting is a tool for static code analysis and therefore part of white-box testing.

Coding 52

More Trending

HMM PySpark Implementation: A Zalando Hack Week Project

Zalando Engineering

Every year, Zalando’s Hack Week gives us the opportunity to join together in cross-disciplinary teams to solve a wide variety of problems (you can check this year’s amazing winners here). The projects come from any point of the organization, and we are encouraged to band together with other employees across locations and business units. For our 2016 edition of Hack Week, we implemented a PySpark version of a Hidden Markov Model (HMM).

Project 40

Practical Challenges For RxJava Learners

Zalando Engineering

RxJava is a valuable part of the Java developer toolset and the number one language improvement framework for Android developers. Many of us want to learn it better and read blogs and sources, but often lack the practice to consolidate that knowledge. See below for how you can challenge yourself with coding tasks and improve your practical RxJava skills.

Java 40

One-click Deployments for iOS Apps using Xcode 8 and More

Zalando Engineering

macOS Server 5.2 is a new fruit. It was released (almost) in parallel with Xcode 8, and it might come as no surprise that it is the minimum version required by Xcode 8, which also spans new territory. Most important is the name change: say goodbye to OS X Server, as now you have macOS Server. But the changes go beyond that: although it is not mentioned in the changelog, the good old “_xcsbuildd” user is now gone.