July, 2017

The Road to uChat: Building Uber’s Internal Chat Solution

Uber Engineering

Two years ago, Uber’s previous chat application began showing signs that it would not be able to adapt to our growth. There were app crashes, performance hiccups, and outages that crippled our company’s ability to effectively communicate online. With user …

Recap of Hadoop News for June 2017

ProjectPro

News on Hadoop - June 2017. Hadoop Servers Expose Over 5 Petabytes of Data. BleepingComputer.com, June 2, 2017. John Matherly, the founder of Shodan, a search engine used for discovering IoT devices, found that improperly configured HDFS-based servers exposed over 5 PB of information. He found approximately 4487 HDFS servers available without authentication through public IP addresses, which in total exposed 5120 TB of data. The expert said that 47820 MongoDB servers exp

The Purpose of JWT: Stateless Authentication

Zalando Engineering

JSON Web Tokens, or just JWTs (pron. [ˈdʒɒts]), are the new fancy kids around the block when it comes to transporting proofs of identity within an untrusted environment like the Web. In this article, I will describe the true purpose of JWTs. I will compare classical, stateful authentication with modern, stateless authentication. And I will explain why it is important to understand the fundamental difference between the two approaches.

Engineering Uber Predictions in Real Time with ELK

Uber Engineering

Uber’s services rely on the accuracy of our event prediction and forecasting tools. From estimating rider demand on a given date to predicting …

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Closing the Data-Quality Loop

Zalando Engineering

To be able to measure the quality of some of the machine learning models that we have at Zalando, “Golden Standard” corpora are required. However, creating a “Golden Standard” corpus is often laborious, tedious and time-consuming. Thus, a method is needed to produce high quality validation corpora but without the traditional time and cost inefficiencies.

Complex Event Generation for Business Process Monitoring using Apache Flink

Zalando Engineering

While developing Zalando’s real-time business process monitoring solution, we encountered the need to generate complex events upon the detection of specific patterns of input events. In this blog post we describe the generation of such events using Apache Flink, and share our experiences and lessons learned in the process. You can read more on why we have chosen Apache Flink over other stream processing frameworks here: Apache Showdown: Flink vs.
