February, 2018

Honeycomb Data Infrastructure with Sam Stokes - Episode 20

Data Engineering Podcast

Summary One of the sources of data that often gets overlooked is the systems that we use to run our businesses. This data is not used to directly provide value to customers or understand the functioning of the business, but it is still a critical component of a successful system. Sam Stokes is an engineer at Honeycomb where he helps to build a platform that is able to capture all of the events and context that occur in our production environments and use them to answer all of your questions abou

Kafka 100

Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka

Uber Engineering

In distributed systems, retries are inevitable. From network errors to replication issues and even outages in downstream dependencies, services operating at a massive scale must be prepared to encounter, identify, and handle failure as gracefully as possible. Given the scope … The post Building Reliable Reprocessing and Dead Letter Queues with Apache Kafka appeared first on Uber Engineering Blog.

Kafka 109

Zalando @ FOSDEM

Zalando Engineering

Why FOSDEM is not your average conference I could get cheeky with semantics and point out that the “M” in FOSDEM stands for “Meeting”. But I’ll play nice and focus instead on the specifics of the event itself. FOSDEM has been running since 2001. In that time, it has grown to become the open source community event for Europe. Over a two-day event, thousands of attendees descend upon the ULB in Brussels to attend what is, in reality, a collection of conferences.

Concurrency, MySQL and Node.js: A journey of discovery

nodeSWAT

Our story begins like so many others with a code loving protagonist — someone we all can relate to. His days are largely filled with designing code, writing code and reading about code — keeping clients happy while learning and having fun. This has been going on for years now with both MySQL and Node.js among others and as such our protagonist considers himself quite proficient with both those technologies.

MySQL 52

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Recap of Hadoop News for January 2018

ProjectPro

News on Hadoop - January 2018 Apache Hadoop 3.0 goes GA, adds hooks for cloud and GPUs. TechTarget.com, January 3, 2018. The latest update to the 11-year-old big data framework, Hadoop 3.0, allows cluster pooling on GPU resources, reduces storage requirements, and adds a novel federation scheme that lets the YARN resource manager and job scheduler expand the number of nodes that can run within a Hadoop cluster.

Hadoop 52

Breaking down data silos: when SAP alone is not enough

Cloudera

Running a large company is impossible without having an ERP system in place, and SAP business software remains at the forefront in this category. But when companies are looking towards new technologies such as data lakes, machine learning or predictive analytics, SAP alone is just not enough. To keep up with tech trends, businesses have to face the challenges of integrating SAP with non-SAP technologies and embark on a crusade against data silos.

More Trending

Code Migration in Production: Rewriting the Sharding Layer of Uber’s Schemaless Datastore

Uber Engineering

In 2014, Uber Engineering built Schemaless, our fault-tolerant and scalable datastore, to facilitate the rapid growth of our company. For context, we deployed more than 40 Schemaless instances and many thousands of storage nodes in 2016 alone. As our … The post Code Migration in Production: Rewriting the Sharding Layer of Uber’s Schemaless Datastore appeared first on Uber Engineering Blog.

Coding 92

Data Analysis with Spark

Zalando Engineering

Apache’s lightning fast engine for data analysis and machine learning In recent years, there has been a massive shift in the industry towards data-oriented decision making backed by enormously large data sets. This means that we can serve our customers with more relevant, personalized content. We in the Digital Experience team are tasked with analysing Big Data in order to gather insights and support the product team with the decision making process.

TimescaleDB: Fast And Scalable Timeseries with Ajay Kulkarni and Mike Freedman - Episode 18

Data Engineering Podcast

Summary As communications between machines become more commonplace the need to store the generated data in a time-oriented manner increases. The market for timeseries data stores has many contenders, but they are not all built to solve the same problems or to scale in the same manner. In this episode the founders of TimescaleDB, Ajay Kulkarni and Mike Freedman, discuss how Timescale was started, the problems that it solves, and how it works under the covers.

Pulsar: Fast And Scalable Messaging with Rajan Dhabalia and Matteo Merli - Episode 17

Data Engineering Podcast

Summary One of the critical components for modern data infrastructure is a scalable and reliable messaging system. Publish-subscribe systems have been popular for many years, and recently stream oriented systems such as Kafka have been rising in prominence. This week Rajan Dhabalia and Matteo Merli discuss the work they have done on Pulsar, which supports both options, in addition to being globally scalable and fast.

Kafka 100

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

2017 – Another Award-Winning Year for Cloudera!

Cloudera

In many ways, 2017 was a singular year for Cloudera, not least because we staged a successful IPO and joined the ranks of the world’s fastest-growing, publicly traded companies. We deeply appreciate the vote of confidence and trust our customers have placed in us and are proud of the hard work of our 1,600-plus employees. These are some of the year’s highlights.

Cloudera on Cloudera: Our Journey to Becoming more Data-driven

Cloudera

I’ve spent the last four years here at Cloudera talking with our customers about how to run their businesses better using their data and Cloudera’s products and services. Now I get to put my money where my mouth is – and turn my focus internally on how we at Cloudera can become more data-driven. We aspire to and are on the journey to be the best-run company on data, and to be our own best reference.

Cybersecurity on Call: Nation-State Cyber Operations with Patrick Tucker

Cloudera

As cyber attacks continue to increase across the world, it has become more critical for countries to implement cyber operations, both defensive and offensive, to protect national secrets and their citizens. An Arizona State University research paper showed just how global this problem is: its authors found that if hackers discussed a zero-day exploit on the dark web in Chinese, the likelihood of a hacker exploiting the vulnerability was 9%.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Dave Shuman Talks IoT and Big Data on Federal News Radio

Cloudera

What exactly can we expect for IoT in 2018, and how can you improve your organization with connected devices? That was the question Dave Shuman set out to answer when he sat down last month with John Gilroy at the Federal News Radio headquarters in Washington, D.C. Federal Tech Talk looks at the world of high technology in the federal government and, as its host, John speaks the language of federal CISOs, CIOs, and CTOs.

Innovation in Digital Experience

Zalando Engineering

Multi-functional teams make for a greater customer journey When I started in Zalando Tech, I hadn’t worked with a product manager before, and I had probably never seen a UX designer, a UI designer, a researcher or a business developer before either. My world was data science, more specifically, personalization and recommender systems. In this isolated bubble, data scientists often thought we could solve all problems without help, but in the last two years, I came to understand why we need to sto

Cross-Lingual End-to-End Product Search with Deep Learning

Zalando Engineering

How We Built the Next Generation Product Search from Scratch using a Deep Neural Network Product search is one of the key components in an online retail store. A good product search can understand a user’s query in any language, retrieve as many relevant products as possible, and finally present the results as a list in which the preferred products should be at the top, and the less relevant products should be at the bottom.

Crushing AVRO Small Files with Spark

Zalando Engineering

Solving the many small files problem for AVRO The Fashion Content Platform teams in Zalando Dublin handle large amounts of data on a daily basis. To make sense of it all, we utilise Hadoop (EMR) on AWS. Within this post, we discuss a system where a real-time system feeds the data. Due to the variance in data volumes and the period that these systems write to storage, there can be a large number of small files.

Hadoop 40

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Five Minutes from Machine Learning to RESTful API

Zalando Engineering

The benefits of Connexion: Zalando’s open source API-First framework In this article, I will show how quick and simple it can be to create a RESTful API for a machine learning model using Zalando’s open source Swagger/OpenAPI First framework called Connexion. Official documentation describes Connexion as the following: “Connexion is a framework on top of Flask that automagically handles HTTP requests based on OpenAPI 2.0 Specification (formerly known as Swagger Spec) of your API described in YAM