January, 2018

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing, historically known as ETL, is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot. In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. This post distills fragments of wisdom accumulated while working at Yahoo, Facebook, Airbnb and Lyft, with the perspective of well over a decade of data warehousing

Dat: Distributed Versioned Data Sharing with Danielle Robinson and Joe Hand - Episode 16

Data Engineering Podcast

Summary: Sharing data across multiple computers, particularly when it is large and changing, is a difficult problem to solve. To provide a simpler way to distribute and version data sets among collaborators, the Dat Project was created. In this episode, Danielle Robinson and Joe Hand explain how the project got started, how it functions, and some of the many ways that it can be used.

Do These Things if you Want to Succeed as an HR Professional

U-Next

Success in today’s businesses has taken on several meanings. Beyond pay hikes and promotions, success has recently gained new dimensions. Today, it has become synonymous with happiness in the workplace, challenging tasks, compensatory rewards, incentives, authoritative job profiles, influential roles, and more. The current talent pools in organizations have become wiser and more mature than their counterparts in previous generations.

Postgres Internals: Building a Description Tool

Dataquest

In previous blog posts, we have described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, then it is necessary to know how to make your queries faster and more efficient. To understand what efficiency means in Postgres, it’s important to learn how Postgres works under the hood.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Faces Behind the Fashion-MNIST

Zalando Engineering

We talk to Han and Kashif from Zalando Research. Employer Branding Specialist Data Science, Nana Yamazaki, catches up with the team using literal fashion icons in Deep Learning. Tell us about Fashion-MNIST. What did you want to accomplish? Fashion-MNIST is a freely available dataset of Zalando articles that, most importantly, has the same format as the MNIST dataset.

Recap of Hadoop News for December 2017

ProjectPro

News on Hadoop - December 2017. Apache Impala gets top-level status as open source Hadoop tool. TechTarget.com, December 1, 2017. The massively parallel processing engine born at Cloudera acquired the status of a top-level project within the Apache Foundation. The main objective of Impala is to provide SQL-like interactivity to big data analytics, just like other big data tools: Hive, Spark SQL, Drill, HAWQ, Presto and others.

More Trending

CRDTs and Distributed Consensus with Christopher Meiklejohn - Episode 14

Data Engineering Podcast

Summary: As we scale our systems to handle larger volumes of data, geographically distributed users, and varied data sources, the requirement to distribute the computational resources for managing that information becomes more pronounced. To ensure that all of the distributed nodes in our systems agree with each other, we need to build mechanisms to properly handle replication of data and conflict resolution.

Citus Data: Distributed PostGreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens - Episode 13

Data Engineering Podcast

Summary: PostgreSQL has become one of the most popular and widely used databases, and for good reason. The level of extensibility that it supports has allowed it to be used in virtually every environment. At Citus Data, they have built an extension to support running it in a distributed fashion across large volumes of data with parallelized queries for improved performance.

The Top 10 Most Popular VISION Blogs of 2017

Cloudera

The New Year is a great time to make resolutions, but it’s also a great time to reflect on the previous year. Before we get too far into 2018, let’s take a look at the ten most popular Cloudera VISION blogs from 2017. Today is an important day in the life of Cloudera. On April 28, 2017, Mike Olson, as one of the founders of Cloudera, writes about the initial public offering, and what the milestone means.

The three certainties in life: death, taxes and GDPR

Cloudera

As the GDPR clock ticks down to implementation, it is clear that this will not be a non-event like the Millennium Bug – it will happen and there will be dire consequences, potentially company-closures, in the event of non-compliance. The three certainties in life: death, taxes and GDPR. 1999 was a milestone year for the development of technology.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Cybersecurity On Call: Goodbye 2017, Hello 2018! Top Five Tips from 2017

Cloudera

This was an amazing year for our inaugural “Cybersecurity On Call” season. It was truly an honor hosting amazing guests as we explored the world of cybersecurity. From industry thought leaders, to New York Times best sellers, to hackers, I learned a ton about the future of cybersecurity and I hope you did as well. Today’s episode won’t be our usual programming; it is our end-of-the-year special, where we will dive into our top five tips from this year’s season.

Breaking through the clouds in Asia Pacific

Cloudera

To quote Sam Walton, Walmart’s founder, “There is only one boss. The customer. And he can fire everybody in the company from the chairman on down, simply by spending his money somewhere else”. This very much forms the lens for our focus here at Cloudera Asia Pacific. And it is this unwavering passion and commitment that has driven the team to strive for the very best for our customers and partners, and milestones that we have collectively attained since 2015.

Six Strategies for Advancing Customer Knowledge: Bringing Data Together

Cloudera

I often meet with our customers to help them understand how to connect modern technology to business success. The ever-present question at these encounters is “Where do I start?” They may understand that they need a data-driven strategy, or the culture may aim to shift toward being guided by data. These are often goals set by the executive team with little guidance on how to execute or implement.

Staffing your big data team

Cloudera

Building the right team is as important as assembling the right IT infrastructure – and the needs differ just as dramatically. A traditional BI and analytics organization consists of three main groups: Analysts that develop reports often using sample data. The data management team – modelers that take requests, find data, and develop models to answer the questions.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Rabbit in the Cloud

Zalando Engineering

How we deployed RabbitMQ on AWS. In an effort to move away from our legacy monolithic service, we decided to take on the challenge of building a new communication platform based on a microservice architecture, which would be more focused and more easily manageable. The challenge was exciting and big; we had to make crucial decisions early on, decisions that we would have to live with for the foreseeable future.

Building a Better Tech Radar

Zalando Engineering

How Zalando helps its engineering teams navigate the tech landscape. Zalando has more than 200 engineering teams, which regularly face tricky technology choices. To help them make good decisions, we created the Zalando Tech Radar as a "navigation" tool. Inspired by ThoughtWorks, it assigns each technology to one of four rings: Adopt, Trial, Assess and Hold. The assigned ring represents the current consensus within Zalando.

Simplicity by Distributing Complexity

Zalando Engineering

Building an aggregated view of data in the event-driven microservice architecture. In the world of microservices, where a domain model gets decomposed into related, but independently handled entities, we often face the challenge of building an aggregate view of the data that brings together different parts of that model. While this can already be interesting with “traditional” designs, the move to event-driven architectures can magnify these difficulties, especially with simplistic event streams.

Why We Do Scala in Zalando

Zalando Engineering

Leveraging the full power of a functional programming language. In Zalando Dublin, you will find that most engineering teams are writing their applications using Scala. We will try to explain why that is the case and the reasons we love Scala. This content comes both from my own experience and from the team I'm working with in building the new Zalando Customer Data Platform.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Rock Solid Kafka and ZooKeeper Ops on AWS

Zalando Engineering

Reducing ops effort while maintaining Kafka and ZooKeeper. This post is targeted at those looking for ways to reduce ops effort while maintaining Kafka and ZooKeeper deployments on AWS, while also improving their availability and stability. In a nutshell, we are going to explain how using Elastic Network Interfaces can improve on a straight out-of-the-box setup.

Snorkel: Extracting Value From Dark Data with Alex Ratner - Episode 15

Data Engineering Podcast

Summary: The majority of the conversation around machine learning and big data pertains to well-structured and cleaned data sets. Unfortunately, that is just a small percentage of the information that is available, so the rest of the sources of knowledge in a company are housed in so-called “Dark Data” sets. In this episode, Alex Ratner explains how the work that he and his fellow researchers are doing on Snorkel can be used to extract value by leveraging labeling functions written by

Drawn Together

Zalando Engineering

How to talk about design in the agile world. How we improved design communication in the Retail Ops Team. With an agile and lean approach, most of us here at Zalando changed the way we build digital products. Design processes also evolved, with designers usually working alongside cross-functional product teams. But, at first, one thing did not change too much: how we talk about the design.