Sat. Jan 06, 2018 - Fri. Jan 12, 2018

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing, historically known as ETL, is extremely challenging. It’s time-consuming, brittle, and often unrewarding. Not only that, it’s hard to operate, evolve, and troubleshoot. In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. This post distills fragments of wisdom accumulated while working at Yahoo, Facebook, Airbnb and Lyft, with the perspective of well over a decade of data warehousing

Citus Data: Distributed PostGreSQL for Big Data with Ozgun Erdogan and Craig Kerstiens - Episode 13

Data Engineering Podcast

Summary: PostgreSQL has become one of the most popular and widely used databases, and for good reason. The level of extensibility that it supports has allowed it to be used in virtually every environment. At Citus Data they have built an extension to support running it in a distributed fashion across large volumes of data with parallelized queries for improved performance.

Do These Things if you Want to Succeed as an HR Professional

U-Next

Success in today’s businesses has taken on several meanings. Beyond pay hikes and promotions, success has gained new dimensions of very recent origin. Today, success has become synonymous with happiness at the workplace, challenging tasks, compensatory rewards, incentives, authoritative job profiles, influential roles, and more. The current talent pools in organizations have become wiser and more mature than their previous-generation counterparts.

Data Engineering is Critical to Big Data Success

Cloudera

I mentioned in an earlier blog titled “Staffing your big data team” that data engineers are critical to a successful data journey. That said, most companies that are early in their journey lack a dedicated engineering group. And the longer it takes to put a team in place, the likelier it is that your big data project will stall. The data engineering team is responsible for collecting and ingesting batch and stream-oriented data, inventorying the data, working through ingest bottlenecks, and d

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

The Faces Behind the Fashion-MNIST

Zalando Engineering

We talk to Han and Kashif from Zalando Research. Employer Branding Specialist Data Science, Nana Yamazaki, catches up with the team using literal fashion icons in Deep Learning. Tell us about Fashion-MNIST. What did you want to accomplish? Fashion-MNIST is a freely available dataset of Zalando articles that, most importantly, has the same format as the MNIST dataset.

Postgres Internals: Building a Description Tool

Dataquest

In previous blog posts, we have described the Postgres database and ways to interact with it using Python. Those posts provided the basics, but if you want to work with databases in production systems, then it is necessary to know how to make your queries faster and more efficient. To understand what efficiency means in Postgres, it’s important to learn how Postgres works under the hood.

More Trending

Cybersecurity On Call: Goodbye 2017, Hello 2018! Top Five Tips from 2017

Cloudera

This was an amazing year for our inaugural “Cybersecurity On Call” season. It was truly an honor hosting amazing guests as we explored the world of cybersecurity. From industry thought leaders, to New York Times best sellers, to hackers, I learned a ton about the future of cybersecurity and I hope you did as well. Today’s episode won’t be our usual programming; today is our end-of-the-year special, where we will dive into our top five tips from this year’s season.

Six Strategies for Advancing Customer Knowledge: Bringing Data Together

Cloudera

I often meet with our customers to help them understand how to connect modern technology to business success. The ever-present question at these encounters is “Where do I start?” They may understand that they need a data-driven strategy, or their culture may be aiming to shift toward being guided by data. These are often goals set by the executive team with little guidance on how to execute or implement.

Why We Do Scala in Zalando

Zalando Engineering

Leveraging the full power of a functional programming language. In Zalando Dublin, you will find that most engineering teams are writing their applications using Scala. We will try to explain why that is the case and the reasons we love Scala. This content comes both from my own experience and from the team I'm working with in building the new Zalando Customer Data Platform.
