article thumbnail

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing  — historically known as ETL —  is extremely challenging. In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. But how do we model this in a functional data warehouse without mutating data?

article thumbnail

Simplify Data Processing with Pandas Pipeline

KDnuggets

Write a single line of code to clean and process the data for analytics and machine learning tasks.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Last Mile Data Processing with Ray

Pinterest Engineering

Since it takes so long to iterate on workflows, some ML engineers started to perform data processing directly inside training jobs. This is what we commonly refer to as Last Mile Data Processing. Last Mile processing can boost ML engineers’ velocity as they can write code in Python, directly using PyTorch.

article thumbnail

Mastering Batch Data Processing with Versatile Data Kit (VDK)

Towards Data Science

Data Management A tutorial on how to use VDK to perform batch data processing Photo by Mika Baumeister on Unsplash Versatile Data Ki t (VDK) is an open-source data ingestion and processing framework designed to simplify data management complexities.

article thumbnail

Type-safe data processing pipelines

Tweag

Moreover, these steps can be combined in different ways, perhaps omitting some or changing the order of others, producing different data processing pipelines tailored to a particular task at hand. Namely, dependencies are encoded in the types, allowing compile-time checking and serving as the code documentation.

article thumbnail

Is the “AI developer”a threat to jobs – or a marketing stunt?

The Pragmatic Engineer

The company says: “Devin is the new state-of-the-art on the SWE-Bench coding benchmark, has successfully passed practical engineering interviews from leading AI companies, and has even completed real jobs on Upwork. years ago, and it became the leading AI coding assistant almost overnight. It’s more a copilot.

article thumbnail

Massively Parallel Data Processing In Python Without The Effort Using Bodo

Data Engineering Podcast

Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. What are the techniques/technologies that teams might use to optimize or scale out their data processing workflows? Can you describe what Bodo is and the story behind it?