
Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing, historically known as ETL, is extremely challenging: it's time-consuming, brittle, and often unrewarding. In this post, we'll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process.
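As a rough sketch of the idea rather than code from the post, a "functional" batch task pairs a pure transform with an idempotent load that overwrites an immutable date partition; the directory layout and function names below are hypothetical.

```python
from datetime import date
from pathlib import Path
import csv

def build_daily_summary(rows: list[dict], ds: date) -> list[dict]:
    """Pure transform: output depends only on its inputs, never on external state."""
    filtered = [r for r in rows if r["event_date"] == ds.isoformat()]
    return [{"event_date": ds.isoformat(), "event_count": len(filtered)}]

def overwrite_partition(records: list[dict], table_dir: str, ds: date) -> None:
    """Idempotent load: rewrite the whole partition so reruns give the same result."""
    partition = Path(table_dir) / f"ds={ds.isoformat()}"
    partition.mkdir(parents=True, exist_ok=True)
    with open(partition / "part-0.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["event_date", "event_count"])
        writer.writeheader()
        writer.writerows(records)

# Re-running the task for the same date overwrites the same partition,
# so a backfill or retry never double-counts data.
```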


Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.
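As one hedged illustration of the pattern (not taken from the article), the PySpark snippet below creates and queries an Apache Iceberg table; it assumes the Iceberg Spark runtime is available to Spark, and the catalog name, warehouse path, and schema are invented for this sketch.

```python
from pyspark.sql import SparkSession

# Catalog name ("demo"), warehouse path, and table schema are illustrative only.
spark = (
    SparkSession.builder
    .appName("otf-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# The open table format tracks snapshots, schema, and partition metadata itself,
# so any engine that understands Iceberg can read and write the same table.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        event_id BIGINT,
        event_ts TIMESTAMP,
        payload  STRING
    ) USING iceberg
""")

spark.sql("INSERT INTO demo.db.events VALUES (1, current_timestamp(), 'hello')")

# Table metadata is queryable too, e.g. the snapshot history kept by the format.
spark.sql("SELECT snapshot_id, committed_at FROM demo.db.events.snapshots").show()
```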



OLAP vs. OLTP: A Comparative Analysis of Data Processing Systems

KDnuggets

A comprehensive comparison between OLAP and OLTP systems, exploring their features, data models, performance needs, and use cases in data engineering.
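To make the contrast concrete, here is a toy Python/SQLite sketch (not from the article) showing the two access patterns side by side: a keyed, single-row transactional update versus a scan-and-aggregate analytical query. The schema is invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INT, amount REAL, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount, order_date) VALUES (?, ?, ?)",
    [(1, 19.99, "2024-01-05"), (2, 5.00, "2024-01-05"), (1, 42.50, "2024-02-01")],
)

# OLTP-style access: a short transaction touching one row by its key.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE order_id = ?", (21.99, 1))

# OLAP-style access: a scan that aggregates many rows for analysis.
for row in conn.execute(
    "SELECT strftime('%Y-%m', order_date) AS month, SUM(amount) FROM orders GROUP BY month"
):
    print(row)
```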


Pushing The Limits Of Scalability And User Experience For Data Processing With Jignesh Patel

Data Engineering Podcast

Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data being generated continue to double, requiring further advancements in platform capabilities to keep up.


Last Mile Data Processing with Ray

Pinterest Engineering

Getting data processing jobs from development into production often requires a long process that touches many languages and frameworks: engineers have to integrate these jobs with workflow systems, test them at scale, tune them, and release them into production. This is not an interactive process, and bugs are often not found until later.
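As a minimal, hedged illustration of the interactive style Ray enables (not Pinterest's code), the snippet below fans a hypothetical last-mile transform out over data shards from a single Python session.

```python
import ray

ray.init()  # start a local Ray runtime; in practice this would connect to a cluster

@ray.remote
def transform_batch(batch: list[int]) -> list[int]:
    # Hypothetical last-mile step, e.g. post-processing one shard of data.
    return [x * 2 for x in batch]

shards = [list(range(i, i + 5)) for i in range(0, 20, 5)]
futures = [transform_batch.remote(shard) for shard in shards]  # run shards in parallel
results = ray.get(futures)  # gather results interactively, no workflow release cycle
print(results)
ray.shutdown()
```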


Type-safe data processing pipelines

Tweag

These steps can be combined in different ways, perhaps omitting some or changing the order of others, producing different data processing pipelines tailored to the task at hand. The reader is assumed to be somewhat familiar with the DataKinds and TypeFamilies extensions, but we will review some of their peculiarities.
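The article itself works in Haskell; purely as a loose Python analogy (not the article's approach), the sketch below gives each pipeline stage its own type so that a type checker such as mypy rejects steps composed in the wrong order. The stage names are invented.

```python
from typing import NewType

# Each stage's output type records what processing has already happened.
RawText = NewType("RawText", str)
Tokenized = NewType("Tokenized", tuple)
Normalized = NewType("Normalized", tuple)

def tokenize(doc: RawText) -> Tokenized:
    return Tokenized(tuple(doc.split()))

def normalize(tokens: Tokenized) -> Normalized:
    return Normalized(tuple(t.lower() for t in tokens))

doc = RawText("Type Safe Pipelines")
pipeline_ok = normalize(tokenize(doc))      # well-typed composition of steps
# pipeline_bad = tokenize(normalize(doc))   # rejected by the type checker: wrong stage order
print(pipeline_ok)
```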


Mastering Batch Data Processing with Versatile Data Kit (VDK)

Towards Data Science

A tutorial on how to use Versatile Data Kit (VDK), an open-source data ingestion and processing framework designed to simplify data management complexities, to perform batch data processing.
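As a hedged sketch of what a VDK data job step can look like (based on the general VDK job structure, not the tutorial's code), the snippet below queues a record for ingestion and then runs a SQL transformation; the file name, table names, and payload are illustrative.

```python
# Hypothetical step file (e.g. 10_ingest.py) inside a VDK data job folder.
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput) -> None:
    # VDK discovers step files in the job directory and calls run() in order.
    payload = {"city": "Sofia", "temperature_c": 21.5}

    # Queue one record for ingestion into the configured target.
    job_input.send_object_for_ingestion(
        payload=payload,
        destination_table="weather_readings",
    )

    # Batch transformation expressed as SQL against the configured database.
    job_input.execute_query(
        "CREATE TABLE IF NOT EXISTS daily_weather AS "
        "SELECT city, AVG(temperature_c) AS avg_temp FROM weather_readings GROUP BY city"
    )
```

Such a job would typically be executed locally with the VDK CLI, for example vdk run <job-folder>, or scheduled through a VDK deployment.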