Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing, historically known as ETL, is extremely challenging. It's time-consuming, brittle, and often unrewarding. In this post, we'll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process.

Mastering Batch Data Processing with Versatile Data Kit (VDK)

Towards Data Science

A tutorial on how to use VDK to perform batch data processing. Versatile Data Kit (VDK) is an open-source data ingestion and processing framework designed to simplify the complexities of data management. The following figure shows a snapshot of the VDK UI.

The Race For Data Quality in a Medallion Architecture

DataKitchen

The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. By systematically moving data through these layers, the Medallion architecture enhances the data structure in a data lakehouse environment.

Complete Guide to Data Transformation: Basics to Advanced

Ascend.io

What is Data Transformation? Data transformation is the process of converting raw data into a usable format to generate insights. It involves cleaning, normalizing, validating, and enriching data, ensuring that it is consistent and ready for analysis.
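The four steps named above (cleaning, normalizing, validating, enriching) can be sketched as a small Python pipeline. This is a minimal illustration under stated assumptions, not Ascend.io's method. The raw records, the field names (`id`, `country`, `amount`), the `COUNTRY_NAMES` lookup table, and the validation rules are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical raw records, e.g. as read from a CSV export (all values are strings).
RAW = [
    {"id": "1", "country": " us ", "amount": "19.99"},
    {"id": "2", "country": "DE", "amount": ""},  # missing amount -> rejected
    {"id": "3", "country": "fr", "amount": "5"},
]

# Hypothetical reference table used for the enrichment step.
COUNTRY_NAMES = {"US": "United States", "DE": "Germany", "FR": "France"}


@dataclass
class Order:
    id: int
    country: str
    amount: float
    country_name: str


def clean(rec: dict) -> dict:
    # Cleaning: strip stray whitespace from every string field.
    return {k: v.strip() for k, v in rec.items()}


def normalize(rec: dict) -> dict:
    # Normalizing: use one consistent casing for country codes.
    return {**rec, "country": rec["country"].upper()}


def is_valid(rec: dict) -> bool:
    # Validating: require a numeric amount and a known country code.
    try:
        float(rec["amount"])
    except ValueError:
        return False
    return rec["country"] in COUNTRY_NAMES


def enrich(rec: dict) -> Order:
    # Enriching: cast to typed fields and add a derived country name.
    return Order(
        id=int(rec["id"]),
        country=rec["country"],
        amount=float(rec["amount"]),
        country_name=COUNTRY_NAMES[rec["country"]],
    )


def transform(raw: list[dict]) -> list[Order]:
    # Clean and normalize every record, drop invalid ones, enrich the rest.
    prepared = (normalize(clean(r)) for r in raw)
    return [enrich(r) for r in prepared if is_valid(r)]


if __name__ == "__main__":
    for order in transform(RAW):
        print(order)
```

Each step is a small pure function, so a step can be tested on its own or reordered without touching the others. Rejected rows are silently dropped here. A real pipeline would more likely route them to a quarantine table for inspection.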

Data logs: The latest evolution in Meta’s access tools

Engineering at Meta

The result of these batch operations in the data warehouse is a set of comma-delimited text files containing the unfiltered raw data logs for each user. We produce these files by passing the raw data through various renderers, discussed in more detail in the next section.

Why SQL on Raw Data?

Rockset

Over a decade after the inception of the Hadoop project, the amount of unstructured data available to modern applications continues to increase. Moreover, despite forecasts to the contrary, SQL remains the lingua franca of data processing; today's NoSQL and Big Data infrastructure platform usage often involves some form of SQL-based querying.

What is a data processing analyst?

Edureka

Raw data, however, is frequently disorganised, unstructured, and challenging to work with directly. This is where data processing analysts can help. Let's take a deep dive into the subject, starting with the question: what is data processing analysis?