article thumbnail

Functional Data Engineering — a modern paradigm for batch data processing

Maxime Beauchemin

Batch data processing  — historically known as ETL —  is extremely challenging. In this post, we’ll explore how applying the functional programming paradigm to data engineering can bring a lot of clarity to the process. It’s time-consuming, brittle, and often unrewarding.

article thumbnail

25 SQL tips to level up your data engineering skills

Start Data Engineering

Introduction Setup SQL tips 1. Handy functions for common data processing scenarios 1.1. STRUCT data types are sorted based on their keys from left to right 1.4. Need to filter on WINDOW function without CTE/Subquery use QUALIFY 1.2. Need the first/last row in a partition, use DISTINCT ON 1.3.

SQL 130
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Pushing The Limits Of Scalability And User Experience For Data Processing WIth Jignesh Patel

Data Engineering Podcast

Summary Data processing technologies have dramatically improved in their sophistication and raw throughput. Unfortunately, the volumes of data that are being generated continue to double, requiring further advancements in the platform capabilities to keep up.

article thumbnail

Startup Spotlight: How ROE AI Empowers Data Teams

Snowflake

In this edition, we talk to Richard Meng, co-founder and CEO of ROE AI , a startup that empowers data teams to extract insights from unstructured, multimodal data including documents, images and web pages using familiar SQL queries. Whats the coolest thing youre doing with data? What inspires you as a founder?

article thumbnail

Accelerate AI Development with Snowflake

Snowflake

These scalable models can handle millions of records, enabling you to efficiently build high-performing NLP data pipelines. However, scaling LLM data processing to millions of records can pose data transfer and orchestration challenges, easily addressed by the user-friendly SQL functions in Snowflake Cortex.

article thumbnail

Mastering Batch Data Processing with Versatile Data Kit (VDK)

Towards Data Science

Data Management A tutorial on how to use VDK to perform batch data processing Photo by Mika Baumeister on Unsplash Versatile Data Ki t (VDK) is an open-source data ingestion and processing framework designed to simplify data management complexities.

article thumbnail

How Meta discovers data flows via lineage at scale

Engineering at Meta

These stages propagate through various systems including function-based systems that load, process, and propagate data through stacks of function calls in different programming languages (e.g., For simplicity, we will demonstrate these for the web, the data warehouse, and AI, per the diagram below. Hack, C++, Python, etc.)