Because they can preserve the visual layout of documents and are compatible with a wide range of devices and operating systems, PDFs are used for everything from business forms and educational material to creative designs.
In today's digital age, the rise of the internet has given us two powerful mediums for expressing creativity: web design & graphic design. Web design involves creating websites, interfaces, & online applications while graphic design entails creating visual designs for print & digital media.
By: Rajiv Shringi , Oleksii Tkachuk , Kartik Sathyanarayanan Introduction In our previous blog post, we introduced Netflix’s TimeSeries Abstraction , a distributed service designed to store and query large volumes of temporal event data with low millisecond latencies. Today, we’re excited to present the Distributed Counter Abstraction.
Joshua also writes an excellent Substack newsletter about how to design products that customers love, how to operate live services at scale, how to grow and optimize technology orgs, and the history of the tech industry. ARPANET was designed to maintain communications during outages. Subscribe here.
Why do some embedded analytics projects succeed while others fail? We surveyed 500+ application teams embedding analytics to find out which analytics features actually move the needle. Read the 6th annual State of Embedded Analytics Report to discover new best practices. Brought to you by Logi Analytics.
Since snowboards often have a wooden core, and because a snowboard is the traditional trophy for the Snowflake Startup Challenge, we're going to go ahead and say that the snowboard trophy qualifies as a present for the fifth anniversary of our Startup Challenge. The judges will deliberate live before naming the 2025 Grand Prize winner.
We also co-presented talks at EkoParty, DEF CON, Hardwear.io, Pwn2own, and other security research summits. We received more than 100 bug reports and awarded over $320,000 in total.
Before we dive in, let's define some important terminology. In simple terms, “inventory” refers to the list of items present in a specific store of a Convenience and Grocery (CnG) merchant. The post How DoorDash Designed a Successful Write-Heavy Scalable and Reliable Inventory Platform appeared first on DoorDash Engineering Blog.
This situation presents both challenges and opportunities; while it may be more difficult to make initial progress, there are plenty of easy wins to capitalize on. These elements are critical for a title's eligibility in a row, accurate personalization, and an engaging presentation (e.g., artwork, trailers, supplemental messages).
Embedding analytics in software presents some unique opportunities—and poses unique challenges—to software teams. What are best practices when designing the UI and UX of embedded dashboards, reports, and analytics? What should software teams know about implementing security that works with the rest of their products?
To address these challenges, AI Data Engineers have emerged as key players, designing scalable data workflows that fuel the next generation of AI systems. Complexity and Variability Each type of unstructured data—text, images, videos, or audio—presents unique challenges. Their role is not just important; it is essential.
In this blog, we will go through the technical design and share some offline and online results for our LLM-based search relevance pipeline. Technical Design: LLM as Relevance Model. Model Architecture: We use a cross-encoder language model to predict a Pin's relevance to a query, along with Pin text, as shown in Figure 1.
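To make the cross-encoder idea above concrete, here is a minimal sketch (not Pinterest's actual pipeline; the library, model name, and example texts are placeholders) of scoring query and Pin-text pairs with an off-the-shelf cross-encoder:

```python
# Minimal sketch only: score (query, Pin text) pairs with a generic
# cross-encoder. The model choice and relevance scale are placeholders,
# not Pinterest's production model.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # hypothetical choice

query = "mid century modern living room"
pin_texts = [
    "Mid-century modern living room with walnut furniture",
    "Easy weeknight pasta recipes",
]

# The cross-encoder jointly encodes each (query, pin_text) pair and
# emits one relevance score per pair.
scores = model.predict([(query, text) for text in pin_texts])
for text, score in sorted(zip(pin_texts, scores), key=lambda x: -x[1]):
    print(f"{score:.3f}  {text}")
```

Because the query and Pin text are encoded jointly, a cross-encoder is typically more accurate than a two-tower bi-encoder but too expensive to run over the full corpus, which is why it is usually applied to a shortlist of candidates.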
According to Tomasz, an AI use-case presents the opportunity for cost reduction if one of three criteria is met: 1. Like Google, large models are designed to service a variety of use-cases. While those numbers are certainly impressive by modern standards, they still present some key limitations for the use of AI.
Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand. Obtaining this data from Hive in a format that can be presented to users is not straightforward. What are data logs?
MVC is an interesting concept from the late 70s that separates the View (presentation) from the Model via the Controller. It has been used in designing web applications and is still heavily used, for example, in Ruby on Rails or Laravel, a popular PHP framework.
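As a toy illustration of that separation (the classes below are made up for the example, not tied to Rails or Laravel), the controller mediates between the model, which owns the data, and the view, which only renders it:

```python
# Toy MVC sketch: the View never touches the data directly, and the Model
# knows nothing about presentation; the Controller wires the two together.

class TaskModel:
    """Model: owns the data and the rules for changing it."""
    def __init__(self):
        self.tasks = []

    def add(self, title):
        self.tasks.append(title)


class TaskView:
    """View: only knows how to present data it is handed."""
    @staticmethod
    def render(tasks):
        return "\n".join(f"- {t}" for t in tasks) or "(no tasks)"


class TaskController:
    """Controller: turns user actions into model updates and picks a view."""
    def __init__(self, model, view):
        self.model, self.view = model, view

    def create_task(self, title):
        self.model.add(title)
        return self.view.render(self.model.tasks)


controller = TaskController(TaskModel(), TaskView())
print(controller.create_task("Write the report"))  # -> "- Write the report"
```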
Key Takeaways: Data mesh is a decentralized approach to data management, designed to shift creation and ownership of data products to domain-specific teams. This complexity requires a mature data engineering team to design, implement, and manage it effectively.
This article is the first in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. It is challenging to scale such bespoke solutions to ever-changing and increasingly complex business needs.
Those without the capacity relied on an existing unified data source, such as our Presentation API, inheriting its version of product data. To serve the presentation view of a Product Offer, a multi-stage event-driven system merged Product, Price, and Stock events into a single structure. This led to competing sources of truth.
Yet, the complexity and computational demands of LLM inference present a challenge. It is designed to facilitate research and prototype new ideas for post-training without getting overwhelmed by complex abstraction layers or generalizations. Inference costs remain prohibitive for many workloads.
This is a methodology that was designed more than 20 years ago to optimize the storage used. In these articles there are a lot of cool presentations you should watch to understand more deeply how dbt works. You can choose the strategy you want depending on your adapter (cf. examples on BigQuery).
This article is the second in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. To better guide the design and budgeting of future campaigns, we are developing an Incremental Return on Investment model. Need to catch up?
This grant is designed to “support entrepreneurs, tech-geeks, developers, and socially engaged people, who are capable of challenging the way we search and discover information and resources on the internet.” The team is tiny: only three people.
The message bus is a common architectural design in the Enterprise Design Patterns. But it's also present at a lower level to enable event-driven behavior. Apache Spark is no exception: it uses a publish/subscribe approach in various places.
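As a rough sketch of that publish/subscribe idea (a toy in-process bus; the EventBus class and event names below are invented for illustration, not Spark's internal listener bus API):

```python
# Toy publish/subscribe bus: publishers emit events by type, subscribers
# register handlers, and the bus fans each event out to every handler.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler to be called for every event of this type."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the payload to every handler subscribed to this type."""
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
bus.subscribe("job_end", lambda e: print(f"job {e['job_id']} finished"))
bus.publish("job_end", {"job_id": 42})  # -> job 42 finished
```

The publisher never needs to know who is listening, which is what makes the pattern useful for decoupling internal components.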
The presenters, Shrikrishna Khare and Srikrishna Gopu, talk about their experience designing, developing, and operating FBOSS: in-house software built to manage and support a set of features required for the data center switches of a large-scale Internet content provider.
Games Are Software, But in a Non-Standard Way: Video games can often feel far removed from traditional software, given that the modes of interaction and style of presentation are often unique to this space. I often explain this working relationship as: artists make it pretty, while designers and programmers make it work.
For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Want to see Starburst in action?
Simplifying to Amplify: This renaming is part of a broader effort to simplify how we present our offerings. Best of all, they are all designed to work together seamlessly, providing you with the capabilities for a smooth path from raw data to AI-driven results. Within it, you'll find capabilities that clearly map to what they deliver.
Below are experienced speakers who have presented at conferences and have confirmed they are open to such opportunities (including traveling to locations in the Netherlands.) See his talk at React Live 2023 (a case study of a successful migration) and at JSWORLD 2022 (designing high-performance React applications.) See more past talks.
As I often do, I've looked through the 70 presentations, and here is a medley of what I liked. The presentation gives another look at the semantic layer. During the demo, Lloyd does some data analysis in the browser and it's just mind-blowing 🤯 At the same time, someone at Google also did a Calcite presentation.
Fig 1 illustrates a general multi-stage recommendation funnel design at Pinterest. This section describes the current design of the two-tower machine learning model used for learned retrieval at Pinterest.
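As a rough illustration of the two-tower idea (a minimal PyTorch sketch; the feature sizes, layer widths, and dot-product scoring are assumptions for the example, not Pinterest's model):

```python
# Two-tower sketch: a query tower and an item tower map their inputs into a
# shared embedding space, and relevance is a dot product, so candidates can
# later be retrieved with approximate nearest-neighbor search.
import torch
import torch.nn as nn

class Tower(nn.Module):
    def __init__(self, in_dim, emb_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, emb_dim)
        )

    def forward(self, x):
        # L2-normalize so the dot product behaves like cosine similarity.
        return nn.functional.normalize(self.net(x), dim=-1)

query_tower, item_tower = Tower(in_dim=32), Tower(in_dim=48)  # hypothetical feature sizes

queries = torch.randn(4, 32)    # batch of query/user feature vectors
items = torch.randn(100, 48)    # candidate item (Pin) feature vectors

scores = query_tower(queries) @ item_tower(items).T  # (4, 100) similarity matrix
top_items = scores.topk(k=10, dim=-1).indices        # top-10 candidates per query
print(top_items.shape)  # torch.Size([4, 10])
```

In practice the item embeddings are precomputed and indexed, and only the query tower runs at request time, which is what makes this layout suitable for the retrieval stage of the funnel.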
There were many Gartner keynotes and analyst-led sessions that had titles like “Scale Data and Analytics on Your AI Journeys,” “What Everyone in D&A Needs to Know About (Generative) AI: The Foundations,” and “AI Governance: Design an Effective AI Governance Operating Model.” The advice offered during the event was relevant, valuable, and actionable.
The dbt MCP server presents our early explorations, but we anticipate that the Community will find many more. Determining the most useful tools for the dbt MCP: what is the best and most useful set of tools to enable human-in-the-loop and AI-driven LLM access to structured data?
In today's data-driven world, organizations demand powerful tools to transform, analyze, and present their data seamlessly. GOLDEN Layer: Curated tables (dimensions and facts) designed for analytics and reporting. SILVER Layer: Cleansed and enriched data prepared for analytical processing.
This focus on training pipeline steps, which take in heterogeneous inputs with multiple objective functions and emit co-trained models that are already part of an implicit inference graph, centers the application development experience in the place where the AI engineer feels most at home: designing the recommender model itself.
This blog post will provide an overview of how we approached metrics selection and design, system architecture, and key product features. Metrics Design and Selection: We started with a simple question: What should we measure? To answer that we followed a metrics design framework called “Goals-Signals-Metrics” (GSM).
Fluss is a streaming storage system specifically designed for real-time analytics. Fluss and Kafka differ fundamentally in design principles: Kafka is designed for streaming events, while Fluss is designed for streaming analytics. What is Fluss, and what are its use cases?
In terms of the conference, what are the factors that you consider when deciding how to group the different presentations into tracks or themes? What are the most interesting, unexpected, or challenging lessons that you have learned while working on selecting presentations for this year's event?
I was still recovering from this failure in 2015, when a friend who designed the first version of Uber, Jelle Prins, pinged me and said Uber was kickstarting an engineering office in Amsterdam. This last one was a great adventure; we went to the US to join the TechStars 2013 accelerator.
The blog narrates how Apache Arrow offers better data serialization efficiency and avoids design pitfalls from the past. The blog stresses the need for granular, structured feedback, especially from experts, and outlines key considerations for evaluation design. What we learned?
How can we design systems that recognize these nuances and empower every title to shine and bring joy to our members? They allow us to verify whether titles are presented as intended and investigate any discrepancies. Yet, these pages couldn't be more different. How do we bridge this gap?
These insights have shaped the design of our foundation model, enabling a transition from maintaining numerous small, specialized models to building a scalable, efficient system. Addressing unique challenges, like cold start and presentation bias, the model also acknowledges the distinct differences between language tasks and recommendation.
Summary: The primary application of data has moved beyond analytics. With the broader audience comes the need to present data in a more approachable format. Can you describe what is encompassed by the idea of a data product strategy?
Designed for processing large data sets, Spark has been a popular solution, yet it is one that can be challenging to manage, especially for users who are new to big data processing or distributed systems. The Snowpark Migration Accelerator builds an internal model representing the functionality present in the codebase.