Here we explore the initial system designs we considered, give an overview of the current architecture, and cover some important principles Meta takes into account in making data accessible and easy to understand. Users have a variety of tools they can use to manage and access their information on Meta platforms.
However, due to compliance regulations, access to these fields needs to be restricted based on the user's role. Snowflake provides several layers of data security, including Projection Policies, Masking Policies, and Row Access Policies, which work together to restrict access based on roles.
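As a minimal sketch of one of those layers, a masking policy can hide a sensitive column from all but a privileged role; the table, column, and role names here are hypothetical, with Python submitting the SQL:

```python
import snowflake.connector  # pip install snowflake-connector-python

# Connection parameters are placeholders.
conn = snowflake.connector.connect(account="YOUR_ACCOUNT", user="YOUR_USER", password="...")
cur = conn.cursor()

# Show the column value only to a privileged role; mask it for everyone else.
cur.execute("""
    CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
      CASE WHEN CURRENT_ROLE() IN ('COMPLIANCE_ANALYST') THEN val
           ELSE '***MASKED***' END
""")

# Attach the policy to a (hypothetical) sensitive column.
cur.execute("ALTER TABLE patients MODIFY COLUMN email SET MASKING POLICY email_mask")
```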
Unlocking Data Team Success: Are You Process-Centric or Data-Centric? We’ve identified two distinct types of data teams: process-centric and data-centric. Process-centric data teams focus their energies predominantly on orchestrating and automating workflows. They work in and on these pipelines.
Summary: Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling.
Think your customers will pay more for data visualizations in your application? Five years ago they may have. But today, dashboards and visualizations have become table stakes. Discover which features will differentiate your application and maximize the ROI of your embedded analytics. Brought to you by Logi Analytics.
Data and process automation used to be seen as a luxury, but those days are gone. Let's explore the top challenges to data and process automation adoption in more detail. Almost half of respondents (47%) reported a medium level of automation adoption, meaning they currently have a mix of automated and manual SAP processes.
What is Real-Time Stream Processing? To access real-time data, organizations are turning to stream processing. There are two main data processing paradigms: batch processing and stream processing.
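To make the two paradigms concrete, here is a toy Python sketch (all names invented, not tied to any particular engine): the batch function needs the complete dataset before it answers, while the streaming function emits a result per time window as events arrive.

```python
from collections import Counter, defaultdict

def batch_counts(events):
    """Batch: the whole dataset is available up front; one answer at the end."""
    return Counter(key for _, key in events)

def streaming_window_counts(events, window_seconds=5.0):
    """Stream: events arrive indefinitely; emit a result per tumbling window."""
    counts, window_start = defaultdict(int), None
    for ts, key in events:  # `events` may be an unbounded generator
        if window_start is None:
            window_start = ts
        if ts - window_start >= window_seconds:
            yield window_start, dict(counts)  # near-real-time partial result
            counts, window_start = defaultdict(int), ts
        counts[key] += 1
```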
Summary: Data processing technologies have dramatically improved in their sophistication and raw throughput. When performing research and building prototypes of these projects, what is your process for incorporating user experience into the implementation of the product?
Our deployments were initially manual. Avoiding downtime was nerve-wracking, and the notion of a 'rollback' was as much a relief as a technical process. After this zero-byte file was deployed to prod, the Apache web server processes slowly picked up the empty configuration file. Apache started to log like a maniac.
Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network
As an attendee, you will: Discover how construction professionals have deployed digital technologies to manage the risks created by skilled worker shortages, supply chain issues, and other critical challenges 🌐 Gain insight from experts who have successfully created digital workflows and have seen process and business benefits emerge from their (..)
We are committed to building the data control plane that enables AI to reliably access structured data from across your entire data lineage. We believe it is important for the industry to start coalescing on best practices for safe and trustworthy ways to access your business data via LLM. What is MCP?
For image data, running distributed PyTorch on Snowflake ML, also with standard settings, resulted in over 10x faster processing for a 50,000-image dataset when compared to the same managed Spark solution. Secure access to open source repositories via pip and the ability to bring in any model from hubs such as Hugging Face (see example here).
Here’s how Snowflake Cortex AI and Snowflake ML are accelerating the delivery of trusted AI solutions for the most critical generative AI applications: Natural language processing (NLP) for data pipelines: Large language models (LLMs) have transformative potential, but integrating batch inference into pipelines can be cumbersome.
But as technology speeds forward, organizations of all sizes are realizing that generative AI isn’t just aspirational: It’s accessible and applicable now. But getting a handle on all the emails, calls and support tickets had historically been a tedious and largely manual process. “Cortex is doing a great job for us.”
Just by embedding analytics, application owners can charge 24% more for their product. How much value could you add? This framework explains how application enhancements can extend your product offerings. Brought to you by Logi Analytics.
This belief has led us to develop Privacy Aware Infrastructure (PAI), which offers efficient and reliable first-class privacy constructs embedded in Meta infrastructure to address different privacy requirements, such as purpose limitation, which restricts the purposes for which data can be processed and used.
Cloudera, together with Octopai, will make it easier for organizations to better understand, access, and leverage all their data in their entire data estate – including data outside of Cloudera – to power the most robust data, analytics and AI applications.
A data engineering architecture is the structural framework that determines how data flows through an organization – from collection and storage to processing and analysis. And who better to learn from than the tech giants who process more data before breakfast than most companies see in a year?
1. Introduction 2. Project demo 3. Building efficient data pipelines with DuckDB 4.1. Use DuckDB to process data, not for multiple users to access data 4.2. Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing 4.3. Processing data less than 100GB? Use DuckDB 4.4.
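As a sketch of that guidance (paths and column names are hypothetical), a single DuckDB process on an ephemeral VM can aggregate Parquet files and write results back out with no cluster involved:

```python
import duckdb  # pip install duckdb

# One in-process engine per pipeline run: well suited to processing data on
# an ephemeral VM, not to serving many concurrent users.
con = duckdb.connect()  # in-memory; pass a file path to persist a database

con.execute("""
    COPY (
        SELECT customer_id, SUM(amount) AS total_amount
        FROM read_parquet('raw/orders/*.parquet')   -- hypothetical input
        GROUP BY customer_id
    ) TO 'curated/order_totals.parquet' (FORMAT PARQUET)
""")
```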
However, this category requires near-immediate access to the current count at low latencies, all while keeping infrastructure costs to a minimum. Introducing sufficient jitter to the flush process can further reduce contention. This process can also be used to track the provenance of increments.
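One way to sketch the buffered-count-with-jittered-flush idea in Python; the class, backing store, and intervals are assumptions for illustration, not the article's implementation:

```python
import random
import threading

class BufferedCounter:
    """Accumulate increments locally; flush batches to a shared store."""

    def __init__(self, flush_fn, base_interval=1.0, max_jitter=0.5):
        self._pending = 0
        self._lock = threading.Lock()
        self._flush_fn = flush_fn  # e.g. an atomic add against Redis or SQL
        self._base, self._max_jitter = base_interval, max_jitter
        self._schedule_flush()

    def increment(self, n=1):
        with self._lock:
            self._pending += n

    def _schedule_flush(self):
        # Random jitter spreads flushes from many writers over time,
        # reducing contention on the hot counter row.
        delay = self._base + random.uniform(0, self._max_jitter)
        timer = threading.Timer(delay, self._flush)
        timer.daemon = True
        timer.start()

    def _flush(self):
        with self._lock:
            n, self._pending = self._pending, 0
        if n:
            self._flush_fn(n)
        self._schedule_flush()

# Usage: counter = BufferedCounter(lambda n: print(f"+{n}")); counter.increment()
```

The trade-off is freshness for cost: the shared store sees one batched write per interval instead of one write per increment.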
Gen AI makes this all easy and accessible because anyone in an enterprise can simply interact with data by using natural language. What if our app doesn't have access to the right data and generates inaccurate results for stakeholders? Sales teams are usually boxed into dashboards to get insights.
Agents need to access an organization's ever-growing structured and unstructured data to be effective and reliable. As data connections expand, managing access controls and efficiently retrieving accurate information, while maintaining strict privacy protocols, becomes increasingly complex, across both unstructured (e.g., text, audio) and structured data.
Other shipped things include DALL·E 3 (image generation), GPT-4 (an advanced model), and the OpenAI API, which developers and companies use to integrate AI into their processes. Each word that spits out of ChatGPT is this same process repeated over and over again many times per second.
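That token-by-token loop can be sketched in a few lines of Python; everything here is a toy stand-in (the probability function plays the model), not OpenAI's implementation:

```python
import random

def generate(next_token_probs, prompt_tokens, max_new_tokens=20, eos=0):
    """Autoregressive decoding: run the 'model' on everything generated so
    far, sample one token, append it, and repeat."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)                      # one model pass
        nxt = random.choices(range(len(probs)), weights=probs)[0]
        tokens.append(nxt)
        if nxt == eos:                                        # stop token
            break
    return tokens

# Toy 'model': fixed preferences over a 3-token vocabulary; eos is token 0.
print(generate(lambda toks: [0.1, 0.6, 0.3], [2, 1]))
```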
As part of the private preview, we will focus on providing access in line with our product principles of ease, efficiency and trust. To request access during the preview, please reach out to your sales team. Once the model is generally available, customers can manage access to the model via role-based access control (RBAC).
Furthermore, most vendors require valuable time and resources for cluster spin-up and spin-down, disruptive upgrades, code refactoring or even migrations to new editions to access features such as serverless capabilities and performance improvements. As a result, data often went underutilized.
Processing some 90,000 tables per day, the team oversees the ingestion of more than 100 terabytes of data from upward of 8,500 events daily. With an internal user base of 2,000 — and growing — the company particularly appreciated the seamless data access controls and the ability to securely share data with just a few simple clicks.
How can they get access to more transparency into where and why their marketing dollars are being spent (to reduce fraud, saturation and leverage for higher-level internal measurement practices, among other reasons)? Teams will also be able to work more efficiently when they can access all relevant data in one place.
Data fabric is a unified approach to data management, creating a consistent way to manage, access, and share data across distributed environments. As data management grows increasingly complex, you need modern solutions that allow you to integrate and access your data seamlessly.
Ingest data more efficiently and manage costs: For data managed by Snowflake, we are introducing features that help you access data easily and cost-effectively. This reduces the overall complexity of getting streaming data ready to use: simply create an external access integration with your existing Kafka solution.
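A rough sketch of what that setup can look like (the broker host and object names are hypothetical), submitting the SQL from Python:

```python
import snowflake.connector

# Connection parameters are placeholders.
conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# Allow egress to the (hypothetical) Kafka broker...
cur.execute("""
    CREATE OR REPLACE NETWORK RULE kafka_egress
      MODE = EGRESS TYPE = HOST_PORT
      VALUE_LIST = ('broker.example.com:9092')
""")

# ...and bundle it into an external access integration that UDFs and
# procedures can reference when they talk to the existing Kafka solution.
cur.execute("""
    CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION kafka_access
      ALLOWED_NETWORK_RULES = (kafka_egress)
      ENABLED = TRUE
""")
```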
This fragmentation leads to inconsistencies and wastes valuable time as teams end up reinventing metrics or seeking clarification on definitions that should be standardized and readily accessible. Enter DataJunction (DJ). DJ acts as a central store where metric definitions can live and evolve.
This is not surprising when you consider all the benefits, such as reducing complexity [and] costs and enabling zero-copy data access (ideal for centralizing data governance).
Attendees will discover how to accelerate their critical business workflows with the right data, technology and ecosystem access. Explore AI and unstructured data processing use cases with proven ROI: This year, retailers and brands will face intense pressure to demonstrate tangible returns on their AI investments.
Our deep industry knowledge and understanding of these gaps gave us the insight to create solutions that simplify and automate compliance processes using AI. With advanced encryption, strict access controls and strong data governance, Snowflake helps us ensure the confidentiality and protection of our clients' information.
The startup was able to start operations thanks to an EU grant called the NGI Search grant. Code and raw data repository / version control: GitHub. Heavily using GitHub Actions for things like getting warehouse data from vendor APIs, starting cloud servers, running benchmarks, processing results, and cleaning up after runs.
Introduction: HDFS (Hadoop Distributed File System) is not a traditional database but a distributed file system designed to store and process big data. It is a core component of the Apache Hadoop ecosystem and allows for storing and processing large datasets across multiple commodity servers.
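For a feel of how client code talks to HDFS, here is a minimal sketch using pyarrow's Hadoop binding (host, port and paths are hypothetical; it requires a local libhdfs install from Hadoop):

```python
from pyarrow import fs  # pip install pyarrow

# Connect to the NameNode, which tracks where each file's blocks live.
hdfs = fs.HadoopFileSystem(host="namenode.example.com", port=8020)

# Files are split into large blocks replicated across DataNodes; client
# code just sees ordinary streams.
with hdfs.open_output_stream("/data/events/part-0000.csv") as out:
    out.write(b"user_id,event\n42,login\n")

with hdfs.open_input_stream("/data/events/part-0000.csv") as src:
    print(src.read().decode())
```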
For years, an essential tenet of digital transformation has been to make data accessible, to break down silos so that the enterprise can draw value from all of its data. Overall, data must be easily accessible to AI systems, with clear metadata management and a focus on relevance and timeliness.
To inspect XComs, go to the user interface, then Admin and XComs. First things first: xcom_push is accessible only from a task instance object. With the PythonOperator, you can access it by passing the parameter ti to the Python callable function. Once we access the task instance object, we can call xcom_push.
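A minimal Airflow 2.x-style DAG showing that pattern (task IDs and the XCom key are invented):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def push_row_count(ti):
    # `ti` (the TaskInstance) is injected because the callable declares it.
    ti.xcom_push(key="row_count", value=42)

def pull_row_count(ti):
    count = ti.xcom_pull(task_ids="extract", key="row_count")
    print(f"upstream extracted {count} rows")

with DAG("xcom_demo", start_date=datetime(2024, 1, 1),
         schedule=None, catchup=False) as dag:
    extract = PythonOperator(task_id="extract", python_callable=push_row_count)
    load = PythonOperator(task_id="load", python_callable=pull_row_count)
    extract >> load
```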
For example, a Cloudera customer saw a large productivity improvement in their contract review process with an application that extracts and displays a short summary of essential clauses for the reviewer. Benchmark tests indicate that Gemini Pro demonstrates superior speed in token processing compared to competitors such as GPT-4.
Specifically, we have adopted a “shift-left” approach, integrating data schematization and annotations early in the product development process. However, conducting these processes outside of developer workflows presented challenges in terms of accuracy and timeliness.
The Medallion architecture is a design pattern that helps data teams organize data processing and storage into three distinct layers, often called Bronze, Silver, and Gold. The Silver layer aims to create a structured, validated data source that multiple organizations can access. How do you ensure data quality in every layer?
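A toy pandas rendering of the three layers (file name, columns and validation rules are illustrative only):

```python
import pandas as pd

# Bronze: raw events landed as-is, warts and all.
bronze = pd.read_json("landing/events.jsonl", lines=True)

# Silver: validated and deduplicated; the shared, trusted source
# that downstream consumers query.
silver = (
    bronze.dropna(subset=["event_id", "user_id"])
          .drop_duplicates(subset=["event_id"])
)

# Gold: business-level aggregates ready for reporting.
gold = (
    silver.groupby("user_id")
          .size()
          .rename("event_count")
          .reset_index()
)
```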
SwiftKV-optimized Llama 3.3 70B and Llama 3.1 405B models (referred to as Snowflake-LLama-3.3-70B) are available, and customers can access these in Cortex AI via the complete function. This is done by combining parameter-preserving model rewiring with lightweight fine-tuning to minimize the likelihood of knowledge being lost in the process.
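Calling one of these models through the complete function might look like this; the model identifier below is an assumption based on the name above:

```python
import snowflake.connector

# Connection parameters are placeholders.
conn = snowflake.connector.connect(account="...", user="...", password="...")
cur = conn.cursor()

# SNOWFLAKE.CORTEX.COMPLETE(model, prompt) runs the model server-side;
# the 'snowflake-llama-3.3-70b' identifier is assumed, not confirmed.
cur.execute("""
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'snowflake-llama-3.3-70b',
        'Summarize our Q3 support tickets in one sentence.'
    )
""")
print(cur.fetchone()[0])
```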
But when data processes fail to match the increased demand for insights, organizations face bottlenecks and missed opportunities. Set Up Auto-Scaling: Configure auto-scaling for your data processing and storage resources. The ability to harness and analyze data effectively can make or break a company’s competitive edge.
Manual processes can be time-consuming and error-prone. Agentic AI automates these processes, helping ensure data integrity and offering real-time insights. Leveraging advanced machine learning and natural language processing, these intelligent agents can efficiently manage and analyze vast amounts of data.
That's where data integration comes in. It enables faster decision-making, boosts efficiency, and reduces costs by providing self-service access to data for AI models. Define clear goals, assess your data landscape, choose the right tools, ensure data quality and governance, and continuously optimize your integration processes.