Top Data Engineering Digest Data Workflow Data Consolidation Content for Week of Dec 14

Sat. Dec 14, 2024 - Fri. Dec 20, 2024

Top 10 Data & AI Trends for 2025

Towards Data Science

DECEMBER 16, 2024

Agentic AI, small data, and the search for value in the age of the unstructured data stack. Image credit: Monte Carlo. According to industry experts, 2024 was destined to be a banner year for generative AI. Operational use cases were rising to the surface, technology was reducing barriers to entry, and general artificial intelligence was obviously right around the corner.

Unstructured Data

Unstructured Data Data Food Data Engineering

Part 1: A Survey of Analytics Engineering Work at Netflix

Netflix Tech

DECEMBER 17, 2024

This article is the first in a multi-part series sharing a breadth of Analytics Engineering work at Netflix, recently presented as part of our annual internal Analytics Engineering conference. We kick off with a few topics focused on how we're empowering Netflix to efficiently produce and effectively deliver high-quality, actionable analytic insights across the company.

Engineering

Engineering Entertainment Amazon Web Services Utilities

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Monte Carlo Recognized as the #1 Leader in Data Observability and Data Quality by G2

Monte Carlo

DECEMBER 18, 2024

As we turn the corner into 2025, we're excited to announce that for the 7th quarter in a row, Monte Carlo has been named G2's #1 Data Observability Platform, as well as #1 in the Data Quality category. This recognition never gets old because G2 bases its rankings on feedback and insights from real customers who work in these tools every day to add value to their business.

Database

Database High Quality Data Data Software Engineering

Telco Enterprise Data Platforms: Key Success Factors in Building for an AI Future

Cloudera

DECEMBER 17, 2024

Since 5G networks began rolling out commercially in 2019, telecom carriers have faced a wide range of new challenges: managing high-velocity workloads, reducing infrastructure costs, and adopting AI and automation. Because data management is a key variable for overcoming these challenges, carriers are turning to hybrid cloud solutions, which provide the flexibility and scalability needed to adapt to the evolving landscape 5G enables.

Building

Building Telecommunication Data Architecture Architecture

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data Pipeline

Redefining AIOps IT Workflows with Legacy System Visibility

Precisely

DECEMBER 16, 2024

Key Takeaways: Centralized visibility of data is key. Modern IT environments require comprehensive data for successful AIOps, which includes incorporating data from legacy systems like IBM i and IBM Z into ITOps platforms. Predictive AIOps capabilities will revolutionize IT operations. The shift from reactive to proactive IT operations is driven by AI-powered analysis, automation, and insights.

Systems

Systems IT Machine Learning Insurance

Introducing Configurable Metaflow

Netflix Tech

DECEMBER 19, 2024

David J. Berg*, David Casler^, Romain Cledat*, Qian Huang*, Rui Lin*, Nissan Pow*, Nurcan Sonmez*, Shashank Srikanth*, Chaoying Wang*, Regina Wang*, Darin Yu* (*: Model Development Team, Machine Learning Platform; ^: Content Demand Modeling Team). A month ago at QConSF, we showcased how Netflix utilizes Metaflow to power a diverse set of ML and AI use cases, managing thousands of unique Metaflow flows.

Machine Learning

Machine Learning Project Data Warehouse Coding

The Developer Experience Upgrade: From Create React App to Vite

Tweag

DECEMBER 18, 2024

We all know how it feels: staring at the terminal while your development server starts up, or watching your CI/CD pipeline crawl through yet another build process. For many React developers using Create React App (CRA), this waiting game has become an unwanted part of the daily routine. While CRA has been the go-to build tool for React applications for years, its aging architecture is increasingly becoming a bottleneck for developer productivity.

Coding

Coding Project Architecture Building

More Trending

Maximizing Fuel Efficiency with Real-Time Data: A New Era in Airline Operations

Striim

DECEMBER 18, 2024

In 2024, the global airline industry is projected to spend $291 billion on fuel, making it one of the most significant expenses for airlines. Inefficient fuel management not only drives up operational costs but also hampers environmental targets. However, optimizing fuel usage is complex, often hindered by limited real-time monitoring, which can lead to unnecessary waste due to inefficient routes, weather adjustments, excess weight, and outdated practices.

Aggregated Data

Aggregated Data Machine Learning Data Integration Data

The High Price of Poor Address Data: Solutions for Better Business Outcomes

Precisely

DECEMBER 18, 2024

Key Takeaways: Poor address data can lead to missed deliveries, incorrect customer information, and wasted resources, negatively impacting overall customer satisfaction, operational efficiency, and profitability. Correcting bad addresses is just the beginning: you then need to connect those clean addresses to other valuable data points to unlock real value.

Data Solutions

Data Solutions Retail Datasets Food

Title Launch Observability at Netflix Scale

Netflix Tech

DECEMBER 17, 2024

Part 1: Understanding the Challenges. By: Varun Khaitan. With special thanks to my stunning colleagues: Mallika Rao, Esmir Mesic, Hugo Marques. Introduction: At Netflix, we manage over a thousand global content launches each month, backed by billions of dollars in annual investment. Ensuring the success and discoverability of each title across our platform is a top priority, as we aim to connect every story with the right audience to delight our members.

Metadata

Metadata Systems Algorithm Data Analysis

Using JSpecify 1.0 to Tame Nulls in Java by Magnus Smith

Scott Logic

DECEMBER 17, 2024

Introduction: In the Java ecosystem, dealing with null values has always been a source of confusion and bugs. A null value can represent various states: the absence of a value, an uninitialized object, or even an error. However, there has never been a consistent, standardized approach for annotating and ensuring null-safety at the language level. Nullability annotations like @Nullable and @NonNull are often used, but they're not part of the core Java language, leading to inconsistencies across lib

Java

Java Coding Project Systems

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

File Archival in Snowflake: Snowpark-Powered Solution

Cloudyard

DECEMBER 18, 2024

In data-driven organizations, File Archival in Snowflake: A Snowpark-Powered Solution has become a game-changer. Handling feed files in data pipelines is a critical task for many organizations. These files, often stored in stages such as Amazon S3 or Snowflake internal stages, are the backbone of data ingestion workflows.

Retail

Retail Data Ingestion AWS Data Pipeline

How GenAI is Transforming Quality Control and Safety in the F&B Industry.

RandomTrees

DECEMBER 17, 2024

The food and beverage (F&B) sector is constantly under pressure to meet strict food safety compliance requirements while also ensuring that operations run efficiently. In light of rapid changes in consumer demand, policies, and supply chain management, there is an urgent need to utilize new technologies. Generative AI (GenAI), an area of artificial intelligence, is enhancing the automation of quality control processes, thereby increasing the safety and efficiency of the industry.

Food

Food Manufacturing Machine Learning Algorithm

Queues in Apache Kafka®: Enhancing Message Processing and Scalability

Confluent

DECEMBER 19, 2024

Queue support in Apache Kafka 4.0, enabled by share groups, lets you accommodate traditional queue-type workloads through cooperative consumption.
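
To make the difference concrete, here is a minimal pure-Python sketch of the two consumption models. It is a toy simulation, not Kafka client code: the function names (`classic_group_assignment`, `share_group_delivery`) are hypothetical, and the round-robin hand-out is a simplifying assumption rather than how a Kafka broker actually schedules delivery. It only illustrates the idea that a classic consumer group gives each partition to one consumer, while queue-style cooperative consumption lets several consumers work through the records of the same partition.

```python
from collections import deque


def classic_group_assignment(num_partitions, consumers):
    """Classic consumer-group model: every partition has exactly one owner.

    Consumers beyond the partition count receive nothing and sit idle.
    """
    return {p: consumers[p % len(consumers)] for p in range(num_partitions)}


def share_group_delivery(records, consumers):
    """Queue-style model: records of ONE partition are handed out one by one.

    Round-robin is an assumption made for this sketch only.
    """
    pending = deque(records)
    delivered = {c: [] for c in consumers}
    turn = 0
    while pending:
        consumer = consumers[turn % len(consumers)]
        delivered[consumer].append(pending.popleft())
        turn += 1
    return delivered


if __name__ == "__main__":
    workers = ["a", "b", "c"]
    # One partition, three consumers: only one of them gets work.
    print(classic_group_assignment(1, workers))
    # Same single partition, but records are shared across all three.
    print(share_group_delivery([0, 1, 2, 3, 4, 5], workers))
```

With one partition and three workers, the first call assigns the partition to a single worker, while the second spreads the six records across all three, which is the scaling property queue-type workloads usually need.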

Kafka

Kafka Process

Secure External Access to Unity Catalog Assets via Open APIs

databricks

DECEMBER 20, 2024

We're excited to announce the Public Preview of credential vending for Unity Catalog's open APIs, allowing external clients to securely access Unity Catalog.

Accessible

Accessible Accessibility

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Designing a Declarative Data Stack: From Theory to Practice

Simon Späti

DECEMBER 20, 2024

What started as a straightforward implementation guide for a declarative data stack quickly evolved into something more fundamental. While attempting to build a system that could define an entire data stack through a single YAML file, I encountered architectural questions that challenged my initial assumptions: Should we generate production-ready code from templates or create a boilerplate repository with best-in-class tools?

Designing

Designing Architecture Data Engineering

How to reference a seed from a different dbt project?

Start Data Engineering

DECEMBER 18, 2024

1. Introduction
2. Ways to reuse seed data across multiple dbt projects
2.1. Code setup
2.1.1. Prerequisites
2.1.2. Setup project environment
2.2. Turn the source repo into a dbt package
2.2.1. Define package version in dbt_project.yml
2.2.2. Store your package for other dbt projects to reference
2.3. Use project dependencies (dbt enterprise only)
2.4.

Project

Project Coding Data

How to Get Addicted to Machine Learning

KDnuggets

DECEMBER 20, 2024

A simple guide for getting hooked on machine learning and building a successful career in the field.

Machine Learning

Machine Learning Building

Introducing Git Support for Queries in Databricks

databricks

DECEMBER 17, 2024

We're excited to announce the Public Preview of Query Git integration as part of the new SQL Editor. Git support for queries.

SQL

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Semantic Layer and AI: The Future of Data Querying with Natural Language

Simon Späti

DECEMBER 19, 2024

Data-driven decision-making is crucial for business success, but organizations face a growing challenge of complexity and data governance. These challenges make it difficult to access data in a unified way. In Part 1, we explored the semantic layer through the lens of MVC, and in Part 2, we outlined its benefits. In this final piece of the series, we examine the integration of a semantic layer with artificial intelligence and why it might be the best place to start with GenAI.

Data Governance

Data Governance Government Data Accessibility

Integrating Microservices with Confluent Cloud Using Micronaut® Framework

Confluent

DECEMBER 18, 2024

Real-time data streaming and messaging are essential for building scalable, resilient, event-driven microservices. Explore integrating the Micronaut framework with Confluent Cloud.

Cloud

Cloud Building Data

How to Use Docker for Local Development Environments

KDnuggets

DECEMBER 19, 2024

Using Docker for local development brings stability, flexibility, and easier environment management, no matter what operating system you're using. Learn how to use Docker on Windows, Linux, and macOS to simplify your development setup, from creating your first container to managing complex environments with Docker Compose.

Systems

Systems Management

Benchmarking Domain Intelligence

databricks

DECEMBER 17, 2024

Large language models are improving rapidly; to date, this improvement has largely been measured via academic benchmarks. These benchmarks, such as MMLU and.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Data Engineering

PostgreSQL Query Optimization: 10 Best Tricks & Techniques (Explained with Code)

Hevo

DECEMBER 20, 2024

PostgreSQL is one of the most popular open-source choices for relational databases. Engineers love it for its powerful features, flexibility, efficient data retrieval mechanism, and, above all, its overall performance. However, performance issues can arise as data size and query complexity grow.

PostgreSQL

PostgreSQL Coding Relational Database Database

The Power of Predictive Analytics in Healthcare: Using Generative AI and Confluent

Confluent

DECEMBER 20, 2024

Learn how predictive analytics, powered by generative AI and Confluent, transforms healthcare by improving outcomes, reducing costs, and enabling real-time decisions.

Healthcare

An Introduction to Dask: The Python Data Scientist’s Power Tool

KDnuggets

DECEMBER 16, 2024

Ever wondered how to handle large data without slowing down your computer? Let's learn about Dask, a tool that helps you work with large data quickly.

Python

Python Data

Philadelphia Union: Streamlining MLS Roster Planning with GenAI

databricks

DECEMBER 19, 2024

Staying competitive in Major League Soccer (MLS) demands building and maintaining a strong squad through strategic roster planning and smart, effective navigation of.

Building

Building Entertainment Media

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

LLMs vs Advent of Code, AI is winning by Colin Eberhardt

Scott Logic

DECEMBER 14, 2024

Advent of Code (AoC) is an annual, Christmas-themed coding competition that has been running for the past years and is something that I participate in at times. This year, while ~~subjecting myself to~~ learning Rust, I decided to see how OpenAI's latest model fared at the challenge. I quickly knocked together a script, and to my astonishment, found that o1-mini gave correct answers to all but one part of the first six days.

Coding

Coding Datasets Software Engineering Software Engineer

Comprehensive Overview of DataOps Framework

Hevo

DECEMBER 20, 2024

As industries rely more heavily on data, they face issues like a lack of collaboration between teams, bottlenecks in data pipelines, and slow delivery of insights for decision-making. DataOps is a methodology designed to streamline workflows and ensure smooth data integration and quality across the organization.

Data Pipeline

Data Pipeline Data Integration Designing Data

10 Essential Pandas Commands for Data Preprocessing

KDnuggets

DECEMBER 17, 2024

Check out this beginner's guide to cleaning and preparing data efficiently with Python.

Python

Python Data

Česká spořitelna: How GenAI is Transforming Call Centers in the Financial Services Industry

databricks

DECEMBER 18, 2024

Czech savings bank Česká spořitelna, a division of Austria's Erste Group, recently collaborated with AI solution builder DataSentics to explore the.

Banking

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
