Trending Articles

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew. The data warehouse solved for performance and scale but, much like the databases that preceded it, relied on proprietary formats to build vertically integrated systems.

Meta Open Source: 2024 by the numbers

Engineering at Meta

Open source has played an essential role in the tech industry and beyond. Whether in the AI/ML, web, or mobile space, our open source community grew and evolved while connecting people worldwide. At Meta Open Source, 2024 was a year of growth and transformation. Our open source initiatives addressed the evolving needs and challenges of developers, powering breakthroughs in AI and enabling the creation of innovative, user-focused applications and experiences.

Best Automation Tools In 2025 for Data Pipelines, Integrations, and More

Seattle Data Guy

Since I started working in tech, one goal that kept coming up was workflow automation. Whether automating a report or setting up retraining pipelines for machine learning models, the idea was always the same: do less manual work and get more consistent results. But automation isn't just for analytics. RevOps teams want to streamline processes…

Scalable Model Development and Production in Snowflake ML

Snowflake

Despite the best efforts of many ML teams, most models still never make it to production due to disparate tooling, which often leads to fragmented data and ML pipelines and complex infrastructure management. Snowflake has continuously focused on making it easier and faster for customers to bring advanced models into production. In 2024, we launched over 200 AI features, including a full suite of end-to-end ML features in Snowflake ML, our integrated set of capabilities for machine learning mode

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Improving Pinterest Search Relevance Using Large Language Models

Pinterest Engineering

Han Wang | Machine Learning Engineer II, Relevance & Query Understanding; Mukuntha Narayanan | Machine Learning Engineer II, Relevance & Query Understanding; Onur Gungor | (former) Staff Machine Learning Engineer, Relevance & Query Understanding; Jinfeng Rao | Senior Staff Machine Learning Engineer, Pinner Discovery. Figure: Illustration of the search relevance system at Pinterest.

Announcing Automatic Publishing to Power BI

databricks

We're excited to announce the Public Preview of the Microsoft Power BI task type in Databricks Workflows, available on Azure, AWS, and GCP. With this.

More Trending

Lesser-Known Python Functions That Are Super Useful

KDnuggets

Go beyond the basics by adding these cool and useful Python functions to your programming toolbox.

Solving the weekly menu puzzle pt.2: recommendations at Picnic

Picnic Engineering

A little over a year ago, we shared a blog post about our journey to enhance customers' meal planning experience with personalized recipe recommendations. We discussed the challenge of finding culinary inspiration when personal preferences aren't fully considered, like encountering that one veggie you'd rather avoid. We explained how a system that learns from your tastes and habits could solve this issue, ultimately making the daily task of choosing meals both effortless and inspiring.

Data Engineering Weekly #215

Data Engineering Weekly

Introducing Apache Airflow® 3.0: Be among the first to see Airflow 3.0 in action and get your questions answered directly by the Astronomer team. You won't want to miss this live event on April 23rd! Thoughtworks: Macro trends in the tech industry. That raises an important question: not whether AI becomes foundational infrastructure, but how we prepare for that without getting caught flat-footed.

InferESG: Finding the Right Architecture for AI-Powered ESG Analysis by David Rees

Scott Logic

During the InferESG project we made a pivotal decision to create an alternative architecture, one that sits parallel to the agentic framework used for the conversational part of the system. This decision came out of discussions with the client about their need to analyse and process company sustainability reports, evaluate them, and compare them to relevant materiality topics.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Snowpark Magic: Auto-Create Tables from S3 Folders

Cloudyard

In modern data lakes, it's common for departments like Finance, Marketing, Sales, etc., to continuously drop data files into their respective folders within an S3 bucket. These files often arrive in CSV format, and over time, teams request new folders or refresh their data.

Snowflake Startup Spotlight: Innova-Q

Snowflake

Welcome to Snowflake's Startup Spotlight, where we learn about amazing companies building businesses on Snowflake. This time, we're casting the spotlight on Innova-Q, where the founders are stirring things up in the food and beverage industry. With the power of modern generative AI, they're improving product safety, streamlining operations and simplifying regulatory compliance.

Data Appending vs. Data Enrichment: How to Maximize Data Quality and Insights

Precisely

A former colleague recently asked me to explain my role at Precisely. After my (admittedly lengthy) explanation of what I do as the EVP and GM of our Enrich business, she summarized it in a very succinct, but new way: “Oh, you manage the appending datasets.” That got me thinking. We often use different terms when we're talking about the same thing: in this case, data appending vs. data enrichment.

Handling Network Throttling with AWS EC2 at Pinterest

Pinterest Engineering

Jia Zhan, Senior Staff Software Engineer, Pinterest; Sachin Holla, Principal Solution Architect, AWS. Summary: Pinterest is a visual search engine and powers over 550 million monthly active users globally. Pinterest's infrastructure runs on AWS and leverages Amazon EC2 instances for its compute fleet. In recent years, while managing Pinterest's EC2 infrastructure, particularly for our essential online storage systems, we identified a significant challenge: the lack of clear insights into EC2's network

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

10 GitHub Repositories to Master Cloud Computing

KDnuggets

Learn cloud computing concepts, tools, and best practices through free, community-driven content on GitHub.

Announcing the APJ Databricks Smart Business Insights Challenge: Empowering Data-Driven Decision Making with AI and BI

databricks

At Databricks, we believe the future of business intelligence is powered by AI. That's why we're thrilled to announce the Databricks Smart Business Insights Challenge.

Meta’s Llama 4 Large Language Models now available on Snowflake Cortex AI

Snowflake

At Snowflake, we are committed to providing our customers with industry-leading LLMs. We're pleased to bring Meta's latest Llama 4 models to Snowflake Cortex AI! Llama 4 models deliver performant inference so customers can build enterprise-grade generative AI applications and deliver personalized experiences. The Llama 4 Maverick and Llama 4 Scout models can be accessed within the secure Snowflake perimeter on Cortex AI.

How to create an SCD2 Table using MERGE INTO with Spark & Iceberg

Start Data Engineering

1. Introduction
   1.1. Code and setup
2. MERGE INTO is used to UPDATE/DELETE/INSERT rows into a target table based on data in the source table
3. SCD2 table pipeline: INSERT new data, UPDATE existing data, and DELETE stale data
   3.1. Source includes 2 versions of upstream customer data: one for insert and the other for update
   3.2. Updates to the target table
4.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

What Is Steganography & How Does It Work?

Edureka

What is Steganography? It is the practice of concealing information within ordinary files or media, making the hidden data undetectable to anyone unaware of its presence. From ancient techniques like invisible ink to modern digital methods that embed messages in images, audio, or network traffic, it has evolved significantly. While it is widely used for data protection and digital watermarking, cybercriminals also exploit it to hide malware and evade detection.
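
As a rough illustration of the "embed messages in images" idea, here is a minimal sketch of least-significant-bit (LSB) embedding, one common digital technique. It is not taken from the Edureka article: the function names `embed` and `extract` are hypothetical, the 4-byte length header is an assumption made for this example, and a plain byte string stands in for real pixel data.

```python
def embed(cover: bytes, message: bytes) -> bytes:
    """Hide `message` in the least significant bit of each cover byte."""
    # Assumed layout: 4-byte big-endian length header, then the message.
    payload = len(message).to_bytes(4, "big") + message
    # Flatten the payload into bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return bytes(stego)


def extract(stego: bytes) -> bytes:
    """Recover a message written by `embed`."""

    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (stego[start_bit + b * 8 + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 4), "big")
    return read_bytes(32, length)


cover = bytes(range(256))      # stand-in for raw pixel values
stego = embed(cover, b"hi")
print(extract(stego))
```

Each cover byte changes by at most 1, which is why this kind of change is hard to notice in image data; it is also easy to destroy by recompression, so treat it as a teaching sketch only.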

The Essential Guide to Regular Expressions for Data Scientists

KDnuggets

Looking to add regular expressions to your data science toolbox? Learn regex with Python from the ground up with this guide.

Introducing Meta’s Llama 4 on the Databricks Data Intelligence Platform

databricks

Thousands of enterprises already use Llama models on the Databricks Data Intelligence Platform to power AI applications, agents, and workflows.

Make Map Icons with an Orthographic Projection

ArcGIS

Create custom projections with only two coordinates and then turn them into icons for endless possibilities.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Shifting Left: How Data Contracts Underpin People, Processes, and Technology

Confluent

Application engineers situated at the beginning of a data pipeline ("left side") should apply data contracts and products as rigorously as the data engineers further down the line.

AI-Driven ABM: Scaling Precision and Impact for B2B Growth

Snowflake

Discover how Snowflake's ABM team achieved a 2.3x lift in meetings booked and a 54% increase in CTR by using Snowflake AI for targeted campaigns and more personalized messaging while optimizing both budget and engagement. We've seen how Snowflake AI tools are transforming outcomes for our customers. From saving 4,000 hours a year on manual email intake to treating more patients in emergency rooms to saving 75% of costs, AI in Snowflake is making a real impact on businesses around the world.

Exploring the Role of Smaller LMs in Augmenting RAG Systems

KDnuggets

Let's discover what small language models (SLMs) are, how they can be used in RAG systems and applications, and when to use them over their large language counterparts.

Announcing the General Availability of Lakeflow Connect

databricks

We're excited to announce the General Availability of Lakeflow Connect for Salesforce and Workday.

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

Mobile GraphQL at Meta in 2025

Engineering at Meta

Mobile GraphQL is a framework used at Meta for fetching data in mobile applications using GraphQL, a strongly-typed, declarative query language. At Meta it handles data fetching for apps like Facebook and Instagram. Sabrina, a software engineer on Meta's Mobile GraphQL Platform Team, joins Pascal Hartig on the Meta Tech podcast to discuss the evolution and future of GraphQL.

The AI Silo Problem: How Data Streaming Can Unify Enterprise AI Agents

Confluent

To solve the AI silo problem, enterprises need a shared communication layer for AI agents: a real-time, event-driven approach that lets agents share intelligence and take coordinated action.

What Does Flossing One Tooth Have to Do with Data Quality?

DataKitchen

Improving data quality can feel overwhelming. There are so many things to fix and so many processes to improve. Where do you even start? Surprisingly, the answer may come from an unusual place: flossing your teeth. Start Small to Build Better Habits: In the book Tiny Habits by B.J. Fogg, the author suggests a simple way to build habits. If you want to start flossing your teeth, don't aim for a perfect routine right away.

Creating a Data Science Pipeline for Real-Time Analytics Using Apache Kafka and Spark

KDnuggets

This article explains how to create a system that processes data in real time using Apache Kafka and Spark.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.