Trending Articles

Love and hate - Excel files and data engineers

Waitingforcode

Even though data engineers enjoy discussing table file formats, distributed data processing, or more recently, small data, they still need to deal with legacy systems. By "legacy," I mean not only the code you or your colleagues wrote five years ago but also data formats that have been around for a long time. Despite being challenging for data engineers, these formats remain popular among business users.

10 GitHub Awesome Lists for Data Science

KDnuggets

Most popular educational resource list on GitHub for Python, R, SQL, analytics, machine learning, datasets, and more.

How to Combine Streamlit, Pandas, and Plotly for Interactive Data Apps

KDnuggets

With just two Python files and a handful of methods, you can build a complete dashboard that rivals expensive business intelligence tools.

DuckDB Enters the Lake House Race: My Take on DuckLake

Confessions of a Data Guy

I’ve been thinking about this for a few days now, and I still don’t know whether to cheer or groan. Some moments, I see DuckLake as a smart, much-needed evolution; other times, it feels like just another unnecessary entry in the ever-growing Lake House jungle. Reality, as always, is probably somewhere in between. MotherDuck and […]

What's New in Apache Airflow 3.0, and How Will It Reshape Your Data Workflows?

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Meta joins Kotlin Foundation

Engineering at Meta

We are proud to announce that Meta has officially joined the Kotlin Foundation as a gold member, marking a significant milestone in our ongoing commitment to Kotlin and the broader Android development ecosystem. Over the past several years, Meta engineers have been actively migrating our extensive Android codebase, comprising tens of millions of lines, from Java to Kotlin.

Introducing the Databricks AI Governance Framework

databricks

Today, we’re introducing the Databricks AI Governance Framework (DAGF v1.0), a structured and practical approach to governing AI adoption across the enterprise.

More Trending

Automate Data Quality Reports with n8n: From CSV to Professional Analysis

KDnuggets

Analyze any CSV dataset from a URL and generate professional quality reports with n8n. By Vinod Chugani on June 26, 2025, in Data Science. The Data Quali

Lakebase: Databricks’ Bold Play to Fuse OLTP and the Lakehouse

Confessions of a Data Guy

The future never shows up quietly. Just when you think you’ve tamed the latest “must-have” technology, a fresh acronym crashes the party. I’d barely finished wrapping my head around the Lakehouse paradigm when Databricks rolled out something new at the 2025 Data & AI Summit: Lakebase, a fully managed PostgreSQL engine built directly into the […]

An inside look at Meta’s transition from C to Rust on mobile

Engineering at Meta

Have you ever worked in legacy code? Are you curious what it takes to modernize systems at a massive scale? Pascal Hartig is joined on the latest Meta Tech Podcast by Elaine and Buping, two software engineers working on a bold project to rewrite the decades-old C code in one of Meta’s core messaging libraries in Rust. It’s an ambitious effort that will transform a central messaging library that is shared across Messenger, Facebook, Instagram, and Meta’s AR/VR platforms.

How Unity Catalog Managed Tables Automate Performance at Scale

databricks

Unity Catalog (UC) managed tables combine strong governance with seamless interoperability across tools.

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Snowflake Startup Spotlight: Jedify

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we ask startup founders about the problems they’re solving, the apps they’re building and the lessons they’ve learned during their startup journey. In this edition, meet Assaf Henkin, the founder of Jedify, and see how the company is addressing the challenge of growing data complexity by making AI-powered data intelligence accessible and scalable.

Databricks SQL Scripting: A Familiar Friend or an Old Foe?

Confessions of a Data Guy

I’d be lying if I said a small part of me didn’t groan when I first read about SQL Scripting being released by Databricks. Don’t get me wrong; I don’t fault Databricks for giving users what they want. After all, if you don’t feed the masses, they’ll turn on you. We data engineers are gluttons for […]

Mapping mangrove dynamics with raster functions in Map Viewer

ArcGIS

Jun 27, 2025. By Sucheta Bhattacharjee and Ling Tang. Mapping mangrove dynamics is critical for understanding the health and resilience of these unique ecosystems. Mangroves provide invaluable ecosystem services such as carbon sequestration, coastal protection from storm surges, and habitat for diverse species.

AI Security in Action: Applying NVIDIA’s Garak to LLMs on Databricks

databricks

Large Language Models (LLMs) have swiftly become essential components of modern workflows, automating tasks traditionally performed by humans.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Forward Data Conference + some news

Christophe Blefari

Hey Data News readers. Sorry for being absent for the last 2 months. I was in SF to work on nao because we went through Y Combinator; to be honest, it was an intense 3 months and an awesome experience. Small heads-up: I'm organising the Forward Data Conference (2nd edition) on November 24th in Paris and we are cooking up a great program! The call for talk proposals ends this Sunday (July 6th), so make sure to propose a talk this week if you wanna join this awesome moment!

5 Fun Python Projects for Absolute Beginners

KDnuggets

Bored of theory? These hands-on Python projects make learning interactive, practical, and actually enjoyable.

End-to-End Data Pipeline on GCP with Airflow: A Social Media Case Study

RandomTrees

Part 2: Orchestrating SQL-based Transformations with Airflow in GCP. In Part 1, we covered how to set up the GCP environment, create datasets, and prepare the schema for our social media project. Now in Part 2, we’ll focus on building an Apache Airflow DAG that automatically reads SQL files from Cloud Storage and executes them in BigQuery.

Robinhood Launches Stock Tokens, Reveals Layer 2 Blockchain, and Expands Crypto Suite in EU and US with Perpetual Futures and Staking

Robinhood

Robinhood Stock Tokens will allow EU customers to get exposure to the US stock market. Robinhood will also launch a new Layer 2 blockchain to power the tokenization of Real World Assets. Today, at Robinhood Presents: To Catch a Token in Cannes, France, we unveiled a suite of new products that mark a major step forward for crypto. From expanding Robinhood to over 400 million people across 30 EU and EEA countries, to launching stock and ETF tokens, we’re building toward a future where investing is s

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Data Engineering Weekly #226

Data Engineering Weekly

The Data Platform Fundamentals Guide: Learn the fundamental concepts to build a data platform in your organization.
- Tips and tricks for data modeling and data ingestion patterns
- Explore the benefits of an observation layer across your data pipelines
- Learn the key strategies for ensuring data quality for your organization

Kiran Gopinathan: Programming Language Design in the Era of LLMs - A Return to Mediocrity?

Want to deliver value? Focus on flow by Nick Hume

Scott Logic

In simple terms, a process that’s becoming more efficient might be defined as one that generates more value without the need for greater effort. However, simplicity is not a defining characteristic of most software development projects, and the more they grow in size and complexity, the more opportunities there are for inefficiencies to creep in. The software development process is relatively easy to conceptualise, and is all too often oversimplified or trivialised, by everyone, from engineering

A Beginner’s Guide to Mastering Gemini + Google Sheets

KDnuggets

In this article, we'll go through the implementation of Gemini with Google Sheets.

End-to-End Data Pipeline on GCP with Airflow: A Social Media Case Study

RandomTrees

Part 1: Social Media Data Pipeline – GCP Setup and Modeling. In this blog series, I will walk you through a real-world case study I personally worked on, where we built an end-to-end social media data pipeline using Google Cloud Platform (GCP) and Apache Airflow. This pipeline helps analyze user engagement, trends, and behavior from a simulated social media platform.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Building a Trusted AI Data Architecture: The Foundation of Scalable Intelligence

Teradata

Discover how AI data architecture shapes data quality and governance for successful AI initiatives.

Training 10,000 Anomaly Detection Models on One Billion Records with Explainable Predictions

databricks

The Power of Anomaly Detection Across Industry. Anomaly detection is a crucial technique for identifying unusual patterns that could signal potential problems or opportunities.

Just Launched: Unstructured Data Monitoring

Monte Carlo

Bad data has always eroded stakeholder trust; what’s new today is the type of bad data that’s eroding it. Internal documents, support tickets, product descriptions and images, chat logs… all once siloed and ignored are now fueling the development of AI applications. But as AI adoption accelerates, unstructured data like text and images isn’t just becoming more critical—it’s also becoming more opaque.

7 Popular LLMs Explained in 7 Minutes

KDnuggets

Get a quick overview of GPT, BERT, LLaMA, and more! By Kanwal Mehreen, KDnuggets Technical Editor & Content Specialist, on June 26, 2025, in Language Models. We use large language models in

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to:
- Create a standardized process for debugging to quickly diagnose errors in your DAGs
- Identify common issues with DAGs, tasks, and connections
- Distinguish between Airflow-relate

Transform your BIM workflows: Two ways to use Autodesk models in ArcGIS

ArcGIS

Jun 26, 2025. By Geoff Cook and Andreas Lippold. ArcGIS offers multiple ways to work with building information modeling (BIM) data from Autodesk in geographic information system (GIS) workflows.

DareData Use Case: Beyond Simple Chatbots

DareData

How a Multi-Agent GenAI Architecture Can Reshape Customer Support. Customer service teams deal with thousands of repetitive and routine questions every day. A company in the telco sector can receive queries ranging from mobile plan details to subscription issues, FAQs, or service issues, while an online retailer may receive millions of requests regarding product returns.

Powering Enterprise AI With On-Prem Solutions: Control, Compliance, and Confidence

Teradata

Explore on-prem AI technology, its benefits, and how it's shaping intelligent data solutions across industries.

From Pawns to Pipelines: Stream Processing Fundamentals Through Chess

Confluent

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!