Sat. Jul 27, 2024 - Fri. Aug 02, 2024

My Obsidian Note-Taking Workflow

Simon Späti

A Vim-Inspired Approach to Efficient Note Management with Obsidian and Markdown

Building Data Science Pipelines Using Pandas

KDnuggets

Learn to build end-to-end data science pipelines, from data ingestion to data visualization, using the pandas pipe method.
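
As a rough illustration of the pipe-based style this teaser refers to, here is a minimal sketch. It is not taken from the linked article: the step functions (drop_missing, add_total, top_n), the column names, and the sample data are all hypothetical. The only library feature relied on is DataFrame.pipe, which passes the frame as the first argument of the given function.

```python
import pandas as pd


def drop_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: drop rows with any missing value."""
    return df.dropna()


def add_total(df: pd.DataFrame, price_col: str, qty_col: str) -> pd.DataFrame:
    """Hypothetical feature step: add a 'total' column."""
    return df.assign(total=df[price_col] * df[qty_col])


def top_n(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical reporting step: keep the n rows with the largest total."""
    return df.sort_values("total", ascending=False).head(n)


# Made-up sample data standing in for an ingestion step.
raw = pd.DataFrame(
    {
        "item": ["a", "b", "c", "d"],
        "price": [2.0, None, 5.0, 1.0],
        "qty": [3, 4, 1, 10],
    }
)

# Each .pipe call hands the current frame to the next step, so the
# pipeline reads top to bottom instead of as nested function calls.
result = (
    raw.pipe(drop_missing)
    .pipe(add_total, price_col="price", qty_col="qty")
    .pipe(top_n, n=2)
)
print(result)
```

Keeping each step a plain function that takes and returns a DataFrame makes the steps easy to test on their own and to reorder.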

Announcing General Availability of Lakehouse Federation

databricks

Today, we are excited to announce that Lakehouse Federation in Unity Catalog is now Generally Available (GA) across AWS, Azure, and GCP! Lakehouse.

AWS 139

Introducing Apache Kafka® 3.8

Confluent

Apache Kafka 3.8 adds 17 new KIPs (13 for Core, 3 for Streams & 1 for Connect). Highlights include 2 new Docker images, the ability to set task assignors, and more!

Kafka 136

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

How To Run A Data Team As A New Head Of Data

Seattle Data Guy

What would you do if you became the head or director of data for a 1,000-person company? Yesterday, you were plugging along as an analyst, and now, suddenly, you have all these new responsibilities. Figuring out where to start is part of the job. You’d probably feel a strong temptation to freak out. Who wouldn’t?

Data 130

5 Tips for Improving SQL Query Performance

KDnuggets

If you work in data, you’ll write SQL queries all the time. So how do you write efficient SQL queries that are optimized for performance? This tutorial will help you with just that.

SQL 140

More Trending

Data+AI Summit 2024 - Retrospective - Apache Spark

Waitingforcode

Welcome to the second blog post dedicated to the previous Data+AI Summit. This time I'm going to share with you a summary of Apache Spark talks.

Data 130

How to make a “peeled edge” area of interest effect in ArcGIS Pro

ArcGIS

Catch eyes and imaginations with this fun technique that draws attention to your area of interest with a bit of style!

127

7 Steps to Master the Art of Data Storytelling

KDnuggets

Follow this 7-step recipe to master effective insight and information dissemination through compelling data story crafting.

Data 138

Ingest data from SQL Server, Salesforce, and Workday with LakeFlow Connect

databricks

We’re excited to announce the Public Preview of LakeFlow Connect for SQL Server, Salesforce, and Workday. These ingestion connectors enable simple and efficient.

SQL 135

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Securely Deploy Custom Apps and Models with Snowpark Container Services, Now Generally Available

Snowflake

Since introducing Snowpark Container Services, we’ve seen overwhelming adoption across industries from customers and partners, including Landing.AI, Relational.AI, H20.AI, SailPoint, AIR MILES, Spark NZ, and Eutelsat OneWeb. These organizations and many more are using Snowpark Container Services capabilities to easily and securely deploy everything from custom front-ends and large-scale ML training and inference to open source and homegrown models, all securely within Snowflake.

New with Confluent Platform: Enhanced security with OAuth Support, Confluent Platform for Apache Flink® (LA), a new Connector, and More

Confluent

Confluent Platform 7.

121

How to Perform Memory-Efficient Operations on Large Datasets with Pandas

KDnuggets

Let's learn how to perform memory-efficient operations in pandas with large datasets.

Datasets 137

Lakehouse Monitoring GA: Profiling, Diagnosing, and Enforcing Data Quality with Intelligence

databricks

At Data and AI Summit, we announced the general availability of Databricks Lakehouse Monitoring. Our unified approach to monitoring data and AI.

Data 128

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

ArcGIS Solutions introduces Essential Data Models to Utility Network Foundation solutions

ArcGIS

Essential Data Models in the Utility Network Foundations

Utilities 121

Snowflake Invests in Contextual AI to Make It Easier for Enterprises to Deploy RAG Applications in the AI Data Cloud

Snowflake

Retrieval Augmented Generation (RAG) allows enterprises to ground responses from Large Language Models in their specific organization’s data. This helps ensure that AI-powered applications provide responses that are not only accurate, relevant, and consistent, but also aligned with business needs. At Snowflake, we make it simple for our customers to implement RAG, while also enabling the strict governance and privacy controls that businesses require.

Cloud 115

Organize, Search, and Back Up Files with Python’s Pathlib

KDnuggets

This tutorial will teach you how to simplify your file management tasks, from organization to backup, using Python’s pathlib module.
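
To give a feel for the three tasks named in the title (organize, search, back up), here is a minimal sketch that is not taken from the linked tutorial. The helper organize_by_extension, the folder names, and the sample files are hypothetical, and everything runs inside a temporary directory so no real files are touched. It uses only pathlib plus the standard-library shutil and tempfile modules.

```python
import shutil
import tempfile
from pathlib import Path


def organize_by_extension(folder: Path) -> dict[str, list[str]]:
    """Hypothetical helper: move each file into a subfolder named after its extension."""
    moved: dict[str, list[str]] = {}
    # sorted() materializes the listing first, so folders created below
    # are not picked up by the loop.
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lstrip(".").lower() or "no_extension"
        target_dir = folder / ext
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(target_dir / path.name))
        moved.setdefault(ext, []).append(path.name)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample files.
    root = Path(tmp) / "inbox"
    root.mkdir()
    for name in ["notes.txt", "todo.txt", "data.csv"]:
        (root / name).write_text("example")

    # Organize: one subfolder per extension.
    moved = organize_by_extension(root)

    # Search: rglob walks the tree recursively.
    found = sorted(p.name for p in root.rglob("*.txt"))

    # Back up: copy the whole tree to a sibling folder.
    backup = Path(tmp) / "inbox_backup"
    shutil.copytree(root, backup)
    backed_up = sorted(
        p.relative_to(backup).as_posix() for p in backup.rglob("*") if p.is_file()
    )

print(moved)
print(found)
print(backed_up)
```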

OKR-Centric Delivery Models for Engineering-Focused Enterprises

databricks

An organization adopting new technologies or on a modernization journey typically focuses on upcoming tools, their features and potential performance/cost improvements under.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Data Engineering Weekly #182

Data Engineering Weekly

Meta: Introducing Llama 3.1: Our most capable models to date. Probably one of the hottest announcements this week is the Llama 3.1 release - the first-ever open-sourced frontier AI model competitive with leading foundation models across a range of tasks, including GPT-4, GPT-4o, and Claude 3.5 Sonnet. The Llama3 herd of models is an insightful paper that helps one deeply understand the foundational model.

Beyond Web Mercator: Projected Basemaps Revisited

ArcGIS

More small-scale projected basemaps to add to the set I built in 2023

Project 104

6 ChatGPT Prompts to Enhance your Productivity at Work

KDnuggets

Unlock your potential with these 6 crafted ChatGPT prompts designed to boost your productivity and streamline your operational workflows.

Designing 134

Responsible AI with the Databricks Data Intelligence Platform

databricks

The transformative potential of artificial intelligence (AI) is undeniable. From productivity efficiency, to cost savings, and improved decision-making across all industries, AI is.

Data 116

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Daft: Distributed Dataframes with Python.

Confessions of a Data Guy

Python 100

Accelerating Academic Medical Research with an AI-Driven Data Strategy

Snowflake

Academic medical centers (AMCs) are a critical keystone of healthcare systems worldwide. They serve as major hubs of medical research, pioneering new treatments that advance and set the standard of care throughout medicine. They also educate and train the next generation of healthcare professionals, ensuring that the medical field continues to advance.

Medical 98

How to Use MultiIndex for Hierarchical Data Organization in Pandas

KDnuggets

Let's learn how to use a pandas MultiIndex for hierarchical data operations.

Data 119

Generative AI for Capital Markets

databricks

Financial Valuations & Comparative Analysis: Financial institutions specializing in capital markets, such as hedge funds, market makers and pension funds, have long been.

103

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

CI/CD for Data Engineers.

Confessions of a Data Guy

The 6 Data Quality Dimensions with Examples

Monte Carlo

It’s clear that data quality is becoming more of a focus for more data teams. So why are there still so many questions like these: A quick search on subreddits for data engineers, data analysts, data scientists, and more can yield a plethora of users seeking data quality advice. And while the comment below may seem like the accepted way of doing data quality management… … there’s actually a much better way.

How to Perform Matrix Operations with NumPy

KDnuggets

Learn how to perform several of the most basic matrix operations with NumPy.
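
For orientation, a few basic matrix operations in NumPy look roughly like the sketch below. The matrices A and B are made-up examples, and the selection of operations is an assumption about what "most basic" covers rather than a summary of the linked article.

```python
import numpy as np

# Two small example matrices (made-up values).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])

total = A + B            # element-wise addition
product = A @ B          # matrix multiplication (same as np.matmul(A, B))
transposed = A.T         # transpose
det = np.linalg.det(A)   # determinant
inv = np.linalg.inv(A)   # inverse (A must be square and non-singular)

# Multiplying a matrix by its inverse should give the identity,
# up to floating-point rounding.
identity_check = np.allclose(A @ inv, np.eye(2))

print(product)
print(det)
print(identity_check)
```

Note that `*` on NumPy arrays multiplies element by element; `@` is the operator for the matrix product.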

Python 114

Democratizing Data Sharing: A Platform-Agnostic Approach

databricks

Companies across all industries want to share data with each other to enable collaboration and accelerate innovation. However, these organizations often use different.

Data 94

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m