Fri.Oct 25, 2024

article thumbnail

10 Essential Python Libraries for Data Science in 2024

KDnuggets

The richness of Python’s ecosystem has one downside: it makes it difficult to decide which libraries are the best for your needs. This article is an attempt to amend this by suggesting ten (and some more, as a bonus) libraries that are an absolute must in data science.

article thumbnail

Tales from the Pipeline: 4 Data Horror Stories To Keep You Up at Night

Monte Carlo

“As he lay awake in his Bay Area apartment, the data leader couldn’t shake the feeling that something wasn’t right. He tried to shut his eyes—to force them closed—but the more the data engineer tried, the more convinced he became. Suddenly, a light appeared from the darkness. It was a Slack from the CEO. She was working late. And the data…it couldn’t be…it looked wrong.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Building Interactive Data Science Applications with Python

KDnuggets

Using Python to build engaging and interactive applications where users can pass in an input, get and feedback and make use of multimedia elements such as images, videos, and audio.

Python 132
article thumbnail

Unlocking FHIR for Data and AI in a Meaningful Way

databricks

Discover how the Databricks and XponentL partnership is allowing customers to unlock their FHIR needs. Learn more about dbignite. Imagine you’re feeling.

Data 96
article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

Diff Authoring Time: Measuring developer productivity at Meta

Engineering at Meta

At Meta, we’re always looking for ways to enhance the productivity of our engineers and developers. But how exactly do you measure developer productivity? On this episode of the Meta Tech Podcast Pascal Hartig ( @passy ) sits down with Sarita and Moritz , two engineers at Meta who have been working on Diff Authoring Time (DAT) – a method for measuring how long it takes to submit changes to a codebase.

article thumbnail

The Curse of Conway and the Data Space

Towards Data Science

How modern trends can be traced back to Conway’s Law Image by the author. (Generated by Midjourney, touched up with Krita) This article was originally posted on my blog [link]. The article was triggered by and riffs on the “Beware of silo specialisation” section of Bernd Wessely’s post Data Architecture: Lessons Learned. It brings together a few trends I am seeing plus my own opinions after twenty years experience working on both sides of the software / data team divide.

article thumbnail

The Smart Approach to ETL Monitoring

Monte Carlo

We’re the middle children of the data revolution, born into systems promised to be ‘set it and forget it,’ taught to believe that our pipelines would run forever. They won’t. The first rule of data pipelines is: they will break. The second rule of data pipelines is: THEY WILL BREAK. You could spend your nights staring at broken dashboards… or you can put in place an ETL monitoring strategy and avoid those everything-is-broken moments at three in the morning.