Sat. Sep 23, 2023 - Fri. Sep 29, 2023

Why are Cloud Development Environments Spiking in Popularity, Now?

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover a fresh industry trend: Cloud Development Environments. Full subscribers received this analysis three weeks ago.

Cloud 316

5 Free Books to Help You Master Python

KDnuggets

From the basics of Python to clean architecture and more, here are five free books to level up your Python skills.

Python 157

DuckDB + Delta Lake (the new lake house?)

Confessions of a Data Guy

I always leave it to my dear readers and followers to give me pokes in the right direction. Nothing like the teeming masses to set you straight. Recently I was working on my Substack Newsletter, on the topic of Polars + Delta Lake, reading remote files from s3 … I left a question open on […]

Data 147

Arbitrary stateful processing in PySpark with applyInPandasWithState

Waitingforcode

It's always a huge pleasure to see the PySpark API covering more and more Scala API features. Starting from Apache Spark 3.4.0 you can even write arbitrary stateful processing jobs! But since the API is a little bit different than the one available on the Scala side, I wanted to take a deeper look.

Process 147

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Working at a Startup vs in Big Tech

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Pulse issue. To get full newsletters twice a week, subscribe here. Willem Spruijt is a software engineer with whom I worked on the same team at Uber in Amsterdam, building payments systems.

Top 7 Free Cloud Notebooks for Data Science

KDnuggets

Cloud notebooks are game-changers for data science, providing free access to computing, pre-built environments, collaboration features, and third-party integrations - everything you need to enhance your workflow.

More Trending

Deploy Private LLMs using Databricks Model Serving

databricks

We are excited to announce public preview of GPU and LLM optimization support for Databricks Model Serving! With this launch, you can deploy.

Getting started with Airflow in 10 mins

Marc Lamberti

At the end of this introduction to Airflow, you will be all set for getting started with Airflow. You will start with the basics, such as what Airflow is and the essential concepts. Then you will set up and run your local development environment using the Astro CLI to create your first data pipeline. I hope you’re getting excited. Fasten your seatbelt, take a deep breath, and let’s go! For a complete hands-on introduction to Apache Airflow, here is a 6-hour course at a discount.

Introduction to Deep Learning Libraries: PyTorch and Lightning AI

KDnuggets

Simple explanation of PyTorch and Lightning AI.

Data News — Week 23.38 (late)

Christophe Blefari

Early like my run (credits). Hey. This is a super late Data News, I wanted to send it earlier but I was travelling then enjoying time with friends and family. I'm still struggling a bit to write as fast as I would like, but 🤷‍♂️ So, sorry for the late edition and enjoy. Gen AI 🤖 Announcing Microsoft Copilot: Having everything under a common brand is great and Copilot is a great name.

Data 130

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Announcing the Public Preview of Lakeview Dashboards!

databricks

We are excited to announce the public preview of the next generation of Databricks SQL dashboards, dubbed Lakeview dashboards. Available today, this new.

SQL 126

Airflow TaskGroup: All you need to know!

Marc Lamberti

An Airflow TaskGroup helps make a complex DAG easier to organize and read. Airflow TaskGroups are meant to replace SubDAGs, the historical way of grouping your tasks. Indeed, SubDAGs are too complicated for merely grouping tasks. They bring a lot of complexity as you must create a DAG in a DAG, import the SubDagOperator (which is a sensor), define the parameters correctly, and so on.

Coding 130

Deploying Your First Machine Learning Model

KDnuggets

With just 3 simple steps, you can build & deploy a glass classification model faster than you can say... glass classification model!

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

easyJet bets on Databricks Lakehouse and Generative AI to be an Innovation Leader in Aviation

databricks

This blog is authored by Ben Dias, Director of Data Science and Analytics and Ioannis Mesionis, Lead Data Scientist at easyJet Introduction to.

Old School: Adapting Esri Basemaps for Printed Products

ArcGIS

Esri basemaps are designed to be used at multiple scales, but a static map needs everything in one view. How do we get around that?

Designing 123

Generative Agent Research Papers You Should Read

KDnuggets

Research papers in this exciting field that you don’t want to miss.

146

Lessons from debugging a tricky direct memory leak

Pinterest Engineering

Sanchay Javeria | Software Engineer, Ads Data Infrastructure

To support metrics reporting for ads from external advertisers and real-time ad budget calculations at Pinterest, we run streaming pipelines using Apache Flink. These jobs have guaranteed an overall 99th percentile availability to our users; however, every once in a while some tasks get hit with nasty direct out-of-memory (OOM) errors on multiple operators that look something like this: As is the case with most failures in a distribute

Utilities 115

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

How Snowflake Native Apps Deliver Security for App Builders and Consumers

Snowflake

The Snowflake Native App Framework, which leverages Snowflake’s advanced architecture, allows for a new level of security for applications. This security spans not just the application consumer, but also the application providers. Controlling all software and infrastructure in the Snowflake Data Cloud, Snowflake can secure the application code to protect the intellectual property (IP) of builders.

Python 115

Working with Esri Vector Basemaps in ArcGIS Pro

ArcGIS

Esri Vector Basemaps are available for use in ArcGIS Pro, and that opens up some new possibilities for you.

Designing 117

A Comparative Overview of the Top 10 Open Source Data Science Tools in 2023

KDnuggets

Are you looking for the open source tools to help you in your data science journey? Look no further. Discover these game-changers that will elevate your data-driven decisions.

Ballard Power Systems RDU (Remote Diagnostics Unit) Visualization Platform for Interactive At-Scale Industrial IoT Streaming Analytics

databricks

This article represents a collaborative effort between Plotly, Ballard Power Systems, and Databricks. Fleets of buses worldwide run on hydrogen fuel cells made.

Systems 113

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Marketing Success in the Age of AI: Celebrating EMEA’s Modern Marketing Data Stack Pioneers

Snowflake

Data is an invaluable asset in today’s marketing ecosystem. With its unique blend of cultures, economies, and regulatory environments, the EMEA market offers a nuanced picture of how marketers harness data technologies to understand their audiences, calibrate campaigns in real time, and adhere to complex government and industry regulations. In our second annual Snowflake Modern Marketing Data Stack 2023 report, we delve into actual usage and adoption of marketing technologies within the Snowfla

Training Foundation Improvements for Closeup Recommendation Ranker

Pinterest Engineering

Fan Jiang | Software Engineer, Closeup Candidate Retrieval; Liyao Lu | Software Engineer, Closeup Ranking & Blending; Laksh Bhasin | Software Engineer, Core ML Foundations; Chen Yang | Software Engineer, Core ML Foundations; Shivin Thukral | Software Engineer, Closeup Ranking & Blending; Travis Ebesu | Software Engineer, Closeup Ranking & Blending; Kent Jiang | Software Engineer, Core Serving Infra; Yan Sun | Engineering Manager, Closeup Ranking & Blending; Huizhong Duan | Engine

The Data Maturity Pyramid: From Reporting to a Proactive Intelligent Data Platform

KDnuggets

This article describes the data maturity pyramid and its various levels, from simple reporting to AI-ready data platforms. It emphasizes the importance of data for business and illustrates how data platforms serve as the driving force behind AI.

Data 145

Using Images and Metadata for Product Fuzzy Matching with Zingg

databricks

Product matching is an essential function in many retail and consumer goods organizations. Incoming products are compared to items in the existing product.

Metadata 110

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Molex Improves Data Sharing, Visibility, and Performance with the Snowflake Manufacturing Data Cloud

Snowflake

The unprecedented amount of volatility in supply and demand has caused economic uncertainty throughout the world. To navigate today’s challenging economy, manufacturers must digitize their supply chain and manufacturing processes. Digital advancements such as smart manufacturing and automation through AI, machine learning (ML), robotics, and IoT require a connected value chain ecosystem with a secure, scalable, and flexible data platform.

Fueling Data-Driven Decision-Making with Data Validation and Enrichment Processes

Precisely

77% of data and analytics professionals say data-driven decision-making is the top goal for their data programs. Data-driven decision-making and initiatives are certainly in demand, but their success hinges on … well, the data that supports them. More specifically, the quality and integrity of that data. It seems obvious enough, but checking that your data is up to the task and taking any necessary steps to improve and maintain its quality can be easier said than done.

Building a Convolutional Neural Network with PyTorch

KDnuggets

This blog post provides a tutorial on constructing a convolutional neural network for image classification in PyTorch, leveraging convolutional and pooling layers for feature extraction as well as fully-connected layers for prediction.

Building 144

Governing cybersecurity data across multiple clouds and regions using Unity Catalog & Delta Sharing

databricks

According to a 2023 report from Enterprise Search Group, 85% of organizations indicated they deploy applications on two or more IaaS providers, attesting.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m