Sat, Aug 10, 2024 - Fri, Aug 16, 2024

Data Engineering Interview Series #1: Data Structures and Algorithms

Start Data Engineering

1. Introduction
2. Data structures and algorithms to know
   2.1. List
   2.2. Dictionary
   2.3. Queue
   2.4. Stack
   2.5. Set
   2.6. Counter (from collections module)
   2.7. Heap
   2.8. Graph search
      2.8.1. Depth First Search (DFS)
      2.8.2. Breadth First Search (BFS)
   2.9. Binary Search
3. Common DSA questions asked during DE interviews
   3.1. Intervals
   3.
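
The article's own code is not reproduced in this excerpt. As a minimal sketch, assuming a graph stored as a plain adjacency dictionary (the `pipeline` graph and the `bfs`/`dfs` function names are hypothetical), BFS and DFS differ mainly in whether pending nodes wait in a FIFO queue (`collections.deque`) or a LIFO stack (a Python list):

```python
from collections import deque


def bfs(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first search: visit nodes level by level using a FIFO queue."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()  # take the oldest pending node
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)  # mark on enqueue so nodes are queued once
                queue.append(neighbor)
    return order


def dfs(graph: dict[str, list[str]], start: str) -> list[str]:
    """Depth-first search: follow one branch as deep as possible using a LIFO stack."""
    visited = set()
    order = []
    stack = [start]
    while stack:
        node = stack.pop()  # take the newest pending node
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbors in reverse so they are popped in their listed order.
        for neighbor in reversed(graph.get(node, [])):
            if neighbor not in visited:
                stack.append(neighbor)
    return order


# Hypothetical task-dependency graph: each step maps to its downstream steps.
pipeline = {
    "extract": ["clean", "validate"],
    "clean": ["load"],
    "validate": ["load"],
    "load": [],
}

print(bfs(pipeline, "extract"))  # ['extract', 'clean', 'validate', 'load']
print(dfs(pipeline, "extract"))  # ['extract', 'clean', 'load', 'validate']
```

BFS finishes every step one hop from `extract` before going deeper, while DFS reaches `load` through `clean` before it backtracks to `validate`. Both run in O(V + E) time.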

10 Python Libraries Every Data Scientist Should Know

KDnuggets

Want to take the next step in your journey to becoming a data scientist? Check out these Python libraries for data science that you can't do without.

Trending Sources

Long Context RAG Performance of LLMs

databricks

Retrieval Augmented Generation (RAG) is the most widely adopted generative AI use case among our customers. RAG enhances the accuracy of LLMs by.

Speakers for Amsterdam / Netherlands Tech Events

The Pragmatic Engineer

I (Gergely) sometimes get requests to give talks at events in Amsterdam (where I am based), the Netherlands, or elsewhere in Europe. Unfortunately, I rarely do talks; I do one conference per year. However, I asked around in the community about tech professionals who give paid talks that software engineers find interesting, engaging, and educational.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized debugging process to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

A Melange of Maps

ArcGIS

Different thematic map types are better at supporting some questions than others. Here are a range of alternative approaches.

Beginner’s Guide to Careers in AI and Machine Learning

KDnuggets

The growing complexity of AI and ML is creating a larger and more diverse set of jobs that require AI and ML expertise. We’ll give you a rundown of these jobs, covering the technical skills they need and the tools they employ.

More Trending

Make a vintage basemap in ArcGIS Pro with some Living Atlas shenanigans

ArcGIS

How to combine Living Atlas layers into a plausibly 1890s style ArcGIS Pro basemap. And thoughts on time travel.

Top 5 Free Resources for Learning Advanced SQL Techniques

KDnuggets

Today, we’re looking at five quality resources that will teach you advanced SQL for free.

Announcing the Generative AI World Cup 2024: A Global Hackathon by Databricks

databricks

Welcome to the Generative AI World Cup 2024, a global hackathon inviting participants to develop innovative Generative AI applications that solve real-world.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Robinhood Welcomes Jeff Pinner as Chief Technology Officer

Robinhood

We are thrilled to announce that Jeff Pinner has joined Robinhood as Chief Technology Officer (CTO). Jeff is a deeply respected innovator who will help us accelerate our product roadmap, scale our infrastructure, enhance customer experiences, and drive operational efficiencies. “Robinhood is an engineering company at the forefront of so many pivotal transformations in financial services, including leveraging state of the art AI,” said Jeff Pinner.

How to add 2D features to a 3D scene

ArcGIS

With the growing popularity of 3D GIS, users are shifting from 2D to 3D. What is the proper method to move pre-existing 2D data onto a 3D scene?

5 Tools for Automating Data Cleaning Processes

KDnuggets

Struggling with time-consuming data cleaning tasks? Discover five tools that can automate and simplify the process.

Databricks SQL Serverless is now available on Google Cloud Platform

databricks

Databricks SQL Serverless is now Generally Available on Google Cloud Platform (GCP)! SQL Serverless is available in 7 GCP regions and 40+ regions across AWS, Azure and GCP.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Unapologetically Technical Episode 13 – Jeff Chou

Jesse Anderson

Unapologetically Technical’s newest episode is now live! In this episode of Unapologetically Technical, I interview Jeff Chou, CEO and co-founder of Sync Computing. Jeff, who holds a PhD from UC Berkeley and a postdoc from MIT, shares his unique journey from academia to startup life, and how his experience with simulations shaped the vision for Sync Computing.

Calculate the travel time or distance between paired origins and destinations

ArcGIS

Use the ArcGIS Network Analyst route solver and out-of-the-box tools to calculate travel time and distance between origin-destination pairs.

The Only Prompting Framework for Every Use

KDnuggets

This prompt engineering framework significantly enhances your interactions with AI systems.

Beyond the Leaderboard: Unpacking Function Calling Evaluation

databricks

The research and engineering community at large has been continuously iterating on Large Language Models (LLMs) in order to make them.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

How Meta animates AI-generated images at scale

Engineering at Meta

We launched Meta AI with the goal of giving people new ways to be more productive and unlock their creativity with generative AI (GenAI). But GenAI also comes with challenges of scale. As we deploy new GenAI technologies at Meta, we also focus on delivering these services to people as quickly and efficiently as possible. Meta AI’s animate feature, which lets people generate a short animation of a generated image, carried unique challenges in this regard.

DoorDash Empowers Engineers with Kafka Self-Serve

DoorDash Engineering

DoorDash is supporting an increasingly diverse array of infrastructure use cases as the company matures. To maintain our development velocity and meet growing demands, we are transitioning toward making our stateful storage offerings more self-serve. This journey began with Kafka, one of our most critical and widely used infrastructure components. Kafka is a distributed event streaming platform that DoorDash uses to handle billions of real-time events.

10 Python Statistical Functions

KDnuggets

This guide will go over 10 essential statistical functions in Python using commonly-used libraries.

An Introduction to Time Series Forecasting with Generative AI

databricks

Time series forecasting has been a cornerstone of enterprise resource planning for decades. Predictions.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Current 2024: What’s on Tap in Data Streaming

Confluent

Current 2024 brings 100+ sessions, keynotes, lightning talks, and more from industry leaders. Check out the agenda, highlights, networking events, and more event info.

How to make this thematic map, and stuff

ArcGIS

Some things to consider when making a thematic map.

Tools Every AI Engineer Should Know: A Practical Guide

KDnuggets

Explore essential tools and skills for AI engineers: Python, R, big data frameworks, and cloud services for building and optimizing AI systems.

Building a robust data stewardship tool in life sciences

databricks

This blog was written in collaboration with Gordon Strodel, Director, Data Strategy & Analytics Capability, in addition to Abhinav Batra, Associate Principal, Enterprise.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

We Built an Open-Source Data Quality Testframework for PySpark

Towards Data Science

Measure and report your data quality with ease.

Navigating the Future with Cloudera’s Updated Interface

Cloudera

Data practitioners are consistently asked to deliver more with less, and although most executives recognize the value of innovating with data, the reality is that most data teams spend the majority of their time responding to support tickets for data access, performance and troubleshooting, and other mundane activities. At the heart of this backlog of requests is this: data is hard to work with, and it’s made even harder when users need to work to get or find what they need.

Speeding Up Your Python Code with NumPy

KDnuggets

Why NumPy code runs significantly faster than equivalent standard Python code.

Databricks University Alliance Crosses 1,000 University Threshold

databricks

Databricks is thrilled to share that our University Alliance has welcomed its one-thousandth-member school! This milestone is a testament to our mission to.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m