August 2024

Neo4j vs. Amazon Neptune: Graph Databases in Data Engineering

Analytics Vidhya

Managing complicated, interrelated information is more important than ever in today’s data-driven society. Traditional databases, while still valuable, often falter when it comes to handling highly connected data. Enter the unsung heroes of the data world: graph databases. These powerful tools are designed to store and query intricate data relationships efficiently.

Data Engineering Interview Series #1: Data Structures and Algorithms

Start Data Engineering

1. Introduction
2. Data structures and algorithms to know
   2.1. List
   2.2. Dictionary
   2.3. Queue
   2.4. Stack
   2.5. Set
   2.6. Counter (from collections module)
   2.7. Heap
   2.8. Graph search
      2.8.1. Depth First Search (DFS)
      2.8.2. Breadth First Search (BFS)
   2.9. Binary Search
3. Common DSA questions asked during DE interviews
   3.1. Intervals
   …

Optimizing Your LLM for Performance and Scalability

KDnuggets

Optimize LLM performance and scalability using techniques like prompt engineering, retrieval augmentation, fine-tuning, model pruning, quantization, distillation, load balancing, sharding, and caching.

Apache Spark’s Most Annoying Use Case

Confessions of a Data Guy

I still remember the good ole days when Apache Spark was fresh and hot and hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large […]

Apache Airflow® 101: Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

Data Teams Survey 2024 Results

Jesse Anderson

In the spring of 2024, I ran a new survey to gather more data for my Data Teams book and update my 2023 and 2020 surveys. In total, we had 81 respondents. This survey was designed to get information about how management uses data teams, the value they’re creating, and how they’re creating it. The survey asked about the best and worst practices that teams are using or experiencing.

Airflow Alternatives for Data Orchestration

Analytics Vidhya

Apache Airflow is a crucial component in data orchestration and is known for its ability to handle intricate workflows and automate data pipelines. Many organizations have chosen it for its flexibility and strong scheduling capabilities. Yet, as data requirements change, Airflow’s lack of scalability and real-time processing capabilities, along with its setup complexity, may lead to […]

Long Context RAG Performance of LLMs

databricks

Retrieval Augmented Generation (RAG) is the most widely adopted generative AI use case among our customers. RAG enhances the accuracy of LLMs by …

10 Python Libraries Every Data Scientist Should Know

KDnuggets

Want to take the next step in your journey to becoming a data scientist? Check out these Python libraries for data science that you can't do without.

Speakers for Amsterdam / Netherlands Tech Events

The Pragmatic Engineer

I (Gergely) sometimes get requests to give talks at events in Amsterdam (where I am based), elsewhere in the Netherlands, or somewhere in Europe. Unfortunately, I rarely do talks; I do one conference per year. However, I asked around in the community about tech professionals who give paid talks that software engineers find interesting, engaging, and educational.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs, forming the foundational infrastructure for training large models with hundreds of billions of parameters, such as Llama 3.1 405B. This week at ACM SIGCOMM 2024 in Sydney, Australia, we are sharing details on the network we have built at Meta over the past few years to support our large-scale distributed AI training workloads.

North Arrow Necessity

ArcGIS

Does your map need a north arrow? It depends.

Data News — Week 24.34

Christophe Blefari

News again. It's been 3 weeks. Summer continues, and I hope this new edition finds you well, having had a great vacation and a nice break before getting back to business in September. Content and articles have been a little slow over the last few weeks, which is to be expected, but I feel things are going to get back to business as usual soon.

Announcing General Availability of Lakehouse Federation

databricks

Today, we are excited to announce that Lakehouse Federation in Unity Catalog is now Generally Available (GA) across AWS, Azure, and GCP! Lakehouse …

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Beginner’s Guide to Careers in AI and Machine Learning

KDnuggets

The complexity of AI and ML results in a growing number and variety of jobs that require AI and ML expertise. We’ll give you a rundown of these jobs, covering the technical skills they require and the tools they use.

DAIS 2024: Unit tests - configuration and declaration

Waitingforcode

Code organization and assertion flow are both important, but even together they can't guarantee that your colleagues will adhere to the unit tests. There are other user-facing attributes to consider as well.

A Melange of Maps

ArcGIS

Different thematic map types are better at supporting some questions than others. Here are a range of alternative approaches.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Essential Skills for Data Engineers in the Age of AI

Seattle Data Guy

If you work in data, AI is everywhere at this point. But whether AI is hype or reality, data engineers will play a major role in ensuring that the data sets used for these growing use cases are usable by both machines and humans. Whether that data…

Databricks Clean Rooms for privacy-safe collaboration is in Public Preview

databricks

Fueled by the exponential growth in external data and AI for innovation, organizations across all industries are looking for effective ways to collaborate.

10 Free Resources to Learn LLMs

KDnuggets

Learn large language models with these free resources from DeepLearning.AI, Microsoft, AWS, and more.

DAIS 2024: Orchestrating and scoping assertions in Apache Spark Structured Streaming

Waitingforcode

Testing batch jobs is not the same as testing streaming ones. Although the transformation (the WHAT from the previous article) is similar in both cases, the more complete validation tests of the job logic are not. After all, streaming jobs often build the final outcome iteratively, while batch jobs generate it in a single pass.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Evaluating Change Data Capture Tools: A Comprehensive Guide

Data Engineering Weekly

TL;DR: Aswin and I are thrilled to announce the release of the first version of our comprehensive guide for evaluating Change Data Capture tools.

CDC Evaluation Guide Google Sheet link: [link]
CDC Evaluation Guide GitHub link: [link]

Change Data Capture (CDC) is a powerful technology in data engineering that continuously captures changes (inserts, updates, and deletes) made to source systems.

Make a vintage basemap in ArcGIS Pro with some Living Atlas shenanigans

ArcGIS

How to combine Living Atlas layers into a plausibly 1890s-style ArcGIS Pro basemap. And thoughts on time travel.

How To Run A Data Team As A New Head Of Data

Seattle Data Guy

What would you do if you became the head or director of data for a 1,000-person company? Yesterday, you were plugging along as an analyst, and now, suddenly, you have all these new responsibilities. Figuring out where to start is part of the job. You’d probably feel a strong temptation to freak out. Who wouldn’t? …

Announcing Hybrid Search General Availability in Mosaic AI Vector Search

databricks

We're excited to announce the general availability of hybrid search in Mosaic AI Vector Search. Hybrid search is a powerful feature that combines …

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

3 Ways of Building Python Projects using GPT-4o

KDnuggets

Learn about essential AI tools that can help you develop Python projects faster and with fewer bugs using natural language.

Data+AI Summit 2024 - Retrospective - Apache Spark

Waitingforcode

Welcome to the second blog post dedicated to the previous Data+AI Summit. This time, I'm going to share a summary of the Apache Spark talks.

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Engineering at Meta

At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations mark a major milestone in our ongoing commitment to honoring user privacy.