Sat.Aug 03, 2024 - Fri.Aug 09, 2024

Neo4j vs. Amazon Neptune: Graph Databases in Data Engineering

Analytics Vidhya

Managing complicated, interrelated information is more important than ever in today’s data-driven society. Traditional databases, while still valuable, often falter when it comes to handling highly connected data. Enter the unsung heroes of the data world: graph databases. These powerful tools are designed to manage and query intricate data relationships effortlessly.

Optimizing Your LLM for Performance and Scalability

KDnuggets

Optimize LLM performance and scalability using techniques like prompt engineering, retrieval augmentation, fine-tuning, model pruning, quantization, distillation, load balancing, sharding, and caching.

Databricks Clean Rooms for privacy-safe collaboration is in Public Preview

databricks

Fueled by the exponential growth in external data and AI for innovation, organizations across all industries are looking for effective ways to collaborate.

A RoCE network for distributed AI training at scale

Engineering at Meta

AI networks play an important role in interconnecting tens of thousands of GPUs together, forming the foundational infrastructure for training, enabling large models with hundreds of billions of parameters such as LLAMA 3.1 405B. This week at ACM SIGCOMM 2024 in Sydney, Australia, we are sharing details on the network we have built at Meta over the past few years to support our large-scale distributed AI training workload.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Airflow Alternatives for Data Orchestration

Analytics Vidhya

Apache Airflow is a crucial component in data orchestration and is known for its capability to handle intricate workflows and automate data pipelines. Many organizations have chosen it for its flexibility and strong scheduling capabilities. Yet, as data requirements change, Airflow’s lack of scalability, real-time processing capabilities, and setup complexity may lead to […]

10 GitHub Repositories to Master Statistics

KDnuggets

Learn statistics through interactive books, code examples, cheat sheets, guides, and tools documentation.

More Trending

Essential Skills for Data Engineers in the Age of AI

Seattle Data Guy

If you work in data, then AI is everywhere at this point. But whether AI is hype or reality doesn’t change the fact that data engineers will play a major role in ensuring that the data sets used for the growing number of use cases are usable by both machines and humans. Whether that data…

Reimagine Your GIS: From ArcMap to ArcGIS Pro and User Types

ArcGIS

Explore how moving from ArcMap to ArcGIS Pro and user types can make GIS workflows better, improve collaboration, and make big changes within your organization.

3 Ways of Building Python Projects using GPT-4o

KDnuggets

Learn about essential AI tools that can help you develop Python projects faster and with fewer bugs using natural language.

Evaluating Change Data Capture Tools: A Comprehensive Guide

Data Engineering Weekly

TL;DR Aswin and I are thrilled to announce the release of the first version of our comprehensive guide for evaluating Change Data Capture. CDC Evaluation Guide Google Sheet Link: [link] CDC Evaluation Guide Github Link: [link] Change Data Capture (CDC) is a powerful technology in data engineering that allows for continuously capturing changes (inserts, updates, and deletes) made to source systems.
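
As a rough illustration of the idea in this teaser (not taken from the guide itself), the sketch below replays a stream of captured changes against an in-memory copy of a table. The `ChangeEvent` class and `apply_changes` function are hypothetical names invented for this example; real CDC tools define their own event formats.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record: one insert, update, or delete on a source row."""
    op: str                               # "insert", "update", or "delete"
    key: int                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row contents; None for deletes


def apply_changes(
    replica: Dict[int, Dict[str, Any]], events: List[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Apply events in order so the replica converges on the source state."""
    for event in events:
        if event.op in ("insert", "update"):
            # Both operations leave the replica holding the latest row image.
            replica[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # Ignore deletes for keys the replica never saw.
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return replica


events = [
    ChangeEvent("insert", 1, {"name": "alice"}),
    ChangeEvent("update", 1, {"name": "alicia"}),
    ChangeEvent("insert", 2, {"name": "bob"}),
    ChangeEvent("delete", 2),
]
replica = apply_changes({}, events)
print(replica)
```

Ordering matters here: replaying the same events in a different order could leave the replica in a different state, which is one reason delivery guarantees are a common evaluation criterion for CDC tools.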

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Announcing the General Availability of Row and Column Level Security with Databricks Unity Catalog

databricks

Row filters and column masks control data access by filtering rows and masking column values using SQL UDFs in database queries.
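
The two mechanisms named in this teaser can be pictured with a small plain-Python analogy. This is not Unity Catalog code and uses no Databricks API; `apply_policies` and `mask_ssn` are hypothetical helpers, and in the real feature the predicate and the mask would be SQL UDFs attached to a table.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def mask_ssn(value: object) -> str:
    """Hypothetical column mask: keep only the last four characters visible."""
    return "***-**-" + str(value)[-4:]


def apply_policies(
    rows: List[Row],
    row_filter: Callable[[Row], bool],
    column_masks: Dict[str, Callable[[object], object]],
) -> List[Row]:
    """Return only rows passing the filter, with masked columns rewritten."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # row filter: the caller never sees this row
        visible.append({
            # column mask: rewrite the value if a mask exists for the column
            col: column_masks[col](val) if col in column_masks else val
            for col, val in row.items()
        })
    return visible


rows = [
    {"region": "EU", "ssn": "123-45-6789"},
    {"region": "US", "ssn": "987-65-4321"},
]
print(apply_policies(rows, lambda r: r["region"] == "US", {"ssn": mask_ssn}))
```

The filter decides which rows are returned at all, while the mask changes what a returned value looks like; the two are independent and can be combined on the same table.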

How to Solve Data Engineering Problems

Confessions of a Data Guy

One thing I find myself doing these days (I am unsure how I feel about this) is teaching others to solve problems … Data Engineering problems, to be specific. It’s not a hard stretch for most to imagine that what a person does at Senior+ software-type levels is just write good code all day. I […]

NumPy for Image Processing

KDnuggets

Start your journey into image processing with NumPy by learning how to import libraries, crop images, rotate and flip images, and more.

Snowflake Startup Spotlight: BigGeo Puts Geospatial Intelligence on the Map

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we learn about companies building their businesses on Snowflake. In this edition, we talk to Brent Lane, Co-founder and CEO of BigGeo, about the world of geospatial data and learn how BigGeo is turning 15 years of research into advanced technology that knocks down traditional barriers to using rich, complex location-based data throughout an organization.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Harnessing the Power of Databricks Mosaic AI for Image Generation at Rolls-Royce

databricks

Rolls-Royce has witnessed the transformative power of the Databricks Data Intelligence Platform in various AI projects. One example is a collaboration between Rolls-Royce.

DCPerf: An open source benchmark suite for hyperscale compute applications

Engineering at Meta

We are open-sourcing DCPerf, a collection of benchmarks that represents the diverse categories of workloads that run in data center cloud deployments. We hope that DCPerf can be used more broadly by academia, the hardware industry, and internet companies to design and evaluate future products. DCPerf is available now on GitHub. Hyperscale and cloud datacenter deployments constitute the largest market share of server deployments in the world today.

Time Series Data with NumPy

KDnuggets

Learn how to analyze a time series dataset with the Python package NumPy.

#ClouderaLife Employee Spotlight: Stephanie Han

Cloudera

In this Employee Spotlight, we sat down with Stephanie Han to learn about her tenure at Cloudera, her journey from accounting to leading diversity, equality & inclusion (DEI) programs, and her impressive volunteer work. Meet Stephanie Han: Stephanie is a Senior Program Manager in the HR team at Cloudera. She’s been with the company since 2019 and plays a key role in a variety of employee-centric initiatives including Cloudera’s employee volunteering program, talent management program, a

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Elevating Data Intelligence: Key Insights from Industry Leaders on Data and AI

databricks

In today's rapidly evolving technological landscape, the intersection of data and artificial intelligence (AI) has become a critical focus for organizations across industries.

Robinhood Reports Second Quarter 2024 Results

Robinhood

Robinhood Markets, Inc. (Nasdaq: HOOD) today reported financial results for the quarter ended June 30, 2024. Read our Q2 2024 earnings press release here. Access more information at investors.robinhood.com.

Tick-Tock: Using Pendulum For Easy Date And Time Management In Python

KDnuggets

Explore Python's Pendulum library for simplified date & time handling and timezone management.

Continued Investments in Price Performance and Faster Top-K Queries

Snowflake

The Snowflake AI Data Cloud is an end-to-end platform that supports all types of data, compute, use cases and personas across an entire organization. By delivering a single, unified platform for all users, it is no surprise that organizations continue to expand their use cases on Snowflake. And therefore, it is extremely important for us to reaffirm our commitment to price-performant queries for our customers on a consistent basis.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Data Engineering Weekly #183

Data Engineering Weekly

Try Fully Managed Apache Airflow for FREE Run Airflow without the hassle and management complexity. Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. For a limited time, new sign-ups will receive a complimentary Airflow Fundamentals Certification exam (normally $150).

Podcast: DataOps, Observability, and The Cure for Data Team Blues on DataTalks.Club

DataKitchen

7 AI Portfolio Projects to Boost the Resume

KDnuggets

Get noticed by recruiters and hiring managers by creating and documenting the following AI projects.

Agile vs DevOps: What are the Top Differences?

Knowledge Hut

When speaking of software development, Agile and DevOps are two methodologies that are worth mentioning. Both of these methodologies aid efficient and quick software development. Although companies are embracing both, there is a lot of confusion about which of the two can deliver the best results.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Introducing the New Confluent Cloud Homepage UI: Enhancing User Experience

Confluent

The new Confluent Cloud Homepage UI adds many features including Clusters/Topics modals, health indicators, favorites, recently visited, recommended actions, & more.

PySpark Explained: Delta Tables

Towards Data Science

Learn how to use the building blocks of Delta Lakes.

5 Python Tips for Data Efficiency and Speed

KDnuggets

Want to write better Python code? Get one step closer with this tutorial on writing maintainable, faster, and memory-efficient Python code.

Beyond the Hype: Is observability just the new name for system monitoring? by Oliver Cronk

Scott Logic

In this episode, I’m joined for a discussion on observability by Scott Rowan, Senior Developer at Scott Logic, and Daniel Gomez Blanco, Principal Engineer at Skyscanner and a member of the OpenTelemetry Governance Committee. The conversation explores what observability means in modern distributed software architectures, how it differs from traditional monitoring, and the challenges of implementing observability at scale.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.