Sat. Aug 24, 2024 - Fri. Aug 30, 2024

Apache Spark’s Most Annoying Use Case

Confessions of a Data Guy

I still remember the good ole days when Apache Spark was fresh and hot, hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large […]

AWS 148

Data Teams Survey 2024 Results

Jesse Anderson

In the spring of 2024, I ran a new survey to gather more data for my Data Teams book and update my 2023 and 2020 surveys. In total, we had 81 respondents. This survey was designed to get information about how management uses data teams, the value they’re creating, and how they’re creating it. The survey asked about the best and worst practices that teams are using or experiencing.

How to Build and Train a Transformer Model from Scratch with Hugging Face Transformers

KDnuggets

A step-by-step guide that walks you through training your own transformer-based language model.

Building 135

Announcing Hybrid Search General Availability in Mosaic AI Vector Search

databricks

We're excited to announce the general availability of hybrid search in Mosaic AI Vector Search. Hybrid search is a powerful feature that combines.

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

Data News — Week 24.34

Christophe Blefari

News again. It's been 3 weeks. Summer continues and I hope this new edition finds you well, having had a great vacation and a nice break before getting back to business in September. Content and articles have been a little slow over the last few weeks and that's to be expected, but I feel it's going to get back to business as usual soon.

BI 130

More Trending

5 Tips for Using Regular Expressions in Data Cleaning

KDnuggets

Learn how to use regular expressions in Python for data cleaning.

Python 132

Winning at GenAI: Building the right processes for the data intelligence future

databricks

Learn how companies can create repeatable and scalable workflows that enable users to quickly turn GenAI innovation from experimentation to reality.

Process 119

Display “Quantity by Category” Symbology in ArcGIS Pro

ArcGIS

You can replicate Quantity by Category symbology in ArcGIS Pro 3.3 by classifying a Size or Color visual variable.

113

Meta is getting ready for post-quantum cryptography

Engineering at Meta

The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet. Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptography systems. In the latest episode of the Meta Tech Podcast you’ll meet Sheran and Rafael, two engineers leading Meta’s post-quantum readiness work.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Generative AI Specialisation Courses from IBM for Every Profession

KDnuggets

Check out these 5 IBM specialisation courses specific to those who want to learn more about generative AI.

126

Cost-effective, incremental ETL with serverless compute for Delta Live Tables pipelines

databricks

We recently announced the general availability of serverless compute for Notebooks, Workflows, and Delta Live Tables (DLT) pipelines. Today, we'd like to explain.

115

Pinot for Low-Latency Offline Table Analytics

Uber Engineering

105

Introducing the Rebuild Network Topology Add-In for ArcGIS Pro 2.9 and 3.1

ArcGIS

The Rebuild Network Topology Add-In provides the ability to rebuild the network topology for the current extent of an active map with ArcGIS Pro 2.9 and 3.1.

Utilities 106

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Project Ideas to Master Data Engineering

KDnuggets

Data engineering is best learned by doing projects. But which ones? Here are six projects focusing on different data engineering skills to ensure you have it all covered.

Stepping into personalized experiences for every customer with the Databricks Data Intelligence Platform

databricks

Skechers has been at the forefront of the e-commerce industry, focusing on hyperpersonalized experiences to meet customer expectations better. Following significant growth during.

Data 108

Web Developer Roadmap: Front End, Back End, Full Stack

Edureka

A Web Developer Roadmap is like an instruction manual that tells you what you need to learn to become a web developer. It directs the learner’s attention toward the relevant material at each stage and avoids unnecessary complications and distractions. Picture standing at the edge of an unfamiliar forest where every path leads toward web programming.

MongoDB 97

Mosaic datasets: More than the sum of its parts

ArcGIS

Mosaic datasets are the backbone of imagery layers, but provide much more to your organization than simply creating imagery layers.

Datasets 104

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

How to Translate Languages with MarianMT and Hugging Face Transformers

KDnuggets

Discover how to translate text quickly and accurately between languages with just a few simple steps using MarianMT.

116

How to perform change data capture (CDC) from full database snapshots using Delta Live Tables

databricks

Learn more about processing snapshots using Delta Live Tables and how you can use the new Apply changes from Snapshot statement in DLT to build SCD Type 1 or SCD Type 2 target tables, delivering incremental data and insights that would typically take months of effort on legacy platforms.

Database 105

The Future of AI is Real-Time Data

Striim

To the data scientists pushing the boundaries of what’s possible, the AI experts and enthusiasts who see beyond the horizon, and the techies building tomorrow’s solutions today: this manifesto is for you. The key to unlocking AI’s full potential lies in real-time data. Traditional methods no longer suffice in a world that demands instant insights and immediate action.

AI Data Cloud for Energy: Strategies for Oil, Gas & Power

Snowflake

The Energy Sector's transformative shift: Energy, the driver of the global economy, is undergoing one of the largest secular shifts of our time, propelled by hundreds of trillions of dollars in global investment in the next 25 years. This shift creates a tremendous opportunity for energy companies. And, at the heart of successfully navigating this change sit data and AI.

Cloud 89

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. This introductory tutorial provides a crash course for writing and deploying your first Airflow pipeline.

5 Tips for Optimizing Machine Learning Algorithms

KDnuggets

Embrace these five best practices to boost the effectiveness of your trained machine learning solutions, no matter their complexity.

Streamlining repetitive tasks in Databricks Workflows

databricks

Databricks Workflows now supports single task looping with For Each! Streamline repetitive processes into a single, easy to author, manage, and monitor task.

Process 105

Add Flexera’s State of the Cloud Report to Your Summer Reading List

Cloudera

It’s nearing the end of the summer in North America, and one report has been a staple on my reading list for more than a decade: the Flexera State of the Cloud Report. The annual survey of hundreds of global IT decision makers assesses cloud strategies, migration trends, and important considerations for companies moving to the cloud or managing cloud environments.

Cloud 83

Startup Spotlight: Genesis’ Co-Worker Agents Lend AI-Powered Assistance

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we ask startup founders about the problems they’re solving, the apps they’re building and the lessons they’ve learned during their startup journey. In this edition, we’ll learn why the founders of Genesis, Matt Glickman and Justin Langseth, decided to take on the challenge of creating AI-powered assistants to run generative AI workloads in Snowflake, and why “Eliza” and “Stuart” might soon be joining your team meetings.

Cloud 73

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

5 Tips for Getting Started with Language Models

KDnuggets

Break the ice and dispel any fears about this expanding branch of AI with these five pieces of advice that will help you know where to start learning

111

Highlights from the Databricks Community

databricks

Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering.

Data Engineering Weekly #186

Data Engineering Weekly

Try Fully Managed Apache Airflow for FREE: Run Airflow without the hassle and management complexity. Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. For a limited time, new sign-ups will receive a complimentary Airflow Fundamentals Certification exam (normally $150).