Sat. Aug 24, 2024 - Fri. Aug 30, 2024

Apache Spark’s Most Annoying Use Case

Confessions of a Data Guy

I still remember the good ole days when Apache Spark was fresh and hot, hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large […]

AWS 147

Data Teams Survey 2024 Results

Jesse Anderson

In the spring of 2024, I ran a new survey to gather more data for my Data Teams book and update my 2023 and 2020 surveys. In total, we had 81 respondents. This survey was designed to get information about how management uses data teams, the value they’re creating, and how they’re creating it. The survey asked about the best and worst practices that teams are using or experiencing.

Data 147

How to Build and Train a Transformer Model from Scratch with Hugging Face Transformers

KDnuggets

A step-by-step guide to training your own transformer-based language model.

Building 144

Announcing Hybrid Search General Availability in Mosaic AI Vector Search

databricks

We're excited to announce the general availability of hybrid search in Mosaic AI Vector Search. Hybrid search is a powerful feature that combines.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Data News — Week 24.34

Christophe Blefari

News again. It's been 3 weeks. Summer continues and I hope this new edition finds you well, having had a great vacation and a nice break before getting back to business in September. Content and articles have been a little slow over the last few weeks and that's to be expected, but I feel it's going to get back to business as usual soon.

BI 130

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Engineering at Meta

At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations mark a major milestone in our ongoing commitment to honoring user privacy.

More Trending

Winning at GenAI: Building the right processes for the data intelligence future

databricks

Learn how companies can create repeatable and scalable workflows that enable users to quickly turn GenAI innovation from experimentation to reality.

Process 119

Display “Quantity by Category” Symbology in ArcGIS Pro

ArcGIS

You can replicate Quantity by Category symbology in ArcGIS Pro 3.3 by classifying a Size or Color visual variable.

115

Meta is getting ready for post-quantum cryptography

Engineering at Meta

The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet. Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptography systems. In the latest episode of the Meta Tech Podcast you’ll meet Sheran and Rafael, two engineers leading Meta’s post-quantum readiness work.

5 Tips for Using Regular Expressions in Data Cleaning

KDnuggets

Learn how to use regular expressions in Python for data cleaning.

Python 143

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Cost-effective, incremental ETL with serverless compute for Delta Live Tables pipelines

databricks

We recently announced the general availability of serverless compute for Notebooks, Workflows, and Delta Live Tables (DLT) pipelines. Today, we'd like to explain.

115

Introducing the Rebuild Network Topology Add-In for ArcGIS Pro 2.9 and 3.1

ArcGIS

The Rebuild Network Topology Add-In provides the ability to rebuild the network topology for the current extent of an active map with ArcGIS Pro 2.9 and 3.1.

Utilities 109

The Big Data London Guide: 2024 Edition

Monte Carlo

Another Big Data London is right around the corner, and we couldn’t be more excited. Coming in hot on September 18-19, Big Data London is easily the UK’s biggest data event of the year. And with an event as rare and prestigious as Big Data London, it’s normal to want to maximize your time. That’s why we put together our list of the top things to see and do at Big Data London this year—including the data reliability sessions we’re most excited about and the after-parties you don’t want to miss.

Project Ideas to Master Data Engineering

KDnuggets

Data engineering is best learned by doing projects. But which ones? Here are six projects focusing on different data engineering skills to ensure you have it all covered.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Streamlining repetitive tasks in Databricks Workflows

databricks

Databricks Workflows now supports single task looping with For Each! Streamline repetitive processes into a single, easy to author, manage, and monitor task.

Process 109

Mosaic datasets: More than the sum of its parts

ArcGIS

Mosaic datasets are the backbone of imagery layers, but provide much more to your organization than simply creating imagery layers.

Datasets 105

Web Developer Roadmap: Front End, Back End, Full Stack

Edureka

A Web Developer Roadmap is like an instruction book that tells you what you need to learn to become a web developer. It directs the learner’s attention toward mastering only what is relevant at each stage and avoids unnecessary complications and distractions. Picture standing at the edge of an unfamiliar forest where every path claims to lead to web programming.

MongoDB 97

How to Translate Languages with MarianMT and Hugging Face Transformers

KDnuggets

Discover how to translate text quickly and accurately between languages with just a few simple steps using MarianMT.

139

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Stepping into personalized experiences for every customer with the Databricks Data Intelligence Platform

databricks

Skechers has been at the forefront of the e-commerce industry, focusing on hyperpersonalized experiences to meet customer expectations better. Following significant growth during.

Data 105

Add Flexera’s State of the Cloud Report to Your Summer Reading List

Cloudera

It’s nearing the end of the summer in North America, and one report has been a staple on my reading list for more than a decade: the Flexera State of the Cloud Report. The annual survey of hundreds of global IT decision makers assesses cloud strategies, migration trends, and important considerations for companies moving to the cloud or managing cloud environments.

Cloud 88

Pinot for Low-Latency Offline Table Analytics

Uber Engineering

85

5 Tips for Optimizing Machine Learning Algorithms

KDnuggets

Embrace these five best practices to boost the effectiveness of your trained machine learning solutions, no matter their complexity.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

How to perform change data capture (CDC) from full database snapshots using Delta Live Tables

databricks

Learn more about processing snapshots using Delta Live Tables and how you can use the new Apply changes from Snapshot statement in DLT to build SCD Type 1 or SCD Type 2 target tables delivering incremental data and insights that would typically take months of effort on legacy platforms.

Database 105

AI Data Cloud for Energy: Strategies for Oil, Gas & Power

Snowflake

The Energy Sector's transformative shift. Energy, the driver of the global economy, is undergoing one of the largest secular shifts of our time, propelled by hundreds of trillions of dollars in global investment in the next 25 years. This shift creates a tremendous opportunity for energy companies. And, at the heart of successfully navigating this change sit data and AI.

Cloud 79

Data Engineering Weekly #186

Data Engineering Weekly

Try Fully Managed Apache Airflow for FREE. Run Airflow without the hassle and management complexity. Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. For a limited time, new sign-ups will receive a complimentary Airflow Fundamentals Certification exam (normally $150).

Digital Transformation Playbook for Modern Businesses

KDnuggets

Check out this practical guide sharing insights, challenges, and tactics to be a digital leader with confidence.

134

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Highlights from the Databricks Community

databricks

Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering.

Use Business Analyst’s Target Marketing Wizard to find customers in a new area

ArcGIS

Identify the most promising customers in a new area using the Business Analyst Target Marketing wizard and Esri's Tapestry Segmentation.

Confluent Champion: The Power of a Learning Culture and Motivated Teams

Confluent

In our latest Confluent Champion post, Janis Hom, staff security GRC program manager, highlights how Confluent fosters a culture that helps her stay motivated.

How to Use NumPy to Solve Systems of Nonlinear Equations

KDnuggets

In this article, we’ll explore how to leverage NumPy to solve systems of nonlinear equations, turning complex mathematical challenges into manageable tasks.

Systems 134

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.