Sat. Aug 24, 2024 - Fri. Aug 30, 2024

Apache Spark’s Most Annoying Use Case

Confessions of a Data Guy

I still remember the good ole days when Apache Spark was fresh and hot, hardly anyone was using it, except a few poor AWS Glue and EMR users … Lord have mercy on their ragged souls. It’s funny how that GOAT of a tool went from being used by a few companies for extremely large […]

AWS 147

Data Teams Survey 2024 Results

Jesse Anderson

In the spring of 2024, I ran a new survey to gather more data for my Data Teams book and update my 2023 and 2020 surveys. In total, we had 81 respondents. This survey was designed to get information about how management uses data teams, the value they’re creating, and how they’re creating it. The survey asked about the best and worst practices that teams are using or experiencing.

Insiders

Data News — Week 24.34

Christophe Blefari

News again. It's been 3 weeks. Summer continues and I hope this new edition finds you well, having had a great vacation and a nice break before getting back to business in September. Content and articles have been a little slow over the last few weeks and that's to be expected, but I feel it's going to get back to business as usual soon.

BI 130

How Meta enforces purpose limitation via Privacy Aware Infrastructure at scale

Engineering at Meta

At Meta, we’ve been diligently working to incorporate privacy into different systems of our software stack over the past few years. Today, we’re excited to share some cutting-edge technologies that are part of our Privacy Aware Infrastructure (PAI) initiative. These innovations mark a major milestone in our ongoing commitment to honoring user privacy.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Project Ideas to Master Data Engineering

KDnuggets

Data engineering is best learned by doing projects. But which ones? Here are six projects focusing on different data engineering skills to ensure you have it all covered.

How to perform change data capture (CDC) from full database snapshots using Delta Live Tables

databricks

Learn more about processing snapshots using Delta Live Tables and how you can use the new Apply Changes from Snapshot statement in DLT to build SCD Type 1 or SCD Type 2 target tables, delivering incremental data and insights that would typically take months of effort on legacy platforms.

Database 104

More Trending

Web Developer Roadmap: Front End, Back End, Full Stack

Edureka

A Web Developer Roadmap is just like a book of instructions that tells you what you need to learn to become a web developer. It directs the learner’s attention toward mastering only the relevant stuff at any particular time and avoids unnecessary complications and concentration problems. Think about being at the boundary of unfamiliar woodlands where every path is bound for that famous site for web programming.

MongoDB 97

Introducing the Rebuild Network Topology Add-In for ArcGIS Pro 2.9 and 3.1

ArcGIS

The Rebuild Network Topology Add-In provides the ability to rebuild the network topology for the current extent of an active map with ArcGIS Pro 2.9 and 3.1.

Utilities 103

Announcing Hybrid Search General Availability in Mosaic AI Vector Search

databricks

We're excited to announce the general availability of hybrid search in Mosaic AI Vector Search. Hybrid search is a powerful feature that combines.

Meta is getting ready for post-quantum cryptography

Engineering at Meta

The Quantum Apocalypse is coming. The advent of quantum computers has raised real questions about the future of data privacy over the internet. Someday, advances in quantum computing will make it possible to decrypt sensitive data that was encrypted using today’s complex cryptography systems. In the latest episode of the Meta Tech Podcast you’ll meet Sheran and Rafael, two engineers leading Meta’s post-quantum readiness work.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

How to Build and Train a Transformer Model from Scratch with Hugging Face Transformers

KDnuggets

A step-by-step guide to training your own transformer-based language model.

Building 123

Display “Quantity by Category” Symbology in ArcGIS Pro

ArcGIS

You can replicate Quantity by Category symbology in ArcGIS Pro 3.3 by classifying a Size or Color visual variable.

110

Winning at GenAI: Building the right processes for the data intelligence future

databricks

Learn how companies can create repeatable and scalable workflows that enable users to quickly turn GenAI innovation from experimentation to reality.

Process 101

Add Flexera’s State of the Cloud Report to Your Summer Reading List

Cloudera

It’s nearing the end of the summer in North America, and one report has been a staple on my reading list for more than a decade: the Flexera State of the Cloud Report. The annual survey of hundreds of global IT decision makers assesses cloud strategies, migration trends, and important considerations for companies moving to the cloud or managing cloud environments.

Cloud 81

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

How to Use NumPy to Solve Systems of Nonlinear Equations

KDnuggets

In this article, we’ll explore how to leverage NumPy to solve systems of nonlinear equations, turning complex mathematical challenges into manageable tasks.

Systems 94

Mosaic datasets: More than the sum of its parts

ArcGIS

Mosaic datasets are the backbone of imagery layers, but provide much more to your organization than simply creating imagery layers.

Datasets 103

Cost-effective, incremental ETL with serverless compute for Delta Live Tables pipelines

databricks

We recently announced the general availability of serverless compute for Notebooks, Workflows, and Delta Live Tables (DLT) pipelines. Today, we'd like to explain.

93

AI Data Cloud for Energy: Strategies for Oil, Gas & Power

Snowflake

The Energy Sector's Transformative Shift: Energy, the driver of the global economy, is undergoing one of the largest secular shifts of our time, propelled by hundreds of trillions of dollars in global investment in the next 25 years. This shift creates a tremendous opportunity for energy companies. And, at the heart of successfully navigating this change sit data and AI.

Cloud 75

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

5 Tips for Getting Started with Language Models

KDnuggets

Break the ice and dispel any fears about this expanding branch of AI with these five pieces of advice that will help you know where to start learning.

95

Data Engineering Weekly #186

Data Engineering Weekly

Try Fully Managed Apache Airflow for FREE: Run Airflow without the hassle and management complexity. Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. For a limited time, new sign-ups will receive a complimentary Airflow Fundamentals Certification exam (normally $150).

Stepping into personalized experiences for every customer with the Databricks Data Intelligence Platform

databricks

Skechers has been at the forefront of the e-commerce industry, focusing on hyperpersonalized experiences to meet customer expectations better. Following significant growth during.

Data 76

Confluent Champion: The Power of a Learning Culture and Motivated Teams

Confluent

In our latest Confluent Champion post, Janis Hom, staff security GRC program manager, highlights how Confluent fosters a culture that helps her stay motivated.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Generative AI Specialisation Courses from IBM for Every Profession

KDnuggets

Check out these 5 IBM specialisation courses specific to those who want to learn more about generative AI.

110

Mainframe to Cloud Migrations: Expert Insights from AWS, Confluent, and Precisely

Precisely

Key Takeaways: Enhance capabilities through partnerships: AWS, Confluent, and Precisely accelerate mainframe modernization efforts, providing you with essential tools for success. Minimize migration disruptions through phased implementation, starting with low-risk, high-value projects. A strategic and tailored approach to mainframe modernization can enhance business agility and innovation.

AWS 64

The GenAI Journey: How Enterprises are Progressing from General-Purpose to Custom LLMs

databricks

Every company's path from foundational to tailored LLMs will be different. Each will require new tooling to help developers deliver the accurate and governed GenAI that leaders are demanding.

Startup Spotlight: Genesis’ Co-Worker Agents Lend AI-Powered Assistance

Snowflake

Welcome to Snowflake’s Startup Spotlight, where we ask startup founders about the problems they’re solving, the apps they’re building and the lessons they’ve learned during their startup journey. In this edition, we’ll learn why the founders of Genesis, Matt Glickman and Justin Langseth, decided to take on the challenge of creating AI-powered assistants to run generative AI workloads in Snowflake, and why “Eliza” and “Stuart” might soon be joining your team meetings.

Cloud 62

5 Tips for Optimizing Machine Learning Algorithms

KDnuggets

Embrace these five best practices to boost the effectiveness of your trained machine learning solutions, no matter their complexity.

Unlock Real-Time Value from DynamoDB Data with Confluent's CDC Source Connector

Confluent

You can simplify the transfer of data from one or more DynamoDB tables to Confluent Cloud with the fully managed, no-code Confluent CDC source connector.

Cloud 64

Highlights from the Databricks Community

databricks

Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering.

Comprehensive IBM i Security Requires a Multi-layered Approach

Precisely

Key Takeaways: Implement a multi-layered defense to ensure robust protection for your IBM i environment against evolving cybersecurity threats. Address unique IBM i security challenges by recognizing vulnerabilities like integration issues, skilled staff shortages, and unpatched systems. Stay proactive and informed with vulnerability reports that help you understand and mitigate risks, including zero-day vulnerabilities.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.