Tools Every Data Scientist Should Know: A Practical Guide
KDnuggets
JULY 12, 2024
Discover the essential tools every data scientist should know to elevate their data science game, from Python and R to SQL and advanced visualization tools.
Snowflake
JULY 9, 2024
Snowflake is committed to helping customers protect their accounts and data. That’s why we have been working on product capabilities that allow Snowflake admins to make multifactor authentication (MFA) mandatory and monitor compliance with this new policy. As part of that effort, today we’re announcing several key features: 1. A new authentication policy that requires MFA for all users in a Snowflake account 2.
databricks
JULY 8, 2024
We are proud to announce two new analyst reports recognizing Databricks in the data engineering and data streaming space: IDC MarketScape: Worldwide Analytic.
Waitingforcode
JULY 10, 2024
Welcome to the first Data+AI Summit 2024 retrospective blog post. I'm opening the series with the topic close to my heart at the moment, stream processing!
Speaker: Tamara Fingerlin, Developer Advocate
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
KDnuggets
JULY 9, 2024
This article is for anyone looking to maximize their use of Amazon Web Services (AWS) generative AI (GenAI) services. Here are eight courses that range from beginner to expert level.
ArcGIS
JULY 12, 2024
Creation of a Digital Twin in Seven Days with ArcGIS in Zurich
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Confessions of a Data Guy
JULY 9, 2024
When I was young and full of myself, writing Perl and PHP, while your ma was still reading you a bedtime story and giving you a stuffy to fall asleep with, I had to program uphill, both ways, in the rain and snow. Not like you milk toast Data Engineers clickty clicking around Databricks and […] The post The Abstractions Are Making You Dumb (rise of the Shallow Expert) appeared first on Confessions of a Data Guy.
KDnuggets
JULY 10, 2024
Let's learn how to efficiently merge large Pandas dataframes.
ArcGIS
JULY 12, 2024
Esri is working with partners (Maxar, TomTom) to enhance our 3D basemaps with high-quality commercial data for elevation and buildings layers.
databricks
JULY 12, 2024
Hallucinations in large language models (LLMs) occur when models produce responses that do not align with factual reality or the provided context. This.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Snowflake
JULY 8, 2024
Regulated and sovereign markets across the world have stringent requirements stipulating certain important data be kept within geographical borders or even for certain workloads to have dedicated environments, separate from those of other customers. In these markets, organizations need a secure and well-governed data foundation with effective controls to help comply with regulatory requirements.
KDnuggets
JULY 8, 2024
Text preprocessing is an important step in NLP. Let's learn how to use the Hugging Face Tokenizers Library to preprocess text data.
Engineering at Meta
JULY 10, 2024
Meta’s advertising business leverages large-scale machine learning (ML) recommendation models that power millions of ads recommendations per second across Meta’s family of apps. Maintaining reliability of these ML systems helps ensure the highest level of service and uninterrupted benefit delivery to our users and advertisers. To minimize disruptions and ensure our ML systems are intrinsically resilient, we have built a comprehensive set of prediction robustness solutions that ensure stability w
Precisely
JULY 12, 2024
Data can be your organization’s most valuable asset, but only if it’s data you can trust. When companies work with data that is untrustworthy for any reason, it can result in incorrect insights, skewed analysis, and reckless recommendations. Two terms can be used to describe the condition of data: data integrity and data quality.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri
Snowflake
JULY 9, 2024
There is a scene in Mission: Impossible – Rogue Nation where Tom Cruise is hanging onto the outside of a jet as it has taken off. And while, yes, he’s going with it, he’s not really on board or in control. Some data executives feel like that. It’s not enough to establish goals — or, the destination in this metaphor. The data strategy must provide a flight plan for making sure you get there — on time, on budget and, of course, safely on board.
KDnuggets
JULY 11, 2024
Here’s how to ace your data analyst interview and land your first job.
ArcGIS
JULY 12, 2024
You can host scene layers and 3D tiles layers in ArcGIS Online or reference datasets in cloud storage in ArcGIS Enterprise.
Cloudera
JULY 10, 2024
There’s nothing worse than wasting money on unnecessary costs. In on-premises data estates, these costs appear as wasted person-hours waiting for inefficient analytics to complete, or troubleshooting jobs that have failed to execute as expected, or at all. They manifest as idle hardware waiting for urgent workloads to come in, ensuring sufficient spare capacity to run them amidst noisy neighbors and resource-hungry, lower-priority workloads.
Advertisement
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you
Snowflake
JULY 11, 2024
Leaders in the advertising, media and entertainment industries know all too well the importance of the media supply chain. It’s the backbone that keeps things running smoothly, including everything from content creation and management to content distribution and analytics. But media supply chains are becoming more complex to manage for several reasons.
KDnuggets
JULY 9, 2024
Learn how to streamline data and model orchestration for Generative AI success. Explore practical use cases and a comprehensive guide in this blog.
Data Engineering Weekly
JULY 7, 2024
Experience Enterprise-Grade Apache Airflow: Astro augments Airflow with enterprise-grade features to enhance productivity, meet scalability and availability demands across your data pipelines, and more. Notion: Building and scaling Notion’s data lake. Notion writes about scaling the data lake by bringing critical data ingestion operations in-house.
Scott Logic
JULY 9, 2024
It seems barely a month goes by without a new supply chain attack making the headlines, and malicious code in dependency packages from package registries such as NPM is a common method. My usual sentiments include “oh another one, what a surprise”, before thoughts eventually turn to - someone really ought to be doing something about this. Fortunately, it turns out that quite a few things are indeed being done - there’s progress, activity, and promising ideas for the future.
Speaker: Tamara Fingerlin, Developer Advocate
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
Ripple Engineering
JULY 9, 2024
Introduction: Embracing the Future with Ripple's Data Platform Migration. Welcome to a pivotal moment in Ripple's data journey. As leaders at the intersection of blockchain technology and financial services, we're excited to share a transformative step in our data management evolution. We recently embarked on a significant data platform migration, transitioning from Hadoop to Databricks, a move motivated by our relentless pursuit of excellence and our contributions to the XRP Ledge
KDnuggets
JULY 8, 2024
Learn all about introductory statistics with this collection of tutorials from our sister site Statology.
Engineering at Meta
JULY 10, 2024
Tail utilization is a significant system issue and a major factor in overload-related failures and low compute utilization. The tail utilization optimizations at Meta have had a profound impact on model serving capacity footprint and reliability. Failure rates, which are mostly timeout errors, were reduced by two-thirds; the compute footprint delivered 35% more work for the same amount of resources; and p99 latency was cut in half.
Uber Engineering
JULY 11, 2024
Modernizing the fundamentals of log management at Uber: How we used CLP to build a new logging infra that lets users view and analyze their logs seamlessly, at scale!
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m
Confluent
JULY 11, 2024
confluent-kafka-javascript offers two APIs for calling Kafka from JS, one KafkaJS-like & one node-rdkafka-like. Learn it by making an app w/producer, consumer & UI.
KDnuggets
JULY 8, 2024
Docker tags are important for managing and versioning Docker images. This tutorial will teach you how to use Docker tags effectively.
Towards Data Science
JULY 12, 2024
Data as a product is an intriguing concept, but beware of the application trap.
Uber Engineering
JULY 6, 2024
Kafka Tiered Storage, developed in collaboration with the Apache Kafka community, introduces the separation of storage and processing in brokers, significantly improving the scalability, reliability, and efficiency of Kafka clusters.
Advertisement
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.