Tue.Aug 06, 2024

article thumbnail

10 GitHub Repositories to Master Statistics

KDnuggets

Learn statistics through interactive books, code examples, cheat sheets, guides, and tools documentation.

Coding 145
article thumbnail

Databricks Clean Rooms for privacy-safe collaboration is in Public Preview

databricks

Fueled by the exponential growth in external data and AI for innovation, organizations across all industries are looking for effective ways to collaborate.

Data 140
Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Reimagine Your GIS: From ArcMap to ArcGIS Pro and User Types

ArcGIS

Explore how moving from ArcMap to ArcGIS Pro and user types can make GIS workflows better, improve collaboration, and make big changes within your organization.

128
128
article thumbnail

Evaluating Change Data Capture Tools: A Comprehensive Guide

Data Engineering Weekly

TL;DR Aswin and I are thrilled to announce the release of the first version of our comprehensive guide for evaluating Change Data Capture. CDC Evaluation Guide Google Sheet Link: [link] CDC Evaluation Guide Github Link: [link] Change Data Capture (CDC) is a powerful technology in data engineering that allows for continuously capturing changes (inserts, updates, and deletes) made to source systems.

Data Lake 125
article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

How to Use Hugging Face’s Datasets Library for Efficient Data Loading

KDnuggets

Harness the simplicity and effectiveness of Hugging Face's Datasets library to efficiently load datasets, regardless of their source

article thumbnail

The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begun

Cloudera

Over the past several years, data leaders asked many questions about where they should keep their data and what architecture they should implement to serve an incredible breadth of analytic use cases. Vendors with proprietary formats and query engines made their pitches, and over the years the market listened, and data leaders made their decisions. The most interesting thing about their choices is that, despite the millions of marketing dollars vendors spent trying to convince customers that the

More Trending

article thumbnail

Streaming BigQuery Data Into Confluent in Real Time: A Continuous Query Approach

Confluent

Using SQL-based BigQuery Continuous Queries w/Confluent lets you stream your warehouse data in real-time, sending it downstream for analytics use cases & more.

SQL 69
article thumbnail

Implementing multi-metric scaling: making changes to legacy code safely

Yelp Engineering

We’re excited to announce that multi-metric horizontal autoscaling is available for all services at Yelp. This allows us to scale services using multiple metrics, such as the number of in-flight requests and CPU utilization, rather than relying on a single metric. We expect this to provide us with better resilience and faster recovery during outages.

Coding 52
article thumbnail

The Hidden Threats in Your Data Warehouse Layers (And How to Fix Them)

Monte Carlo

Data warehouses are the centralized repositories that store and manage data from various sources. They are integral to an organization’s data strategy, ensuring data accessibility, accuracy, and utility. However, beneath their surface lies a host of invisible risks embedded within the data warehouse layers. These “hidden threats” can silently undermine your data quality and reliability and often remain undetected until they trigger significant problems such as incorrect busines

article thumbnail

Podcast: Joe Reis Data Engineering

DataKitchen

Chris Bergh joins me to chat about all things DataOps. We also discuss lean, removing waste from data processes and teams, and much more.

article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Harnessing GenAI for Critical Manufacturing Innovation

RandomTrees

Manufacturing has always been at the cutting edge of technology since it drives economic growth and societal changes. As a result, in recent times, the development of Generative Artificial Intelligence (GenAI) has opened up new possibilities for innovation in this critical area. GenAI is an artificial intelligence subset dedicated to generating new content and designs.

article thumbnail

Podcast: The Data Engineering Podcast With Tobias Macy

DataKitchen

Summary In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Berg, CEO of DataKitchen, to discuss his ongoing mission to…

article thumbnail

AWS DMS Postgres: Migration Made Easy

Hevo

In today’s dynamic business environment, companies often need to migrate their databases for many different reasons, ranging from scaling their operations to modernizing their technology stack or moving to the cloud to enjoy numerous benefits.

AWS 52
article thumbnail

DataKitchen’s Data Quality TestGen found 18 potential data quality issues in a few minutes (including install time)!

DataKitchen

DataKitchen’s Data Quality TestGen found 18 potential data quality issues in a few minutes (including install time) on data.boston.gov building permit data! Imagine a free tool that you can point at any dataset and find actionable data quality issues immediately! It sure beats having your data consumers tell you about problems they find when you are trying to enjoy your weekend.

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

Fivetran vs Matillion: Detailed Comparison for 2024

Hevo

Introduction In the data-driven modern world, organizations are quite dependent on ETL tools that help them integrate their data efficiently. These are the tools that base their guarantee of a smooth flow of data from sources to destination for supporting businesses in making decisions.

article thumbnail

AWS DMS CDC SQL Server: Configure, Consider, Limitations, Alternatives

Hevo

We’ve all been there: your business is growing, and your data is expanding across various systems. You’re trying to keep everything in sync, but manual updates and batch processing don’t cut it anymore. You need a reliable way to keep your data up-to-date across all platforms.

AWS 52