Tue, Aug 06, 2024


10 GitHub Repositories to Master Statistics

KDnuggets

Learn statistics through interactive books, code examples, cheat sheets, guides, and tool documentation.


Databricks Clean Rooms for privacy-safe collaboration is in Public Preview

databricks

Fueled by the exponential growth in external data and AI for innovation, organizations across all industries are looking for effective ways to collaborate.


Reimagine Your GIS: From ArcMap to ArcGIS Pro and User Types

ArcGIS

Explore how moving from ArcMap to ArcGIS Pro and adopting user types can improve GIS workflows, strengthen collaboration, and drive significant change within your organization.


Evaluating Change Data Capture Tools: A Comprehensive Guide

Data Engineering Weekly

TL;DR: Aswin and I are thrilled to announce the first version of our comprehensive guide for evaluating Change Data Capture (CDC) tools.
CDC Evaluation Guide Google Sheet: [link]
CDC Evaluation Guide GitHub: [link]
Change Data Capture is a powerful data engineering technique that continuously captures the changes (inserts, updates, and deletes) made to source systems.
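To make the three change types concrete, here is a minimal sketch that derives insert, update, and delete events by diffing two table snapshots keyed by primary key. This is an illustrative assumption, not how the guide or any specific tool works: production CDC tools commonly read the database's transaction log instead, and the function name `diff_snapshots` is hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]
# (operation, primary key, new row or None for deletes)
ChangeEvent = Tuple[str, Any, Optional[Row]]


def diff_snapshots(before: Dict[Any, Row], after: Dict[Any, Row]) -> List[ChangeEvent]:
    """Hypothetical helper: compare two snapshots and emit CDC-style change events."""
    events: List[ChangeEvent] = []
    for key, row in after.items():
        if key not in before:
            events.append(("insert", key, row))   # new primary key
        elif before[key] != row:
            events.append(("update", key, row))   # same key, different values
    for key in before:
        if key not in after:
            events.append(("delete", key, None))  # key disappeared
    return events


before = {1: {"name": "alice", "city": "Boston"}, 2: {"name": "bob", "city": "Austin"}}
after = {1: {"name": "alice", "city": "Denver"}, 3: {"name": "carol", "city": "Austin"}}
for event in diff_snapshots(before, after):
    print(event)
```

Snapshot diffing is simple but misses intermediate changes between snapshots and rescans the whole table, which is one reason log-based capture is usually preferred.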


Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.


How to Use Hugging Face’s Datasets Library for Efficient Data Loading

KDnuggets

Harness the simplicity and effectiveness of Hugging Face's Datasets library to load datasets efficiently, regardless of their source.


Streaming BigQuery Data Into Confluent in Real Time: A Continuous Query Approach

Confluent

Using SQL-based BigQuery Continuous Queries with Confluent lets you stream your warehouse data in real time and send it downstream for analytics and other use cases.


More Trending


DataKitchen’s Data Quality TestGen found 18 potential data quality issues in a few minutes (including install time)!

DataKitchen

DataKitchen’s Data Quality TestGen found 18 potential data quality issues in a few minutes (including install time) on data.boston.gov building permit data! Imagine a free tool that you can point at any dataset and find actionable data quality issues immediately! It sure beats having your data consumers tell you about problems they find when you are trying to enjoy your weekend.


Implementing multi-metric scaling: making changes to legacy code safely

Yelp Engineering

We’re excited to announce that multi-metric horizontal autoscaling is available for all services at Yelp. This allows us to scale services using multiple metrics, such as the number of in-flight requests and CPU utilization, rather than relying on a single metric. We expect this to provide us with better resilience and faster recovery during outages.
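One common way to combine several scaling metrics, used by the Kubernetes Horizontal Pod Autoscaler, is to compute a replica count for each metric as ceil(current replicas × observed / target) and take the largest. The sketch below assumes that rule; it is not Yelp's implementation, and the function name, metric names, and replica bounds are hypothetical. It omits the tolerance band and stabilization windows a production autoscaler applies.

```python
import math
from typing import Dict


def desired_replicas(
    current_replicas: int,
    observed: Dict[str, float],
    targets: Dict[str, float],
    min_replicas: int = 1,
    max_replicas: int = 100,
) -> int:
    """Hypothetical HPA-style multi-metric scaling decision."""
    proposals = []
    for metric, target in targets.items():
        # Scale proportionally to how far this metric is from its target.
        ratio = observed[metric] / target
        proposals.append(math.ceil(current_replicas * ratio))
    # Take the largest proposal so no single metric is left under-provisioned.
    desired = max(proposals)
    return max(min_replicas, min(max_replicas, desired))


# CPU alone would suggest 8 replicas, but in-flight requests demand 15.
print(desired_replicas(10, {"cpu": 45, "in_flight": 24}, {"cpu": 60, "in_flight": 16}))
```

Taking the maximum is what lets a second metric such as in-flight requests trigger scaling even when CPU utilization looks healthy.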


The Hidden Threats in Your Data Warehouse Layers (And How to Fix Them)

Monte Carlo

Data warehouses are centralized repositories that store and manage data from various sources. They are integral to an organization’s data strategy, ensuring data accessibility, accuracy, and utility. However, beneath the surface lies a host of invisible risks embedded within the data warehouse layers. These “hidden threats” can silently undermine your data quality and reliability, and they often remain undetected until they trigger significant problems such as incorrect business…


Podcast: Joe Reis Data Engineering

DataKitchen

Chris Bergh joins me to chat about all things DataOps. We also discuss lean, removing waste from data processes and teams, and much more.


Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!


Harnessing GenAI for Critical Manufacturing Innovation

RandomTrees

Manufacturing has long been at the cutting edge of technology, driving economic growth and societal change. More recently, the development of Generative Artificial Intelligence (GenAI) has opened up new possibilities for innovation in this critical sector. GenAI is a subset of artificial intelligence dedicated to generating new content and designs.


Podcast: The Data Engineering Podcast With Tobias Macey

DataKitchen

Summary: In this episode of the Data Engineering Podcast, host Tobias Macey welcomes back Chris Bergh, CEO of DataKitchen, to discuss his ongoing mission to…


AWS DMS Postgres: Migration Made Easy

Hevo

In today’s dynamic business environment, companies often need to migrate their databases, whether to scale their operations, modernize their technology stack, or move to the cloud.


The Data Turf Wars are Over, But the Metadata Turf Wars Have Just Begun

Cloudera

Over the past several years, data leaders have asked many questions about where they should keep their data and what architecture they should implement to serve an incredible breadth of analytic use cases. Vendors with proprietary formats and query engines made their pitches, the market listened, and data leaders made their decisions. The most interesting thing about their choices is that, despite the millions of marketing dollars vendors spent trying to convince customers that the…


Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.


Fivetran vs Matillion: Detailed Comparison for 2024

Hevo

Introduction: In today’s data-driven world, organizations depend heavily on ETL tools to integrate their data efficiently. These tools ensure a smooth flow of data from sources to destinations, supporting businesses in making decisions.


AWS DMS CDC SQL Server: Configure, Consider, Limitations, Alternatives

Hevo

We’ve all been there: your business is growing, and your data is expanding across various systems. You’re trying to keep everything in sync, but manual updates and batch processing don’t cut it anymore. You need a reliable way to keep your data up-to-date across all platforms.
