Sat. Jul 23, 2022 - Fri. Jul 29, 2022

Detecting Data Drift for Ensuring Production ML Model Quality Using Eurybia

KDnuggets

This article will focus on a step-by-step data drift study using Eurybia, an open-source Python library.

Python 156

4 Must-Have Tests for Your Apache Kafka CI/CD with GitHub Actions

Confluent

Explore GitHub Actions for your Kafka CI/CD pipeline, automate Schema Registry, and transform the development and testing of Kafka client applications.

Kafka 141

Re-Bundling The Data Stack With Data Orchestration And Software Defined Assets Using Dagster

Data Engineering Podcast

Summary: The current stage of evolution in the data management ecosystem has resulted in domain- and use-case-specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in bringing insights about external tools’ dependency graphs into one place through its "software defined assets" functionality.

MongoDB 100

Driving Success With a Modern Data Architecture and a Hybrid Approach in the Financial Services and Telco Industries

Cloudera

Corporations are generating unprecedented volumes of data, especially in industries such as telecom and financial services (FSI). Many organizations are hoping to leverage these massive amounts of data by investing heavily in big data solutions that they hope can meet business goals such as increasing customer satisfaction, uncovering alternative revenue streams, or improving operational efficiency.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Best Practices for Creating Domain-Specific AI Models

KDnuggets

Here are some best practices and techniques for domain-specific model adaptation that worked for us time and again.

155

Being the Best Digital Bank is Not Enough

Teradata

For many, banking is now a digital activity. But the financial services industry still trails many others in leveraging cloud technologies to build deeper, emotional attachments to their customers.

Banking 94

More Trending

Modern Data Flow: A Better Way of Building Data Pipelines

Confluent

Complete guide to data pipelines, data integration, and modern data flow, the key to next generation, data-driven applications, systems, and organizations.

The 5 Hardest Things to Do in SQL

KDnuggets

The 5 hardest things Josh Berry, a 15-year analytics professional, experienced while switching from Python to SQL, with examples, SQL code, and a resource to customize the SQL to your own project.

SQL 145

The Good and the Bad of Hadoop Big Data Framework

AltexSoft

What does it take to store all New York Times articles published between 1855 and 1922? Depending on how you measure it, the answer will be 11 million newspaper pages or… just one Hadoop cluster and one tech specialist who can move 4 terabytes of textual data to a new location in 24 hours. The biggest star of the Big Data world, Hadoop was named after a yellow stuffed elephant that belonged to the 2-year-old son of computer scientist Doug Cutting.

Hadoop 59

Can BigQuery, Snowflake, and Redshift Handle Real-Time Data Analytics?

Rockset

Enterprise data warehouses (EDWs) became necessary in the 1980s when organizations shifted from using data for operational decisions to using data to fuel critical business decisions. Data warehouses differ from operational databases in that while operational transactional databases collate data for multiple transactional purposes, data warehouses aggregate this transactional data for analytics.

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

What is Data Lineage?

Databand.ai

By Niv Sluzki, 2022-07-28. The term “data lineage” has been thrown around a lot over the last few years. What started as an idea of connecting between datasets quickly became a very confusing term that now gets misused often. It’s time to put order to the chaos and dig deep into what it really is. Because the answer matters quite a lot.

Practical Deep Learning from fast.ai is Back!

KDnuggets

Looking for a great course to go from machine learning zero to hero quickly? fast.ai has released the latest version of Practical Deep Learning For Coders. And it won't cost you a thing.

Data Contracts and 4 Other Ways to Overcome Schema Changes

Monte Carlo

There is a virtually unlimited number of ways data can break. It could be a bad JOIN statement, an untriggered Airflow job, or even just someone at a third-party provider who didn’t feel like hitting the send button that day. But perhaps one of the most common reasons for data quality challenges is software feature updates and other changes made upstream by software engineers.

MongoDB CDC: When to Use Kafka, Debezium, Change Streams and Rockset

Rockset

MongoDB has grown from a basic JSON key-value store to one of the most popular NoSQL database solutions in use today. It is widely supported and provides flexible JSON document storage at scale. It also provides native querying and analytics capabilities. These attributes have caused MongoDB to be widely adopted, especially alongside JavaScript web applications.

MongoDB 52

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Here Is The Most Fun Way Of Obtaining The Illustrious IIM Indore Alumni Status: Integrated Program In Business Analytics

U-Next

Every layer of business operations today uses the power of metrics and analytics to enhance their market growth and business success. With the fourth industrial revolution increasing the dependency on emerging technologies like Data Science, Cloud Computing, IoT, Business Analytics, etc., the need to master the nuances of the same is relatively high.

KDnuggets News, July 27: The AIoT Revolution: How AI and IoT Are Transforming Our World • Introduction to Hill Climbing Algorithm

KDnuggets

Calculus for Data Science • Real-time Translations with AI • Using Numpy's argmax() • Using the apply() Method with Pandas DataFrames • An Introduction to Hill Climbing Algorithm in AI.

Algorithm 141

Using the Airflow ShortCircuitOperator to Stop Bad Data From Reaching ETL Pipelines 

Monte Carlo

I’m a huge fan of Apache Airflow and how the open source tool enables data engineers to scale data pipelines by more precisely orchestrating workloads. But what happens when Airflow testing doesn’t catch all of your bad data? What if “unknown unknown” data quality issues fall through the cracks and affect your Airflow jobs? One helpful but underutilized solution is to leverage the Airflow ShortCircuitOperator to create data circuit breakers to prevent bad data from flowing across your data

Understanding the components of the dbt Semantic Layer

dbt Developer Hub

TLDR: The Semantic Layer is made up of a combination of open-source and SaaS offerings and is going to change how your team defines and consumes metrics. At last year's Coalesce, Drew showed us the future - a vision of what metrics in dbt could look like. Since then, we've been getting the infrastructure in place to make that vision a reality. We wanted to share with you where we are today and how it fits into the broader picture of where we're going.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

How to Become a Data Scientist in 2022: The Ultimate Guide

Emeritus

Data science has become an integral part of every company, especially those that understand the value of data and what can be done with that information. The primary role of a data scientist is to extract actionable insights from complex data to inform your business decisions. If you are wondering how to become a data…

Using Scikit-learn’s Imputer

KDnuggets

Learn about Scikit-learn’s SimpleImputer, IterativeImputer, KNNImputer, and machine learning pipelines.

Q&A Picnic Data Engineering Series

Picnic Engineering

The most important thing for a successful analytics strategy. Data Mesh, or Hub-and-Spoke? Is “lakeless” a thing!? … and other reflections on building data governance. Since the publication of the first blog post in this series, we have received numerous questions via social media, direct messages, public posts, and meet-up discussions. It’s been truly amazing to see so much interest and, as promised, we will address the most frequently raised topics in this post.

Updating our permissioning guidelines: grants as configs in dbt Core v1.2

dbt Developer Hub

If you’ve needed to grant access to a dbt model between 2019 and today, there’s a good chance you’ve come across the "The exact grant statements we use in a dbt project" post on Discourse. It explained options for covering two complementary abilities: querying relations via the "select" privilege, and using the schema those relations are within via the "usage" privilege. The solution then: prior to dbt Core v1.2, we proposed three possible approaches (each coming with caveats and trade-offs): Using

BI 52

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Difference Between Spring and Spring Boot

U-Next

Introduction. Spring Framework (Spring) is an open-source application framework that provides infrastructure assistance to develop Java applications. Spring is one of the most popular Java Enterprise Edition (Java EE) frameworks, which assists developers in creating high-performance applications using plain old Java objects (POJOs). It is used for developing stand-alone, production-grade applications on the Java Virtual Machine (JVM).

Java 52

How do I do that in Python?

KDnuggets

This book from Manning is full of techniques and best practices for writing readable and maintainable Python code, with careful cross-referencing that reveals how the same concept can be used in different contexts.

Python 134

AI in Manufacturing: 5 Successful Use Cases of AI-Based Technologies

AltexSoft

In October 2019, Microsoft reported that artificial intelligence helped manufacturing companies outperform rivals, stating that manufacturers adopting AI perform 12 percent better than their competitors. Therefore, we are likely to see an outburst of AI-based technologies in manufacturing, along with the advent of new highly paid workplaces in this area. In this article, we’ll highlight 5 use cases of adopting AI-based technologies in manufacturing.

What Is the Difference Between a Database and a Warehouse in Snowflake?

Propel Data

Snowflake uses databases for data storage, while a “Snowflake warehouse” is a virtual computing cluster that processes analytical queries.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Working in Cyber Security

U-Next

Is working in cyber security your dream job? If yes, this is the right place for you to learn how to become a cyber security expert and your role in the tech industry. Introduction. Cybersecurity aims at preventing cyber threats and protecting information and information systems. It includes protecting the company’s valuable information, hardware, software, and network.

How ML Model Explainability Accelerates the AI Adoption Journey for Financial Services

KDnuggets

Explainability and good model governance reduce risk and create the framework for ethical and transparent AI in financial services that eliminates bias.

Growth Engineering at Zalando

Zalando Engineering

We recently closed out our annual performance review for employees. Naturally, this period is for us to focus on how we are performing, what we aspire to achieve, and how we can progress towards those goals, with the support of our leads. As a leader, I’ve spent a great deal of time working with Software Engineers on their development, and helping them to drive their career progression.

How to build Snowflake data apps with GraphQL

Propel Data

Need to build a Snowflake data app? Here's how to create and query a Metric on top of a Snowflake data warehouse using Propel’s GraphQL API.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.