Detecting Data Drift for Ensuring Production ML Model Quality Using Eurybia
KDnuggets
JULY 26, 2022
This article will focus on a step-by-step data drift study using Eurybia, an open-source Python library.
Confluent
JULY 26, 2022
Explore GitHub Actions for your Kafka CI/CD pipeline, automate Schema Registry, and transform the development and testing of Kafka client applications.
Data Engineering Podcast
JULY 24, 2022
Summary: The current stage of evolution in the data management ecosystem has resulted in domain- and use-case-specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in bringing insights about external tools’ dependency graphs into one place through its "software-defined assets" functionality.
Teradata
JULY 25, 2022
For many, banking is now a digital activity. But the financial services industry still trails many others in leveraging cloud technologies to build deeper, emotional attachments to their customers.
KDnuggets
JULY 27, 2022
Here are some best practices and techniques for domain-specific model adaptation that worked for us time and again.
Cloudera
JULY 29, 2022
Corporations are generating unprecedented volumes of data, especially in industries such as telecom and financial services (FSI). Many organizations are hoping to leverage these massive amounts of data by investing heavily in big data solutions – solutions that they hope can meet business goals such as increasing customer satisfaction, uncovering alternative revenue streams, or improving operational efficiency.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Confluent
JULY 28, 2022
Complete guide to data pipelines, data integration, and modern data flow, the key to next generation, data-driven applications, systems, and organizations.
KDnuggets
JULY 26, 2022
The 5 hardest things Josh Berry, a 15-year analytics professional, experienced while switching from Python to SQL, with examples, SQL code, and a resource to customize the SQL to your own project.
Rockset
JULY 29, 2022
Enterprise data warehouses (EDWs) became necessary in the 1980s when organizations shifted from using data for operational decisions to using data to fuel critical business decisions. Data warehouses differ from operational databases in that while operational transactional databases collate data for multiple transactional purposes, data warehouses aggregate this transactional data for analytics.
Databand.ai
JULY 28, 2022
What is Data Lineage? By Niv Sluzki. The term “data lineage” has been thrown around a lot over the last few years. What started as an idea of connecting between datasets quickly became a very confusing term that now gets misused often. It’s time to put order to the chaos and dig deep into what it really is. Because the answer matters quite a lot.
Monte Carlo
JULY 28, 2022
There is a virtually unlimited number of ways data can break. It could be a bad JOIN statement, an untriggered Airflow job, or even just someone at a third-party provider who didn’t feel like hitting the send button that day. But perhaps one of the most common reasons for data quality challenges is software feature updates and other changes made upstream by software engineers.
KDnuggets
JULY 25, 2022
Looking for a great course to go from machine learning zero to hero quickly? fast.ai has released the latest version of Practical Deep Learning For Coders. And it won't cost you a thing.
Rockset
JULY 28, 2022
MongoDB has grown from a basic JSON key-value store to one of the most popular NoSQL database solutions in use today. It is widely supported and provides flexible JSON document storage at scale. It also provides native querying and analytics capabilities. These attributes have led to MongoDB being widely adopted, especially alongside JavaScript web applications.
U-Next
JULY 27, 2022
Every layer of business operations today uses the power of metrics and analytics to enhance their market growth and business success. With the fourth industrial revolution increasing the dependency on emerging technologies like Data Science, Cloud Computing, IoT, Business Analytics, etc., the need to master the nuances of the same is relatively high.
Emeritus
JULY 26, 2022
Data science has become an integral part of every company, especially those who understand the value of data and what can be done with that information. The primary role of a data scientist is to extract actionable insights from complex data to inform your business decisions. If you are wondering how to become a data… The post How to Become a Data Scientist in 2022: The Ultimate Guide appeared first on Emeritus Online Courses.
KDnuggets
JULY 27, 2022
Calculus for Data Science • Real-time Translations with AI • Using Numpy's argmax() • Using the apply() Method with Pandas DataFrames • An Introduction to Hill Climbing Algorithm in AI.
Monte Carlo
JULY 26, 2022
I’m a huge fan of Apache Airflow and how the open source tool enables data engineers to scale data pipelines by more precisely orchestrating workloads. But what happens when Airflow testing doesn’t catch all of your bad data? What if “unknown unknown” data quality issues fall through the cracks and affect your Airflow jobs? One helpful but underutilized solution is to leverage the Airflow ShortCircuitOperator to create data circuit breakers to prevent bad data from flowing across your data
dbt Developer Hub
JULY 26, 2022
TLDR: The Semantic Layer is made up of a combination of open-source and SaaS offerings and is going to change how your team defines and consumes metrics. At last year's Coalesce, Drew showed us the future: a vision of what metrics in dbt could look like. Since then, we've been getting the infrastructure in place to make that vision a reality. We wanted to share with you where we are today and how it fits into the broader picture of where we're going.
Picnic Engineering
JULY 26, 2022
The most important thing for a successful analytics strategy. Data Mesh, or Hub-and-Spoke? Is “lakeless” a thing!? … and other reflections on building data governance. Since the publication of the first blog post in this series, we have received numerous questions via social media, direct messages, public posts, and meet-up discussions. It’s been truly amazing to see so much interest and, as promised, we will address the most frequently raised topics in this post.
KDnuggets
JULY 25, 2022
Learn about Scikit-learn’s SimpleImputer, IterativeImputer, KNNImputer, and machine learning pipelines.
Monte Carlo
JULY 25, 2022
There’s a lot of content out there about why a data mesh is (or isn’t) the best thing since sliced bread. But one thing’s for sure: if you can’t trust the data powering your analytics architecture, it’s hard to justify the investment. Here’s how Snowflake and Monte Carlo are working together to help data teams realize the potential of the data mesh with end-to-end data observability.
dbt Developer Hub
JULY 25, 2022
If you’ve needed to grant access to a dbt model between 2019 and today, there’s a good chance you’ve come across the "The exact grant statements we use in a dbt project" post on Discourse. It explained options for covering two complementary abilities: querying relations via the "select" privilege, and using the schema those relations are within via the "usage" privilege. The solution then: Prior to dbt Core v1.2, we proposed three possible approaches (each coming with caveats and trade-offs): Using
U-Next
JULY 25, 2022
Introduction. Spring Framework (Spring) is an open-source application framework that provides infrastructure assistance to develop Java applications. Spring is one of the most popular Java Enterprise Edition (Java EE) frameworks, which assists developers in creating high-performance applications using plain old Java objects (POJOs). It is used for developing stand-alone, production-grade applications on the Java Virtual Machine (JVM).
KDnuggets
JULY 28, 2022
This book from Manning is full of techniques and best practices for writing readable and maintainable Python code, with careful cross-referencing that reveals how the same concept can be used in different contexts.
AltexSoft
JULY 23, 2022
In October 2019, Microsoft reported that artificial intelligence helped manufacturing companies outperform rivals, stating that manufacturers adopting AI perform 12 percent better than their competitors. Therefore, we are likely to see a surge of AI-based technologies in manufacturing, along with the advent of new, highly paid jobs in this area. In this article, we’ll highlight 5 use cases of adopting AI-based technologies in manufacturing.
Propel Data
JULY 27, 2022
Snowflake uses databases for data storage, while a “Snowflake warehouse” is a virtual computing cluster that processes analytical queries.
U-Next
JULY 25, 2022
Is working in cyber security your dream job? If yes, this is the right place for you to learn how to become a cyber security expert and your role in the tech industry. Introduction. Cybersecurity aims at preventing cyber threats and protecting information and information systems. It includes protecting the company’s valuable information, hardware, software, and network.
KDnuggets
JULY 29, 2022
Explainability and good model governance reduce risk and create the framework for ethical and transparent AI in financial services that eliminates bias.
Zalando Engineering
JULY 25, 2022
We recently closed out our annual performance review for employees. Naturally, this period is for us to focus on how we are performing, what we aspire to achieve, and how we can progress towards those goals, with the support of our leads. As a leader, I’ve spent a great deal of time working with Software Engineers on their development, and helping them to drive their career progression.
Propel Data
JULY 25, 2022
Need to build a Snowflake data app? Here's how to create and query a Metric on top of a Snowflake data warehouse using Propel’s GraphQL API.