Detecting Data Drift for Ensuring Production ML Model Quality Using Eurybia
KDnuggets
JULY 26, 2022
This article will focus on a step-by-step data drift study using Eurybia, an open-source Python library.
Confluent
JULY 26, 2022
Explore GitHub Actions for your Kafka CI/CD pipeline, automate Schema Registry, and transform the development and testing of Kafka client applications.
Data Engineering Podcast
JULY 24, 2022
Summary: The current stage of evolution in the data management ecosystem has resulted in domain- and use-case-specific orchestration capabilities being incorporated into various tools. This complicates the work involved in making end-to-end workflows visible and integrated. Dagster has invested in bringing insights about external tools’ dependency graphs into one place through its "software defined assets" functionality.
Cloudera
JULY 29, 2022
Corporations are generating unprecedented volumes of data, especially in industries such as telecom and financial services (FSI). Many organizations are hoping to leverage these massive amounts of data by investing heavily in big data solutions – solutions that they hope can meet business goals such as increasing customer satisfaction, uncovering alternative revenue streams, or improving operational efficiency.
KDnuggets
JULY 27, 2022
Here are some best practices and techniques for domain-specific model adaptation that worked for us time and again.
Teradata
JULY 25, 2022
For many, banking is now a digital activity. But the financial services industry still trails many others in leveraging cloud technologies to build deeper, emotional attachments to their customers.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Confluent
JULY 28, 2022
Complete guide to data pipelines, data integration, and modern data flow, the key to next generation, data-driven applications, systems, and organizations.
KDnuggets
JULY 26, 2022
The 5 hardest things Josh Berry, a 15-year analytics professional, experienced while switching from Python to SQL, with examples, SQL code, and a resource to customize the SQL to your own project.
AltexSoft
JULY 29, 2022
What does it take to store all New York Times articles published between 1855 and 1922? Depending on how you measure it, the answer will be 11 million newspaper pages or… just one Hadoop cluster and one tech specialist who can move 4 terabytes of textual data to a new location in 24 hours. The biggest star of the Big Data world, Hadoop was named after a yellow stuffed elephant that belonged to the 2-year-old son of computer scientist Doug Cutting.
Rockset
JULY 29, 2022
Enterprise data warehouses (EDWs) became necessary in the 1980s when organizations shifted from using data for operational decisions to using data to fuel critical business decisions. Data warehouses differ from operational databases in that while operational transactional databases collate data for multiple transactional purposes, data warehouses aggregate this transactional data for analytics.
Databand.ai
JULY 28, 2022
What is Data Lineage? By Niv Sluzki, 2022-07-28. The term “data lineage” has been thrown around a lot over the last few years. What started as an idea of connecting datasets quickly became a very confusing term that now gets misused often. It’s time to put order to the chaos and dig deep into what it really is. Because the answer matters quite a lot.
KDnuggets
JULY 25, 2022
Looking for a great course to go from machine learning zero to hero quickly? fast.ai has released the latest version of Practical Deep Learning For Coders. And it won't cost you a thing.
Monte Carlo
JULY 28, 2022
There is a virtually unlimited number of ways data can break. It could be a bad JOIN statement, an untriggered Airflow job, or even just someone at a third-party provider who didn’t feel like hitting the send button that day. But perhaps one of the most common reasons for data quality challenges is software feature updates and other changes made upstream by software engineers.
Rockset
JULY 28, 2022
MongoDB has grown from a basic JSON key-value store to one of the most popular NoSQL database solutions in use today. It is widely supported and provides flexible JSON document storage at scale. It also provides native querying and analytics capabilities. These attributes have caused MongoDB to be widely adopted especially alongside JavaScript web applications.
U-Next
JULY 27, 2022
Every layer of business operations today uses the power of metrics and analytics to enhance their market growth and business success. With the fourth industrial revolution increasing the dependency on emerging technologies like Data Science, Cloud Computing, IoT, Business Analytics, etc., the need to master the nuances of the same is relatively high.
KDnuggets
JULY 27, 2022
Calculus for Data Science • Real-time Translations with AI • Using Numpy's argmax() • Using the apply() Method with Pandas DataFrames • An Introduction to Hill Climbing Algorithm in AI.
Monte Carlo
JULY 26, 2022
I’m a huge fan of Apache Airflow and how the open source tool enables data engineers to scale data pipelines by more precisely orchestrating workloads. But what happens when Airflow testing doesn’t catch all of your bad data? What if “unknown unknown” data quality issues fall through the cracks and affect your Airflow jobs? One helpful but underutilized solution is to leverage the Airflow ShortCircuitOperator to create data circuit breakers to prevent bad data from flowing across your data
dbt Developer Hub
JULY 26, 2022
TLDR: The Semantic Layer is made up of a combination of open-source and SaaS offerings and is going to change how your team defines and consumes metrics. At last year's Coalesce, Drew showed us the future: a vision of what metrics in dbt could look like. Since then, we've been getting the infrastructure in place to make that vision a reality. We wanted to share with you where we are today and how it fits into the broader picture of where we're going.
Emeritus
JULY 26, 2022
Data science has become an integral part of every company, especially those who understand the value of data and what can be done with that information. The primary role of a data scientist is to extract actionable insights from complex data to inform your business decisions. If you are wondering how to become a data… The post How to Become a Data Scientist in 2022: The Ultimate Guide appeared first on Emeritus Online Courses.
KDnuggets
JULY 25, 2022
Learn about Scikit-learn’s SimpleImputer, IterativeImputer, KNNImputer, and machine learning pipelines.
Picnic Engineering
JULY 26, 2022
The most important thing for a successful analytics strategy. Data Mesh, or Hub-and-Spoke? Is “lakeless” a thing!? … and other reflections on building data governance. Since the publication of the first blog post in this series, we have received numerous questions via social media, direct messages, public posts, and meet-up discussions. It’s been truly amazing to see so much interest and, as promised, we will address the most frequently raised topics in this post.
dbt Developer Hub
JULY 25, 2022
If you’ve needed to grant access to a dbt model between 2019 and today, there’s a good chance you’ve come across the "The exact grant statements we use in a dbt project" post on Discourse. It explained options for covering two complementary abilities: querying relations via the "select" privilege, and using the schema those relations are within via the "usage" privilege. The solution then: Prior to dbt Core v1.2, we proposed three possible approaches (each coming with caveats and trade-offs): Using
U-Next
JULY 25, 2022
Introduction. Spring Framework (Spring) is an open-source application framework that provides infrastructure assistance to develop Java applications. Spring is one of the most popular Java Enterprise Edition (Java EE) frameworks, which assists developers in creating high-performance applications using plain old Java objects (POJOs). It is used for developing stand-alone, production-grade applications on the Java Virtual Machine (JVM).
KDnuggets
JULY 28, 2022
This book from Manning is full of techniques and best practices for writing readable and maintainable Python code, with careful cross-referencing that reveals how the same concept can be used in different contexts.
AltexSoft
JULY 23, 2022
In October 2019, Microsoft reported that artificial intelligence helped manufacturing companies outperform rivals, stating that manufacturers adopting AI perform 12 percent better than their competitors. Therefore, we are likely to see a surge of AI-based technologies in manufacturing along with the advent of new highly paid jobs in this area. In this article, we’ll highlight 5 use cases of adopting AI-based technologies in manufacturing.
Propel Data
JULY 27, 2022
Snowflake uses databases for data storage, while a “Snowflake warehouse” is a virtual computing cluster that processes analytical queries.
U-Next
JULY 25, 2022
Is working in cyber security your dream job? If yes, this is the right place for you to learn how to become a cyber security expert and your role in the tech industry. Introduction. Cybersecurity aims at preventing cyber threats and protecting information and information systems. It includes protecting the company’s valuable information, hardware, software, and network.
KDnuggets
JULY 29, 2022
Explainability and good model governance reduce risk and create the framework for ethical and transparent AI in financial services that eliminates bias.
Zalando Engineering
JULY 25, 2022
We recently closed out our annual performance review for employees. Naturally, this period is for us to focus on how we are performing, what we aspire to achieve, and how we can progress towards those goals, with the support of our leads. As a leader, I’ve spent a great deal of time working with Software Engineers on their development, and helping them to drive their career progression.
Propel Data
JULY 25, 2022
Need to build a Snowflake data app? Here's how to create and query a Metric on top of Snowflake data warehouse using Propel’s GraphQL API.