Stop Blaming Humans for Bias in AI
KDnuggets
NOVEMBER 19, 2021
Can artificial intelligence be rid of bias? This is an important question, and it’s equally important that we look in the right place for the answer.
Azure Data Engineering
NOVEMBER 15, 2021
In one of the previous posts, we discussed how to use the Validation activity to design a Pipeline that waits for a scheduled time and retries. There is another way to introduce a delay in a Pipeline: the Wait activity pauses execution of the Pipeline for a fixed amount of time. This is useful in scenarios where we want the execution of the Pipeline to be paused for some time but not cancelled.
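To make the Wait activity concrete, here is a minimal Python sketch that builds the JSON definition of a pipeline containing one. It assumes the standard Data Factory activity schema (`"type": "Wait"` with `typeProperties.waitTimeInSeconds`, and `dependsOn` for ordering); the pipeline and activity names are hypothetical, and the downstream step is only a stand-in for whatever real activity should run after the pause.

```python
import json


def wait_activity(name: str, seconds: int) -> dict:
    """Build a Wait activity that pauses the pipeline for a fixed delay."""
    if seconds <= 0:
        raise ValueError("seconds must be a positive integer")
    return {
        "name": name,
        "type": "Wait",
        "typeProperties": {"waitTimeInSeconds": seconds},
    }


def run_after(activity: dict, predecessor: str) -> dict:
    """Return a copy of `activity` that starts only after `predecessor` succeeds."""
    chained = dict(activity)
    chained["dependsOn"] = [
        {"activity": predecessor, "dependencyConditions": ["Succeeded"]}
    ]
    return chained


# Hypothetical names; "DownstreamStep" stands in for the real work that
# should happen once the pause is over.
pause = wait_activity("PauseBeforeNextStep", 300)
downstream = run_after(
    {"name": "DownstreamStep", "type": "Wait", "typeProperties": {"waitTimeInSeconds": 1}},
    pause["name"],
)
pipeline = {"name": "PausedPipeline", "properties": {"activities": [pause, downstream]}}

print(json.dumps(pipeline, indent=2))
```

The 300-second delay is an arbitrary example value; the pipeline run stays in progress during the pause rather than being cancelled.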
Cloudera
NOVEMBER 17, 2021
It’s no secret that Data Scientists have a difficult job. It feels like a lifetime ago that everyone was talking about data science as the sexiest job of the 21st century. Heck, it was so long ago that people were still meeting in person! Today, the sexy is starting to lose its shine. There’s recognition that it’s nearly impossible to find the unicorn data scientist that was the apple of every CEO’s eye in 2012.
Confluent
NOVEMBER 19, 2021
Imagine that you have real-time data about what’s happening in the stock market, and you want to support a large number of customized dashboards displaying the data as it comes […].
KDnuggets
NOVEMBER 19, 2021
The terms ‘data science’ and ‘machine learning’ are often used interchangeably. But while they are related, there are some glaring differences, so let’s look at how the two disciplines differ, specifically when it comes to programming.
Data Engineering Podcast
NOVEMBER 14, 2021
Summary: The most important gauge of success for a data platform is the level of trust in the accuracy of the information that it provides. In order to build and maintain that trust it is necessary to invest in defining, monitoring, and enforcing data quality metrics. In this episode Michael Harper advocates for proactive data quality and starting with the source, rather than being reactive and having to work backwards from when a problem is found.
Data Engineering Digest brings together the best content for data engineering professionals from the widest variety of industry thought leaders.
Confluent
NOVEMBER 16, 2021
Some call it a challenge. Others call it a community. Whatever you call it, 100 Days Of Code is a bunch of fun and a great learning experience that helps […].
KDnuggets
NOVEMBER 19, 2021
Want to know the difference between distributed and federated learning? Read this article to find out.
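One way to see the difference: in classic distributed training the data is usually pooled under one owner and split across workers, whereas in federated learning each client keeps its raw data and shares only model updates. The sketch below illustrates the federated side with a toy version of federated averaging (FedAvg) on a one-parameter linear model y ≈ w * x. The client datasets, learning rate, and round counts are made-up values chosen for illustration, not taken from the article.

```python
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_update(w: float, data: Dataset, lr: float = 0.01, epochs: int = 5) -> float:
    """Gradient descent on mean squared error for y ~ w * x, using only local data."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_averaging(clients: List[Dataset], rounds: int = 20) -> float:
    """Toy FedAvg: clients train locally, the server averages their weights."""
    w = 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Each client starts from the current global weight.
        local_weights = [local_update(w, d) for d in clients]
        # The server sees only the weights, never the raw (x, y) pairs,
        # and weights each client by its number of examples.
        w = sum(lw * len(d) for lw, d in zip(local_weights, clients)) / total
    return w


# Hypothetical client data; both clients happen to follow y = 3x.
clients = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)],
]
print(round(federated_averaging(clients), 3))
```

Real federated systems add client sampling, secure aggregation, and handling of non-identically distributed data, none of which this sketch attempts.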
DataKitchen
NOVEMBER 18, 2021
For several years now, the elephant in the room has been that data and analytics projects are failing. Gartner estimated that 85% of big data projects fail. Data from NewVantage Partners showed that the share of data-driven organizations has actually declined from 37% several years ago to 24%, and that only 29% of organizations are achieving transformational outcomes from their data.
Cloudera
NOVEMBER 16, 2021
With the general availability of Cloudera DataFlow for the Public Cloud (CDF-PC), our customers can now self-serve deployments of Apache NiFi data flows on Kubernetes clusters in a cost-effective way, with autoscaling, resource isolation, and monitoring with KPI-based alerting. You can find more information in this release announcement blog post and in this technical deep dive blog post.
Netflix Tech
NOVEMBER 15, 2021
Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Michael Lindon, and Colin McFarland. This is the fifth post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on our products. Need to catch up? Have a look at Part 1 (Decision Making at Netflix), Part 2 (What is an A/B Test?), Part 3 (False positives and statistical significance), and Part 4 (False negatives and power).
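The series' themes of false positives and statistical significance can be illustrated with a standard two-proportion z-test comparing conversion rates between a control (A) and a treatment (B). This is a generic textbook test written in plain Python, not necessarily the methodology Netflix uses, and the counts in the example are invented.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Under the null hypothesis both groups share one conversion rate.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 100/1000 conversions in A, 130/1000 in B.
z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

Choosing the 5% threshold fixes the false positive rate you are willing to accept; the power to detect a real effect (the topic of Part 4) depends separately on sample size and effect size.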
KDnuggets
NOVEMBER 18, 2021
Natural language processing research and applications are moving forward rapidly. Several trends have emerged from this progress, pointing to a future of more exciting possibilities and interesting opportunities in the field.
Confluent
NOVEMBER 18, 2021
We’re pleased to announce ksqlDB 0.22.0! This release includes source streams and source tables as well as improved pull query (for key-range predicates) and push query performance. All of these […].
Cloudera
NOVEMBER 15, 2021
The word “data” is ubiquitous in narratives of the modern world. And data, the thing itself, is vital to the functioning of that world. This blog discusses quantifications, types, and implications of data. If you’ve ever wondered how much data there is in the world, what types there are and what that means for AI and businesses, then keep reading! Quantifications of data.
AltexSoft
NOVEMBER 17, 2021
If you’ve ever been to a bookstore, you probably know the dilemma of the book location. Say you’re looking for “Atlas Shrugged”, and you know it’s a mix of science fiction, mystery, and romance genres. Now, which bookshelf will you go for to find it? Should it be on the science fiction or on the romance shelf? The problem of document classification pertains to the library, information, and computer sciences.
KDnuggets
NOVEMBER 17, 2021
We describe types of recommender systems, more specifically, algorithms and methods for content-based systems, collaborative filtering, and hybrid systems.
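As a small illustration of the collaborative-filtering family, here is a sketch of item-based collaborative filtering: items are compared by cosine similarity over the users who rated them, and each item a user has not seen is scored by a similarity-weighted average of that user's own ratings. The ratings dictionary and item names are hypothetical toy data.

```python
import math
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (missing entries count as 0)."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    """Transpose user->item ratings into item->user rating vectors."""
    items: Dict[str, Dict[str, float]] = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def recommend(ratings: Ratings, user: str, k: int = 2) -> List[str]:
    items = item_vectors(ratings)
    seen = ratings[user]
    scores: Dict[str, float] = {}
    for candidate, vec in items.items():
        if candidate in seen:
            continue
        # Weight the user's own ratings by each rated item's similarity to the candidate.
        sims = [(cosine(vec, items[i]), r) for i, r in seen.items()]
        denom = sum(abs(s) for s, _ in sims)
        if denom > 0:
            scores[candidate] = sum(s * r for s, r in sims) / denom
    return sorted(scores, key=scores.get, reverse=True)[:k]


ratings: Ratings = {
    "ana": {"matrix": 5, "inception": 4, "notebook": 1},
    "ben": {"matrix": 4, "inception": 5, "interstellar": 5},
    "cy": {"notebook": 5, "titanic": 4, "matrix": 1},
    "dee": {"matrix": 5, "interstellar": 4, "titanic": 1},
}
print(recommend(ratings, "ana"))
```

A content-based system would instead compare item attributes (genre, keywords), and a hybrid system would blend both kinds of score.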
DataKitchen
NOVEMBER 19, 2021
Learn how a DataOps Process Hub enables Business Analysts to rapidly answer stakeholders' analytic questions without waiting on the centralized IT Team. The post Solve the Analytics Last-Mile Problem with a DataOps Process Hub first appeared on DataKitchen.
Cloudera
NOVEMBER 18, 2021
With so many impactful and innovative projects being carried out by our customers using the Cloudera platform, selecting the winners of our annual Data Impact Awards (DIA) is never an easy task. Not ones to shy away from a challenge, our expert judges have deliberated and combed through the finalist entries, identifying the customers who are leading industry change and inspiring peers with their data achievements.
ProjectPro
NOVEMBER 18, 2021
Data science has been a trending buzzword in recent times. With wide applications in sectors such as healthcare, education, retail, transportation, media, and banking, data science is at the core of pretty much every industry out there. The possibilities range from fraud analysis in the finance sector to personalized recommendations on e-commerce sites.
KDnuggets
NOVEMBER 16, 2021
The field of computer vision has seen the development of very powerful applications leveraging machine learning. These projects will introduce you to these techniques and guide you to more advanced practice to gain a deeper appreciation for the sophistication now available.
Teradata
NOVEMBER 17, 2021
Many Teradata customers are interested in integrating Vantage with Microsoft Azure first party services. This guide will help you connect Teradata QueryGrid to Azure HDInsight.
Grouparoo
NOVEMBER 16, 2021
For organizations that manage large volumes of data, leveraging maximum value from the information buried in the data can be a challenge. Breaking silos and collating data into a coherent set of information for processing will yield business benefits. Still, this is only possible once information is in a form enabling the application of analytical techniques.
ProjectPro
NOVEMBER 17, 2021
Emotions are essential, not only in personal life but in business as well. How your customers and target audience feel about your products or brand provides you with the context necessary to evaluate and improve the product, business, marketing, and communications strategy. Sentiment analysis or opinion mining helps researchers and companies extract insights from user-generated social media and web content.
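The simplest form of sentiment analysis is lexicon-based scoring: sum the polarity of known words and flip the sign of a word that follows a negator. The Python sketch below uses a tiny hand-made lexicon that is purely hypothetical; practical systems typically rely on curated lexicons or trained models, and this is not presented as the approach described in the article.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight.
LEXICON = {
    "love": 2.0, "great": 2.0, "good": 1.0, "fine": 0.5,
    "slow": -1.0, "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text: str) -> float:
    """Sum word polarities, flipping a word's sign when the previous token negates it."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0.0)
        if i > 0 and tokens[i - 1] in NEGATORS:
            value = -value  # "not good" counts as negative
        score += value
    return score


def label(text: str) -> str:
    s = sentiment_score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


for review in ["I love this product, shipping was great",
               "Not good, support was terrible",
               "It arrived on Tuesday"]:
    print(label(review), sentiment_score(review))
```

Rules like this miss sarcasm, context, and domain-specific vocabulary, which is why opinion mining on social media content usually moves to trained models.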
KDnuggets
NOVEMBER 16, 2021
Learn how to effectively communicate your work.
Preset
NOVEMBER 15, 2021
This post showcases how we at Preset achieve zero-downtime web application deployments on Kubernetes (AWS EKS) with zero failed requests.
Grouparoo
NOVEMBER 15, 2021
Application Programming Interfaces (APIs) are an integral part of modern software development and enable a wide variety of applications and workflows. Enterprises are becoming increasingly reliant on APIs to connect effectively with partners and customers. APIs come in an array of types and protocols, each suited to different scenarios. In this article, we’ll examine the different types of APIs used in software development today.
ProjectPro
NOVEMBER 16, 2021
Artificial intelligence is a set of techniques that enable machines to imitate human behavior. Today, AI is touted as instrumental in enabling Industry 4.0 for organizations of all shapes and sizes across all industry verticals. The use of AI applications is continuously expanding, and tech enthusiasts must keep up with this fast-changing sector, especially with open source AI projects, to deploy AI-driven projects successfully.
KDnuggets
NOVEMBER 17, 2021
Faker is a Python library that generates fake data to supplement or take the place of real-world data. See how it can be used for data science.
Afterpay Tech
NOVEMBER 14, 2021
By Dorien Koelemeijer. Just like the cliché says, security is only as strong as the weakest link. We know that a single Identity and Access Management (IAM) misconfiguration in our AWS environment can lead to compromise of our entire cloud environment. It’s the task of the Security Team at Afterpay to manage this risk, while at the same time making sure that engineers are not slowed down in their everyday tasks.
dbt Developer Hub
NOVEMBER 14, 2021
Hi there! Before I get to the goods, I just wanted to quickly flag that Coalesce is less than 3 weeks away! If you had to choose just ONE of the 60+ sessions on tap, consider Tristan's keynote with A16z's Martin Casado. It has two of my favorite elements: 1) Spice 2) Not-actually-about-us. Martin and Tristan will discuss something we've all probably considered with the latest wave of innovation (and funding) in our space: Is the modern data stack just another wave in a long string of trend