Sat. Dec 14, 2019 - Fri. Dec 20, 2019

Interpretability part 3: LIME and SHAP

KDnuggets

The third part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers methods that try to explain each prediction instead of establishing a global explanation.

157

Apache Kafka Producer Improvements with the Sticky Partitioner

Confluent

The amount of time it takes for a message to move through a system plays a big role in the performance of distributed systems like Apache Kafka®. In Kafka, the […].

Kafka 26

Uber Infrastructure in 2019: Improving Reliability, Driving Customer Satisfaction

Uber Engineering

Every day around the world, millions of trips take place across the Uber network, giving users more reliable transportation through ridesharing, bikes, and scooters, drivers and truckers additional opportunities to earn, employees and employers more convenient business travel, and hungry …

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

Summary: Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools.

Metadata 100

The 4 fastest ways not to get hired as a data scientist

KDnuggets

Ready to try to get hired as a data scientist for the first time? Avoiding these common mistakes won’t guarantee an offer, but not avoiding them is a surefire way for your application to be tossed into the trash bin.

Data 156

The Easiest Way to Install Apache Kafka and Confluent Platform – Using Ansible

Confluent

With Confluent Platform 5.3, we are actively embracing the rising DevOps movement by introducing CP-Ansible, our very own open source Ansible playbooks for deployment of Apache Kafka® and the Confluent […].

Kafka 22

More Trending

DBLog: A Generic Change-Data-Capture Framework

Netflix Tech

Andreas Andreakis, Ioannis Papapanagiotou. Overview: Change-Data-Capture (CDC) allows capturing committed changes from a database in real time and propagating those changes to downstream consumers [1][2]. CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and Elasticsearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions [3][4].

MySQL 89

Automatic Text Summarization in a Nutshell

KDnuggets

Marketing scientist Kevin Gray asks Dr. Anna Farzindar of the University of Southern California about Automatic Text Summarization and the various ways it is used.

IT 152

What’s New in Apache Kafka 2.4

Confluent

On behalf of the Apache Kafka® community, it is my pleasure to announce the release of Apache Kafka 2.4.0. This release includes a number of key new features and improvements […].

Kafka 21

How Dataquest Made the Difference for Stacey’s Data Job

Dataquest

Today, Stacey Ustian is a data engineer. But the path that led her here wasn’t always easy, and there were a few bumps and twists along the way. Her journey to data science started in a rather unusual place: the law library. After earning her Master’s degree in Library and Information Science, Stacey had taken a job working in the library of a law firm.

SQL 52

Alternative Cloud Hosted Data Science Environments

KDnuggets

Over the years, new alternative providers have arisen to provide a single data science environment hosted in the cloud, where data scientists can analyze, host, and share their work.

Testing Kafka Streams Using TestInputTopic and TestOutputTopic

Confluent

As a test class that allows you to test Kafka Streams logic, TopologyTestDriver is a lot faster than utilizing EmbeddedSingleNodeKafkaCluster and makes it possible to simulate different timing scenarios. Not […].

Kafka 19

Keeping a Lid on Concurrency within the Vantage Platform

Teradata

Carrie Ballinger discusses the techniques for managing concurrency inside the Advanced SQL Engine and the benefits they provide.

SQL 49

Industry AI, Analytics, Machine Learning, Data Science Predictions for 2020

KDnuggets

Predictions for 2020 from a dozen innovative companies in the AI, Analytics, Machine Learning, Data Science, and Data industries.

Superset Announces Elasticsearch Support!

Preset

Announcing Elasticsearch in Superset, powered by a new open-source Python library from Preset.

Python 40

6 Practices to Realize a Long-Term Data Vision Through Near-Term Work

Teradata

Enterprises either have no data strategy at all or an overcomplicated one that underdelivers. Find out how to create an effective data strategy by striking a balance.

Data 40

Google’s New Explainable AI Service

KDnuggets

Google has started offering a new service for “explainable AI” or XAI, as it is fashionably called. The tools currently on offer are modest, but the intent is in the right direction.

IT 134

Let’s Build an Intelligent Chatbot

KDnuggets

Check out this step-by-step approach to building an intelligent chatbot in Python.

Building 123

The Most In Demand Tech Skills for Data Scientists

KDnuggets

By the end of this article you’ll know which technologies are becoming more popular with employers and which are becoming less popular.

The Ultimate Guide to Model Retraining

KDnuggets

Once you have deployed your machine learning model into production, differences in real-world data will result in model drift. So, retraining and redeploying will likely be required. In other words, deployment should be treated as a continuous process. This guide defines model drift and how to identify it, and includes approaches to enable model training.

How to Convert an RGB Image to Grayscale

KDnuggets

This post is about working with a mixture of color and grayscale images and needing to transform them into a uniform format - all grayscale. We'll be working in Python using the Pillow, Numpy, and Matplotlib packages.

Python 114

5 Ways to Apply Ethics to AI

KDnuggets

Here are six more lessons based on real life examples that I think we should all remember as people working in machine learning, whether you’re a researcher, engineer, or a decision-maker.

Pedestrian Detection Using Non Maximum Suppression Algorithm

KDnuggets

Read this overview of a complete pipeline for detecting pedestrians on the road.

How To “Ultralearn” Data Science: optimization learning, Part 3

KDnuggets

This third part in a series about how to "ultralearn" data science will guide you through how to optimize your learning through five valuable techniques.

Microsoft Introduces Icebreaker to Address the Famous Ice-Start Challenge in Machine Learning

KDnuggets

The new technique allows the deployment of machine learning models that operate with minimal training data.

Top 2019 Stories: Top 10 Technology Trends of 2019; How to select rows and columns in Pandas

KDnuggets

Also: Your AI skills are worth less than you think; Another 10 Free Must-See Courses for Machine Learning and Data Science.

How To “Ultralearn” Data Science: removing distractions and finding focus, Part 2

KDnuggets

This second part in a series about how to "ultralearn" data science will guide you through several techniques to remove those distractions -- because your focus needs more focus.

Ontotext Platform 3.0 for Enterprise Knowledge Graphs Released

KDnuggets

Ontotext Platform 3.0 features significant technology improvements to enable simpler and faster graph navigation, including GraphQL interfaces to make it easier for application developers to access knowledge graphs without tedious development of back-end APIs or complex SPARQL.
