Sat.Nov 02, 2019 - Fri.Nov 08, 2019

article thumbnail

How to Create a Vocabulary for NLP Tasks in Python

KDnuggets

This post will walkthrough a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.

Python 123
article thumbnail

Automating Your Production Dataflows On Spark

Data Engineering Podcast

Summary As data engineers the health of our pipelines is our highest priority. Unfortunately, there are countless ways that our dataflows can break or degrade that have nothing to do with the business logic or data transformations that we write and maintain. Sean Knapp founded Ascend to address the operational challenges of running a production grade and scalable Spark infrastructure, allowing data engineers to focus on the problems that power their business.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

GraphQL Search Indexing

Netflix Tech

by Artem Shtatnov and Ravi Srinivas Ranganathan Almost a year ago we described our learnings from adopting GraphQL on the Netflix Marketing Tech team. We have a lot more to share since then! There are plenty of existing resources describing how to express a search query in GraphQL and paginate the results. This post looks at the other side of search: how to index data and make it searchable.

Kafka 99
article thumbnail

Introducing Confluent Cloud on Microsoft Azure

Confluent

Today, we are proud to make Confluent Cloud available to companies leveraging the Microsoft Azure ecosystem of services, in addition to the previous rollouts on Google Cloud Platform (GCP) and […].

Cloud 81
article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

article thumbnail

10 Free Must-read Books on AI

KDnuggets

Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.

Media 123
article thumbnail

Power to the People: Vantage Analyst in Action

Teradata

The people who drive real business innovation in your org may not all be coders. With Vantage Analyst, they can explore data to uncover insights that may lead to that next big thing.

Data 11

More Trending

article thumbnail

How to Use Single Message Transforms in Kafka Connect

Confluent

Kafka Connect is the part of Apache Kafka® that provides reliable, scalable, distributed streaming integration between Apache Kafka and other systems. Kafka Connect has connectors for many, many systems, and […].

Kafka 74
article thumbnail

Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch

KDnuggets

The new release of PyTorch includes some impressive open source projects for deep learning researchers and developers.

article thumbnail

GraphQL Search Indexing

Netflix Tech

by Artem Shtatnov and Ravi Srinivas Ranganathan Almost a year ago we described our learnings from adopting GraphQL on the Netflix Marketing Tech team. We have a lot more to share since then! There are plenty of existing resources describing how to express a search query in GraphQL and paginate the results. This post looks at the other side of search: how to index data and make it searchable.

Kafka 46
article thumbnail

Analytics on Kafka Event Streams Using Druid, Elasticsearch and Rockset

Rockset

Events are messages that are sent by a system to notify operators or other systems about a change in its domain. With event-driven architectures powered by systems like Apache Kafka becoming more prominent, there are now many applications in the modern software stack that make use of events and messages to operate effectively. In this blog, we will examine the use of three different data backends for event data - Apache Druid , Elasticsearch and Rockset.

Kafka 40
article thumbnail

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Three Distinctly Different Customer Experience Strategies

Teradata

Improving the customer experience is the top priority for CMOs. Find out what the top 3 distinct CX strategies are to drive customer loyalty.

40
article thumbnail

Customer Segmentation Using K Means Clustering

KDnuggets

Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.

Python 122
article thumbnail

GraphQL Search Indexing

Netflix Tech

by Artem Shtatnov and Ravi Srinivas Ranganathan Almost a year ago we described our learnings from adopting GraphQL on the Netflix Marketing Tech team. We have a lot more to share since then! There are plenty of existing resources describing how to express a search query in GraphQL and paginate the results. This post looks at the other side of search: how to index data and make it searchable.

Kafka 40
article thumbnail

Designing Your Neural Networks

KDnuggets

Check out this step-by-step walk through of some of the more confusing aspects of neural nets to guide you to making smart decisions about your neural network architecture.

Designing 122
article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

Set Operations Applied to Pandas DataFrames

KDnuggets

In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.

Datasets 118
article thumbnail

Understanding Boxplots

KDnuggets

A boxplot. It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.

IT 111
article thumbnail

Data Cleaning and Preprocessing for Beginners

KDnuggets

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

article thumbnail

Orchestrating Dynamic Reports in Python and R with Rmd Files

KDnuggets

Do you want to extract csv files with Python and visualize them in R? How does preparing everything in R and make conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use-case using both languages in one analysis.

Python 109
article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

Research Guide: Advanced Loss Functions for Machine Learning Models

KDnuggets

This guide explores research centered on a variety of advanced loss functions for machine learning models.

article thumbnail

Probability Learning: Maximum Likelihood

KDnuggets

The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.

article thumbnail

The Last Defense Against Another AI Winter

KDnuggets

My short answer is this: Yes, another AI Winter will be here if you don’t deploy more ML solutions. You and your Data Science teams are the last line of defense against the AI Winter. You need to solve five key challenges to keep the momentum up.

article thumbnail

3 Reasons to attend Data Natives, 25-26 November, Berlin

KDnuggets

Data Natives is an outstanding conference that lets you meet many talented Data Scientists and Data Professionals. Find your dream company or your dream employee and level up for 2020. Use code DN19_KDNuggets_50 to save.

Data 85
article thumbnail

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

article thumbnail

How to Become a Successful Healthcare Data Analyst

KDnuggets

Are you interested in starting your career in the data analysis domain? Read this informative blog on how to get your career off the ground.

article thumbnail

Top October Stories: How to Become a (Good) Data Scientist; Everything a Data Scientist Should Know About Data Management; The Last SQL Guide for Data Analysis

KDnuggets

Also A European Approach to Master's Degrees in Data Science; How YouTube is Recommending Your Next Video.

article thumbnail

What is Data Science?

KDnuggets

Data Science is pitched as a modern and exciting job offering high satisfaction. Does its reality really live up to the hype? Here, we show what it's really like to work as a Data Scientist.

article thumbnail

KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI

KDnuggets

Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.

article thumbnail

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

article thumbnail

An Eight-Step Checklist for An Analytics Project

KDnuggets

Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.

Project 59
article thumbnail

Meet Neebo: The Virtual Analytics Hub

KDnuggets

Neebo is a SaaS solution that enables analytics teams to connect to, find, combine and collaborate on trusted data assets in hybrid cloud landscapes, and provides a unified access point where they can more effectively leverage all their analytics assets and knowledge. In this blog, we will highlight some of the features of Neebo and how they can completely transform the way analytics teams operate.

Cloud 57
article thumbnail

Top KDnuggets tweets, Oct 30 – Nov 05: Everything a Data Scientist Should Know About Data Management

KDnuggets

Which Data Science Skills are core and which are hot/emerging ones?; The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral DataViz; Microsoft open sources #SandDance, a visual data exploration tool.

article thumbnail

Monitoring Models at Scale

KDnuggets

Catch this Domino webinar on monitoring models at scale, Dec 11 @ 10am PT, covering detecting changes in pattern of real-world data your models are seeing in production, tracking how model accuracy and other quality metrics are changing over time, and getting alerted when health checks fail so that resolution workflows can be triggered.

Data 51
article thumbnail

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m