Sat.Nov 02, 2019 - Fri.Nov 08, 2019

Automating Your Production Dataflows On Spark

Data Engineering Podcast

Summary: As data engineers, the health of our pipelines is our highest priority. Unfortunately, there are countless ways that our dataflows can break or degrade that have nothing to do with the business logic or data transformations that we write and maintain. Sean Knapp founded Ascend to address the operational challenges of running production-grade, scalable Spark infrastructure, allowing data engineers to focus on the problems that power their business.

10 Free Must-read Books on AI

KDnuggets

Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.

Media 123

GraphQL Search Indexing

Netflix Tech

By Artem Shtatnov and Ravi Srinivas Ranganathan. Almost a year ago, we described our learnings from adopting GraphQL on the Netflix Marketing Tech team. We have a lot more to share since then! There are plenty of existing resources describing how to express a search query in GraphQL and paginate the results. This post looks at the other side of search: how to index data and make it searchable.

Kafka 97

Introducing Confluent Cloud on Microsoft Azure

Confluent

Today, we are proud to make Confluent Cloud available to companies leveraging the Microsoft Azure ecosystem of services, in addition to the previous rollouts on Google Cloud Platform (GCP) and […].

Cloud 81

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Tutorial: Building An Analytics Data Pipeline In Python

Dataquest

If you’ve ever wanted to learn Python online with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you to transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our new Data Engineer Path.
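
As a rough illustration of transforming data "through a series of steps", here is a minimal sketch that chains three stages using only the Python standard library. The log format, the function names (`parse`, `unique_visitors`, `count_by_day`), and the sample data are assumptions made for this example and are not taken from the Dataquest tutorial.

```python
from collections import Counter

def parse(lines):
    """Step 1: turn raw text lines into dictionaries (hypothetical log format)."""
    for line in lines:
        ip, day, path = line.split()
        yield {"ip": ip, "day": day, "path": path}

def unique_visitors(records):
    """Step 2: keep only the first record per (day, ip) pair."""
    seen = set()
    for record in records:
        key = (record["day"], record["ip"])
        if key not in seen:
            seen.add(key)
            yield record

def count_by_day(records):
    """Step 3: aggregate the remaining records into visitors per day."""
    return Counter(record["day"] for record in records)

# Made-up sample input: "<ip> <day> <path>" per line.
raw = [
    "10.0.0.1 2019-11-02 /home",
    "10.0.0.1 2019-11-02 /about",
    "10.0.0.2 2019-11-02 /home",
    "10.0.0.1 2019-11-03 /home",
]

# Each step consumes the output of the previous one.
counts = count_by_day(unique_visitors(parse(raw)))
print(dict(counts))
```

Because the first two stages are generators, records flow through one at a time, which is one common way to keep such a pipeline usable on data that arrives continuously.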

How to Create a Vocabulary for NLP Tasks in Python

KDnuggets

This post walks through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
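
To give a sense of what such a vocabulary class might look like, here is a minimal sketch. It is not the implementation from the KDnuggets post: the class layout, the `<unk>` placeholder token, and whitespace tokenization are all assumptions chosen to keep the example short.

```python
from collections import Counter

class Vocabulary:
    """Hypothetical minimal vocabulary: maps tokens to integer ids and counts them."""

    def __init__(self, unk_token="<unk>"):
        self.unk_token = unk_token
        self.token_to_id = {unk_token: 0}   # id 0 is reserved for unknown tokens
        self.id_to_token = [unk_token]
        self.counts = Counter()             # per-token frequency metadata

    def add_sentence(self, sentence):
        # Assumption: lowercase + whitespace split stands in for real preprocessing.
        for token in sentence.lower().split():
            self.counts[token] += 1
            if token not in self.token_to_id:
                self.token_to_id[token] = len(self.id_to_token)
                self.id_to_token.append(token)

    def encode(self, sentence):
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(t, unk_id) for t in sentence.lower().split()]

    def __len__(self):
        return len(self.id_to_token)

vocab = Vocabulary()
vocab.add_sentence("the cat sat")
vocab.add_sentence("the dog sat")
encoded = vocab.encode("the bird sat")
print(encoded)  # → [1, 0, 3]; "bird" was never added, so it maps to the unknown id
```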

Python 113

More Trending

How to Use Single Message Transforms in Kafka Connect

Confluent

Kafka Connect is the part of Apache Kafka® that provides reliable, scalable, distributed streaming integration between Apache Kafka and other systems. Kafka Connect has connectors for many, many systems, and […].

Kafka 75

Analytics on Kafka Event Streams Using Druid, Elasticsearch and Rockset

Rockset

Events are messages that are sent by a system to notify operators or other systems about a change in its domain. With event-driven architectures powered by systems like Apache Kafka becoming more prominent, there are now many applications in the modern software stack that make use of events and messages to operate effectively. In this blog, we will examine the use of three different data backends for event data: Apache Druid, Elasticsearch and Rockset.

Kafka 40

Customer Segmentation Using K Means Clustering

KDnuggets

Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.

Python 104

Apache Airflow®: The Ultimate Guide to DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Three Distinctly Different Customer Experience Strategies

Teradata

Improving the customer experience is the top priority for CMOs. Find out what the top 3 distinct CX strategies are to drive customer loyalty.

Designing Your Neural Networks

KDnuggets

Check out this step-by-step walkthrough of some of the more confusing aspects of neural nets to guide you to making smart decisions about your neural network architecture.

Designing 102

Set Operations Applied to Pandas DataFrames

KDnuggets

In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
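
As a hedged sketch of how the three set operations can be expressed on DataFrame rows, the example below uses `pd.concat`, `drop_duplicates`, and `DataFrame.merge`. The two sample frames and their column names are made up for illustration, and the tutorial itself may use a different approach. The sketch also assumes neither frame contains duplicate rows of its own.

```python
import pandas as pd

# Made-up sample data: two frames with the same columns.
a = pd.DataFrame({"id": [1, 2, 3], "name": ["ann", "bob", "cy"]})
b = pd.DataFrame({"id": [2, 3, 4], "name": ["bob", "cy", "dee"]})

# Union: stack both frames, then drop rows that appear twice.
union = pd.concat([a, b]).drop_duplicates().reset_index(drop=True)

# Intersection: an inner merge on all shared columns keeps rows present in both.
intersection = a.merge(b, how="inner")

# Difference (a minus b): a left merge with indicator=True adds a "_merge"
# column; rows marked "left_only" exist in a but not in b.
marked = a.merge(b, how="left", indicator=True)
difference = (
    marked[marked["_merge"] == "left_only"]
    .drop(columns="_merge")
    .reset_index(drop=True)
)

print(union)
print(intersection)
print(difference)
```

When `merge` is called without `on`, pandas joins on the columns the two frames have in common, which is what makes the row-level comparison work here.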

Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch

KDnuggets

The new release of PyTorch includes some impressive open source projects for deep learning researchers and developers.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Understanding Boxplots

KDnuggets

A boxplot. It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.
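
The numbers behind a boxplot can be computed without plotting anything. The sketch below is an illustration rather than the article's code: the function name `boxplot_summary` and the sample data are hypothetical, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here, not something stated in the teaser.

```python
import statistics

def boxplot_summary(values):
    """Return the quartiles, fences, and outliers a boxplot would display."""
    data = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1                      # spread of the middle half of the data
    lower_fence = q1 - 1.5 * iqr       # assumed convention: 1.5 * IQR whiskers
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "outliers": outliers,
    }

# Made-up sample data with one obviously extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary["outliers"])  # → [100]
```

A small IQR relative to the full range indicates tightly grouped data, and unequal distances from the median to each quartile hint at skew.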

IT 91

Data Cleaning and Preprocessing for Beginners

KDnuggets

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.

Orchestrating Dynamic Reports in Python and R with Rmd Files

KDnuggets

Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case using both languages in one analysis.

Python 88

3 Reasons to attend Data Natives, 25-26 November, Berlin

KDnuggets

Data Natives is an outstanding conference that lets you meet many talented Data Scientists and Data Professionals. Find your dream company or your dream employee and level up for 2020. Use code DN19_KDNuggets_50 to save.

Data 79

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

The Last Defense Against Another AI Winter

KDnuggets

My short answer is this: Yes, another AI Winter will be here if you don’t deploy more ML solutions. You and your Data Science teams are the last line of defense against the AI Winter. You need to solve five key challenges to keep the momentum up.

Probability Learning: Maximum Likelihood

KDnuggets

The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.

Research Guide: Advanced Loss Functions for Machine Learning Models

KDnuggets

This guide explores research centered on a variety of advanced loss functions for machine learning models.

What is Data Science?

KDnuggets

Data Science is pitched as a modern and exciting job offering high satisfaction. Does its reality really live up to the hype? Here, we show what it's really like to work as a Data Scientist.

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Meet Neebo: The Virtual Analytics Hub

KDnuggets

Neebo is a SaaS solution that enables analytics teams to connect to, find, combine and collaborate on trusted data assets in hybrid cloud landscapes, and provides a unified access point where they can more effectively leverage all their analytics assets and knowledge. In this blog, we will highlight some of the features of Neebo and how they can completely transform the way analytics teams operate.

Cloud 51

How to Become a Successful Healthcare Data Analyst

KDnuggets

Are you interested in starting your career in the data analysis domain? Read this informative blog on how to get your career off the ground.

An Eight-Step Checklist for An Analytics Project

KDnuggets

Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.

Project 49

KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI

KDnuggets

Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.

Top KDnuggets tweets, Oct 30 – Nov 05: Everything a Data Scientist Should Know About Data Management

KDnuggets

Which Data Science Skills are core and which are hot/emerging ones?; The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral DataViz; Microsoft open sources #SandDance, a visual data exploration tool.

Monitoring Models at Scale

KDnuggets

Catch this Domino webinar on monitoring models at scale, Dec 11 @ 10am PT. It covers detecting changes in the pattern of real-world data your models see in production, tracking how model accuracy and other quality metrics change over time, and getting alerted when health checks fail so that resolution workflows can be triggered.

Data 45

Practical Computer Vision Course with Real-Life Cases, Nov 18, Washington, DC

KDnuggets

This course, Practical Computer Vision Course with Real-Life Cases, Nov 18 in Washington, DC, will move you on to the next step, providing you with practical means of solving business-specific tasks. Reserve your seat now.

Top Stories, Oct 28 – Nov 3: 5 Statistical Traps Data Scientists Should Avoid; Top Machine Learning Software Tools for Developers

KDnuggets

Also: Why is Machine Learning Deployment Hard?; Data Sources 101; 5 Statistical Traps Data Scientists Should Avoid; Everything a Data Scientist Should Know About Data Management; How to Become a (Good) Data Scientist — Beginner Guide.

The Cloud Development Environment Adoption Report

Cloud Development Environments (CDEs) are changing how software teams work by moving development to the cloud. Our Cloud Development Environment Adoption Report gathers insights from 223 developers and business leaders, uncovering key trends in CDE adoption. With 66% of large organizations already using CDEs, these platforms are quickly becoming essential to modern development practices.