Sat. Nov 02, 2019 - Fri. Nov 08, 2019

Automating Your Production Dataflows On Spark

Data Engineering Podcast

As data engineers, we treat the health of our pipelines as our highest priority. Unfortunately, there are countless ways our dataflows can break or degrade that have nothing to do with the business logic or data transformations we write and maintain. Sean Knapp founded Ascend to address the operational challenges of running production-grade, scalable Spark infrastructure, allowing data engineers to focus on the problems that power their business.

10 Free Must-read Books on AI

KDnuggets

Artificial Intelligence continues to fill the media headlines while scientists and engineers rapidly expand its capabilities and applications. With such explosive growth in the field, there is a great deal to learn. Dive into these 10 free books that are must-reads to support your AI study and work.

GraphQL Search Indexing

Netflix Tech

By Artem Shtatnov and Ravi Srinivas Ranganathan. Almost a year ago, we described what we learned from adopting GraphQL on the Netflix Marketing Tech team, and we have a lot more to share since then! There are plenty of existing resources describing how to express a search query in GraphQL and paginate the results. This post looks at the other side of search: how to index data and make it searchable.

Introducing Confluent Cloud on Microsoft Azure

Confluent

Today, we are proud to make Confluent Cloud available to companies leveraging the Microsoft Azure ecosystem of services, in addition to the previous rollouts on Google Cloud Platform (GCP) and […].

Tutorial: Building An Analytics Data Pipeline In Python

Dataquest

If you’ve ever wanted to learn Python online with streaming data, or data that changes quickly, you may be familiar with the concept of a data pipeline. Data pipelines allow you to transform data from one representation to another through a series of steps. Data pipelines are a key part of data engineering, which we teach in our new Data Engineer Path.
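As a rough illustration of that idea, here is a minimal sketch of a three-step pipeline built from Python generators, so records stream through one at a time. The log format, the sample lines and the function names `parse`, `keep_errors`, `count_by_service` and `run_pipeline` are hypothetical; this is not the tutorial's code.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical raw input in the form "timestamp level service message".
RAW_LINES = [
    "2019-11-02T10:00:00 INFO auth login ok",
    "2019-11-02T10:00:01 ERROR billing card declined",
    "2019-11-02T10:00:02 ERROR auth token expired",
    "malformed line",
    "2019-11-02T10:00:03 ERROR billing timeout",
]


def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Step 1: turn raw text into dicts, skipping lines that don't parse."""
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 4:
            continue  # drop malformed records instead of failing the whole run
        ts, level, service, message = parts
        yield {"ts": ts, "level": level, "service": service, "message": message}


def keep_errors(records: Iterable[dict]) -> Iterator[dict]:
    """Step 2: filter the stream down to ERROR records."""
    for rec in records:
        if rec["level"] == "ERROR":
            yield rec


def count_by_service(records: Iterable[dict]) -> Counter:
    """Step 3: aggregate into a different representation (counts per service)."""
    return Counter(rec["service"] for rec in records)


def run_pipeline(lines: Iterable[str]) -> Counter:
    # Each stage is lazy, so records flow through one at a time.
    return count_by_service(keep_errors(parse(lines)))


if __name__ == "__main__":
    print(run_pipeline(RAW_LINES))
```

Because each stage only pulls the next record when it needs it, the same chain would work on a file object or a live stream in place of the in-memory list.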

How to Create a Vocabulary for NLP Tasks in Python

KDnuggets

This post walks through a Python implementation of a vocabulary class for storing processed text data and related metadata in a manner useful for subsequently performing NLP tasks.
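A minimal sketch of such a vocabulary class follows. It assumes whitespace tokenization and three reserved ids for padding, start-of-sentence and end-of-sentence; the class and attribute names are illustrative and may differ from the post's implementation.

```python
from typing import Dict, List


class Vocabulary:
    """Maps tokens to integer ids and tracks simple corpus metadata.

    Sketch only: whitespace tokenization and the reserved tokens below
    are assumptions, not necessarily the linked post's exact design.
    """

    PAD, SOS, EOS = 0, 1, 2  # reserved ids: padding, start and end of sentence

    def __init__(self, name: str) -> None:
        self.name = name
        self.word2index: Dict[str, int] = {}
        self.word2count: Dict[str, int] = {}
        self.index2word: Dict[int, str] = {
            self.PAD: "<pad>", self.SOS: "<sos>", self.EOS: "<eos>"
        }
        self.num_words = 3  # next free id; starts after the reserved tokens
        self.num_sentences = 0
        self.longest_sentence = 0  # length in tokens

    def add_word(self, word: str) -> None:
        if word not in self.word2index:
            # First sighting: assign the next free id.
            self.word2index[word] = self.num_words
            self.index2word[self.num_words] = word
            self.word2count[word] = 1
            self.num_words += 1
        else:
            self.word2count[word] += 1

    def add_sentence(self, sentence: str) -> None:
        tokens = sentence.split()
        for word in tokens:
            self.add_word(word)
        self.longest_sentence = max(self.longest_sentence, len(tokens))
        self.num_sentences += 1

    def encode(self, sentence: str) -> List[int]:
        """Convert a sentence to ids wrapped in SOS/EOS markers.

        Unknown words raise KeyError; a fuller version would map them
        to a dedicated <unk> id.
        """
        return [self.SOS] + [self.word2index[w] for w in sentence.split()] + [self.EOS]
```

Usage: after `v.add_sentence("the cat sat")` and `v.add_sentence("the dog")`, `v.encode("the dog")` returns `[1, 3, 6, 2]`, and `v.longest_sentence` is 3, which is handy when choosing a padding length.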

How to Use Single Message Transforms in Kafka Connect

Confluent

Kafka Connect is the part of Apache Kafka® that provides reliable, scalable, distributed streaming integration between Apache Kafka and other systems. Kafka Connect has connectors for many, many systems, and […].

Analytics on Kafka Event Streams Using Druid, Elasticsearch and Rockset

Rockset

Events are messages that are sent by a system to notify operators or other systems about a change in its domain. With event-driven architectures powered by systems like Apache Kafka becoming more prominent, there are now many applications in the modern software stack that make use of events and messages to operate effectively. In this blog, we will examine the use of three different data backends for event data: Apache Druid, Elasticsearch and Rockset.

Customer Segmentation Using K Means Clustering

KDnuggets

Customer Segmentation can be a powerful means to identify unsatisfied customer needs. This technique can be used by companies to outperform the competition by developing uniquely appealing products and services.
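To make the technique concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) on two hypothetical customer features, annual income and spending score. It seeds the centroids with the first k points so the result is deterministic, an assumption made for readability; library implementations such as scikit-learn's `KMeans` default to k-means++ seeding instead.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Plain Lloyd's k-means on 2-D points; returns (labels, centroids).

    Initialization uses the first k points for reproducibility.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        if new_labels == labels and new_centroids == centroids:
            break  # nothing changed, so the algorithm has converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical customers as (annual income in k$, spending score 1-100).
customers = [(15, 39), (80, 85), (16, 40), (82, 88), (17, 42), (85, 90)]
segments, centers = kmeans(customers, k=2)
```

Here the two resulting segments separate low-income, low-spend customers from high-income, high-spend ones; with real data you would typically scale the features first and try several values of k.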

Three Distinctly Different Customer Experience Strategies

Teradata

Improving the customer experience is the top priority for CMOs. Find out what the top 3 distinct CX strategies are to drive customer loyalty.

Designing Your Neural Networks

KDnuggets

Check out this step-by-step walkthrough of some of the more confusing aspects of neural nets to guide you toward smart decisions about your neural network architecture.

Set Operations Applied to Pandas DataFrames

KDnuggets

In this tutorial, we show how to apply mathematical set operations (union, intersection, and difference) to Pandas DataFrames with the goal of easing the task of comparing the rows of two datasets.
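One common way to express these operations in pandas is sketched below, assuming both DataFrames have the same columns: `concat` plus `drop_duplicates` for union, an inner `merge` for intersection, and a left `merge` with `indicator=True` for difference. The helper names are hypothetical, and the tutorial may use a different approach.

```python
import pandas as pd


def union(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    # Rows that appear in either frame, with duplicates removed.
    return pd.concat([a, b]).drop_duplicates().reset_index(drop=True)


def intersection(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    # Rows present in both frames: an inner merge on all shared columns.
    return a.merge(b, how="inner").drop_duplicates().reset_index(drop=True)


def difference(a: pd.DataFrame, b: pd.DataFrame) -> pd.DataFrame:
    # Rows in a that are not in b; the "_merge" column records each row's origin.
    merged = a.merge(b, how="left", indicator=True)
    return (merged[merged["_merge"] == "left_only"]
            .drop(columns="_merge")
            .drop_duplicates()
            .reset_index(drop=True))


# Hypothetical example frames with identical columns.
a = pd.DataFrame({"id": [1, 2, 3], "name": ["x", "y", "z"]})
b = pd.DataFrame({"id": [2, 3, 4], "name": ["y", "z", "w"]})
```

With these frames, `union(a, b)` has four rows, `intersection(a, b)` keeps ids 2 and 3, and `difference(a, b)` keeps only id 1. Because `merge` without `on=` joins on every shared column, the comparison is row-wise, which is what a set operation needs.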

Facebook Has Been Quietly Open Sourcing Some Amazing Deep Learning Capabilities for PyTorch

KDnuggets

The new release of PyTorch includes some impressive open source projects for deep learning researchers and developers.

Understanding Boxplots

KDnuggets

A boxplot can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly it is grouped, and whether and how it is skewed.
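The numbers a boxplot draws can be computed directly. This sketch assumes the common Tukey convention, where whiskers reach the most extreme points within 1.5 × IQR of the quartiles and anything beyond is flagged as an outlier, and it uses `statistics.quantiles` with `method="inclusive"`; plotting libraries may compute quartiles slightly differently. The function name is hypothetical.

```python
import statistics
from typing import Dict, Sequence


def boxplot_stats(data: Sequence[float], whisker: float = 1.5) -> Dict[str, object]:
    """Quartiles, whisker ends and Tukey-fence outliers for a standard boxplot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lo_fence, hi_fence = q1 - whisker * iqr, q3 + whisker * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers stop at the most extreme points still inside the fences.
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < lo_fence or x > hi_fence),
    }


stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
```

For this sample the box spans 3.25 to 7.75 with the median at 5.5, the upper whisker stops at 9, and 100 is reported as an outlier. Comparing `median - q1` with `q3 - median` gives a quick read on skew.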

Data Cleaning and Preprocessing for Beginners

KDnuggets

Careful preprocessing of data for your machine learning project is crucial. This overview describes the process of data cleaning and dealing with noise and missing data.
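Two of the simplest techniques for missing data and noise are sketched below: median imputation and a centered moving average. They are illustrative choices made here, not necessarily the methods the overview recommends, and the function names are hypothetical.

```python
import statistics
from typing import List, Optional


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None or NaN) with the median of the observed values."""
    observed = [v for v in values if v is not None and v == v]  # v == v is False for NaN
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = statistics.median(observed)
    return [fill if (v is None or v != v) else v for v in values]


def smooth_moving_average(values: List[float], window: int = 3) -> List[float]:
    """Reduce noise with a centered moving average; edges use a shrunken window."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

The median is used rather than the mean because a single extreme value barely moves it; the moving average trades some detail for less noise, so the window size is a judgment call for each dataset.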

Orchestrating Dynamic Reports in Python and R with Rmd Files

KDnuggets

Do you want to extract CSV files with Python and visualize them in R? How does preparing everything in R and drawing conclusions with Python sound? Both are possible if you know the right libraries and techniques. Here, we’ll walk through a use case that combines both languages in one analysis.

3 Reasons to attend Data Natives, 25-26 November, Berlin

KDnuggets

Data Natives is an outstanding conference that lets you meet many talented Data Scientists and Data Professionals. Find your dream company or your dream employee and level up for 2020. Use code DN19_KDNuggets_50 to save.

The Last Defense Against Another AI Winter

KDnuggets

My short answer is this: Yes, another AI Winter will be here if you don’t deploy more ML solutions. You and your Data Science teams are the last line of defense against the AI Winter. You need to solve five key challenges to keep the momentum up.

Research Guide: Advanced Loss Functions for Machine Learning Models

KDnuggets

This guide explores research centered on a variety of advanced loss functions for machine learning models.

Probability Learning: Maximum Likelihood

KDnuggets

The maths behind Bayes will be better understood if we first cover the theory and maths underlying another fundamental method of probabilistic machine learning: Maximum Likelihood. This post will be dedicated to explaining it.
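As a small worked example of maximum likelihood, consider coin flips modeled as Bernoulli trials with unknown probability p. With k heads in n flips, the log-likelihood is k log p + (n - k) log(1 - p); setting its derivative k/p - (n - k)/(1 - p) to zero gives the estimate p = k/n. The sketch below computes that closed form and checks it against a brute-force grid search; the data and function names are hypothetical.

```python
import math
from typing import Sequence


def bernoulli_log_likelihood(p: float, flips: Sequence[int]) -> float:
    """log L(p) = sum over flips of [x log p + (1 - x) log(1 - p)] for 0/1 outcomes."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in flips)


def bernoulli_mle(flips: Sequence[int]) -> float:
    """Closed-form maximum likelihood estimate: the fraction of successes, k / n."""
    return sum(flips) / len(flips)


def grid_search_mle(flips: Sequence[int], steps: int = 999) -> float:
    """Numerical check: evaluate log L on a grid of p in (0, 1) and keep the best."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.001 ... 0.999
    return max(grid, key=lambda p: bernoulli_log_likelihood(p, flips))


flips = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # hypothetical data: 7 heads in 10 flips
```

Both `bernoulli_mle(flips)` and `grid_search_mle(flips)` give 0.7. Working with the log-likelihood rather than the raw product of probabilities turns the product into a sum, which is easier to differentiate and avoids numerical underflow for large samples.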

What is Data Science?

KDnuggets

Data Science is pitched as a modern and exciting job offering high satisfaction. Does its reality really live up to the hype? Here, we show what it's really like to work as a Data Scientist.

Meet Neebo: The Virtual Analytics Hub

KDnuggets

Neebo is a SaaS solution that enables analytics teams to connect to, find, combine and collaborate on trusted data assets in hybrid cloud landscapes, and provides a unified access point where they can more effectively leverage all their analytics assets and knowledge. In this blog, we will highlight some of the features of Neebo and how they can completely transform the way analytics teams operate.

An Eight-Step Checklist for An Analytics Project

KDnuggets

Follow these eight headings of an audit sheet that business analysts should address before submitting the results of their analytics project. One recommended approach is to rewrite each step as a question, answer it, and then attach it to your project.

KDnuggets™ News 19:n42, Nov 6: 5 Statistical Traps Data Scientists Should Avoid; 10 Free Must-Read Books on AI

KDnuggets

Learn about statistical fallacies Data Scientists should avoid; New and quite amazing Deep Learning capabilities FB has been quietly open-sourcing; Top Machine Learning tools for Developers; How to build a Neural Network from scratch and more.

How to Become a Successful Healthcare Data Analyst

KDnuggets

Are you interested in starting your career in the data analysis domain? Read this informative blog on how to get your career off the ground.

Top KDnuggets tweets, Oct 30 – Nov 05: Everything a Data Scientist Should Know About Data Management

KDnuggets

Which Data Science Skills are core and which are hot/emerging ones?; The 4 Quadrants of Data Science Skills and 7 Principles for Creating a Viral DataViz; Microsoft open sources #SandDance, a visual data exploration tool.

Monitoring Models at Scale

KDnuggets

Catch this Domino webinar on monitoring models at scale, Dec 11 @ 10am PT, covering how to detect changes in the patterns of real-world data your models see in production, track how model accuracy and other quality metrics change over time, and get alerted when health checks fail so that resolution workflows can be triggered.

Practical Computer Vision Course with Real-Life Cases, Nov 18, Washington, DC

KDnuggets

This course, Practical Computer Vision Course with Real-Life Cases, Nov 18 in Washington, DC, will take you to the next step, providing you with practical means of solving business-specific tasks. Reserve your seat now.

Top Stories, Oct 28 – Nov 3: 5 Statistical Traps Data Scientists Should Avoid; Top Machine Learning Software Tools for Developers

KDnuggets

Also: Why is Machine Learning Deployment Hard?; Data Sources 101; 5 Statistical Traps Data Scientists Should Avoid; Everything a Data Scientist Should Know About Data Management; How to Become a (Good) Data Scientist — Beginner Guide.
