Sat. Dec 07, 2019 - Fri. Dec 13, 2019

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

Uber Engineering

Michelangelo, Uber’s machine learning (ML) platform, powers machine learning model training across various use cases at Uber, such as forecasting rider demand, fraud detection, food discovery and recommendation for Uber Eats, and improving the accuracy of … The post "Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber" appeared first on Uber Engineering Blog.

Build Pipelines with Pandas Using pdpipe

KDnuggets

We show how to build intuitive and useful pipelines with Pandas DataFrame using a wonderful little library called pdpipe.

SnowflakeDB: The Data Warehouse Built For The Cloud

Data Engineering Podcast

Summary: Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column-oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that simplify the separation of compute and storage. In this episode Kent Graziano, chief technical evangelist for SnowflakeDB, explains how it is differentiated from other managed platforms and traditiona

Transferring Avro Schemas Across Schema Registries with Kafka Connect

Confluent

Although starting out with one Confluent Schema Registry deployment per development environment is straightforward, over time, a company may scale and begin migrating data to a cloud environment (such as […].

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Netflix Hack Day - November 2019

Netflix Tech

Netflix Hack Day - Fall 2019. By Tom Richards, Carenina Garcia Motion, and Leslie Posada. Hack Day at Netflix is an opportunity to build and show off a feature, tool, or quirky app. The goal is simple: experiment with new ideas/technologies, engage with colleagues across different disciplines, and have fun! We know even the silliest idea can spur something more.

Plotnine: Python Alternative to ggplot2

KDnuggets

Python's plotting libraries, such as matplotlib and seaborn, do allow the user to create elegant graphics, but they lack a standardized syntax for the grammar of graphics. Compared to the simple, readable, layered approach of ggplot2 in R, this makes the grammar of graphics more difficult to implement in Python.

What Data Engineers Think About - Variety, Volume, Velocity and Real-Time Analytics

Rockset

As a data engineer, my time is spent either moving data from one place to another, or preparing it for exposure to either reporting tools or front end users. As data collection and usage have become more sophisticated, the sources of data have become a lot more varied and disparate, volumes have grown and velocity has increased. Variety, Volume and Velocity were popularised as the three Vs of Big Data and in this post I’m going to talk about my considerations for each when selecting technologies

The 4 Hottest Trends in Data Science for 2020

KDnuggets

The field of Data Science is growing with new capabilities and reach into every industry. With digital transformations occurring in organizations around the world, 2019 included trends of more companies leveraging more data to make better decisions. Check out these next trends in Data Science expected to take off in 2020.

Data Analytics: How to Know the Right Business Questions to Ask

Teradata

Identifying and focusing on priority analytic use cases within your organization will ensure you are asking the right business questions. Find out more.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

5 Great New Features in Latest Scikit-learn Release

KDnuggets

From not sweating missing values, to determining feature importance for any estimator, to support for stacking, and a new plotting API, here are 5 new features of the latest release of Scikit-learn which deserve your attention.

Moving Predictive Maintenance from Theory to Practice

KDnuggets

Here are four common hurdles that need to be overcome before tapping into the benefits of predictive maintenance.

DeepMind Unveils MuZero, a New Agent that Mastered Chess, Shogi, Atari and Go Without Knowing the Rules

KDnuggets

The new model showed great improvements over the previous AlphaZero agent.

AI, Analytics, Machine Learning, Data Science, Deep Learning Technology Main Developments in 2019 and Key Trends for 2020

KDnuggets

We asked leading experts - what are the most important developments of 2019 and 2020 key trends in AI, Analytics, Machine Learning, Data Science, and Deep Learning? This blog focuses mainly on technology and deployment.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Math for Programmers!

KDnuggets

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.

Intro to Grafana: Installation, Configuration, and Building the First Dashboard

KDnuggets

One of the biggest highlights of Grafana is the ability to bring several data sources together in one dashboard by adding rows that host individual panels. Let's look at installing, configuring, and creating our first dashboard using Grafana.

What just happened in the world of AI?

KDnuggets

The speed at which AI made advancements and news during 2019 makes it imperative now to step back and place these events into order and perspective. It's important to separate the interest that any one advancement initially attracts, from its actual gravity and its consequential influence on the field. This review unfolds the parallel threads of these AI stories over this year and isolates their significance.

Python Dictionary and Dictionary Methods

KDnuggets

Check out this introduction to creating, accessing, and updating dictionaries in Python.
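
The three operations this teaser names (creating, accessing, and updating) can be sketched with Python's built-in `dict`. This is a minimal illustration and is not taken from the linked article; the `scores` dictionary and its keys are hypothetical sample data.

```python
# Hypothetical sample data; the names are illustrative, not from the article.
scores = {"ada": 91, "grace": 88}       # create a dictionary with a literal

first = scores["ada"]                    # access by key (raises KeyError if the key is missing)
missing = scores.get("linus", 0)         # access with a default instead of an error

scores["linus"] = 75                     # add a new key
scores.update({"ada": 95, "alan": 82})   # overwrite one key and add another in one call
removed = scores.pop("grace")            # remove a key and return its value

# Iterate over key/value pairs in sorted key order.
for name, score in sorted(scores.items()):
    print(f"{name}: {score}")
```

Using `get` with a default avoids the `KeyError` that plain indexing raises when a key is absent, which is often the safer choice when keys come from external data.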

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

How To “Ultralearn” Data Science, Part 1

KDnuggets

What is "ultralearning" and how can you follow the strategy to become an expert in data science? Start with this first part in a series that will guide you through this self-motivated methodology to help you efficiently master difficult skills.

AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2019 and Key Trends for 2020

KDnuggets

As we say goodbye to one year and look forward to another, KDnuggets has once again solicited opinions from numerous research & technology experts as to the most important developments of 2019 and their 2020 key trend predictions.

Interpretability: Cracking open the black box, Part 2

KDnuggets

The second part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers post-hoc interpretation that is useful when the model is not transparent.

Deploying a pretrained GPT-2 model on AWS

KDnuggets

This post attempts to summarize my recent detour into NLP, describing how I exposed a Huggingface pre-trained Language Model (LM) on an AWS-based web application.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Deployment of Machine learning models using Flask

KDnuggets

This blog will explain the basics of deploying a machine learning algorithm, focusing on developing a Naïve Bayes model for spam message identification, and using Flask to create an API for that model.

Top Stories, Dec 2-8: How to Speed up Pandas by 4x with one line of code; 10 Free Top Notch Machine Learning Courses

KDnuggets

Also: Data Science Curriculum Roadmap; Enabling the Deep Learning Revolution; The Essential Toolbox for Data Cleaning; A Non-Technical Reading List for Data Science; The Future of Careers in Data Science & Analysis.

Scalable graph machine learning: a mountain we can climb?

KDnuggets

Graph machine learning is a developing area of research that brings many complexities. One challenge that both fascinates and infuriates those working with graph algorithms is — scalability. We take a close look at scalability for graph machine learning methods covering what it is, what makes it difficult, and an example of a method that tackles it head-on.

KDD 2020 Call for Research, Applied Data Science Papers

KDnuggets

ACM SIGKDD Invites Industry and Academic Experts to Submit Advancements in Data Mining, Knowledge Discovery and Machine Learning for 26th Annual Conference in San Diego.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

NeurIPS 2019 Outstanding Paper Awards

KDnuggets

NeurIPS 2019 is underway in Vancouver, and the committee has just recently announced this year's Outstanding Paper Awards. Find out what the selections were, along with some additional info on NeurIPS papers, here.

Top November Stories: How to Speed up Pandas by 4x with one line of code

KDnuggets

Also: 10 Free Must-read Books on AI; Data Science for Managers: Programming Languages; The Complete Data Science LinkedIn Profile Guide.

Top KDnuggets tweets, Dec 04-10: AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments in 2019 and Key Trends for 2020

KDnuggets

AI, Analytics, Machine Learning, Data Science, Deep Learning Research Main Developments and Key Trends; Down with technical debt! Clean #Python for #DataScientists; Calculate Similarity - the most relevant Metrics in a Nutshell.

Dusting Under the Bed: Machine Learners’ Responsibility for the Future of Our Society

KDnuggets

The Machine Learning community must shape the world so that AI is built and implemented with a focus on the entire outcome for our society, and not just optimized for accuracy and/or profit.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m