Sat. Jan 11, 2020 - Fri. Jan 17, 2020

Top 9 Mobile Apps for Learning and Practicing Data Science

KDnuggets

This article covers the top 9 mobile apps that help users learn and practice data science and thereby improve their productivity.

Engineering SQL Support on Apache Pinot at Uber

Uber Engineering

Uber leverages real-time analytics on aggregate data to improve the user experience across our products, from fighting fraudulent behavior on Uber Eats to forecasting demand on our platform. As Uber’s operations became more complex and we offered additional features and …

SQL 141

Streams and Tables in Apache Kafka: Elasticity, Fault Tolerance, and Other Advanced Concepts

Confluent

Now that we’ve learned about the processing layer of Apache Kafka® by looking at streams and tables, as well as the architecture of distributed processing with the Kafka Streams API […].

Kafka 26

Planet Scale SQL For The New Generation Of Applications With YugabyteDB

Data Engineering Podcast

Summary: The modern era of software development is identified by ubiquitous access to elastic infrastructure for computation and easy automation of deployment. This has led to a class of applications that can quickly scale to serve users worldwide. This requires a new class of data storage which can accommodate that demand without having to rearchitect your system at each level of growth.

SQL 100

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide, with best practices and examples, to debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Top 10 Technology Trends for 2020

KDnuggets

With integrations of multiple emerging technologies just in the past year, AI development continues at a fast pace. Following the blueprint of science and technology advancements in 2019, we predict 10 trends we expect to see in 2020 and beyond.

Not Just SQL Anymore! Using R and Python with Vantage

Teradata

Learn about the different ways to use R and Python with Vantage and the pros and cons of each option. Read more from our Teradata expert.

Python 80

More Trending

Simulating Cohorts

Grouparoo

In the last post, I made a case that the way to make the biggest difference in a metric like retention is to increase how many tests you can run each month. It turns out that going from 1 to 4 tests a month makes a huge difference, especially as those cohorts build on each other over time. To prove this out, I built a spreadsheet. Because I learned even more from creating the spreadsheet itself than from writing the blog post, I thought I'd give those learnings some airtime, too.

The Future of Machine Learning

KDnuggets

This summary overviews the keynote at TensorFlow World by Jeff Dean, Head of AI at Google, that considered the advancements of computer vision and language models and predicted the direction machine learning model building should follow for the future.

SQL API for Real-Time Kafka Analytics in 3 Steps

Rockset

In this blog we will set up a real-time SQL API on Kafka using AWS Lambda and Rockset. At the time of writing (in early 2020) the San Francisco 49ers are doing remarkably well! To honor their success, we will focus on answering the following question. What are the most popular hashtags in tweets that mentioned the 49ers in the last 20 minutes? Because Twitter moves fast, we will only look at very recent tweets.
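The question posed above can be sketched in plain Python to make the filtering and counting logic concrete. This is only an illustrative stand-in, not the article's Kafka, AWS Lambda, and Rockset setup: the tweet dictionaries, their `text` and `created_at` fields, and the `top_hashtags` helper are all hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Matches a hashtag and captures the tag without the leading '#'.
HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(tweets, now, keyword="49ers", window=timedelta(minutes=20), n=3):
    """Count hashtags in recent tweets that mention `keyword`.

    `tweets` is a hypothetical in-memory list of dicts with a 'text' string
    and a timezone-aware 'created_at' datetime.
    """
    cutoff = now - window
    counts = Counter()
    for tweet in tweets:
        if tweet["created_at"] < cutoff:
            continue  # older than the time window
        text = tweet["text"].lower()
        if keyword not in text:
            continue  # does not mention the keyword
        counts.update(HASHTAG.findall(text))
    return counts.most_common(n)


# Made-up sample data for illustration only.
now = datetime(2020, 1, 12, 20, 0, tzinfo=timezone.utc)
tweets = [
    {"text": "Go #49ers! #NFL", "created_at": now - timedelta(minutes=5)},
    {"text": "49ers defense is unreal #NFL #Playoffs", "created_at": now - timedelta(minutes=10)},
    {"text": "#49ers warming up", "created_at": now - timedelta(minutes=30)},
    {"text": "Great game tonight #NFL", "created_at": now - timedelta(minutes=2)},
]
result = top_hashtags(tweets, now)
print(result)
```

In a real-time system the same filter-then-aggregate shape would typically be expressed as a SQL query over the ingested stream rather than a loop over a list.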

Kafka 40

Streams and Tables in Apache Kafka: Topics, Partitions, and Storage Fundamentals

Confluent

Part 1 of this series discussed the basic elements of an event streaming platform: events, streams, and tables. We also introduced the stream-table duality and learned why it is a […].

Kafka 94

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Handling Trees in Data Science Algorithmic Interview

KDnuggets

This post is about fast-tracking the study and explanation of tree concepts for data scientists so that you can breeze through the next time you are asked about them in an interview.

Algorithm 141

Math for Programmers!

KDnuggets

Math for Programmers teaches you the math you need to know for a career in programming, concentrating on what you need to know as a developer.

Decision Tree Algorithm, Explained

KDnuggets

All you need to know about decision trees and how to build and optimize a decision tree classifier.

Algorithm 123

Classify A Rare Event Using 5 Machine Learning Algorithms

KDnuggets

Which algorithm works best for unbalanced data? Are there any tradeoffs?

Algorithm 122

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Idiot’s Guide to Precision, Recall, and Confusion Matrix

KDnuggets

Building Machine Learning models is fun, but making sure we build the best ones is what makes a difference. Follow this quick guide to appreciate how to effectively evaluate a classification model, especially for projects where accuracy alone is not enough.
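As a minimal sketch of why accuracy alone can mislead, the snippet below computes accuracy, precision, and recall from confusion-matrix counts. The data is made up (10 positives in 1,000 samples, with a model that always predicts the negative class), and the helper names are hypothetical rather than taken from the linked guide.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for a binary problem."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def classification_metrics(counts):
    """Return (accuracy, precision, recall); empty denominators yield 0.0."""
    tp, fp, fn, tn = counts["tp"], counts["fp"], counts["fn"], counts["tn"]
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical imbalanced data: 10 positives out of 1,000 samples,
# and a model that predicts "negative" for every sample.
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000

accuracy, precision, recall = classification_metrics(confusion_counts(y_true, y_pred))
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Here the accuracy is high even though the model never finds a single positive case, which is the situation precision and recall are meant to expose.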

Geovisualization with Open Data

KDnuggets

In this post I want to show how to use publicly available (open) data to create geovisualizations in Python. Maps are a great way to communicate and compare information when working with geolocation data. There are many frameworks for plotting maps; here I focus on matplotlib and geopandas (and give a glimpse of mplleaflet).

Python 109

Uber Creates Generative Teaching Networks to Better Train Deep Neural Networks

KDnuggets

The new technique can really improve how deep learning models are trained at scale.

Graph Machine Learning Meets UX: An uncharted love affair

KDnuggets

When machine learning tools are developed by technology first, they risk failing to deliver on what users actually need. It can also be difficult for development teams to establish meaningful direction. This article explores the challenges of designing an interface that enables users to visualise and interact with insights from graph machine learning, and explores the very new, uncharted relationship between machine learning and UX.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Schema Evolution in Data Lakes

KDnuggets

Whereas a data warehouse will need rigid data modeling and definitions, a data lake can store different types and shapes of data. In a data lake, the schema of the data can be inferred when it’s read, providing the aforementioned flexibility. However, this flexibility is a double-edged sword.

Streams and Tables in Apache Kafka: Processing Fundamentals with Kafka Streams and ksqlDB

Confluent

Part 2 of this series discussed in detail the storage layer of Apache Kafka: topics, partitions, and brokers, along with storage formats and event partitioning. Now that we have this […].

Kafka 17

Methods, challenges & applications of Deep Learning | Munich 11-12 May

KDnuggets

Visit Deep Learning World, 11-12 May in Munich, to broaden your knowledge, deepen your understanding and discuss your questions with other Deep Learning experts!

Top KDnuggets tweets, Jan 08-14: A Beginners Guide to Data Engineering — Part I

KDnuggets

Also: The Book to Start You on Machine Learning - KDnuggets; Top KDnuggets tweets, Jan 1-7: Introduction to #DataVisualization and Storytelling: A Guide For The #DataScientist #eBook; 7 Steps to a Job-winning Data Science Resume - KDnuggets; Tips for open-sourcing research code.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

7 AI Use Cases Transforming Live Sports Production and Distribution

KDnuggets

Here are 7 powerful AI-led use cases, both for linear television and for OTT apps, that are transforming the live sports production landscape.

Top Stories, Jan 6-12: Top 5 must-have Data Science skills for 2020; 7 Resources to Becoming a Data Engineer

KDnuggets

Also: The Book to Start You on Machine Learning; An Introductory Guide to NLP for Data Scientists with 7 Common Techniques; A Comprehensive Guide to Natural Language Generation; 10 Python Tips and Tricks You Should Learn Today.

Survey Segmentation Tutorial

KDnuggets

Learn the basics of verifying segmentation, analyzing the data, and creating segments in this tutorial. When reviewing survey data, you will typically be handed Likert questions (e.g., on a scale of 1 to 5), and by using a few techniques, you can verify the quality of the survey and start grouping respondents into populations.

Data 65

Statistical Thinking for Industrial Problem Solving: a free online course.

KDnuggets

This online course is available – for free – to anyone interested in building practical skills in using data to solve problems better.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

KDnuggets™ News 20:n02, Jan 15: Top 5 Must-have Data Science Skills; Learn Machine Learning with THIS Book

KDnuggets

This week: learn the 5 must-have data science skills for the new year; find out which book is THE book to get started learning machine learning; pick up some Python tips and tricks; learn SQL, but learn it the hard way; and find an introductory guide to learning common NLP techniques.

Disentangling disentanglement: Ideas from NeurIPS 2019

KDnuggets

The NeurIPS 2019 conference in Vancouver recently concluded and featured a dozen papers on disentanglement in deep learning. What is this idea and why is it so interesting in machine learning? This summary of these papers will give you initial insight into disentanglement as well as ideas on what you can explore next.