December, 2019

10 Free Top Notch Machine Learning Courses

KDnuggets

Are you interested in studying machine learning over the holidays? This collection of 10 free top notch courses will allow you to do just that, with something for every approach to improving your machine learning skills.

Uber Infrastructure in 2019: Improving Reliability, Driving Customer Satisfaction

Uber Engineering

Every day around the world, millions of trips take place across the Uber network, giving users more reliable transportation through ridesharing, bikes, and scooters, drivers and truckers additional opportunities to earn, employees and employers more convenient business travel, and hungry …

Building The DataDog Platform For Processing Timeseries Data At Massive Scale

Data Engineering Podcast

Summary: DataDog is one of the most successful companies in the space of metrics and monitoring for servers and cloud infrastructure. In order to support their customers, they need to capture, process, and analyze massive amounts of timeseries data with a high degree of uptime and reliability. Vadim Semenov works on their data engineering team and joins the podcast in this episode to discuss the challenges that he works through, the systems that DataDog has built to power their business, and how

Open-Sourcing Metaflow, a Human-Centric Framework for Data Science

Netflix Tech

By David Berg, Ravi Kiran Chirravuri, Romain Cledat, Savin Goyal, Ferras Hamad, Ville Tuulos. tl;dr: Metaflow is now open-source! Get started at metaflow.org. Netflix applies data science to hundreds of use cases across the company, including optimizing content delivery and video encoding. Data scientists at Netflix relish our culture that empowers them to work autonomously and use their judgment to solve problems independently.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Teradata Experts on the Top Tech Predictions for 2020

Teradata

Teradata's team of experts is chiming in on their top technology and business predictions for 2020, from AI to Customer Experience to the Cloud. Read more!

How Dataquest Made the Difference for Stacey’s Data Job

Dataquest

Today, Stacey Ustian is a data engineer. But the path that led her here wasn’t always easy, and there were a few bumps and twists along the way. Her journey to data science started in a rather unusual place: the law library. After earning her Master’s degree in Library and Information Science, Stacey had taken a job working in the library of a law firm.

More Trending

Uber’s Data Platform in 2019: Transforming Information to Intelligence

Uber Engineering

Uber’s busy 2019 included our billionth delivery of an Uber Eats order, 24 million miles covered by bike and scooter riders on our platform, and trips to top destinations such as the Empire State Building, the Eiffel Tower, and the …

Building The Materialize Engine For Interactive Streaming Analytics In SQL

Data Engineering Podcast

Summary: Transactional databases used in applications are optimized for fast reads and writes with relatively simple queries on a small number of records. Data warehouses are optimized for batched writes and complex analytical queries. Between those use cases there are varying levels of support for fast reads on quickly changing data. To address that need more completely the team at Materialize has created an engine that allows for building queryable views of your data as it is continually update

DBLog: A Generic Change-Data-Capture Framework

Netflix Tech

By Andreas Andreakis, Ioannis Papapanagiotou. Overview: Change-Data-Capture (CDC) allows capturing committed changes from a database in real-time and propagating those changes to downstream consumers [1][2]. CDC is becoming increasingly popular for use cases that require keeping multiple heterogeneous datastores in sync (like MySQL and ElasticSearch) and addresses challenges that exist with traditional techniques like dual-writes and distributed transactions [3][4].

Don’t Organize for AI, Organize for Analytics

Teradata

How do you organize your business for analytics? Here are six steps your enterprise should take when creating an analytics team. Read more!

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, and Terrence Sheflin

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

What Data Engineers Think About - Variety, Volume, Velocity and Real-Time Analytics

Rockset

As a data engineer, my time is spent either moving data from one place to another, or preparing it for exposure to either reporting tools or front end users. As data collection and usage have become more sophisticated, the sources of data have become a lot more varied and disparate, volumes have grown and velocity has increased. Variety, Volume and Velocity were popularised as the three Vs of Big Data and in this post I’m going to talk about my considerations for each when selecting technologies

Data Science Curriculum Roadmap

KDnuggets

What follows is a set of broad recommendations, and it will inevitably require a lot of adjustments in each implementation. Given that caveat, here are our curriculum recommendations.

Productionizing Distributed XGBoost to Train Deep Tree Models with Large Data Sets at Uber

Uber Engineering

Michelangelo, Uber’s machine learning (ML) platform, powers machine learning model training across various use cases at Uber, such as forecasting rider demand, fraud detection, food discovery and recommendation for Uber Eats, and improving the accuracy of …

Solving Data Lineage Tracking And Data Discovery At WeWork

Data Engineering Podcast

Summary: Building clean datasets with reliable and reproducible ingestion pipelines is completely useless if it’s not possible to find them and understand their provenance. The solution to discoverability and tracking of data lineage is to incorporate a metadata repository into your data platform. The metadata repository serves as a data catalog and a means of reporting on the health and status of your datasets when it is properly integrated into the rest of your tools.

How to Drive Cost Savings, Efficiency Gains, and Sustainability Wins with MES

Speaker: Nikhil Joshi, Founder & President of Snic Solutions

Is your manufacturing operation reaching its efficiency potential? A Manufacturing Execution System (MES) could be the game-changer, helping you reduce waste, cut costs, and lower your carbon footprint. Join Nikhil Joshi, Founder & President of Snic Solutions, in this value-packed webinar as he breaks down how MES can drive operational excellence and sustainability.

Netflix Hack Day - November 2019

Netflix Tech

Netflix Hack Day - Fall 2019. By Tom Richards, Carenina Garcia Motion, and Leslie Posada. Hack Day at Netflix is an opportunity to build and show off a feature, tool, or quirky app. The goal is simple: experiment with new ideas/technologies, engage with colleagues across different disciplines, and have fun! We know even the silliest idea can spur something more.

Data Analytics: How to Know the Right Business Questions to Ask

Teradata

Identifying and focusing on priority analytic use cases within your organization will ensure you are asking the right business questions. Find out more.

Superset Announces Elasticsearch Support!

Preset

Announcing Elasticsearch in Superset, powered by a new open-source Python library from Preset

What is the most important question for Data Science (and Digital Transformation)

KDnuggets

With so many buzzwords surrounding AI and machine learning, understanding which can bring business value and which are best left in the lab to mature is difficult. While machine learning offers significant power in driving digital transformations, a business must start with the right questions and leave the math to the development teams.

Improving the Accuracy of Generative AI Systems: A Structured Approach

Speaker: Anindo Banerjea, CTO at Civio & Tony Karrer, CTO at Aggregage

When developing a Gen AI application, one of the most significant challenges is improving accuracy. This can be especially difficult when working with a large data corpus, and as the complexity of the task increases. The number of use cases/corner cases that the system is expected to handle essentially explodes. 💥 Anindo Banerjea is here to showcase his significant experience building AI/ML SaaS applications as he walks us through the current problems his company, Civio, is solving.

Exploring ksqlDB with Twitter Data

Confluent

When KSQL was released, my first blog post about it showed how to use KSQL with Twitter data. Two years later, its successor ksqlDB was born, which we announced this […].

SnowflakeDB: The Data Warehouse Built For The Cloud

Data Engineering Podcast

Summary: Data warehouses have gone through many transformations, from standard relational databases on powerful hardware, to column oriented storage engines, to the current generation of cloud-native analytical engines. SnowflakeDB has been leading the charge to take advantage of cloud services that simplify the separation of compute and storage. In this episode Kent Graziano, chief technical evangelist for SnowflakeDB, explains how it is differentiated from other managed platforms and traditiona

Data Analytics in the Cloud: It's Not Just Lift and Shift

Teradata

The cloud’s flexibility is becoming an essential success factor for businesses. But moving your data analytics to the cloud isn't just lift and shift. Read more.

The Ultimate Guide To Data-Driven Construction: Optimize Projects, Reduce Risks, & Boost Innovation

Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network

In today’s construction market, owners, construction managers, and contractors must navigate increasing challenges, from cost management to project delays. Fortunately, digital tools now offer valuable insights to help mitigate these risks. However, the sheer volume of tools and the complexity of leveraging their data effectively can be daunting. That’s where data-driven construction comes in.

Interpretability part 3: LIME and SHAP

KDnuggets

The third part in a series on leveraging techniques to take a look inside the black box of AI, this guide considers methods that try to explain each prediction instead of establishing a global explanation.

Organizing And Empowering Data Engineers At Citadel

Data Engineering Podcast

Summary: The financial industry has long been driven by data, requiring a mature and robust capacity for discovering and integrating valuable sources of information. Citadel is no exception, and in this episode Michael Watson and Robert Krzyzanowski share their experiences managing and leading the data engineering teams that power the business. They shared helpful insights into some of the challenges associated with working in a regulated industry, organizing teams to deliver value rapidly and re

The 4 fastest ways not to get hired as a data scientist

KDnuggets

Ready to try to get hired as a data scientist for the first time? Avoiding these common mistakes won’t guarantee an offer, but not avoiding them is a surefire way for your application to be tossed into the trash bin.

Explainability: Cracking open the black box, Part 1

KDnuggets

What is Explainability in AI and how can we leverage different techniques to open the black box of AI and peek inside? This practical guide offers a review and critique of the various techniques of interpretability.

Driving Responsible Innovation: How to Navigate AI Governance & Data Privacy

Speaker: Aindra Misra, Senior Manager, Product Management (Data, ML, and Cloud Infrastructure) at BILL

Join us for an insightful webinar that explores the critical intersection of data privacy and AI governance. In today’s rapidly evolving tech landscape, building robust governance frameworks is essential to fostering innovation while staying compliant with regulations. Our expert speaker, Aindra Misra, will guide you through best practices for ensuring data protection while leveraging AI capabilities.

The Essential Toolbox for Data Cleaning

KDnuggets

Increase your confidence to perform data cleaning with a broader perspective of what datasets typically look like, and follow this toolbox of code snippets to make your data cleaning process faster and more efficient.

Plotnine: Python Alternative to ggplot2

KDnuggets

Python's plotting libraries, such as matplotlib and seaborn, do allow the user to create elegant graphics as well, but the lack of a standardized syntax for implementing the grammar of graphics, compared to the simple, readable, layered approach of ggplot2 in R, makes it more difficult to implement in Python.

A Non-Technical Reading List for Data Science

KDnuggets

The world still cannot be reduced to numbers on a page because human beings are still the ones making all the decisions. So, the best data scientists understand the numbers and the people. Check out these great data science books that will make you a better data scientist without delving into the technical details.

The Ultimate Guide to Model Retraining

KDnuggets

Once you have deployed your machine learning model into production, differences in real-world data will result in model drift. So, retraining and redeploying will likely be required. In other words, deployment should be treated as a continuous process. This guide defines model drift and how to identify it, and includes approaches to enable model training.
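
As a rough illustration of how drift in a single numeric feature might be flagged, the sketch below computes a Population Stability Index (PSI) between training-time values and live values. This is not taken from the guide itself: the function name `psi`, the bin count, the floor value, and the 0.2 alert threshold (a commonly quoted rule of thumb) are all assumptions chosen for the example.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index between two samples of one numeric feature.

    Bins are equal-width over the range of `expected` (the training sample);
    values in `actual` that fall outside that range are clipped into the
    first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps empty bins from producing log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Hypothetical data: live traffic shifted upward relative to training data.
training = [i / 100 for i in range(100)]
live = [x + 0.5 for x in training]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb cutoff, tune per use case
needs_retraining = psi(training, live) > DRIFT_THRESHOLD
```

A check like this would typically run per feature on a schedule, with a flagged feature prompting a closer look before any retraining is triggered.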

What Is Entity Resolution? How It Works & Why It Matters

Entity resolution, sometimes referred to as data matching or fuzzy matching, is critical for data quality, analytics, graph visualization and AI. Learn what entity resolution is, why it matters, how it works and its benefits. Advanced entity resolution using AI is crucial because it efficiently and easily solves many of today’s data quality and analytics problems.
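
To make the idea of fuzzy matching concrete, here is a minimal sketch that groups company-name strings referring to the same entity. It uses plain string similarity from Python's standard-library `difflib`, not the AI-based approach the article refers to. The helper names (`normalize`, `similarity`, `resolve`), the sample records, and the 0.85 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def resolve(records, threshold=0.85):
    """Greedy clustering: attach each record to the first cluster whose
    first member is similar enough, otherwise start a new cluster."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec, cluster[0]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


# Hypothetical records with inconsistent casing and punctuation.
records = ["Acme Corp.", "ACME Corp", "Globex Inc", "Acme Corp", "Globex, Inc."]
clusters = resolve(records)
```

Comparing every record against every cluster does not scale to large datasets; production systems usually add a blocking step so that only plausible candidate pairs are compared.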