Sat.Sep 21, 2019 - Fri.Sep 27, 2019

5 Famous Deep Learning Courses/Schools of 2019

KDnuggets

Deep Learning has become the hottest skill in Data Science at the moment. There is a plethora of articles, courses, technologies, influencers and resources that we can leverage to gain Deep Learning skills.

Every Company is Becoming a Software Company

Confluent

In 2011, Marc Andreessen wrote an article called Why Software is Eating the World. The central idea is that any process that can be moved into software will be. This has become a kind of shorthand for the investment thesis behind Silicon Valley’s current wave of unicorn startups. It’s also a unifying idea behind the larger set of technology trends we see today, such as machine learning, IoT, ubiquitous mobile connectivity, SaaS, and cloud computing.

Open Source Object Storage For All Of Your Data

Data Engineering Podcast

Summary: Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. S3 from Amazon has quickly become the de facto API for interacting with this service, so the team at MinIO have built a production-grade, easy-to-manage storage engine that replicates that interface.

Evolving Regional Evacuation

Netflix Tech

By Niosha Behnam, Demand Engineering @ Netflix. At Netflix we prioritize innovation and velocity in pursuit of the best experience for our 150+ million global customers. This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

12 Deep Learning Researchers and Leaders

KDnuggets

Our list of deep learning researchers and industry leaders covers the people you should follow to stay current with this wildly expanding field in AI. From early practitioners and established academics to entrepreneurs and today’s top corporate influencers, this diverse group of individuals is leading the way into tomorrow’s deep learning landscape.

Real-Time Analytics and Monitoring Dashboards with Apache Kafka and Rockset

Confluent

In the early days, many companies simply used Apache Kafka® for data ingestion into Hadoop or another data lake. However, Apache Kafka is more than just messaging. The significant difference today is that companies use Apache Kafka as an event streaming platform for building mission-critical infrastructures and core operations platforms. Examples include microservice architectures, mainframe integration, instant payment, fraud detection, sensor analytics, real-time monitoring, and many more—dri

More Trending

Scaling a Mature Data Pipeline — Managing Overhead

Airbnb Tech

There is often a hidden performance cost tied to the complexity of data pipelines: the overhead. In this post, we introduce the concept and examine the techniques we use to avoid it in our data pipelines. Author: Zachary Ennenga. Background: There is often a natural evolution in the tooling, organization, and technical underpinning of data pipelines.

A Single Function to Streamline Image Classification with Keras

KDnuggets

We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.

How to Make the Most of Kafka Summit San Francisco 2019

Confluent

Kafka Summit San Francisco is just one week away. Conferences can be busy affairs, so here are some tips on getting the most out of your time there. Plan. Go and check out the schedule. Spend a bit of time familiarising yourself with what sessions you want to get to, and mark them on your calendar. How do you pick which sessions to attend? My advice: diversify!

Why Clean Data is Critical for Your Business

Teradata

Clean data is critical to your business. Find out what three things you need to know about clean data for the health of your organization. Read more.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

What is Hierarchical Clustering?

KDnuggets

The article contains a brief introduction to various concepts related to the hierarchical clustering algorithm.

Beta Distribution: What, When & How

KDnuggets

This article covers the beta distribution, and explains it using baseball batting averages.
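
As a rough illustration of the batting-average framing, the sketch below treats a player's true average as a Beta distribution and updates it with observed hits and misses. The `Beta` class, the prior parameters, and the season counts are all hypothetical values chosen for this example; they are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beta:
    """Beta(alpha, beta) distribution over a probability, e.g. a batting average."""

    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected value of a Beta distribution.
        return self.alpha / (self.alpha + self.beta)

    def update(self, hits: int, misses: int) -> "Beta":
        # Conjugate update: observed hits add to alpha, misses add to beta.
        return Beta(self.alpha + hits, self.beta + misses)


# Hypothetical prior: we expect roughly a .270 hitter before seeing any at-bats.
prior = Beta(alpha=81, beta=219)

# Hypothetical observations: 100 hits in 300 at-bats.
posterior = prior.update(hits=100, misses=200)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

The posterior mean lands between the prior expectation and the raw observed rate, which is the usual reason to reach for a Beta prior when a player has few at-bats.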

Automatic Version Control for Data Scientists

KDnuggets

How can you keep your machine learning models and data organized so you can collaborate effectively? Discover this new tool set available for better version control designed for the data scientist workflow.

Natural Language in Python using spaCy: An Introduction

KDnuggets

This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

A 2019 Guide for Automatic Speech Recognition

KDnuggets

In this article, we’ll look at a couple of papers aimed at solving the problem of automated speech recognition with machine and deep learning.

The Future of Analytics and Data Science

KDnuggets

Learn about the current and future issues of data science and possible solutions from this interview with IADSS Co-founder, Dr. Usama Fayyad, following his keynote speech at ODSC Boston 2019.

6 bits of advice for Data Scientists

KDnuggets

As a data scientist, you can get lost in your daily dives into the data. Consider following this advice in your work to be more diligent and more impactful for your organization.

Why data analysts should choose stories over statistics

KDnuggets

Join the Crunch Data Conference in Budapest, Oct 16-18, with stellar speakers from companies like Facebook, Netflix and LinkedIn. Use the discount code ‘KDNuggets’ to save $100 off your conference ticket.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Customer Segmentation for R Users

KDnuggets

This article shows you how to separate your customers into distinct groups based on their purchase behavior. For the R enthusiasts out there, I demonstrated what you can do with r/stats, ggradar, ggplot2, animation, and factoextra.

Using Time Series Encodings to Discover Baseball History’s Most Interesting Seasons

KDnuggets

Take me out to the ballgame! Take me out to the crowd! For the 2,829 seasons that have been played for 101 baseball teams since 1880, which seasons were unlike any others? Using SAX Encoding to recognize patterns in time series data, the most special years in baseball can be found.
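
For readers unfamiliar with SAX (Symbolic Aggregate approXimation), here is a minimal, generic sketch of the encoding using only the Python standard library. It is not the article's implementation: the function name `sax_encode`, the four-letter alphabet, and the sample series are assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import NormalDist, fmean, pstdev


def sax_encode(series, n_segments, alphabet="abcd"):
    """Encode a numeric series as a SAX word with one letter per segment."""
    if not series or n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")

    # Step 1: z-normalize so series on different scales are comparable.
    mu, sigma = fmean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: piecewise aggregate approximation (mean of each equal-width segment).
    width = len(series) // n_segments
    paa = [fmean(z[i:i + width]) for i in range(0, len(z), width)]

    # Step 3: split the standard normal into equal-probability bins, one per letter.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical series: low values early, high values late.
print(sax_encode([1, 1, 2, 2, 8, 8, 9, 9], n_segments=4))  # aadd
```

Once each series is reduced to a short word like this, unusual series can be spotted by comparing words rather than raw values.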

Introducing IceCAPS: Microsoft’s Framework for Advanced Conversation Modeling

KDnuggets

The new open source framework that brings multi-task learning to conversational agents.

Data Mapping Using Machine Learning

KDnuggets

Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Webinar: Build auto-adaptive machine learning models with Kubernetes

KDnuggets

This live webinar, Oct 2, 2019, will instruct data scientists and machine learning engineers how to build, manage, and deploy auto-adaptive machine learning models in production. Save your spot now.

The thin line between data science and data engineering

KDnuggets

Today, as companies have finally come to understand the value that data science can bring, more and more emphasis is being placed on the implementation of data science in production systems. And as these implementations have required models that can perform on larger and larger datasets in real-time, an awful lot of data science problems have become engineering problems.

Help Your Career Survive ‘DataGeddon’

KDnuggets

Penn State’s fully online data analytics program uniquely prepares students to advance their career in data science. Penn State offers 3 intakes every year and reviews applications on a rolling basis. GMAT or GRE waivers are available to highly qualified candidates. Learn more now.

Getting to the Future First: How Social Data is Transforming Trend Discovery

KDnuggets

Register now for this webinar, Sep 25 @ 12 PM ET, for a clear approach on how to apply machine learning language technology to massive, unstructured data sets in order to create predictive models of what may be the next “it” ingredient, color, flavor or pack size.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Top Stories, Sep 16-22: Which Data Science Skills are core and which are hot/emerging ones?

KDnuggets

Also: Explore the world of Bioinformatics with Machine Learning; My journey path from a Software Engineer to BI Specialist to a Data Scientist; 5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python; 10 Great Python Resources for Aspiring Data Scientists.

Schema Validation with Confluent 5.4-preview

Confluent

Robust data governance through Schema Validation on write is now available in Confluent Platform 5.4-preview. This gives operators a centralized location to enforce data format correctness within Confluent Platform. Enforcing data correctness on write is the first step towards enabling centralized policy enforcement and data governance within your event streaming platform.

Data Quality Assessment Is Not All Roses. What Challenges Should You Be Aware Of?

KDnuggets

Of all data quality characteristics, we consider consistency and accuracy to be the most difficult ones to measure. Here, we describe the challenges that you may encounter and the ways to overcome them.

AI World Conference & Expo, Oct 23-25, Boston – Updated Agenda and Special KDnuggets Discount

KDnuggets

AI World Conference & Expo has become the industry’s largest independent business event focused on the state of the practice of AI in the enterprise. Join us in Boston, Oct 23-25. Use the discount code 1968-KDN and SAVE $200.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m