Sat.Mar 14, 2020 - Fri.Mar 20, 2020

article thumbnail

The 4 Best Jupyter Notebook Environments for Deep Learning

KDnuggets

Many cloud providers, and other third-party services, see the value of a Jupyter notebook environment which is why many companies now offer cloud hosted notebooks that are hosted on the cloud. Let's have a look at 3 such environments.

article thumbnail

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

Advanced analytics and AI can significantly accelerate data processing required to get the insights, answers and recommendations to handle and address the COVID-19 pandemic.

Insiders

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Trending Sources

article thumbnail

Simplistic Ways to Find Interesting Data Sets

Team Data Science

I am taking you through my recent experience to find a dataset for my project. Industry Search To work with data, I need to narrow down the industry like health care, finance, insurance or other. I defined a few sources in my earlier blog post, which will give a sneak peek of techniques to extract industries. For Instance, most of the job listings introduce their job description as, One of the top insurance client looking for Data Engineer which exposes the industry.

article thumbnail

15 Things Every Apache Kafka Engineer Should Know About Confluent Replicator

Confluent

Single-cluster deployments of Apache Kafka® are rare. Most medium to large deployments employ more than one Kafka cluster, and even the smallest use cases include development, testing, and production clusters. […].

article thumbnail

Precision in Motion: Why Process Optimization Is the Future of Manufacturing

Speaker: Jason Chester, Director, Product Management

In today’s manufacturing landscape, staying competitive means moving beyond reactive quality checks and toward real-time, data-driven process control. But what does true manufacturing process optimization look like—and why is it more urgent now than ever? Join Jason Chester in this new, thought-provoking session on how modern manufacturers are rethinking quality operations from the ground up.

article thumbnail

Time Series Classification Synthetic vs Real Financial Time Series

KDnuggets

This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.

article thumbnail

Improving Prediction of the Unconfirmed COVID-19 Cases

Teradata

With the lack of available tests & uncertainty around the true number of COVID-19 cases, Teradata Epidemiologist Daniel Ulatowski & Data Scientist Jack McCush hypothesize how symptomatic data & the Vantage ML Engine can be utilized to predict cases.

More Trending

article thumbnail

Building a Cloud ETL Pipeline on Confluent Cloud

Confluent

As enterprises move more and more of their applications to the cloud, they are also moving their on-prem ETL (extract, transform, load) pipelines to the cloud, as well as building […].

article thumbnail

What is the most effective policy response to the new coronavirus pandemic?

KDnuggets

Where Test/Trace/Quarantine are working, the number of cases/day have declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.

IT
article thumbnail

Building A New Foundation For CouchDB

Data Engineering Podcast

Summary CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and a HTTP interface it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt which is being addressed with a refactored architecture based on FoundationDB.

article thumbnail

Teradata's Response to COVID-19

Teradata

How Teradata is responding to the COVID-19 crisis for the health and well-being of its employees, customers and partners.

IT
article thumbnail

Airflow Best Practices for ETL/ELT Pipelines

Speaker: Kenten Danas, Senior Manager, Developer Relations

ETL and ELT are some of the most common data engineering use cases, but can come with challenges like scaling, connectivity to other systems, and dynamically adapting to changing data sources. Airflow is specifically designed for moving and transforming data in ETL/ELT pipelines, and new features in Airflow 3.0 like assets, backfills, and event-driven scheduling make orchestrating ETL/ELT pipelines easier than ever!

article thumbnail

Announcing ksqlDB 0.8.0

Confluent

The latest ksqlDB release introduces long-awaited features such as tunable retention and grace period for windowed aggregates, new built-in functions including LATEST_BY_OFFSET, a peek at the new server API under […].

article thumbnail

When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, it is about the same as 5 years ago. We examine the trends by AutoML experience, industry, and region.

article thumbnail

How to Use KSQL Stream Processing and Real-Time Databases to Analyze Streaming Data in Kafka

Rockset

Intro In recent years, Kafka has become synonymous with “streaming,” and with features like Kafka Streams, KSQL, joins, and integrations into sinks like Elasticsearch and Druid, there are more ways than ever to build a real-time analytics application around streaming data in Kafka. With all of these stream processing and real-time data store options, though, also comes questions for when each should be used and what their pros and cons are.

article thumbnail

Build an Artificial Neural Network From Scratch: Part 2

KDnuggets

The second article in this series focuses on building an Artificial Neural Network using the Numpy Python library.

article thumbnail

Whats New in Apache Airflow 3.0 –– And How Will It Reshape Your Data Workflows?

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Five Interesting Data Engineering Projects

KDnuggets

As the role of the data engineer continues to grow in the field of data science, so are the many tools being developed to support wrangling all that data. Five of these tools are reviewed here (along with a few bonus tools) that you should pay attention to for your data pipeline work.

article thumbnail

A Beginner’s Guide to Data Integration Approaches in Business Intelligence

KDnuggets

An integrated BI system has a trickle-down effect on all business processes, especially reporting and analytics. Find out how integration can help you leverage the power of BI.

article thumbnail

Nine lessons learned during my first year as a Data Scientist

KDnuggets

What is it like to be a Data Scientist? There can be many hats to wear, and so many problems to solve that are fed with data, churned by data science, and guided by business results. Find out about lessons learned from one Data Scientist about how best to work and perform in the role.

article thumbnail

A Comprehensive Data Repository for Fake Health News Detection

KDnuggets

We introduce the FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.

article thumbnail

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

article thumbnail

A Top Machine Learning Algorithm Explained: Support Vector Machines (SVM)

KDnuggets

Support Vector Machines (SVMs) are powerful for solving regression and classification problems. You should have this approach in your machine learning arsenal, and this article provides all the mathematics you need to know -- it's not as hard you might think.

article thumbnail

24 Best (and Free) Books To Understand Machine Learning

KDnuggets

We have compiled a list of some of the best (and free) machine learning books that will prove helpful for everyone aspiring to build a career in the field.

article thumbnail

Skynet Is Real: The History and Future of Factories With No Workers

KDnuggets

Let’s see whether robots will become "grave diggers" of the proletariat, what do we lack to get total automation, and what compromises exist.

article thumbnail

Forecasting Stories: Is it seasonality or not?

KDnuggets

Kicking off with a series of forecasting stories, starting with seasonality and its business applications. This first article speaks of course corrections that were based on weather and calendar driven seasonality.

IT
article thumbnail

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

article thumbnail

Top KDnuggets tweets, Mar 11-17: Most western countries are on the same #coronavirus trajectory

KDnuggets

Most western countries are on the same #coronavirus trajectory; The Workers Who Face the Greatest #Coronavirus Risk; #Coronavirus, a Visual Rundown; How to start building an automated NLP solution for processing customer feedback.

article thumbnail

Top 20 ODSC 2020 Global Virtual Conference Sessions

KDnuggets

At ODSC 2020, we are unveiling our first ever 4-day Global Virtual Conference, an online and on-demand version of ODSC. Here are our picks for 20 talks that show how diverse and thorough the ODSC East Global Virtual Conference will be this April 14-17.

article thumbnail

Improving the partnership between Data Science and IT

KDnuggets

Friction can quickly arise as a result of these separate workflows and priorities. Given their differences, how can data science and IT more seamlessly work together in building a model-driven organization?

article thumbnail

Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia

KDnuggets

The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.

article thumbnail

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

article thumbnail

Exploring the Adoption of Python in the Workplace – Free Metis Corporate Training Webinar

KDnuggets

Metis will break down Python for data science and analytics, explain what is driving adoption in the field, and discuss how industries and companies are reacting to the shift.

article thumbnail

Scaling Your Data Strategy

KDnuggets

This article presents a particular vision for a cohesive data strategy for addressing large-scale problems with data-driven solutions, based on prior professional experiences.

article thumbnail

KDnuggets™ News 20:n11, Mar 18: Covid-19, your community, and you – a data science perspective; When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

A Data Science perspective on Covid-19, the novel coronavirus; The results and analysis of a previous KDnuggets Poll: When Will AutoML replace Data Scientists? How to build a mature Machine Learning team; The Most Useful Machine Learning Tools of 2020; and more.

article thumbnail

Top Stories, Mar 9-15: New Poll: Coronavirus impact on Data Science community; Covid-19, your community, and you — a data science perspective

KDnuggets

Also: 50 Must-Read Free Books For Every Data Scientist in 2020; Decision Boundary for a Series of Machine Learning Models; 20 AI, Data Science, Machine Learning Terms You Need to Know in 2020 (Part 2).

article thumbnail

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate