Sat. Mar 14, 2020 - Fri. Mar 20, 2020

Time Series Classification Synthetic vs Real Financial Time Series

KDnuggets

This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.

Finance 158

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

Advanced analytics and AI can significantly accelerate data processing required to get the insights, answers and recommendations to handle and address the COVID-19 pandemic.

Simplistic Ways to Find Interesting Data Sets

Team Data Science

I will take you through my recent experience of finding a dataset for my project. Industry Search: To work with data, I need to narrow down the industry, such as health care, finance, or insurance. I described a few sources in my earlier blog post, which give a sneak peek at techniques for identifying industries. For instance, many job listings open their description with a line such as "One of the top insurance clients is looking for a Data Engineer," which reveals the industry.

Insurance 130

15 Things Every Apache Kafka Engineer Should Know About Confluent Replicator

Confluent

Single-cluster deployments of Apache Kafka® are rare. Most medium to large deployments employ more than one Kafka cluster, and even the smallest use cases include development, testing, and production clusters. […].

Kafka 122

Apache Airflow® 101 Essential Tips for Beginners

Apache Airflow® is the open-source standard to manage workflows as code. It is a versatile tool used in companies across the world from agile startups to tech giants to flagship enterprises across all industries. Due to its widespread adoption, Airflow knowledge is paramount to success in the field of data engineering.

The 4 Best Jupyter Notebook Environments for Deep Learning

KDnuggets

Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at 3 such environments.

More Trending

10 Key skills, to help you become a data engineer

Start Data Engineering

This article gives you an overview of the 10 key skills you need to become a better data engineer. If you are struggling to get started on what to learn, start with the first topic and proceed through the list.

Building a Cloud ETL Pipeline on Confluent Cloud

Confluent

As enterprises move more and more of their applications to the cloud, they are also moving their on-prem ETL (extract, transform, load) pipelines to the cloud, as well as building […].

Cloud 119

What is the most effective policy response to the new coronavirus pandemic?

KDnuggets

Where Test/Trace/Quarantine are working, the number of cases per day has declined empirically. Furthermore, this appears to be a radically superior strategy where it can be deployed. I’ll review the evidence, discuss the other strategies and their consequences, and then discuss what can be done.

IT 157

Building A New Foundation For CouchDB

Data Engineering Podcast

Summary: CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interface, it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt, which is being addressed with a refactored architecture based on FoundationDB.

Building 100

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Teradata's Response to COVID-19

Teradata

How Teradata is responding to the COVID-19 crisis for the health and well-being of its employees, customers and partners.

IT 59

Announcing ksqlDB 0.8.0

Confluent

The latest ksqlDB release introduces long-awaited features such as tunable retention and grace period for windowed aggregates, new built-in functions including LATEST_BY_OFFSET, a peek at the new server API under […].

Process 101

When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

Will AI always be 5-10 years away? The majority of respondents to this poll think that AutoML will reach expert level in 5-10 years. Interestingly, it is about the same as 5 years ago. We examine the trends by AutoML experience, industry, and region.

Data 154

How to Use KSQL Stream Processing and Real-Time Databases to Analyze Streaming Data in Kafka

Rockset

Intro: In recent years, Kafka has become synonymous with “streaming,” and with features like Kafka Streams, KSQL, joins, and integrations into sinks like Elasticsearch and Druid, there are more ways than ever to build a real-time analytics application around streaming data in Kafka. With all of these stream processing and real-time data store options, though, also come questions about when each should be used and what their pros and cons are.

Kafka 40

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Build an Artificial Neural Network From Scratch: Part 2

KDnuggets

The second article in this series focuses on building an Artificial Neural Network using the NumPy Python library.

Building 147

Five Interesting Data Engineering Projects

KDnuggets

As the role of the data engineer continues to grow in the field of data science, so do the many tools being developed to support wrangling all that data. Five of these tools (along with a few bonus tools) are reviewed here; they deserve your attention for your data pipeline work.

A Beginner’s Guide to Data Integration Approaches in Business Intelligence

KDnuggets

An integrated BI system has a trickle-down effect on all business processes, especially reporting and analytics. Find out how integration can help you leverage the power of BI.

Nine lessons learned during my first year as a Data Scientist

KDnuggets

What is it like to be a Data Scientist? There can be many hats to wear, and so many problems to solve that are fed with data, churned by data science, and guided by business results. Find out about lessons learned from one Data Scientist about how best to work and perform in the role.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

A Comprehensive Data Repository for Fake Health News Detection

KDnuggets

We introduce FakeHealth, a new data repository for fake health news detection. Following a preliminary analysis to demonstrate its features, we consider additional potential directions for better identifying fake news.

Data 129

A Top Machine Learning Algorithm Explained: Support Vector Machines (SVM)

KDnuggets

Support Vector Machines (SVMs) are powerful for solving regression and classification problems. You should have this approach in your machine learning arsenal, and this article provides all the mathematics you need to know -- it's not as hard as you might think.

24 Best (and Free) Books To Understand Machine Learning

KDnuggets

We have compiled a list of some of the best (and free) machine learning books that will prove helpful for everyone aspiring to build a career in the field.

Skynet Is Real: The History and Future of Factories With No Workers

KDnuggets

Let’s see whether robots will become "grave diggers" of the proletariat, what we still lack to achieve total automation, and what compromises exist.

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines. Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. This introductory tutorial provides a crash course for writing and deploying your first Airflow pipeline.

Top KDnuggets tweets, Mar 11-17: Most western countries are on the same #coronavirus trajectory

KDnuggets

Most western countries are on the same #coronavirus trajectory; The Workers Who Face the Greatest #Coronavirus Risk; #Coronavirus, a Visual Rundown; How to start building an automated NLP solution for processing customer feedback.

Forecasting Stories: Is it seasonality or not?

KDnuggets

Kicking off a series of forecasting stories, starting with seasonality and its business applications. This first article covers course corrections that were based on weather- and calendar-driven seasonality.

IT 95

Top 20 ODSC 2020 Global Virtual Conference Sessions

KDnuggets

At ODSC 2020, we are unveiling our first ever 4-day Global Virtual Conference, an online and on-demand version of ODSC. Here are our picks for 20 talks that show how diverse and thorough the ODSC East Global Virtual Conference will be this April 14-17.

Improving the partnership between Data Science and IT

KDnuggets

Friction can quickly arise as a result of these separate workflows and priorities. Given their differences, how can data science and IT more seamlessly work together in building a model-driven organization?

Prepare Now: 2025's Must-Know Trends For Product And Data Leaders

Speaker: Jay Allardyce, Deepak Vittal, Terrence Sheflin, and Mahyar Ghasemali

As we look ahead to 2025, business intelligence and data analytics are set to play pivotal roles in shaping success. Organizations are already starting to face a host of transformative trends as the year comes to a close, including the integration of AI in data analytics, an increased emphasis on real-time data insights, and the growing importance of user experience in BI solutions.

Salesforce Open Sources a Framework for Open Domain Question Answering Using Wikipedia

KDnuggets

The framework uses a multi-hop QA method to answer complex questions by reasoning through Wikipedia’s datasets.

Exploring the Adoption of Python in the Workplace – Free Metis Corporate Training Webinar

KDnuggets

Metis will break down Python for data science and analytics, explain what is driving adoption in the field, and discuss how industries and companies are reacting to the shift.

Python 65

KDnuggets™ News 20:n11, Mar 18: Covid-19, your community, and you – a data science perspective; When Will AutoML replace Data Scientists? Poll Results and Analysis

KDnuggets

A Data Science perspective on Covid-19, the novel coronavirus; The results and analysis of a previous KDnuggets Poll: When Will AutoML replace Data Scientists? How to build a mature Machine Learning team; The Most Useful Machine Learning Tools of 2020; and more.