Sat. Sep 14, 2019 - Fri. Sep 20, 2019

Navigating Boundless Data Streams With The Swim Kernel

Data Engineering Podcast

Summary: The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysis and interpretation. Unfortunately, this strategy is not viable for real-time, real-world use cases such as traffic management or supply chain logistics. In this episode, Simon Crosby, CTO of Swim Inc., explains how the SwimOS kernel and the enterprise data fabric built on top of it enable brand-new use cases for instant insights.

Which Data Science Skills are core and which are hot/emerging ones?

KDnuggets

We identify two main groups of Data Science skills: (A) 13 core, stable skills that most respondents have, and (B) a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.

Self-Service Analytics: Classifying Data and Analytic States

Teradata

Learn how to better classify data and analytics within the analytic ecosystem by analyzing the various states of data and analytics within organizations.

Outside Lands, Airbnb Prices, and Rockset’s Geospatial Queries

Rockset

Airbnb Prices Around Major Events: Operational analytics on real-time data streams requires the ability to slice and dice the data along all the axes that matter to people, including time and space. To see why spatial analysis matters, consider an app that is all about location: Airbnb. Major events in San Francisco cause huge influxes of people, and Airbnb prices rise accordingly.

The Rise of Managed Services for Apache Kafka

Confluent

As a distributed system for collecting, storing, and processing data at scale, Apache Kafka® comes with its own deployment complexities. Luckily, for on-premises scenarios, a myriad of deployment options are available, such as Confluent Platform, which can be deployed on bare metal, virtual machines, containers, and more. But deployment is just the tip of the iceberg.

BERT, RoBERTa, DistilBERT, XLNet: Which one to use?

KDnuggets

Several improvements over BERT have appeared recently. Here I contrast their main similarities and differences so you can choose which one to use in your research or application.

Explore the world of Bioinformatics with Machine Learning

KDnuggets

This article gives a brief introduction to bioinformatics and shows how a machine learning classification algorithm can classify each patient's type of cancer from their gene expression data.

My journey path from a Software Engineer to BI Specialist to a Data Scientist

KDnuggets

The career path of the Data Scientist remains a hot target for many, given its continuing high demand. Becoming one requires developing a broad set of skills, including statistics, programming, and even business acumen. Learn more about one person's experience making this journey, and discover the many resources available to help you find your way into the world of data science.

How Bad is Multicollinearity?

KDnuggets

For some people, any correlation below 60% is acceptable; for others, even a correlation of 30% to 40% is too high, because one variable may end up exaggerating the performance of the model or completely distorting the parameter estimates.
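The threshold debate above boils down to computing pairwise correlations between features and deciding where to draw the line. Below is a minimal sketch in plain Python (no pandas or NumPy assumed) that flags feature pairs whose absolute Pearson correlation reaches a chosen threshold. The feature names and values are made up for illustration, and pairwise correlation is only one screening check; variance inflation factors are a common complement.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlated_pairs(
    features: dict[str, list[float]], threshold: float
) -> list[tuple[str, str, float]]:
    """Return feature pairs whose absolute correlation meets or exceeds the threshold."""
    flagged = []
    for a, b in combinations(features, 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Hypothetical toy features: x2 is an exact multiple of x1, x3 is only loosely related.
features = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10],
    "x3": [3, 2, 1, 5, 4],
}
```

With these toy columns, `correlated_pairs(features, 0.6)` flags only `("x1", "x2", 1.0)`, while the stricter `correlated_pairs(features, 0.3)` also flags the two pairs involving `x3` (r = 0.5), which illustrates how much the choice of threshold matters.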

The Hidden Risk of AI and Big Data

KDnuggets

With recent advances in AI being enabled through access to so much “Big Data” and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?

The 5 Sampling Algorithms every Data Scientist need to know

KDnuggets

Algorithms are at the core of data science, and sampling is a critical technique that can make or break a project. Learn more about the most common sampling techniques so you can select the best approach when working with your data.
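One technique often listed among sampling algorithms is reservoir sampling, which draws a uniform random sample of k items from a stream whose length is not known in advance. The sketch below is the classic Algorithm R, offered as an illustration rather than as the article's own code (we do not assume which five techniques the article covers); the function name and the seeded `random.Random` argument are our own choices.

```python
import random
from typing import Iterable, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> list[T]:
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    reservoir: list[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # randint is inclusive, so j is uniform over 0..i and the
            # i-th item (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, `reservoir_sample(range(1_000_000), 10, random.Random(42))` returns 10 items after a single pass while holding only k items in memory. If the stream has fewer than k items, all of them are returned.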

Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning

KDnuggets

While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.

Top KDnuggets tweets, Sep 11-17: Python Libraries for Interpretable Machine Learning

KDnuggets

Also: Cartoon: Unsupervised #MachineLearning?; How to Become More Marketable as a Data Scientist; Ensemble Methods for Machine Learning: AdaBoost.

Automate Hyperparameter Tuning for Your Models

KDnuggets

When we create machine learning models, a common task that falls to us is tuning them. That brings us to the quintessential question: can we automate this process?
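The simplest form of automated tuning is grid search: evaluate every combination of candidate hyperparameter values and keep the best one. Below is a minimal plain-Python sketch. The scoring function `toy_validation_score` and its parameter names are hypothetical stand-ins for "train a model with these settings and return its validation score"; this sketch does not claim to show the article's own approach.

```python
from itertools import product
from typing import Any, Callable


def grid_search(
    score: Callable[..., float], grid: dict[str, list[Any]]
) -> tuple[dict[str, Any], float]:
    """Try every combination in the grid and return the best-scoring one."""
    names = list(grid)
    best_params: dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        s = score(**params)
        if s > best_score:  # strict '>' keeps the first of any tied combinations
            best_params, best_score = params, s
    return best_params, best_score


def toy_validation_score(learning_rate: float, max_depth: int) -> float:
    # Hypothetical stand-in for training and validating a real model;
    # it peaks at learning_rate=0.1, max_depth=4.
    return -((learning_rate - 0.1) ** 2) - (max_depth - 4) ** 2


best_params, best_score = grid_search(
    toy_validation_score,
    {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]},
)
```

Grid search grows multiplicatively with each added hyperparameter, which is why random search and Bayesian optimization are common alternatives. In scikit-learn, `GridSearchCV` wraps the same exhaustive idea with cross-validated scoring.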

A Gentle Introduction to PyTorch 1.2

KDnuggets

This comprehensive tutorial introduces the fundamental PyTorch building blocks for training neural networks.

5 Alternative Data Science Tools

KDnuggets

What other creative tools for data science beyond Python and R can you use to make an impression? It's not about the tool; it's about its impact.

Cartoon: Unsupervised Machine Learning?

KDnuggets

The new KDnuggets cartoon looks at one of the hottest directions in machine learning and asks: can machine learning be too unsupervised?

Reddit Post Classification

KDnuggets

This article covers the implementation of a data scraping and natural language processing project that had two parts: scrape as many posts from Reddit’s API as allowed, and then use classification models to predict the origin of the posts.

Python 2 End of Life Survey – Are You Prepared?

KDnuggets

Support for Python 2 will expire on Jan. 1, 2020, after which the Python core language and many third-party packages will no longer be supported or maintained. Take this survey to help determine and share your level of preparation.

5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python

KDnuggets

“I want to learn machine learning and artificial intelligence, where do I start?” Here.

Applying Data Science to Cybersecurity Network Attacks & Events

KDnuggets

Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.

Turbo-Charging Data Science with AutoML

KDnuggets

Join this technical webinar on Oct 3, where Domino Chief Data Scientist Josh Poduska will dive into popular open source and proprietary AutoML tools, and walk through hands-on examples of how to install and use these tools, so you can start using these technologies in your work right away.

Top Stories, Sep 9-15: 10 Great Python Resources for Aspiring Data Scientists

KDnuggets

Also: The 5 Graph Algorithms That Data Scientists Should Know; Many Heads Are Better Than One: The Case For Ensemble Learning; BERT is changing the NLP landscape; I wasn't getting hired as a Data Scientist; There is No Free Lunch in Data Science.

5 Step Guide to Scalable Deep Learning Pipelines with d6tflow

KDnuggets

How to turn a typical PyTorch script into a scalable d6tflow DAG for faster research and development.

Data Science Symposium 2019, Oct 10-11, Cincinnati

KDnuggets

The UC Center for Business Analytics will present the Data Science Symposium 2019 on Oct 10 & 11, featuring 3 keynote speakers and 16 tech talks/tutorials on a wide range of data science topics and tools.

Webinar: Data-Driven Approaches to Forecasting

KDnuggets

Whether it’s demand forecasting, supply chain management, or any other application, getting it right requires balancing the need for performance with the constraints of implementation and complexity. Learn more in this free webinar, Data-Driven Approaches to Forecasting, Sep 26.

What is Machine Behavior?

KDnuggets

A new, emerging field that aims to study AI agents the way social scientists study humans.

Data Science is Boring (Part 1)

KDnuggets

Read about how one data scientist copes with the boring days of deploying machine learning models.

KDnuggets™ News 19:n35, Sep 18: Which Data Science Skills are core and which are hot/emerging ones?; There is No Free Lunch in Data Science Features

KDnuggets

Check out the results of KDnuggets' latest poll to find out which data science skills are core and which are hot/emerging ones; why is there no free lunch in data science?; training Scikit-learn 100x faster; poking fun at unsupervised machine learning; exploring the case for ensemble learning. All this and much more this week on KDnuggets.

Beyond Explainability: A Practical Guide to Managing Risks in Machine Learning Models

KDnuggets

This white paper provides the first-ever standard for managing risk in AI and ML, focusing on both practical processes and technical best practices “beyond explainability” alone. Download now.