Top Data Engineering Digest Python Scala Content for Week of Sep 14

Sat.Sep 14, 2019 - Fri.Sep 20, 2019

Which Data Science Skills are core and which are hot/emerging ones?

KDnuggets

SEPTEMBER 17, 2019

We identify two main groups of Data Science skills: A: 13 core, stable skills that most respondents have and B: a group of hot, emerging skills that most do not have (yet) but want to add. See our detailed analysis.

Data Science

Data Science Data Deep Learning Scala

The Rise of Managed Services for Apache Kafka

Confluent

SEPTEMBER 20, 2019

As a distributed system for collecting, storing, and processing data at scale, Apache Kafka® comes with its own deployment complexities. Luckily for on-premises scenarios, a myriad of deployment options are available, such as the Confluent Platform, which can be deployed on bare metal, virtual machines, containers, etc. But deployment is just the tip of the iceberg.

Kafka

Kafka Management Cloud AWS

Trending Sources

Navigating Boundless Data Streams With The Swim Kernel

Data Engineering Podcast

SEPTEMBER 18, 2019

Summary: The conventional approach to analytics involves collecting large amounts of data that can be cleaned, followed by a separate step for analysis and interpretation. Unfortunately this strategy is not viable for handling real-time, real-world use cases such as traffic management or supply chain logistics. In this episode Simon Crosby, CTO of Swim Inc., explains how the SwimOS kernel and the enterprise data fabric built on top of it enable brand new use cases for instant insights.

Hadoop

Hadoop Data Lake BI Kafka

Self-Service Analytics: Classifying Data and Analytic States

Teradata

SEPTEMBER 17, 2019

Learn how to better classify data & analytics within the analytic ecosystem by analyzing the various states of data & analytics within organizations. Read more.

Data Analytics

Data Analytics Data

BERT, RoBERTa, DistilBERT, XLNet: Which one to use?

KDnuggets

SEPTEMBER 17, 2019

Lately, varying improvements over BERT have been shown — and here I will contrast the main similarities and differences so you can choose which one to use in your research or application.

Built-In Multi-Region Replication with Confluent Platform 5.4-preview

Confluent

SEPTEMBER 16, 2019

Running a single Apache Kafka® cluster across multiple datacenters (DCs) is a common, yet somewhat taboo architecture. This architecture, referred to as a stretch cluster, provides several operational benefits and unlocks the door to many use cases. Stretch clusters provide better durability guarantees and make disaster recovery much easier by avoiding the problem of offset translation and restarting clients.

Kafka

Kafka Metadata Architecture Software Engineer

Outside Lands, Airbnb Prices, and Rockset’s Geospatial Queries

Rockset

SEPTEMBER 20, 2019

Airbnb Prices Around Major Events: Operational analytics on real-time data streams requires being able to slice and dice the data along all the axes that matter to people, including time and space. We can see how important it is to analyze data spatially by looking at an app that’s all about location: Airbnb. Major events in San Francisco cause huge influxes of people, and Airbnb prices increase accordingly.

IT Data

More Trending

Multitasking Within the Teradata Vantage Optimizer

Teradata

SEPTEMBER 15, 2019

Want scale? Without multitasking capabilities, Teradata Vantage would not be able to support hundreds or thousands of user queries at the same time. Learn more.

Explore the world of Bioinformatics with Machine Learning

KDnuggets

SEPTEMBER 17, 2019

The article gives a brief introduction to Bioinformatics and shows how a machine learning classification algorithm can be used to classify the type of cancer in each patient by their gene expressions.

Machine Learning

Machine Learning Algorithm Python

Reflections on Event Streaming as Confluent Turns Five – Part 2

Confluent

SEPTEMBER 19, 2019

When people ask me the very top-level question “why do people use Kafka,” I usually lead with the story in my last post, where I talked about how Apache Kafka® is helping us deliver on the promises the cloud made to us a decade ago. But I follow it up quickly with a second and potentially unrelated pattern: real-time data pipelines. These provide a different set of motivations for using an event streaming platform than scaling and microservices: specifically, the need to produce analytics resu

Kafka

Kafka Data Pipeline Bytes Data Architect

My journey path from a Software Engineer to BI Specialist to a Data Scientist

KDnuggets

SEPTEMBER 16, 2019

The career path of the Data Scientist remains a hot target for many with its continuing high demand. Becoming one requires developing a broad set of skills including statistics, programming, and even business acumen. Learn more about one person's experience making this journey, and discover the many resources available to help you find your way into a world of data science.

Software Engineer

Software Engineer Software Engineering BI Engineering

A Gentle Introduction to PyTorch 1.2

KDnuggets

SEPTEMBER 20, 2019

This comprehensive tutorial aims to introduce the fundamentals of PyTorch building blocks for training neural networks.

Building

Building Python

5 Beginner Friendly Steps to Learn Machine Learning and Data Science with Python

KDnuggets

SEPTEMBER 19, 2019

“I want to learn machine learning and artificial intelligence, where do I start?” Here.

Machine Learning

Machine Learning Data Science Python Data

The Hidden Risk of AI and Big Data

KDnuggets

SEPTEMBER 20, 2019

With recent advances in AI being enabled through access to so much “Big Data” and cheap computing power, there is incredible momentum in the field. Can big data really deliver on all this hype, and what can go wrong?

Big Data

Big Data Data Accessible Accessibility

How Bad is Multicollinearity?

KDnuggets

SEPTEMBER 17, 2019

For some people anything below 60% is acceptable, and for certain others even a correlation of 30% to 40% is considered too high, because one variable may just end up exaggerating the performance of the model or completely messing up parameter estimates.

The 5 Sampling Algorithms every Data Scientist need to know

KDnuggets

SEPTEMBER 18, 2019

Algorithms are at the core of data science, and sampling is a critical technique that can make or break a project. Learn more about the most common sampling techniques used, so you can select the best approach while working with your data.

Algorithm

Algorithm Data Science Data Project

5 Alternative Data Science Tools

KDnuggets

SEPTEMBER 17, 2019

What other creative tools for data science beyond Python and R can you use to make an impression? It's not about the tool -- it's about its impact.

Data Science

Data Science Python Data IT

Automate Hyperparameter Tuning for Your Models

KDnuggets

SEPTEMBER 20, 2019

When we create our machine learning models, a common task that falls on us is how to tune them. So that brings us to the quintessential question: Can we automate this process?

Machine Learning

Machine Learning Process

What is Machine Behavior?

KDnuggets

SEPTEMBER 16, 2019

The new emerging field that wants to study AI agents the way social scientists study humans.

Cartoon: Unsupervised Machine Learning?

KDnuggets

SEPTEMBER 14, 2019

New KDnuggets Cartoon looks at one of the hottest directions in Machine Learning and asks: can Machine Learning be too unsupervised?

Machine Learning

Top KDnuggets tweets, Sep 11-17: Python Libraries for Interpretable Machine Learning

KDnuggets

SEPTEMBER 18, 2019

Also: Cartoon: Unsupervised Machine Learning?; How to Become More Marketable as a Data Scientist; Ensemble Methods for Machine Learning: AdaBoost.

Machine Learning

Machine Learning Python Data

Scikit-Learn & More for Synthetic Dataset Generation for Machine Learning

KDnuggets

SEPTEMBER 19, 2019

While mature algorithms and extensive open-source libraries are widely available for machine learning practitioners, sufficient data to apply these techniques remains a core challenge. Discover how to leverage scikit-learn and other tools to generate synthetic data appropriate for optimizing and fine-tuning your models.

Machine Learning

Machine Learning Datasets Algorithm Data

5 Step Guide to Scalable Deep Learning Pipelines with d6tflow

KDnuggets

SEPTEMBER 16, 2019

How to turn a typical pytorch script into a scalable d6tflow DAG for faster research & development.

Deep Learning

Deep Learning Python

Data Science is Boring (Part 1)

KDnuggets

SEPTEMBER 18, 2019

Read about how one data scientist copes with his boring days of deploying machine learning.

Data Science

Data Science Machine Learning Data

Applying Data Science to Cybersecurity Network Attacks & Events

KDnuggets

SEPTEMBER 19, 2019

Check out this detailed tutorial on applying data science to the cybersecurity domain, written by an individual with backgrounds in both fields.

Data Science

Data Science Data Machine Learning Python

Reddit Post Classification

KDnuggets

SEPTEMBER 18, 2019

This article covers the implementation of a data scraping and natural language processing project which had two parts: scrape as many posts from Reddit’s API as allowed, and then use classification models to predict the origin of the posts.

Project

Project Process Data

Python 2 End of Life Survey – Are You Prepared?

KDnuggets

SEPTEMBER 18, 2019

Support for Python 2 will expire on Jan. 1, 2020, after which the Python core language and many third-party packages will no longer be supported or maintained. Take this survey to help determine and share your level of preparation.

Python

Turbo-Charging Data Science with AutoML

KDnuggets

SEPTEMBER 17, 2019

Join this technical webinar on Oct 3, where Domino Chief Data Scientist Josh Poduska will dive into popular open source and proprietary AutoML tools, and walk through hands-on examples of how to install and use these tools, so you can start using these technologies in your work right away.

Data Science

Data Science Data Technology

Webinar: Data-Driven Approaches to Forecasting

KDnuggets

SEPTEMBER 19, 2019

Whether it’s demand forecasting, supply chain management, or any other application, getting it right requires balancing the need for performance with the constraints of implementation and complexity. Learn more in this free webinar, Data-Driven Approaches to Forecasting, Sep 26.

Data

Data Management IT

Data Science Symposium 2019, Oct 10-11, Cincinnati

KDnuggets

SEPTEMBER 16, 2019

The UC Center for Business Analytics will present the Data Science Symposium 2019 on Oct 10 & 11, featuring 3 keynote speakers and 16 tech talks/tutorials on a wide range of data science topics and tools.

Data Science

Data Science Data

Top Stories, Sep 9-15: 10 Great Python Resources for Aspiring Data Scientists