Data Engineering Digest: Top Software Engineer Content for Week of Sep 21

Sat. Sep 21, 2019 - Fri. Sep 27, 2019

5 Famous Deep Learning Courses/Schools of 2019

KDnuggets

SEPTEMBER 24, 2019

Deep Learning has become the hottest skill in Data Science at the moment. There is a plethora of articles, courses, technologies, influencers and resources that we can leverage to gain Deep Learning skills.

Deep Learning, Data Science, Technology, Data

Every Company is Becoming a Software Company

Confluent

SEPTEMBER 25, 2019

In 2011, Marc Andreessen wrote an article called Why Software Is Eating the World. The central idea is that any process that can be moved into software will be. This has become a kind of shorthand for the investment thesis behind Silicon Valley’s current wave of unicorn startups. It’s also a unifying idea behind the larger set of technology trends we see today, such as machine learning, IoT, ubiquitous mobile connectivity, SaaS, and cloud computing.

Database-centric, Kafka, Pipeline-centric, Retail

Open Source Object Storage For All Of Your Data

Data Engineering Podcast

SEPTEMBER 22, 2019

Summary: Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. S3 from Amazon has quickly become the de facto API for interacting with this service, so the team at MinIO have built a production-grade, easy-to-manage storage engine that replicates that interface.

AWS, Google Cloud, Cloud Storage, Data Lake

Evolving Regional Evacuation

Netflix Tech

SEPTEMBER 23, 2019

Niosha Behnam | Demand Engineering @ Netflix

At Netflix we prioritize innovation and velocity in pursuit of the best experience for our 150+ million global customers. This means that our microservices constantly evolve and change, but what doesn’t change is our responsibility to provide a highly available service that delivers 100+ million hours of daily streaming to our subscribers.

Amazon Web Services, Electronics, Java, AWS

12 Deep Learning Researchers and Leaders

KDnuggets

SEPTEMBER 23, 2019

Our list of deep learning researchers and industry leaders covers the people you should follow to stay current with this wildly expanding field in AI. From early practitioners and established academics to entrepreneurs and today’s top corporate influencers, this diverse group of individuals is leading the way into tomorrow’s deep learning landscape.

Deep Learning

Real-Time Analytics and Monitoring Dashboards with Apache Kafka and Rockset

Confluent

SEPTEMBER 26, 2019

In the early days, many companies simply used Apache Kafka® for data ingestion into Hadoop or another data lake. However, Apache Kafka is more than just messaging. The significant difference today is that companies use Apache Kafka as an event streaming platform for building mission-critical infrastructures and core operations platforms. Examples include microservice architectures, mainframe integration, instant payment, fraud detection, sensor analytics, real-time monitoring, and many more—dri

Kafka, SQL, BI, Hadoop

Time Series Analysis: Looking Back to See the Future

Teradata

SEPTEMBER 24, 2019

Time series data is found everywhere from stock prices to public health. Vantage's Machine Learning Engine helps turn that data into answers. Find out how.

Machine Learning, Engineering, Data

More Trending

Scaling a Mature Data Pipeline: Managing Overhead

Airbnb Tech

SEPTEMBER 24, 2019

There is often a hidden performance cost tied to the complexity of data pipelines: the overhead. In this post, we will introduce the concept and examine the techniques we use to avoid it in our data pipelines. Author: Zachary Ennenga. Background: There is often a natural evolution in the tooling, organization, and technical underpinning of data pipelines.

Data Pipeline, Management, Data, Scala

A Single Function to Streamline Image Classification with Keras

KDnuggets

SEPTEMBER 23, 2019

We show, step-by-step, how to construct a single, generalized, utility function to pull images automatically from a directory and train a convolutional neural net model.

Utilities, Python

How to Make the Most of Kafka Summit San Francisco 2019

Confluent

SEPTEMBER 23, 2019

Kafka Summit San Francisco is just one week away. Conferences can be busy affairs, so here are some tips on getting the most out of your time there. Plan. Go and check out the schedule. Spend a bit of time familiarising yourself with what sessions you want to get to, and mark them on your calendar. How do you pick which sessions to attend? My advice: diversify!

Kafka, Hadoop, Architecture, Database

Why Clean Data is Critical for Your Business

Teradata

SEPTEMBER 22, 2019

Clean data is critical to your business. Find out what three things you need to know about clean data for the health of your organization. Read more.

Data

What is Hierarchical Clustering?

KDnuggets

SEPTEMBER 27, 2019

This article briefly introduces various concepts related to the hierarchical clustering algorithm.

Algorithm, Machine Learning, Python

Beta Distribution: What, When & How

KDnuggets

SEPTEMBER 25, 2019

This article covers the beta distribution, and explains it using baseball batting averages.

Automatic Version Control for Data Scientists

KDnuggets

SEPTEMBER 24, 2019

How can you keep your machine learning models and data organized so you can collaborate effectively? Discover this new tool set available for better version control designed for the data scientist workflow.

Machine Learning, Data, Designing

Natural Language in Python using spaCy: An Introduction

KDnuggets

SEPTEMBER 26, 2019

This article provides a brief introduction to working with natural language (sometimes called “text analytics”) in Python using spaCy and related libraries.

Python

A 2019 Guide for Automatic Speech Recognition

KDnuggets

SEPTEMBER 24, 2019

In this article, we’ll look at a couple of papers aimed at solving the problem of automated speech recognition with machine and deep learning.

Deep Learning

The Future of Analytics and Data Science

KDnuggets

SEPTEMBER 26, 2019

Learn about the current and future issues of data science and possible solutions from this interview with IADSS Co-founder, Dr. Usama Fayyad, following his keynote speech at ODSC Boston 2019.

Data Science, Data

6 bits of advice for Data Scientists

KDnuggets

SEPTEMBER 25, 2019

As a data scientist, you can get lost in your daily dives into the data. Consider following this advice in your work to be more diligent and more impactful for your organization.

Data

Why data analysts should choose stories over statistics

KDnuggets

SEPTEMBER 26, 2019

Join the Crunch Data Conference in Budapest, Oct 16-18, with stellar speakers from companies like Facebook, Netflix and LinkedIn. Use the discount code ‘KDNuggets’ to save $100 off your conference ticket.

Data, Coding, Data Analytics, Data Science

Customer Segmentation for R Users

KDnuggets

SEPTEMBER 26, 2019

This article shows you how to separate your customers into distinct groups based on their purchase behavior. For the R enthusiasts out there, I demonstrate what you can do with r/stats, ggradar, ggplot2, animation, and factoextra.

Using Time Series Encodings to Discover Baseball History’s Most Interesting Seasons

KDnuggets

SEPTEMBER 27, 2019

Take me out to the ballgame! Take me out to the crowd! For the 2,829 seasons that have been played for 101 baseball teams since 1880, which seasons were unlike any others? Using SAX Encoding to recognize patterns in time series data, the most special years in baseball can be found.

Data

Introducing IceCAPS: Microsoft’s Framework for Advanced Conversation Modeling

KDnuggets

SEPTEMBER 23, 2019

The new open source framework that brings multi-task learning to conversational agents.

Data Mapping Using Machine Learning

KDnuggets

SEPTEMBER 27, 2019

Data mapping is a way to organize various bits of data into a manageable and easy-to-understand system.

Machine Learning, Data, Systems, Management

Webinar: Build auto-adaptive machine learning models with Kubernetes

KDnuggets

SEPTEMBER 27, 2019

This live webinar on Oct 2, 2019 will show data scientists and machine learning engineers how to build, manage, and deploy auto-adaptive machine learning models in production. Save your spot now.

Machine Learning, Building, Engineering, Management

The thin line between data science and data engineering

KDnuggets

SEPTEMBER 25, 2019

Today, as companies have finally come to understand the value that data science can bring, more and more emphasis is being placed on the implementation of data science in production systems. And as these implementations have required models that can perform on larger and larger datasets in real-time, an awful lot of data science problems have become engineering problems.

Data Science, Data Engineering, Data Engineer, Engineering

Help Your Career Survive ‘DataGeddon’

KDnuggets

SEPTEMBER 25, 2019

Penn State’s fully online data analytics program uniquely prepares students to advance their career in data science. Penn State offers 3 intakes every year and reviews applications on a rolling basis. GMAT or GRE waivers are available to highly qualified candidates. Learn more now.

Data Science, Data Analytics, Programming, Education

Getting to the Future First: How Social Data is Transforming Trend Discovery

KDnuggets

SEPTEMBER 23, 2019

Register now for this webinar, Sep 25 @ 12 PM ET, for a clear approach on how to apply machine learning language technology to massive, unstructured data sets in order to create predictive models of what may be the next “it” ingredient, color, flavor or pack size.

Unstructured Data, Machine Learning, Data, Technology

Schema Validation with Confluent 5.4-preview

Confluent

SEPTEMBER 27, 2019

Robust data governance through Schema Validation on write is now supported in Confluent Platform 5.4-preview. This gives operators a centralized location to enforce data format correctness within Confluent Platform. Enforcing data correctness on write is the first step towards enabling centralized policy enforcement and data governance within your event streaming platform.

Kafka, Data Governance, Bytes, Government

Data Quality Assessment Is Not All Roses. What Challenges Should You Be Aware Of?

KDnuggets

SEPTEMBER 24, 2019

Of all data quality characteristics, we consider consistency and accuracy to be the most difficult ones to measure. Here, we describe the challenges that you may encounter and the ways to overcome them.

Data

AI World Conference & Expo, Oct 23-25, Boston – Updated Agenda and Special KDnuggets Discount

KDnuggets

SEPTEMBER 24, 2019

AI World Conference & Expo has become the industry’s largest independent business event focused on the state of the practice of AI in the enterprise. Join us in Boston, Oct 23-25. Use the discount code 1968-KDN and SAVE $200.

Coding

Top Stories, Sep 16-22: Which Data Science Skills are core and which are hot/emerging ones?
