Top Data Engineering Digest Data Science Machine Learning Content for March, 2020

March, 2020

Coronavirus Data and Poll Analysis – yes, there is hope, if we act now

KDnuggets

MARCH 23, 2020

We examine the growth of daily coronavirus cases in the most affected countries and show evidence that social distancing works in reducing the rate of spread. We also analyze KDnuggets Poll results: the scale of the shift to online work and how Data Science work is likely to increase or drop in different regions. Stay healthy and practice social distancing!

Data Science

Data Science Data

Why We Leverage Multi-tenancy in Uber’s Microservice Architecture

Uber Engineering

MARCH 11, 2020

The performance of Uber’s services relies on our ability to quickly and stably launch new features on our platform, regardless of where the corresponding service lives in our tech stack. Foundational to our platform’s power is its microservice-based architecture …

Architecture

Architecture Engineering IT

Trending Sources

Advanced Analytics for Coronavirus – Trends, Patterns, Predictions

Teradata

MARCH 15, 2020

Advanced analytics and AI can significantly accelerate data processing required to get the insights, answers and recommendations to handle and address the COVID-19 pandemic.

Data Process

Data Process Process Data

Ready for changes with Hexagonal Architecture

Netflix Tech

MARCH 10, 2020

By Damir Svrtan and Sergii Makagon. As the production of Netflix Originals grows each year, so does our need to build apps that enable efficiency throughout the entire creative process. Our wider Studio Engineering Organization has built more than 30 apps that help content progress from pitch (aka screenplay) to playback: ranging from script content acquisition, deal negotiations and vendor management to scheduling, streamlining production workflows, and so on.

Architecture

Architecture Transportation Java Database

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate…

Data Pipeline

How to process simple data stream and consume with Lambda

Team Data Science

MARCH 31, 2020

I built a serverless architecture for my simulated credit card complaints stream using AWS S3, AWS Lambda, and AWS Kinesis. The above picture gives a high-level view of the data flow. I treat uploading the CSV file as the data producer, so once you upload a file, it generates an object-created event and the Lambda function is invoked asynchronously. The file content is written to the Kinesis stream as a record (record = data + partition key), which triggers another Lambda function and persist th…

Process

Process AWS Python Architecture
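
The "record = data + partition key" step described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the function name `csv_to_kinesis_records`, the `complaint_id` column, and the sample rows are hypothetical. The actual call to Kinesis is left as a comment so the block runs without AWS credentials.

```python
import csv
import io
import json


def csv_to_kinesis_records(csv_text, key_column):
    """Turn CSV text into a list of {"Data", "PartitionKey"} dicts.

    Each CSV row becomes one record: the row serialized as JSON bytes (data)
    plus the value of one column used as the partition key. The dict shape
    matches what boto3's Kinesis put_records accepts in its Records list.
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({
            "Data": json.dumps(row).encode("utf-8"),
            "PartitionKey": row[key_column],
        })
    return records


# Hypothetical sample standing in for the uploaded CSV file.
sample_csv = "complaint_id,product,state\n101,Credit card,CA\n102,Credit card,NY\n"
records = csv_to_kinesis_records(sample_csv, key_column="complaint_id")
print(len(records), records[0]["PartitionKey"])

# Inside a Lambda handler, the batch could then be sent with something like:
#   boto3.client("kinesis").put_records(StreamName="<your-stream>", Records=records)
```

Keeping the record-building logic in a pure function makes it easy to test locally before wiring it to the S3 event and the stream.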

Scheduling a SQL script, using Apache Airflow, with an example

Start Data Engineering

MARCH 28, 2020

One of the most common use cases for Apache Airflow is to run scheduled SQL scripts. Developers who start with Airflow often ask questions such as “How to use Airflow to orchestrate SQL?”

SQL

The 4 Best Jupyter Notebook Environments for Deep Learning

KDnuggets

MARCH 19, 2020

Many cloud providers and other third-party services see the value of a Jupyter notebook environment, which is why many companies now offer cloud-hosted notebooks. Let's have a look at some of these environments.

Deep Learning

Deep Learning Cloud Python

More Trending

Kafka Connect Elasticsearch Connector in Action

Confluent

MARCH 4, 2020

The Elasticsearch sink connector helps you integrate Apache Kafka® and Elasticsearch with minimum effort. You can take data you’ve stored in Kafka and stream it into Elasticsearch to then be […].

Kafka

Kafka IT Data

Improving Prediction of the Unconfirmed COVID-19 Cases

Teradata

MARCH 18, 2020

With the lack of available tests & uncertainty around the true number of COVID-19 cases, Teradata Epidemiologist Daniel Ulatowski & Data Scientist Jack McCush hypothesize how symptomatic data & the Vantage ML Engine can be utilized to predict cases.

Utilities

Utilities Engineering Data

The Life Of A Non-Profit Data Professional

Data Engineering Podcast

MARCH 30, 2020

Summary: Building and maintaining a system that integrates and analyzes all of the data for your organization is a complex endeavor. Operating on a shoestring budget makes it even more challenging. In this episode Tyler Colby shares his experiences working as a data professional in the non-profit sector. From managing Salesforce data models to wrangling a multitude of data sources and compliance challenges, he describes the biggest challenges that he is facing.

AWS

AWS Data Machine Learning Big Data

Simplistic Ways to Find Interesting Data Sets

Team Data Science

MARCH 15, 2020

I am taking you through my recent experience of finding a dataset for my project. Industry Search: To work with data, I need to narrow down the industry, such as health care, finance, insurance, or another. I defined a few sources in my earlier blog post, which will give a sneak peek of techniques to extract industries. For instance, most job listings introduce their job description with a line like “One of the top insurance client looking for Data Engineer”, which exposes the industry.

Insurance

Insurance Datasets Banking Finance

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

10 Key skills, to help you become a data engineer

Start Data Engineering

MARCH 20, 2020

This article gives you an overview of the 10 key skills you need to become a better data engineer. If you are struggling to get started on what to learn, start with the first topic and proceed through the list.

Data Engineer

Data Engineer Data Engineering Engineering Data

Time Series Classification Synthetic vs Real Financial Time Series

KDnuggets

MARCH 18, 2020

This article discusses distinguishing between real financial time series and synthetic time series using XGBoost.

Finance

15 Things Every Apache Kafka Engineer Should Know About Confluent Replicator

Confluent

MARCH 17, 2020

Single-cluster deployments of Apache Kafka® are rare. Most medium to large deployments employ more than one Kafka cluster, and even the smallest use cases include development, testing, and production clusters. […].

Kafka

Kafka Engineering

How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience

Netflix Tech

MARCH 3, 2020

By Ben Sykes.

Kafka

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Behind The Scenes Of The Linode Object Storage Service

Data Engineering Podcast

MARCH 23, 2020

Summary: There are a number of platforms available for object storage, including self-managed open source projects. But what goes on behind the scenes of the companies that run these systems at scale so you don’t have to? In this episode Will Smith shares the journey that he and his team at Linode recently completed to bring a fast and reliable S3-compatible object storage service to production for your benefit.

Media

Media Machine Learning Big Data Data Engineering

People, We Need to Talk About Mass Electronic Surveillance

Teradata

MARCH 30, 2020

With the COVID-19 epidemic in full swing, the countries that are faring the best are employing large-scale testing and electronic surveillance. But what does this mean for our civil liberties?

Electronics

Sending HTTP Requests with Scala and Akka in 5 Minutes

Rock the JVM

MARCH 31, 2020

Learn to use Akka HTTP with Scala and send HTTP requests in just a few minutes with the Akka HTTP server DSL

Scala

Covid-19, your community, and you — a data science perspective

KDnuggets

MARCH 11, 2020

Let's talk about covid-19; the reality, the numbers, and the data science.

Data Science

Data Science Data Data Analysis

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Building a Cloud ETL Pipeline on Confluent Cloud

Confluent

MARCH 18, 2020

As enterprises move more and more of their applications to the cloud, they are also moving their on-prem ETL (extract, transform, load) pipelines to the cloud, as well as building […].

Cloud

Cloud Building

Introducing Dispatch

Netflix Tech

MARCH 5, 2020

By Kevin Glisson, Marc Vilanova, Forest Monsen. Netflix is pleased to announce the open-source release of our crisis management orchestration framework: Dispatch! Okay, but what is Dispatch? Put simply, Dispatch is: All of the ad-hoc things you’re doing to manage incidents today, done for you, and a bunch of other things you should’ve been doing, but have not had the time!

Metadata

Metadata AWS Management Architecture

Building A New Foundation For CouchDB

Data Engineering Podcast

MARCH 16, 2020

Summary: CouchDB is a distributed document database built for scale and ease of operation. With a built-in synchronization protocol and an HTTP interface it has become popular as a backend for web and mobile applications. Created 15 years ago, it has accrued some technical debt which is being addressed with a refactored architecture based on FoundationDB.

Building

Building Data Warehouse NoSQL Data Lake

Five Books Every CX Leader Should Read in this Time of Social Distancing

Teradata

MARCH 22, 2020

Check out this curated reading list of books on customer experience, from updated classics to new research and insights into how large enterprises can drive business outcomes from a CX initiative.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Why Is Contravariance So Hard in Scala?

Rock the JVM

MARCH 30, 2020

Unravel the complexities of Scala's powerful type system with our deep dive into contravariance: we simplify and demystify its challenging aspects

Scala

Scala Systems IT

20+ Machine Learning Datasets & Project Ideas

KDnuggets

MARCH 9, 2020

Upgrading your machine learning, AI, and Data Science skills requires practice. To practice, you need to develop models with a large amount of data. Finding good datasets to work with can be challenging, so this article discusses more than 20 great datasets along with machine learning project ideas for you to tackle today.

Machine Learning

Machine Learning Datasets Project Data Science

Sharpening your Stream Processing Skills with Kafka Tutorials

Confluent

MARCH 11, 2020

In the Apache Kafka® ecosystem, ksqlDB and Kafka Streams are two popular tools for building event streaming applications that are tightly integrated with Apache Kafka. While ksqlDB and Kafka Streams […].

Kafka

Kafka Process Building

Open-Sourcing riskquant, a library for quantifying risk

Netflix Tech

MARCH 5, 2020

Netflix has a program in our Information Security department for quantifying the risk of deliberate (attacker-driven) and accidental…

Programming

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Scaling Data Governance For Global Businesses With A Data Hub Architecture

Data Engineering Podcast

MARCH 9, 2020

Summary: Data governance is a complex endeavor, but scaling it to meet the needs of a complex or globally distributed organization requires a well-considered and coherent strategy. In this episode Tim Ward describes an architecture that he has used successfully with multiple organizations to scale compliance. Treating it as a graph problem, where each hub in the network has localized control with inheritance of higher-level controls, reduces overhead and provides greater flexibility.

Data Governance

Data Governance Government Architecture Data
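
To make the idea of hubs with localized control plus inherited higher-level controls more concrete, here is a small sketch. It is only an illustration under stated assumptions: the `Hub` class, the control names and values, and the rule that local settings override inherited ones are all hypothetical and are not taken from the episode.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Hub:
    """A node in a hypothetical governance graph (modeled here as a tree)."""
    name: str
    local_controls: Dict[str, str] = field(default_factory=dict)
    parent: Optional["Hub"] = None

    def effective_controls(self) -> Dict[str, str]:
        # Start from the controls inherited from higher-level hubs, then
        # apply local controls on top (assumption: local settings win).
        inherited = self.parent.effective_controls() if self.parent else {}
        return {**inherited, **self.local_controls}


# Hypothetical hierarchy: global -> eu -> de
global_hub = Hub("global", {"retention": "7y", "pii_masking": "required"})
eu_hub = Hub("eu", {"data_residency": "eu-only"}, parent=global_hub)
de_hub = Hub("de", {"retention": "10y"}, parent=eu_hub)

print(de_hub.effective_controls())
```

Each hub only declares what differs locally, and everything else is resolved by walking up the hierarchy, which is one way the approach could keep overhead low.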

Teradata's Response to COVID-19

Teradata

MARCH 17, 2020

How Teradata is responding to the COVID-19 crisis for the health and well-being of its employees, customers and partners.

50 Must-Read Free Books For Every Data Scientist in 2020

KDnuggets

MARCH 9, 2020

This article lists some excellent data science books that cover a wide variety of topics under Data Science.

Data Science

Data Science Data

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you…

Data Engineering
