Top Data Engineering Digest Computer Science Machine Learning Content for Week of Mar 23

Sat.Mar 23, 2019 - Fri.Mar 29, 2019

The Importance of Distributed Tracing for Apache-Kafka-Based Applications

Confluent

MARCH 26, 2019

Apache Kafka®-based applications stand out for their ability to decouple producers and consumers using an event log as an intermediate layer. One result of this is that producers and consumers don’t know about each other, as there is no direct communication between them. This enables choreographed service collaborations, where many components can subscribe to events stored in the event log and react to them asynchronously.
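The decoupling described above can be illustrated with a toy, in-memory event log. This is only a sketch under stated assumptions: `EventLog`, `append`, and `poll` are hypothetical names invented for illustration, not the Kafka client API, and a real deployment would add partitions, persistence, and consumer groups.

```python
from collections import defaultdict


class EventLog:
    """Toy append-only log; a stand-in for a topic, not the Kafka API."""

    def __init__(self):
        self._events = []
        # Next offset to read, tracked separately for each consumer.
        self._offsets = defaultdict(int)

    def append(self, event):
        """Producer side: append an event and return its offset."""
        self._events.append(event)
        return len(self._events) - 1

    def poll(self, consumer_id):
        """Consumer side: return the events this consumer has not yet seen."""
        start = self._offsets[consumer_id]
        batch = self._events[start:]
        self._offsets[consumer_id] = len(self._events)
        return batch


log = EventLog()
log.append({"type": "order_placed", "order_id": 1})
log.append({"type": "order_placed", "order_id": 2})

# Each consumer reads at its own pace; the producer never references them.
billing_batch = log.poll("billing")
log.append({"type": "order_placed", "order_id": 3})
shipping_batch = log.poll("shipping")
billing_next = log.poll("billing")
```

Because each consumer keeps its own offset, a new subscriber can be added without changing the producer, which is the property that makes choreographed collaborations possible.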

Kafka

Kafka Transportation Metadata Consulting

Building An Enterprise Data Fabric At CluedIn

Data Engineering Podcast

MARCH 25, 2019

Summary: Data integration is one of the most challenging aspects of any data platform, especially as the variety of data sources and formats grows. Enterprise organizations feel this acutely due to the silos that occur naturally across business units. The CluedIn team experienced this issue first-hand in their previous roles, leading them to build a business aimed at building a managed data fabric for the enterprise.

Building

Building Data Lake Machine Learning Kafka

Trending Sources

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and…

Netflix Tech

MARCH 25, 2019

Building and Scaling Data Lineage at Netflix to Improve Data Infrastructure Reliability, and Efficiency. By Di Lin, Girish Lingappa, and Jitender Aswani. Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard, about to make a critical business decision, but pausing to ask a question: “Can I run a check myself to understand what data is behind this metric?”

Building

Building Metadata Transportation Data Ingestion

Improving the User Experience with Uber’s Customer Obsession Ticket Routing Workflow and Orchestration Engine

Uber Engineering

MARCH 28, 2019

Every day, Uber users around the world initiate customer support tickets through our Customer Obsession Platform. To ensure a seamless user experience, each of those tickets must be matched with an agent who speaks the user’s language and who …

Engineering

Engineering Architecture Process

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data Pipeline

Cloudera Altus Director on AWS Marketplace makes cloud deployment and billing easy

Cloudera

MARCH 26, 2019

Roughly a quarter of Cloudera’s customers have clusters on public cloud, with a majority of them on AWS. These customers often look for cloud infrastructure best practices guidance as they venture into AWS cloud resources for the first time. Some of the questions asked include: How many AMIs do I need? Should I use EBS or S3 for storage? Many of these questions are answered in the Cloudera on AWS reference architecture guide.

AWS

AWS Cloud Architecture Management

Consuming Messages Out of Apache Kafka in a Browser

Confluent

MARCH 28, 2019

Imagine a fire hose that spews out trillions of gallons of water every day, and part of your job is to withstand every drop coming out of it. This is what it is like to visualize the message throughput of Apache Kafka®. At Confluent, we want to help developers understand how to think about event streaming and the opportunities it can create. Educating people on what an event stream looks like is a daunting task.

Kafka

Kafka Aggregated Data Engineering Media

6 Lessons for Women in Tech

Teradata

MARCH 28, 2019

Today, we share insights from a diverse group of women at Teradata and their advice to other women entering roles in technology.

Technology

More Trending

From Schemaless Ingest to Smart Schema: Enabling SQL on Raw Data

Rockset

MARCH 27, 2019

You have complex, semi-structured data—nested JSON or XML, for instance, containing mixed types, sparse fields, and null values. It's messy, you don't understand how it's structured, and new fields appear every so often. The application you're implementing needs to analyze this data, combining it with other datasets, to return live metrics and recommended actions.
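As a rough illustration of what deriving a schema from raw, semi-structured records can involve, the sketch below walks nested JSON-like records and collects the set of types seen at each field path. It is an assumption-laden toy, not Rockset's implementation: the function name `infer_schema`, the dotted-path notation, and the sample records are all invented for this example.

```python
from collections import defaultdict


def infer_schema(records):
    """Map each dotted field path to the set of type names observed for it."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # All array elements share one path, marked with "[]".
            for item in value:
                walk(item, f"{path}[]")
        else:
            schema[path].add(type(value).__name__)

    for record in records:
        walk(record, "")
    return dict(schema)


# Hypothetical records with mixed types, sparse fields, and a null value.
records = [
    {"id": 1, "user": {"name": "ana", "age": 31}},
    {"id": "2", "user": {"name": "bo"}, "tags": ["x", 3]},
    {"id": 3, "user": {"name": None}},
]
schema = infer_schema(records)
```

Fields that appear in only some records (here `user.age` and `tags[]`) still show up in the result, and a field with more than one entry in its type set signals mixed types that a query layer would need to handle.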

Raw Data

Raw Data SQL NoSQL Datasets

Managing mortgage risk in an uncertain world

Cloudera

MARCH 28, 2019

Picture the scene: a hopeful homebuyer sits in the almost deserted lobby of a high street bank, waiting for the appointment she booked with the mortgage consultant a week ago – a week ago! It annoys her that she has had to come to a branch she has not visited for years, all because she could not work out how to apply for a home loan on the bank’s website.

Management

Management Banking Consulting Machine Learning

Why You Get Faster Query Results with Teradata’s Adaptive Optimizer

Teradata

MARCH 24, 2019

Carrie Ballinger explores the capabilities of Teradata’s Adaptive Optimizer and how it builds better query plans for faster answers to analytic queries.

Building

Building IT

A Story of Rust

Zalando Engineering

MARCH 27, 2019

Introducing Rust in an Enterprise Environment. Discovery: Sometime in 2013, I stumbled across a new programming language on the internet called Rust. Taking a look at the language, I was impressed by its high-level features. At that time I was a backend Scala developer with a .NET background. When examining Rust, I found most of the features I used every day, like pattern matching, the “New Type Pattern” and a “Scala-like” Iterator API.

Scala

Scala Programming Language Technology Utilities

Introducing Cloudera Edge Management and Cloudera Flow Management

Cloudera

MARCH 27, 2019

Cloudera’s vision of delivering Edge to AI solutions using the Enterprise Data Cloud will enable enterprises to transform dramatically. In today’s digitally connected enterprises, data originates from the edge, streams into the data center, lands in an Enterprise Data Cloud for downstream processing including Machine Learning and then serves back to the edge for real-time prediction and action.

Management

Management Data Ingestion Machine Learning Java

Learning with Limited Labeled Data

Cloudera

MARCH 25, 2019

This post was originally published on the Cloudera Fast Forward Labs blog. In recent years, machine learning technologies – especially deep learning – have made breakthroughs which have turned science fiction into reality. Autonomous cars are almost possible, and machines can comprehend language. These technical advances are unprecedented, but they hinge on the availability of vast amounts of data.

Machine Learning

Machine Learning Deep Learning Medical Computer Science
