Top Data Engineering Digest Kafka AWS Content for July, 2020

July, 2020

Introducing Domain-Oriented Microservice Architecture

Uber Engineering

JULY 23, 2020

Recently there has been substantial discussion around the downsides of service-oriented architectures, and microservice architectures in particular. While only a few years ago many people readily adopted microservice architectures due to the numerous benefits they provide, such as … The post Introducing Domain-Oriented Microservice Architecture appeared first on Uber Engineering Blog.

Architecture

Architecture Engineering

Doing Good with Data: Teradata's COVID-19 Resiliency Dashboard

Teradata

JULY 19, 2020

To help our customers navigate the world's new normal, our teams have created a business-centric, execution-focused tool – we call it the Resiliency Dashboard.

Data

Data IT

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Apache Kafka Native MQTT at Scale with Confluent Cloud and Waterstream

Confluent

JULY 15, 2020

With billions of Internet of Things (IoT) devices, achieving real-time interoperability has become a major challenge. Together, Confluent, Waterstream, and MQTT are accelerating Industry 4.0 with new Industrial IoT (IIoT) […].

Kafka

Kafka Cloud Programming

Ensuring Data Quality, With Great Expectations

Start Data Engineering

JULY 26, 2020

What is data quality? As the name suggests, it refers to the quality of our data. Quality should be defined based on your project requirements. It can range from simple checks, such as ensuring a column contains only allowed values or falls within a given range, to more complex cases, such as requiring a column to match a specific regex pattern or fall within a standard deviation range.
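
The checks described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Great Expectations API: the function names (`values_in_set`, `values_in_range`, `values_match_regex`, `values_within_stdev`), the sample columns, and the email regex are all hypothetical. Each function returns the offending values, so an empty list means the check passed.

```python
import re
import statistics
from typing import Any, Iterable, List

# Hypothetical helpers illustrating common data quality checks.
# Plain Python only; this is NOT the Great Expectations API.


def values_in_set(column: Iterable[Any], allowed: set) -> List[Any]:
    """Return the values that are not in the allowed set."""
    return [v for v in column if v not in allowed]


def values_in_range(column: Iterable[float], low: float, high: float) -> List[float]:
    """Return the values outside the inclusive range [low, high]."""
    return [v for v in column if not (low <= v <= high)]


def values_match_regex(column: Iterable[str], pattern: str) -> List[str]:
    """Return the values that do not fully match the regex pattern."""
    compiled = re.compile(pattern)
    return [v for v in column if not compiled.fullmatch(v)]


def values_within_stdev(column: Iterable[float], k: float) -> List[float]:
    """Return the values more than k population standard deviations from the mean."""
    values = list(column)
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * stdev]


if __name__ == "__main__":
    print(values_in_set(["active", "deleted", "unknown"], {"active", "inactive", "deleted"}))
    # ['unknown']
    print(values_in_range([25, 31, -4, 142], 0, 120))  # [-4, 142]
    print(values_match_regex(["a@example.com", "not-an-email"], r"[^@\s]+@[^@\s]+\.[a-z]+"))
    # ['not-an-email']
    print(values_within_stdev([10, 10, 10, 10, 50], k=1.5))  # [50]
```

Returning the failing values rather than a bare boolean makes it easier to report what broke, which is the same spirit as the "expectation" results a data quality tool produces.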

Data

Data Project IT

Build More Reliable Distributed Systems By Breaking Them With Jepsen

Data Engineering Podcast

JULY 27, 2020

A majority of the scalable data processing platforms that we rely on are built as distributed systems. This brings with it a vast number of subtle ways that errors can creep in. Kyle Kingsbury created the Jepsen framework for testing the guarantees of distributed data processing systems and identifying when and why they break. In this episode he shares his approach to testing complex systems, the common challenges that are faced by engineers who build them, and why it is important to und

Systems

Systems Building Scala Java

Byte Down: Making Netflix’s Data Infrastructure Cost-Effective

Netflix Tech

JULY 8, 2020

By Torio Risianto, Bhargavi Reddy, Tanvi Sahni, Andrew Park Continue reading on Netflix TechBlog ».

Bytes

Bytes Data Cloud Storage AWS

The Differences Between Null, Nothing, Nil, None, and Unit in Scala

Rock the JVM

JULY 30, 2020

Discover the different flavors of 'nothing-ness' in Scala and how they impact your code

Scala

Scala Coding

More Trending

Return on Data – The New Valuation for Future Retail

Teradata

JULY 9, 2020

Today’s retailers face an abundance of data scattered across their organizations. The way forward is as much about having a strategic approach to data as it is about technology.

Retail

Retail Data Technology IT

Putting Several Event Types in the Same Topic – Revisited

Confluent

JULY 8, 2020

In the article Should You Put Several Event Types in the Same Kafka Topic?, Martin Kleppmann discusses when to combine several event types in the same topic and introduces new […].

Kafka

AWS RDS PostgreSQL Setup

Start Data Engineering

JULY 18, 2020

AWS RDS is a managed service provided by AWS for running relational databases. We will see how to set up a PostgreSQL instance using AWS RDS. Log in to your AWS account, go to Services -> RDS, and click on Create Database. In the Create Database prompt, choose the Standard Create option with PostgreSQL as the engine type. In the Template section, choose Free Tier and enter a DB identifier, master username, and master password.

PostgreSQL

PostgreSQL AWS Relational Database Database

Making Wind Energy More Efficient With Data At Turbit Systems

Data Engineering Podcast

JULY 20, 2020

Wind energy is an important component of an ecologically friendly power system, but there are a number of variables that can affect the overall efficiency of the turbines. Michael Tegtmeier founded Turbit Systems to help operators of wind farms identify and correct problems that contribute to suboptimal power outputs. In this episode he shares the story of how he got started working with wind energy, the system that he has built to collect data from the individual turbines, and how he is

Systems

Systems Machine Learning Manufacturing Algorithm

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Empowering the Visual Effects Community with the NetFX Platform

Netflix Tech

JULY 17, 2020

The cloud-based platform allows vendors, artists and creators to connect and collaborate on visual effects (VFX) from anywhere in the… Continue reading on Netflix TechBlog ».

Cloud

The Importance of Data in UX Design

Teradata

JULY 14, 2020

Gone are the days when defining a user experience was left solely to designers. Today, data plays a more important role in the design process than ever before.

Designing

Designing Data Process

Top 5 Reasons to Attend Kafka Summit Virtually

Confluent

JULY 17, 2020

The first-ever virtual Kafka Summit 2020 kicks off next month in the comfort of your home office, couch, spare bedroom, living room, outbuilding, lanai, veranda, or in-home portico, featuring an […].

Kafka

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Stitch S3 DB Integration

Start Data Engineering

JULY 18, 2020

Given: the source S3 path and file delimiter; the data warehouse connection details (endpoint, port, username, password and database name); the data warehouse schema name and table name; and the run frequency. Steps: log into your Stitch account, click on the Destination tab and use the data warehouse connection details to establish a destination database, then click on the Add Integration button on your dashboard.

Data Warehouse

Data Warehouse Database Data

Open Source Production Grade Data Integration With Meltano

Data Engineering Podcast

JULY 13, 2020

The first stage of every data pipeline is extracting the information from source systems. There are a number of platforms for managing data integration, but there is a notable lack of a robust and easy-to-use open source option. The Meltano project aims to fill that gap. In this episode, project lead Douwe Maan shares the history of how Meltano got started, the motivation for the recent shift in focus, and how it is implemented.

Data Integration

Data Integration Data Engineering Data Engineer Data

Machine Learning for a Better Developer Experience

Netflix Tech

JULY 20, 2020

Stanislav Kirdey, William High. Imagine having to go through 2.5GB of log entries from a failed software build (3 million lines) to search for a bug or a regression that happened on line 1M. It’s probably not even doable manually! However, one smart approach to make it tractable might be to diff the lines against a recent successful build, with the hope that the bug produces unusual lines in the logs.
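
The baseline idea of diffing a failed build's log against a recent successful one can be sketched as follows. This is only the simple set-difference approach the summary mentions, not the machine learning technique the Netflix post goes on to describe; the helper names and the masking rules in `normalize` (replacing hex ids and digits so timestamps don't make every line look new) are assumptions.

```python
import re
from typing import Iterable, List, Tuple


def normalize(line: str) -> str:
    """Mask volatile tokens so lines that differ only by ids or timestamps compare equal.
    The masking rules here are illustrative assumptions."""
    line = re.sub(r"0x[0-9a-fA-F]+", "<HEX>", line)  # hex addresses / ids
    line = re.sub(r"\d+", "<NUM>", line)  # timestamps, counters, line numbers
    return line.strip()


def unusual_lines(failed_log: Iterable[str], baseline_log: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every failed-build line whose normalized
    form never appears in the successful baseline log."""
    seen = {normalize(line) for line in baseline_log}
    return [
        (number, line)
        for number, line in enumerate(failed_log, start=1)
        if normalize(line) not in seen
    ]


if __name__ == "__main__":
    baseline = [
        "2020-07-20 10:00:01 compiling module a",
        "2020-07-20 10:00:02 tests passed: 120",
    ]
    failed = [
        "2020-07-21 09:12:44 compiling module a",
        "2020-07-21 09:12:45 NullPointerException in Foo.bar",
        "2020-07-21 09:12:46 tests passed: 118",
    ]
    for number, line in unusual_lines(failed, baseline):
        print(number, line)  # 2 2020-07-21 09:12:45 NullPointerException in Foo.bar
```

Because matching is exact after normalization, any harmless line that never appeared in the baseline is also flagged.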

Machine Learning

Machine Learning Algorithm Data Science Building

Data Pipelines in the Healthcare Industry

DareData

JULY 29, 2020

The Challenges of Medical Data. In recent times, there have been several developments in applications of machine learning to the medical industry. We have heard news of machine learning systems outperforming seasoned physicians on diagnosis accuracy, chatbots that present recommendations depending on your symptoms, and algorithms that can identify body parts from transversal image slices, to name just a few.

Data Pipeline

Data Pipeline Healthcare Medical Pipeline-centric

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineer

That Lockdown Feeling

Teradata

JULY 8, 2020

Don't impose an unnecessary lockdown on your data consumers by choosing the wrong data analytics platform. Choose Teradata Vantage to set them free. Read more.

Data Analytics

Data Analytics Data

I’ve Got the Key, I’ve Got the Secret. Here’s How Keys Work in ksqlDB 0.10.

Confluent

JULY 31, 2020

ksqlDB 0.10 includes significant changes and improvements to how keys are handled. This is part of a series of enhancements that began with support for non-VARCHAR keys and will ultimately […].

Process

Designing a "low-effort" ELT system, using stitch and dbt

Start Data Engineering

JULY 11, 2020

A very common use case in data engineering is building an ETL system for a data warehouse: data is loaded from multiple separate databases so that data analysts and scientists can run queries on it. The source databases serve your applications, and we do not want these analytic queries to affect application performance; in addition, the source data is disconnected across those databases.

Systems

Systems Designing ETL System Data Warehouse

DataOps For Streaming Systems With Lenses.io

Data Engineering Podcast

JULY 6, 2020

There is an increasing number of use cases for real-time data, and the systems to power them are becoming more mature. Once you have a streaming platform up and running, you need a way to keep an eye on it, including observability, discovery, and governance of your data. That’s what the Lenses.io DataOps platform is built for. In this episode CTO Andrew Stevenson discusses the challenges that arise from building decoupled systems, the benefits of using SQL as the common interface f

Systems

Systems Kafka SQL Data Engineering

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

Unbundling Data Science Workflows with Metaflow and AWS Step Functions

Netflix Tech

JULY 29, 2020

by David Berg, Ravi Kiran Chirravuri, Romain Cledat, Jason Ge, Savin Goyal, Ferras Hamad, Ville Tuulos Continue reading on Netflix TechBlog ».

AWS

AWS Data Science Data Machine Learning

Quick Reports: Xero to Power BI

FreshBI

JULY 27, 2020

The objective of this blog is to give you the tools and skills to connect to Xero Accounting from Power BI Desktop and to have immediate access to the categorized data that drives each of the built-in reports in Xero. To get immediate access to the data that drives the Xero reports and push it into Power BI, you’ll need 3 tools: Power BI Desktop, the ‘Quick Reports’ Power BI Custom Connector for Xero, and the Power BI Quick Reports Templ

BI Data Consolidation Banking Business Intelligence

Advancing the Telecom Industry through Network Experience Analytics

Teradata

JULY 26, 2020

For today's telco providers, new products and services are all driven by the end consumer's experience. That's where Teradata's Network Experience Analytics comes into play.

Project Metamorphosis Month 3: Infinite Storage in Confluent Cloud for Apache Kafka

Confluent

JULY 1, 2020

This is the third month of Project Metamorphosis, where we discuss new features in Confluent’s offerings that bring together event streams and the best characteristics of modern cloud data systems. […].

Project

Project Cloud Kafka Systems

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

Stitch Database to data warehouse Integration

Start Data Engineering

JULY 18, 2020

Given: the source database connection details (endpoint, port, username, password and database name), the source table to replicate, the destination schema name, and the run frequency (which can be set to 10 min). We are assuming the destination data warehouse is already set up in Stitch. Steps: log into your Stitch account, click on the Add Integration button on your dashboard, and choose the PostgreSQL option as the integration on the next page.

Data Warehouse

Data Warehouse Database PostgreSQL Data

How To Build A Live-Updating COVID Dashboard Using Google Sheets and Apache Superset

Preset

JULY 27, 2020

The powerful combination of Google Sheets and Apache Superset

Building

Improving MongoDB Read Performance - Indexing, Replication and Sharding

Rockset

JULY 23, 2020

Read performance is crucial for databases. If it takes too long to read a record from a database, this can stall the request for data from the client application, which could result in unexpected behavior and adversely impact user experience. For these reasons, the read operation on your database should last no more than a fraction of a second. There are a number of ways to improve database read performance, though not all of these methods will work for every type of application.
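
Indexing is the first technique named in the title. The toy Python sketch below illustrates why an index helps reads: with an index, a lookup touches only the matching documents instead of scanning the whole collection. `ToyCollection` is hypothetical and deliberately simplistic (the index is built once and not maintained on writes); MongoDB itself uses B-tree indexes and is driven through its own query API.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


class ToyCollection:
    """Toy in-memory collection showing why a secondary index speeds up reads.
    Hypothetical and simplified: not how MongoDB implements indexes."""

    def __init__(self, docs: List[Dict[str, Any]]):
        self.docs = list(docs)
        self.indexes: Dict[str, Dict[Any, List[int]]] = {}

    def create_index(self, field: str) -> None:
        # Map each field value to the positions of the documents holding it.
        # Built once; later writes are not reflected (a real index is maintained on writes).
        index: Dict[Any, List[int]] = defaultdict(list)
        for pos, doc in enumerate(self.docs):
            if field in doc:
                index[doc[field]].append(pos)
        self.indexes[field] = index

    def find(self, field: str, value: Any) -> Tuple[List[Dict[str, Any]], int]:
        """Return (matching docs, number of docs examined)."""
        if field in self.indexes:
            positions = self.indexes[field].get(value, [])
            return [self.docs[p] for p in positions], len(positions)
        # No index: full collection scan.
        matches = [d for d in self.docs if d.get(field) == value]
        return matches, len(self.docs)


if __name__ == "__main__":
    coll = ToyCollection([{"_id": i, "user": f"u{i % 100}"} for i in range(10_000)])
    matches, examined = coll.find("user", "u7")
    print(len(matches), examined)  # 100 10000
    coll.create_index("user")
    matches, examined = coll.find("user", "u7")
    print(len(matches), examined)  # 100 100
```

The trade-off is the usual one: the index costs memory and must be kept up to date on every write, in exchange for reads that examine far fewer documents.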

MongoDB

MongoDB Database Project SQL

Sharing Code in Next.JS Apps with Plugins

Grouparoo

JULY 22, 2020

At Grouparoo, our front-end website is built using React and Next.js. Next.js is an excellent tool made by Vercel that handles all the hard parts of making a React app for you: routing, server-side rendering, page hydration and more. It includes a simple starting place to build your routes and pages, based on the file system. If you want an /about page, just make a /pages/about.tsx file!
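
The file-system routing convention described above can be sketched as a path-to-URL mapping. `route_for` is a hypothetical helper, not Next.js code, and it models only static pages and `index` files; dynamic segments such as `[id].tsx`, special files such as `_app.tsx`, and API routes are ignored.

```python
from pathlib import PurePosixPath


def route_for(page_file: str) -> str:
    """Map a file under pages/ to the URL path it serves, under a simplified model
    of file-system routing (static pages and index files only)."""
    path = PurePosixPath(page_file.lstrip("/"))
    # Drop the leading "pages" directory and the file extension (.tsx, .js, ...).
    # relative_to raises ValueError if the file is not under pages/.
    parts = path.relative_to("pages").with_suffix("").parts
    # An index file serves its directory's route: pages/blog/index.tsx -> /blog
    if parts and parts[-1] == "index":
        parts = parts[:-1]
    return "/" + "/".join(parts)


if __name__ == "__main__":
    for f in ["/pages/about.tsx", "pages/index.tsx", "pages/blog/index.tsx", "pages/blog/first-post.tsx"]:
        print(f, "->", route_for(f))
```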

Coding

Coding Project Building Process

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data
