Top Data Engineering Digest Data Consolidation Data Ingestion Content for February, 2021

February, 2021

Node.js ❤️ Apache Kafka – Getting Started with KafkaJS

Confluent

FEBRUARY 8, 2021

One of the great things about using an Apache Kafka® based architecture is that it naturally decouples systems and allows you to use the best tool for the job. While […].

Kafka

Kafka Architecture Systems IT

Build your data pipelines like the Toyota Way

François Nguyen

FEBRUARY 28, 2021

If there is only one book to read about lean manufacturing, this is the one. This is the kind of book you can read again and again and still learn something about your current context. It is also a book you can read whatever your industry; you will always find situations it covers. Today, we are going to apply these principles to data pipelines. “The right process will deliver the right results” – Toyota Way (section II). In the 14 Toyota Way principles, you have

Data Pipeline

Data Pipeline Building Manufacturing BI

Trending Sources

How to set up a dbt data-ops workflow, using dbt cloud and Snowflake

Start Data Engineering

FEBRUARY 28, 2021

Contents: Introduction; Pre-requisites; Setting up the data-ops pipeline; Snowflake; Local development environment; dbt cloud; Connect to Snowflake; Link to github repository; Setup deployment(release/prod) environment; Setup CI; PR -> CI -> merge cycle; Schedule jobs; Host data documentation; Conclusion and next steps; Further reading; References

Introduction: With companies realizing the importance of having correct data, there has been a lot of attention on the data-ops side of things.

Cloud

Cloud Data

Why Data Capabilities Follow Up a Digital Transformation

Team Data Science

FEBRUARY 23, 2021

Companies can now make data useful to elevate decision making and to optimise products and processes. But what organizational capabilities are necessary and how to get started? It's currently easy to acquire data strategically. First, consider that smartphones function like questionnaires that customers are frequently filling out in a passive or active manner [1].

Business Intelligence

Business Intelligence Food Unstructured Data Relational Database

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

Next Stop – Building a Data Pipeline from Edge to Insight

Cloudera

FEBRUARY 8, 2021

This is part 2 in this blog series. You can read part 1, here: Digital Transformation is a Data Journey From Edge to Insight. This blog series follows the manufacturing, operations and sales data for a connected vehicle manufacturer as the data goes through stages and transformations typically experienced in a large manufacturing company on the leading edge of current technology.

Data Pipeline

Data Pipeline Building Manufacturing Data Warehouse

Is Your Data Holding You Back Instead of Driving You Forward?

Teradata

FEBRUARY 9, 2021

Everyone knows that data is vital for success in retail. But without a clear data strategy, retailers often eat up resources fighting small-scale battles, whilst gradually losing the war.

Retail

Retail Data

42 Things You Can Stop Doing Once ZooKeeper Is Gone from Apache Kafka

Confluent

FEBRUARY 18, 2021

Soon, Apache Kafka® will no longer need ZooKeeper! With KIP-500, Kafka will include its own built-in consensus layer, removing the ZooKeeper dependency altogether. The next big milestone in this effort […].

Kafka

Kafka IT

More Trending

The rise and fall of the Agile Spotify Model

François Nguyen

FEBRUARY 21, 2021

If you are working in the tech field, I think you have already heard of Squads, Tribes, Chapters or Guilds. They come from Spotify, a Swedish audio streaming company. If you are organizing #datateams, it could be tempting to copy/paste. You really should not! The Spotify Model and Engineering Culture If you want to go back to the original article, it is here.

Engineering

Engineering Technology Management Building

Self Service Open Source Data Integration With AirByte

Data Engineering Podcast

FEBRUARY 22, 2021

Summary Data integration is a critical piece of every data pipeline, yet it is still far from being a solved problem. There are a number of managed platforms available, but the list of options for an open source system that supports a large variety of sources and destinations is still embarrassingly short. The team at Airbyte is adding a new entry to that list with the goal of making robust and easy to use data integration more accessible to teams who want or need to maintain full control of thei

Data Integration

Data Integration Data Warehouse Data Pipeline BI

Apache Superset Tutorial

Start Data Engineering

FEBRUARY 13, 2021

Contents: Why data exploration; Apache Superset architecture; Setup; Prerequisites; Seed data; Using Apache Superset; 1. Connecting to a data warehouse; 2. Querying data in SQL Lab; 3. Creating a chart; 4. Creating a dashboard; Pros and Cons; Pros; Cons; Conclusion

Why data exploration: In most companies, the end users of a data warehouse are analysts, data scientists and business people.

Data Warehouse

Data Warehouse SQL Architecture Data

#ClouderaLife Spotlight: Kevin Smith, Staff Customer Operations Engineer

Cloudera

FEBRUARY 23, 2021

Meet Kevin Smith, a Staff Customer Operations Engineer within the US Public Sector support team. He sums up his day-to-day by saying he works directly with clients on technical cases and provides support and guidance as they troubleshoot unexpected behavior. He also serves as a member of several project teams focusing on upgrade experiences, internal tools, product testing, training, and documentation.

Engineering

Engineering Education Technology Project

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Open Sourcing the Netflix Domain Graph Service Framework: GraphQL for Spring Boot

Netflix Tech

FEBRUARY 3, 2021

By Paul Bakker and Kavitha Srinivasan, images by David Simmer, edited by Greg Burrell. Netflix has developed a Domain Graph Service (DGS) framework and it is now open source. The DGS framework simplifies the implementation of GraphQL, both for standalone and federated GraphQL services. Our framework is battle-hardened by our use at scale. By open-sourcing the project, we hope to contribute to the Java and GraphQL communities and learn from and collaborate with everyone who will be using the fra

Java

Java Architecture Coding Designing

Lessons Learned from Running Apache Kafka at Scale at Pinterest

Confluent

FEBRUARY 22, 2021

Apache Kafka® is at the heart of the data transportation layer at Pinterest. The amount of data that runs through Kafka has constantly grown over the years. This growth sometimes […].

Kafka

Kafka Transportation Data

Is Devops the future of Agile ?

François Nguyen

FEBRUARY 14, 2021

Let’s start with maybe the best definition you can find on DevOps (credit to AWS): “DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes.

AWS

AWS Programming Process Building

Building The Foundations For Data Driven Businesses at 5xData

Data Engineering Podcast

FEBRUARY 15, 2021

Summary Every business aims to be data driven, but not all of them succeed in that effort. In order to be able to truly derive insights from the data that an organization collects, there are certain foundational capabilities that they need to have capacity for. In order to help more businesses build those foundations, Tarush Aggarwal created 5xData, offering collaborative workshops to assist in setting up the technical and organizational systems that are necessary to succeed.

Building

Building Data Warehouse BI Data Pipeline

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

How to Join a fact and a type 2 dimension (SCD2) table

Start Data Engineering

FEBRUARY 7, 2021

Contents: Introduction; What is an SCD2 table and why use it?; Application table; Dimension table; Setup; Joining fact and SCD2 tables; high_spenders; user_items; Educating end users; Conclusion; Further reading

Introduction: If you are using a data warehouse, you will have heard of fact and dimension tables. Simply put, fact tables record business events, and dimension tables record the attributes of business items (e.g. the user and item tables in an e-commerce app).
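To make the fact/dimension distinction concrete, here is a minimal sketch of one common way to join a fact table to a type 2 dimension: match on the business key and pick the dimension version whose validity window contains the event date. The table contents, the column names (`valid_from`, `valid_to`, `tier`, `purchased_on`) and the helper `join_fact_to_scd2` are hypothetical illustrations, not taken from the linked article, which may structure its tables differently.

```python
from datetime import date

# Hypothetical SCD2 dimension: one row per version of a user.
# Each version is valid on the half-open interval [valid_from, valid_to).
user_dim = [
    {"user_id": 1, "tier": "basic",
     "valid_from": date(2021, 1, 1), "valid_to": date(2021, 2, 1)},
    {"user_id": 1, "tier": "premium",
     "valid_from": date(2021, 2, 1), "valid_to": date(9999, 12, 31)},
]

# Hypothetical fact table: one row per purchase event.
purchases = [
    {"user_id": 1, "amount": 20, "purchased_on": date(2021, 1, 15)},
    {"user_id": 1, "amount": 50, "purchased_on": date(2021, 2, 10)},
]


def join_fact_to_scd2(facts, dim):
    """Attach to each fact the dimension version valid on the event date."""
    joined = []
    for fact in facts:
        for version in dim:
            same_key = version["user_id"] == fact["user_id"]
            in_window = (version["valid_from"]
                         <= fact["purchased_on"]
                         < version["valid_to"])
            if same_key and in_window:
                joined.append({**fact, "tier": version["tier"]})
                break  # windows do not overlap, so at most one version matches
    return joined


result = join_fact_to_scd2(purchases, user_dim)
for row in result:
    print(row["purchased_on"], row["amount"], row["tier"])
```

The half-open window is a design choice in this sketch: an event that falls exactly on a changeover date is assigned to the newer version, so no event can match two versions.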

Data Warehouse

Data Warehouse Education IT Data

Data – the Octane Accelerating Intelligent Connected Vehicles

Cloudera

FEBRUARY 8, 2021

The digital revolution is making a deep impact on the automotive industry, offering practically unlimited possibilities for more efficient, convenient, and safe driving and travel experiences in connected vehicles. This revolution is just beginning to accelerate – in fact, according to a recent Applied Market Research study, the global connected car market was valued at $63.03 billion in 2019, and is projected to reach $225.16 billion by 2027, registering a CAGR of 17.1% from 2020 to 2027.

Manufacturing

Manufacturing Machine Learning Data Ingestion Electronics

Pitching a DataOps Project That Matters

DataKitchen

FEBRUARY 1, 2021

Every DataOps initiative starts with a pilot project. How do you choose a project that matters to people? DataOps addresses a broad set of use cases because it applies workflow process automation to the end-to-end data-analytics lifecycle. DataOps reduces errors, shortens cycle time, eliminates unplanned work, increases innovation, improves teamwork, and more.

Project

Project Raw Data Data Science Consulting

Introducing Confluent Platform 6.1

Confluent

FEBRUARY 10, 2021

We are pleased to announce the release of Confluent Platform 6.1. With this release, we are further simplifying management tasks for Apache Kafka® operators and providing even higher availability for […].

Kafka

Kafka Management Cloud

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Hawkins: Diving into the Reasoning Behind our Design System

Netflix Tech

FEBRUARY 10, 2021

Stranger Things imagery showcasing the inspiration for the Hawkins Design System by Hawkins team member Joshua Godi; with art contributions by Wiki Chaves. Hawkins may be the name of a fictional town in Indiana, most widely known as the backdrop for one of Netflix’s most popular TV series “Stranger Things,” but the name is so much more. Hawkins is the namesake that established the basis for a design system used across the Netflix Studio ecosystem.

Designing

Designing Systems Building Entertainment

How Shopify Is Building Their Production Data Warehouse Using DBT

Data Engineering Podcast

FEBRUARY 8, 2021

Summary With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this episode Zeeshan Qureshi and Michelle Ark share their experiences using DBT to manage the data warehouse for Shopify.

Data Warehouse

Data Warehouse Building BI SQL

Teradata Has Been Named One of the World's Most Ethical Companies 2021

Teradata

FEBRUARY 22, 2021

Teradata has again been recognized as one of the World’s Most Ethical Companies, for the 12th consecutive year!

Data, The Unsung Hero of the Covid-19 Solution

Cloudera

FEBRUARY 3, 2021

COVID-19 vaccines from various manufacturers are being approved by more countries, but that doesn’t mean that they will be available at your local pharmacy or mass vaccination centers anytime soon. Creating, scaling up and manufacturing the vaccine is just the first step; now the world needs to coordinate an incredible and complex supply chain system to deliver more vaccines to more places than ever before.

Manufacturing

Manufacturing Transportation Pharmaceutical Data Consolidation

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

How DataOps Kitchens Enable Version Control

DataKitchen

FEBRUARY 4, 2021

This blog builds on earlier posts that defined Kitchens and showed how they map to technical environments. We’ve also discussed how toolchains are segmented to support multiple kitchens. DataOps automates the source code integration, release, and deployment workflows related to analytics development. To use software dev terminology, DataOps supports continuous integration, continuous delivery, and continuous deployment.

Coding

Coding Project Data Analytics Algorithm

Oracle CDC Source Premium Connector is Now Generally Available

Confluent

FEBRUARY 16, 2021

One of the most common relational database systems that connects to Apache Kafka® is Oracle, which often holds highly critical enterprise transaction workloads. While Oracle Database (DB) excels at many […].

Relational Database

Relational Database Kafka Database Systems

Edge Authentication and Token-Agnostic Identity Propagation

Netflix Tech

FEBRUARY 9, 2021

By AIM Team Members Karen Casella, Travis Nelson, Sunny Singh; with prior art and contributions by Justin Ryan, Satyajit Thadeshwar. As most developers can attest, dealing with security protocols and identity tokens, as well as user and device authentication, can be challenging. Imagine having multiple protocols, multiple tokens, 200M+ users, and thousands of device types, and the problem can explode in scope.

Architecture

Architecture Bytes Systems Accessible

System Observability For The Cloud Native Era With Chronosphere

Data Engineering Podcast

FEBRUARY 1, 2021

Summary Collecting and processing metrics for monitoring use cases is an interesting data problem. It is eminently possible to generate millions or billions of data points per second. The information needs to be propagated to a central location, processed, and analyzed in timeframes on the order of milliseconds or single-digit seconds, and the consumers of the data need to be able to query the information quickly and flexibly.

Systems

Systems Cloud BI Data Warehouse

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

How I Built an Algorithm to Help Doctors Fight COVID-19

Teradata

FEBRUARY 3, 2021

Read how a principal data scientist at Teradata leveraged his cross-industry expertise to build an algorithm to help doctors better understand & fight COVID-19.

Algorithm

Algorithm Building Data

Express Cloudera POV on 2021 data trends in insurance

Cloudera

FEBRUARY 18, 2021

Almost a year into the pandemic, the accelerated digital transformation has begun to feel less abrupt and more sustained. 2021 looks likely to be defined by a new phase: thriving on digital transformation, rather than just surviving through it. We’ve written about the changes forced on the traditionally risk-averse insurance industry by COVID-19. In 2021, with the crisis hopefully fading, insurance will have time to evaluate the changes made in 2020, assessing what worked and what didn’t

Insurance

Insurance Cloud Cloud Computing Data

Organizing Services with ZIO and ZLayers

Rock the JVM

FEBRUARY 28, 2021

ZIO layers (ZLayers) help structure complex services into independent, composable, and easy-to-understand modules: discover how they can simplify your architecture

Architecture

Keys in ksqlDB, Unlocked

Confluent

FEBRUARY 19, 2021

One of the most highly requested enhancements to ksqlDB is here! Apache Kafka® messages may contain data in message keys as well as message values. Until now, ksqlDB could only […].

Kafka

Kafka Data Process

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineering
