One of the great things about using an Apache Kafka® based architecture is that it naturally decouples systems and allows you to use the best tool for the job. While […].
This is part 2 in this blog series. You can read part 1 here: Digital Transformation is a Data Journey From Edge to Insight. This blog series follows the manufacturing, operations, and sales data for a connected vehicle manufacturer as the data goes through the stages and transformations typically experienced in a large manufacturing company on the leading edge of current technology.
Everyone knows that data is vital for success in retail. But without a clear data strategy, retailers often eat up resources fighting small-scale battles, whilst gradually losing the war.
Summary With all of the tools and services available for building a data platform it can be difficult to separate the signal from the noise. One of the best ways to get a true understanding of how a technology works in practice is to hear from people who are running it in production. In this episode Zeeshan Qureshi and Michelle Ark share their experiences using DBT to manage the data warehouse for Shopify.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide to debugging Airflow DAGs, with best practices and examples. You’ll learn how to: create a standardized debugging process to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related […].
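As a concrete illustration (not taken from the guide itself), here is a minimal sketch of one such debugging workflow, assuming Airflow 2.5+ where dag.test() is available; the DAG and task names are made up:

```python
# Hypothetical example: run a whole DAG in one local process with dag.test()
# so errors surface immediately and task logs are easy to read.
import logging
import pendulum
from airflow.decorators import dag, task

log = logging.getLogger(__name__)

@dag(schedule=None, start_date=pendulum.datetime(2021, 1, 1, tz="UTC"), catchup=False)
def example_debug_dag():
    @task
    def extract():
        rows = [1, 2, 3]
        log.info("extracted %d rows", len(rows))   # task logs are the first place to look
        return rows

    @task
    def load(rows):
        if not rows:
            raise ValueError("no rows to load")    # fail loudly so the error is obvious
        log.info("loaded %d rows", len(rows))

    load(extract())

debug_dag = example_debug_dag()

if __name__ == "__main__":
    # Running this file directly executes the DAG locally, which also makes it
    # easy to attach a debugger instead of digging through scheduler logs.
    debug_dag.test()
```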
We are pleased to announce the release of Confluent Platform 6.1. With this release, we are further simplifying management tasks for Apache Kafka® operators and providing even higher availability for […].
The digital revolution is making a deep impact on the automotive industry, offering practically unlimited possibilities for more efficient, convenient, and safe driving and travel experiences in connected vehicles. This revolution is just beginning to accelerate – in fact, according to a recent Applied Market Research study, the global connected car market was valued at $63.03 billion in 2019, and is projected to reach $225.16 billion by 2027, registering a CAGR of 17.1% from 2020 to 2027.
Contents: Introduction · What is an SCD2 table and why use it? · Application table · Dimension table · Setup · Joining fact and SCD2 tables (high_spenders, user_items) · Educating end users · Conclusion · Further reading. Introduction: If you are using a data warehouse, you have probably heard of fact and dimension tables. Simply put, fact tables are used to record business events, and dimension tables are used to record the attributes of business items (e.g., the user and item tables in an e-commerce app).
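To make the joining step concrete, here is a minimal pandas sketch of a point-in-time join between a fact table and an SCD2 dimension; the table and column names (event_ts, valid_from, valid_to, tier) are illustrative assumptions, not the article’s own schema:

```python
# Each fact row is matched to the dimension version that was valid when the
# event happened: the essence of joining a fact table to an SCD2 table.
import pandas as pd

fact = pd.DataFrame({
    "user_id": [1, 1],
    "event_ts": pd.to_datetime(["2021-01-05", "2021-03-10"]),
    "amount": [120.0, 80.0],
})

dim_user = pd.DataFrame({   # SCD2 dimension: one row per version of each user
    "user_id": [1, 1],
    "tier": ["bronze", "gold"],
    "valid_from": pd.to_datetime(["2020-01-01", "2021-02-01"]),
    "valid_to": pd.to_datetime(["2021-02-01", "2262-01-01"]),  # far-future end marks the current row
})

joined = fact.merge(dim_user, on="user_id")
joined = joined[(joined.event_ts >= joined.valid_from) & (joined.event_ts < joined.valid_to)]
print(joined[["user_id", "event_ts", "amount", "tier"]])  # Jan event -> bronze, Mar event -> gold
```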
Stranger Things imagery showcasing the inspiration for the Hawkins Design System. By Hawkins team member Joshua Godi; with art contributions by Wiki Chaves. Hawkins may be the name of a fictional town in Indiana, most widely known as the backdrop for one of Netflix’s most popular TV series, “Stranger Things,” but the name is so much more. Hawkins is the namesake that established the basis for a design system used across the Netflix Studio ecosystem.
Kafka Connect is part of Apache Kafka®, providing streaming integration between Kafka and external systems. There are a large number of existing connectors, and you can also […].
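As a rough illustration of how a connector gets wired up, here is a hedged Python sketch that registers the bundled FileStreamSource connector through the Kafka Connect REST API (default port 8083); the host, file path, and topic name are placeholders:

```python
# Illustrative only: POST a connector definition to a Kafka Connect worker.
# FileStreamSource simply tails a local file into a topic.
import requests

connector = {
    "name": "demo-file-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "tasks.max": "1",
        "file": "/tmp/input.txt",
        "topic": "demo-topic",
    },
}

resp = requests.post("http://localhost:8083/connectors", json=connector, timeout=10)
resp.raise_for_status()
print(resp.json())   # the worker echoes back the created connector definition
```

Removing it again is a single DELETE request to /connectors/demo-file-source on the same worker.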
Cloudera Operational Database is now available in three different form factors in Cloudera Data Platform (CDP). If you are new to Cloudera Operational Database, see this blog post, and check out the documentation here. In this blog post, we’ll look at both Apache HBase and Apache Phoenix concepts relevant to developing applications for Cloudera Operational Database.
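To give a feel for what application code against HBase looks like, here is a small, hypothetical example using the happybase client, which talks to the HBase Thrift server; the host, table, and column family names are invented for illustration:

```python
# Hypothetical example: write and read one row in HBase via happybase.
import happybase

connection = happybase.Connection("cod-edge-node.example.com")   # assumed Thrift endpoint
table = connection.table("vehicle_telemetry")                    # assumed existing table

# HBase stores raw bytes, so keys and values are encoded explicitly.
table.put(b"vehicle-001#2021-03-01T12:00", {b"m:speed_kmh": b"87", b"m:fuel_pct": b"64"})

row = table.row(b"vehicle-001#2021-03-01T12:00")
print(row[b"m:speed_kmh"])

connection.close()
```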
In order to survive, the auto industry needs to leverage ‘digital threads’ that connect data from customers to dealers to products, and link R&D to the production line and the aftermarket.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
by AIM Team Members Karen Casella, Travis Nelson, Sunny Singh; with prior art and contributions by Justin Ryan, Satyajit Thadeshwar. As most developers can attest, dealing with security protocols and identity tokens, as well as user and device authentication, can be challenging. Imagine having multiple protocols, multiple tokens, 200M+ users, and thousands of device types, and the problem can explode in scope.
Persisting data in multiple regions has become crucial for modern businesses: They need their mission-critical data to be protected from accidents and disasters. They can achieve this goal by running […].
Meet Vinita Srivalsan, the powerhouse leader of the Partner Marketing team. Since this is Coffee with Cloudera, what’s your morning pick-me-up drink? I am a chai person through and through and make it the traditional Indian way, with milk and sugar! What makes your role at Cloudera unique? Partner Marketing is uniquely positioned to be the voice of Cloudera within a partner organization, and to represent the partner within Cloudera.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Growth Engineering at Netflix: Automated Imagery Generation, by Eric Eiswerth. Background: There’s a good chance you’ve visited the Netflix homepage. In the Growth Engineering team, we refer to this as the top of the signup funnel. For more background on the signup funnel and Growth Engineering’s role in it, please read our initial post on the topic: Growth Engineering at Netflix […].
When building Grouparoo, we on the Grouparoo team often share screen recordings of our work with each other. In many cases, the tools we are using (like GitHub, until recently anyway) could only embed image content into READMEs and Pull Requests. That meant that the humble animated gif was often the best way to share a video. Here is my personal script called gifit, which uses the open source ffmpeg and gifsicle tools to make it super easy to convert any video file into an easy-to-share gif!
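The original gifit is a shell script; purely to illustrate the same idea, here is a rough Python re-sketch that pipes ffmpeg’s GIF output through gifsicle. The flags and defaults are my assumptions, not the script’s exact options:

```python
#!/usr/bin/env python3
# Convert a video to an optimized gif by piping ffmpeg into gifsicle.
import subprocess
import sys

def gifit(video_path: str, gif_path: str, fps: int = 15, width: int = 720) -> None:
    # ffmpeg decodes the video, resamples the frame rate, scales it, and writes GIF to stdout.
    ffmpeg = subprocess.Popen(
        ["ffmpeg", "-i", video_path, "-vf", f"fps={fps},scale={width}:-1", "-f", "gif", "-"],
        stdout=subprocess.PIPE,
    )
    # gifsicle reads the GIF from stdin and writes an optimized copy.
    subprocess.run(
        ["gifsicle", "--optimize=3", "-o", gif_path],
        stdin=ffmpeg.stdout,
        check=True,
    )
    ffmpeg.stdout.close()
    ffmpeg.wait()

if __name__ == "__main__":
    gifit(sys.argv[1], sys.argv[2])   # usage: gifit.py input.mp4 output.gif
```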
No, not really. You probably won’t be rich unless you work really hard… As nice as it would be, you can’t really predict a stock price based solely on ML, but now I have your attention! Continuing from my previous blog post about how awesome and easy it is to develop web-based applications backed by Cloudera Operational Database (COD), I started a small project to integrate COD with another CDP cloud experience, Cloudera Machine Learning (CML).
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven […].
by Eric Eiswerth. Background: Netflix has been offering streaming video-on-demand (SVOD) for over 10 years. Throughout that time we’ve primarily relied on 3 plans (Basic, Standard, & Premium), combined with the 30-day free trial, to drive global customer acquisition. The world has changed a lot in this time. Competition for people’s leisure time has increased, the device ecosystem has grown phenomenally, and consumers want to watch premium content whenever they want, wherever they are […].
Today’s customers have a growing need for faster end-to-end data ingestion to meet the expected speed of insights and overall business demand. This ‘need for speed’ drives a rethink on building a more modern data warehouse solution, one that balances speed with platform cost management, performance, and reliability. A typical approach that we have seen in customers’ environments is that ETL applications pull data with a frequency of minutes and land it into HDFS storage as an extra Hive table […].
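As a hedged sketch of that typical landing pattern (not the redesigned solution the post argues for), a micro-batch append to a Hive table might look roughly like this in PySpark; the paths and table names are placeholders:

```python
# Illustrative only: append the latest ETL pull to a Hive staging table on HDFS.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("landing-zone-append")
    .enableHiveSupport()
    .getOrCreate()
)

batch = spark.read.json("hdfs:///landing/raw/orders/2021-03-01T12-05/")  # latest micro-batch

(
    batch.write
    .mode("append")
    .format("parquet")
    .saveAsTable("staging.orders_raw")   # the "extra Hive table" in the landing zone
)
```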
In 2021, data is your company’s most critical asset. As data pipelines become increasingly complex and companies ingest more and more data, it’s paramount that this data is reliable. After talking to hundreds of data teams over the past few years, I was struck by the fact that organizations were investing millions of dollars and strategic energy in data, but decision makers and others on the frontlines couldn’t use it or didn’t trust it.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them into complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your […].
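As a taste of those building blocks (this example is not from the eBook), here is a hedged sketch of a two-task pipeline with a cron schedule and a simple failure callback for notifications, assuming Airflow 2.4+; the task names and callback body are illustrative:

```python
# Hypothetical example: chained tasks, a nightly cron schedule, retries, and a
# failure callback that stands in for a real alerting hook.
import logging
import pendulum
from airflow import DAG
from airflow.operators.bash import BashOperator

log = logging.getLogger(__name__)

def notify_failure(context):
    # In practice this might post to Slack or page on-call; here it only logs.
    log.error("Task %s failed", context["task_instance"].task_id)

with DAG(
    dag_id="nightly_example",
    schedule="15 2 * * *",                 # 02:15 every night
    start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
    catchup=False,
    default_args={"retries": 2, "on_failure_callback": notify_failure},
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    extract >> load
```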
At dbt Labs, as more folks adopt dbt, we have started to see more and more use cases that push the boundaries of our established best practices. This is especially true for those adopting dbt in the enterprise space. After two years of helping companies with anywhere from 20 to 10,000+ employees implement dbt and dbt Cloud, the following is my best attempt to answer the question: “Should I have one repository for my dbt project, or many?”
When Kudu was first introduced as a part of CDH in 2017, it didn’t support any kind of authorization so only air-gapped and non-secure use cases were satisfied. Coarse-grained authorization was added along with authentication in CDH 5.11 (Kudu 1.3.0) which made it possible to restrict access only to Apache Impala where Apache Sentry policies could be applied, enabling a lot more use cases.
I was working with our fancy new CLI tool on my fancy new MacBook Pro with the M1 chip when I came across this scary error, courtesy of Node.js: “FATAL ERROR: wasm code commit Allocation failed - process out of memory”. It began occurring regularly enough that I started digging, and I’ve since come across two methods for solving this issue. Method #1: Upgrade to Node v15. I found this discussion, which noted that Node.js versions prior to v15 do not natively support the Apple M1 chip.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
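For readers who have not seen dynamic task mapping yet, here is a tiny, hypothetical sketch of the feature on recent Airflow (2.4+): the number of mapped task instances is decided at runtime from another task’s output, and the file names are made up:

```python
# One process_file task instance is created per element returned by list_files().
import pendulum
from airflow.decorators import dag, task

@dag(schedule=None, start_date=pendulum.datetime(2021, 1, 1, tz="UTC"), catchup=False)
def mapped_example():
    @task
    def list_files():
        return ["a.csv", "b.csv", "c.csv"]   # in real life, e.g. an object-store listing

    @task
    def process_file(path: str):
        print(f"processing {path}")

    process_file.expand(path=list_files())   # dynamic task mapping

mapped_example()
```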
In this article series, we walk through how you can create your own data observability monitors and data anomaly detectors from scratch, mapping to five key pillars of data health. Part I can be found here. Part II of this series was adapted from Barr Moses and Ryan Kearns’ O’Reilly training, Managing Data Downtime: Applying Observability to Your Data Pipelines, the industry’s first-ever course on data observability.
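To give a flavor of what such a monitor can look like (this is not the authors’ implementation), here is a minimal, self-contained volume check that flags today’s row count if it drifts far from the recent mean:

```python
# Tiny z-score style volume monitor: compare today's row count to recent history.
import statistics

def is_volume_anomaly(history, today_count, threshold=3.0):
    """Return True if today's row count is more than `threshold` std devs from the mean."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1.0   # avoid division by zero on flat history
    return abs(today_count - mean) / stdev > threshold

history = [10_120, 10_340, 9_980, 10_210, 10_050]   # recent daily row counts
print(is_volume_anomaly(history, 1_450))            # True: a sudden drop in volume
```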
Spoiler alert: becoming a machine learning engineer can sound like a hard-to-reach goal, but let us tell you the truth – it isn’t as hard as it seems. And yes, we’re talking to you, the person who’s reading this because they’re probably wondering what a machine learning engineer is, what a machine learning engineer does, how to become a machine learning engineer, and, more importantly, whether they can pull it off.
When our Hiring Sprint kicks off next month, we will be looking for great professionals to join some of our stellar teams: Shopping Cart, Checkout, Sales Orders, and Returns. All meaningful segments of our Customer Conversion organization, these teams are responsible for forging and shaping some of the most relevant experiences in the Zalando customer journey.
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds) and enables non-LLM evaluation […].
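As a loose illustration of the reproducibility idea only (not the system Ben describes), greedy decoding plus a fixed seed with Hugging Face transformers yields the same output on every run; the model choice and prompt are arbitrary:

```python
# Greedy decoding (the temperature-0 analogue) with a fixed seed is deterministic.
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(42)                                              # fixed seed
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Summarize: the deployment failed because", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)   # greedy, no sampling
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```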