Introduction · Testing your data pipeline (1. End-to-end system testing, 2. Data quality testing, 3. Monitoring and alerting, 4. Unit and contract testing) · Conclusion · Further reading. Introduction: Testing data pipelines is different from testing other applications, like a website backend.
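Data quality testing, the second item in that outline, is the easiest to make concrete. Below is a minimal sketch, assuming a SQLite table named rides with invented columns and rules; it illustrates the idea only and is not taken from the post itself.

```python
# A minimal data quality check sketch using only the standard library.
# The table name, columns, and expectations are hypothetical.
import sqlite3

EXPECTATIONS = {
    "no_null_ids": "SELECT COUNT(*) FROM rides WHERE ride_id IS NULL",
    "no_negative_fares": "SELECT COUNT(*) FROM rides WHERE fare < 0",
}

def run_quality_checks(conn: sqlite3.Connection) -> list[str]:
    """Return a description of every expectation that found bad rows."""
    failures = []
    for name, query in EXPECTATIONS.items():
        bad_rows = conn.execute(query).fetchone()[0]
        if bad_rows > 0:
            failures.append(f"{name}: {bad_rows} offending rows")
    return failures

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (ride_id TEXT, fare REAL)")
    conn.executemany("INSERT INTO rides VALUES (?, ?)",
                     [("a", 12.5), (None, 3.0), ("b", -1.0)])
    for failure in run_quality_checks(conn):
        print("FAILED", failure)
```

In a real pipeline the same checks would run against the warehouse after each load, with failures routed to the monitoring and alerting layer from the outline.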
Background. Apache Kafka ® is widely used across Uber’s multiple business lines. Take the example of an Uber ride: When a user opens up the Uber app, demand and supply data are aggregated in Kafka queues to serve fare calculations. … The post Introducing uGroup: Uber’s Consumer Management Framework appeared first on Uber Engineering Blog.
Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write […].
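For contrast, here is a hedged sketch of the consume-transform-produce plumbing that a streams abstraction hides, written with the confluent-kafka Python client; the broker address, topic names, and group id are placeholder assumptions.

```python
# The low-level loop that a streams API would own for you.
# Broker address, topics, and group id are assumptions.
from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "uppercase-app",            # placeholder group id
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})

consumer.subscribe(["input-topic"])         # placeholder topic
try:
    while True:
        msg = consumer.poll(timeout=1.0)    # manual poll loop
        if msg is None:
            continue
        if msg.error():
            raise RuntimeError(msg.error())
        # The actual "stream processing" is this one line; everything
        # else is plumbing.
        producer.produce("output-topic", msg.value().upper())
        producer.poll(0)                    # serve delivery callbacks
finally:
    consumer.close()
    producer.flush()
```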
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
Introduction. In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Within the context of a data mesh architecture, I will present industry settings and use cases where this architecture is relevant and highlight the business value it delivers across business and technology areas.
Summary The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as star and snowflake schemas, data vault modeling, and wide tables. The challenge with many of those approaches is that they are optimized for answering known questions but brittle and cumbersome when exploring unknowns.
Recent events have illustrated the fragility of ultra-lean supply chains. Chief Supply Chain Officers must figure out how to navigate these crises while managing cost, speed, and quality of service.
Extensive out-of-the-box functionality, a large user community, and up-to-date, cloud-native features make Spring and its libraries a strong option for anchoring your Apache Kafka® and Confluent Cloud based microservices architecture. […].
Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. Today, customers have deployed hundreds of Airflow DAGs in production, performing various data transformation and preparation tasks of differing levels of complexity.
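As a point of reference, a minimal DAG of the kind these customers deploy might look like the sketch below, using Airflow's TaskFlow API (2.4+); the dag id, schedule, and task bodies are illustrative assumptions.

```python
# A minimal Airflow DAG sketch using the TaskFlow API.
# Dag id, schedule, and task bodies are invented for illustration.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_transform_pipeline():
    @task
    def extract() -> list[int]:
        return [1, 2, 3]

    @task
    def transform(rows: list[int]) -> list[int]:
        return [r * 10 for r in rows]

    @task
    def load(rows: list[int]) -> None:
        print(f"loaded {len(rows)} rows")

    # TaskFlow infers the dependency chain from the function calls.
    load(transform(extract()))

example_transform_pipeline()
```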
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Summary Streaming data systems have been growing more capable and flexible over the past few years. Despite this, it is still challenging to build reliable pipelines for stream processing. In this episode Eric Sammer discusses the shortcomings of the current set of streaming engines and how they force engineers to work at an extremely low level of abstraction.
1. Introduction · 2. E-T-L definition · 3. Differences between ETL & ELT · 4. Conclusion · 5. Further reading. Introduction: If you are a student, analyst, engineer, or anyone working with data pipelines, you will have heard of ETL and ELT architecture. If you have questions like "What is the difference between ETL and ELT?" or "Should I use the ETL or the ELT pattern for my data pipeline?", this post is for you.
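As a quick illustration of the difference, here is a toy sketch of both patterns against SQLite; the tables, columns, and cleaning rules are invented for the example.

```python
# ETL vs ELT in miniature. Table and column names are made up.
import sqlite3

raw = [("alice", "2024-01-01 "), ("bob", " 2024-01-02")]

# --- ETL: transform in application code *before* loading ---
etl = sqlite3.connect(":memory:")
etl.execute("CREATE TABLE users (name TEXT, signup_date TEXT)")
cleaned = [(n.title(), d.strip()) for n, d in raw]   # transform first
etl.executemany("INSERT INTO users VALUES (?, ?)", cleaned)

# --- ELT: load raw data as-is, then transform *inside* the warehouse ---
elt = sqlite3.connect(":memory:")
elt.execute("CREATE TABLE raw_users (name TEXT, signup_date TEXT)")
elt.executemany("INSERT INTO raw_users VALUES (?, ?)", raw)
elt.execute("""
    CREATE TABLE users AS
    SELECT UPPER(SUBSTR(name, 1, 1)) || SUBSTR(name, 2) AS name,
           TRIM(signup_date) AS signup_date
    FROM raw_users
""")

print(etl.execute("SELECT * FROM users").fetchall())
print(elt.execute("SELECT * FROM users").fetchall())
```

The trade-off in one line: ETL cleans data before it lands, while ELT keeps the raw copy and pushes transformation into the warehouse's SQL engine.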
An interdisciplinary team from Volkswagen, AWS, and Teradata has created an intelligent solution that enables greater transparency and efficiency in car body construction. Find out more.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower, not replace, your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
Today, an organization’s strategic objective is to deliver innovations for a connected life and to improve the quality of life worldwide. With connected devices comes data, and with data comes […].
At the end of May, we released the second version of Cloudera SQL Stream Builder (SSB) as part of Cloudera Streaming Analytics (CSA). Among other features, the 1.4 version of CSA surfaced the expressivity of Flink SQL in SQL Stream Builder by adding DDL and Catalog support, and it greatly improved integration with other Cloudera Data Platform components, for example by enabling stream enrichment from Hive and Kudu.
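To give a flavor of what DDL support in a Flink SQL environment looks like, here is a hedged sketch issued through PyFlink; the table definition and the datagen connector settings are illustrative assumptions, not SSB specifics.

```python
# A sketch of Flink SQL DDL plus a continuous query, run via PyFlink.
# The table, columns, and connector options are invented; in SSB the
# DDL would be entered in the SQL editor and could point at Kafka,
# Hive, or Kudu catalogs instead of the datagen source used here.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE ride_events (
        ride_id BIGINT,
        fare    DOUBLE
    ) WITH (
        'connector' = 'datagen',
        'rows-per-second' = '5',
        'number-of-rows' = '10'
    )
""")

# A continuous query over the declared stream.
result = t_env.execute_sql(
    "SELECT ride_id, fare FROM ride_events WHERE fare > 0.5"
)
result.print()
```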
Summary The market for business intelligence has been going through an evolutionary shift in recent years. One of the driving forces for that change has been the rise of analytics engineering powered by dbt. Lightdash has fully embraced that shift by building an entire open source business intelligence framework that is powered by dbt models. In this episode Oliver Laslett describes why dashboards aren't sufficient for business analytics, and how Lightdash promotes the work that you are already doing in dbt.
Introduction · Setup · Common Table Expressions (CTEs) · Performance comparison (CTE, Subquery and derived tables, Temp table) · Trade-offs · Tear down · Conclusion · References. Introduction: If you are a student, analyst, engineer, or anyone in the data space wondering what CTEs are, or trying to understand CTE performance, then this post is for you. In this post, we go over what CTEs are and compare their performance to subqueries, derived tables, and temp tables.
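As a taste of what the post covers, here is a small runnable CTE example against SQLite; the table and values are invented for illustration.

```python
# A CTE (WITH clause) names an intermediate result once, instead of
# repeating the same subquery or staging it in a temp table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("a", 10.0), ("a", 30.0), ("b", 5.0)])

big_spenders = conn.execute("""
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total
    FROM totals
    WHERE total > 20
""").fetchall()

print(big_spenders)  # [('a', 40.0)]
```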
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features with plenty of example code. You'll learn how to: understand the building blocks of DAGs and combine them into complex pipelines; schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime; set up alerts and notifications; and scale your pipelines.
INTRODUCTION. Job evaluation is the method of determining the relative value of jobs within a company. Employees perform various kinds of jobs in a company: some differ completely from one another in their responsibilities, while others are similar enough to belong to the same cluster. It is important to establish a method for working out the relative worth of each job and to implement clear ways of maintaining equal pay within a company.
At the recent Kafka Summit, Confluent announced the general availability of Stream Governance–the industry’s only governance suite for data in motion. Offered as a fully managed cloud solution, it delivers […].
It's that time of year again… award season! We are thrilled to announce the finalists of the 2021 Data Impact Awards. This year's entrants have excelled at demonstrating how innovative data solutions can help solve real-time challenges and positively impact people around the world. The entries are some of the most remarkable we've seen, giving our judges the tough task of selecting an award-worthy shortlist.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Summary The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result there have been a number of trends in data that take advantage of the warehouse as a single focal point. Among those trends is the advent of operational analytics, which completes the cycle of data from collection, through analysis, to driving further action.
Introduction · Prerequisites · 6 Key Concepts (1. When to Use, 2. Partition By, 3. Order By, 4. Function, 5. Lead and Lag, 6. Rolling Window) · Efficiency Considerations · Conclusion · Further reading · References. Introduction: If you work with data, window functions can significantly level up your SQL skills.
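For a quick preview of concepts 2 through 6, here is a runnable sketch using SQLite (3.25+ is needed for window functions); the schema and data are invented.

```python
# PARTITION BY, ORDER BY, LAG, and a rolling window in one query.
# Requires SQLite 3.25+ (bundled with modern Python builds).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store TEXT, day INTEGER, amt REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                 [("x", 1, 10), ("x", 2, 20), ("x", 3, 30),
                  ("y", 1, 5),  ("y", 2, 15)])

rows = conn.execute("""
    SELECT store, day, amt,
           -- previous day's amount within each store (LAG)
           LAG(amt) OVER (PARTITION BY store ORDER BY day) AS prev_amt,
           -- 2-day rolling sum within each store
           SUM(amt) OVER (PARTITION BY store ORDER BY day
                          ROWS BETWEEN 1 PRECEDING
                                   AND CURRENT ROW) AS rolling_2d
    FROM sales
""").fetchall()

for r in rows:
    print(r)
```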
Was Nikola Tesla a scientist or engineer? How about Edison? Or Da Vinci? It’s hard to give a solid answer, right? These men didn’t stop at scientific research and ended up conceptualizing or engineering their inventions. One discipline goes hand in hand with another. In the modern world, this distinction is even more vague. Engineers are not only the ones bearing helmets and operating on construction sites.
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
How do you continuously process half a terabyte of data in real-time? That’s the exact question we had to answer. Leadnomics is a digital marketing company that helps companies maximize […].
As Halloween night quickly approaches, there is only one question on every kid's mind: how can I maximize my candy haul this year with the best possible candy? This kind of question lends itself perfectly to data science approaches that enable quick and intuitive analysis of data across multiple sources. Using Cloudera Machine Learning, the world's first hybrid data cloud machine learning tooling, let's take a deep dive into the world of candy analytics to answer the tough question on everyone's mind.
Summary The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams across the organization. The DataHub project was created as a way to bring order to the scale of LinkedIn’s data needs. It was also designed to be able to work for small scale systems that are just starting to develop in complexity.
Introduction · Responsibilities of a data engineer (1. Move data between systems, 2. Manage data warehouse, 3. Schedule, execute, and monitor data pipelines, 4. Serve data to the end-users, 5. Data strategy for the company, 6. Deploy ML models to production) · Conclusion · Further reading. Introduction: Data engineering is a relatively new field, and as such, there is a huge variance in the actual job responsibilities across different companies.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
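As a taste of one of those features, here is a hedged sketch of dynamic task mapping (Airflow 2.4+); the dag id, file list, and task bodies are illustrative assumptions.

```python
# Dynamic task mapping: one task definition fans out to one mapped
# instance per input, decided at runtime. All names are invented.
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def dynamic_mapping_example():
    @task
    def list_files() -> list[str]:
        # In real use this might list a bucket or a landing directory.
        return ["a.csv", "b.csv", "c.csv"]

    @task
    def process(path: str) -> None:
        print(f"processing {path}")

    # .expand() creates one `process` task instance per file, based on
    # the upstream task's return value at runtime.
    process.expand(path=list_files())

dynamic_mapping_example()
```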