October, 2021

How to add tests to your data pipelines

Start Data Engineering

Contents: Introduction; Testing your data pipeline (1. End-to-end system testing; 2. Data quality testing; 3. Monitoring and alerting; 4. Unit and contract testing); Conclusion; Further reading. Introduction: Testing data pipelines is different from testing other applications, like a website backend.

Tech workers warned they were going to quit. Now, the problem is spiralling out of control

DataKitchen

The post Tech workers warned they were going to quit. Now, the problem is spiralling out of control first appeared on DataKitchen.

Trending Sources

Introducing uGroup: Uber’s Consumer Management Framework

Uber Engineering

Background. Apache Kafka® is widely used across Uber’s multiple business lines. Take the example of an Uber ride: When a user opens up the Uber app, demand and supply data are aggregated in Kafka queues to serve fare calculations. …

Kafka Streams Fundamentals

Confluent

Kafka Streams is an abstraction over Apache Kafka® producers and consumers that lets you forget about low-level details and focus on processing your Kafka data. You could of course write […].

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

Introduction. In this blog, I will demonstrate the value of Cloudera DataFlow (CDF), the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP), as a data integration and democratization fabric. Within the context of a data mesh architecture, I will present industry settings and use cases where the particular architecture is relevant and highlight the business value that it delivers against business and technology areas.

Removing The Barrier To Exploratory Analytics with Activity Schema and Narrator

Data Engineering Podcast

Summary: The perennial question of data warehousing is how to model the information that you are storing. This has given rise to methods as varied as star and snowflake schemas, data vault modeling, and wide tables. The challenge with many of those approaches is that they are optimized for answering known questions but brittle and cumbersome when exploring unknowns.

More Trending

5 hot new IT jobs — and why they just might stick

DataKitchen

The post 5 hot new IT jobs — and why they just might stick first appeared on DataKitchen.

Is Balancing Complex Retail and CPG Supply Chains a Total Fantasy?

Teradata

Recent events have illustrated the fragility of ultra-lean supply chains. Chief Supply Chain Officers must figure out how to navigate these crises to manage costs, speed & quality of service.

Spring for Apache Kafka 101

Confluent

Extensive out-of-the-box functionality, a large user community, and up-to-date, cloud-native features make Spring and its libraries a strong option for anchoring your Apache Kafka® and Confluent Cloud based microservices architecture. […].

Introducing Self-Service, No-Code Airflow Authoring UI in Cloudera Data Engineering

Cloudera

Airflow has been adopted by many Cloudera Data Platform (CDP) customers in the public cloud as the next-generation orchestration service to set up and operationalize complex data pipelines. Today, customers have deployed hundreds of Airflow DAGs in production, performing various data transformation and preparation tasks with differing levels of complexity.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Streaming Data Pipelines Made SQL With Decodable

Data Engineering Podcast

Summary: Streaming data systems have been growing more capable and flexible over the past few years. Despite this, it is still challenging to build reliable pipelines for stream processing. In this episode Eric Sammer discusses the shortcomings of the current set of streaming engines and how they force engineers to work at an extremely low level of abstraction.

What's the difference between ETL & ELT?

Start Data Engineering

Contents: 1. Introduction; 2. E-T-L definition; 3. Differences between ETL & ELT; 4. Conclusion; 5. Further reading. 1. Introduction: If you are a student, analyst, engineer, or anyone working with data pipelines, you have probably heard of ETL and ELT architectures. If you have questions like "What is the difference between ETL & ELT?" or "Should I use the ETL or ELT pattern for my data pipeline?" […]

Data Quality: Volume, interdependencies can create big problems

DataKitchen

The post Data Quality: Volume, interdependencies can create big problems first appeared on DataKitchen.

Volkswagen and Teradata Develop New Smart Factory Solution

Teradata

An interdisciplinary team from Volkswagen, AWS and Teradata has created an intelligent solution that enables greater transparency and efficiency in car body construction. Find out more.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Extracting Value from IoT Using Azure Cosmos DB, Azure Synapse Analytics, and Confluent Cloud

Confluent

Today, an organization’s strategic objective is to deliver innovations for a connected life and to improve the quality of life worldwide. With connected devices comes data, and with data comes […].

What is new in Cloudera Streaming Analytics 1.5?

Cloudera

At the end of May, we released the second version of Cloudera SQL Stream Builder (SSB) as part of Cloudera Streaming Analytics (CSA). Among other features, the 1.4 version of CSA surfaced the expressivity of Flink SQL in SQL Stream Builder by adding DDL and Catalog support, and it greatly improved the integration with other Cloudera Data Platform components, for example by enabling stream enrichment from Hive and Kudu.

Data Exploration For Business Users Powered By Analytics Engineering With Lightdash

Data Engineering Podcast

Summary: The market for business intelligence has been going through an evolutionary shift in recent years. One of the driving forces for that change has been the rise of analytics engineering powered by dbt. Lightdash has fully embraced that shift by building an entire open source business intelligence framework that is powered by dbt models. In this episode Oliver Laslett describes why dashboards aren’t sufficient for business analytics, how Lightdash promotes the work that you are alread

What are Common Table Expressions (CTEs) and when to use them?

Start Data Engineering

Contents: Introduction; Setup; Common Table Expressions (CTEs); Performance comparison; CTE; Subquery and derived tables; Temp table; Trade-offs; Tear down; Conclusion; References. Introduction: If you are a student, analyst, engineer, or anyone in the data space and are wondering what CTEs are or trying to understand CTE performance, then this post is for you. In this post, we go over what CTEs are and compare their performance to the subquery, derived table, and temp table.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineers are Burned Out and Calling for DataOps

DataKitchen

The post Data Engineers are Burned Out and Calling for DataOps first appeared on DataKitchen.

Job Evaluation Methods: A Simplified Guide In 3 Points

U-Next

INTRODUCTION. Job evaluation determines the relative value of jobs within a company. Staff in a company perform various types of jobs. Some differ entirely in their responsibilities, while others are similar and belong to the same group. It is important to establish a method to work out the relative value of work and to implement clear ways of maintaining a plan for equal pay in a company.

Stream Governance – How it Works

Confluent

At the recent Kafka Summit, Confluent announced the general availability of Stream Governance, the industry’s only governance suite for data in motion. Offered as a fully managed cloud solution, it delivers […].

Our 2021 Data Impact Awards Finalists

Cloudera

It’s that time of year again… Award season! We are thrilled to announce the finalists of the 2021 Data Impact Awards. This year’s entrants have excelled at demonstrating how innovative data solutions can help solve real-time challenges and positively impact people around the world. The entries are some of the most remarkable we’ve seen, giving our judges the tough task of selecting an award-worthy shortlist.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Completing The Feedback Loop Of Data Through Operational Analytics With Census

Data Engineering Podcast

Summary: The focus of the past few years has been to consolidate all of the organization’s data into a cloud data warehouse. As a result there have been a number of trends in data that take advantage of the warehouse as a single focal point. Among those trends is the advent of operational analytics, which completes the cycle of data from collection, through analysis, to driving further action.

6 Key Concepts, to Master Window Functions

Start Data Engineering

Contents: Introduction; Prerequisites; 6 Key Concepts (1. When to Use; 2. Partition By; 3. Order By; 4. Function; 5. Lead and Lag; 6. Rolling Window); Efficiency Considerations; Conclusion; Further reading; References. Introduction: If you work with data, window functions can significantly level up your SQL skills.

How Predictive and Prescriptive Analytics Improve the Call Center Experience

DataKitchen

The post How Predictive and Prescriptive Analytics Improve the Call Center Experience first appeared on DataKitchen.

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

Was Nikola Tesla a scientist or engineer? How about Edison? Or Da Vinci? It’s hard to give a solid answer, right? These men didn’t stop at scientific research and ended up conceptualizing or engineering their inventions. One discipline goes hand in hand with another. In the modern world, this distinction is even more vague. Engineers are not only the ones bearing helmets and operating on construction sites.

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Using ksqlDB for Real-Time Lead Management and Reporting at Leadnomics

Confluent

How do you continuously process half a terabyte of data in real-time? That’s the exact question we had to answer. Leadnomics is a digital marketing company that helps companies maximize […].

The Ultimate Map to finding Halloween candy surplus

Cloudera

As Halloween night quickly approaches, there is only one question on every kid’s mind: how can I maximize my candy haul this year with the best possible candy? This kind of question lends itself perfectly to data science approaches that enable quick and intuitive analysis of data across multiple sources. Using Cloudera Machine Learning, the world’s first hybrid data cloud machine learning tooling, let’s take a deep dive into the world of candy analytics to answer the tough question on everyone’s

Bringing The Power Of The DataHub Real-Time Metadata Graph To Everyone At Acryl Data

Data Engineering Podcast

Summary: The binding element of all data work is the metadata graph that is generated by all of the workflows that produce the assets used by teams across the organization. The DataHub project was created as a way to bring order to the scale of LinkedIn’s data needs. It was also designed to be able to work for small scale systems that are just starting to develop in complexity.

6 Responsibilities of a Data Engineer

Start Data Engineering

Contents: Introduction; Responsibilities of a data engineer (1. Move data between systems; 2. Manage data warehouse; 3. Schedule, execute, and monitor data pipelines; 4. Serve data to the end-users; 5. Data strategy for the company; 6. Deploy ML models to production); Conclusion; Further reading. Introduction: Data engineering is a relatively new field, and as such, there is a huge variance in the actual job responsibilities across different companies.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!