Uber delivers efficient and reliable transportation across the global marketplace, which is powered by hundreds of services, machine learning models, and tens of thousands of datasets. While growing rapidly, we’re also committed to maintaining data quality, as it can greatly … The post How Uber Achieves Operational Excellence in the Data Quality Experience appeared first on Uber Engineering Blog.
To effectively use ksqlDB, the streaming database for Apache Kafka®, you should of course be familiar with its features and syntax. However, a deeper understanding of what goes on underneath […].
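As a concrete starting point, here is a minimal sketch of submitting a DDL statement to ksqlDB over its REST API. The server address, stream definition, and topic name are assumptions for illustration, not taken from the article.

```python
# A minimal sketch: create a ksqlDB stream over the REST API.
# Assumes a ksqlDB server at localhost:8088 and a Kafka topic
# named "pageviews"; both are illustrative placeholders.
import requests

KSQLDB_URL = "http://localhost:8088/ksql"  # default ksqlDB REST endpoint

statement = """
    CREATE STREAM pageviews_stream (
        user_id VARCHAR,
        page VARCHAR,
        viewtime BIGINT
    ) WITH (
        KAFKA_TOPIC = 'pageviews',
        VALUE_FORMAT = 'JSON'
    );
"""

response = requests.post(
    KSQLDB_URL,
    headers={"Content-Type": "application/vnd.ksql.v1+json"},
    json={"ksql": statement, "streamsProperties": {}},
)
response.raise_for_status()
print(response.json())
```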
Humans have been trying to make machines chat for decades. Alan Turing considered computers’ ability to generate natural speech a proof of their ability to think. Today, we converse with virtual companions all the time. But despite years of research and innovation, their unnatural responses remind us that no, we’re not yet at the HAL 9000-level of speech sophistication.
1. Introduction
2. Understanding your data engineering task
2.1. Data infrastructure overview
2.2. What exactly
2.3. Why exactly
2.4. Current state
2.5. Downstream impact
3. Delivering your data engineering task
3.1. How
3.2. Breakdown into sub-tasks
3.3. Delivering the finished task
4. Conclusion
5. Further reading

1. Introduction
Congratulations! You are given a quick overview of the business and data architecture and are assigned your very first data engineering task.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You'll learn how to:
- Create a standardized process for debugging to quickly diagnose errors in your DAGs (a minimal sketch follows below)
- Identify common issues with DAGs, tasks, and connections
- Distinguish between Airflow-related…
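As a taste of what such a standardized debugging process can look like, here is a hedged sketch that runs a DAG in a single local process with dag.test() (available in Airflow 2.5+), so errors surface immediately in your terminal. The DAG and task names are invented for illustration.

```python
# A minimal sketch of local DAG debugging with dag.test() (Airflow 2.5+).
# DAG and task names are illustrative placeholders.
import logging
from datetime import datetime

from airflow.decorators import dag, task

log = logging.getLogger(__name__)

@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def example_debug_dag():
    @task
    def extract() -> list[int]:
        data = [1, 2, 3]
        log.info("Extracted %d records", len(data))  # visible in task logs
        return data

    @task
    def transform(records: list[int]) -> list[int]:
        return [r * 2 for r in records]

    transform(extract())

debug_dag = example_debug_dag()

if __name__ == "__main__":
    # Runs every task in one local process, so you can set breakpoints and
    # read stack traces directly instead of digging through a deployment.
    debug_dag.test()
```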
Summary All of the fancy data platform tools and shiny dashboards that you use are pointless if the consumers of your analysis don’t have trust in the answers. Stemma helps you establish and maintain that trust by giving visibility into who is using what data, annotating the reports with useful context, and understanding who is responsible for keeping it up to date.
Dear Parents and Educators and Friends of Cloudera, If you are reading this blog, you know us at Cloudera as a group of self-described data geeks and data analysts. We believe data drives better decisions and moves businesses forward, and for us, that's exciting. We are innovating and helping Fortune 500 companies transform and grow because they can make better data-driven decisions at the accelerated pace at which we live and work today.
Today, as part of our expanded partnership with Elastic, we are announcing an update to the fully managed Elasticsearch Sink Connector in Confluent Cloud. This update allows you to take […].
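The fully managed Confluent Cloud connector itself is configured through the Cloud console or CLI; as a rough self-managed analogue, here is a hedged sketch of registering an Elasticsearch sink through the Kafka Connect REST API. Hosts, the topic, and the option values are placeholders, not taken from the announcement.

```python
# A hedged sketch: registering an Elasticsearch sink connector via the
# Kafka Connect REST API (the self-managed analogue of the fully managed
# Confluent Cloud connector). Hosts, topic, and flags are placeholders.
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # Kafka Connect REST API

connector = {
    "name": "orders-to-elasticsearch",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "topics": "orders",                         # source Kafka topic
        "connection.url": "http://localhost:9200",  # Elasticsearch endpoint
        "key.ignore": "true",     # derive document IDs from topic+partition+offset
        "schema.ignore": "true",  # index JSON records without a registered schema
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(resp.json())
```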
Introduction
Patterns
1. Batch Data Pipelines
1.1 Process => Data Warehouse
1.2 Process => Cloud Storage => Data Warehouse
2. Near Real-Time Data Pipelines
2.1 Data Stream => Consumer => Data Warehouse
2.2 Cloud Storage => Process => Data Warehouse
Conclusion
Further Reading

Introduction
Loading data into a data warehouse is a key component of most data pipelines.
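For pattern 1.2 (Process => Cloud Storage => Data Warehouse), a hedged sketch might look like the following. The bucket, table, and IAM role names are invented, and the COPY syntax shown is Redshift-style; other warehouses have their own load commands.

```python
# A hedged sketch of pattern 1.2 (Process => Cloud Storage => Data Warehouse):
# transform records, stage them in S3 as CSV, then load them with a
# warehouse COPY. Bucket, table, and role names are invented placeholders.
import csv
import io

import boto3

def process(rows):
    # Stand-in transformation step.
    return [{"id": r["id"], "amount": r["amount"] * 100} for r in rows]

def stage_in_s3(rows, bucket, key):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["id", "amount"])
    writer.writeheader()
    writer.writerows(rows)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())

def load_into_warehouse(cursor, bucket, key):
    # Redshift-style COPY; Snowflake and BigQuery use their own load commands.
    cursor.execute(
        f"COPY analytics.orders FROM 's3://{bucket}/{key}' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/load-role' CSV IGNOREHEADER 1;"
    )
```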
Summary Data lake architectures have largely been biased toward batch processing workflows due to the volume of data that they are designed for. With more real-time requirements and the increasing use of streaming data there has been a struggle to merge fast, incremental updates with large, historical analysis. Vinoth Chandar helped to create the Hudi project while at Uber to address this challenge.
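To illustrate the kind of incremental merge Hudi was built for, here is a hedged PySpark sketch of an upsert write. The table path, record key, and precombine field are placeholders, and the Hudi Spark bundle is assumed to be on the classpath.

```python
# A hedged PySpark sketch of the incremental upsert pattern Hudi enables.
# Table path, record key, and precombine field are illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-sketch").getOrCreate()

updates = spark.createDataFrame(
    [(1, "2021-07-01 10:00:00", 42.0), (2, "2021-07-01 11:00:00", 17.5)],
    ["record_id", "event_ts", "amount"],
)

(updates.write.format("hudi")
    .option("hoodie.table.name", "events")
    .option("hoodie.datasource.write.recordkey.field", "record_id")
    .option("hoodie.datasource.write.precombine.field", "event_ts")  # latest wins
    .option("hoodie.datasource.write.operation", "upsert")
    .mode("append")
    .save("s3://my-lake/events/"))
```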
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
We just announced Cloudera DataFlow for the Public Cloud (CDF-PC), the first cloud-native runtime for Apache NiFi data flows. CDF-PC enables Apache NiFi users to run their existing data flows on a managed, auto-scaling platform, with a streamlined deployment process and a central monitoring dashboard that make it easier than ever to operate NiFi data flows at scale in the public cloud.
It is often difficult enough to build one application that talks to a single middleware or backend layer; e.g., a whole team of frontend engineers may build a web application […].
This is our fourth post (4 of 5) on combining data mesh with DataOps to foster innovation while addressing the challenges of a decentralized architecture. We've covered the basic ideas behind data mesh and some of the difficulties that must be managed. Here we discuss a data mesh implementation in the pharmaceutical space. For those embarking on the data mesh journey, it may be helpful to walk through a real-world example and the lessons learned from an actual implementation.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
By Benson Ma, Alok Ahuja

Introduction
At Netflix, hundreds of different device types, from streaming sticks to smart TVs, are tested every day through automation to ensure that new software releases continue to deliver the quality of the Netflix experience that our customers enjoy. In addition, Netflix continuously works with its partners (such as Roku, Samsung, LG, and Amazon) to port the Netflix SDK to their new and upcoming devices (TVs, smart boxes, etc.), to ensure the quality bar is reached before…
Summary The reason that so much time and energy is spent on data integration is how our applications are designed. Because each piece of software owns the data it generates, we have to go through the trouble of extracting that information before it can be used elsewhere. The team at Cinchy is working to bring about a new paradigm of software architecture that makes data the central element.
Once upon an IT time, everything was a “point product,” a specific application designed to do a single job inside a desktop PC, server, storage array, network, or mobile device. Point solutions are still used every day in many enterprise systems, but as IT continues to evolve, the platform approach beats point solutions in almost every use case.
For both analysts and data scientists, identifying paths and patterns in data is a valuable way to gain insight into the occurrences leading to or from any event of interest.
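As a small illustration of the idea, the sketch below counts the event sequences that lead up to a hypothetical "purchase" event in a toy clickstream; the data and event names are invented.

```python
# A toy sketch: count the event sequences that precede a hypothetical
# "purchase" event in a small clickstream. Data and event names are invented.
from collections import Counter

import pandas as pd

events = pd.DataFrame({
    "user":  ["a", "a", "a", "b", "b", "c", "c", "c"],
    "event": ["view", "cart", "purchase", "view", "purchase",
              "view", "cart", "cart"],
    "ts":    [1, 2, 3, 1, 2, 1, 2, 3],
})

paths = Counter()
for _, g in events.sort_values("ts").groupby("user"):
    seq = list(g["event"])
    if "purchase" in seq:                    # the event of interest
        cut = seq.index("purchase")
        paths[tuple(seq[:cut + 1])] += 1     # the path leading to it

for path, count in paths.most_common():
    print(" -> ".join(path), count)
```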
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
When companies need help with their vehicle fleets—including transport, storage, or renewing expired registrations—they don’t want to have to deal with multiple vehicle logistics providers. For these companies, ACERTUS provides […].
DataOps has become an essential methodology in pharmaceutical enterprise data organizations, especially for commercial operations. Companies that implement it well derive significant competitive advantage from their superior ability to manage and create value from data. They will be able to produce high-quality, on-demand insight that consistently leads to successful business decisions.
B2B sales strategies can be roughly divided into two activities: lead generation and lead conversion. It's clear how each works. The former, attracting visitors to your website and then helping them take certain actions, is largely automated and works through carefully placed calls to action. The latter, supporting a lead in making the purchasing decision, is done by professional salespeople with their arsenal of personalized tactics.
Summary The technological and social ecosystem of data engineering and data management has been reaching a stage of maturity recently. As part of this stage in our collective journey the focus has been shifting toward operation and automation of the infrastructure and workflows that power our analytical workloads. It is an encouraging sign for the industry, but it is still a complex and challenging undertaking.
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Recently, I worked with a large Fortune 500 customer on their migration from Apache Storm to Apache NiFi. If you're asking yourself, "Isn't Storm for complex event processing and NiFi for simple event processing?", you're correct. A few customers chose a complex event engine like Apache Storm for their simple event processing, even when Apache NiFi is the more practical choice, one that drastically cuts down on SDLC (software development life cycle) time.
As Back to School promotions hit the shelves, Christmas & New Year offers are already locked in. Are these long-lead cycles still effective in today’s dynamic Retail & CPG environment?
The Confluent Q3 ‘21 release is here and packed full of new features that enable the world’s most innovative businesses to continue building what keeps them on top: real-time, mission-critical […].
A drug company tests 50,000 molecules and spends a billion dollars or more to find a single safe and effective medicine that addresses a substantial market. Figure 1 shows the 15-year cycle from screening to government agency approval and phase IV trials. Drug companies desperately look for ways to compress this lengthy time frame and to demonstrate the competitive advantage of their intellectual property.
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
Imagine you run a candy store. Some sweets are presented on your display cases for quick access while the rest is kept in the storeroom. Now let’s think of sweets as the data required for your company’s daily operations. Instead of combing through the vast amounts of all organizational data stored in a data warehouse, you can use a data mart — a repository that makes specific pieces of data available quickly to any given business unit.
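A toy sketch of the idea, using an in-memory SQLite database as a stand-in warehouse: the "mart" is just a pre-filtered slice that one business unit (here, a hypothetical EMEA sales team) can query directly. All names are invented for illustration.

```python
# A toy sketch of a data mart, using in-memory SQLite as a stand-in
# warehouse: the mart is a pre-filtered slice for one business unit.
# Table, column, and region names are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?)",
    [(1, "EMEA", 120.0), (2, "APAC", 80.0), (3, "EMEA", 45.5)],
)

# The "mart": only what the hypothetical EMEA sales team needs.
conn.execute(
    "CREATE TABLE emea_sales_mart AS "
    "SELECT id, amount FROM warehouse_orders WHERE region = 'EMEA'"
)

print(conn.execute("SELECT * FROM emea_sales_mart").fetchall())
```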
Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture, they still require significant knowledge and experience to deploy and manage. In this episode Vikrant Dubey discusses his work on the Cuelake project, which allows data analysts to build a lakehouse with SQL queries.
In our previous blog, we talked about the four paths to Cloudera Data Platform: In-place Upgrade, Sidecar Migration, Rolling Sidecar Migration, and Migrating to Cloud. If you haven't read that yet, we invite you to take a moment and run through the scenarios in that blog. The four strategies will be relevant throughout the rest of this discussion. Today, we'll discuss an example of how you might make this decision for a cluster using a "round of elimination" process based on our decision workflow.
As 5G puts data analytics at the heart of the next wave of sustainable growth, telcos must ensure their existing investments in data infrastructure can be leveraged to enable that growth.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG-writing features with plenty of example code. You'll learn how to:
- Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to
- Write DAGs that adapt to your data at runtime and set up alerts and notifications (see the sketch below)
- Scale your…
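As a flavor of the runtime-adaptation and alerting topics above, here is a hedged sketch using dynamic task mapping (Airflow 2.3+) and a failure callback. The file list, callback body, and all names are illustrative, not drawn from the eBook.

```python
# A hedged sketch of two of the topics above: dynamic task mapping
# (Airflow 2.3+) so the DAG adapts to data at runtime, plus a failure
# callback for alerting. Names and the callback body are illustrative.
from datetime import datetime

from airflow.decorators import dag, task

def notify_on_failure(context):
    # Placeholder alert hook; wire this to Slack, email, or PagerDuty.
    print(f"Task {context['task_instance'].task_id} failed")

@dag(
    schedule="@daily",
    start_date=datetime(2024, 1, 1),
    catchup=False,
    default_args={"on_failure_callback": notify_on_failure},
)
def adaptive_pipeline():
    @task
    def list_files() -> list[str]:
        return ["a.csv", "b.csv", "c.csv"]  # discovered at runtime

    @task
    def load(path: str) -> None:
        print(f"loading {path}")

    # One mapped task instance per file, however many there are that day.
    load.expand(path=list_files())

adaptive_pipeline()
```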