Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Data lakes are notoriously complex. Multiple open source projects and vendors have been working together to make this vision a reality.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines.
AI data engineers are data engineers responsible for developing and managing data pipelines that support AI and GenAI data products. Essential skills for AI data engineers include expertise in data pipelines and ETL processes, a foundational skill for any data engineer.
Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake.
In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).
Shifting left involves moving data processing upstream, closer to the source, enabling broader access to high-quality data through well-defined data products and contracts, thus reducing duplication, enhancing data integrity, and bridging the gap between operational and analytical data domains.
With these points in mind, I argue that the biggest hurdle to the widespread adoption of these advanced techniques in the healthcare industry is not intrinsic to the industry itself, or in any way related to its practitioners or patients, but simply the current lack of high-quality data pipelines.
In this article, Chad Sanderson, Head of Product, Data Platform, at Convoy and creator of Data Quality Camp, introduces a new application of data contracts: in your data warehouse. In the last couple of posts, I’ve focused on implementing data contracts in production services.
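Concretely, a warehouse-level contract can be as little as a declared schema that the producing team promises and downstream consumers verify before trusting the data. Below is a minimal sketch of that idea in Python; the table columns and the `validate_contract` helper are hypothetical illustrations, not the implementation described in the article.

```python
# Minimal sketch of a warehouse-level data contract check.
# The contract below (column names, types) is a hypothetical example.
from dataclasses import dataclass

@dataclass
class ColumnSpec:
    name: str
    dtype: str

# Contract agreed between the producing team and downstream consumers.
ORDERS_CONTRACT = [
    ColumnSpec("order_id", "string"),
    ColumnSpec("customer_id", "string"),
    ColumnSpec("order_total_usd", "float"),
    ColumnSpec("created_at", "timestamp"),
]

def validate_contract(actual_schema: dict[str, str], contract: list[ColumnSpec]) -> list[str]:
    """Return a list of contract violations for a table's actual schema."""
    violations = []
    for col in contract:
        if col.name not in actual_schema:
            violations.append(f"missing column: {col.name}")
        elif actual_schema[col.name] != col.dtype:
            violations.append(
                f"type mismatch on {col.name}: expected {col.dtype}, got {actual_schema[col.name]}"
            )
    return violations

# Example: a schema as reported by the warehouse's information_schema.
actual = {"order_id": "string", "customer_id": "string", "order_total_usd": "string"}
print(validate_contract(actual, ORDERS_CONTRACT))
# ['type mismatch on order_total_usd: expected float, got string', 'missing column: created_at']
```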
The Ten Standard Tools To Develop Data Pipelines In Microsoft Azure. While working in Azure with our customers, we have noticed several standard Azure tools people use to develop data pipelines and ETL or ELT processes. We counted ten ‘standard’ ways to transform and set up batch data pipelines in Microsoft Azure.
During data ingestion, raw data is extracted from sources and ferried to either a staging server for transformation or directly into the storage level of your data stack, usually in the form of a data warehouse or data lake. There are two primary types of raw data.
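As a rough sketch of that ingestion step, the snippet below pulls raw records from a source and lands them unchanged in a staging location; the endpoint, staging path, and file layout are hypothetical stand-ins for whatever your stack uses.

```python
# Minimal sketch of an ingestion step: extract raw records from a source
# and land them, untransformed, in staging. URL and paths are hypothetical.
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

SOURCE_URL = "https://api.example.com/v1/events"   # hypothetical source endpoint
STAGING_DIR = Path("/staging/raw/events")          # stand-in for a bucket or stage

def ingest_batch() -> Path:
    with urllib.request.urlopen(SOURCE_URL) as resp:
        raw = resp.read()                          # keep the payload exactly as received
    batch_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    STAGING_DIR.mkdir(parents=True, exist_ok=True)
    out = STAGING_DIR / f"events_{batch_id}.json"
    out.write_bytes(raw)                           # land raw data now; transform later
    return out
```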
Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. Try For Free → Conference Alert: Data Engineering for AI/ML This is a virtual conference at the intersection of Data and AI.
However, for all of our uncertified data, which remained the majority of our offline data, we lacked visibility into its quality and didn’t have clear mechanisms for up-leveling it. How could we scale the hard-fought wins and best practices of Midas across our entire data warehouse?
Data observability tools employ automated monitoring, root cause analysis, data lineage, and data health insights to proactively detect, resolve, and prevent data anomalies. Freshness: Freshness seeks to understand how up-to-date your data tables are, as well as the cadence at which your tables are updated.
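Freshness checks usually reduce to comparing a table's latest load timestamp against its expected update cadence. A minimal sketch, assuming the timestamp comes from a query like `SELECT MAX(_loaded_at) ...` against a hypothetical table:

```python
# Minimal sketch of a freshness check: compare a table's most recent load
# timestamp against its expected update cadence. Table and column names
# are hypothetical; the query would run against your warehouse.
from datetime import datetime, timedelta, timezone

def is_stale(last_loaded_at: datetime, expected_cadence: timedelta) -> bool:
    """True if the table has not been updated within its expected cadence."""
    age = datetime.now(timezone.utc) - last_loaded_at
    return age > expected_cadence

# e.g. last_loaded_at comes from: SELECT MAX(_loaded_at) FROM analytics.fct_orders;
last_loaded_at = datetime(2024, 1, 1, 6, 0, tzinfo=timezone.utc)
print(is_stale(last_loaded_at, expected_cadence=timedelta(hours=24)))
```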
Selecting the strategies and tools for validating data transformations and data conversions in your data pipelines. Introduction Data transformations and data conversions are crucial to ensure that raw data is organized, processed, and ready for useful analysis.
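Two checks that come up in almost every transformation or conversion are row-count reconciliation between source and target, and auditing type conversions before they fail mid-pipeline. A minimal sketch, with illustrative values and tolerances:

```python
# Minimal sketch of two common transformation checks: row-count reconciliation
# and a safe numeric-conversion audit. Values and tolerances are illustrative.
def reconcile_row_counts(source_count: int, target_count: int, tolerance: float = 0.0) -> bool:
    """True if the transformed table retained the expected number of rows."""
    if source_count == 0:
        return target_count == 0
    return abs(source_count - target_count) / source_count <= tolerance

def audit_numeric_conversion(values: list[str]) -> list[str]:
    """Return the raw values that would fail conversion to float."""
    bad = []
    for v in values:
        try:
            float(v)
        except (TypeError, ValueError):
            bad.append(v)
    return bad

print(reconcile_row_counts(10_000, 10_000))        # True
print(audit_numeric_conversion(["19.99", "N/A"]))  # ['N/A']
```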
It’s too hard to change our IT data product. Can we create high-quality data in an “answer-ready” format that can address many scenarios, all with minimal keyboarding? “I get cut off at the knees from a data perspective, and I am getting handed a sandwich of sorts, and not a good one!”
During this transformation, Airbnb experienced the typical growth challenges that most companies do, including those that affect the data warehouse. In the first post of this series, we shared an overview of how we evolved our organization and technology standards to address the data quality challenges faced during hyper growth.
As the data analyst or engineer responsible for managing this data and making it usable, accessible, and trustworthy, rarely a day goes by without having to field some request from your stakeholders. But what happens when the data is wrong? In our opinion, data quality frequently gets a bad rep.
And this renewed focus on data quality is bringing much-needed visibility into the health of technical systems. As generative AI (and the data powering it) takes center stage, it’s critical to bring this level of observability to where your data lives, in your data warehouse, data lake, or data lakehouse.
What Are Data Observability Tools? Data observability tools are software solutions that oversee, analyze, and improve the performance of data pipelines. Data observability tools allow teams to detect issues such as missing values, duplicate records, or inconsistent formats early on before they affect downstream processes.
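As a rough illustration of those checks, the sketch below profiles a small batch of rows for missing values, duplicate keys, and inconsistent date formats; the field names and expected format are hypothetical.

```python
# Minimal sketch of the checks mentioned above: missing values, duplicate
# records, and inconsistent formats over a batch of rows. Field names and
# the expected date format are hypothetical examples.
import re
from collections import Counter

ROWS = [
    {"id": "1", "email": "a@example.com", "signup_date": "2024-01-03"},
    {"id": "1", "email": "a@example.com", "signup_date": "2024-01-03"},  # duplicate record
    {"id": "2", "email": None,            "signup_date": "03/01/2024"},  # null + bad format
]

def profile(rows: list[dict]) -> dict:
    issues = {"missing_email": 0, "bad_date_format": 0, "duplicate_ids": 0}
    date_ok = re.compile(r"^\d{4}-\d{2}-\d{2}$")
    id_counts = Counter(r["id"] for r in rows)
    issues["duplicate_ids"] = sum(c - 1 for c in id_counts.values() if c > 1)
    for r in rows:
        if not r.get("email"):
            issues["missing_email"] += 1
        if not date_ok.match(r.get("signup_date") or ""):
            issues["bad_date_format"] += 1
    return issues

print(profile(ROWS))  # {'missing_email': 1, 'bad_date_format': 1, 'duplicate_ids': 1}
```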
These specialists are also commonly referred to as data reliability engineers. To be successful in their role, data quality engineers will need to gather data quality requirements (mentioned in 65% of job postings) from relevant stakeholders.
DataOps was first spearheaded by large data-first companies such as Netflix, Uber, and Airbnb that had adopted continuous integration / continuous deployment (CI/CD) principles, even building open source tools to foster their growth for data teams. Monitor : Continuously monitoring and alerting for any anomalies in the data.
It also came with other advantages, such as independence from cloud infrastructure providers, data recovery features such as Time Travel, and zero-copy cloning, which made setting up several environments (such as dev, stage, or production) far more efficient.
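Both features are exposed as ordinary Snowflake SQL, so they are easy to script. The sketch below issues them from Python via snowflake-connector-python; the account details, database, and table names are placeholders, not a specific setup from the article.

```python
# Minimal sketch of zero-copy cloning and Time Travel in Snowflake, issued
# from Python via snowflake-connector-python. Connection parameters,
# database, and table names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...", warehouse="my_wh"
)
cur = conn.cursor()

# Zero-copy cloning: spin up a dev environment without duplicating storage.
cur.execute("CREATE DATABASE dev_db CLONE prod_db")

# Time Travel: query the table as it looked one hour ago.
cur.execute("SELECT COUNT(*) FROM prod_db.public.orders AT(OFFSET => -3600)")
print(cur.fetchone())

cur.close()
conn.close()
```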
Azure Data Engineers use a variety of Azure data services, such as Azure Synapse Analytics, Azure Data Factory, Azure Stream Analytics, and Azure Databricks, to design and implement data solutions that meet the needs of their organization. Gain hands-on experience using Azure data services.
They need high-quality data in an answer-ready format to address many scenarios with minimal keyboarding. What they are getting from IT and other data sources is, in reality, poor-quality data in a format that requires manual customization. DataOps Process Hub.
The Essential Six Capabilities To set the stage for impactful and trustworthy data products in your organization, you need to invest in six foundational capabilities: data pipelines, data integrity, data lineage, data stewardship, data catalog, and data product costing. Let’s review each one in detail.
Picture this: your data is scattered. Data pipelines originate in multiple places and terminate in various silos across your organization. Your data is inconsistent, ungoverned, inaccessible, and difficult to use. Some of the value companies can generate from data orchestration tools includes faster time-to-insights.
Here’s how Gartner officially defines the category of data observability tools: “Data observability tools are software applications that enable organizations to understand the state and health of their data, data pipelines, data landscapes, data infrastructures, and the financial operational cost of the data across distributed environments.”
Data in Place refers to the organized structuring and storage of data within a specific storage medium, be it a database, bucket store, files, or other storage platforms. In the contemporary data landscape, data teams commonly utilize data warehouses or lakes to arrange their data into L1, L2, and L3 layers.
GigaOm’s Data Observability Radar Report covers the problem data observability tools look to solve, saying, “Data observability is critical for countering, if not eliminating, data downtime, in which the results of analytics or the performance of applications are compromised because of unhealthy, inaccurate data.”
While data engineering and Artificial Intelligence (AI) may seem like distinct fields at first glance, their symbiosis is undeniable. The foundation of any AI system is high-quality data. Here lies the critical role of data engineering: preparing and managing data to feed AI models.
Data Engineering Weekly Is Brought to You by RudderStack RudderStack provides data pipelines that make it easy to collect data from every application, website, and SaaS platform, then activate it in your warehouse and business tools. Sign up free to test out the tool today.
Choosing one tool over another isn’t just about the features it offers today; it’s a bet on the future of how data will flow within organizations. Matillion is an all-in-one ETL solution that stands out for its ability to handle complex data transformation tasks in all the popular cloud data warehouses.
By applying rules and checks, data validation testing verifies the data meets predefined standards and business requirements to help prevent data quality issues and data downtime. From this perspective, the data validation process looks a lot like any other DataOps process.
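One way to keep validation testing inside that DataOps flow is to express each rule as data (a query plus its expected result) so the whole suite can run as an ordinary pipeline step. A minimal sketch, with hypothetical queries and thresholds:

```python
# Minimal sketch of validation tests expressed as data: each rule pairs a
# query with the expected result, so the suite can run inside any pipeline
# step. The queries and expected values are hypothetical examples.
VALIDATION_RULES = [
    ("no_null_order_ids",
     "SELECT COUNT(*) FROM fct_orders WHERE order_id IS NULL", 0),
    ("totals_non_negative",
     "SELECT COUNT(*) FROM fct_orders WHERE order_total_usd < 0", 0),
]

def run_rules(execute_scalar, rules=VALIDATION_RULES) -> dict[str, bool]:
    """execute_scalar is any callable that runs SQL and returns one number."""
    return {name: execute_scalar(sql) == expected for name, sql, expected in rules}

# With a real warehouse connection, execute_scalar would wrap cursor.execute;
# here a stub stands in for it.
print(run_rules(lambda sql: 0))  # {'no_null_order_ids': True, 'totals_non_negative': True}
```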
Partnering with Monte Carlo enabled the data team to gain greater visibility over the entire data platform and streamline incident management and resolution by leveraging Monte Carlo’s central UI. Ready to learn more about data observability and empower your company to drive adoption and trust of your data?
Data informs every business decision, from customer support to feature development, and most recently, how to support pricing plans for organizations most affected during COVID-19. When migrating to Snowflake, PagerDuty wanted to understand the health of their data pipelines through fully automated data observability.
DataOps helps ensure organizations make decisions based on sound data. Previously, organizations have grabbed their full dataset across multiple environments, put it all into a data warehouse, and surfaced information from there. Altogether, these elements enable DataOps to turn bottlenecks into opportunities.
For the past few decades, most companies have kept data in an organizational silo. Analytics teams served business units, and even as data became more crucial to decision-making and product roadmaps, the teams in charge of data pipelines were treated more like plumbers and less like partners.
Run the test again to validate that the initial problem is solved and that your data meets your quality and accuracy standards. Schedule and automate Just like schema tests, custom data tests in dbt are typically not run just once but are incorporated into your regular data pipeline to ensure ongoing data quality.
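A common way to incorporate them is to schedule `dbt test` right after `dbt run` in your orchestrator. The sketch below uses Airflow's BashOperator; the DAG id, schedule, and project path are hypothetical, and any scheduler that can shell out to dbt works the same way.

```python
# Minimal sketch of scheduling dbt tests as part of a regular pipeline run,
# using Airflow's BashOperator. DAG id, schedule, and project path are
# hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="nightly_dbt_quality_checks",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",   # run every morning after loads finish
    catchup=False,
) as dag:
    run_models = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/dbt/my_project && dbt run",
    )
    run_tests = BashOperator(
        task_id="dbt_test",
        bash_command="cd /opt/dbt/my_project && dbt test",
    )
    run_models >> run_tests   # tests run only after the models build successfully
```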
Data warehouse (or lakehouse) migration; integrate data stacks post-merger; know when to fix vs. refactor data pipelines; improve DataOps processes; analyze data incident impact and triage; reduce the amount of data incidents. Resident, an online mattress and home goods store, has a lot of data.
To ensure effective implementation, decision services must access all existing data infrastructure components, such as data warehouses, BI tools, and real-time data pipelines. These improvements empower sales teams to act on high-quality data, driving better outcomes.