source: svitla.com Introduction: Before jumping to the data warehouse interview questions, let’s first get an overview of what a data warehouse is. The data is then organized and structured […] The post Data Warehouse Interview Questions appeared first on Analytics Vidhya.
Many of our customers — from Marriott to AT&T — start their journey with the Snowflake AI Data Cloud by migrating their data warehousing workloads to the platform. Today we’re focusing on customers who migrated from a cloud data warehouse to Snowflake and some of the benefits they saw.
Migrating from a traditional data warehouse to a cloud data platform is often complex, resource-intensive and costly. Snowflake and many of its system integrator (SI) partners have leveraged SnowConvert to accelerate hundreds of migration projects.
The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the underlying technology. In the beginning, there was a data warehouse. The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.
But data volumes grow, analytical demands become more complex, and Postgres stops being enough. That’s when you’ve probably come across terms like OLAP (Online Analytical Processing) systems, data warehouses, and, more recently, real-time analytical databases.
Data lineage is an instrumental part of Meta’s Privacy Aware Infrastructure (PAI) initiative, a suite of technologies that efficiently protect user privacy. It is a critical and powerful tool for scalable discovery of relevant data and data flows, which supports privacy controls across Meta’s systems.
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.
Did you know Cloudera customers, such as SMG and Geisinger, offloaded their legacy DW environment to Cloudera Data Warehouse (CDW) to take advantage of CDW’s modern architecture and best-in-class performance? The Data Warehouse on Cloudera Data Platform enables easy-to-use self-service and advanced analytics use cases at scale.
AI Product Day on March 31 (register). AI News 🤖: The current economic uncertainties are affecting the tech and data worlds. AI companies are aiming for the moon (AGI), promising it will arrive once OpenAI develops a system capable of generating at least $100 billion in profits.
Intro: A very common use case in data engineering is to build an ETL system for a data warehouse, loading data in from multiple separate databases so that data analysts/scientists can run queries on this data. Since the source databases are used by your applications, we do not want these analytic queries to affect our application (..)
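As a rough sketch of that pattern, here is a minimal Python pipeline that reads from two isolated source databases and loads a joined table into the warehouse. All connection URLs and table names (orders, users, fct_orders) are hypothetical placeholders, not the article's actual setup.

```python
# A minimal ETL sketch: extract from separate application databases,
# transform, and load into a warehouse so analytic queries never hit
# the application databases. All names and URLs are placeholders.
import pandas as pd
from sqlalchemy import create_engine

orders_db = create_engine("postgresql://app_user:***@orders-host/orders")
users_db = create_engine("postgresql://app_user:***@users-host/users")
warehouse = create_engine("postgresql://etl_user:***@warehouse-host/dwh")

def run_etl() -> None:
    # Extract: each source database is read independently
    orders = pd.read_sql("SELECT * FROM orders", orders_db)
    users = pd.read_sql("SELECT * FROM users", users_db)

    # Transform: join into one analyst-friendly table
    enriched = orders.merge(users, on="user_id", how="left")

    # Load: analysts query the warehouse copy, not the source systems
    enriched.to_sql("fct_orders", warehouse, if_exists="replace", index=False)

if __name__ == "__main__":
    run_etl()
```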
Twenty years ago, the data warehouses of choice were Oracle and Teradata. Since then, growth and innovation have shifted to the cloud, and a new generation of data systems have […].
Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like the data warehouse, data lake and data lakehouse, and distributed patterns such as data mesh.
Many data engineers coming from traditional batch processing frameworks have questions about real-time data processing systems, like “What kind of data model did you implement for real-time processing?”
Recognizing this shortcoming, and the capabilities that could be unlocked by a robust solution, Rishabh Poddar helped create Opaque Systems as an outgrowth of his PhD studies.
A data warehouse is a centralized system that stores, integrates, and analyzes large volumes of structured data from various sources. It is predicted that more than 200 zettabytes of data will be stored in the global cloud by 2025.
Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). What are the pain points that are still prevalent in lakehouse architectures as compared to warehouse or vertically integrated systems?
In this episode, Guy Yachdav, director of software engineering for ImmunAI, shares the complexities that are inherent to managing data workflows for bioinformatics. RudderStack’s smart customer data pipeline is warehouse-first.
In this episode, Ian Schweer shares his experiences at Riot Games supporting player-focused features such as machine learning models and recommender systems that are deployed as part of the game binary. The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it.
The Machine Learning Platform (MLP) team at Netflix provides an entire ecosystem of tools around Metaflow, an open source machine learning infrastructure framework we started, to empower data scientists and machine learning practitioners to build and manage a variety of ML systems, upstream (e.g., ETL workflows) as well as downstream (e.g., […]).
Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows.
We’re sharing how Meta built support for data logs, which provide people with additional data about how they use our products. Here we explore initial system designs we considered, an overview of the current architecture, and some important principles Meta takes into account in making data accessible and easy to understand.
Data warehouses and data lakes play a crucial role for many businesses. They give businesses access to the data from all of their various systems, often integrating data so that end users can answer business-critical questions.
In a recent customer workshop with a large retail data science media company, one of the attendees, an engineering leader, made the following observation: “Every time I go to your competitor’s website, they only care about their system. How to onboard data into their system? I don’t care about their system.
The trend to centralize data will accelerate, making sure that data is high-quality, accurate and well managed. Overall, data must be easily accessible to AI systems, with clear metadata management and a focus on relevance and timeliness.
“These products and services generate petabytes of data based on every transaction and decision,” says Prakash Jaganathan, Senior Director of Data and Analytics Engineering at Discover. While working together, they bonded over their shared passion for data.
In this blog, we will share with you in detail how Cloudera integrates core compute engines including Apache Hive and Apache Impala in Cloudera Data Warehouse with Iceberg. We will publish follow-up blogs for other data services. Try Cloudera Data Warehouse (CDW) by signing up for a 60-day trial, or test-drive CDP.
Summary: Data engineering systems are complex and interconnected, with myriad and often opaque chains of dependencies. To turn this into a tractable problem, one approach is to define and enforce contracts between producers and consumers of data. Can you describe what Schemata is and the story behind it?
Managing and understanding large-scale data ecosystems is a significant challenge for many organizations, requiring innovative solutions to efficiently safeguard user data. Meta’s vast and diverse systems make it particularly challenging to comprehend their structure, meaning, and context at scale.
WhyLogs is a powerful library for flexibly instrumenting all of your data systems to understand the entire lifecycle of your data from source to productionized model. You have full control over your data, and their plugin system lets you integrate with all of your other data tools, including data warehouses and SaaS platforms.
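To give a feel for what that instrumentation looks like, here is a minimal profiling sketch assuming whylogs' v1 Python API; the dataframe and its columns are invented for the example.

```python
# A minimal data-profiling sketch with whylogs; the sample data is made up.
import pandas as pd
import whylogs as why

df = pd.DataFrame({"user_id": [1, 2, 3], "amount": [9.99, 20.00, 5.50]})

# Log the dataframe; whylogs builds a statistical profile of each column
results = why.log(df)
profile_view = results.view()

# Inspect per-column summary statistics (counts, types, distributions)
print(profile_view.to_pandas())
```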
ERP and CRM systems are designed and built to fulfil a broad range of business processes and functions. This generalisation makes their data models complex and cryptic, requiring domain expertise. Accessibility: I could easily request access to these data products.
Anyone who’s been roaming around the forest of Data Engineering has probably run into many of the newish tools that have been growing rapidly around the concepts of Data Warehouses, Data Lakes, and Lake Houses … the merging of the old relational database functionality with TB- and PB-level cloud-based file storage systems.
Many of these projects are under constant development by dedicated teams with their own business goals and development best practices, such as the system that supports our content decision makers, or the system that ranks which language subtitles are most valuable for a specific piece of content.
Once again, I want to thank the Data Heros community. Last Friday, we discussed the challenges in bulk discovery and anonymization processes in data warehouses. The collective design choices and ideas led to a comprehensive overview of how to design data infrastructure with a privacy-first approach.
In modern enterprises, the exponential growth of data means organizational knowledge is distributed across multiple formats, ranging from structured data stores such as data warehouses to multi-format data stores like data lakes. This makes gathering information for decision making a challenge.
Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to. Despite these limitations, data warehouses, introduced in the late 1980s based on ideas developed even earlier, remain in widespread use today for certain business intelligence and data analysis applications.
This is not surprising when you consider all the benefits, such as reducing complexity [and] costs and enabling zero-copy data access (ideal for centralizing data governance).
Getting data out of source systems and into a data warehouse or data lake is one of the first steps in making it usable by analysts and data scientists. The question is: how will your team do that?
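One common answer is watermark-based incremental extraction: track the latest updated_at value you have seen and pull only newer rows on each run. The sketch below assumes a hypothetical events table with an updated_at column and a Parquet landing path.

```python
# A watermark-based incremental extraction sketch; the events table,
# updated_at column, and landing path are hypothetical placeholders.
import datetime
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("postgresql://reader:***@source-host/appdb")
LAKE_PATH = "/data/lake/events"  # placeholder landing location

def extract_since(last_watermark: datetime.datetime) -> datetime.datetime:
    # Pull only rows that changed since the last successful run
    query = "SELECT * FROM events WHERE updated_at > :wm"
    batch = pd.read_sql(query, source, params={"wm": last_watermark})

    if batch.empty:
        return last_watermark

    # Land the batch as a timestamped Parquet file in the lake
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%S")
    batch.to_parquet(f"{LAKE_PATH}/events_{stamp}.parquet", index=False)
    return batch["updated_at"].max()
```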
With instant elasticity, high performance, and secure data sharing across multiple clouds, Snowflake has become highly in demand for its cloud-based data warehouse offering. As organizations adopt Snowflake for business-critical workloads, they also need to look for a modern data integration approach.
In this post, we will be particularly interested in the impact that cloud computing has had on the modern data warehouse. We will explore the different options for data warehousing and how you can leverage this information to make the right decisions for your organization. Understanding the Basics: What is a Data Warehouse?
CDC Evaluation Guide Google Sheet: [link]. CDC Evaluation Guide GitHub: [link]. Change Data Capture (CDC) is a powerful technology in data engineering that allows for continuously capturing changes (inserts, updates, and deletes) made to source systems.
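At its core, applying a CDC stream means replaying each change against a target in order. A minimal sketch, using an in-memory dict as a stand-in target and a hypothetical event format:

```python
# A minimal CDC-apply sketch: replay inserts, updates, and deletes from a
# change stream against a target. The event format is hypothetical.
from typing import Any, Dict

target: Dict[int, Dict[str, Any]] = {}  # target table keyed by primary key

def apply_change(event: Dict[str, Any]) -> None:
    op, key = event["op"], event["pk"]
    if op in ("insert", "update"):
        target[key] = event["row"]   # upsert the new row image
    elif op == "delete":
        target.pop(key, None)        # drop the deleted row

changes = [
    {"op": "insert", "pk": 1, "row": {"id": 1, "status": "new"}},
    {"op": "update", "pk": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "delete", "pk": 1},
]
for event in changes:
    apply_change(event)

print(target)  # {} -- the row was inserted, updated, then deleted
```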
This means that, ideally, the logic in source control describes how to build the full state of the data warehouse across all time periods. But how do we model this in a functional data warehouse without mutating data? With dimension snapshots, where a new partition is appended at each ETL schedule.
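A minimal sketch of that snapshot pattern: each run writes the full dimension under a new date partition instead of updating rows in place. The paths and the extract function are hypothetical.

```python
# Dimension-snapshot sketch: append a full copy of the dimension under a
# new ds partition on every ETL run; nothing is ever mutated.
import datetime
import pandas as pd

def extract_dim_users() -> pd.DataFrame:
    # Stand-in for reading the current dimension from the source system
    return pd.DataFrame({"user_id": [1, 2], "tier": ["free", "pro"]})

def snapshot_dimension(ds: str) -> None:
    dim = extract_dim_users()
    dim["ds"] = ds  # partition column: the snapshot date
    # Append-only write: earlier partitions are never rewritten
    dim.to_parquet(f"/warehouse/dim_users/ds={ds}.parquet", index=False)

snapshot_dimension(datetime.date.today().isoformat())
# Historical queries read the partition for the date of interest,
# e.g. /warehouse/dim_users/ds=2024-01-01.parquet
```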
We are pleased to announce that Cloudera has been named a Leader in the 2022 Gartner® Magic Quadrant for Cloud Database Management Systems. Cloudera has long had the capabilities of a data lakehouse, if not the label. Cloudera has been recognized in this cloud DBMS report since its inception in 2020.