Data Architecture, Data Warehouse and Metadata

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Cloud Storage Metadata Data Warehouse

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Each of these architectures has its own unique strengths and tradeoffs.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

OCTOBER 14, 2019

Summary: Managing a data warehouse can be challenging, especially when trying to maintain a common set of patterns. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council.

Data Warehouse

Data Warehouse PostgreSQL AWS Programming Language

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Lack of sharing hinders the elimination of fraud, waste, and abuse.

Data Architecture

Data Architecture Architecture Data Lake NoSQL

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

First, we create an Iceberg table in Snowflake and then insert some data. Then, we add another column called HASHKEY, add more data, and locate the S3 file containing metadata for the Iceberg table. In the screenshot below, we can see that the metadata file for the Iceberg table retains the snapshot history.
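The sequence in this teaser (insert data, add a HASHKEY column, insert more data, then inspect the snapshot history kept in the metadata) can be pictured with a toy model. The sketch below is only an illustration, not Iceberg's real metadata layout or Snowflake's API: the class names, the `ID` and `NAME` columns, and the integer snapshot IDs are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable entry in the (hypothetical) snapshot log."""
    snapshot_id: int
    parent_id: Optional[int]      # previous snapshot, None for the first
    operation: str                # kind of data change, e.g. "append"
    columns: Tuple[str, ...]      # schema in effect when the snapshot was taken


@dataclass
class TableMetadata:
    """Toy stand-in for a table metadata file: current schema plus snapshot log."""
    columns: List[str]
    snapshots: List[Snapshot] = field(default_factory=list)

    def add_column(self, name: str) -> None:
        # A schema change updates the current schema; older snapshots are untouched.
        self.columns.append(name)

    def append_rows(self) -> Snapshot:
        # Each data write adds a new snapshot that points back to its parent.
        parent = self.snapshots[-1].snapshot_id if self.snapshots else None
        snap = Snapshot(len(self.snapshots) + 1, parent, "append", tuple(self.columns))
        self.snapshots.append(snap)
        return snap


# Mirror the steps described in the teaser (ID and NAME are made-up columns).
meta = TableMetadata(columns=["ID", "NAME"])
meta.append_rows()            # first insert
meta.add_column("HASHKEY")    # schema evolution
meta.append_rows()            # second insert

for snap in meta.snapshots:
    print(snap.snapshot_id, snap.parent_id, snap.operation, snap.columns)
```

Because earlier snapshots are never rewritten, the log still shows the table as it was before HASHKEY was added, which is the property the teaser points to.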

Architecture

Architecture Systems Data Lake Google Cloud

The Top Three Entangled Trends in Data Architectures: Data Mesh, Data Fabric, and Hybrid Architectures

Cloudera

SEPTEMBER 29, 2022

Each of these trends claims to be a complete model for data architectures that solves the “everything everywhere all at once” problem. Data teams are confused as to whether they should get on the bandwagon of just one of these trends or pick a combination. First, we describe how data mesh and data fabric could be related.

Architecture

Architecture Data Architecture Metadata Data Warehouse

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

Sign up free at dataengineeringpodcast.com/rudderstack. Your host is Tobias Macey and today I'm interviewing Satish Jayanthi about the practice and promise of building a column-aware data architecture through intentional modeling. Interview: Introduction. How did you get involved in the area of data management?

Data Lake

Data Lake Machine Learning Metadata Data Architecture

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

This includes modeling the lifecycle of your information as a pipeline from the raw, messy, loosely structured records in your data lake, through a series of transformations and ultimately to your data warehouse. Can you walk through the stages of an ideal lifecycle for data within the context of an organization's uses for it?

Data Lake

Data Lake Data Warehouse Data Architecture Architecture

The View From The Lakehouse Of Architectural Patterns For Your Data Platform

Data Engineering Podcast

JULY 3, 2022

Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Architecture

Architecture Metadata MongoDB Data Warehouse

Is Modern Data Warehouse Architecture Broken?

Monte Carlo

APRIL 16, 2022

The data warehouse is the foundation of the modern data stack, so it caught our attention when we saw Convoy head of data Chad Sanderson declare, “the data warehouse is broken” on LinkedIn. Treating data like an API. Immutable data warehouses have challenges too.

Data Warehouse

Data Warehouse Architecture Data Data Architect

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. The past decades of enterprise data platform architectures can be summarized in 69 words. Introduction to Data Mesh. Source: Thoughtworks.

Pharmaceutical

Pharmaceutical Data Lake Data Architecture Architecture

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

With Cloudera’s vision of hybrid data, enterprises adopting an open data lakehouse can easily get application interoperability and portability to and from on-premises environments and any public cloud without worrying about data scaling. Why integrate Apache Iceberg with Cloudera Data Platform?

Data Lake

Data Lake Business Intelligence Metadata Data Warehouse

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake.

Data Lake

Data Lake Data Warehouse BI SQL

Data Catalog - A Broken Promise

Data Engineering Weekly

DECEMBER 29, 2022

Data Catalog as a passive web portal to display metadata requires significant rethinking to adopt modern data workflow, not just adding “modern” in its prefix. I know that is an expensive statement to make 😊. To be fair, I’m a big fan of data catalogs, or metadata management, to be precise.

Metadata

Metadata Data Warehouse ETL Tools Data Workflow

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

In this context, data management in an organization is a key point for the success of its projects involving data. One of the main aspects of correct data management is the definition of a data architecture. The Lakehouse architecture was one of them. show() The history object is a Spark DataFrame.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

This specialist works closely with people on both business and IT sides of a company to understand the current needs of the stakeholders and help them unlock the full potential of data. To get a better understanding of a data architect’s role, let’s clear up what data architecture is.

Data Architect

Data Architect Certification Generalist Big Data

AI at Scale isn’t Magic, it’s Data – Hybrid Data

Cloudera

OCTOBER 11, 2022

But it isn’t just aggregating data for models. Data needs to be prepared and analyzed. Different data types need different types of analytics – real-time, streaming, operational, data warehouses. And that data is likely in clouds, in data centers and at the edge.

Data Science

Data Science Aggregated Data Data Consulting

Zero-ETL, ChatGPT, And The Future of Data Engineering

Towards Data Science

APRIL 3, 2023

As part of this movement, Fivetran and dbt fundamentally altered the data pipeline from ETL to ELT. Hightouch interrupted SaaS eating the world in an attempt to shift the center of gravity to the data warehouse. Other common light transformations done within the ingestion phase are data formatting and deduplication.

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Modern Data Management Essentials: Exploring Data Fabric

Precisely

JULY 18, 2024

Key Takeaways: Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata.

Data Management

Data Management Management Metadata Database-centric

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

The consumption of the data should be supported through an elastic delivery layer that aligns with demand, but also provides the flexibility to present the data in a physical format that aligns with the analytic application, ranging from the more traditional data warehouse view to a graph view in support of relationship analysis.

Data Lake

Data Lake Analytics Application Cloud Storage Architecture

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

As organizations seek greater value from their data, data architectures are evolving to meet the demand — and table formats are no exception. At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files.

Data Lake

Data Lake Metadata Hadoop Data Governance

Data Pipeline Observability: A Model For Data Engineers

Databand.ai

JUNE 28, 2023

This might include processes like data extraction from different sources, data cleansing, data transformation (like aggregation), and loading the data into a database or a data warehouse. They’re betting their business on it and that the data pipelines that run it will continue to work.

Data Pipeline

Data Pipeline Data Engineering Data Engineer Engineering

Data Scientist vs Data Engineer: Differences and Why You Need Both

AltexSoft

OCTOBER 30, 2021

A data engineer’s integral task is building and maintaining data infrastructure: the system managing the flow of data from its source to destination. This typically includes setting up two processes: an ETL pipeline, which moves data, and a data storage (typically, a data warehouse), where it’s kept.

Data Engineering

Data Engineering Data Engineer Engineering Machine Learning

Educating ChatGPT on Data Lakehouse

Cloudera

MARCH 17, 2023

As the use of ChatGPT becomes more prevalent, I frequently encounter customers and data users citing ChatGPT’s responses in their discussions. I love the enthusiasm surrounding ChatGPT and the eagerness to learn about modern data architectures such as data lakehouses, data meshes, and data fabrics.

Education

Education Unstructured Data Data Lake Data Warehouse

Data Lakehouse: Concept, Key Features, and Architecture Layers

AltexSoft

NOVEMBER 10, 2021

The pun being obvious, there’s more to that than just a new term: Data lakehouses combine the best features of both data lakes and data warehouses, and this post will explain this all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: What’s the difference?

Architecture

Architecture Data Lake Data Warehouse Metadata

Data Vault Architecture, Data Quality Challenges, And How To Solve Them

Monte Carlo

FEBRUARY 9, 2023

Over the past several years, data warehouses have evolved dramatically, but that doesn’t mean the fundamentals underpinning sound data architecture need to be thrown out the window. What is a Data Vault model? Pie Insurance, a leading small business insurtech, leverages a data vault 2.0

Architecture

Architecture Raw Data Metadata Data Warehouse

Modernizing Data Warehousing with Snowflake and Hybrid Data Vault

Snowflake

APRIL 5, 2023

Two different data modeling approaches—dimensional data modeling and Data Vault—each have their own pros and cons. Modernizing a data warehouse with Snowflake Data Cloud is a smart investment that can provide significant benefits to businesses of all sizes, today more than ever as data models become ever more complex.

Data Warehouse

Data Warehouse Healthcare Unstructured Data Metadata

Announcing the 2019 Data Impact Awards

Cloudera

MAY 22, 2019

Although the program is technically in its seventh year, as the first joint awards program, this year’s Data Impact Awards will span even more use cases, covering even more advances in IoT, data warehouse, machine learning, and more. DATA SECURITY AND GOVERNANCE.

Pharmaceutical

Pharmaceutical Recruitment Machine Learning Data Architecture

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

is whether to choose a data warehouse or lake to power storage and compute for their analytics. While data warehouses provide structure that makes it easy for data teams to efficiently operationalize data (i.e., Data discovery tools and platforms can help. Image courtesy of Adrian on Unsplash.

Data Lake

Data Lake Data Warehouse Unstructured Data Government

Data Cloud Deployment Framework: Architecture

Cloudyard

MARCH 4, 2023

Snowflake has introduced a new badge, “Data Cloud Deployment Framework,” which covers designing, deploying, and managing the Snowflake landscape. Leverage the MERGE command to get the latest data in the Business layer.
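The MERGE step mentioned in this teaser follows an upsert pattern: rows that match an existing key are updated, and rows with a new key are inserted. The sketch below imitates that behavior in memory rather than in Snowflake SQL. The `merge_latest` function, the `(key, value, version)` row shape, and the rule that a higher version wins are all assumptions made for illustration, not details from the article.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical row shape: (business key, payload, version number).
Row = Tuple[int, str, int]


def merge_latest(target: Dict[int, Row], source: Iterable[Row]) -> Dict[int, Row]:
    """Return a copy of `target` with `source` rows merged in, keeping the newest version per key."""
    merged = dict(target)  # leave the caller's table untouched
    for row in source:
        key, _, version = row
        current = merged.get(key)
        if current is None:
            merged[key] = row          # "when not matched": insert the new key
        elif version > current[2]:
            merged[key] = row          # "when matched" and newer: update in place
        # otherwise the staged row is stale and is ignored
    return merged


business_layer = {1: (1, "old", 1)}
staged_rows = [(1, "new", 2), (2, "first", 1), (1, "stale", 0)]
print(merge_latest(business_layer, staged_rows))
```

Returning a new dictionary instead of mutating the input keeps the example easy to reason about; a real warehouse MERGE modifies the target table directly.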

Architecture

Architecture Cloud Metadata Data Ingestion

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

Instead of relying on traditional hierarchical structures and predefined schemas, as in the case of data warehouses, a data lake utilizes a flat architecture. This structure is made efficient by data engineering practices that include object storage. Data warehouse vs. data lake in a nutshell.

Data Lake

Data Lake Architecture IT Amazon Web Services

Data Lineage Tools: Key Capabilities and 5 Notable Solutions

Databand.ai

JULY 19, 2023

This capability is useful for businesses, as it provides a clear and comprehensive view of their data’s history and transformations. Data lineage tools are not a new concept. In this article: Why Are Data Lineage Tools Important? It provides context for data, making it easier to understand and manage.

Pipeline-centric

Pipeline-centric Data Governance Metadata Government

Moving Past ETL and ELT: Understanding the EtLT Approach

Ascend.io

AUGUST 31, 2023

In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Modern platforms like Redshift, Snowflake, and BigQuery have elevated the data warehouse model.

Data Lake

Data Lake Data Warehouse ETL Tools Data Pipeline

Snowflake Architecture and It's Fundamental Concepts

ProjectPro

JANUARY 31, 2022

As the demand for big data grows, an increasing number of businesses are turning to cloud data warehouses. The cloud is the only platform to handle today's colossal data volumes because of its flexibility and scalability. Launched in 2014, Snowflake is one of the most popular cloud data solutions on the market.

Architecture

Architecture IT Data Warehouse Amazon Web Services

Data Engineering Glossary

Silectis

JANUARY 3, 2021

Big Data Processing: In order to extract value or insights out of big data, one must first process it using big data processing software or frameworks, such as Hadoop. BigQuery: Google’s cloud data warehouse. Data Catalog: An organized inventory of data assets relying on metadata to help with data management.

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

Just Launched: Dremio SQL Query Engine Data Quality Monitoring

Monte Carlo

AUGUST 30, 2024

It’s our goal at Monte Carlo to provide data observability and quality across the enterprise by monitoring every system vital in the delivery of data from source to consumption. We started with popular modern data warehouses and quickly expanded our support as data lakes became data lakehouses.

SQL

SQL Engineering Data Lake High Quality Data

Databricks SQL Analytics Workspace - The Evolution of the Lakehouse

Advancing Analytics: Data Engineering

NOVEMBER 10, 2020

We have discussed in the past this idea of the lakehouse, the aspirational target of many analytics platforms these days of combining the huge power and potential of data lakes with the rigour, reliability and concurrency of a data warehouse. Essentially, we store data twice so that we can achieve the best of both worlds.

SQL

SQL BI Data Warehouse Data Lake

Data Engineering Weekly #114

Data Engineering Weekly

JANUARY 15, 2023

SiliconANGLE theCUBE: Analyst Predictions 2023 - The Future of Data Management. By far one of the best analyses of trends in Data Management. 2023 predictions from the panel are: Unified metadata becomes kingmaker. RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost-effective solution.

Data Engineering

Data Engineering Data Engineer Engineering Metadata

Data Provenance vs. Data Lineage: What’s the Difference?

Monte Carlo

DECEMBER 8, 2023

Enter: data provenance and data lineage. Data lineage is a visual tool that tracks the movement and transformations of data through various systems, processes, and applications. Data provenance is the record of metadata from data’s original sources, providing the historical context and authenticity of data.

Metadata

Metadata Data Data Warehouse Government

5 Helpful Extract & Load Practices for High-Quality Raw Data

Meltano

DECEMBER 7, 2022

ELT is becoming the default choice for data architectures and yet, many best practices focus primarily on “T”: the transformations. But the extract and load phase is where data quality is determined for transformation and beyond. Source system: Metadata about where the data was extracted from.

Raw Data

Raw Data Metadata Data Database

The Symbiotic Relationship Between AI and Data Engineering

Ascend.io

FEBRUARY 28, 2024

This process reduces noise in the data, which is crucial for the effectiveness of AI algorithms, especially in complex predictive models and deep learning applications. Such comprehensive metadata management is crucial in adhering to privacy and compliance standards, safeguarding AI operations against potential legal and ethical pitfalls.

Data Engineering

Data Engineering Data Engineer Engineering Metadata

What’s Next for Data Engineering in 2023? 10 Predictions

Monte Carlo

NOVEMBER 21, 2022

Pro-tip: be sure to check out his talk from IMPACT: The Data Observability Summit. That gives a lot of credence to the idea you can look at Snowflake’s revenues as a proxy for what’s happening in the larger data ecosystem. billion in four years, which underscores the terrific demand there is for cloud data warehouses.

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Interpreting the Gartner Data Observability Market Guide

Monte Carlo

AUGUST 13, 2024

This year data observability skyrocketed to the top of Gartner’s Hype Cycles. According to Gartner, 50% of enterprise companies implementing distributed data architectures will have adopted data observability tools by 2026, up from just ~20% in 2024. Image courtesy of Gartner.

Data

Data Data Warehouse Data Pipeline Data Architecture
