Data Warehouse, Metadata and Structured Data

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Cloud Storage Metadata Data Warehouse

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. These patterns include both centralized storage patterns like data warehouse , data lake and data lakehouse , and distributed patterns such as data mesh.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Snowflake

DECEMBER 4, 2024

The trend to centralize data will accelerate, making sure that data is high-quality, accurate and well managed. Overall, data must be easily accessible to AI systems, with clear metadata management and a focus on relevance and timeliness.

Unstructured Data

Unstructured Data Data Lake Deep Learning Structured Data

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Netflix Tech

OCTOBER 27, 2020

Usually Data scientists and engineers write Extract-Transform-Load (ETL) jobs and pipelines using big data compute technologies, like Spark or Presto , to process this data and periodically compute key information for a member or a video. The processed data is typically stored as data warehouse tables in AWS S3.

Data Warehouse

Data Warehouse Datasets Data Big Data

Key considerations when making a decision on a Cloud Data Warehouse

Cloudera

MAY 17, 2021

Making a decision on a cloud data warehouse is a big deal. Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform.

Data Warehouse

Data Warehouse Cloud Government Metadata

Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses

Data Engineering Weekly

MARCH 5, 2025

This ecosystem includes: Catalogs: Services that manage metadata about Iceberg tables (e.g., Compute Engines: Tools that query and process data stored in Iceberg tables (e.g., Maintenance Processes: Operations that optimize Iceberg tables, such as compacting small files and managing metadata. Trino, Spark, Snowflake, DuckDB).

Hadoop

Hadoop Metadata Data Ingestion Data Governance

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

NOVEMBER 7, 2024

Data volume and velocity, governance, structure, and regulatory requirements have all evolved and continue to. Despite these limitations, data warehouses, introduced in the late 1980s based on ideas developed even earlier, remain in widespread use today for certain business intelligence and data analysis applications.

Data Lake

Data Lake Data Warehouse Business Intelligence Unstructured Data

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Data Engineering Podcast

JUNE 17, 2021

In this episode he shares the goals of the Unstruk Data Warehouse, how it is architected to extract asset metadata and build a searchable knowledge graph from the information, and the myriad ways that the system can be used. Hightouch is the easiest way to sync data into the platforms that your business teams rely on.

Unstructured Data

Unstructured Data Data Warehouse Metadata Media

Data Lakes vs. Data Warehouses

Grouparoo

JANUARY 11, 2022

This article looks at the options available for storing and processing big data, which is too large for conventional databases to handle. There are two main options available, a data lake and a data warehouse. What is a Data Warehouse? What is a Data Lake?

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

Different vendors offering data warehouses, data lakes, and now data lakehouses all offer their own distinct advantages and disadvantages for data teams to consider. So let’s get to the bottom of the big question: what kind of data storage layer will provide the strongest foundation for your data platform?

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

AUGUST 11, 2021

“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later” The terms data lake and data warehouse are frequently stumbled upon when it comes to storing large volumes of data. Data Warehouse Architecture What is a Data lake?

Data Lake

Data Lake Data Warehouse Cloud Hadoop

Implementing Data Contracts in the Data Warehouse

Monte Carlo

JANUARY 25, 2023

In this article, Chad Sanderson , Head of Product, Data Platform , at Convoy and creator of Data Quality Camp , introduces a new application of data contracts: in your data warehouse. In the last couple of posts , I’ve focused on implementing data contracts in production services.

Data Warehouse

Data Warehouse Data High Quality Data Metadata

How to get powerful and actionable insights from any and all of your data, without delay

Cloudera

SEPTEMBER 17, 2020

To address this, they focused on creating an experimentation-oriented culture, enabled thanks to a cloud-native platform supporting the full data lifecycle. This platform, including an ad-hoc capable data warehouse service with built-in, easy-to-use visualization, made it easy for anyone to jump in and start experimenting.

Data Warehouse

Data Warehouse Unstructured Data Pharmaceutical MySQL

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

Cloudera

APRIL 1, 2024

We scored the highest in hybrid, intercloud, and multi-cloud capabilities because we are the only vendor in the market with a true hybrid data platform that can run on any cloud including private cloud to deliver a seamless, unified experience for all data, wherever it lies.

Cloud

Cloud Unstructured Data Metadata Government

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

SEPTEMBER 19, 2023

Want to learn more about data governance? Check out our Data Governance on Snowflake blog! Metadata Management Data modeling methodologies help in managing metadata within the data lake. Metadata describes the characteristics, attributes, and context of the data.

Data Lake

Data Lake Process Metadata Data Warehouse

When to Build vs. Buy Your Data Warehouse (5 Key Factors)

Monte Carlo

JANUARY 25, 2023

When it comes to the question of building or buying your data stack, there’s never a one-size-fits-all solution for every data team—or every component of your data stack. Data storage and compute are very much the foundation of your data platform. Let’s jump in! So, let’s take a look at each in a bit more detail.

Data Warehouse

Data Warehouse Building Data Lake Data Storage

Data Lakehouse: Concept, Key Features, and Architecture Layers

AltexSoft

NOVEMBER 10, 2021

The pun being obvious, there’s more to that than just a new term: Data lakehouses combine the best features of both data lakes and data warehouses and this post will explain this all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: What’s the difference.

Architecture

Architecture Data Lake Data Warehouse Metadata

A Major Step Forward For Generative AI and Vector Database Observability

Monte Carlo

FEBRUARY 12, 2024

To differentiate and expand the usefulness of these models, organizations must augment them with first-party data – typically via a process called RAG (retrieval augmented generation). Today, this first-party data mostly lives in two types of data repositories.

Database

Database Unstructured Data Data Pipeline Metadata

Data Vault on Snowflake: Feature Engineering and Business Vault

Snowflake

MARCH 30, 2023

A 2016 data science report from data enrichment platform CrowdFlower found that data scientists spend around 80% of their time in data preparation (collecting, cleaning, and organizing of data) before they can even begin to build machine learning (ML) models to deliver business value. Enter Snowpark !

Engineering

Engineering Raw Data Data Science Machine Learning

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

Before going into further details on Delta Lake, we need to remember the concept of Data Lake, so let’s travel through some history. The main player in the context of the first data lakes was Hadoop, a distributed file system, with MapReduce, a processing paradigm built over the idea of minimal data movement and high parallelism.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Implementing the Netflix Media Database

Netflix Tech

DECEMBER 14, 2018

A fundamental requirement for any lasting data system is that it should scale along with the growth of the business applications it wishes to serve. NMDB is built to be a highly scalable, multi-tenant, media metadata system that can serve a high volume of write/read throughput as well as support near real-time queries.

Media

Media Database Metadata Data Schemas

Mastering the Art of ETL on AWS for Data Management

ProjectPro

FEBRUARY 16, 2023

With so much riding on the efficiency of ETL processes for data engineering teams, it is essential to take a deep dive into the complex world of ETL on AWS to take your data management to the next level. Data integration with ETL has changed in the last three decades. But cloud computing is preferred over the other.

AWS

AWS Data Management ETL Tools Management

Accelerate your Data Migration to Snowflake

RandomTrees

SEPTEMBER 6, 2020

Snowflake Overview A data warehouse is a critical part of any business organization. Lot of cloud-based data warehouses are available in the market today, out of which let us focus on Snowflake. Snowflake is an analytical data warehouse that is provided as Software-as-a-Service (SaaS).

Cloud Storage

Cloud Storage Data Ingestion Data Cleanse Data Warehouse

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

Instead of relying on traditional hierarchical structures and predefined schemas, as in the case of data warehouses, a data lake utilizes a flat architecture. This structure is made efficient by data engineering practices that include object storage. Data warehouse vs. data lake in a nutshell.

Data Lake

Data Lake Architecture IT Amazon Web Services

Top Data Lake Vendors (Quick Reference Guide)

Monte Carlo

APRIL 24, 2023

Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption. Databricks Data Catalog and AWS Lake Formation are examples in this vein. See our post: Data Lakes vs. Data Warehouses.

Data Lake

Data Lake Google Cloud Data Warehouse AWS

Empower Your Cyber Defenders with Real-Time Analytics

Cloudera

NOVEMBER 15, 2024

A Better Way Forward: Cloudera’s Open Data Lakehouse Cloudera offers a solution to these challenges with its open data lakehouse, which combines the flexibility and scalability of data lake storage with data warehouse functionality to unify and simplify the management of cyber log data.

Metadata

Metadata Unstructured Data Data Lake Government

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

is whether to choose a data warehouse or lake to power storage and compute for their analytics. While data warehouses provide structure that makes it easy for data teams to efficiently operationalize data (i.e., Data discovery tools and platforms can help. Image courtesy of Adrian on Unsplash.

Data Lake

Data Lake Data Warehouse Unstructured Data Government

A Definitive Guide to Using BigQuery Efficiently

Towards Data Science

MARCH 5, 2024

At its core, BigQuery is a serverless Data Warehouse for analytical purposes and built-in features like Machine Learning ( BigQuery ML ). The storage system is using Capacitor, a proprietary columnar storage format by Google for semi-structured data and the file system underneath is Colossus, the distributed file system by Google.

Bytes

Bytes Google Cloud Cloud Storage Utilities

Snowflake Architecture and It's Fundamental Concepts

ProjectPro

JANUARY 31, 2022

As the demand for big data grows, an increasing number of businesses are turning to cloud data warehouses. The cloud is the only platform to handle today's colossal data volumes because of its flexibility and scalability. Launched in 2014, Snowflake is one of the most popular cloud data solutions on the market.

Architecture

Architecture IT Data Warehouse Amazon Web Services

Data Engineering Zoomcamp – Data Ingestion (Week 2)

Hepta Analytics

FEBRUARY 14, 2022

This week, we got to think about our data ingestion design. We looked at the following: How do we ingest – ETL vs ELT Where do we store the data – Data lake vs data warehouse Which tool to we use to ingest – cronjob vs workflow engine NOTE : This weeks task requires good internet speed and good compute.

Data Ingestion

Data Ingestion Data Engineering Data Engineer Engineering

Moving Past ETL and ELT: Understanding the EtLT Approach

Ascend.io

AUGUST 31, 2023

Secondly , the rise of data lakes that catalyzed the transition from ELT to ELT and paved the way for niche paradigms such as Reverse ETL and Zero-ETL. Still, these methods have been overshadowed by EtLT — the predominant approach reshaping today’s data landscape. Read More: What is ETL?

Data Lake

Data Lake Data Warehouse ETL Tools Data Pipeline

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

AltexSoft

SEPTEMBER 23, 2021

One of the innovative ways to address this problem is to build a data hub — a platform that unites all your information sources under a single umbrella. This article explains the main concepts of a data hub, its architecture, and how it differs from data warehouses and data lakes. What is Data Hub?

Architecture

Architecture Data Lake Unstructured Data Data Warehouse

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

AltexSoft

AUGUST 22, 2022

A data fabric is an architecture design presented as an integration and orchestration layer built on top of multiple disjointed data sources like relational databases , data warehouses , data lakes, data marts , IoT , legacy systems, etc., to provide a unified view of all enterprise data.

Architecture

Architecture Metadata Data Lake Machine Learning

A Guide to Data Contracts

Striim

JANUARY 4, 2023

Companies need to analyze large volumes of datasets, leading to an increase in data producers and consumers within their IT infrastructures. These companies collect data from production applications and B2B SaaS tools (e.g., This data makes its way into a data repository, like a data warehouse (e.g.,

PostgreSQL

PostgreSQL Data Warehouse Data Data Lake

The Symbiotic Relationship Between AI and Data Engineering

Ascend.io

FEBRUARY 28, 2024

Read More: AI Data Platform: Key Requirements for Fueling AI Initiatives How Data Engineering Enables AI Data engineering is the backbone of AI’s potential to transform industries , offering the essential infrastructure that powers AI algorithms.

Data Engineering

Data Engineering Data Engineer Engineering Metadata

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

AltexSoft

MARCH 14, 2023

As the volume and complexity of data continue to grow, organizations seek faster, more efficient, and cost-effective ways to manage and analyze data. In recent years, cloud-based data warehouses have revolutionized data processing with their advanced massively parallel processing (MPP) capabilities and SQL support.

IT

IT Data Warehouse Data Governance Data Lake

Ensuring Data Transformation Quality with dbt Core

Wayne Yaddow

MARCH 14, 2025

Workaround: Implement custom metadata tracking scripts or use dbt Clouds freshness monitoring. Testing Limitations: Both dbt Cloud and dbtCore dbt is designed for SQL-based transformations in data warehouses, meaning it is not well-suited for non-SQL, real-time, or highly complex unstructured data transformations.

Unstructured Data

Unstructured Data SQL Data Pipeline Data Validation

Data Engineering Glossary

Silectis

JANUARY 3, 2021

Big Data Processing In order to extract value or insights out of big data, one must first process it using big data processing software or frameworks, such as Hadoop. Big Query Google’s cloud data warehouse. Data Catalog An organized inventory of data assets relying on metadata to help with data management.

Data Engineering

Data Engineering Data Engineer Engineering Hadoop

The Good and the Bad of Databricks Lakehouse Platform

AltexSoft

MARCH 30, 2023

What is Databricks Databricks is an analytics platform with a unified set of tools for data engineering, data management , data science, and machine learning. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake used to host large amounts of raw data.

Scala

Scala Data Lake Machine Learning BI

Data Collection for Machine Learning: Steps, Methods, and Best Practices

AltexSoft

JUNE 26, 2023

From the perspective of data science, all miscellaneous forms of data fall into three large groups: structured, semi-structured, and unstructured. Key differences between structured, semi-structured, and unstructured data. Note, though, that not any type of web scraping is legal.

Data Collection

Data Collection Machine Learning Unstructured Data Non-relational Database

An Engineering Guide to Data Creation - A Data Contract perspective - Part 1

Data Engineering Weekly

MARCH 24, 2023

About Schemata Schemata is a schema modeling framework for decentralized domain-driven ownership of data. Schemata combine a set of standard metadata definitions for each schema & data field and a scoring algorithm to provide a feedback loop on how efficient the data modeling of your data warehouse is.

Engineering

Engineering Data Transportation Database

20 Latest AWS Glue Interview Questions and Answers for 2023

ProjectPro

JANUARY 24, 2023

You can leverage AWS Glue to discover, transform, and prepare your data for analytics. In addition to databases running on AWS, Glue can automatically find structured and semi-structured data kept in your data lake on Amazon S3, data warehouse on Amazon Redshift, and other storage locations.

AWS

AWS Data Lake ETL Tools Scala

Difference between Pig and Hive-The Two Key Components of Hadoop Ecosystem

ProjectPro

OCTOBER 15, 2014

Hive- Performance Benchmarking Hive vs Pig Pig vs Hive - Differences Pig Hive Procedural Data Flow Language Declarative SQLish Language For Programming For creating reports Mainly used by Researchers and Programmers Mainly used by Data Analysts Operates on the client side of a cluster. Does not have a dedicated metadata database.

Hadoop

Hadoop Java Unstructured Data SQL

Three Reference Architectures for Real-Time Analytics On Streaming Data

Rockset

APRIL 26, 2023

Offline feature store : Detecting anomalies requires historical data in order to have a baseline for comparisons. This data tends to be slow changing and is stored in an offline feature store. This could be a cloud data warehouse, a data lake, or a database. The database has two primary jobs.

Architecture

Architecture Transportation Data Lake Insurance

How Apache Iceberg Is Changing the Face of Data Lakes

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Trending Sources

AI and Data Predictions 2025: Strategies to Realize the Promise of AI

Bulldozer: Batch Data Moving from Data Warehouse to Online Key-Value Stores

Key considerations when making a decision on a Cloud Data Warehouse

Is Apache Iceberg the New Hadoop? Navigating the Complexities of Modern Data Lakehouses

Data Lake vs. Data Warehouse vs. Data Lakehouse

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Data Lakes vs. Data Warehouses

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Data Lake vs Data Warehouse - Working Together in the Cloud

Implementing Data Contracts in the Data Warehouse

How to get powerful and actionable insights from any and all of your data, without delay

Cloudera Named a Visionary in the Gartner MQ for Cloud DBMS

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

When to Build vs. Buy Your Data Warehouse (5 Key Factors)

Data Lakehouse: Concept, Key Features, and Architecture Layers

A Major Step Forward For Generative AI and Vector Database Observability

Data Vault on Snowflake: Feature Engineering and Business Vault

Hands-On Introduction to Delta Lake with (py)Spark

Implementing the Netflix Media Database

Mastering the Art of ETL on AWS for Data Management

Accelerate your Data Migration to Snowflake

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

Top Data Lake Vendors (Quick Reference Guide)

Empower Your Cyber Defenders with Real-Time Analytics

5 Reasons Data Discovery Platforms Are Best For Data Lakes

A Definitive Guide to Using BigQuery Efficiently

Snowflake Architecture and It's Fundamental Concepts

Data Engineering Zoomcamp – Data Ingestion (Week 2)

Moving Past ETL and ELT: Understanding the EtLT Approach

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

A Guide to Data Contracts

The Symbiotic Relationship Between AI and Data Engineering

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

Ensuring Data Transformation Quality with dbt Core

Data Engineering Glossary

The Good and the Bad of Databricks Lakehouse Platform

Data Collection for Machine Learning: Steps, Methods, and Best Practices

An Engineering Guide to Data Creation - A Data Contract perspective - Part 1

20 Latest AWS Glue Interview Questions and Answers for 2023

Difference between Pig and Hive-The Two Key Components of Hadoop Ecosystem

Three Reference Architectures for Real-Time Analytics On Streaming Data

Stay Connected