Architecture, Data Lake and Structured Data

Architecture

Data Lake

Structured Data

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Metadata Cloud Storage Data Warehouse

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the technology underpinning. In the beginning, there was a data warehouse The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.

Data Integration

Data Integration Hadoop Data Warehouse Data Lake

Join 37,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

Waitingforcode

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Each of these architectures has its own unique strengths and tradeoffs.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

The alternative, however, provides more multi-cloud flexibility and strong performance on structured data. It incorporates elements from several Microsoft products working together, like Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience.

BI Pipeline-centric Data Lake Google Cloud

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

In this blog, I will demonstrate the value of Cloudera DataFlow (CDF) , the edge-to-cloud streaming data platform available on the Cloudera Data Platform (CDP) , as a Data integration and Democratization fabric. Introduction to the Data Mesh Architecture and its Required Capabilities.

Architecture

Architecture Metadata Kafka Government

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics — a data lake. The term was coined by James Dixon , Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake

Data Lake Architecture IT Amazon Web Services

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

NOVEMBER 7, 2024

While data warehouses are still in use, they are limited in use-cases as they only support structured data. Data lakes add support for semi-structured and unstructured data, and data lakehouses add further flexibility with better governance in a true hybrid solution built from the ground-up.

Data Lake

Data Lake Data Warehouse Business Intelligence Unstructured Data

8 Essential Data Pipeline Design Patterns You Should Know

Monte Carlo

NOVEMBER 21, 2024

Whether it’s customer transactions, IoT sensor readings, or just an endless stream of social media hot takes, you need a reliable way to get that data from point A to point B while doing something clever with it along the way. That’s where data pipeline design patterns come in. Lambda Architecture Pattern 4.

Data Pipeline

Data Pipeline Designing Lambda Architecture Kafka

Data Lake vs. Data Warehouse: Differences and Similarities

U-Next

SEPTEMBER 7, 2022

The terms “ Data Warehouse ” and “ Data Lake ” may have confused you, and you have some questions. Structuring data refers to converting unstructured data into tables and defining data types and relationships based on a schema. What is Data Lake? .

Data Lake

Data Lake Data Warehouse Unstructured Data Amazon Web Services

Data Lake vs Data Warehouse - Working Together in the Cloud

ProjectPro

AUGUST 11, 2021

“Data Lake vs Data Warehouse = Load First, Think Later vs Think First, Load Later” The terms data lake and data warehouse are frequently stumbled upon when it comes to storing large volumes of data. Data Warehouse Architecture What is a Data lake?

Data Lake

Data Lake Data Warehouse Cloud Hadoop

Top Data Lake Vendors (Quick Reference Guide)

Monte Carlo

APRIL 24, 2023

Data lakes are useful, flexible data storage repositories that enable many types of data to be stored in its rawest state. Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption.

Data Lake

Data Lake Google Cloud Data Warehouse AWS

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Key Differences Between AI Data Engineers and Traditional Data Engineers While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts. Data Storage Solutions As we all know, data can be stored in a variety of ways.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

Data teams need to balance the need for robust, powerful data platforms with increasing scrutiny on costs. That’s why it’s essential for teams to choose the right architecture for the storage layer of their data stack. But, the options for data storage are evolving quickly.

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Benjamin Kennedy, Cloud Solutions Architect at Striim, emphasizes the outcome-driven nature of data pipelines.

Data Pipeline

Data Pipeline Designing Data Lake Data Warehouse

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Data discovery tools and platforms can help.

Data Lake

Data Lake Data Warehouse Unstructured Data Government

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

In this context, data management in an organization is a key point for the success of its projects involving data. One of the main aspects of correct data management is the definition of a data architecture. What is Delta Lake? The data became useless. The Lakehouse architecture was one of them.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Data Lakehouse: Concept, Key Features, and Architecture Layers

AltexSoft

NOVEMBER 10, 2021

The pun being obvious, there’s more to that than just a new term: Data lakehouses combine the best features of both data lakes and data warehouses and this post will explain this all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: What’s the difference.

Architecture

Architecture Data Lake Data Warehouse Metadata

A Prequel to Data Mesh

Towards Data Science

JANUARY 16, 2024

My personal take on justifying the existence of Data Mesh A senior stakeholder at one my projects mentioned that they wanted to decentralise their data platform architecture and democratise data across the organisation. When I heard the words ‘decentralised data architecture’, I was left utterly confused at first!

Data Warehouse

Data Warehouse Data Architecture Relational Database NoSQL

Building A Better Data Warehouse For The Cloud At Firebolt

Data Engineering Podcast

AUGUST 31, 2020

Summary Data warehouse technology has been around for decades and has gone through several generational shifts in that time. The current trends in data warehousing are oriented around cloud native architectures that take advantage of dynamic scaling and the separation of compute and storage.

Data Warehouse

Data Warehouse Cloud Building Data Lake

Data Modeling That Evolves With Your Business Using Data Vault

Data Engineering Podcast

FEBRUARY 9, 2020

We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Is there any utility in data vault modeling in a data lake context (S3, Hadoop, etc.)?

Data Lake

Data Lake Data Warehouse Hadoop NoSQL

How to Design a Modern, Robust Data Ingestion Architecture

Monte Carlo

MAY 28, 2024

A data ingestion architecture is the technical blueprint that ensures that every pulse of your organization’s data ecosystem brings critical information to where it’s needed most. Data Transformation : Clean, format, and convert extracted data to ensure consistency and usability for both batch and real-time processing.

Data Ingestion

Data Ingestion Architecture Designing Hadoop

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

AltexSoft

SEPTEMBER 23, 2021

One of the innovative ways to address this problem is to build a data hub — a platform that unites all your information sources under a single umbrella. This article explains the main concepts of a data hub, its architecture, and how it differs from data warehouses and data lakes. What is Data Hub?

Architecture

Architecture Data Lake Unstructured Data Data Warehouse

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing: Table of Contents What is data pipeline architecture? Why is data pipeline architecture important? What is data pipeline architecture? Why is data pipeline architecture important?

Data Pipeline

Data Pipeline Architecture Data Lake Data Warehouse

Data Mesh Architecture: Concept, Main Principles, and Implementation

AltexSoft

JULY 19, 2022

In the last few decades, we’ve seen a lot of architectural approaches to building data pipelines , changing one another and promising better and easier ways of deriving insights from information. There have been relational databases, data warehouses, data lakes, and even a combination of the latter two.

Architecture

Architecture Data Lake Medical Datasets

Data Pipeline- Definition, Architecture, Examples, and Use Cases

ProjectPro

DECEMBER 7, 2021

Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. As data is expanding exponentially, organizations struggle to harness digital information's power for different business use cases. What is a Big Data Pipeline?

Data Pipeline

Data Pipeline Architecture Kafka AWS

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

AltexSoft

AUGUST 22, 2022

In this post, we’ll attempt to explain the idea behind a data fabric, its architectural building blocks, the benefits it brings, and ways to approach its implementation. What is data fabric? to provide a unified view of all enterprise data. Data fabric architecture example. Data fabric vs data mesh.

Architecture

Architecture Metadata Data Lake Machine Learning

Snowflake Architecture and It's Fundamental Concepts

ProjectPro

JANUARY 31, 2022

This blog walks you through what does Snowflake do , the various features it offers, the Snowflake architecture, and so much more. Table of Contents Snowflake Overview and Architecture What is Snowflake Data Warehouse? Snowflake is not based on existing database systems or big data software platforms like Hadoop.

Architecture

Architecture IT Data Warehouse Amazon Web Services

Three Reference Architectures for Real-Time Analytics On Streaming Data

Rockset

APRIL 26, 2023

We’ve noticed many common patterns across streaming data architectures and we’ll be sharing a blueprint for three of the most popular: anomaly detection, IoT, and recommendations. The majority of anomaly detectors require streaming data, real-time data and historical data in order to generate inferences.

Architecture

Architecture Transportation Data Lake Insurance

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Data Engineering Podcast

JUNE 17, 2021

Summary Working with unstructured data has typically been a motivation for a data lake. Kirk Marple has spent years working with data systems and the media industry, which inspired him to build a platform for automatically organizing your unstructured assets to make them more valuable.

Unstructured Data

Unstructured Data Data Warehouse Metadata Media

Five Strategies to Accelerate Data Product Development

Cloudera

JULY 26, 2021

The alleviation of infrastructure and computational constraints associated with solely on-premises data platforms; Data Products can now use different deployment models (e.g., The proliferation of real-time processing by deploying event-driven architectures (e.g., data warehousing). Deep Java Learning, Apache Spark 3.x,

Generalist

Generalist Telecommunication Healthcare Data Science

Migrate Hive data from CDH to CDP public cloud

Cloudera

JUNE 25, 2021

Using easy-to-define policies, Replication Manager solves one of the biggest barriers for the customers in their cloud adoption journey by allowing them to move both tables/structured data and files/unstructured data to the CDP cloud of their choice easily. CDP Data Lake cluster versions – CM 7.4.0,

Cloud

Cloud Data Lake Cloud Storage Metadata

Hadoop Ecosystem Components and Its Architecture

ProjectPro

JUNE 4, 2015

The holistic view of Hadoop architecture gives prominence to Hadoop common, Hadoop YARN, Hadoop Distributed File Systems (HDFS ) and Hadoop MapReduce of the Hadoop Ecosystem. HDFS in Hadoop architecture provides high throughput access to application data and Hadoop MapReduce provides YARN based parallel processing of large data sets.

Hadoop

Hadoop Architecture IT Java

Snowflake Data Warehouse 101: A Comprehensive Guide

Hevo

MAY 17, 2024

Snowflake Data Warehouse delivers essential infrastructure for handling a Data Lake, and Data Warehouse needs. It can store semi-structured and structured data in one place due to its multi-clusters architecture that allows users to independently query data using SQL.

Data Warehouse

Data Warehouse Data Lake Structured Data SQL

Chose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

First, organizations have a tough time getting their arms around their data. More data is generated in ever wider varieties and in ever more locations. Organizations don’t know what they have anymore and so can’t fully capitalize on it — the majority of data generated goes unused in decision making. Unified data fabric.

Unstructured Data

Unstructured Data Data Lake Data Architecture Data

Azure Synapse vs Databricks: 2023 Comparison Guide

Knowledge Hut

SEPTEMBER 26, 2023

Key connectivity features include: Data Ingestion: Databricks supports data ingestion from a variety of sources, including data lakes, databases, streaming platforms, and cloud storage. This flexibility allows organizations to ingest data from virtually anywhere.

Data Lake

Data Lake Database-centric Pipeline-centric Machine Learning

Data Marts: What They Are and Why Businesses Need Them

AltexSoft

AUGUST 4, 2021

Since data marts provide analytical capabilities for a restricted area of a data warehouse, they offer isolated security and isolated performance. Data mart vs data warehouse vs data lake vs OLAP cube. Data lakes, data warehouses, and data marts are all data repositories of different sizes.

Data Lake

Data Lake Data Warehouse ETL Tools Database

The Good and the Bad of Databricks Lakehouse Platform

AltexSoft

MARCH 30, 2023

It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake used to host large amounts of raw data. The relatively new storage architecture powering Databricks is called a data lakehouse. Databricks lakehouse platform architecture.

Scala

Scala Data Lake Machine Learning BI

Unstructured Data: Examples, Tools, Techniques, and Best Practices

AltexSoft

MAY 12, 2023

What is unstructured data? Definition and examples Unstructured data , in its simplest form, refers to any data that does not have a pre-defined structure or organization. It can come in different forms, such as text documents, emails, images, videos, social media posts, sensor data, etc.

Unstructured Data

Unstructured Data NoSQL Hadoop Data Lake

Data Vault on Snowflake: Feature Engineering and Business Vault

Snowflake

MARCH 30, 2023

Based on Tecton blog So is this similar to data engineering pipelines into a data lake/warehouse? Schema drift on a wide table structure needs an ALTER TABLE statement, whereas the tall table structure does not. The friction of data movement is reduced. Yes, feature stores are part of the MLOps discipline.

Engineering

Engineering Raw Data Data Science Machine Learning

Taking Charge of Tables: Introducing OpenHouse for Big Data Management

LinkedIn Engineering

JULY 19, 2023

Open source data lakehouse deployments are built on the foundations of compute engines (like Apache Spark, Trino, Apache Flink), distributed storage (HDFS, cloud blob stores), and metadata catalogs / table formats (like Apache Iceberg, Delta, Hudi, Apache Hive Metastore).

Big Data

Big Data Data Management Management Metadata

Moving Past ETL and ELT: Understanding the EtLT Approach

Ascend.io

AUGUST 31, 2023

In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Central to this transformation are two shifts. Let’s take a closer look.

Data Lake

Data Lake Data Warehouse ETL Tools Data Pipeline

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

AltexSoft

MARCH 14, 2023

Known as the Modern Data Stack (MDS) , this suite of tools and technologies has transformed how businesses approach data management and analysis. What is a modern data stack? A data stack, in turn, focuses on data : It helps businesses manage data and make the most out of it. Modern data stack architecture.

IT Data Warehouse Data Governance Data Lake

The Future of Data Warehousing

Monte Carlo

JANUARY 16, 2024

In this blog post, we’ll look at six innovations that are shaping the future of the data warehousing, as well as challenges and considerations that organizations should keep in mind. Data lake and data warehouse convergence 2. Easier to stream real-time data 3. Zero-copy data sharing 4.

Data Lake

Data Lake Data Warehouse Unstructured Data AWS

How to Build a 5-Layer Data Stack

Monte Carlo

JULY 19, 2023

Before you can model the data for your stakeholders, you need a place to collect and store it. In simple terms, data transformation refers to the process of converting all that data from a variety of disparate formats into something consistent and useful for modeling. Image courtesy of Databricks.

Building

Building Business Intelligence Cloud Storage BI

How Apache Iceberg Is Changing the Face of Data Lakes

Data Integrity for AI: What’s Old is New Again

Webinars

Trending Sources

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Webinars

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

Data Lake vs. Data Warehouse vs. Data Lakehouse

8 Essential Data Pipeline Design Patterns You Should Know

Data Lake vs. Data Warehouse: Differences and Similarities

Data Lake vs Data Warehouse - Working Together in the Cloud

Top Data Lake Vendors (Quick Reference Guide)

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

A Guide to Data Pipelines (And How to Design One From Scratch)

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Hands-On Introduction to Delta Lake with (py)Spark

Data Lakehouse: Concept, Key Features, and Architecture Layers

A Prequel to Data Mesh

Building A Better Data Warehouse For The Cloud At Firebolt

Data Modeling That Evolves With Your Business Using Data Vault

How to Design a Modern, Robust Data Ingestion Architecture

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Data Mesh Architecture: Concept, Main Principles, and Implementation

Data Pipeline- Definition, Architecture, Examples, and Use Cases

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

Snowflake Architecture and It's Fundamental Concepts

Three Reference Architectures for Real-Time Analytics On Streaming Data

Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk

Five Strategies to Accelerate Data Product Development

Migrate Hive data from CDH to CDP public cloud

Hadoop Ecosystem Components and Its Architecture

Snowflake Data Warehouse 101: A Comprehensive Guide

Chose Both: Data Fabric and Data Lakehouse

Azure Synapse vs Databricks: 2023 Comparison Guide

Data Marts: What They Are and Why Businesses Need Them

The Good and the Bad of Databricks Lakehouse Platform

Unstructured Data: Examples, Tools, Techniques, and Best Practices

Data Vault on Snowflake: Feature Engineering and Business Vault

Taking Charge of Tables: Introducing OpenHouse for Big Data Management

Moving Past ETL and ELT: Understanding the EtLT Approach

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

The Future of Data Warehousing

How to Build a 5-Layer Data Stack

Stay Connected