Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Now you don't have to choose. This is why Snowflake is fully embracing the open table format.
Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Data lakes are notoriously complex. To start, can you share your definition of what constitutes a "Data Lakehouse"?
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Though basic and easy to use, traditional table storage formats struggle to keep up. Adopting an Open Table Format architecture is becoming indispensable for modern data systems.
Shared Data Experience (SDX) on Cloudera Data Platform (CDP) enables centralized data access control and audit for workloads in the Enterprise Data Cloud. The public cloud (CDP-PC) editions default to using cloud storage (S3 for AWS, ADLS Gen2 for Azure).
It incorporates elements from several Microsoft products working together, like Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience. In contrast to conventional warehouses, it keeps computation and storage apart, allowing for cost-effectiveness and dynamic scaling.
Apache Iceberg's ecosystem of diverse adopters, contributors and commercial support continues to grow, establishing itself as the industry-standard table format for an open data lakehouse architecture. Are you using Snowflake on AWS and already using Glue Data Catalog for your data lake?
Data lakes are useful, flexible data storage repositories that enable many types of data to be stored in their rawest state. Traditionally, after being stored in a data lake, raw data was often moved to various destinations, such as a data warehouse, for further processing, analysis, and consumption.
The terms "Data Warehouse" and "Data Lake" may have confused you, and you may have some questions. Structuring data refers to converting unstructured data into tables and defining data types and relationships based on a schema. What is a Data Lake? Athena on AWS.
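As a toy illustration of that structuring step, the sketch below converts raw delimited records into typed rows with an explicit schema. The field names, delimiter, and sample values are all hypothetical, not taken from the article.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical raw, semi-structured records as they might land in a data lake.
raw_events = [
    "2024-03-01T10:15:00|user_42|purchase|19.99",
    "2024-03-01T10:16:30|user_7|refund|5.00",
]

@dataclass
class Event:
    """A structured row: each field has a declared type, i.e. a schema."""
    ts: datetime
    user_id: str
    action: str
    amount: float

def structure(line: str) -> Event:
    # "Structuring" = parsing raw text into typed columns per the schema.
    ts, user_id, action, amount = line.split("|")
    return Event(datetime.fromisoformat(ts), user_id, action, float(amount))

events = [structure(line) for line in raw_events]
```

Once records carry types like this, warehouse-style operations (filtering by timestamp, summing amounts) become straightforward.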
By separating the compute, the metadata, and the data storage, CDW dynamically adapts to changing workloads and resource requirements, speeding up deployment while effectively managing costs and preserving a shared access and governance model. Architecture overview: separate storage, separate compute.
Acryl Data provides DataHub as an easy-to-consume SaaS product, which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl. RudderStack helps you build a customer data platform on your warehouse or data lake. What are the mechanisms that you use for categorizing data assets?
Today we want to introduce Fivetran's support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and learn how it fits into the modern data architecture as a whole. Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with the Apache Iceberg data lake format.
An open-source implementation of a data lake with DuckDB and AWS Lambdas. A duck in the cloud (photo by László Glatz on Unsplash). In this post we will show how to build a simple end-to-end application in the cloud on serverless infrastructure. The idea is to start from a data lake where our data are stored.
"Data lake vs. data warehouse = load first, think later vs. think first, load later." The terms data lake and data warehouse are frequently encountered when it comes to storing large volumes of data. Data Warehouse Architecture. What is a Data Lake?
With the addition of Google Cloud, we deliver on our vision of providing a hybrid and multi-cloud architecture to support our customers' analytics needs regardless of deployment platform. You could then use an existing pipeline to run analytics on the prepared data in BigQuery.
Data teams need to balance the need for robust, powerful data platforms with increasing scrutiny on costs. That's why it's essential for teams to choose the right architecture for the storage layer of their data stack. But the options for data storage are evolving quickly.
Modern data platforms deliver an elastic, flexible, and cost-effective environment for analytic applications by leveraging a hybrid, multi-cloud architecture to support data fabric, data mesh, data lakehouse and, most recently, data observability. Luke: What is a modern data platform?
This blog post outlines detailed step-by-step instructions to perform Hive replication from an on-prem CDH cluster to a CDP Public Cloud Data Lake. CDP Data Lake cluster versions: CM 7.4.0. Architecture. Pre-check: Data Lake cluster. Understanding Ranger policies in the Data Lake cluster.
Summary: Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. How do you approach project governance and sustainability?
Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. As data is expanding exponentially, organizations struggle to harness digital information's power for different business use cases. What is a Big Data Pipeline?
In this article, we’ll present you with the Five Layer Data Stack—a model for platform development consisting of five critical tools that will not only allow you to maximize impact but empower you to grow with the needs of your organization. Before you can model the data for your stakeholders, you need a place to collect and store it.
Even the best of us sometimes demonize the parts of our organization whose primary goals are in the privacy and security area and conflict with our wishes to splash around in the data lake. In reality, data scientists are not always the heroes, and IT and security teams are not the villains. You're using the data, of course!
The team was able to achieve this by leveraging cloud as well as open-source tools in a modular setup, taking advantage of relatively cheap cloud storage, a versatile programming language in Python, and Spark's powerful processing engine.
To get a better understanding of a data architect's role, let's clear up what data architecture is. Data architecture is the organization and design of how data is collected, transformed, integrated, stored, and used by a company. Sample of a high-level data architecture blueprint for Azure BI programs.
The relatively new storage architecture powering Databricks is called a data lakehouse. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake, used to host large amounts of raw data. Databricks lakehouse platform architecture.
Key connectivity features include: Data ingestion: Databricks supports data ingestion from a variety of sources, including data lakes, databases, streaming platforms, and cloud storage. This flexibility allows organizations to ingest data from virtually anywhere.
Unstructured data, on the other hand, is unpredictable and has no fixed schema, making it more challenging to analyze. Without a fixed schema, the data can vary in structure and organization. There are several widely used unstructured data storage solutions, such as data lakes (e.g., …). Build data architecture.
In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Central to this transformation are two shifts. Let’s take a closer look.
ADF leverages compute services like Azure HDInsight, Spark, Azure Data Lake Analytics, or Machine Learning to process and analyze the data according to defined requirements. Publish: transformed data is then published either back to on-premises sources like SQL Server or kept in cloud storage.
Become more agile with business intelligence and data analytics. Organizations find they have much more agility with analytics in the cloud and can operate at a lower cost point than has been possible with legacy on-premises solutions. Architecture patterns for the cloud.
Online Book Store System using Google Cloud Platform. 15 Sample GCP Real-Time Projects for Practice in 2023. With the need to learn cloud platforms as part of any analytical job role, it is essential to understand the basics and then gain some hands-on experience leveraging the cloud platforms.
With this release, Rockset users have the capability to continuously aggregate and transform data at the time of ingest, using SQL, from any data source (data streams, databases, and data lakes). This is a first in the industry and frees users from managing slow, expensive ETL pipelines for their streaming data.
Data in motion is predominantly about streaming data, so enterprises typically have two distinct, binary ways of looking at data. This can extend streaming analytics capabilities into any cloud environment.
A complete end-to-end stream processing pipeline is shown here using an architectural diagram. The pipeline in this reference design collects data from two different sources, joins related records from each stream, enriches the output, and finally produces an average.
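The join–enrich–average flow described above can be sketched in plain, single-process Python. The stream contents, keys, and field names below are invented for illustration; a real deployment would use a streaming engine rather than in-memory lists.

```python
# Two stand-in "streams" of related records sharing a join key ("user").
clicks = [{"user": "u1", "page": "/home"}, {"user": "u2", "page": "/buy"}]
orders = [{"user": "u1", "amount": 30.0}, {"user": "u2", "amount": 50.0}]

def join_streams(left, right, key):
    """Join related records from each stream on a shared key."""
    right_by_key = {r[key]: r for r in right}
    for record in left:
        if record[key] in right_by_key:
            yield {**record, **right_by_key[record[key]]}

def enrich(record):
    """Illustrative enrichment step: attach derived metadata."""
    record["channel"] = "web"
    return record

joined = [enrich(r) for r in join_streams(clicks, orders, "user")]
avg_amount = sum(r["amount"] for r in joined) / len(joined)
```

Each stage (join, enrich, aggregate) maps onto one box of the reference design's diagram.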
Organizations that depend on data for their success and survival need a robust, scalable data architecture, typically employing a data warehouse for analytics needs. Snowflake is often their cloud-native data warehouse of choice. Snowflake provides a couple of ways to load data.
Tired of relentlessly searching for the most effective and powerful data warehousing solutions on the internet? Search no more! This blog is your comprehensive guide to Google BigQuery, its architecture, and a beginner-friendly tutorial on how to use Google BigQuery for your data warehousing activities.
The growing complexity drove a proliferation of software and data innovations, which in turn demanded highly trained data engineers to build code-based data pipelines that ensured data quality, consistency, and stability. Because data pipelines were coded from scratch, they started breaking down under the complexity.
What is serverless? Serverless computing (often just called "serverless") is a model where a cloud provider, like AWS, abstracts away the concept of servers from the user. Serverless architecture entails the dynamic allocation of resources to carry out various execution tasks, and serverless is not limited to functions.
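A minimal sketch of a function in this model, shaped like an AWS Lambda handler. The event payload and response shape here are simplified illustrations, not a real API Gateway contract.

```python
import json

def handler(event, context=None):
    """Entry point the platform invokes; no server for the user to manage.

    `event` carries the request data; `context` (unused here) carries
    runtime metadata in real Lambda deployments.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"hello, {name}"}),
    }

# Locally we can call the handler directly with a fake event.
resp = handler({"name": "data engineer"})
```

The provider allocates compute only while the handler runs, which is the dynamic resource allocation the excerpt describes.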
Those tools include: cloud storage and compute, data transformation, business intelligence (BI), data observability, and data orchestration. The most important part? Cloud storage and compute: whether you're stacking data tools or pancakes, you always build from the bottom up.
Along with using Postgres (or KSQL as shown above) for analytics, the data can be streamed using Kafka Connect into S3, from where it can serve multiple roles. In S3, it can be seen as the “cold storage”, or the datalake, against which as-yet-unknown applications and processes may be run.
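A sketch of what the Kafka Connect side of that S3 hand-off might look like, using property names from Confluent's S3 sink connector; the connector name, topic, bucket, and region below are illustrative, not from the article.

```json
{
  "name": "s3-sink-cold-storage",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "enriched-events",
    "s3.bucket.name": "my-data-lake",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
```

Posted to the Connect REST API, a configuration like this continuously lands topic data in S3, where it can serve as the "cold storage" layer of the data lake.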
It is a built-in massively parallel processing (MPP) data lakehouse to handle all your infrastructure observability and security needs. It is a free standalone application that makes working with Azure Storage data on Windows, macOS, and Linux effortless.
If you are a newbie in data engineering and are interested in exploring real-world data engineering projects, check out the list of best data engineering project examples below. With the trending advance of IoT in every facet of life, technology has enabled us to handle a large amount of data ingested with high velocity.
“Sometimes there’s so much data that old batch processing (late at night once a day or once a week) just doesn’t have time to move all data and hence the only way to do it is trickle feed data via CDC,” says Dmitriy Rudakov, Director of Solution Architecture at Striim.