Data Architecture and Data Lake - Data Engineering Digest

How Apache Iceberg Is Changing the Face of Data Lakes

Snowflake

APRIL 2, 2025

Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.

Data Lake

Data Lake Cloud Storage Metadata Data Warehouse

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Whether it’s unifying transactional and analytical data with Hybrid Tables, improving governance for an open lakehouse with Snowflake Open Catalog or enhancing threat detection and monitoring with Snowflake Horizon Catalog , Snowflake is reducing the number of moving parts to give customers a fully managed service that just works.

Data Architecture

Data Architecture Architecture Data Lake Kafka

Data Warehouses vs. Data Lakes vs. Data Marts: Need Help Deciding?

KDnuggets

OCTOBER 30, 2023

A comparative overview of data warehouses, data lakes, and data marts to help you make informed decisions on data storage solutions for your data architecture.

Data Lake

Data Lake Data Warehouse Data Storage Data

Webinars

How to Achieve High-Accuracy Results When Using LLMs

MORE WEBINARS

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the technology underpinning. In the beginning, there was a data warehouse The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.

Data Integration

Data Integration Hadoop Data Warehouse Data Lake

How Marriott Modernized Their Data Architecture with Snowflake

Snowflake

SEPTEMBER 14, 2023

More than 50% of data leaders recently surveyed by BCG said the complexity of their data architecture is a significant pain point in their enterprise. As a result,” says BCG, “many companies find themselves at a tipping point, at risk of drowning in a deluge of data, overburdened with complexity and costs.”

Data Architecture

Data Architecture Architecture Hadoop Data Warehouse

Laying the Foundation for Modern Data Architecture

Cloudera

MAY 28, 2024

It’s not enough for businesses to implement and maintain a data architecture. The unpredictability of market shifts and the evolving use of new technologies means businesses need more data they can trust than ever to stay agile and make the right decisions.

Data Architecture

Data Architecture Architecture Data Lake Data Warehouse

Straining Your Data Lake Through A Data Mesh

Data Engineering Podcast

JULY 22, 2019

Summary The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information to a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access.

Data Lake

Data Lake Hadoop Data Architecture

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

JUNE 16, 2019

Summary Building and maintaining a data lake is a choose your own adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allows for generating significant value, but it can also lead to anti-patterns and inconsistent quality in your analytics.

Data Lake

Data Lake Lambda Architecture Data Warehouse Hadoop

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Each of these architectures has its own unique strengths and tradeoffs. The schema of semi-structured data tends to evolve over time.

Data Lake

Data Lake Data Warehouse Cloud Unstructured Data

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

But, even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security concerns to rapidly proliferating data silos and governance challenges.

Data Architecture

Data Architecture Architecture Data Lake Data

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

SEPTEMBER 1, 2021

Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. Another area that has been seeing a lot of activity is data lakes and projects to make them more manageable and feature complete (e.g. Hudi, Delta Lake, Iceberg, Nessie, LakeFS, etc.).

Data Lake

Data Lake Cloud AWS SQL

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Data has continued to grow both in scale and in importance through this period, and today telecommunications companies are increasingly seeing data architecture as an independent organizational challenge, not merely an item on an IT checklist. Previously, there were three types of data structures in telco: .

Telecommunication

Telecommunication Data Architecture Architecture Government

Simplify Your Data Architecture With The Presto Distributed SQL Engine

Data Engineering Podcast

SEPTEMBER 7, 2020

In this episode he explains how it is designed to allow for querying and combining data where it resides, the use cases that such an architecture unlocks, and the innovative ways that it is being employed at companies across the world.

Architecture

Architecture Data Architecture SQL Engineering

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

Modern data architectures. To eliminate or integrate these silos, the public sector needs to adopt robust data management solutions that support modern data architectures (MDAs). Deploying modern data architectures. Lack of sharing hinders the elimination of fraud, waste, and abuse. Forrester ).

Data Architecture

Data Architecture Architecture Data Lake NoSQL

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Snowflake is now making it even easier for customers to bring the platform’s usability, performance, governance and many workloads to more data with Iceberg tables (now generally available), unlocking full storage interoperability. Iceberg tables provide compute engine interoperability over a single copy of data.

Data Lake

Data Lake BI Business Intelligence Metadata

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

Note : Cloud Data warehouses like Snowflake and Big Query already have a default time travel feature. However, this feature becomes an absolute must-have if you are operating your analytics on top of your data lake or lakehouse. It can also be integrated into major data platforms like Snowflake.

Architecture

Architecture Systems Data Lake Google Cloud

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management RudderStack helps you build a customer data platform on your warehouse or data lake. How has the move to the cloud for data warehousing/data platforms influenced the practice of data modeling?

Data Lake

Data Lake Machine Learning Metadata Data Architecture

Evaluating Change Data Capture Tools: A Comprehensive Guide

Data Engineering Weekly

AUGUST 6, 2024

CDC tools fuel analytical apps and mission-critical data feeds in banking and regulated industries, with use cases ranging from data synchronization, managing risk, and preventing fraud to driving personalization. Unlike data lakes, which are predominantly append-only, lakehouses support data mutation natively.

Data Lake

Data Lake Data Warehouse Database Data Architecture

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata BI Data Lake Business Intelligence

The No-Panic Guide to Building a Data Engineering Pipeline That Actually Scales

Monte Carlo

NOVEMBER 22, 2024

Gone are the days of just dumping everything into a single database; modern data architectures typically use a combination of data lakes and warehouses. Think of your data lake as a vast reservoir where you store raw data in its original form—great for when you’re not quite sure how you’ll use it yet.

Data Engineering

Data Engineering Data Engineer Building Engineering

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Using the metaphor of a museum curator carefully managing the precious resources on display and in the vaults, he discusses the various layers of an enterprise data strategy. Can you walk through the stages of an ideal lifecycle for data within the context of an organizations uses for it?

Data Lake

Data Lake Data Warehouse Data Architecture Architecture

Data Engineering: A Formula 1-inspired Guide for Beginners

Towards Data Science

DECEMBER 4, 2023

Anyways, I wasn’t paying enough attention during university classes, and today I’ll walk you through data layers using — guess what — an example. Business Scenario & Data Architecture Imagine this: next year, a new team on the grid, Red Thunder Racing, will call us (yes, me and you) to set up their new data infrastructure.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Modern Data Architectures Provide a Foundation for Innovation

Precisely

JUNE 6, 2023

At Precisely’s Trust ’23 conference, Chief Operating Officer Eric Yau hosted an expert panel discussion on modern data architectures. The group kicked off the session by exchanging ideas about what it means to have a modern data architecture.

Data Architecture

Data Architecture Architecture Metadata Data Lake

Data Pitfalls to Avoid with Data Warehouses & Data Lakes

Acceldata

DECEMBER 19, 2022

Avoid these three data pitfalls when attempting to modernize your data architecture with a data lake or data warehouse.

Data Lake

Data Lake Data Warehouse Data Architecture Data

Fivetran Supports the Automation of the Modern Data Lake on Amazon S3

phData: Data Engineering

APRIL 4, 2023

Today we want to introduce Fivetran’s support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and learn how it fits into the modern data architecture as a whole. Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with Apache Iceberg data lake format.

Data Lake

Data Lake Amazon Web Services Data Cleanse Data Warehouse

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Data discovery tools and platforms can help.

Data Lake

Data Lake Data Warehouse Unstructured Data Government

Data Engineering Weekly #209

Data Engineering Weekly

FEBRUARY 23, 2025

[link] Alireza Sadeghi: Open Source Data Engineering Landscape 2025 This article comprehensively overviews the 2025 open-source data engineering landscape, highlighting key trends, active projects, and emerging technologies.

Data Engineering

Data Engineering Data Engineer Engineering Kafka

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

In this context, data management in an organization is a key point for the success of its projects involving data. One of the main aspects of correct data management is the definition of a data architecture. What is Delta Lake? The data became useless. The Lakehouse architecture was one of them.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

Today, as data sources become increasingly varied, data management becomes more complex, and agility and scalability become essential traits for data leaders, data fabric is quickly becoming the future of data architecture. If data fabric is the future, how can you get your organization up-to-speed?

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

Today, as data sources become increasingly varied, data management becomes more complex, and agility and scalability become essential traits for data leaders, data fabric is quickly becoming the future of data architecture. If data fabric is the future, how can you get your organization up-to-speed?

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

Data Lake Best Practices: The Do’s and Don’ts

Hevo

JANUARY 8, 2025

In today’s data-driven world, data lakes have emerged as the data architecture of choice when storing and analyzing large volumes of data. However, implementing a successful data lake requires diligent planning and design, as it can quickly become a data swamp with no additional value.

Data Lake

Data Lake Data Architecture Architecture Data

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Key Differences Between AI Data Engineers and Traditional Data Engineers While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts. Data Storage Solutions As we all know, data can be stored in a variety of ways.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics — a data lake. The term was coined by James Dixon , Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake

Data Lake Architecture IT Amazon Web Services

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

Cloudera customers run some of the biggest data lakes on earth. These lakes power mission critical large scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.

Data Lake

Data Lake Data Warehouse BI SQL

A Prequel to Data Mesh

Towards Data Science

JANUARY 16, 2024

When I heard the words ‘decentralised data architecture’, I was left utterly confused at first! In my then limited experience as a Data Engineer, I had only come across centralised data architectures and they seemed to be working very well. New data formats emerged — JSON, Avro, Parquet, XML etc.

Data Warehouse

Data Warehouse Data Architecture Relational Database NoSQL

Evaluating Data Observability Tools: A Comprehensive Guide

Data Engineering Weekly

SEPTEMBER 18, 2024

The Rise of Data Observability Data observability has become increasingly critical as companies seek greater visibility into their data processes. This growing demand has found a natural synergy with the rise of the data lake. As a result, monitoring data in real time was often an afterthought.

Data Lake

Data Lake Data Pipeline Unstructured Data Data

What is a Data Mesh?

DataKitchen

AUGUST 3, 2021

The data mesh design pattern breaks giant, monolithic enterprise data architectures into subsystems or domains, each managed by a dedicated team. First-generation – expensive, proprietary enterprise data warehouse and business intelligence platforms maintained by a specialized team drowning in technical debt.

Pharmaceutical

Pharmaceutical Data Lake Data Architecture Architecture

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

AUGUST 20, 2021

Summary Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture they still require significant knowledge and experience to deploy and manage. Can you describe what Cuelake is and the story behind it?

Building

Building Data Lake Data Warehouse SQL

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

Over the past decade, Cloudera has enabled multi-function analytics on data lakes through the introduction of the Hive table format and Hive ACID. Companies, on the other hand, have continued to demand highly scalable and flexible analytic engines and services on the data lake, without vendor lock-in.

Data Lake

Data Lake Business Intelligence Metadata Data Warehouse

Eight Top DataOps Trends for 2022

DataKitchen

NOVEMBER 29, 2021

Data Gets Meshier. 2022 will bring further momentum behind modular enterprise architectures like data mesh. The data mesh addresses the problems characteristic of large, complex, monolithic data architectures by dividing the system into discrete domains managed by smaller, cross-functional teams.

Data Lake

Data Lake Manufacturing Architecture Data Architecture

Transforming Data Architecture through Data Mesh and Striim

Striim

FEBRUARY 12, 2024

For instance, analytical data may uncover customer preferences or seasonal product trends, helping teams in strategic planning. This data is commonly gathered through an ETL process and stored in a central location, such as a data lake. Want to see how Striim’s Data Mesh and AI can benefit your organization?

Data Architecture

Data Architecture Architecture Data Data Lake

Understanding Modern Data Architecture

Hevo

SEPTEMBER 17, 2024

Organizations have begun to built data warehouses and lakes to analyze large amounts of data for insights and business reports. Often time they bring data from multiple data silos into their data lake and also have data stored in particular data stores like NoSQL databases to support different use cases.

Data Architecture

Data Architecture Architecture NoSQL Data Lake

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Understanding the essential components of data pipelines is crucial for designing efficient and effective data architectures.

Data Pipeline

Data Pipeline Designing Data Lake Data Warehouse

How Apache Iceberg Is Changing the Face of Data Lakes

Simplifying Data Architecture and Security to Accelerate Value

Webinars

Trending Sources

Data Warehouses vs. Data Lakes vs. Data Marts: Need Help Deciding?

Webinars

Data Integrity for AI: What’s Old is New Again

How Marriott Modernized Their Data Architecture with Snowflake

Laying the Foundation for Modern Data Architecture

Straining Your Data Lake Through A Data Mesh

Maintaining Your Data Lake At Scale With Spark

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Data Architecture and Strategy in the AI Era

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Modern Data Architecture for Telecommunications

Simplify Your Data Architecture With The Presto Distributed SQL Engine

Breaking State and Local Data Silos with Modern Data Architectures

Being Data Driven At Stripe With Trino And Iceberg

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Why Open Table Format Architecture is Essential for Modern Data Systems

How Column-Aware Development Tooling Yields Better Data Models

Evaluating Change Data Capture Tools: A Comprehensive Guide

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

The No-Panic Guide to Building a Data Engineering Pipeline That Actually Scales

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering: A Formula 1-inspired Guide for Beginners

Modern Data Architectures Provide a Foundation for Innovation

Data Pitfalls to Avoid with Data Warehouses & Data Lakes

Fivetran Supports the Automation of the Modern Data Lake on Amazon S3

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Data Engineering Weekly #209

Hands-On Introduction to Delta Lake with (py)Spark

Data Fabric: The Future of Data Architecture

Data Fabric: The Future of Data Architecture

Data Lake Best Practices: The Do’s and Don’ts

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

The Future of the Data Lakehouse – Open

A Prequel to Data Mesh

Evaluating Data Observability Tools: A Comprehensive Guide

What is a Data Mesh?

Let Your Analysts Build A Data Lakehouse With Cuelake

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Eight Top DataOps Trends for 2022

Transforming Data Architecture through Data Mesh and Striim

Understanding Modern Data Architecture

A Guide to Data Pipelines (And How to Design One From Scratch)

Stay Connected