Faster compute: Iceberg's metadata layer is optimized for cloud storage, allowing for advanced file and partition pruning with minimal I/O overhead. Building an open data lakehouse: Snowflake's goal is to help organizations establish and accelerate their open lakehouse ambitions so they can unlock more impact with less complexity.
This is particularly beneficial in complex analytical queries, where processing smaller, targeted segments of data results in quicker and more efficient query execution. Additionally, the optimized query execution and data pruning features reduce the compute cost associated with querying large datasets.
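To make the pruning concrete, here is a minimal sketch in PySpark, assuming a Spark session already configured with an Iceberg catalog; the `sales.orders` table and its `order_date` partition column are hypothetical. A filter on the partition column lets Iceberg skip whole data files using its manifest metadata before any rows are read.

```python
from pyspark.sql import SparkSession

# Assumes Spark is configured with an Iceberg catalog;
# the table name and partition column are hypothetical.
spark = SparkSession.builder.appName("iceberg-pruning").getOrCreate()

# The filter on the partition column (order_date) lets Iceberg use its
# metadata to skip non-matching files entirely instead of scanning them.
daily_orders = (
    spark.table("sales.orders")
    .where("order_date = DATE '2024-01-15'")
    .select("order_id", "amount")
)

daily_orders.show()
```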
Bronze, Silver, and Gold – The Data Architecture Olympics? The Bronze layer is the initial landing zone for all incoming raw data, capturing it in its unprocessed, original form. This foundational layer is a repository for various data types, from transaction logs and sensor data to social media feeds and system logs.
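As an illustration, a Bronze ingestion step typically writes records exactly as received and attaches only audit metadata. The sketch below is a minimal pure-Python version of that idea; the paths and JSON source are hypothetical.

```python
import datetime
import json
import pathlib

# Hypothetical source file and Bronze-layer landing path.
SOURCE_FILE = pathlib.Path("incoming/transactions.json")
BRONZE_DIR = pathlib.Path("lake/bronze/transactions")
BRONZE_DIR.mkdir(parents=True, exist_ok=True)

with SOURCE_FILE.open() as f:
    records = [json.loads(line) for line in f]

# Bronze keeps the payload untouched; we only add ingestion metadata.
ingested_at = datetime.datetime.now(datetime.timezone.utc)
out_path = BRONZE_DIR / f"batch_{ingested_at:%Y%m%dT%H%M%S}.json"
with out_path.open("w") as f:
    for record in records:
        f.write(json.dumps({
            "_ingested_at": ingested_at.isoformat(),
            "_source": str(SOURCE_FILE),
            "payload": record,      # original record, unmodified
        }) + "\n")
```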
In contrast to conventional warehouses, it keeps compute and storage separate, allowing for cost-effectiveness and dynamic scaling. It provides true multi-cloud flexibility, operating on AWS, Azure, and Google Cloud. Its multi-cluster shared data architecture is one of its primary features.
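Because compute is decoupled from storage, a virtual warehouse can be resized without touching the data. A minimal sketch using the snowflake-connector-python package; the credentials and warehouse name are placeholders.

```python
import snowflake.connector

# Placeholder connection details; in Snowflake, compute (warehouses)
# scales independently of the stored data.
conn = snowflake.connector.connect(
    account="my_account",   # hypothetical
    user="my_user",
    password="...",
)
cur = conn.cursor()

# Scale compute up for a heavy workload, then back down;
# the underlying storage is unaffected either way.
cur.execute("ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE'")
# ... run the expensive queries ...
cur.execute("ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'XSMALL'")

cur.close()
conn.close()
```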
We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council. Upcoming events include the O’Reilly AI conference, the Strata Data conference, the combined events of the Data Architecture Summit and Graphorum, and Data Council in Barcelona.
Cloudera’s Shared Data Experience (SDX) provides all these capabilities, allowing seamless data sharing across all the Data Services, including CDE. A new capability called Ranger Authorization Service (RAZ) provides fine-grained authorization on cloud storage. Modernizing pipelines.
Data consumption should be supported by an elastic delivery layer that scales with demand and offers the flexibility to present data in a physical format suited to the analytic application, ranging from a traditional data warehouse view to a graph view in support of relationship analysis.
So, you set up data systems and start filling up those tables or topics. After a few years, you can end up with huge volumes of data. Maybe you need to scale up to a cloud provider like Snowflake or AWS to keep up and make all this data accessible at the pace you need. You’re using the data, of course!
To get a better understanding of a data architect’s role, let’s clear up what data architecture is. Data architecture is the organization and design of how data is collected, transformed, integrated, stored, and used by a company. Sample of a high-level data architecture blueprint for Azure BI programs.
Integration with Azure and Data Sources: Fabric is deeply integrated with Azure tools such as Synapse, Data Factory, and OneLake. This allows seamless data movement and end-to-end workflows within the same environment. Its flexibility suits advanced users creating end-to-end data solutions.
With the right technology now in place, ATB Financial is landing and curating more data than ever to bring data-driven insights to the business and its customers. Implementing a Modern Data Architecture.
Today we want to introduce Fivetran’s support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and learn how it fits into modern data architecture as a whole. Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with the Apache Iceberg data lake format.
These robust security measures ensure that data is always secure and private. There are several widely used unstructured data storage solutions, such as data lakes (e.g., Amazon S3, Google Cloud Storage, Microsoft Azure Blob Storage) and NoSQL databases.
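For the data lake option, landing an unstructured object in S3 is a one-liner with boto3. A minimal sketch, assuming AWS credentials are already configured; the bucket and key names are hypothetical.

```python
import boto3

# Assumes AWS credentials are available (env vars, profile, or IAM role);
# bucket and key names are hypothetical.
s3 = boto3.client("s3")

# Unstructured payloads (audio, images, logs, documents) are stored as-is.
s3.upload_file(
    Filename="call_recording.wav",
    Bucket="my-data-lake-raw",
    Key="audio/2024/call_recording.wav",
)
```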
Data in motion is predominantly about streaming data, so enterprises typically end up with a binary view of their data: at rest or in motion. Streaming analytics capabilities can then extend into any cloud environment.
You will learn to use the following Google Cloud application deployment environments: App Engine, Kubernetes Engine, and Compute Engine, and to select and use one of Google Cloud's storage solutions, which include Cloud Storage, Cloud SQL, Cloud Bigtable, and Firestore.
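As a quick taste of the Cloud Storage option, here is a minimal sketch with the google-cloud-storage client library; the bucket and object names are hypothetical, and credentials are assumed to come from the environment.

```python
from google.cloud import storage

# Assumes GOOGLE_APPLICATION_CREDENTIALS (or ambient auth) is configured;
# bucket and blob names are hypothetical.
client = storage.Client()
bucket = client.bucket("my-example-bucket")

# Upload a local file as an object, then read it back.
blob = bucket.blob("reports/daily.csv")
blob.upload_from_filename("daily.csv")

print(blob.download_as_text()[:200])  # peek at the first 200 characters
```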
Organizations that depend on data for their success and survival need robust, scalable data architecture, typically employing a data warehouse for analytics needs. Snowflake is often their cloud-native data warehouse of choice. Snowflake provides a couple of ways to load data.
A lot of new transforms and actions have appeared, and Apache Hop integrates a lot better with existing data architectures. Container and cloud support: Hop comes with a pre-built container image for long-lived (Hop Server) and short-lived (Hop Run) scenarios.
Key connectivity features include: Data Ingestion: Databricks supports data ingestion from a variety of sources, including data lakes, databases, streaming platforms, and cloud storage. This flexibility allows organizations to ingest data from virtually anywhere.
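For cloud storage sources specifically, Databricks' Auto Loader illustrates that flexibility. A minimal sketch, assuming it runs inside a Databricks workspace where `spark` is predefined; the paths and file format are hypothetical.

```python
# Runs inside a Databricks notebook or job, where `spark` is predefined.
# Paths, table name, and format below are hypothetical.
stream = (
    spark.readStream.format("cloudFiles")                # Auto Loader
    .option("cloudFiles.format", "json")                 # incoming file format
    .option("cloudFiles.schemaLocation", "/tmp/schema")  # schema tracking
    .load("s3://my-bucket/raw/events/")
)

# Continuously append newly arriving files to a Delta table.
(stream.writeStream
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .toTable("bronze.events"))
```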
Databricks lakehouse platform architecture. Source: Databricks
Delta Lake is an open-source, file-based storage layer that adds reliability and functionality to existing data lakes built on Amazon S3, Google Cloud Storage, Azure Data Lake Storage, Alibaba Cloud, HDFS (Hadoop Distributed File System), and others.
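A minimal sketch of what that layer adds in practice, using the delta-spark package on a local path (paths and contents are hypothetical): ACID writes plus time travel over plain Parquet files.

```python
from delta import configure_spark_with_delta_pip
from pyspark.sql import SparkSession

# Assumes the delta-spark package is installed; the path is hypothetical.
builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.format("delta").mode("overwrite").save("/tmp/delta/events")

# Time travel: read the table as of an earlier version.
v0 = (spark.read.format("delta")
      .option("versionAsOf", 0)
      .load("/tmp/delta/events"))
v0.show()
```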
Hadoop, MongoDB, and Kafka are popular Big Data tools and technologies a data engineer needs to be familiar with. Companies are increasingly replacing physical servers with cloud services, so data engineers need to know about cloud storage and cloud computing.
In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Central to this transformation are two shifts.
They must load the raw data into a data warehouse for this analysis. There are numerous ways to import data into a data warehouse using SQL. For instance, data engineers can easily transfer the data onto a cloud storage system and load the raw data into their data warehouse using the COPY INTO command.
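A minimal sketch of that pattern in Snowflake, issued through snowflake-connector-python; the stage, table, and file-format details are hypothetical.

```python
import snowflake.connector

# Placeholder credentials; table, stage, and format names are hypothetical.
conn = snowflake.connector.connect(account="my_account",
                                   user="my_user",
                                   password="...")
cur = conn.cursor()

# Files were previously uploaded to the stage (e.g., via PUT or an
# external S3 stage); COPY INTO bulk-loads them into the target table.
cur.execute("""
    COPY INTO raw_events
    FROM @my_stage/events/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

cur.close()
conn.close()
```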
What is a Big Data Pipeline? Data pipelines have evolved to manage big data, just like many other elements of data architecture. Big data pipelines are data pipelines designed to support one or more of the three characteristics of big data (volume, variety, and velocity).
Gone are the days of just dumping everything into a single database; modern data architectures typically use a combination of data lakes and warehouses. Think of your data lake as a vast reservoir where you store raw data in its original form, which is great for when you’re not quite sure how you’ll use it yet.
But with modern cloud storage solutions and clever techniques like log compaction (where obsolete entries are removed), this is becoming less and less of an issue. The benefits of log-based approaches often far outweigh the storage costs. The log becomes a place where raw, unaltered customer data is stored indefinitely.
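To illustrate compaction, here is a minimal pure-Python sketch; the (key, value) record format is made up. Replaying the log and keeping only the latest entry per key is what removes the obsolete ones.

```python
# Minimal log-compaction sketch; the (key, value) record format is made up.
log = [
    ("user:1", {"email": "old@example.com"}),
    ("user:2", {"email": "b@example.com"}),
    ("user:1", {"email": "new@example.com"}),  # supersedes the first entry
    ("user:2", None),                          # tombstone: delete user:2
]

def compact(entries):
    """Keep only the most recent value per key; drop tombstoned keys."""
    latest = {}
    for key, value in entries:
        latest[key] = value  # later entries overwrite earlier ones
    return {k: v for k, v in latest.items() if v is not None}

print(compact(log))
# {'user:1': {'email': 'new@example.com'}}
```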
Data Description: You will use the COVID-19 dataset (COVID-19 Cases.csv) from data.world for this project, which contains attributes such as people_positive_cases_count, county_name, case_type, and data_source. Language Used: Python 3.7. What are the main components of a big data architecture?
The world of data management is undergoing a rapid transformation. The rise of cloud storage, coupled with the increasing demand for real-time analytics, has led to the emergence of the Data Lakehouse. This paradigm combines the flexibility of data lakes with the performance and reliability of data warehouses.