Whether it’s unifying transactional and analytical data with Hybrid Tables, improving governance for an open lakehouse with Snowflake Open Catalog or enhancing threat detection and monitoring with Snowflake Horizon Catalog, Snowflake is reducing the number of moving parts to give customers a fully managed service that just works.
More than 50% of data leaders recently surveyed by BCG said the complexity of their data architecture is a significant pain point in their enterprise. “As a result,” says BCG, “many companies find themselves at a tipping point, at risk of drowning in a deluge of data, overburdened with complexity and costs.”
Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Each of these architectures has its own unique strengths and tradeoffs.
Microsoft Fabric incorporates elements from several Microsoft products working together, like Power BI, Azure Synapse Analytics, Data Factory, and OneLake, into a single SaaS experience. It provides real multi-cloud flexibility in its operations on AWS, Azure, and Google Cloud.
Summary The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. Another area that has been seeing a lot of activity is data lakes and projects to make them more manageable and feature complete (e.g. Hudi, Delta Lake, Iceberg, Nessie, LakeFS, etc.).
In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?
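As a hedged sketch of what this pairing looks like from a client's side, here is a minimal query against an Iceberg-backed table through Trino using the trino Python package; the host, catalog, schema, and table names are assumptions for illustration, not Stripe's actual setup.

```python
# Minimal sketch: querying an Iceberg table through Trino from Python.
# Cluster address, catalog, and table names below are illustrative.
from trino.dbapi import connect

conn = connect(
    host="trino.example.com",
    port=8080,
    user="analyst",
    catalog="iceberg",   # a Trino catalog backed by the Iceberg connector
    schema="analytics",
)
cur = conn.cursor()
cur.execute("SELECT event_id, event_time FROM events LIMIT 5")
for row in cur.fetchall():
    print(row)
```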
In this episode he explains how it is designed to allow for querying and combining data where it resides, the use cases that such an architecture unlocks, and the innovative ways that it is being employed at companies across the world.
In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI. Both platforms are free to try today.
Note: Cloud data warehouses like Snowflake and BigQuery already have a default time travel feature. However, this feature becomes an absolute must-have if you are operating your analytics on top of your data lake or lakehouse. It can also be integrated into major data platforms like Snowflake.
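A minimal sketch of what time travel looks like on a lakehouse table, using Delta Lake with PySpark as one representative table format; the storage path, version number, and timestamp below are illustrative assumptions, and a Delta-enabled Spark session is presumed.

```python
# Sketch of table-format time travel on a data lake (Delta Lake + PySpark).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("time-travel").getOrCreate()

# Read the table as of an earlier version...
df_v0 = spark.read.format("delta").option("versionAsOf", 0).load("s3://bucket/orders")

# ...or as of a timestamp, e.g. to reproduce yesterday's report exactly.
df_yday = (
    spark.read.format("delta")
    .option("timestampAsOf", "2025-01-01 00:00:00")
    .load("s3://bucket/orders")
)
print(df_v0.count(), df_yday.count())
```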
[link] Alireza Sadeghi: Open Source Data Engineering Landscape 2025 This article provides a comprehensive overview of the 2025 open-source data engineering landscape, highlighting key trends, active projects, and emerging technologies.
Key Differences Between AI Data Engineers and Traditional Data Engineers: While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts. Data Storage Solutions: As we all know, data can be stored in a variety of ways.
Anyways, I wasn’t paying enough attention during university classes, and today I’ll walk you through data layers using — guess what — an example. Business Scenario & Data Architecture. Imagine this: next year, a new team on the grid, Red Thunder Racing, will call us (yes, me and you) to set up their new data infrastructure.
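To make the layering concrete before the walkthrough, here is an illustrative bronze/silver/gold sketch in PySpark; the Red Thunder Racing paths, field names, and transformations are all invented for this example.

```python
# Illustrative bronze/silver/gold layering for the hypothetical racing team.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("layers").getOrCreate()

# Bronze: raw telemetry landed as-is from the cars.
bronze = spark.read.json("s3://rtr-lake/bronze/telemetry/")

# Silver: cleaned and typed, bad readings filtered out.
silver = (
    bronze.filter(F.col("speed_kph").isNotNull())
    .withColumn("lap_time_s", F.col("lap_time_ms") / 1000)
)

# Gold: business-level aggregate the race engineers actually query.
gold = silver.groupBy("driver_id", "lap").agg(F.avg("speed_kph").alias("avg_speed"))
gold.write.mode("overwrite").parquet("s3://rtr-lake/gold/lap_speeds/")
```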
At Precisely’s Trust ’23 conference, Chief Operating Officer Eric Yau hosted an expert panel discussion on modern data architectures. The group kicked off the session by exchanging ideas about what it means to have a modern data architecture.
Today we want to introduce Fivetran’s support for Amazon S3 with Apache Iceberg, investigate some of the implications of this feature, and learn how it fits into the modern data architecture as a whole. Fivetran today announced support for Amazon Simple Storage Service (Amazon S3) with the Apache Iceberg data lake format.
In 2010, a transformative concept took root in the realm of data storage and analytics — a data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?
Cloudera customers run some of the biggest data lakes on earth. These lakes power mission-critical, large-scale data analytics, business intelligence (BI), and machine learning use cases, including enterprise data warehouses. On data warehouses and data lakes.
Imagine being in charge of creating an intelligent data universe where collaboration, analytics, and artificial intelligence all work together harmoniously. Companies with expertise in Microsoft Fabric are in high demand, including Microsoft, Accenture, AWS, and Deloitte. Are you prepared to influence the data-driven future?
Summary Object storage is quickly becoming the unifying layer for data intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides.
Summary With the constant evolution of technology for data management it can seem impossible to make an informed decision about whether to build a data warehouse, or a data lake, or just leave your data wherever it currently rests. How does it influence the relevancy of data warehouses or data lakes?
Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and dataarchitecture and views the data organization from the perspective of its processes and workflows.
This week’s episode is also sponsored by Datacoral, an AWS-native, serverless data infrastructure that installs in your VPC. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council.
To get a better understanding of a data architect’s role, let’s clear up what data architecture is. Data architecture is the organization and design of how data is collected, transformed, integrated, stored, and used by a company. Sample of a high-level data architecture blueprint for Azure BI programs.
The migration enhanced data quality and lineage visibility, delivered performance improvements and cost reductions, and brought better reliability and scalability, setting a robust foundation for future expansions and onboarding. This approach helps maintain accuracy, relevance, and compliance in generative AI applications.
Even the best of us sometimes demonize the parts of our organization whose primary goals are in the privacy and security area and conflict with our wishes to splash around in the data lake. In reality, data scientists are not always the heroes and IT and security teams are not the villains. You’re using the data, of course!
What is Databricks Databricks is an analytics platform with a unified set of tools for data engineering, data management, data science, and machine learning. It combines the best elements of a data warehouse, a centralized repository for structured data, and a data lake used to host large amounts of raw data.
With major clients including Spotify, Puma, Five Guys, and Icelandair, Bynder uses large amounts of data to provide dashboards and open APIs to its customers, as well as vital operational insights to internal users. But when the company started to experience rapid growth, it noticed performance issues with its data architecture.
The pun being obvious, there’s more to it than just a new term: data lakehouses combine the best features of both data lakes and data warehouses, and this post will explain it all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: What’s the difference?
Evolution of Data Lake Technologies: The data lake ecosystem has matured significantly in 2024, particularly in table formats and storage technologies. S3 Tables and Cloud Integration: AWS’s introduction of S3 Tables marked a pivotal shift, enabling faster queries and easier management.
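As a hedged illustration of what working with Iceberg tables looks like from Python, here is a minimal PyIceberg read; the catalog name and table identifier are assumptions, and this does not show any S3 Tables-specific configuration.

```python
# Minimal PyIceberg read; catalog and table names are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")  # resolved from .pyiceberg.yaml / environment
table = catalog.load_table("analytics.events")

# Scan with a predicate that Iceberg can prune against table metadata.
df = table.scan(row_filter="event_date >= '2024-01-01'").to_pandas()
print(df.head())
```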
The Importance of a Data Pipeline; What is an ETL Data Pipeline?; What is a Big Data Pipeline?; Features of a Data Pipeline; Data Pipeline Architecture; How to Build an End-to-End Data Pipeline from Scratch? The transformed data is then placed into the destination data warehouse or data lake.
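For a concrete, stripped-down picture of the extract-transform-load steps such a pipeline performs, here is a toy sketch in plain Python; the CSV source, field names, and SQLite destination are placeholders standing in for real systems.

```python
# Toy ETL pipeline: extract from CSV, transform rows, load into a database.
import csv
import sqlite3

def extract(path):
    # Extract: stream rows from a source file.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: coerce types and drop bad records.
    for row in rows:
        row["amount"] = float(row["amount"])
        if row["amount"] >= 0:
            yield (row["order_id"], row["amount"])

def load(records, db="warehouse.db"):
    # Load: write cleaned records into the destination table.
    conn = sqlite3.connect(db)
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
    conn.commit()
    conn.close()

load(transform(extract("orders.csv")))
```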
We’ve noticed many common patterns across streaming data architectures and we’ll be sharing a blueprint for three of the most popular: anomaly detection, IoT, and recommendations. Offline feature store: Detecting anomalies requires historical data in order to have a baseline for comparisons. The database has two primary jobs.
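The baseline idea fits in a few lines: compute statistics from historical data (the offline feature store's role) and flag live values that deviate too far from them. The sample values and threshold below are illustrative.

```python
# Sketch of baseline-driven anomaly detection over a historical window.
import statistics

history = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]  # historical metric values
mean = statistics.mean(history)
stdev = statistics.stdev(history)

def is_anomaly(value, z_threshold=3.0):
    # Flag readings more than z_threshold standard deviations from baseline.
    return abs(value - mean) / stdev > z_threshold

print(is_anomaly(10.1))  # False: within the normal band
print(is_anomaly(14.0))  # True: far outside historical behavior
```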
Role Level: Intermediate. Responsibilities: Design and develop data pipelines to ingest, process, and transform data. Implement and manage data storage solutions using Azure services like Azure SQL Database, Azure Data Lake Storage, and Azure Cosmos DB.
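As a hedged example of the kind of storage work these responsibilities involve, here is a minimal write to Azure Data Lake Storage Gen2 using the azure-storage-file-datalake SDK; the account, container, and file path are invented for illustration.

```python
# Minimal upload to Azure Data Lake Storage Gen2; all names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://mydatalake.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("raw")            # container / file system
file = fs.get_file_client("ingest/2025/orders.json")  # path within the lake
file.upload_data(b'{"order_id": 1}', overwrite=True)
```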
The modern data stack era, roughly 2017 to the present day, saw the widespread adoption of cloud computing and modern data repositories that decoupled storage from compute, such as data warehouses, data lakes, and data lakehouses. They also recently acquired Apache Flink, another streaming solution.
Key connectivity features include: Data Ingestion: Databricks supports data ingestion from a variety of sources, including data lakes, databases, streaming platforms, and cloud storage. This flexibility allows organizations to ingest data from virtually anywhere.
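One concrete ingestion path is Databricks Auto Loader, which incrementally picks up new files from cloud storage. The sketch below assumes a Databricks runtime (where spark is predefined); the paths and table name are placeholders.

```python
# Auto Loader sketch: incrementally ingest new JSON files from cloud storage.
# Assumes a Databricks runtime; `spark` is provided by the environment there.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/events")
    .load("s3://bucket/landing/events/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/events")
    .trigger(availableNow=True)   # process what's there, then stop
    .toTable("bronze.events")
)
```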
In the dynamic world of data, many professionals are still fixated on traditional patterns of data warehousing and ETL, even while their organizations are migrating to the cloud and adopting cloud-native data services. Central to this transformation are two shifts.
Iris tackles the challenge of extracting actionable insights from complex, cross-platform metrics by routing data in real time to systems like InfluxDB and offline to AWS-based data lakes.
Some of the top skills to include are: Experience with Azure data storage solutions: Azure Data Engineers should have hands-on experience with various Azure data storage solutions such as Azure Cosmos DB, Azure Data Lake Storage, and Azure Blob Storage.
Unstructured data, on the other hand, is unpredictable and has no fixed schema, making it more challenging to analyze. Without a fixed schema, the data can vary in structure and organization. There are several widely used unstructured data storage solutions, such as data lakes. Build data architecture.
Data Factory, Data Activator, Power BI, Synapse Real-Time Analytics, Synapse Data Engineering, Synapse Data Science, and Synapse Data Warehouse are some of them. With OneLake serving as a primary multi-cloud repository, Fabric is designed with an open, lake-centric architecture.
Data pipelines can handle both batch and streaming data, and at a high-level, the methods for measuring data quality for either type of asset are much the same. We’ll take a closer look at variables that can impact your data next. What is a decentralized data architecture?
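A minimal sketch of such checks, which apply equally to a full batch or to each micro-batch of a stream; the rules and field names are illustrative, not a complete data quality framework.

```python
# Simple data quality checks: null keys, duplicate keys, and bad types.
def check_batch(rows):
    issues = []
    seen_ids = set()
    for row in rows:
        if row.get("order_id") is None:
            issues.append(("null_key", row))
        elif row["order_id"] in seen_ids:
            issues.append(("duplicate_key", row))
        else:
            seen_ids.add(row["order_id"])
        if not isinstance(row.get("amount"), (int, float)):
            issues.append(("bad_type", row))
    return issues

batch = [{"order_id": 1, "amount": 9.5}, {"order_id": 1, "amount": "oops"}]
print(check_batch(batch))  # flags the duplicate key and the bad type
```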
He also has more than 10 years of experience in big data, being among the few data engineers to work on Hadoop Big Data Analytics prior to the adoption of public cloud providers like AWS, Azure, and Google Cloud Platform. He is also an AWS Certified Solutions Architect and AWS Certified Big Data expert.
Cloud Era: Cloud platforms like AWS and Azure took center stage, making sophisticated data solutions accessible to all. Modern Landscape: Today, Data Engineering involves slick ETL processes, real-time streaming, and the concept of data lakes, shaping the backbone of our data-driven world.
Snowflake Features that Make Data Science Easier; Building Data Applications with Snowflake Data Warehouse; Snowflake Data Warehouse Architecture; How Does Snowflake Store Data Internally? The query processing layer is separated from the disk storage layer in the Snowflake data architecture.
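A small sketch of how that separation surfaces to a client: with the Snowflake Python connector you pick a virtual warehouse (compute) at connect time, independently of where the data is stored. All connection values below are placeholders.

```python
# Connecting to Snowflake: the warehouse is compute, chosen separately
# from the database/schema where the data lives. Values are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-myaccount",
    user="analyst",
    password="***",
    warehouse="ANALYTICS_WH",   # compute cluster, independent of storage
    database="SALES",
    schema="PUBLIC",
)
cur = conn.cursor()
cur.execute("SELECT COUNT(*) FROM orders")
print(cur.fetchone()[0])
conn.close()
```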
Azure Synapse offers a second layer of encryption for data at rest using customer-managed keys stored in Azure Key Vault, providing enhanced data security and control over key management. Cost-Effective Data Lake Integration: Azure Synapse lets you ditch the traditional separation between SQL and Spark for data lake exploration.
AWS or Azure? With so many data engineering certifications available , choosing the right one can be a daunting task. This section mainly focuses on the three most valuable and popular vendor-specific data engineering certifications- AWS, Azure , and GCP. Cloudera or Databricks?