Data storage has been evolving, from databases to data warehouses to expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Though basic and easy to use, traditional table storage formats struggle to keep up; modern table formats, by contrast, track the data files within a table along with their column statistics.
Concepts, theory, and functionality of this modern data storage framework. I think it's now perfectly clear to everybody how much value data can have. To use a hyped example, models like ChatGPT could only be built on a huge mountain of data, produced and collected over years.
Grab's Metasense, Uber's DataK9, and Meta's classification systems use AI to automatically categorize vast datasets, reducing manual effort and improving accuracy. Beyond classification, organizations now use AI for automated metadata generation and data lineage tracking, creating more intelligent data infrastructures.
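Since the systems named above are proprietary, here is only a minimal rule-based stand-in in Python to illustrate the idea of automated data classification; the regexes and labels are illustrative assumptions, not any of those vendors' actual rules.

```python
import re

# Illustrative PII patterns; a stand-in for the ML-driven systems named above.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_column(values):
    """Label a column with any PII patterns its sampled values match."""
    labels = set()
    for value in values:
        for label, pattern in PATTERNS.items():
            if pattern.search(str(value)):
                labels.add(label)
    return labels or {"non_sensitive"}

print(classify_column(["alice@example.com", "bob@example.com"]))  # {'email'}
print(classify_column(["42", "17"]))                              # {'non_sensitive'}
```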
As organizations seek greater value from their data, data architectures are evolving to meet the demand, and table formats are no exception. At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files.
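To make the "metadata layer" concrete, here is a toy Python sketch of a table snapshot that tracks data files together with per-column statistics and uses them to prune file scans; the class and field names are hypothetical, not Iceberg's or Delta Lake's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class ColumnStats:
    name: str
    min_value: object
    max_value: object
    null_count: int

@dataclass
class DataFile:
    path: str
    row_count: int
    stats: list  # per-column ColumnStats

@dataclass
class TableSnapshot:
    snapshot_id: int
    files: list = field(default_factory=list)

    def prune(self, column, value):
        """Keep only files whose column stats could contain `value`."""
        return [
            f for f in self.files
            for s in f.stats
            if s.name == column and s.min_value <= value <= s.max_value
        ]

snap = TableSnapshot(1, [
    DataFile("s3://bucket/part-0.parquet", 1000,
             [ColumnStats("order_id", 1, 1000, 0)]),
    DataFile("s3://bucket/part-1.parquet", 1000,
             [ColumnStats("order_id", 1001, 2000, 0)]),
])
print([f.path for f in snap.prune("order_id", 1500)])  # only part-1
```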
A DataOps architecture is the structural foundation that supports the implementation of DataOps principles within an organization. It encompasses the systems, tools, and processes that enable businesses to manage their data more efficiently and effectively. Without such a foundation, data processes can be slow, inefficient, and prone to errors.
They're betting their business on it, and on the data pipelines that run it continuing to work. Context is crucial (and often lacking): a major cause of data quality issues and pipeline failures is the transformations within those pipelines. Most data architecture today is opaque: you can't tell what's happening inside.
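One way to make such a transformation less opaque is to wrap it in explicit pre- and post-condition checks, as in this minimal Python sketch; the function, columns, and rates here are hypothetical.

```python
# Hypothetical transformation instrumented so failures surface with context.
def transform_orders(rows):
    # Pre-condition: required fields present
    assert all("amount" in r and "currency" in r for r in rows), \
        "input rows missing amount/currency"

    out = [{**r, "amount_usd": r["amount"] * RATES[r["currency"]]} for r in rows]

    # Post-condition: no negative or null converted amounts
    bad = [r for r in out if r["amount_usd"] is None or r["amount_usd"] < 0]
    assert not bad, f"{len(bad)} rows produced invalid amount_usd"
    return out

RATES = {"USD": 1.0, "EUR": 1.08}  # illustrative fixed rates
print(transform_orders([{"amount": 10, "currency": "EUR"}]))
```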
Over the past several years, data warehouses have evolved dramatically, but that doesn't mean the fundamentals underpinning sound data architecture need to be thrown out the window. A data vault collects and organizes raw data as an underlying structure that acts as the source feeding Kimball or Inmon dimensional models.
Lak Lakshmanan: What goes into the bronze, silver, and gold layers of a medallion data architecture? If I understand correctly, the gist of the article is where you position the common data model and metrics that can be used across the organization. I think these layers are a guiding principle rather than a strict framework.
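Read that way, a minimal sketch of the three layers might look like the following pandas snippet; the layer boundaries shown are one reasonable interpretation of the guideline, not a strict standard.

```python
import pandas as pd

# Bronze: raw events, loaded as-is.
bronze = pd.DataFrame({
    "user_id": [1, 1, 2, None],
    "amount":  ["10.0", "10.0", "5.5", "3.0"],
})

# Silver: cleaned and conformed (types fixed, duplicates and bad rows dropped).
silver = (
    bronze.dropna(subset=["user_id"])
          .drop_duplicates()
          .assign(amount=lambda df: df["amount"].astype(float))
)

# Gold: business-level aggregates shared across the organization.
gold = silver.groupby("user_id", as_index=False)["amount"].sum()
print(gold)
```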
A data engineer's integral task is building and maintaining data infrastructure: the system managing the flow of data from its source to its destination. This typically includes setting up two processes: an ETL pipeline, which moves the data, and a data storage layer (typically a data warehouse), where it's kept.
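A minimal sketch of those two pieces, assuming a hypothetical users.csv and using SQLite as a stand-in for the warehouse:

```python
import csv
import sqlite3

def extract(path):
    # Read raw records from the source file.
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Normalize fields on the way through.
    for row in rows:
        yield (row["id"], row["name"].strip().lower())

def load(rows, conn):
    # Land the cleaned rows in the warehouse stand-in.
    conn.execute("CREATE TABLE IF NOT EXISTS users (id TEXT, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect("warehouse.db")
load(transform(extract("users.csv")), conn)
```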
In 2010, a transformative concept took root in the realm of data storage and analytics: the data lake. The term was coined by James Dixon, a back-end Java, data, and business intelligence engineer, and it started a new era in how organizations could store, manage, and analyze their data.
At the moment, this tight integration is possible because most zero-ETL architectures require both the transactional database and the data warehouse to be from the same cloud provider. Pros: reduced latency and no duplicate data storage. Cons: less ability to customize how the data is treated during the ingestion phase.
The pun being obvious, there's more to it than just a new term: data lakehouses combine the best features of both data lakes and data warehouses, and this post will explain it all. What is a data lakehouse? Another type of data storage, the data lake, tried to address these and other issues.
BigQuery: Google's cloud data warehouse. Data architecture: a composition of models, rules, and standards for all data systems and the interactions between them. Data catalog: an organized inventory of data assets relying on metadata to help with data management.
For example, developers can use the Twitter API to access and collect public tweets, user profiles, and other data from the Twitter platform. Data ingestion tools are software applications or services designed to collect, import, and process data from various sources into a central data storage system or repository.
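As a rough illustration of what such a tool does at its core, this Python sketch pulls JSON from a hypothetical endpoint and lands it in a central store; a real ingestion service would add authentication, pagination, and retries.

```python
import json
import sqlite3
import urllib.request

def ingest(url, conn):
    # Pull records from the (hypothetical) API endpoint.
    with urllib.request.urlopen(url) as resp:
        records = json.load(resp)
    # Land the raw payloads in the central store for downstream processing.
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?)",
        [(json.dumps(r),) for r in records],
    )
    conn.commit()

conn = sqlite3.connect("central_store.db")
ingest("https://api.example.com/v1/events", conn)
```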
Traditionally, the dimensional data modeling approach is used to build complex data warehouses, while data vaults are used within them to offer long-term historical data storage. Why is data modeling important for a data warehouse?
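For a concrete, if toy, picture of the dimensional approach, here is a one-fact, one-dimension star schema in pandas; the table and column names are illustrative.

```python
import pandas as pd

# Dimension table: descriptive attributes keyed by a surrogate key.
dim_customer = pd.DataFrame({
    "customer_key": [1, 2],
    "customer_name": ["Acme", "Globex"],
    "region": ["EMEA", "AMER"],
})
# Fact table: measurable events referencing the dimension.
fact_orders = pd.DataFrame({
    "customer_key": [1, 1, 2],
    "order_amount": [100.0, 250.0, 75.0],
})

# Analysts query facts through dimension attributes.
report = (
    fact_orders.merge(dim_customer, on="customer_key")
               .groupby("region", as_index=False)["order_amount"].sum()
)
print(report)
```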
Snowflake in Action at Western Union: Snowflake's multi-cluster shared data architecture expanded instantaneously to serve Western Union's data, users, and workloads without causing resource conflicts. In the Snowflake data architecture, the query processing layer is separated from the disk storage layer.
Technical Data Engineer Skills. 1. Python: Python is one of the most popular programming languages; with it, data engineers can build integrations, data pipelines, automation, and data cleansing and analysis.
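A small example of the kind of cleansing work mentioned, using pandas; the columns and rules are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "email": [" A@X.COM ", "b@y.com", None],
    "age":   ["34", "n/a", "29"],
})

clean = (
    df.assign(
        # Normalize emails and coerce unparseable ages to NaN.
        email=df["email"].str.strip().str.lower(),
        age=pd.to_numeric(df["age"], errors="coerce"),
    )
    .dropna(subset=["email"])  # drop rows with no contact info
)
print(clean)
```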
There are three steps involved in the deployment of a big data model. Data ingestion is the first step: extracting data from multiple data sources. Data variety: Hadoop stores structured, semi-structured, and unstructured data.
Becoming a Big Data Engineer: The Next Steps. Big Data Engineer: The Market Demand. An organization's data science capabilities require data warehousing and mining, modeling, data infrastructure, and metadata management. Most of these tasks are performed by data engineers.
This may involve assigning data stewards and data custodians and implementing data governance committees. Data architecture and metadata management: establishing a framework for data management that covers both the data architecture and its metadata.
In the age of self-service business intelligence, nearly every company considers itself a data-first company, but not every company treats its data architecture with the level of democratization and scalability it deserves. Your company, for one, views data as a driver of innovation.
Role of the most recent component, Hadoop Ozone, in the Hadoop application architecture; Hadoop big data architecture design best practices to follow; the latest version of the Hadoop architecture (3.3.3); and case studies of the Hadoop architectures at Facebook, Yahoo, and Last.FM.
However, decentralized models may result in inconsistent and duplicate master data. There’s a centralized structure that provides a framework, which is then used by autonomous departments that own their data and metadata. A Data Architect typically focuses on creating and building data infrastructures inside an organization.
In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing what data pipeline architecture is and why it is important. Think of an automatically updating encyclopedia for your data platform.
By combining data from various structured and unstructured data systems into coherent structures, Microsoft Azure data engineers can create analytics solutions. Why Should You Get an Azure Data Engineer Certification? Data Scientist: To extract value from data, data scientists execute sophisticated analytics.
This approach provides rich context that enables rapid triage, effective troubleshooting, and clear communication with stakeholders impacted by data reliability issues across the data storage, orchestration, transformation, and BI tiers.
When transformations are applied to RDDs, Spark records the metadata to build up a DAG, which reflects the sequence of computations performed during the execution of the Spark job. Apache Mesos: a robust option that manages resources across entire data centers, making it suitable for large-scale, diverse workloads.
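The laziness described above can be seen in a few lines of PySpark (assuming a local Spark installation): the transformations only record metadata in the DAG, and nothing executes until an action is called.

```python
from pyspark import SparkContext

sc = SparkContext("local", "dag-demo")

numbers = sc.parallelize(range(10))
evens = numbers.filter(lambda n: n % 2 == 0)   # transformation: lazy, recorded in the DAG
squares = evens.map(lambda n: n * n)           # transformation: lazy, recorded in the DAG

print(squares.collect())  # action: the DAG is executed here
sc.stop()
```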
I can surface ownership metadata and alert the relevant owners to make sure the appropriate changes are made so these breakages never happen. Data system modernization and team reorganization: the only constant in data engineering is change. This is where data lineage can help you scope and plan your migration waves.
It enables advanced analytics, makes debugging your marketing automations easier, provides natural audit trails for compliance, and allows for flexible, evolving customer data models. So next time you're designing your customer data architecture in your CDP, don't just think about the current state of your customers.
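One sketch of such an evolving model is an event log folded into current state, as below; the event names and fields are invented for illustration.

```python
from datetime import datetime

# Customer history kept as events, not as a single mutable record.
events = [
    {"type": "signed_up", "at": datetime(2024, 1, 5), "plan": "free"},
    {"type": "upgraded",  "at": datetime(2024, 3, 2), "plan": "pro"},
]

def current_state(events):
    """Derive the customer's present state by replaying events in order."""
    state = {}
    for event in sorted(events, key=lambda e: e["at"]):
        if event["type"] == "signed_up":
            state["plan"] = event["plan"]
            state["since"] = event["at"]
        elif event["type"] == "upgraded":
            state["plan"] = event["plan"]
    return state

print(current_state(events))  # {'plan': 'pro', 'since': datetime(2024, 1, 5)}
```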
Zero Copy Cloning: Create multiple 'copies' of tables, schemas, or databases without actually copying the data. This noticeably saves copying time and drastically reduces data storage costs. The Solution: phData recognized the imperative to enhance both the technical and people aspects of the data platform.
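For reference, a clone can be issued from Python via the Snowflake connector roughly like this; the connection parameters are placeholders and the table names are hypothetical.

```python
import snowflake.connector  # requires snowflake-connector-python

# Placeholder credentials: replace with your own account details.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="my_wh", database="my_db", schema="public",
)
# The clone shares the source table's underlying storage,
# so no data is physically copied at creation time.
conn.cursor().execute("CREATE TABLE orders_dev CLONE orders")
conn.close()
```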
Data lineage is what's in your database, which is not everything. Data lineage primarily focuses on tracking the movement and transformation of data within the database or data storage systems. It is static and often lags by weeks or months. Are the problems with data tests? Did a job fail?
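To make that scope concrete, here is a toy lineage graph and an impact query in plain Python; the asset names are invented.

```python
# Edges point from upstream assets to their downstream consumers.
lineage = {
    "raw.orders":     ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.churn"],
    "mart.revenue":   [],
    "mart.churn":     [],
}

def downstream(asset, graph):
    """Everything that could break if `asset` changes."""
    impacted, stack = set(), [asset]
    while stack:
        for child in graph.get(stack.pop(), []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted

print(downstream("raw.orders", lineage))
# {'staging.orders', 'mart.revenue', 'mart.churn'}
```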