Data Governance, Data Lake and Metadata

Level Up Your Data Platform With Active Metadata

Data Engineering Podcast

JUNE 19, 2022

Summary: Metadata is the lifeblood of your data platform, providing information about what is happening in your systems. To level up its value, a new trend of active metadata is being implemented, allowing use cases like keeping BI reports up to date, auto-scaling your warehouses, and automated data governance.

Metadata

Metadata MongoDB MySQL Scala

Building A Data Governance Bridge Between Cloud And Datacenters For The Enterprise At Privacera

Data Engineering Podcast

MARCH 27, 2022

Summary: Data governance is a practice that requires a high degree of flexibility and collaboration at the organizational and technical levels. The growing prominence of cloud and hybrid environments in data management adds stress to an already complex endeavor.

Data Governance

Data Governance Government Cloud Building

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. What are the other systems that feed into and rely on the Trino/Iceberg service?

Data Lake

Data Lake High Quality Data Metadata Machine Learning

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Implement better data governance by easily tracking and handling sensitive data. The Lineage Visualization Interface (public preview) allows customers to easily track the flow of data and ML assets with an interactive interface in Snowsight.

Data Architecture

Data Architecture Architecture Data Lake Kafka

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata BI Data Lake Business Intelligence

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

In an effort to better understand where data governance is heading, we spoke with top executives from IT, healthcare, and finance to hear their thoughts on the biggest trends, key challenges, and the insights they would recommend. What is the Impact of Data Governance on GenAI?

Government

Government Data Governance Finance Metadata

Toward a Data Mesh (part 2) : Architecture & Technologies

François Nguyen

MARCH 22, 2021

TL;DR: After setting up and organizing the teams, we describe 4 topics to make data mesh a reality. How do we build data products? How can we interoperate between the data domains? Data As Code is a very strong choice: we do not want any UI because it is a legacy of the ETL period.

Technology

Technology Architecture Google Cloud Metadata

Pillars of Knowledge, Best Practices for Data Governance

Cloudera

AUGUST 4, 2021

And if data security tops IT concerns, data governance should be their second priority. Not only is it critical to protect data, but data governance is also the foundation for data-driven businesses and maximizing value from data analytics. But it’s still not easy.

Data Governance

Data Governance Government Metadata Data Security

Addressing The Challenges Of Component Integration In Data Platform Architectures

Data Engineering Podcast

NOVEMBER 26, 2023

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Architecture

Architecture Data Lake High Quality Data SQL

Insights from the Gartner Data & Analytics Summit in London: Embracing Data Leadership and Strategy

Precisely

JUNE 24, 2024

The Precisely team recently had the privilege of hosting a luncheon at the Gartner Data & Analytics Summit in London. It was an engaging gathering of industry leaders from various sectors, who exchanged valuable insights into crucial aspects of data governance, strategy, and innovation.

Food

Food Data Analytics Pharmaceutical Consulting

What Are the Best Data Modeling Methodologies & Processes for My Data Lake?

phData: Data Engineering

SEPTEMBER 19, 2023

With the amount of data companies are using growing to unprecedented levels, organizations are grappling with the challenge of efficiently managing and deriving insights from these vast volumes of structured and unstructured data. What is a Data Lake? Consistency of data throughout the data lake.

Data Lake

Data Lake Process Metadata Data Warehouse

Data Lake vs. Data Warehouse vs. Data Lakehouse

Sync Computing

NOVEMBER 7, 2024

While data warehouses are still in use, they are limited in use cases as they only support structured data. Data lakes add support for semi-structured and unstructured data, and data lakehouses add further flexibility with better governance in a true hybrid solution built from the ground up.

Data Lake

Data Lake Data Warehouse Business Intelligence Unstructured Data

Top Data Lake Vendors (Quick Reference Guide)

Monte Carlo

APRIL 24, 2023

Data lakes are useful, flexible data storage repositories that enable many types of data to be stored in their rawest state. Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption.

Data Lake

Data Lake Google Cloud Data Warehouse AWS

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Data discovery tools and platforms can help.

Data Lake

Data Lake Data Warehouse Unstructured Data Government

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Using the metaphor of a museum curator carefully managing the precious resources on display and in the vaults, he discusses the various layers of an enterprise data strategy. In terms of infrastructure, what are the components of a modern data architecture and how has that changed over the years?

Data Lake

Data Lake Data Warehouse Data Architecture Architecture

Are Apache Iceberg Tables Right For Your Data Lake? 6 Reasons Why.

Monte Carlo

NOVEMBER 14, 2023

Databricks announced that Delta tables metadata will also be compatible with the Iceberg format, and Snowflake has also been moving aggressively to integrate with Iceberg. How Apache Iceberg tables structure metadata. Is your data lake a good fit for Iceberg? Limited data type support. Image courtesy of Dremio.

Data Lake

Data Lake Metadata Data Warehouse SQL

Data Mesh vs Data Lake: Pros, Cons, & How to Decide

Monte Carlo

JANUARY 23, 2023

When it comes to the data community, there’s always a debate broiling about something, and right now “data mesh vs data lake” is right at the top of that list. In this post we compare and contrast the data mesh vs data lake to illustrate the benefits of each and help discover what’s right for your data platform.

Data Lake

Data Lake Architecture Business Intelligence Unstructured Data

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Architecture

Architecture Metadata Kafka Government

Data Warehouse vs Data Lake vs Data Lakehouse: Definitions, Similarities, and Differences

Monte Carlo

AUGUST 25, 2023

That’s why it’s essential for teams to choose the right architecture for the storage layer of their data stack. But, the options for data storage are evolving quickly. Different vendors offering data warehouses, data lakes, and now data lakehouses all offer their own distinct advantages and disadvantages for data teams to consider.

Data Lake

Data Lake Data Warehouse Unstructured Data Raw Data

3 Steps to AI-Ready Data

Monte Carlo

DECEMBER 12, 2024

It means defining that data by documenting relationships between creator and context (like customers and their orders), establishing clear business definitions (what exactly counts as an “active user”?), and maintaining metadata about data freshness, quality, and lineage (more on that in a moment).

Government

Government Data Cloud Datasets

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics: the data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake

Data Lake Architecture IT Amazon Web Services

Accelerate Your Data Mesh in the Cloud with Cloudera Data Engineering and Modak Nabu™

Cloudera

OCTOBER 11, 2021

The platform converges data cataloging, data ingestion, data profiling, data tagging, data discovery, and data exploration into a unified platform, driven by metadata. Modak Nabu automates repetitive tasks in the data preparation process and thus accelerates the data preparation by 4x.

Data Engineering

Data Engineering Data Engineer Cloud Engineering

The State of Data Engineering in 2024: Key Insights and Trends

Data Engineering Weekly

DECEMBER 16, 2024

Grab’s Metasense, Uber’s DataK9, and Meta’s classification systems use AI to automatically categorize vast data sets, reducing manual efforts and improving accuracy. Beyond classification, organizations now use AI for automated metadata generation and data lineage tracking, creating more intelligent data infrastructures.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Snowflake’s Best-in-Class Enterprise Data Foundation Unlocks Interoperability with Open Data and Internal Collaboration

Snowflake

JUNE 4, 2024

This includes pipelines and transformations with Snowpark, Streams, Tasks and Dynamic Tables (public preview soon); extending AI and ML to Iceberg with Snowflake Cortex AI; performing storage maintenance with capabilities like automatic clustering and compaction; as well as securely collaborating on live data shares.

Government

Government Data Ingestion Data PostgreSQL

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

At its core, a table format is a sophisticated metadata layer that defines, organizes, and interprets multiple underlying data files. Table formats incorporate aspects like columns, rows, data types, and relationships, but can also include information about the structure of the data itself.

Data Lake

Data Lake Metadata Hadoop Data Governance

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

The table information (such as schema, partition) is stored as part of the metadata (manifest) file separately, making it easier for applications to quickly integrate with the tables and the storage formats of their choice. Enterprise grade security and data governance – centralized data authorization to lineage and auditing.

Metadata

Metadata Data Architecture BI Machine Learning

Modern Data Management Essentials: Exploring Data Fabric

Precisely

JULY 18, 2024

Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata. As data grows exponentially, so do the complexities of managing and leveraging it to fuel AI and analytics.

Data Management

Data Management Management Metadata Database-centric

Top 5 Open Source Data Lineage Tools (With User Reviews)

Monte Carlo

MAY 18, 2023

Whether you’re a data scientist, data engineer, or business analyst, keeping track of your data’s origin, transformation, and movement is crucial for maintaining transparency, enforcing data governance, and ensuring data quality. The Spark data lake lineage example is helpful for context here.

Metadata

Metadata Data Lake Government Data Governance

Mainframe Optimization: 5 Best Practices to Implement Now

Precisely

JANUARY 25, 2024

There are three potential approaches to mainframe modernization: Data Replication creates a duplicate copy of mainframe data in a cloud data warehouse or data lake, enabling high-performance analytics virtually in real time, without negatively impacting mainframe performance.

Metadata

Metadata Relational Database Data Governance Government

Mainframe Data: Empowering Democratized Cloud Analytics

Precisely

OCTOBER 16, 2023

The cloud is especially well-suited to large-scale storage and big data analytics, due in part to its capacity to handle intensive computing requirements at scale. BI platforms and data warehouses have been replaced by modern data lakes and cloud analytics solutions. Secure data exchange takes on much greater importance.

Cloud

Cloud Cloud Computing Metadata Data Governance

Data Virtualization: Process, Components, Benefits, and Available Tools

AltexSoft

NOVEMBER 23, 2021

If the transformation step comes after loading (for example, when data is consolidated in a data lake or a data lakehouse), the process is known as ELT. You can learn more about how such data pipelines are built in our video about data engineering. Abstraction layer.

Process

Process Data Lake Metadata Data Warehouse

Data Lakehouse: Concept, Key Features, and Architecture Layers

AltexSoft

NOVEMBER 10, 2021

The pun being obvious, there’s more to that than just a new term: Data lakehouses combine the best features of both data lakes and data warehouses and this post will explain this all. What is a data lakehouse? Data warehouse vs data lake vs data lakehouse: What’s the difference.

Architecture

Architecture Data Lake Data Warehouse Metadata

Data Lineage Now Available with Silectis Magpie Data Engineering Platform

Silectis

FEBRUARY 3, 2021

Below we’ll cover the basics of data lineage, why it is important, and how Magpie enables teams to trust their data with this important new release. What is Data Lineage? Data lineage refers to the entire lifecycle of a dataset from its sources of origin all the way to its current state.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

What is Data Fabric: Architecture, Principles, Advantages, and Ways to Implement

AltexSoft

AUGUST 22, 2022

A data fabric is an architecture design presented as an integration and orchestration layer built on top of multiple disjointed data sources like relational databases, data warehouses, data lakes, data marts, IoT, legacy systems, etc., to provide a unified view of all enterprise data.

Architecture

Architecture Metadata Data Lake Machine Learning

Modern Data Architectures Provide a Foundation for Innovation

Precisely

JUNE 6, 2023

The group kicked off the session by exchanging ideas about what it means to have a modern data architecture. Atif Salam noted that as recently as a year ago, the primary focus in many organizations was on ingesting data and building data lakes.

Data Architecture

Data Architecture Architecture Metadata Data Lake

Demystifying Modern Data Platforms

Cloudera

SEPTEMBER 15, 2022

Mark: The first element in the process is the link between the source data and the entry point into the data platform. At Ramsey International (RI), we refer to that layer in the architecture as the foundation, but others call it a staging area, raw zone, or even a source data lake.

Data Lake

Data Lake Analytics Application Cloud Storage Architecture

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

It serves as a foundation for the entire data management strategy and consists of multiple components including data pipelines; on-premises and cloud storage facilities – data lakes, data warehouses, data hubs; data streaming and Big Data analytics solutions (Hadoop, Spark, Kafka, etc.);

Data Architect

Data Architect Certification Generalist Big Data

CI/CD for Data Pipelines: A Game-Changer with AnalyticsCreator

Data Science Blog: Data Engineering

MAY 20, 2024

It offers full BI-Stack Automation, from source to data warehouse through to frontend. It supports a holistic data model, allowing for rapid prototyping of various models. It also supports a wide range of data warehouses, analytical databases, data lakes, frontends, and pipelines/ETL. pipelines, Azure Databricks.

Data Pipeline

Data Pipeline BI Data Lake Data Warehouse

Data Mesh vs. Data Fabric: Which One Is Right for You?

Ascend.io

APRIL 7, 2023

Data fabric is a centralized platform architecture originating from a curated metadata layer that sits on top of an organization’s data infrastructure. Every time a new data source is added, the metadata layer is updated to define how and when that data should be used. Increasing speed.

Metadata

Metadata Data Governance Datasets Government

Redefining Data Engineering: GenAI for Data Modernization and Innovation – RandomTrees

RandomTrees

FEBRUARY 6, 2024

Over the years, the field of data engineering has seen significant changes and paradigm shifts driven by the phenomenal growth of data and by major technological advances such as cloud computing, data lakes, distributed computing, containerization, serverless computing, machine learning, graph database, etc.

Data Engineer

Data Engineer Data Engineering Engineering Data Lake

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

In this post, we’ll discuss what, exactly, a data fabric is, how other companies have used it, and how you can build one at your company. What is a data fabric? A data fabric infuses data governance and security across all forms of data, no matter its origin or destination within the organization.

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

What is Data Hub: Purpose, Architecture Patterns, and Existing Solutions Overview

AltexSoft

SEPTEMBER 23, 2021

One of the innovative ways to address this problem is to build a data hub — a platform that unites all your information sources under a single umbrella. This article explains the main concepts of a data hub, its architecture, and how it differs from data warehouses and data lakes. What is Data Hub?

Architecture

Architecture Data Lake Unstructured Data Data Warehouse

What is Data Lineage?

Databand.ai

JULY 28, 2022

In this very simplified example, we can see an ELT: Some pipeline tasks, probably run by Airflow, are scraping external data sources and collecting data from there. Those tasks are saving the extracted data in the data lake (or warehouse or lakehouse). This technique focuses directly on the data (vs.

Metadata

Metadata Data Lake Datasets Data Warehouse
