dbt is the standard for creating governed, trustworthy datasets on top of your structured data. We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data. What is MCP?
Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
The next evolution in data is making it AI ready. For years, an essential tenet of digital transformation has been to make data accessible, to break down silos so that the enterprise can draw value from all of its data. For this reason, internal-facing AI will continue to be the focus for the next couple of years.
Agents need to access an organization's ever-growing structured and unstructured data to be effective and reliable. As data connections expand, managing access controls and efficiently retrieving accurate information while maintaining strict privacy protocols becomes increasingly complex.
Bridging the data gap In today's data-driven landscape, organizations that effortlessly combine insights from unstructured sources like text, image, audio, and video with structured data are gaining a significant competitive advantage.
This ecosystem includes: Catalogs: Services that manage metadata about Iceberg tables. Compute Engines: Tools that query and process data stored in Iceberg tables (e.g., Trino, Spark, Snowflake, DuckDB). Maintenance Processes: Operations that optimize Iceberg tables, such as compacting small files and managing metadata.
To give customers flexibility for how they fit Snowflake into their architecture, Iceberg Tables can be configured to use either Snowflake or an external service like AWS Glue as the table's catalog to track metadata, with an easy one-line SQL command to convert to Snowflake in a metadata-only operation.
Those decentralization efforts appeared under different monikers over time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data), then enterprise-wide data lakes versus smaller, typically BU-specific, “data ponds”.
We live in a hybrid data world. In the past decade, the amount of structured data created, captured, copied, and consumed globally has grown from less than 1 ZB in 2011 to nearly 14 ZB in 2020. Impressive, but dwarfed by the amount of unstructured data, cloud data, and machine data – another 50 ZB.
Users can query using regular expressions on log lines, arbitrary metadata fields attached to logs, and across log files of hosts and services. Logarithm's data model Logarithm represents logs as named log streams: (host-local) time-ordered sequences of immutable unstructured text, each corresponding to a single log file.
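The query model described above can be sketched in a few lines. This is an illustrative toy, not Logarithm's implementation: the stream layout, field names (`ts`, `host`, `service`, `line`), and the `query` helper are all invented for the example.

```python
import re

# Hypothetical sketch: a log stream as time-ordered immutable text lines,
# each carrying metadata fields (field names here are illustrative).
log_stream = [
    {"ts": 1, "host": "web-1", "service": "api", "line": "GET /users 200"},
    {"ts": 2, "host": "web-2", "service": "api", "line": "GET /users 500"},
    {"ts": 3, "host": "web-1", "service": "db",  "line": "slow query: 1200ms"},
]

def query(stream, pattern, **metadata):
    """Return entries whose line matches a regex and whose metadata
    fields match the given exact filters."""
    rx = re.compile(pattern)
    return [e for e in stream
            if rx.search(e["line"])
            and all(e.get(k) == v for k, v in metadata.items())]

# Find 5xx responses, restricted to the "api" service.
errors = query(log_stream, r"\b5\d\d\b", service="api")
print([e["host"] for e in errors])  # ['web-2']
```

Combining a regex over the raw line with exact filters over attached metadata is the core shape of the querying capability the snippet describes.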
Studio is accessible within Snowsight, providing interactive interfaces for teams to quickly combine multiple models with their data and compare results, accelerating deployment to applications in production. For details on pricing and what models are supported, check out our documentation.
Today’s platform owners, business owners, data developers, analysts, and engineers create new apps on the Cloudera Data Platform and they must decide where and how to store that data. Structured data (such as name, date, ID, and so on) will be stored in regular SQL databases like Hive or Impala.
As a result, a Big Data analytics task is split up, with each machine performing its own little part in parallel. Hadoop hides away the complexities of distributed computing, offering an abstracted API to get direct access to the system's functionality and its benefits, such as the HDFS master-slave structure and flexible data access options.
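The split-and-parallelize idea behind that model can be illustrated with a miniature map/shuffle/reduce word count. This is a single-process sketch of the programming pattern only; real Hadoop distributes the chunks across machines and handles the shuffle for you.

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Each "machine" emits (word, 1) pairs for its own slice of the data.
    return [(word, 1) for line in chunk for word in line.split()]

def shuffle(pairs):
    # Group intermediate values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Aggregate each key's values into a final count.
    return {key: sum(values) for key, values in groups.items()}

chunks = [["big data big"], ["data big"]]   # two splits, as if on two machines
mapped = chain.from_iterable(map_phase(c) for c in chunks)
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 3, 'data': 2}
```

Each chunk is mapped independently, which is exactly what lets the work scale out: adding machines means adding chunks, with only the shuffle and reduce needing coordination.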
We scored the highest in hybrid, intercloud, and multi-cloud capabilities because we are the only vendor in the market with a true hybrid data platform that can run on any cloud including private cloud to deliver a seamless, unified experience for all data, wherever it lies.
So I decided to focus my energies in research data management. Open Context is an open access data publishing service for archaeology. It started because we need better ways of disseminating structured data and digital media than is possible with conventional articles, books, and reports.
A fundamental requirement for any lasting data system is that it should scale along with the growth of the business applications it wishes to serve. NMDB is built to be a highly scalable, multi-tenant, media metadata system that can serve a high volume of write/read throughput as well as support near real-time queries.
Open source data lakehouse deployments are built on the foundations of compute engines (like Apache Spark, Trino, Apache Flink), distributed storage (HDFS, cloud blob stores), and metadata catalogs / table formats (like Apache Iceberg, Delta, Hudi, Apache Hive Metastore). Tables are governed according to agreed-upon company standards.
Automation, because the same loader patterns are used for both and the same metadata tags are expected from both, meaning the applied date timestamp in the business vault will match up with the raw date timestamp it came from. These methods can be applied to structured and semi-structured data as well.
Using easy-to-define policies, Replication Manager solves one of the biggest barriers for the customers in their cloud adoption journey by allowing them to move both tables/structured data and files/unstructured data to the CDP cloud of their choice easily. Specification of access conditions for specific users and groups.
Challenges & Opportunities in the Infra Data Space Security Events Platform for Anomaly Detection How can we develop a complex event processing system to ingest semi-structured data predicated on schema contracts from hundreds of sources and transform it into event streams of structured data for downstream analysis?
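The ingest question above can be made concrete with a small sketch: validate each semi-structured record against a per-source schema contract, emitting a structured event only when the contract holds. The contract format, source name, and field names are invented for illustration, not taken from the platform described.

```python
import json

# Hypothetical per-source schema contracts: field name -> expected type.
CONTRACTS = {
    "auth-service": {"user": str, "action": str, "ts": int},
}

def to_event(source, raw):
    """Parse one raw JSON record; return a structured event if it satisfies
    the source's schema contract, else None (e.g. routed to a dead-letter
    queue downstream)."""
    contract = CONTRACTS[source]
    record = json.loads(raw)
    if all(isinstance(record.get(field), typ) for field, typ in contract.items()):
        return {"source": source, **{field: record[field] for field in contract}}
    return None

ok = to_event("auth-service", '{"user": "alice", "action": "login", "ts": 1700000000}')
bad = to_event("auth-service", '{"user": "bob", "action": "login"}')   # missing ts
print(ok is not None, bad is None)  # True True
```

Enforcing the contract at ingest is what turns a loosely-shaped firehose into event streams with a known structure that downstream analysis can rely on.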
Understanding the Object Hierarchy in Metastore Identifying the Admin Roles in Unity Catalog Unveiling Data Lineage in Unity Catalog: Capture and Visualize Simplifying Data Access using Delta Sharing 1. Improved Data Discovery The tagging and documentation features in Unity Catalog facilitate better data discovery.
Modernizing your data warehousing experience with the cloud means moving from dedicated, on-premises hardware focused on traditional relational analytics on structured data to a modern platform. Beyond there being a number of choices each with very different strengths, the parameters for your decision have also changed.
The storage system uses Capacitor, a proprietary columnar storage format by Google for semi-structured data, and the file system underneath is Colossus, Google's distributed file system. Choosing the right model depends on your data access patterns and compression capabilities. So, which model should you choose?
To differentiate and expand the usefulness of these models, organizations must augment them with first-party data – typically via a process called RAG (retrieval augmented generation). Today, this first-party data mostly lives in two types of data repositories.
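The retrieval step of RAG can be sketched minimally: rank first-party documents by similarity to the question, then hand the best matches to the model as context. Production systems use learned embeddings and a vector store; the bag-of-words cosine similarity and document snippets below are stand-ins chosen only to keep the example self-contained.

```python
import math
from collections import Counter

# Illustrative first-party corpus (invented for the example).
docs = [
    "refund policy: refunds are issued within 30 days",
    "shipping times vary by region and carrier",
    "our office hours are 9 to 5 on weekdays",
]

def vectorize(text):
    # Toy embedding: a bag-of-words term-count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    # The "R" in RAG: return the k documents most similar to the question.
    q = vectorize(question)
    return sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

context = retrieve("how do refunds work?")
prompt = f"Answer using this context: {context}\n\nQuestion: how do refunds work?"
```

The retrieved snippets are spliced into the prompt, which is how first-party data augments a model without retraining it.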
Here’s how Cloudera makes it possible: One unified system: Cloudera’s open data lakehouse consolidates all critical log data into one system. Whether they need to query data from today or from years past, the system scales up or down to meet their needs.
As the magnitude and role of data in society have changed, so have the tools for dealing with it. While a 3,500+ year data retention capability for data stored on clay tablets is impressive, the access latency and forward compatibility of clay tablets fall a little short.
So, instead of replacing or rebuilding the existing infrastructure, you add a new, ML-powered abstraction layer on top of the underlying data sources, enabling various users to access and manage the information they need without duplication. Data fabric architecture example. Unified data access. Data and metadata.
Data Governance and Security By defining data models, organizations can establish policies, access controls, and security measures to protect sensitive data. Data models can also facilitate compliance with regulations and ensure proper data handling and protection. Want to learn more about data governance?
In an identity/access management application, it’s the relationships between roles and their privileges that matters most. If you’ve found yourself needing to write very large JOIN statements or dealing with long paths through your data, then you are probably facing a graph problem. Relationships act like verbs in your graph.
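The role-and-privilege case above reads naturally as a graph traversal rather than a chain of self-JOINs. Below is a minimal sketch: roles and privileges are nodes, "inherits/grants" edges connect them, and a breadth-first walk collects a role's effective privileges. All role and privilege names are made up for the example.

```python
from collections import deque

# Edges: a role either inherits from another role or grants a privilege.
edges = {
    "intern":   ["employee"],
    "employee": ["read:docs"],
    "admin":    ["employee", "write:docs"],
}

def effective_privileges(role):
    """Breadth-first walk over role inheritance; nodes without outgoing
    edges are treated as privileges."""
    seen, queue, privs = {role}, deque([role]), set()
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt in edges:          # another role: keep walking
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
            else:                     # a leaf: an actual privilege
                privs.add(nxt)
    return privs

print(effective_privileges("admin"))  # direct and inherited privileges
```

In SQL, resolving "admin" would need one JOIN per inheritance level (or a recursive CTE); in the graph formulation the path length is just how far the walk goes, which is the hallmark of a graph problem the snippet describes.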
At the same time, it brings structure to data and empowers data management features similar to those in data warehouses by implementing the metadata layer on top of the store. Traditional data warehouse platform architecture. Key features of a data lakehouse. Unstructured and streaming data support.
Understanding data warehouses A data warehouse is a consolidated storage unit and processing hub for your data. Teams using a data warehouse usually leverage SQL queries for analytics use cases. This same structure aids in maintaining data quality and simplifies how users interact with and understand the data.
Netflix Scheduler is built on top of Meson, a general-purpose workflow orchestration and scheduling framework used to execute and manage the lifecycle of the data workflow. Bulldozer makes data warehouse tables more accessible to different microservices and reduces each individual team's burden to build their own solutions.
Read More: AI Data Platform: Key Requirements for Fueling AI Initiatives How Data Engineering Enables AI Data engineering is the backbone of AI's potential to transform industries, offering the essential infrastructure that powers AI algorithms.
These are key in nearly all data pipelines, allowing for efficient data storage and easier querying and information extraction. They are designed to handle the challenges of big data like size, speed, and structure. Data engineers often face a plethora of choices.
Instead of relying on traditional hierarchical structures and predefined schemas, as in the case of data warehouses, a data lake utilizes a flat architecture. This structure is made efficient by data engineering practices that include object storage. Watch our video explaining how data engineering works.
Data integration with ETL has evolved from structured data stores with high computing costs to natural-state storage with read-operation alterations, thanks to the agility of the cloud. Data integration with ETL has changed over the last three decades. This ensures that companies' data is always protected and secure.
What is unstructured data? Definition and examples Unstructured data , in its simplest form, refers to any data that does not have a pre-defined structure or organization. It can come in different forms, such as text documents, emails, images, videos, social media posts, sensor data, etc.
Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Who has access to it?
A combination of structured and semi-structured data can be used for analysis and loaded into the cloud database without the need to transform it into a fixed relational schema first. This stage handles all aspects of data storage: organization, file size, structure, compression, metadata, and statistics.
Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption. Databricks Data Catalog and AWS Lake Formation are examples in this vein. AWS is one of the most popular data lake vendors.
This means that a data warehouse is a collection of technologies and components that are used to store data for some strategic use. Data is collected and stored in data warehouses from multiple sources to provide insights into business data. Data from data warehouses is queried using SQL.
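The pattern above, consolidate data from multiple sources into one store and query it with SQL, can be shown end to end in a few lines. SQLite stands in for the warehouse here, and the `sales` table with its `source`, `region`, and `amount` columns is invented for the example.

```python
import sqlite3

# One warehouse-style table receiving rows from multiple sources.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (source TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("web", "EU", 120.0), ("store", "EU", 80.0), ("web", "US", 200.0)],
)

# A typical analytics query: revenue by region across all sources.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('EU', 200.0), ('US', 200.0)]
```

The point of the consolidated schema is exactly this: once rows from every source share one table shape, a single aggregate query answers a cross-source business question.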
Data-driven organizations generate, collect, and store vast amounts of data. To effectively manage and analyze this data, data engineering teams must navigate a wide range of challenges, including data access, security, compliance, and data observability. Automating self-service access.
Does not have a dedicated metadata database. Hadoop technology is the buzzword these days, but most IT professionals are still not aware of the key components that comprise the Hadoop Ecosystem.
This structure allows all stakeholders involved in incident response to clearly understand the executed actions and target state of the system to expect. Lastly, by having playbooks in a single location, our Incident Responders and Incident Commanders have easy access to all available emergency procedures in a consistent format.