Data storage has been evolving, from databases to data warehouses and expansive data lakes, with each architecture responding to different business and data needs. Traditional databases excelled at structured data and transactional workloads but struggled with performance at scale as data volumes grew.
In the realm of modern analytics platforms, where rapid and efficient processing of large datasets is essential, swift metadata access and management are critical for optimal system performance. Any delays in metadata retrieval can negatively impact user experience, resulting in decreased productivity and satisfaction.
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Though basic and easy to use, traditional table storage formats struggle to keep up. Modern table formats, by contrast, track the data files within a table along with their column statistics.
In medicine, lower sequencing costs and improved clinical access to NGS technology have been shown to increase diagnostic yield for a range of diseases, from relatively well-understood Mendelian disorders, including muscular dystrophy and epilepsy, to rare diseases such as Alagille syndrome.
Regardless, the important thing to understand is that the modern data stack doesn't just allow you to store and process bigger data faster; it allows you to handle data fundamentally differently, to accomplish new goals and extract different types of value. Export and external data sharing: copying and exporting data is the worst.
If you haven't paid attention to the data industry news cycle, you might have missed the recent excitement centered around an open table format called Apache Iceberg™. These formats are changing the way data is stored and metadata is accessed. But not for the reasons you think.
Batch or streaming (what latencies are acceptable)? Data storage (lake or warehouse)? How is the data going to be used? The warehouse (BigQuery, Snowflake, Redshift) has become the focal point of the "modern data stack." Data orchestration: who will be managing the workflow logic?
Second, developers had to constantly re-learn new data modeling practices and common yet critical data access patterns. To overcome these challenges, we developed a holistic approach that builds upon our Data Gateway Platform. Each namespace may use a different backend: Cassandra, EVCache, or a combination of several.
These scams often target passwords, banking details, or sensitive organizational data by posing as a boss or coworker requesting confidential information. These apps may silently harvest personal data or metadata and, in some cases, install malware onto the device.
Vector search has seen an explosion in popularity due to improvements in accuracy and broadened accessibility to the models used to generate embeddings. Rockset offers a number of benefits along with vector search support to create relevant experiences: Real-Time Data: Ingest and index incoming data in real-time with support for updates.
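For intuition, here is a minimal, self-contained sketch of the core idea behind vector search: documents and queries become embedding vectors, and relevance is a nearest-neighbor lookup by cosine similarity. The embeddings below are made-up toy values, not any particular product's API.

```python
import numpy as np

def cosine_similarity(query: np.ndarray, vectors: np.ndarray) -> np.ndarray:
    """Cosine similarity between one query vector and a matrix of document vectors."""
    query_norm = query / np.linalg.norm(query)
    vector_norms = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return vector_norms @ query_norm

# Hypothetical 4-dimensional embeddings for three documents and one query.
documents = np.array([
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.1, 0.3, 0.0],
    [0.2, 0.7, 0.1, 0.1],
])
query = np.array([0.15, 0.85, 0.05, 0.15])

scores = cosine_similarity(query, documents)
top_k = np.argsort(scores)[::-1][:2]   # indices of the 2 most similar documents
print(top_k, scores[top_k])
```

Real systems replace the brute-force scan with an approximate nearest-neighbor index, but the relevance computation is the same.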
As a result, a Big Data analytics task is split up, with each machine performing its own little part in parallel. Hadoop hides away the complexities of distributed computing, offering an abstracted API that gives direct access to the system's functionality and its benefits, such as scalability. The trade-off is higher latency of data access.
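As a rough illustration of that split-and-parallelize model, here is a word-count mapper/reducer pair in the style Hadoop Streaming expects; in a real job these would be two separate scripts passed to the framework, and the local simulation at the bottom stands in for Hadoop's shuffle-and-sort step.

```python
#!/usr/bin/env python3
# Word-count mapper and reducer in Hadoop Streaming style. Each map task would
# process its own input split in parallel; Hadoop sorts the emitted keys before
# handing them to the reducers.
import sys
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair per token."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1


def reducer(pairs):
    """Sum the counts per word; Hadoop guarantees pairs arrive grouped by key."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Local simulation of the map -> shuffle/sort -> reduce flow.
    mapped = sorted(mapper(sys.stdin))
    for word, total in reducer(mapped):
        print(f"{word}\t{total}")
```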
Structured data (such as names, dates, IDs, and so on) is stored in regular SQL databases like Hive or Impala. There are also newer AI/ML applications that need data storage optimized for unstructured data, using developer-friendly paradigms like the Python Boto API. Diversity of workloads.
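A minimal sketch of that developer-friendly paradigm using boto3 against object storage; the bucket and key names below are placeholders, not tied to any specific deployment.

```python
import boto3

s3 = boto3.client("s3")

# Store an unstructured object (e.g. an image or a raw log file).
with open("sample.jpg", "rb") as f:
    s3.put_object(Bucket="example-data-bucket", Key="raw/images/sample.jpg", Body=f)

# Read it back.
response = s3.get_object(Bucket="example-data-bucket", Key="raw/images/sample.jpg")
payload = response["Body"].read()

# List everything under a prefix.
listing = s3.list_objects_v2(Bucket="example-data-bucket", Prefix="raw/images/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])
```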
When Glue receives a trigger, it collects the data, transforms it using code that Glue generates automatically, and then loads it into Amazon S3 or Amazon Redshift. Glue then writes the job's metadata into the AWS Glue Data Catalog. A classifier returns a certainty of 1.0 if the data exactly matches its format, and 0.0 if it does not. Why Use AWS Glue?
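For a sense of how this looks from code, here is a hedged sketch using the boto3 Glue client; the database, table, and job names are placeholders, and the ETL job itself is assumed to already be defined in Glue.

```python
import boto3

glue = boto3.client("glue")

# Inspect the schema that a crawler registered in the Glue Data Catalog.
table = glue.get_table(DatabaseName="analytics_db", Name="raw_events")
for column in table["Table"]["StorageDescriptor"]["Columns"]:
    print(column["Name"], column["Type"])

# Kick off the pre-defined ETL job that transforms the data and loads it into S3/Redshift.
run = glue.start_job_run(JobName="raw_events_to_redshift")
print("Started run:", run["JobRunId"])
```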
Apache Knox Gateway provides perimeter security so that the enterprise can confidently extend access to new users. Security and governance policies are set once and applied across all data and workloads. The SDX layer of CDP leverages the full spectrum of Atlas to automatically track and control all data assets. Apache HBase.
Storage — Snowflake. Snowflake, a cloud-based data warehouse tailored for analytical needs, will serve as our data storage solution. The data volume we will deal with is small, so we will not overcomplicate things with data partitioning, time travel, Snowpark, and other advanced Snowflake capabilities.
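A minimal sketch of what that storage layer looks like from Python, using the snowflake-connector-python package; the account, credentials, and object names are placeholders.

```python
import snowflake.connector

# Connection parameters are illustrative only.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="***",
    warehouse="ANALYTICS_WH",
    database="DEMO_DB",
    schema="PUBLIC",
)

cur = conn.cursor()
try:
    # A small table is plenty here; no partitioning or time-travel tuning needed.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS trips (
            trip_id INTEGER,
            started_at TIMESTAMP_NTZ,
            duration_minutes NUMBER(10, 2)
        )
    """)
    cur.execute("INSERT INTO trips VALUES (1, CURRENT_TIMESTAMP(), 12.5)")
    cur.execute("SELECT COUNT(*) FROM trips")
    print(cur.fetchone())
finally:
    cur.close()
    conn.close()
```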
With CDW, as an integrated service of CDP, your line of business gets immediate resources needed for faster application launches and expedited data access, all while protecting the company's multi-year investment in centralized data management, security, and governance. One IT-step away from a life outside the shadows.
A fundamental requirement for any lasting data system is that it should scale along with the growth of the business applications it wishes to serve. NMDB is built to be a highly scalable, multi-tenant, media metadata system that can serve a high volume of write/read throughput as well as support near real-time queries.
Open source data lakehouse deployments are built on the foundations of compute engines (like Apache Spark, Trino, Apache Flink), distributed storage (HDFS, cloud blob stores), and metadata catalogs / table formats (like Apache Iceberg, Delta, Hudi, Apache Hive Metastore). Tables are governed according to agreed-upon company standards.
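As a concrete, simplified example of those three foundations working together, the PySpark sketch below wires a local Iceberg catalog into Spark and creates a governed table. It assumes the matching iceberg-spark-runtime jar is on the classpath, and the catalog, warehouse path, and table names are illustrative.

```python
from pyspark.sql import SparkSession

# A minimal local setup: a Hadoop-type Iceberg catalog backed by a local path.
spark = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.demo", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.demo.type", "hadoop")
    .config("spark.sql.catalog.demo.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# The table format (Iceberg) tracks data files and statistics in the catalog.
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.events (
        event_id BIGINT,
        event_type STRING,
        event_ts TIMESTAMP
    ) USING iceberg
""")

spark.sql("INSERT INTO demo.db.events VALUES (1, 'click', current_timestamp())")
spark.sql("SELECT * FROM demo.db.events").show()
```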
Organizations across industries moved beyond experimental phases to implement production-ready GenAI solutions within their data infrastructure. Natural Language Interfaces: companies like Uber, Pinterest, and Intuit adopted sophisticated text-to-SQL interfaces, democratizing data access across their organizations.
Understanding the Object Hierarchy in the Metastore; Identifying the Admin Roles in Unity Catalog; Unveiling Data Lineage in Unity Catalog: Capture and Visualize; Simplifying Data Access using Delta Sharing. 1. Improved Data Discovery: the tagging and documentation features in Unity Catalog facilitate better data discovery.
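To make the Delta Sharing piece concrete, here is a small sketch using the delta-sharing Python connector; the profile file and the share/schema/table names are placeholders supplied by the data provider.

```python
import delta_sharing

# "config.share" is a profile file downloaded from the data provider; the share,
# schema, and table names below are placeholders.
profile_file = "config.share"
table_url = profile_file + "#my_share.my_schema.my_table"

# List everything the provider has shared with us.
client = delta_sharing.SharingClient(profile_file)
print(client.list_all_tables())

# Load a shared table directly into a pandas DataFrame, with no copy/export pipeline.
df = delta_sharing.load_as_pandas(table_url)
print(df.head())
```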
Parquet vs ORC vs Avro vs Delta Lake Photo by Viktor Talashuk on Unsplash The big data world is full of various storage systems, heavily influenced by different file formats. These are key in nearly all data pipelines, allowing for efficient datastorage and easier querying and information extraction.
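A quick illustration of working with two of these columnar formats via pyarrow; the data is toy data, and the point is that analytical readers can prune down to just the columns a query needs.

```python
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.orc as orc

table = pa.table({
    "user_id": [1, 2, 3, 4],
    "country": ["US", "DE", "US", "FR"],
    "amount": [10.5, 3.2, 7.8, 1.1],
})

# Columnar formats: good compression and fast column pruning for analytics.
pq.write_table(table, "events.parquet", compression="snappy")
orc.write_table(table, "events.orc")

# Read back only the columns needed.
subset = pq.read_table("events.parquet", columns=["country", "amount"])
print(subset.to_pandas())
```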
Today's cloud systems excel at high-volume data storage, powerful analytics, AI, and software and systems development. You must carefully consider various mainframe functions, including security, system logs, metadata, and COBOL copybooks, when moving to the new cloud platform. Best Practice 2. Best Practice 3.
With FSO, Apache Ozone guarantees atomic directory operations, and renaming or deleting a directory is a simple metadata operation even if the directory has a large set of sub-paths (directories/files) within it. In fact, this gives Apache Ozone a significant performance advantage over other object stores in the data analytics ecosystem.
While this "data tsunami" may pose a new set of challenges, it also opens up opportunities for a wide variety of high-value business intelligence (BI) and other analytics use cases that most companies are eager to deploy. Traditional data warehouse vendors may have maturity in data storage, modeling, and high-performance analysis.
This blog will guide you through the best data modeling methodologies and processes for your data lake, helping you make informed decisions and optimize your data management practices. What is a Data Lake? What are Data Modeling Methodologies, and Why Are They Important for a Data Lake?
Depending on the quantity of data flowing through an organization’s pipeline — or the format the data typically takes — the right modern table format can help to make workflows more efficient, increase access, extend functionality, and even offer new opportunities to activate your unstructured data.
The APIs support emitting unstructured log lines and typed metadata key-value pairs (per line). The extracted key-value pairs are written to the line’s metadata. Query clusters support interactive and bulk queries on one or more log streams with predicate filters on log text and metadata.
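The snippet below is a purely hypothetical, in-memory sketch of that API shape (it is not the actual client library): unstructured log text plus typed key-value metadata per line, with a query path that filters on both.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class LogLine:
    text: str                                   # unstructured log text
    metadata: Dict[str, Any] = field(default_factory=dict)   # typed key-value pairs
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class LogStream:
    name: str
    lines: List[LogLine] = field(default_factory=list)

    def emit(self, text: str, **metadata: Any) -> None:
        """Append an unstructured line plus typed key-value metadata."""
        self.lines.append(LogLine(text=text, metadata=metadata))

    def query(self, contains: str = "", **predicates: Any) -> List[LogLine]:
        """Filter on log text and metadata, mimicking the interactive/bulk query path."""
        return [
            line for line in self.lines
            if contains in line.text
            and all(line.metadata.get(k) == v for k, v in predicates.items())
        ]


stream = LogStream("playback")
stream.emit("session started", device="tv", region="us-east-1")
stream.emit("buffer underrun", device="tv", region="eu-west-1")
print(stream.query(contains="buffer", region="eu-west-1"))
```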
Where we started: in the mid-2010s, data teams began migrating to the cloud and adopting data storage and compute technologies — Redshift, Snowflake, Databricks, GCP, oh my! — to meet the growing demand for analytics. The cloud made data faster to process, easier to transform, and far more accessible.
This architecture format consists of several key layers that are essential to helping an organization run fast analytics on structured and unstructured data. Table of Contents: What is data lakehouse architecture? The 5 key layers of data lakehouse architecture: 1. Ingestion layer 2. Storage layer 3. Metadata layer 4. API layer 5. Consumption layer.
But few organizations have the data integrity required to power meaningful outcomes. Organizations must focus on breaking down silos and integrating all relevant, critical data into on-premises or cloud storage for AI model training and inference.
In 2010, a transformative concept took root in the realm of data storage and analytics — a data lake. The term was coined by James Dixon, a back-end Java, data, and business intelligence engineer, and it started a new era in how organizations could store, manage, and analyze their data.
It’s now much easier to manage large infra requirements that have traditionally demanded an amalgamation of teams like DBA, Infra-SRE, Onprem-SMEs, network managers, and access control managers working together. Data connections are secured through Azure Key Vaults and network connectivity is protected by LinkedIn's NACL control.
Snowflake can also ingest external tables from on-premises data sources via S3-compliant data storage APIs. Batch/file-based data is modeled into the raw vault table structures as the hub, link, and satellite tables illustrated at the beginning of this post.
In this article, we will explore the concept of data independence in relational databases and how it can benefit your organization by allowing you to work more effectively with your data while ensuring it always remains accessible and secure. What is Data Independence in a DBMS? Physical Data Independence in DBMS.
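A small, self-contained illustration of logical data independence using sqlite3: the application queries a view, so the underlying tables can be reorganized without changing application code. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers_v1 (id INTEGER PRIMARY KEY, full_name TEXT, city TEXT)")
conn.execute("INSERT INTO customers_v1 VALUES (1, 'Ada Lovelace', 'London')")
conn.execute("CREATE VIEW customers AS SELECT id, full_name, city FROM customers_v1")

# The application only ever talks to the view.
print(conn.execute("SELECT full_name FROM customers").fetchall())

# Later, the schema is split into two tables; the view is redefined,
# and the application query above keeps working unchanged.
conn.execute("CREATE TABLE customer_names (id INTEGER PRIMARY KEY, full_name TEXT)")
conn.execute("CREATE TABLE customer_cities (id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("INSERT INTO customer_names SELECT id, full_name FROM customers_v1")
conn.execute("INSERT INTO customer_cities SELECT id, city FROM customers_v1")
conn.execute("DROP VIEW customers")
conn.execute("""
    CREATE VIEW customers AS
    SELECT n.id, n.full_name, c.city
    FROM customer_names n JOIN customer_cities c ON n.id = c.id
""")
print(conn.execute("SELECT full_name FROM customers").fetchall())
```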
It's designed to improve upon the performance and usability challenges of older data storage formats such as Apache Hive and Apache Parquet. Choose the right partitioning strategy: partitioning can significantly improve query performance by reducing the amount of data scanned.
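Continuing the earlier Iceberg sketch, partitioning by a transform of a timestamp column lets the table format prune files for time-range queries without the writer managing partition columns by hand. The catalog, table, and column names are illustrative, and the SparkSession is assumed to carry the Iceberg catalog configuration shown earlier.

```python
from pyspark.sql import SparkSession

# Assumes the "demo" Iceberg catalog configuration from the earlier sketch.
spark = SparkSession.builder.getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.db.page_views (
        user_id BIGINT,
        url STRING,
        event_ts TIMESTAMP
    ) USING iceberg
    PARTITIONED BY (days(event_ts))
""")

# A filter on event_ts only reads the partitions for the matching days.
spark.sql("""
    SELECT count(*) FROM demo.db.page_views
    WHERE event_ts >= TIMESTAMP '2024-01-01 00:00:00'
      AND event_ts <  TIMESTAMP '2024-01-08 00:00:00'
""").show()
```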
That's why it's essential for teams to choose the right architecture for the storage layer of their data stack. But the options for data storage are evolving quickly. So let's get to the bottom of the big question: what kind of data storage layer will provide the strongest foundation for your data platform?
Having a bigger and more specialized data team can help, but it can hurt if those team members don’t coordinate. More people accessing the data and running their own pipelines and their own transformations causes errors and impacts data stability. is a unified data observability platform built for data engineers.
Data lakes are useful, flexible datastorage repositories that enable many types of data to be stored in its rawest state. Traditionally, after being stored in a data lake, raw data was then often moved to various destinations like a data warehouse for further processing, analysis, and consumption.
At the same time, it brings structure to data and enables data management features similar to those in data warehouses by implementing the metadata layer on top of the store. Another type of data storage — a data lake — tried to address these and other issues. Data lake. DataFrame API support.
NoSQL databases are the new-age solution to distributed unstructured data storage and processing. The speed, scalability, and fail-over safety offered by NoSQL databases are needed in the current era of Big Data analytics and data science. Hence, writes in HBase are operation-intensive.
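As a sketch of that key-value write path, here is how an HBase put and scan might look from Python via the happybase client; the Thrift host, table, and column-family names are placeholders.

```python
import happybase

# Connects through the HBase Thrift gateway; host and port are illustrative.
connection = happybase.Connection("hbase-thrift-host", port=9090)
table = connection.table("user_events")

# Each put writes cells into a column family, keyed by a row key.
table.put(b"user#42", {
    b"events:last_login": b"2024-05-01T10:15:00Z",
    b"events:login_count": b"17",
})

# Point reads by row key are cheap; scans iterate over a key range.
print(table.row(b"user#42"))
for key, data in table.scan(row_prefix=b"user#"):
    print(key, data)

connection.close()
```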
Here, data scientists are supported by data engineers. Data engineering itself is a process of creating mechanisms for accessing data. A data engineer's integral task is building and maintaining data infrastructure — the system managing the flow of data from its source to its destination.
When it comes to storing large volumes of data, a simple database will be impractical due to the processing and throughput inefficiencies that emerge when managing and accessing big data. This article looks at the options available for storing and processing big data, which is too large for conventional databases to handle.
Traditionally, data lakes held raw data in its native format and were known for their flexibility, speed, and open source ecosystem. By design, data was less structured with limited metadata and no ACID properties. Unity Catalog The Unity Catalog unifies metastores, catalogs, and metadata within Databricks.