Data Architecture, Data Lake and Data Management

Simplifying Data Architecture and Security to Accelerate Value

Snowflake

NOVEMBER 11, 2024

Whether it’s unifying transactional and analytical data with Hybrid Tables, improving governance for an open lakehouse with Snowflake Open Catalog, or enhancing threat detection and monitoring with Snowflake Horizon Catalog, Snowflake is reducing the number of moving parts to give customers a fully managed service that just works.

Data Architecture

Data Architecture, Architecture, Data Lake, Kafka

Data Integrity for AI: What’s Old is New Again

Precisely

JANUARY 9, 2025

The goal of this post is to understand how data integrity best practices have been embraced time and time again, no matter the underlying technology. In the beginning, there was a data warehouse. The data warehouse (DW) was an approach to data architecture and structured data management that really hit its stride in the early 1990s.

Data Integration

Data Integration, Hadoop, Data Warehouse, Data Lake

Announcing New Innovations for Data Warehouse, Data Lake, and Data Lakehouse in the Data Cloud

Snowflake

NOVEMBER 2, 2023

Over the years, the technology landscape for data management has given rise to various architecture patterns, each thoughtfully designed to cater to specific use cases and requirements. Each of these architectures has its own unique strengths and tradeoffs. The schema of semi-structured data tends to evolve over time.

Data Lake

Data Lake, Data Warehouse, Cloud, Unstructured Data

Straining Your Data Lake Through A Data Mesh

Data Engineering Podcast

JULY 22, 2019

Summary: The current trend in data management is to centralize the responsibilities of storing and curating the organization’s information within a data engineering team. This organizational pattern is reinforced by the architectural pattern of data lakes as a solution for managing storage and access.

Data Lake

Data Lake, Hadoop, Data Architecture

Laying the Foundation for Modern Data Architecture

Cloudera

MAY 28, 2024

It’s not enough for businesses to implement and maintain a data architecture. The unpredictability of market shifts and the evolving use of new technologies mean businesses need more data they can trust than ever to stay agile and make the right decisions.

Data Architecture

Data Architecture, Architecture, Data Lake, Data Warehouse

Maintaining Your Data Lake At Scale With Spark

Data Engineering Podcast

JUNE 16, 2019

Summary: Building and maintaining a data lake is a choose-your-own-adventure of tools, services, and evolving best practices. The flexibility and freedom that data lakes provide allow for generating significant value, but they can also lead to anti-patterns and inconsistent quality in your analytics.

Data Lake

Data Lake, Lambda Architecture, Data Warehouse, Hadoop

Data Architecture and Strategy in the AI Era

Cloudera

MARCH 28, 2024

But, even with the backdrop of an AI-dominated future, many organizations still find themselves struggling with everything from managing data volumes and complexity to security concerns to rapidly proliferating data silos and governance challenges.

Data Architecture

Data Architecture, Architecture, Data Lake, Data

Presto Powered Cloud Data Lakes At Speed Made Easy With Ahana

Data Engineering Podcast

SEPTEMBER 1, 2021

Summary: The Presto project has become the de facto option for building scalable open source analytics in SQL for the data lake. Can you give an overview of the options that are available for someone wanting to use its SQL engine for querying their data? (Hudi, Delta Lake, Iceberg, Nessie, LakeFS, etc.)

Data Lake

Data Lake, Cloud, AWS, SQL

Being Data Driven At Stripe With Trino And Iceberg

Data Engineering Podcast

JUNE 16, 2024

In this episode Kevin Liu shares some of the interesting features that they have built by combining those technologies, as well as the challenges that they face in supporting the myriad workloads that are thrown at this layer of their data platform. Can you describe what role Trino and Iceberg play in Stripe's data architecture?

Data Lake

Data Lake, High Quality Data, Metadata, Machine Learning

Modern Data Architecture for Telecommunications

Cloudera

SEPTEMBER 6, 2022

Data has continued to grow both in scale and in importance through this period, and today telecommunications companies are increasingly seeing data architecture as an independent organizational challenge, not merely an item on an IT checklist. Previously, there were three types of data structures in telco.

Telecommunication

Telecommunication, Data Architecture, Architecture, Government

Simplify Your Data Architecture With The Presto Distributed SQL Engine

Data Engineering Podcast

SEPTEMBER 7, 2020

In this episode he explains how it is designed to allow for querying and combining data where it resides, the use cases that such an architecture unlocks, and the innovative ways that it is being employed at companies across the world. Can you start by giving an overview of what Presto is and its origin story?

Architecture

Architecture, Data Architecture, SQL, Engineering

Breaking State and Local Data Silos with Modern Data Architectures

Cloudera

AUGUST 30, 2022

Agencies are plagued by a wide range of data formats and storage environments (legacy systems, databases, on-premises applications, citizen access portals, innumerable sensors and devices, and more) that all contribute to a siloed ecosystem and the data management challenge.

Data Architecture

Data Architecture, Architecture, Data Lake, NoSQL

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

In this episode, Satish Jayanthi explores the benefits of incorporating column-aware tooling in the data modeling process. Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake.

Data Lake

Data Lake, Machine Learning, Metadata, Data Architecture

Cloudera and Snowflake Partner to Deliver the Most Comprehensive Open Data Lakehouse

Cloudera

OCTOBER 23, 2024

In August, we wrote about how in a future where distributed data architectures are inevitable, unifying and managing operational and business metadata is critical to successfully maximizing the value of data, analytics, and AI.

Metadata

Metadata, BI, Data Lake, Business Intelligence

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

Track data files within the table along with their column statistics. Open table formats enable efficient data management and retrieval by storing these files chronologically, with a history of DDL and DML actions and an index of data file locations. It can also be integrated into major data platforms like Snowflake.

Architecture

Architecture, Systems, Data Lake, Google Cloud

Evaluating Change Data Capture Tools: A Comprehensive Guide

Data Engineering Weekly

AUGUST 6, 2024

CDC tools fuel analytical apps and mission-critical data feeds in banking and regulated industries, with use cases ranging from data synchronization, managing risk, and preventing fraud to driving personalization. Unlike data lakes, which are predominantly append-only, lakehouses support data mutation natively.

Data Lake

Data Lake, Data Warehouse, Database, Data Architecture

A Primer On Enterprise Data Curation with Todd Walter - Episode 49

Data Engineering Podcast

SEPTEMBER 23, 2018

Using the metaphor of a museum curator carefully managing the precious resources on display and in the vaults, he discusses the various layers of an enterprise data strategy. How do you define data curation? How does the size and maturity of a company affect the ways that they architect and interact with their data systems?

Data Lake

Data Lake, Data Warehouse, Data Architecture, Architecture

Modern Data Architectures Provide a Foundation for Innovation

Precisely

JUNE 6, 2023

At Precisely’s Trust ’23 conference, Chief Operating Officer Eric Yau hosted an expert panel discussion on modern data architectures. The group kicked off the session by exchanging ideas about what it means to have a modern data architecture.

Data Architecture

Data Architecture, Architecture, Metadata, Data Lake

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Key Differences Between AI Data Engineers and Traditional Data Engineers: While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts. Data Storage Solutions: As we all know, data can be stored in a variety of ways.

Data Engineering

Data Engineering, Data Engineer, Engineering, Unstructured Data

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

Enter data fabric: a data management architecture designed to serve the needs of the business, not just those of data engineers. A data fabric is an architecture and associated data products that provide consistent capabilities across a variety of endpoints spanning multiple cloud environments.

Data Architecture

Data Architecture, Architecture, Metadata, Unstructured Data

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

Data by itself has no value; it needs to be organized, standardized, and clean. In this context, data management is key to the success of an organization's data projects. One of the main aspects of correct data management is the definition of a data architecture.

Data Lake

Data Lake, Data Warehouse, Hadoop, Architecture

Let Your Analysts Build A Data Lakehouse With Cuelake

Data Engineering Podcast

AUGUST 20, 2021

Summary: Data lakes have been gaining popularity alongside an increase in their sophistication and usability. Despite improvements in performance and data architecture, they still require significant knowledge and experience to deploy and manage. Can you describe what Cuelake is and the story behind it?

Building

Building, Data Lake, Data Warehouse, SQL

5 Reasons Data Discovery Platforms Are Best For Data Lakes

Monte Carlo

APRIL 1, 2021

Over the past few years, data lakes have emerged as a must-have for the modern data stack. But while the technologies powering our access and analysis of data have matured, the mechanics behind understanding this data in a distributed environment have lagged behind. Data discovery tools and platforms can help.

Data Lake

Data Lake, Data Warehouse, Unstructured Data, Government

Modern Data Management Essentials: Exploring Data Fabric

Precisely

JULY 18, 2024

Key Takeaways: Data Fabric is a modern data architecture that facilitates seamless data access, sharing, and management across an organization. Data management recommendations and data products emerge dynamically from the fabric through automation, activation, and AI/ML analysis of metadata.

Data Management

Data Management, Management, Metadata, Database-centric

Data Lake Explained: A Comprehensive Guide to Its Architecture and Use Cases

AltexSoft

AUGUST 29, 2023

In 2010, a transformative concept took root in the realm of data storage and analytics: the data lake. The term was coined by James Dixon, Back-End Java, Data, and Business Intelligence Engineer, and it started a new era in how organizations could store, manage, and analyze their data. What is a data lake?

Data Lake

Data Lake, Architecture, IT, Amazon Web Services

Evaluating Data Observability Tools: A Comprehensive Guide

Data Engineering Weekly

SEPTEMBER 18, 2024

The Rise of Data Observability: Data observability has become increasingly critical as companies seek greater visibility into their data processes. This growing demand has found a natural synergy with the rise of the data lake. As a result, monitoring data in real time was often an afterthought.

Data Lake

Data Lake, Data Pipeline, Unstructured Data, Data

Escaping Analysis Paralysis For Your Data Platform With Data Virtualization

Data Engineering Podcast

NOVEMBER 18, 2019

Summary: With the constant evolution of technology for data management, it can seem impossible to make an informed decision about whether to build a data warehouse, build a data lake, or just leave your data wherever it currently rests. How does it influence the relevancy of data warehouses or data lakes?

Data Lake

Data Lake, Scala, Data Warehouse, Hadoop

2024 Governance Trends for Data Leaders

phData: Data Engineering

NOVEMBER 1, 2024

Monitor and Adapt: Continuously assess the impact of GenAI on data governance practices and be prepared to adapt policies as technologies evolve. Data governance is the only way to ensure those requirements are met. (Chief Technology Officer, Finance Industry)

Government

Government, Data Governance, Finance, Metadata

Supercharge Your Data Lakehouse with Apache Iceberg in Cloudera Data Platform

Cloudera

JUNE 30, 2022

Over the past decade, Cloudera has enabled multi-function analytics on data lakes through the introduction of the Hive table format and Hive ACID. Companies, on the other hand, have continued to demand highly scalable and flexible analytic engines and services on the data lake, without vendor lock-in.

Data Lake

Data Lake, Business Intelligence, Metadata, Data Warehouse

Transforming Data Architecture through Data Mesh and Striim

Striim

FEBRUARY 12, 2024

Data Mesh plays a vital role in managing data effectively and is a valuable asset for organizations looking to improve agility, intelligence, and success in their operations in today’s constantly evolving environment. It also allows experts to access data directly, making work faster and more productive.

Data Architecture

Data Architecture, Architecture, Data, Data Lake

SoftBank Selects Cloudera Data Platform to Leverage Customer Intelligence While Ensuring Data Security

Cloudera

MAY 9, 2023

New Data Lakehouse Enables Stronger Data Governance: SoftBank needed to reduce the number of workloads on its existing platform and decided to adopt Cloudera to build a data lake capable of managing data more effectively. Team members with various Cloudera capabilities provided 24-hour support for the upgrade.

Data Security

Data Security, Telecommunication, Data Lake, Data Governance

Habib Bank manages data at scale with Cloudera Data Platform

Cloudera

NOVEMBER 17, 2022

The Solution: CDP Private Cloud brings a next-generation hybrid architecture with cloud-native benefits to HBL’s data platform. HBL started its data journey in 2019, when a data lake initiative was launched to consolidate complex data sources and enable the bank to use a single version of truth for decision making.

Banking

Banking, Management, Data Lake, Professional Services

Open Source Object Storage For All Of Your Data

Data Engineering Podcast

SEPTEMBER 22, 2019

Summary: Object storage is quickly becoming the unifying layer for data-intensive applications and analytics. Modern, cloud-oriented data warehouses and data lakes both rely on the durability and ease of use that it provides. Interview: Introduction. How did you get involved in the area of data management?

AWS

AWS, Google Cloud, Cloud Storage, Data Lake

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Understanding the essential components of data pipelines is crucial for designing efficient and effective data architectures.

Data Pipeline

Data Pipeline, Designing, Data Lake, Data Warehouse

Data Orchestration For Hybrid Cloud Analytics

Data Engineering Podcast

OCTOBER 21, 2019

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Cloud

Cloud, Hadoop, Data Lake, Programming Language

Is the data warehouse going under the data lake?

ProjectPro

JULY 22, 2016

For the same cost, organizations can now store 50 times as much data in a Hadoop data lake as in a data warehouse. The data lake is gaining momentum across various organizations, and everyone wants to know how to implement a data lake and why.

Data Lake

Data Lake, Data Warehouse, Hadoop, Unstructured Data

Navigating Boundless Data Streams With The Swim Kernel

Data Engineering Podcast

SEPTEMBER 18, 2019

This was an eye-opening conversation about how stateful computation of data streams from edge devices can reduce cost and complexity as compared to batch-oriented workflows. We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, and Data Council.

Hadoop

Hadoop, Data Lake, BI, Kafka

Scale Your Analytics On The Clickhouse Data Warehouse

Data Engineering Podcast

JULY 8, 2019

It was interesting to learn about some of the custom data types and performance optimizations that are included. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Coming up this fall are the combined events of Graphorum and the Data Architecture Summit.

Data Warehouse

Data Warehouse, MySQL, Hadoop, Data Lake

A High Performance Platform For The Full Big Data Lifecycle

Data Engineering Podcast

AUGUST 19, 2019

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode.

Big Data

Big Data, Hadoop, Data Lake, Media

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

The concept of the data mesh architecture is not entirely new; its conceptual origins are rooted in the microservices architecture, its design principles (i.e., the need to integrate multiple “point solutions” used in a data ecosystem), and organizational reasons. How CDF enables successful Data Mesh Architectures.

Architecture

Architecture, Metadata, Kafka, Government

Choose Both: Data Fabric and Data Lakehouse

Cloudera

SEPTEMBER 12, 2022

Combining and analyzing both structured and unstructured data is a whole new challenge to come to grips with, let alone doing so across different infrastructures. Both obstacles can be overcome using modern data architectures, specifically data fabric and data lakehouse. Unified data fabric.

Unstructured Data

Unstructured Data, Data Lake, Data Architecture, Data

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem, bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud and as they adopt more converged architectures like the Lakehouse. Simplify data management. 1: Multi-function analytics. The *Any*-house.

Metadata

Metadata, Data Architecture, BI, Machine Learning

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

To get a better understanding of a data architect’s role, let’s clear up what data architecture is. Data architecture is the organization and design of how data is collected, transformed, integrated, stored, and used by a company. Sample of a high-level data architecture blueprint for Azure BI programs.

Data Architect

Data Architect, Certification, Generalist, Big Data
