Data Architecture and Data Process - Data Engineering Digest

Laying the Foundation for Modern Data Architecture

Cloudera

MAY 28, 2024

It’s not enough for businesses to implement and maintain a data architecture. The unpredictability of market shifts and the evolving use of new technologies means businesses need more data they can trust than ever to stay agile and make the right decisions.

Data Architecture

Data Architecture Architecture Data Lake Data Warehouse

Centralize Your Data Processes With a DataOps Process Hub

DataKitchen

NOVEMBER 4, 2021

Data organizations often have a mix of centralized and decentralized activity. DataOps concerns itself with the complex flow of data across teams, data centers and organizational boundaries. It expands beyond tools and data architecture and views the data organization from the perspective of its processes and workflows.

Process

Process Data Process Pharmaceutical Data Lake

Five Ways A Modern Data Architecture Can Reduce Costs in Telco

Cloudera

JUNE 27, 2023

The way to achieve this balance is by moving to a modern data architecture (MDA) that makes it easier to manage, integrate, and govern large volumes of distributed data. When you deploy a platform that supports MDA you can consolidate other systems, like legacy data mediation and disparate data storage solutions.

Data Architecture

Data Architecture Architecture Government Data Governance

Telco Enterprise Data Platforms: Key Success Factors in Building for an AI Future

Cloudera

DECEMBER 17, 2024

The introduction of these faster, more powerful networks has triggered an explosion of data, which needs to be processed in real time to meet customer demands. Traditional data architectures struggle to handle these workloads, and without a robust, scalable hybrid data platform, the risk of falling behind is real.

Building

Building Telecommunication Data Architecture Architecture

How Retail and Media Leaders Drive Customer Satisfaction and Profits with Data and AI

Snowflake

MARCH 19, 2025

Attendees will discover how to accelerate their critical business workflows with the right data, technology and ecosystem access. Explore AI and unstructured data processing use cases with proven ROI: This year, retailers and brands will face intense pressure to demonstrate tangible returns on their AI investments.

Retail

Retail Media Entertainment Unstructured Data

Data Fabric: The Future of Data Architecture

Monte Carlo

FEBRUARY 21, 2023

Today, as data sources become increasingly varied, data management becomes more complex, and agility and scalability become essential traits for data leaders, data fabric is quickly becoming the future of data architecture. If data fabric is the future, how can you get your organization up-to-speed?

Data Architecture

Data Architecture Architecture Metadata Unstructured Data

Trends and Takeaways from Banking and Payments’ Event of the Year

Snowflake

NOVEMBER 11, 2024

Data and AI architecture matter: “Before focusing on AI/ML use cases such as hyper personalization and fraud prevention, it is important that the data and data architecture are organized and structured in a way which meets the requirements and standards of the local regulators around the world.”

Banking

Banking Finance Retail Food

What is an AI Data Engineer? 4 Important Skills, Responsibilities, & Tools

Monte Carlo

OCTOBER 31, 2024

Key Differences Between AI Data Engineers and Traditional Data Engineers: While traditional data engineers and AI data engineers have similar responsibilities, they ultimately differ in where they focus their efforts.

Data Engineering

Data Engineering Data Engineer Engineering Unstructured Data

Ramp Simplifies Data Architecture Management, Cuts Costs, and Delivers Market Insights to Customers at Scale

Snowflake

JANUARY 30, 2023

After four months of testing, du Toit and his team had moved one of its databases to Snowflake for bulk data processing. Ramp fetches and delivers data into S3 buckets and uses dbt to transform data at each stage. “Snowflake quickly beat the other vendors we were looking at in terms of price and features.”

Data Architecture

Data Architecture Architecture Management Datasets

Why Open Table Format Architecture is Essential for Modern Data Systems

phData: Data Engineering

NOVEMBER 8, 2024

Additionally, the optimized query execution and data pruning features reduce the compute cost associated with querying large datasets. Scaling data infrastructure while maintaining efficiency is one of the primary challenges of modern data architecture.

Architecture

Architecture Systems Data Lake Google Cloud

Microsoft Fabric vs. Snowflake: Key Differences You Need to Know

Edureka

APRIL 22, 2025

Its multi-cluster shared data architecture is one of its primary features. Additionally, Fabric has deep integrations with Power BI for visualization and Microsoft Purview for governance, resulting in a smooth experience for both business users and data professionals.

BI

BI Pipeline-centric Data Lake Google Cloud

Integrating Striim with BigQuery ML: Real-time Data Processing for Machine Learning

Striim

NOVEMBER 17, 2023

Striim serves as a real-time data integration platform that seamlessly and continuously moves data from diverse data sources to destinations such as cloud databases, messaging systems, and data warehouses, making it a vital component in modern data architectures.

Machine Learning

Machine Learning Data Process PostgreSQL Process

The Good and the Bad of Apache Spark Big Data Processing

AltexSoft

JULY 18, 2023

It allows data scientists to analyze large datasets and interactively run jobs on them from the R shell. Big data processing. Despite these nuances, Spark’s high-speed processing capabilities make it an attractive choice for big data processing tasks. Here are some of the possible use cases.

Big Data

Big Data Data Process Process Hadoop

The Race For Data Quality in a Medallion Architecture

DataKitchen

NOVEMBER 5, 2024

The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. It sounds great, but how do you prove the data is correct at each layer? How do you ensure data quality in every layer?

Architecture

Architecture Raw Data Pipeline-centric Data Ingestion

5 Advantages of Real-Time ETL for Snowflake

Striim

MARCH 21, 2025

Before loading the data to Snowflake with sub-second latency, Striim allows users to perform in-line transformations, including denormalization, filtering, enrichment and masking, using a SQL-based language. In-flight data processing reduces the time needed for data preparation as it delivers the data in a consumable form.

Data Warehouse

Data Warehouse MongoDB MySQL Hadoop

Back to the Financial Regulatory Future

Cloudera

FEBRUARY 15, 2024

Seeing the future in a modern data architecture: The key to successfully navigating these challenges lies in the adoption of a modern data architecture.

Insurance

Insurance Banking Data Architecture Data Ingestion

Data Engineering: A Formula 1-inspired Guide for Beginners

Towards Data Science

DECEMBER 4, 2023

Anyways, I wasn’t paying enough attention during university classes, and today I’ll walk you through data layers using — guess what — an example. Business Scenario & Data Architecture Imagine this: next year, a new team on the grid, Red Thunder Racing, will call us (yes, me and you) to set up their new data infrastructure.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Thoughts on Amazon Express One and its impact in Data Infrastructure

Data Engineering Weekly

DECEMBER 2, 2023

The Current State of the Data Architecture: S3 intelligent tiered storage provides a fine balance between the cost and the duration of the data retention. However, the real-time insight on accessing the recent data remains a big challenge. The combination of stream processing + OLAP storage like Pinot. What is Next?

IT

IT BI AWS Kafka

Four Ways Telcos Can Realize Data-Driven Transformation

Cloudera

OCTOBER 19, 2023

While navigating so many simultaneous data-dependent transformations, they must balance the need to level up their data management practices—accelerating the rate at which they ingest, manage, prepare, and analyze data—with that of governing this data.

Telecommunication

Telecommunication Data Architecture Government Architecture

A Guide to Data Pipelines (And How to Design One From Scratch)

Striim

SEPTEMBER 11, 2024

Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Understanding the essential components of data pipelines is crucial for designing efficient and effective data architectures.

Data Pipeline

Data Pipeline Designing Data Lake Data Warehouse

On-Prem vs. The Cloud: Key Considerations

phData: Data Engineering

FEBRUARY 21, 2025

On-prem data warehouses can provide lower-latency solutions for critical applications that require high performance. Many companies may choose an on-prem data warehousing solution for quicker data processing to enable business decisions. Data integrations and pipelines can also impact latency.

Cloud

Cloud Data Warehouse Amazon Web Services Data Ingestion

1. Streamlining Membership Data Engineering at Netflix with Psyberg

Netflix Tech

NOVEMBER 14, 2023

In this context, managing the data, especially when it arrives late, can present a substantial challenge! In this three-part blog post series, we introduce you to Psyberg, our incremental data processing framework designed to tackle such challenges! Let’s dive in! To solve these problems, we came up with Psyberg!

Data Engineering

Data Engineering Data Engineer Engineering Metadata

How Cloudera Data Flow Enables Successful Data Mesh Architectures

Cloudera

OCTOBER 7, 2021

Those decentralization efforts appeared under different monikers through time, e.g., data marts versus data warehousing implementations (a popular architectural debate in the era of structured data) then enterprise-wide data lakes versus smaller, typically BU-Specific, “data ponds”.

Architecture

Architecture Metadata Kafka Government

IBM Technology Chooses Cloudera as its Preferred Partner for Addressing Real Time Data Movement Using Kafka

Cloudera

SEPTEMBER 26, 2023

Organizations increasingly rely on streaming data sources not only to bring data into the enterprise but also to perform streaming analytics that accelerate the process of being able to get value from the data early in its lifecycle.

Kafka

Kafka Technology IT Government

Open, Interoperable Storage with Iceberg Tables, Now Generally Available

Snowflake

JUNE 21, 2024

Data infrastructure should serve the current set of business needs and be able to scale and evolve with change. With Snowflake and Iceberg tables, customers have the ability to adapt to these changes and deploy their choice of data architecture, all while maintaining leading security, performance and simplicity.

Data Lake

Data Lake BI Business Intelligence Metadata

5 Reasons to Use Apache Iceberg on Cloudera Data Platform (CDP)

Cloudera

MARCH 23, 2022

In fact, we recently announced the integration with our cloud ecosystem bringing the benefits of Iceberg to enterprises as they make their journey to the public cloud, and as they adopt more converged architectures like the Lakehouse. Change data capture (CDC). 1: Multi-function analytics. Flexible and open file formats.

Metadata

Metadata Data Architecture Machine Learning BI

Data Architect: Role Description, Skills, Certifications and When to Hire

AltexSoft

FEBRUARY 11, 2023

This specialist works closely with people on both business and IT sides of a company to understand the current needs of the stakeholders and help them unlock the full potential of data. To get a better understanding of a data architect’s role, let’s clear up what data architecture is.

Data Architect

Data Architect Certification Generalist Big Data

Hands-On Introduction to Delta Lake with (py)Spark

Towards Data Science

FEBRUARY 15, 2023

In this context, data management in an organization is a key point for the success of its projects involving data. One of the main aspects of correct data management is the definition of a data architecture.

Data Lake

Data Lake Data Warehouse Hadoop Architecture

Announcing the 2020 Data Impact Award Winners

Cloudera

NOVEMBER 18, 2020

The technological linchpin of its digital transformation has been its Enterprise Data Architecture & Governance platform. It hosts over 150 big data analytics sandboxes across the region with over 200 users utilizing the sandbox for data discovery.

Medical

Medical Banking Telecommunication Government

Apache Kafka Vs Apache Spark: Know the Differences

Knowledge Hut

MAY 3, 2024

A new breed of ‘Fast Data’ architectures has evolved to be stream-oriented, where data is processed as it arrives, providing businesses with a competitive advantage. Dean Wampler (Renowned author of many big data technology-related books) Dean Wampler makes an important point in one of his webinars. Dataflow 4.

Kafka

Kafka Scala Java Amazon Web Services

The Future of the Data Lakehouse – Open

Cloudera

JUNE 18, 2022

In recent years, the term “data lakehouse” was coined to describe this architectural pattern of tabular analytics over data in the data lake. In a rush to own this term, many vendors have lost sight of the fact that the openness of a data architecture is what guarantees its durability and longevity.

Data Lake

Data Lake Data Warehouse BI SQL

Keeping Your Data Warehouse In Order With DataForm

Data Engineering Podcast

OCTOBER 14, 2019

We have partnered with organizations such as O’Reilly Media, Dataversity, Corinium Global Intelligence, Alluxio, and Data Council. Upcoming events include the combined events of the Data Architecture Summit and Graphorum, the Data Orchestration Summit, and Data Council in NYC.

Data Warehouse

Data Warehouse PostgreSQL AWS Programming Language

Visibility and Transparency

Cloudera

MAY 8, 2023

Out of the box, Cloudera Data Platform (CDP) performs superbly, but over time, if data architecture, data engineering, and DevOps best practices are not maintained, you can get stuck maintaining the wild, wild west. In this six-part series, we’re focused on improving the health of your environment.

Professional Services

Professional Services Data Architecture Data Engineering Data Engineer

DataOps Architecture: 5 Key Components and How to Get Started

Databand.ai

AUGUST 30, 2023

A DataOps architecture is the structural foundation that supports the implementation of DataOps principles within an organization. It encompasses the systems, tools, and processes that enable businesses to manage their data more efficiently and effectively. As a result, they can be slow, inefficient, and prone to errors.

Architecture

Architecture Data Ingestion Data Governance Data Cleanse

Data Engineering Weekly #152

Data Engineering Weekly

DECEMBER 10, 2023

From a data architecture point of view, this enables a lot of flexibility in integrating multiple systems. Netflix: Diving Deeper into Psyberg: Stateless vs Stateful Data Processing. Netflix wrote a deep-dive article about Psyberg’s incremental data processing pipeline framework.

Data Engineering

Data Engineering Data Engineer Engineering Metadata

Getting the Most From Your Modern Data Platform: A Three-Phase Approach

Snowflake

JULY 22, 2024

Determining an architecture and a scalable data model to integrate more source systems in the future. The benefits of migrating to Snowflake start with its multi-cluster shared data architecture, which enables scalability and high performance. Features such as auto-suspend and a pay-as-you-go model help you save costs.

Government

Government Data Cloud Hadoop

Evaluating Data Observability Tools: A Comprehensive Guide

Data Engineering Weekly

SEPTEMBER 18, 2024

The Rise of Data Observability: Data observability has become increasingly critical as companies seek greater visibility into their data processes. This growing demand has found a natural synergy with the rise of the data lake.

Data Lake

Data Lake Data Pipeline Unstructured Data Data

DataKitchen’s 2020 Honors & Awards

DataKitchen

DECEMBER 30, 2020

Why we are watching them: DataKitchen aims to automate and coordinate people, tools and environments of an entire data analytic organization. The company also works to help organizations apply Agile, DevOps and Lean principles to their data processes with the DataOps Cookbook and manifesto.

Manufacturing

Manufacturing Big Data Data Science Data Pipeline

Securely Deploy Custom Apps and Models with Snowpark Container Services, Now Generally Available

Snowflake

AUGUST 1, 2024

Snowpark Container Services gives developers the ability to bring any containerized workload to their data that is already secure in Snowflake — ReactJS front-ends, open source large language models (LLMs), distributed data processing pipelines, you name it.

Deep Learning

Deep Learning Government AWS Architecture

Connecting the Data Lifecycle

Cloudera

NOVEMBER 29, 2021

eMAG, a Romania-based retailer seen as a pioneer in e-commerce, was struggling to manage the tremendously large amount of data coming in every second. The company needed a modern data architecture to manage the growing traffic effectively. This creates long delays in data processing, which halts efficient functioning.

Data Lake

Data Lake Telecommunication Retail Data

Sovereign AI, Redpanda vs Apache Kafka, The Future of Data Streaming with Alex Gallego (CEO of Redpanda)

Striim

AUGUST 5, 2024

Discover how Alex’s journey from building racing motorcycles and tattoo machines as a child led him to revolutionize stream processing and cloud infrastructure. Explore the intricate challenges and groundbreaking innovations in data storage and streaming.

Kafka

Kafka Data Storage Architecture Data Architecture

Direct Integration: Kinesis Firehose with Snowpipe Streaming

Cloudyard

JULY 22, 2024

Previously, data engineers used Kinesis Firehose to transfer data into blob storage (S3) and then load it into Snowflake using either Snowpipe or batch processing. This introduced latency in the data pipeline for near real-time data processing.

AWS

AWS Data Ingestion Data Architecture Architecture

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

As organizations seek greater value from their data, data architectures are evolving to meet the demand — and table formats are no exception. Apache ORC (Optimized Row Columnar) : In 2013, ORC was developed for the Hadoop ecosystem to improve the efficiency of data storage and retrieval.

Data Lake

Data Lake Metadata Hadoop Data Governance
