The Race For Data Quality In A Medallion Architecture The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. It sounds great, but how do you prove the data is correct at each layer?
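One way to make "prove the data is correct at each layer" concrete is to run assertions at every layer boundary. The sketch below is a hypothetical PySpark example, not from the article; the table and column names (bronze.orders, order_id, order_ts, amount) are assumptions chosen purely for illustration.

```python
# A minimal sketch of a quality gate between the bronze and silver layers
# of a medallion pipeline. Table and column names are illustrative only.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("silver-quality-gate").getOrCreate()

bronze = spark.read.table("bronze.orders")          # raw, as-ingested data
silver = spark.read.table("silver.orders_cleaned")  # deduplicated, typed data

# 1. Row counts should reconcile: silver may drop duplicates but never invent rows.
assert silver.count() <= bronze.count(), "silver has more rows than bronze"

# 2. Business keys must be unique after cleaning.
dupes = silver.groupBy("order_id").count().filter(F.col("count") > 1)
assert dupes.count() == 0, "duplicate order_id values in silver"

# 3. Mandatory columns must be non-null before the gold layer consumes them.
nulls = silver.filter(F.col("order_ts").isNull() | F.col("amount").isNull())
assert nulls.count() == 0, "null order_ts/amount rows reached silver"
```

Wiring checks like these into the job that promotes data from one layer to the next means a bad batch fails loudly at the boundary instead of propagating into the gold layer.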
The world we live in today presents larger datasets, more complex data, and diverse needs, all of which call for efficient, scalable data systems. Open Table Format (OTF) architecture now provides a solution for efficient data storage, management, and processing while ensuring compatibility across different platforms.
Behind the scenes, hundreds of ML engineers iteratively improve a wide range of recommendation engines that power Pinterest, processing petabytes of data and training thousands of models using hundreds of GPUs. As model architecture building blocks (e.g. …) This is what we commonly refer to as Last Mile Data Processing.
A data ingestion architecture is the technical blueprint that ensures that every pulse of your organization’s data ecosystem brings critical information to where it’s needed most. A typical data ingestion flow.
When you deconstruct the core database architecture, deep in the heart of it you will find a single component performing two distinct, competing functions: real-time data ingestion and query serving. When data ingestion has a flash-flood moment, your queries will slow down or time out, making your application flaky.
At the heart of every data-driven decision is a deceptively simple question: How do you get the right data to the right place at the right time? The growing field of data ingestion tools offers a range of answers, each with implications to ponder. (Image courtesy of Fivetran.)
Data ingestion is the process of collecting data from various sources and moving it to your data warehouse or lake for processing and analysis. It is the first step in modern data management workflows. Table of Contents: What is Data Ingestion?
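As a minimal illustration of that definition, the sketch below pulls a CSV export from a source system and lands it in a warehouse table. It is a hypothetical example: the source URL and table name are made up, and SQLite stands in for a real warehouse connection.

```python
# Illustrative batch-ingestion step: fetch a source export and land it raw.
import pandas as pd
import sqlite3  # stand-in for a real warehouse connection

# Hypothetical source export (replace with your own system's endpoint or file).
df = pd.read_csv("https://example.com/exports/daily_orders.csv")

# Basic lineage metadata recorded at load time.
df["ingested_at"] = pd.Timestamp.now(tz="UTC").isoformat()

with sqlite3.connect("warehouse.db") as conn:
    df.to_sql("raw_orders", conn, if_exists="append", index=False)
```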
Conventional batch processing techniques fall short of the demands of today's commercial environment. This is where real-time data ingestion comes into the picture. Data is collected from various sources such as social media feeds, website interactions, and log files, and processed as it arrives.
The author emphasizes the importance of mastering state management, understanding "local first" data processing (prioritizing single-node solutions before distributed systems), and leveraging an asset graph approach for data pipelines. The article also traces the evolution to Nuage 3.0 and highlights Nuage 3.0's …
It makes me think: what could the impact of a similar system design be in a Lakehouse architecture? We all know that data freshness plays a critical role in the performance of a Lakehouse. Apache Hudi, for example, introduces an indexing technique to the Lakehouse.
Complete Guide to Data Ingestion: Types, Process, and Best Practices Helen Soloveichik July 19, 2023 What Is Data Ingestion? Data ingestion is the process of obtaining, importing, and processing data for later use or storage in a database.
The company quickly realized maintaining 10 years’ worth of production data while enabling real-time data ingestion led to an unscalable situation that would have necessitated a data lake. Data scientists also benefited from a scalable environment to build machine learning models without fear of system crashes.
DataOps Architecture: 5 Key Components and How to Get Started Ryan Yackel August 30, 2023 What Is DataOps Architecture? DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. Traditional data management approaches, by contrast, can be slow, inefficient, and prone to errors.
In the second part, we will focus on architectural patterns to implement data quality from a data contract perspective. Why is Data Quality Expensive? I won’t bore you with the importance of data quality in this blog. But before doing that, let's revisit some of the basic theory of the data pipeline.
Data pipelines are the backbone of your business’s data architecture. Implementing a robust and scalable pipeline ensures you can effectively manage, analyze, and organize your growing data. Benjamin Kennedy, Cloud Solutions Architect at Striim, emphasizes the outcome-driven nature of data pipelines.
On-prem data warehouses can provide lower-latency solutions for critical applications that require high performance. Many companies may choose an on-prem data warehousing solution for quicker data processing to enable business decisions. Data integrations and pipelines can also impact latency.
Tools like Python’s requests library or ETL/ELT tools can facilitate data enrichment by automating the retrieval and merging of external data. Read More: Discover how to build a data pipeline in 6 steps. Data Integration: Data integration involves combining data from different sources into a single, unified view.
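To make the enrichment idea concrete, here is a small hedged sketch using the requests library. The API endpoint, its parameters, and the returned fields (industry, employee_count) are assumptions invented for illustration; any real enrichment provider will have its own contract.

```python
# Hypothetical enrichment step: augment customer records with attributes
# fetched from an external API, then merge them into the local records.
import requests

customers = [
    {"id": 1, "company_domain": "example.com"},
    {"id": 2, "company_domain": "example.org"},
]

for customer in customers:
    resp = requests.get(
        "https://api.example.com/v1/companies",          # hypothetical enrichment API
        params={"domain": customer["company_domain"]},
        timeout=10,
    )
    if resp.ok:
        extra = resp.json()
        # Merge only the external attributes we care about.
        customer["industry"] = extra.get("industry")
        customer["employee_count"] = extra.get("employee_count")
```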
Summary Real-time data processing has steadily been gaining adoption due to advances in the accessibility of the technologies involved. To bring streaming data in reach of application engineers Matteo Pelati helped to create Dozer. What are the architectural "-ilities" that you are trying to optimize for?
In this post, we will help you quickly level up your overall knowledge of data pipeline architecture by reviewing: Table of Contents What is data pipeline architecture? Why is data pipeline architecture important?
The Azure Databricks architecture is designed to be a robust framework for data analytics on the Microsoft Azure platform. The post walks through the high-level architecture, a conclusion, and frequently asked questions. Azure Databricks simplifies the data engineering and data science workflows.
It allows real-time data ingestion, processing, model deployment and monitoring in a reliable and scalable way. This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers and production engineers. So how can the Kafka ecosystem help here?
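One way to picture that impedance-mismatch fix is a small scoring service that sits between topics: a consumer reads raw events, applies a model a data scientist handed over, and publishes scores for production systems to act on. The sketch below uses the kafka-python client; the topic names, the feature layout, and the stand-in rule-based "model" are assumptions for illustration only.

```python
# Hedged sketch: bridge a data-science model into production via Kafka topics.
import json
from kafka import KafkaConsumer, KafkaProducer  # kafka-python client

consumer = KafkaConsumer(
    "payments.raw",                      # hypothetical input topic of raw events
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def score(event: dict) -> float:
    # Placeholder for a real model (e.g. loaded from a model registry).
    return 0.9 if event.get("amount", 0) > 10_000 else 0.1

for msg in consumer:
    event = msg.value
    # Publish scores to a separate topic that downstream services consume.
    producer.send("payments.scored", {"id": event.get("id"), "fraud_score": score(event)})
```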
It calls out that Cloudera DataFlow “includes streaming flow and streaming data processing unified with Cloudera Data Platform”. While we supported multiple streaming engines, we could see that Flink was gaining a lot of traction in the industry and in the community.
As companies become more data-driven, the scope and complexity of data pipelines inevitably expand. Without a well-planned architecture, these pipelines can quickly become unmanageable, often reaching a point where efficiency and transparency take a backseat, leading to operational chaos. What Is Data Pipeline Architecture?
While the Internet of Things (IoT) represents a significant opportunity, IoT architectures are often rigid, complex to implement, costly, and create a multitude of challenges for organizations. An Open, Modular Architecture for IoT. Key components of the end-to-end architecture.
While we walk through the steps one by one from data ingestion to analysis, we will also demonstrate how Ozone can serve as an ‘S3’-compatible object store. Learn more about the impacts of global data sharing in this blog, The Ethics of Data Exchange. Data ingestion through ‘s3’. Ozone Namespace Overview.
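Because Ozone exposes an S3-compatible gateway, a standard S3 client can be pointed at it for ingestion. The following is a hedged sketch using boto3; the endpoint URL, bucket name, file name, and credentials are placeholders and depend on how your Ozone S3 Gateway is configured.

```python
# Illustrative upload to Ozone through its S3-compatible gateway.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://ozone-s3g.example.com:9878",  # hypothetical Ozone S3 Gateway
    aws_access_key_id="OZONE_ACCESS_KEY",
    aws_secret_access_key="OZONE_SECRET_KEY",
)

# Land a raw file in the bucket, then confirm it is visible under the prefix.
s3.upload_file("events-2023-07-01.csv", "analytics-bucket", "raw/events-2023-07-01.csv")
print(s3.list_objects_v2(Bucket="analytics-bucket", Prefix="raw/")["KeyCount"])
```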
Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. As data is expanding exponentially, organizations struggle to harness digital information's power for different business use cases. What is a Big Data Pipeline?
Data lakes emerged as expansive reservoirs where raw data in its most natural state could commingle freely, offering unprecedented flexibility and scalability. This article explains what a data lake is, its architecture, and diverse use cases. Data warehouse vs. data lake in a nutshell.
Some years ago he wrote three articles defining the data engineering field. Some concepts: when doing data engineering you touch a lot of different concepts. The main difference between the two is that your computation resides in your warehouse with SQL, rather than outside it with a programming language loading data into memory.
The Rise of Data Observability Data observability has become increasingly critical as companies seek greater visibility into their data processes. This growing demand has found a natural synergy with the rise of the data lake.
Figure 2: Questions answered by precision medicine. Snowflake and FAIR in the world of precision medicine and biomedical research: Cloud-based big data technologies are not new for large-scale data processing. A conceptual architecture illustrating this is shown in Figure 3.
Data infrastructure that makes light work of complex tasks Built as a connected application from day one, the anecdotes Compliance OS uses the Snowflake Data Cloud for data ingestion and modeling, including a single cybersecurity data lake where all data can be analyzed within Snowflake.
Here’s what implementing an open data lakehouse with Cloudera delivers: Integration of Data Lake and Data Warehouse : An open data lakehouse brings together the best of both worlds by integrating the storage flexibility of a data lake with the query performance and structured querying capabilities of a data warehouse.
“Schedule data ingestion, processing, model training and insight generation to enhance efficiency and consistency in your data processes.” —Venky Yerneni, Manager, Solution Architecture, Weights & Biases. Note: Snowflake Notebooks currently supports Python 3.9, with future updates coming soon.
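For readers who want a concrete picture of scheduling recurring pipeline work in Snowflake, here is a hedged sketch that creates a task on a cron schedule via the Python connector. This is generic Snowflake task DDL, not the Notebooks scheduling feature the excerpt describes; the connection parameters, task name, and the refresh_training_features() procedure are all placeholders.

```python
# Hedged sketch: schedule a nightly pipeline step with a Snowflake task.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="***",
    warehouse="TRANSFORM_WH", database="ANALYTICS", schema="PIPELINES",
)

conn.cursor().execute("""
    CREATE OR REPLACE TASK nightly_feature_refresh
      WAREHOUSE = TRANSFORM_WH
      SCHEDULE = 'USING CRON 0 2 * * * UTC'   -- run at 02:00 UTC every day
    AS
      CALL refresh_training_features()        -- hypothetical stored procedure
""")

# Tasks are created suspended; resume to start the schedule.
conn.cursor().execute("ALTER TASK nightly_feature_refresh RESUME")
```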
Features of PySpark, The PySpark Architecture, Popular PySpark Libraries, PySpark Projects to Practice in 2022, Wrapping Up, FAQs (Is PySpark easy to learn?). Here’s What You Need to Know About PySpark: This blog will take you through the basics of PySpark, the PySpark architecture, and a few popular PySpark libraries, among other things.
Whether you're working with semi-structured, structured, streaming, or machine learning data, Apache Spark is a fast, easy-to-use framework that allows you to solve various complex data issues. Many traditional stream processing systems use a continuous operator model to process data.
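By contrast with the continuous operator model, Spark Structured Streaming plans each trigger as a small batch over newly arrived data. The sketch below is the classic word-count example with a micro-batch trigger; the socket source on localhost:9999 and the console sink are for demonstration only.

```python
# Illustrative Structured Streaming job: micro-batches instead of long-lived
# per-step continuous operators.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("wordcount-stream").getOrCreate()

# Read lines from a local socket (e.g. started with `nc -lk 9999`).
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Split lines into words and keep a running count per word.
counts = (lines
          .select(F.explode(F.split(F.col("value"), " ")).alias("word"))
          .groupBy("word")
          .count())

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="10 seconds")  # micro-batch every 10 seconds
         .start())
query.awaitTermination()
```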
Authors: Bingfeng Xia and Xinyu Liu Background At LinkedIn, Apache Beam plays a pivotal role in stream processing infrastructures that process over 4 trillion events daily through more than 3,000 pipelines across multiple production data centers.
Most scenarios require a reliable, scalable, and secure end-to-end integration that enables bidirectional communication and data processing in real time. Let’s now take a look at the 10,000-foot view of a robust IoT integration architecture. End-to-end enterprise integration architecture. Inability to reprocess events.
In the early days, many companies simply used Apache Kafka ® for data ingestion into Hadoop or another data lake. Apache Kafka is an event streaming platform that combines messages, storage, and data processing. Kafka Connect is a core component in event streaming architecture.
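To show what Kafka Connect looks like in practice, here is a hedged sketch that registers a sink connector through the Connect REST API (which listens on port 8083 by default). The FileStreamSinkConnector used here ships with Kafka and is chosen purely for illustration; a real lake ingestion setup would typically use an HDFS or S3 sink connector instead, and the topic, file path, and connector name are assumptions.

```python
# Illustrative connector registration against a local Kafka Connect worker.
import requests

connector_config = {
    "name": "orders-file-sink",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSinkConnector",
        "tasks.max": "1",
        "topics": "orders",          # topic to drain into the sink
        "file": "/tmp/orders.out",   # destination file for this toy sink
    },
}

resp = requests.post(
    "http://localhost:8083/connectors",   # default Kafka Connect REST endpoint
    json=connector_config,
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["name"], "created")
```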
They’re betting their business on it and on the data pipelines that run it continuing to work. Context is crucial (and often lacking). A major cause of data quality issues and pipeline failures is transformations within those pipelines. Most data architecture today is opaque: you can’t tell what’s happening inside.
Use cases like fraud detection, network threat analysis, manufacturing intelligence, commerce optimization, real-time offers, instantaneous loan approvals, and more are now possible by moving the data processing components up the stream to address these real-time needs. Faster data ingestion: streaming ingestion pipelines.
Efficient data pipelines are necessary for AI systems to perform well, since AI models need clean, organized, and fresh datasets in order to learn and predict accurately. Automation in modern data engineering has a new dimension. It ensures a seamless flow of data within the pipelines with minimal human contact.
Comparison of Snowflake Copilot and Cortex Analyst. Cortex Search: Deliver efficient and accurate enterprise-grade document search and chatbots. Cortex Search is a fully managed search solution that offers a rich set of capabilities to index and query unstructured data and documents. Our state-of-the-art hybrid search enables better results.
The fact that ETL tools evolved to expose graphical interfaces seems like a detour in the history of data processing, and would certainly make for an interesting blog post of its own. Sure, there’s a need to abstract the complexity of data processing, computation and storage.