The Race For Data Quality In A Medallion Architecture
The Medallion architecture pattern is gaining traction among data teams. It is a layered approach to managing and transforming data. By systematically moving data through these layers, the Medallion architecture enhances the data structure in a data lakehouse environment.
Data Management
A tutorial on how to use VDK to perform batch data processing. Versatile Data Kit (VDK) is an open-source data ingestion and processing framework designed to simplify data management complexities.
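As a taste of what such a tutorial covers, here is a minimal sketch of a VDK ingestion step, assuming the vdk-core package is installed and the job is executed with the vdk CLI; the REST endpoint and destination table are hypothetical.

```python
# A minimal sketch of a VDK data job step (run with: vdk run <job-directory>).
# The endpoint URL and table name are hypothetical.
import requests
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # Fetch a batch of records from a (hypothetical) REST endpoint.
    response = requests.get("https://example.com/api/orders")
    response.raise_for_status()

    # Hand each record to VDK's configured ingestion target.
    for record in response.json():
        job_input.send_object_for_ingestion(
            payload=record,
            destination_table="orders_raw",
        )
```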
Introduction
In the fast-evolving world of data integration, Striim's collaboration with Snowflake stands as a beacon of innovation and efficiency. Handling Peak Loads: During times of peak data load, it might be beneficial to temporarily adjust the batch policy to handle the increased load more efficiently.
At the heart of every data-driven decision is a deceptively simple question: How do you get the right data to the right place at the right time? The growing field of data ingestion tools offers a range of answers, each with implications to ponder.
A data ingestion architecture is the technical blueprint that ensures that every pulse of your organization's data ecosystem brings critical information to where it's needed most. (Figure: a typical data ingestion flow.)
Popular Data Ingestion Tools
Choosing the right ingestion technology is key to a successful architecture.
Conventional batch processing techniques fall short of the demands of today's commercial environment. This is where real-time data ingestion comes into the picture. Data is collected from various sources, such as social media feeds, website interactions, and log files, and processed as it arrives.
It is important to note that normalization often overlaps with the data cleaning process, as it helps to ensure consistency in data formats, particularly when dealing with different sources or inconsistent units.
Data Validation
Data validation ensures that the data meets specific criteria before processing.
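To illustrate, here is a minimal sketch of rule-based validation applied before processing, assuming records arrive as Python dictionaries; the field names and rules are purely illustrative.

```python
# A minimal sketch of pre-processing validation; fields and rules are
# illustrative, not a prescribed schema.
def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    if record.get("currency") not in {"USD", "EUR", "GBP"}:
        errors.append("unsupported currency")
    return errors


records = [
    {"id": "a1", "amount": 9.99, "currency": "USD"},
    {"id": "", "amount": -5, "currency": "XYZ"},
]
# Only records that pass every rule continue downstream.
valid = [r for r in records if not validate_record(r)]
```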
For your organization's data integration and streaming initiatives to succeed, meeting latency requirements is crucial. Low latency, defined by the rapid transmission of data with minimal delay, is essential for maximizing the effectiveness of your data strategy. Here's what you need to know.
In today's fast-paced world, staying ahead of the competition requires making decisions informed by the freshest data available, and quickly. That's where real-time data integration comes into play. What is Real-Time Data Integration and Why is it Important?
On-prem data warehouses can offer lower latency for critical applications that demand high performance. Many companies choose an on-prem data warehousing solution for quicker data processing to enable business decisions. Data integrations and pipelines can also impact latency.
The company quickly realized that maintaining 10 years' worth of production data while enabling real-time data ingestion led to an unscalable situation that would have necessitated a data lake. Snowflake's separate clusters for ETL, reporting, and data science eliminated resource contention.
Data Collection/Ingestion
The next component in the data pipeline is the ingestion layer, which is responsible for collecting and bringing data into the pipeline. By efficiently handling data ingestion, this component sets the stage for effective data processing and analysis.
Streaming and Real-Time Data Processing
As organizations increasingly demand real-time data insights, Open Table Formats offer strong support for streaming data processing, allowing organizations to seamlessly merge real-time and batch data.
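As one way this can look in practice, here is a minimal sketch of a streaming write into an open table format, assuming a PySpark environment with the Delta Lake package configured; the Kafka broker, topic, and storage paths are hypothetical.

```python
# A minimal sketch: stream events from Kafka into a Delta table that
# batch jobs can also read, merging real-time and batch views.
# Broker address, topic, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("streaming-to-delta")
    .getOrCreate()
)

# Read a stream of events from Kafka.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Append the stream into a Delta table; downstream batch queries read
# the same table, so both views stay in one place.
query = (
    events.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    .writeStream
    .format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start("/tmp/tables/events")
)
```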
Improved data accessibility: By providing self-service data access and analytics, modern data architecture empowers business users and data analysts to analyze and visualize data, enabling faster decision-making and response to regulatory requirements.
(Figure 2: Questions answered by precision medicine.)
Snowflake and FAIR in the world of precision medicine and biomedical research
Cloud-based big data technologies are not new for large-scale data processing. A conceptual architecture illustrating this is shown in Figure 3.
While Cloudera Flow Management has been eagerly awaited by our Cloudera customers for use on their existing Cloudera platform clusters, Cloudera Edge Management has generated equal buzz across the industry for the possibilities that it brings to enterprises in their IoT initiatives around edge management and edge data collection.
It allows real-time data ingestion, processing, model deployment, and monitoring in a reliable and scalable way. This blog post focuses on how the Kafka ecosystem can help solve the impedance mismatch between data scientists, data engineers, and production engineers. Any option can pair well with Apache Kafka.
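To make the idea concrete, here is a minimal sketch of a scoring loop built on Kafka, using the kafka-python client; the topic names and the predict() stand-in are hypothetical.

```python
# A minimal sketch of model scoring over Kafka with kafka-python.
# Topics, broker address, and the predict() stand-in are hypothetical.
import json

from kafka import KafkaConsumer, KafkaProducer


def predict(features: dict) -> float:
    # Stand-in for a real model; replace with a loaded model's predict call.
    return 1.0 if features.get("amount", 0) > 100 else 0.0


consumer = KafkaConsumer(
    "features",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Consume feature events, score them, and publish predictions.
for message in consumer:
    score = predict(message.value)
    producer.send("predictions", {"input": message.value, "score": score})
```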
The fact that ETL tools evolved to expose graphical interfaces seems like a detour in the history of data processing, and would certainly make for an interesting blog post of its own. Sure, there's a need to abstract the complexity of data processing, computation, and storage.
Do ETL and data integration activities seem complex to you? Read this blog to understand everything about AWS Glue that makes it one of the most popular data integration solutions in the industry. Did you know the global big data market will likely reach $268.4 billion? Businesses are leveraging big data now more than ever.
DataOps is a collaborative approach to data management that combines the agility of DevOps with the power of data analytics. It aims to streamline data ingestion, processing, and analytics by automating and integrating various data workflows.
DataOps, short for data operations, is an emerging discipline that focuses on improving the collaboration, integration, and automation of data processes across an organization. Each type of tool plays a specific role in the DataOps process, helping organizations manage and optimize their data pipelines more effectively.
The Challenge: High Stakes in the Age of Personalized Data Observability
The primary challenge stems from the requirement of Data Consumers for personalized monitoring and alerts based on their unique data processing needs. Data Observability platforms often need to deliver this level of customization.
Similarly, in data, every step of the pipeline, from data ingestion to delivery, plays a pivotal role in delivering impactful results. In this article, we'll break down the intricacies of an end-to-end data pipeline and highlight its importance in today's landscape.
Why Striim Stands Out
As detailed in the GigaOm Radar Report, Striim's unified data integration and streaming service platform excels due to its distributed, in-memory architecture that extensively utilizes SQL for essential operations such as transforming, filtering, enriching, and aggregating data.
While you can use Snowpipe for straightforward and low-complexity data ingestion into Snowflake, Snowpipe alternatives, like Kafka, Spark, and COPY, provide enhanced capabilities for real-time data processing, scalability, flexibility in data handling, and broader ecosystem integration.
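For comparison, here is a minimal sketch of a bulk load using the COPY command through the snowflake-connector-python package; the connection parameters, stage, and table are hypothetical.

```python
# A minimal sketch of bulk loading with COPY INTO as a Snowpipe
# alternative. Credentials, stage, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
try:
    # Load staged JSON files into a table in one bulk operation.
    conn.cursor().execute("""
        COPY INTO raw_events
        FROM @landing_stage/events/
        FILE_FORMAT = (TYPE = 'JSON')
        ON_ERROR = 'CONTINUE'
    """)
finally:
    conn.close()
```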
These Azure data engineer projects provide a wonderful opportunity to enhance your data engineering skills, whether you are a beginner, an intermediate-level engineer, or an advanced practitioner. Who is an Azure Data Engineer? An Azure Data Engineer builds and manages data solutions using services such as Azure SQL Database and Azure Data Lake Storage.
At Precisely, we recognize the value and potential of AI to help our customers work faster and smarter, and make more powerful, confident decisions grounded in trusted data, supporting our overall mission of unlocking data integrity for organizations of all kinds.
It encompasses data from diverse sources such as social media, sensors, logs, and multimedia content. The key characteristics of big data are commonly described as the three V's: volume (large datasets), velocity (high-speed data ingestion), and variety (data in different formats).
The Essential Six Capabilities
To set the stage for impactful and trustworthy data products in your organization, you need to invest in six foundational capabilities: data pipelines, data integrity, data lineage, data stewardship, data catalog, and data product costing. Let's review each one in detail.
The data engineering landscape is constantly changing, but major trends seem to remain the same.
How to Become a Data Engineer
As a data engineer, I am tasked with designing efficient data processes almost every day. It was created by Spotify to manage massive data processing workloads.
The Five Use Cases in Data Observability: Mastering Data Production (#3)
Introduction
Managing the production phase of data analytics is a daunting challenge. Overseeing multi-tool, multi-dataset, and multi-hop data processes ensures high-quality outputs. Is the business logic producing correct outcomes?
L1 is usually the raw, unprocessed data ingested directly from various sources; L2 is an intermediate layer featuring data that has undergone some form of transformation or cleaning; and L3 contains highly processed, optimized data that is typically ready for analytics and decision-making processes.
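A minimal PySpark sketch of moving data through these layers might look as follows; the paths, column names, and cleaning rules are illustrative, not a prescribed implementation.

```python
# A minimal sketch of the layered flow described above; all names,
# paths, and rules are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion").getOrCreate()

# L1 (raw): data ingested as-is from the source.
raw = spark.read.json("/lake/l1/orders")

# L2 (intermediate): deduplicated, filtered, and conformed data.
cleaned = (
    raw.dropDuplicates(["order_id"])
    .filter(F.col("amount").isNotNull())
    .withColumn("order_date", F.to_date("order_ts"))
)
cleaned.write.mode("overwrite").parquet("/lake/l2/orders")

# L3 (curated): aggregated, analytics-ready data.
daily = cleaned.groupBy("order_date").agg(
    F.sum("amount").alias("daily_revenue")
)
daily.write.mode("overwrite").parquet("/lake/l3/daily_revenue")
```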
The shift towards de-normalization
In the realm of database design and management, data normalization and de-normalization are fundamental concepts aimed at optimizing data structures for efficient storage, retrieval, and manipulation.
Load data
For data ingestion, Google Cloud Storage is a pragmatic way to solve the task.
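As a sketch of that approach, the snippet below loads newline-delimited JSON from Google Cloud Storage into BigQuery with the google-cloud-bigquery client; the bucket, dataset, and table names are hypothetical.

```python
# A minimal sketch of a GCS-to-BigQuery load; bucket, dataset, and
# table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # infer the (denormalized) schema from the data
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/orders/*.json",
    "my_project.my_dataset.orders",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes
```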
Data Collection and Integration: Data is gathered from various sources, including sensor and IoT data, transportation management systems, transactional systems, and external data sources such as economic indicators or traffic data. That’s where Striim came into play.
Data modeling: Data engineers should be able to design and develop data models that help represent complex data structures effectively. Data processing: Data engineers should know data processing frameworks like Apache Spark, Hadoop, or Kafka, which help process and analyze data at scale.
Today, we'll break down the key benefits, best practices, and implementation strategies to enhance your data workflows with DataOps. DataOps, short for Data Operations, is an emerging discipline that combines data engineering, data integration, and data quality with agile methodologies and DevOps practices.
Real-Time Data Processing
Businesses are adopting technologies that can process and analyze data instantly due to the need for real-time insights. Real-time data preparation tools allow companies to react quickly to new information, maintaining a competitive edge in fast-paced industries.
Case Study: Accenture's Experience Migrating a Legacy Data Warehouse into Cloudera with a Health Insurance Company
Due to the high storage cost in the legacy EDW solution, 100% source data capture proved cost-prohibitive; this led to continuing and costly change cycles to load incremental source updates as business requirements changed.
As an Azure Data Engineer, you will be expected to design, implement, and manage data solutions on the Microsoft Azure cloud platform. You will be in charge of creating and maintaining data pipelines, data storage solutions, data processing, and data integration to enable data-driven decision-making inside a company.
Databricks architecture
Databricks provides an ecosystem of tools and services covering the entire analytics process, from data ingestion to training and deploying machine learning models. Besides that, it's fully compatible with various data ingestion and ETL tools. (Figure: Databricks two-plane infrastructure.)
An Azure Data Engineer is responsible for designing, implementing, and maintaining data management and data processing systems on the Microsoft Azure cloud platform. They work with large and complex data sets and are responsible for ensuring that data is stored, processed, and secured efficiently and effectively.
Data Teams and Their Types of Data Journeys
In the rapidly evolving landscape of data management and analytics, data teams face various challenges ranging from data ingestion to end-to-end observability. This article explores why DataKitchen's 'Data Journeys' capability can solve these challenges.
While legacy ETL has a slow transformation step, modern ETL platforms, like Striim, have evolved to replace disk-based processing with in-memory processing. This advancement allows for real-time data transformation, enrichment, and analysis, providing faster and more efficient data processing.
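To illustrate the contrast, here is a minimal, framework-free sketch of in-memory, record-at-a-time transformation and enrichment; the lookup table and fields are illustrative and not Striim's actual mechanism.

```python
# A minimal sketch of in-memory, record-at-a-time processing: each
# event is transformed and enriched as it flows through memory, with
# no intermediate write to disk between steps. Fields are illustrative.
country_names = {"US": "United States", "DE": "Germany"}  # in-memory lookup


def transform(events):
    for event in events:
        event["amount_usd"] = round(
            event["amount"] * event.get("fx_rate", 1.0), 2
        )
        event["country_name"] = country_names.get(
            event.get("country"), "Unknown"
        )
        yield event


stream = iter([{"amount": 10.0, "fx_rate": 1.1, "country": "DE"}])
for enriched in transform(stream):
    print(enriched)
```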
Big Data vs Small Data: Volume
Big Data refers to large volumes of data, typically in the order of terabytes or petabytes. It involves processing and analyzing massive datasets that cannot be managed with traditional data processing techniques. Small Data is collected and processed at a slower pace.