On-premises and cloud working together to deliver a data product (photo by Toro Tseleng on Unsplash). Developing a data pipeline is somewhat similar to playing with Lego: you visualize what needs to be achieved (the data requirements), choose the pieces (software, tools, platforms), and fit them together.
Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Dagster offers a new approach to building and running data platforms and data pipelines. If you've learned something or tried out a project from the show, then tell us about it!
Your host is Tobias Macey, and today I'm interviewing Yair Weinberger about Alooma, a company providing data pipelines as a service. Interview introduction: How did you get involved in the area of data management? What is Alooma, and what is its origin story? How is the Alooma platform architected?
Introduction. Patterns: 1. Batch Data Pipelines: 1.1 Process => Data Warehouse; 1.2 Process => Cloud Storage => Data Warehouse. 2. Near Real-Time Data Pipelines: 2.1 Data Stream => Consumer => Data Warehouse.
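To make pattern 1.2 concrete, here is a minimal sketch of a batch pipeline: extract, stage to storage, then load into a warehouse. Local files and SQLite stand in for the cloud storage bucket and the warehouse, and the function names and schema are illustrative, not taken from the original article.

```python
# Minimal sketch of pattern 1.2 (Process => Cloud Storage => Data Warehouse).
# Local files and SQLite stand in for the cloud storage bucket and the warehouse.
import json
import sqlite3
from pathlib import Path

def extract() -> list[dict]:
    # In a real pipeline this would call a source API or read an upstream export.
    return [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": 13.5}]

def stage_to_storage(records: list[dict], staging_dir: Path) -> Path:
    # Write newline-delimited JSON, the same shape you would upload to S3/GCS/ADLS.
    staging_dir.mkdir(parents=True, exist_ok=True)
    path = staging_dir / "orders.jsonl"
    path.write_text("\n".join(json.dumps(r) for r in records))
    return path

def load_to_warehouse(staged_file: Path, db_path: str = "warehouse.db") -> None:
    # A warehouse load is typically a bulk COPY of the staged file; here we insert rows.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
    with conn:
        for line in staged_file.read_text().splitlines():
            row = json.loads(line)
            conn.execute("INSERT INTO orders VALUES (?, ?)", (row["order_id"], row["amount"]))
    conn.close()

if __name__ == "__main__":
    load_to_warehouse(stage_to_storage(extract(), Path("staging")))
```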
… (JAR) form to be executed as part of the user-defined data pipeline. A data pipeline is a DAG used to transform data with some business logic. Netflix's homegrown CLI tool handles data pipeline management. This causes the user-managed storage system to be a critical runtime dependency.
Snowflake enables organizations to be data-driven by offering an expansive set of features for creating performant, scalable, and reliable data pipelines that feed dashboards, machine learning models, and applications. But before data can be transformed and served or shared, it must be ingested from source systems.
Introduction: If you are looking for a simple, cheap data pipeline to pull small amounts of data from a stable API and store it in cloud storage, then serverless functions are a good choice.
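As a sketch of that approach, the handler below pulls a small JSON payload from an API and writes it to a Google Cloud Storage bucket. The endpoint URL, bucket name, and object path are placeholders, and the requests and google-cloud-storage calls are assumptions about a typical deployment rather than code from the article.

```python
# Minimal sketch of a serverless ingestion function (e.g., an HTTP-triggered Cloud Function).
# The API URL, bucket name, and object key are hypothetical placeholders.
import datetime
import json

import requests
from google.cloud import storage

API_URL = "https://api.example.com/metrics"   # placeholder source API
BUCKET_NAME = "my-raw-landing-bucket"         # placeholder bucket

def ingest(request=None):
    """Fetch a small payload from the API and land it in cloud storage as JSON."""
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()

    # Partition objects by date so downstream loads can pick up new files incrementally.
    key = f"raw/metrics/{datetime.date.today():%Y/%m/%d}/payload.json"

    client = storage.Client()
    blob = client.bucket(BUCKET_NAME).blob(key)
    blob.upload_from_string(json.dumps(response.json()), content_type="application/json")
    return f"wrote gs://{BUCKET_NAME}/{key}"
```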
This foundational layer is a repository for various data types, from transaction logs and sensor data to social media feeds and system logs. By storing data in its native state in cloud storage solutions such as AWS S3, Google Cloud Storage, or Azure ADLS, the Bronze layer preserves the full fidelity of the data.
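A common way to implement that Bronze landing step is to write source payloads to object storage unmodified. The sketch below uses boto3 with a hypothetical bucket, prefix, and sample record; it illustrates the idea rather than reproducing code from the article.

```python
# Minimal sketch of a Bronze-layer write: land the raw payload as-is in object storage.
# Bucket name, prefix, and the sample record are hypothetical placeholders.
import datetime
import json
import uuid

import boto3

def land_raw(record: dict, source: str, bucket: str = "my-bronze-bucket") -> str:
    """Persist one source record in its native form, keyed by source and ingestion date."""
    key = (
        f"bronze/{source}/ingest_date={datetime.date.today().isoformat()}/"
        f"{uuid.uuid4()}.json"
    )
    boto3.client("s3").put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(record).encode("utf-8"),
        ContentType="application/json",
    )
    return key

# Example: land one sensor reading without any cleaning or schema enforcement.
land_raw({"sensor_id": "a1", "reading": 21.7}, source="sensors")
```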
Striim customers often use a single streaming source for delivery into Kafka, cloud data warehouses, and cloud storage simultaneously and in real time. Building data pipelines and working with streaming data should not require custom coding.
They opted for Snowflake, a cloud-native data platform ideal for SQL-based analysis. The team landed the data in a data lake implemented with cloud storage buckets and then loaded it into Snowflake, enabling fast access and smooth integrations with analytical tools.
In working with thousands of customers deploying Spark applications, we saw significant challenges with managing Spark as well as automating, delivering, and optimizing secure data pipelines. We wanted to develop a service tailored to the data engineering practitioner, built on top of a true enterprise hybrid data service platform.
Data pipelines are a significant part of the big data domain, and every professional working or willing to work in this field must have extensive knowledge of them. Table of contents: What is a Data Pipeline? The Importance of a Data Pipeline. What is an ETL Data Pipeline?
The combination of these capabilities will allow customers to easily migrate existing data pipelines to GCP or quickly set up new ones that can ingest from a number of existing or new data sources. Google Cloud Storage buckets: in the same subregion as your subnets.
Like bean dip and ogres, layers are the building blocks of the modern data stack. Its powerful selection of tooling components combines to create a single synchronized and extensible data platform, with each layer serving a unique function of the data pipeline. Let's dive into it. The content, not the bean dip.
I'd like to discuss some popular data engineering questions: modern data engineering (DE) and what it is; whether your DE works well enough to fuel advanced data pipelines and business intelligence (BI); whether your data pipelines are efficient; parallel data processing; and ML model training using Airflow.
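To illustrate the last point, here is a minimal sketch of an Airflow DAG that chains data preparation and model training tasks. The DAG id, schedule, and task bodies are hypothetical and stand in for whatever training logic a real pipeline would run.

```python
# Minimal sketch of an Airflow DAG for ML model training.
# DAG id, schedule, and task bodies are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def prepare_features():
    # In a real pipeline: read source tables and build the training dataset.
    print("preparing features")

def train_model():
    # In a real pipeline: fit the model and persist the artifact.
    print("training model")

with DAG(
    dag_id="ml_model_training",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Airflow 2.4+; older versions use schedule_interval
    catchup=False,
) as dag:
    prepare = PythonOperator(task_id="prepare_features", python_callable=prepare_features)
    train = PythonOperator(task_id="train_model", python_callable=train_model)
    prepare >> train
```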
On May 3, 2023, Cloudera kicked off a contest called "Best in Flow" for NiFi developers to compete to build the best data pipelines. The contest challenged developers to build data pipelines that represent their business use cases using Cloudera DataFlow, on the verge of the release of NiFi 2.0. Congratulations, Vince!
Additionally, it offers genuine multi-cloud flexibility by integrating easily with AWS, Azure, and GCP. JSON, Avro, Parquet, and other structured and semi-structured data types are supported by the natively optimized proprietary format used by the cloud storage layer.
Take Astro (the fully managed Airflow solution) for a test drive today and unlock a suite of features designed to simplify, optimize, and scale your data pipelines. Walmart wrote about how it saved millions of dollars with unified configuration-driven data pipelines.
The client needed to build its own internal data pipeline with enough flexibility to meet the business requirements for a job market analysis platform & dashboard. The client intends to build on and improve this data pipeline by moving towards a more serverless architecture and adding DevOps tools & workflows.
From exploratory data analysis (EDA) and data cleansing to data modeling and visualization, the greatest data engineering projects demonstrate the whole data process from start to finish. Data pipeline best practices should be shown in these projects. Source Code: Yelp Review Analysis.
[link] Netflix: Streamlining Membership Data Engineering at Netflix with Psyberg. A seamless lookback (i.e., reconciliation pipeline) capability is a must-have for your data infrastructure to support data pipelines. Netflix writes about its membership data pipeline and how it supports the lookback approach.
For this project, you will download the Yelp dataset in JSON format, connect it to the Cloud SDK via Cloud Storage, which is then connected to Cloud Composer, and publish the Yelp dataset JSON stream to a Pub/Sub topic. For this project, you will require the COVID-19 Cases.csv dataset from data.world.
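As a sketch of the publishing step, the snippet below reads newline-delimited JSON records and publishes each one to a Pub/Sub topic with the google-cloud-pubsub client. The project ID, topic name, and file path are placeholders rather than values from the project description.

```python
# Minimal sketch: publish newline-delimited JSON records to a Pub/Sub topic.
# Project ID, topic name, and input file path are hypothetical placeholders.
import json

from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"   # placeholder
TOPIC_ID = "yelp-reviews"       # placeholder

def publish_file(path: str) -> None:
    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            record = json.loads(line)  # validate each record is well-formed JSON
            future = publisher.publish(
                topic_path,
                data=line.encode("utf-8"),
                business_id=str(record.get("business_id", "")),  # message attribute
            )
            future.result()  # block until the broker acknowledges the message

if __name__ == "__main__":
    publish_file("yelp_academic_dataset_review.json")  # placeholder local file
```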
Data-driven organizations are increasingly looking for ways to enable both centralized and distributed teams to build, share and collaborate on analytical data products. Ascend is thrilled to announce the availability of our newest feature: the ability to deliver data directly to the MotherDuck analytics platform!
Data storage is a vital aspect of any Snowflake Data Cloud database. Within Snowflake, data can either be stored locally or accessed from other cloud storage systems. What are the different storage layers available in Snowflake? They are flexible, secure, and provide exceptional performance.
Integration with Azure and data sources: Fabric is deeply integrated with Azure tools such as Synapse, Data Factory, and OneLake. This allows seamless data movement and end-to-end workflows within the same environment. Its flexibility suits advanced users creating end-to-end data solutions.
Start Your Pipeline with Pre-Loaded Data: Sometimes, your data pipeline starts with data that is already located in a table in your data cloud. Maybe you've used another tool to load data, or the data is the result of an application running natively in that data cloud.
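As a sketch of starting a pipeline from a pre-loaded table, the snippet below uses the Snowflake Python connector to derive a new table from one that already exists in the account. The connection parameters, database objects, and query are hypothetical placeholders, not details from the article.

```python
# Minimal sketch: begin a pipeline from a table that is already loaded in Snowflake.
# Connection parameters and object names are hypothetical placeholders.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="TRANSFORM_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # No ingestion step: the first task simply transforms the pre-loaded RAW_ORDERS table.
    cur.execute(
        """
        CREATE OR REPLACE TABLE DAILY_ORDER_TOTALS AS
        SELECT order_date, SUM(amount) AS total_amount
        FROM RAW_ORDERS
        GROUP BY order_date
        """
    )
    cur.close()
finally:
    conn.close()
```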
The architecture is three-layered. Database storage: Snowflake reorganizes the data into its internal optimized, compressed, columnar format and stores this optimized data in cloud storage. The data objects are accessible only through SQL query operations run using Snowflake.
popular SQL and NoSQL database management systems, including Oracle, SQL Server, Postgres, MySQL, MongoDB, Cassandra, and more; cloud storage services such as Amazon S3, Azure Blob, and Google Cloud Storage; message brokers such as ActiveMQ, IBM MQ, and RabbitMQ; big data processing systems like Hadoop; and more.
In this blog post, we delve deep into the factors that contribute to how Ascend runs in the cloud, focusing on the primary areas that contribute to costs in this context: storage, compute, networking, and retries. But if reading is not your thing, dive right into the fascinating details of how we master cloud costs in the video below.
Moreover, the data will need to leave the cloud environment to reach our machine, which is neither particularly secure nor auditable. At the end of the cycle, we will have an analytics app that can be used to both visualize and query the data in real time with virtually no infrastructure costs.
It offers a real-time database called Cloud Firestore and handles user authentication and management. It provides scalable and secure cloud storage, secure web hosting, and insights into user behavior. Comparing AWS and Firebase: company, Amazon vs. Google; type, cloud service provider vs. app development platform; compute, EC2, Lambda, etc.
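As a sketch of working with that real-time database, the snippet below writes and reads a document with the google-cloud-firestore Python client. The collection name, document ID, and fields are hypothetical, and a Firebase/GCP project with default credentials is assumed.

```python
# Minimal sketch: write and read a Cloud Firestore document.
# Collection name, document ID, and fields are hypothetical placeholders;
# application default credentials for a Firebase/GCP project are assumed.
from google.cloud import firestore

db = firestore.Client()

# Create or overwrite a user profile document.
db.collection("users").document("user_123").set(
    {"name": "Ada", "plan": "free", "signup_source": "web"}
)

# Read it back.
snapshot = db.collection("users").document("user_123").get()
print(snapshot.to_dict())
```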
In this article, we assess: the role of the data warehouse on one hand, and the data lake on the other; the features of ETL and ELT in these two architectures; the evolution to EtLT; and the emerging role of data pipelines. Let's take a closer look. Enterprises have an opportunity to undergo a metamorphosis.
In this post, we'll discuss some key data engineering concepts that data scientists should be familiar with in order to be more effective in their roles. These include data pipelines, data storage and retrieval, data orchestrators, and infrastructure-as-code.
Data architecture is the organization and design of how data is collected, transformed, integrated, stored, and used by a company. What is the main difference between a data architect and a data engineer? … machine learning and deep learning models; and business intelligence tools.
ADF connects to various data sources, including on-premises systems, cloud services, and SaaS applications. It then gathers and relocates information to a centralized hub in the cloud using the Copy Activity within data pipelines. But data isn't always in the perfect format for analysis, is it?
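As a sketch of kicking off such a Copy Activity pipeline programmatically, the snippet below triggers a pipeline run with the azure-mgmt-datafactory client. The subscription, resource group, factory, and pipeline names are hypothetical, and the SDK usage is an assumption about a typical setup rather than code from the article.

```python
# Minimal sketch: trigger an Azure Data Factory pipeline run (e.g., one containing a Copy Activity).
# Subscription ID, resource group, factory, and pipeline names are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
RESOURCE_GROUP = "rg-data-platform"
FACTORY_NAME = "adf-ingestion"
PIPELINE_NAME = "copy_sales_to_lake"

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Start the pipeline and capture the run ID so the run can be monitored later.
run = client.pipelines.create_run(RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME, parameters={})
print(f"started pipeline run {run.run_id}")
```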
The AWS services cheat sheet will provide you with the basics of Amazon Web Services, like the types of cloud, services, tools, commands, etc. Opt for cloud computing courses online to develop your knowledge of cloud storage, databases, networking, security, and analytics, and launch a career in cloud computing.
Data pipelines are messy. Data engineering design patterns are repeatable solutions that help you structure, optimize, and scale data processing, storage, and movement. They make data workflows more resilient and easier to manage when things inevitably go sideways. That's why solid design patterns matter.
If your core data systems are still running in a private data center or pushed to VMs in the cloud, you have some work to do. To take advantage of cloud-native services, some of your data must be replicated, copied, or otherwise made available to native cloud storage and databases.
The growing complexity drove a proliferation of software and data innovations, which in turn demanded highly trained data engineers to build code-based data pipelines that ensured data quality, consistency, and stability. So what is the modern data stack, and why is it so popular?
Get started with Airbyte and Cloud Storage. Coding the connectors yourself? Think very carefully: creating and maintaining a data platform is a hard challenge. Data connectors are an essential part of such a platform. Of course, how else are we going to get the data? Azure Kubernetes Services.