Data Lake, Data Workflow and Hadoop - Data Engineering Digest

Stitching Together Enterprise Analytics With Microsoft Fabric

Data Engineering Podcast

JUNE 23, 2024

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. Data lakes in various forms have been gaining significant popularity as a unified interface to an organization's analytics. When is Fabric the wrong choice?

Data Lake

Data Lake High Quality Data Hadoop Machine Learning

Ship Smarter Not Harder With Declarative And Collaborative Data Orchestration On Dagster+

Data Engineering Podcast

MARCH 24, 2024

Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.

Data Lake

Data Lake High Quality Data Hadoop Data Pipeline

Webinars

What’s New in Apache Airflow® 3.0—And How Will It Reshape Your Data Workflows?

7 Popular Azure ETL Tools for Data Engineers in 2025

ProjectPro

JUNE 6, 2025

Azure Data Factory 2. Azure Data Lake Storage 7. Azure Logic Apps Azure ETL Best Practices for Big Data Projects Get Your Hands-on Azure ETL Projects with ProjectPro! He explores their collaborative potential in orchestrating, exploring, and analyzing data, shaping a secure and comprehensive data engineering landscape.

ETL Tools

ETL Tools Data Engineering Data Engineer Data Lake

Modern Customer Data Platform Principles

Data Engineering Podcast

JANUARY 21, 2024

In this episode Tasso Argyros, CEO of ActionIQ, gives a summary of the major epochs in database technologies and how he is applying the capabilities of cloud data warehouses to the challenge of building more comprehensive experiences for end-users through a modern customer data platform (CDP).

Data Lake

Data Lake High Quality Data NoSQL Data Warehouse

30+ Data Engineering Projects for Beginners in 2025

ProjectPro

JUNE 6, 2025

In 2024, the data engineering job market is flourishing, with roles like database administrators and architects projected to grow by 8% and salaries averaging $153,000 annually in the US (as per Glassdoor). These trends underscore the growing demand and significance of data engineering in driving innovation across industries.

Data Engineering

Data Engineering Data Engineer Project Engineering

7 GCP Data Engineering Tools Every Data Engineer Must Know

ProjectPro

JUNE 6, 2025

If you are willing to gain hands-on experience with Google BigQuery, you must explore the GCP Project to Learn using BigQuery for Exploring Data. Google Cloud Dataproc: Dataproc is a fully-managed and scalable Spark and Hadoop Service that supports batch processing, querying, streaming, and machine learning.

Data Engineering

Data Engineering Data Engineer Engineering Google Cloud

50+ Azure Data Factory Interview Questions and Answers [2025]

ProjectPro

JUNE 6, 2025

Azure Data Factory is a cloud-based, fully managed, serverless ETL and data integration service offered by Microsoft Azure for automating data movement from its source to, say, a data lake or data warehouse, using either ETL (extract-transform-load) or ELT (extract-load-transform).

Data Lake

Data Lake Metadata SQL Datasets

9 Data Integration Projects For You To Practice in 2025

ProjectPro

JUNE 6, 2025

You can use several datasets in this project covering various healthcare sources such as patient records, medical imaging data, electronic health records (EHRs), and hospital operational data. You will use Python libraries for data processing and transformation. This project enables you to do just that!

Data Integration

Data Integration Project Data Lake PostgreSQL

Azure Databricks: Streamline Your Data Engineering Workflows

ProjectPro

JUNE 6, 2025

“Unlock the potential of your data with Azure Databricks: a unified analytics platform that combines the power of Apache Spark with the ease of Azure.” Azure Databricks is a fully managed service provided by Microsoft that offers the capabilities to create an open data lakehouse within the Azure cloud environment.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

How To Build A Batch Data Pipeline?

ProjectPro

JUNE 6, 2025

AWS Data Pipeline: AWS Data Pipeline is a cloud-based service by Amazon Web Services (AWS) that simplifies the orchestration of data workflows. It offers pre-built connectors for various AWS services, allowing users to seamlessly automate data movement and processing tasks within the AWS ecosystem.

Data Pipeline

Data Pipeline Building Retail Data Ingestion

Your 101 Guide to Becoming an ETL Data Engineer in 2025

ProjectPro

JUNE 6, 2025

Their role includes designing data pipelines, integrating data from multiple sources, and setting up databases and data lakes that can support machine learning and analytics workloads. They work with various tools and frameworks, such as Apache Spark, Hadoop, and cloud services, to manage massive amounts of data.

Data Engineering

Data Engineering Data Engineer Engineering ETL Tools

A Reflection On The Data Ecosystem For The Year 2021

Data Engineering Podcast

JANUARY 1, 2022

In the same way that application performance monitoring ensures reliable software and keeps application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Start trusting your data with Monte Carlo today! To what extent do speed benchmarks inform decisions for modern data teams?

Data Warehouse

Data Warehouse Data Lake SQL Hadoop

Python for ETL in the Modern Data Stack: The Ultimate Guide

ProjectPro

JUNE 6, 2025

This can include tasks like data validation, data type conversion, deduplication, and aggregating data from different sources. Data Loading: The transformed data is loaded into a data warehouse or data lake, depending on the architecture of your data ecosystem.

Python

Python ETL Tools Data Warehouse Programming Language
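
The transformation and loading steps named in the excerpt above (validation, type conversion, deduplication, aggregation, then loading) can be sketched with the Python standard library alone. This is a minimal illustration, not the method from the linked guide: the sample rows, the column names, the `region_totals` table, and the use of an in-memory SQLite database as a stand-in for a warehouse are all assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records, as they might arrive from different sources.
RAW_ROWS = [
    {"order_id": "1", "region": "east", "amount": "10.50"},
    {"order_id": "2", "region": "west", "amount": "4.00"},
    {"order_id": "2", "region": "west", "amount": "4.00"},  # duplicate
    {"order_id": "3", "region": "east", "amount": "oops"},  # fails validation
    {"order_id": "4", "region": "east", "amount": "2.25"},
]


def transform(rows):
    """Validate rows, convert types, and deduplicate on order_id."""
    seen = set()
    clean = []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            region = row["region"]
            amount = float(row["amount"])
        except (KeyError, ValueError):
            continue  # drop rows that fail validation
        if order_id in seen:
            continue  # drop duplicates, keeping the first occurrence
        seen.add(order_id)
        clean.append({"order_id": order_id, "region": region, "amount": amount})
    return clean


def aggregate(rows):
    """Sum amounts per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def load(totals, conn):
    """Write the aggregated totals to a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS region_totals (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO region_totals VALUES (?, ?)", sorted(totals.items())
    )
    conn.commit()
```

To try it, pass `RAW_ROWS` through `transform` and `aggregate`, then call `load` with a connection from `sqlite3.connect(":memory:")`. In a real pipeline the load target would be whichever warehouse or data lake the architecture calls for.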

Getting the Most From Your Modern Data Platform: A Three-Phase Approach

Snowflake

JULY 22, 2024

LTIMindtree’s PolarSled Accelerator helps migrate existing legacy systems, such as SAP, Teradata and Hadoop, to Snowflake. top modernizing your data lake with Snowflake, watch our on demand webinar.

Government

Government Cloud Data Data Lake

The Evolution of Table Formats

Monte Carlo

MAY 14, 2024

The “legacy” table formats: The data landscape has evolved so quickly that table formats pioneered within the last 25 years are already achieving “legacy” status. It was designed to support high-volume data exchange and compatibility across different system versions, which is essential for streaming architectures such as Apache Kafka.

Data Lake

Data Lake Metadata Hadoop Data Governance

The Modern Data Stack: What It Is, How It Works, Use Cases, and Ways to Implement

AltexSoft

MARCH 14, 2023

Built around a cloud data warehouse, data lake, or data lakehouse. Modern data stack tools are designed to integrate seamlessly with cloud data warehouses such as Redshift, BigQuery, and Snowflake, as well as data lakes or even the child of the first two: a data lakehouse.

IT Data Warehouse Data Governance Data Lake

Top 20 Azure Data Engineering Projects in 2023 [Source Code]

Knowledge Hut

NOVEMBER 2, 2023

Azure Data Ingestion Pipeline: Create an Azure Data Factory data ingestion pipeline to extract data from a source (e.g., Azure SQL Database, Azure Data Lake Storage). Data Aggregation: Working with a sample of big data allows you to investigate real-time data processing, big data project design, and data flow.

Data Engineering

Data Engineering Data Engineer Coding Project

Top 10 Azure Data Engineer Job Opportunities in 2024 [Career Options]

Knowledge Hut

MARCH 28, 2024

Role Level: Intermediate. Responsibilities: Design and develop data pipelines to ingest, process, and transform data. Implemented and managed data storage solutions using Azure services like Azure SQL Database, Azure Data Lake Storage, and Azure Cosmos DB.

Data Engineering

Data Engineering Data Engineer Engineering Data Warehouse

Data Pipeline Architecture Explained: 6 Diagrams and Best Practices

Monte Carlo

JUNE 14, 2023

5 Data pipeline architecture designs and their evolution: The Hadoop era, roughly 2011 to 2017, arguably ushered in big data processing capabilities to mainstream organizations. Data then, and even today for some organizations, was primarily hosted in on-premises databases with non-scalable storage.

Data Pipeline

Data Pipeline Architecture Data Lake Data Warehouse

Top 25 Data Science Tools To Use in 2024

Knowledge Hut

MAY 23, 2024

Users can also leverage it for generating interactive visualizations over data. It also comes with lots of automation techniques that enable users to eliminate manual data workflows. It can analyze data in real-time and can perform cluster management. It is much faster than other analytic workload tools like Hadoop.

Data Science

Data Science MongoDB Programming Language BI

Azure Data Engineer (DP-203) Certification Cost in 2023

Knowledge Hut

SEPTEMBER 29, 2023

Why Should You Get an Azure Data Engineer Certification? Becoming an Azure data engineer allows you to seamlessly blend the roles of a data analyst and a data scientist. One of the pivotal responsibilities is managing data workflows and pipelines, a core aspect of a data engineer's role.

Certification

Certification Data Engineering Data Engineer Engineering

DataOps: What Is It, Core Principles, and Tools For Implementation

phData: Data Engineering

JANUARY 3, 2022

phData Cloud Foundation is dedicated to machine learning and data analytics, with prebuilt stacks for a range of analytical tools, including AWS EMR, Airflow, AWS Redshift, AWS DMS, Snowflake, Databricks, Cloudera Hadoop, and more. This required applying transformations and filters to the data for various business units.

IT AWS Software Engineering Software Engineer

10+ Real-Time Azure Project Ideas for Beginners to Practice [2025]

ProjectPro

JUNE 6, 2025

Web Server Log Processing: In this project, you'll process web server logs using a combination of Hadoop, Flume, Spark, and Hive on Azure. Starting with setting up an Azure Virtual Machine, you'll install necessary big data tools and configure Flume agents for log data ingestion.

Project

Project Transportation Data Pipeline Datasets

The Good and the Bad of Apache Airflow Pipeline Orchestration

AltexSoft

NOVEMBER 7, 2022

DevOps tasks — for example, creating scheduled backups and restoring data from them. Airflow is especially useful for orchestrating Big Data workflows. Airflow is not a data processing tool by itself but rather an instrument to manage multiple components of data processing. When Airflow won’t work.

PostgreSQL

PostgreSQL Metadata MySQL Python
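
The point in the Airflow excerpt above, that an orchestrator manages the components of data processing without processing data itself, can be illustrated with a small dependency-ordered task runner. This sketch deliberately does not use Airflow or its API; it relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+), and the task names, the `log` list, and the `run_pipeline` helper are hypothetical names chosen for the example.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run each task only after its upstream tasks; return the execution order.

    tasks:        mapping of task name -> zero-argument callable
    dependencies: mapping of task name -> set of upstream task names
    """
    order = []
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()  # the runner only triggers work; the callable does it
        order.append(name)
    return order


# Hypothetical tasks: each just records that it ran.
log = []
TASKS = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}

# "transform" waits for "extract"; "load" waits for "transform".
DEPENDENCIES = {"transform": {"extract"}, "load": {"transform"}}
```

Calling `run_pipeline(TASKS, DEPENDENCIES)` executes the three callables in dependency order. A production orchestrator adds scheduling, retries, and monitoring on top of this ordering, none of which is modeled here.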
