After Kaggle, this is one of the best sources for free datasets to download and enhance your data science portfolio. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. It includes tutorials, courses, books, and project ideas for all levels.
The free tier supports multiple apps and handles reasonable traffic loads, making it perfect for sharing dashboards with colleagues or showcasing your work in a portfolio. He bridges the gap between emerging AI technologies and practical implementation for working professionals.
Both of these technologies are dominating the AI space, and companies are using them to automate repetitive tasks and reduce their workforce, as agentic AI can outperform junior-level employees in certain cases. After learning the basics, you can draw inspiration from these projects and start building your portfolio.
AI Functions are now up to 3x faster and up to 4x cheaper than other vendors' offerings on large-scale workloads, enabling you to process large-scale data transformations with unprecedented speed. This unified entry point for all your AI services provides centralized governance, usage logging, and control across your entire AI application portfolio.
Speaker: Donna Laquidara-Carr, PhD, LEED AP, Industry Insights Research Director at Dodge Construction Network
Join Donna Laquidara-Carr in this new webinar as she shares the exciting findings of several recent studies that reveal how portfolio owners, construction managers, and contractors are using data to manage risk, create better projects, and help their businesses gain a strategic edge!
One of the most in-demand technical skills these days is analyzing large data sets, and Apache Spark and Python are two of the most widely used technologies for doing so. What if you could use both technologies together? PySpark can process real-time data from Kafka with Spark Streaming at low latency.
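As a rough illustration of that combination, the sketch below reads a Kafka topic with PySpark Structured Streaming. The broker address, topic name, and checkpoint path are placeholders, and running it requires the spark-sql-kafka connector package.

```python
# Minimal sketch (broker, topic, and paths are assumptions): consume a Kafka
# topic with PySpark Structured Streaming. Needs the spark-sql-kafka connector
# on the classpath (e.g. spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<version>).
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-low-latency-demo").getOrCreate()

# Subscribe to a hypothetical "events" topic on a local broker.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
)

# Kafka delivers key/value as binary; cast the payload to a string for processing.
parsed = events.select(col("value").cast("string").alias("payload"))

# Write the stream to the console; a real job would target a sink such as a
# Delta table or another Kafka topic.
query = (
    parsed.writeStream
    .format("console")
    .option("checkpointLocation", "/tmp/checkpoints/events")
    .start()
)
query.awaitTermination()
```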
Data engineers manage that massive amount of data using various data engineering tools, frameworks, and technologies. Data ingestion systems such as Kafka, for example, offer a seamless and quick data ingestion process while also allowing data engineers to locate appropriate data sources, analyze them, and ingest data for further processing.
Apache Kafka and RabbitMQ are messaging systems used in distributed computing to handle big data streams: reading, writing, processing, and so on. Since protocol methods (messages) sent are not guaranteed to reach the peer or be successfully processed by it, both publishers and consumers need a mechanism for delivery and processing confirmation.
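A minimal sketch of that confirmation mechanism on the Kafka side, using the confluent-kafka Python client (the broker, topic, and the trivial process() step are assumptions for illustration): the producer registers a delivery callback, and the consumer disables auto-commit so offsets are committed only after processing succeeds.

```python
from confluent_kafka import Producer, Consumer

def on_delivery(err, msg):
    # Delivery report from the broker: without it, the publisher never learns
    # whether a message actually landed.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] at offset {msg.offset()}")

def process(payload):
    # Stand-in for real processing logic.
    print("processing", payload)

producer = Producer({"bootstrap.servers": "localhost:9092", "acks": "all"})
producer.produce("orders", value=b'{"id": 1}', callback=on_delivery)
producer.flush()  # block until all outstanding delivery reports arrive

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "order-processors",
    "enable.auto.commit": False,   # confirm processing explicitly
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])
msg = consumer.poll(timeout=10.0)
if msg is not None and msg.error() is None:
    process(msg.value())
    consumer.commit(message=msg)   # acknowledge successful processing
consumer.close()
```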
The job of data engineers typically is to bring in raw data from different sources and process it for enterprise-grade applications. Connect with data scientists and create the infrastructure required to identify, design, and deploy internal process improvements. Experience with tools like Snowflake is considered a bonus.
5 AWS Lambda Projects For Your Portfolio: Check out these five exciting AWS Lambda project ideas to get an overview of the various applications of AWS Lambda as a serverless computing service. For example, extract sentiment data from cleaned and processed Twitter comments using Amazon Comprehend.
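As a hedged sketch of that project idea, the handler below calls Amazon Comprehend's detect_sentiment through boto3; the event shape, region, and Lambda wiring are assumptions, and AWS credentials must already be configured.

```python
# Sketch only: score a tweet's sentiment with Amazon Comprehend from a
# Lambda-style handler. The 'event["text"]' input shape is an assumption.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

def handler(event, context):
    text = event["text"]
    result = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    return {
        "sentiment": result["Sentiment"],      # POSITIVE / NEGATIVE / NEUTRAL / MIXED
        "scores": result["SentimentScore"],    # per-class confidence scores
    }
```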
It’s time for you to step into the exciting world of AWS Machine Learning, where technology meets imagination to create highly innovative data science solutions. By only paying for the processing power when analyzing images, they efficiently manage expenses while achieving accurate vehicle identification.
With AWS DevOps, data scientists and engineers can access a vast range of resources to help them build and deploy complex data processing pipelines, machine learning models, and more. You need to be able to process, analyze, and deliver insights in real-time to keep up with the competition. This is where AWS DevOps comes in.
From the fundamentals to advanced concepts, it covers everything: a step-by-step process for creating PySpark UDFs, a demonstration of their seamless integration with SQL, and practical examples to solidify your understanding. As data grows in size and complexity, so does the need for tailored data processing solutions.
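A short sketch of that UDF-to-SQL integration (the column, table, and function names are invented for illustration): define a Python function once, use it as a DataFrame UDF, and register the same function so it can be called from Spark SQL.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-demo").getOrCreate()

def mask_email(email):
    # Keep the domain, hide the local part: "alice@example.com" -> "***@example.com"
    return "***@" + email.split("@")[-1] if email else None

# Use it as a DataFrame function...
mask_udf = udf(mask_email, StringType())
df = spark.createDataFrame([("alice@example.com",)], ["email"])
df.select(mask_udf("email").alias("masked")).show()

# ...and register the same function for SQL queries.
spark.udf.register("mask_email", mask_email, StringType())
df.createOrReplaceTempView("users")
spark.sql("SELECT mask_email(email) AS masked FROM users").show()
```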
Cloud computing is the future, given that the data being produced and processed is increasing exponentially. Say, over time, an organization's technology stack has evolved, and its teams use the data sources that best fit each application; some of that data is stored in a NoSQL database.
It collects data from multiple sources and then processes it into operational systems and data warehouses. This is done so that the loading, processing, and reporting of data do not affect the operational systems' performance. OLTP stands for online transaction processing, while OLAP stands for online analytical processing.
A survey by TDWI (The Data Warehousing Institute) found that data warehousing is a critical technology for Business Intelligence and data analytics, with 80% of respondents considering it "very important" or "important" to their business intelligence and data analytics initiatives. Plan the ETL process for the data warehouse design.
What is a Big Data Developer? A Big Data Developer is a specialized IT professional responsible for designing, implementing, and managing large-scale data processing systems that handle vast amounts of information, often called "big data." The article also looks at which industries employ big data developers and why to choose this career.
Little did anyone know that this research paper would change how we perceive and process data. From it spawned the big data legend, Hadoop, and its capabilities for processing enormous amounts of data. Hadoop's HDFS acts as the storage layer, so it needs a processing framework like MapReduce to actually work on the data.
Apply recursive CTEs to tasks like dependency resolution, graph traversal, and nested data processing. At bp Supply Trading and Shipping – Market Risk, understanding portfolio hierarchy reporting across business units is critical for our business to operate efficiently.
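As a self-contained sketch of that hierarchy-traversal pattern, the example below walks a small, invented book/portfolio tree with a recursive CTE. It runs against an in-memory SQLite database purely so it works anywhere, not because the original report uses SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
INSERT INTO books VALUES
  (1, 'Global Portfolio', NULL),
  (2, 'Crude',            1),
  (3, 'Products',         1),
  (4, 'Gasoline Desk',    3);
""")

rows = conn.execute("""
WITH RECURSIVE hierarchy(id, name, depth, path) AS (
    -- anchor: top-level books with no parent
    SELECT id, name, 0, name
    FROM books WHERE parent_id IS NULL
    UNION ALL
    -- recursive step: attach each child to its parent's path
    SELECT b.id, b.name, h.depth + 1, h.path || ' > ' || b.name
    FROM books b JOIN hierarchy h ON b.parent_id = h.id
)
SELECT path FROM hierarchy ORDER BY path;
""").fetchall()

for (path,) in rows:
    print(path)
```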
With rising member expectations, regulatory scrutiny and the continued diversification of asset classes, super funds are evolving their investment strategies, technology stacks and data architectures to remain competitive. No longer niche areas, these investments are now making up nearly half of many asset owners' portfolios.
Assets represent the outputs of data processing, and workflows are built around the transformation of these assets. This streamlined process enables quick iteration without the need for extensive configuration. Moreover, Dagster's asset-based approach allows for better parallelism, as different assets can be processed independently.
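A minimal sketch of that asset-based style (the asset names and logic are hypothetical): each @asset function is an output of data processing, and Dagster infers the dependency between assets from the function signature, which is what lets independent assets run in parallel.

```python
from dagster import asset, materialize

@asset
def raw_orders():
    # In a real pipeline this would pull from a source system.
    return [{"id": 1, "amount": 120.0}, {"id": 2, "amount": 75.5}]

@asset
def order_totals(raw_orders):
    # Depends on raw_orders; Dagster wires the dependency from the parameter name.
    return sum(row["amount"] for row in raw_orders)

if __name__ == "__main__":
    # Materialize both assets; assets with no shared dependencies can be
    # processed independently.
    result = materialize([raw_orders, order_totals])
    print(result.success)
```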
Becoming a Databricks Certified Data Engineer Associate is essential for data engineers, as the platform enables them to efficiently process large volumes of data, build complex data pipelines, and leverage cloud-native services for enhanced reliability and cost-effectiveness.
Data-Driven Due Diligence and Deal Origination: In the early stages of the investment lifecycle, data serves to augment and accelerate the deal sourcing and due diligence processes. Technological advances in data access have played a key role. The first challenge is usually structural.
Apache Spark has become a cornerstone technology in the world of big data and analytics. Learning Spark opens up a world of opportunities in data processing, machine learning, and more. How to learn Apache Spark? Familiarize yourself with concepts like distributed computing, data storage, and data processing frameworks.
Snowflake Overview and Architecture: With the data explosion, acquiring, processing, and storing large or complicated datasets appears more challenging. The piece also dives deeper into the Snowflake architecture, answers FAQs on it, and lists Databricks and Snowflake projects for practice in 2022.
If you're searching for a way to tap into this growing field, mastering ETL is a critical first step. Extracting, transforming, and loading data, the process known as ETL, ensures that organizations can effectively manage, analyze, and derive insights from large volumes of data. But what does it take to become an ETL Data Engineer?
Cloud computing skills (especially Microsoft Azure), SQL, Python, and expertise in big data technologies like Apache Spark and Hadoop are highly sought after. Thus, data engineering job applicants must showcase real-world project experience and a grasp of various data engineering technologies to stand out as candidates.
The significant roadblocks leading to data warehousing project failures include disconnected data silos, delayed data warehouse loading, time-consuming data preparation processes, a need for additional automation of core data management tasks, inadequate communication between Business Units and Tech Team, etc.
Furthermore, excellent open-source contributions can elevate your portfolio and resume to the next level, empowering you to pursue new and promising career avenues in the future. ClickHouse (source: GitHub) is a column-oriented database management system used for online analytical processing (OLAP) of queries.
Gaining such expertise can streamline data processing, ensuring data is readily available for analytics and decision-making. Suppose a cloud professional takes a course focusing on using AWS Glue and Apache Spark for ETL (Extract, Transform, Load) processes. Duration: this self-paced course runs for nine weeks.
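As a rough sketch of the kind of ETL job such a course covers, the example below uses plain PySpark so it runs locally; inside AWS Glue the same logic would typically go through GlueContext and DynamicFrames. The file paths and column names are assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("glue-style-etl").getOrCreate()

# Extract: read raw CSV landed in a data lake bucket (local path here).
raw = spark.read.option("header", True).csv("/tmp/raw/sales.csv")

# Transform: clean types and aggregate revenue per day.
daily = (
    raw.withColumn("amount", F.col("amount").cast("double"))
       .withColumn("sale_date", F.to_date("sale_date"))
       .groupBy("sale_date")
       .agg(F.sum("amount").alias("revenue"))
)

# Load: write the curated result as partitioned Parquet for analytics.
daily.write.mode("overwrite").partitionBy("sale_date").parquet("/tmp/curated/daily_revenue")
```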
Windows Virtual Machine Deployment: This project idea suggests you deploy a Windows Virtual Machine with zero security violations in the process. The process of sending mail to the provided addresses will then begin. You can combine numerous technologies to work on the project.
Python Programming: You'll spend significant time working with APIs, processing text and structured data, and building web applications. Advanced Techniques: Chain-of-thought prompting encourages models to show their reasoning process, often improving accuracy on complex problems.
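A hedged illustration of chain-of-thought prompting with the OpenAI Python client; the model name, question, and prompt wording are assumptions, not taken from the article.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "A train travels 120 km in 1.5 hours, then 80 km in 1 hour. What is its average speed?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a careful math tutor."},
        # Asking the model to reason step by step before answering is the core
        # of chain-of-thought prompting.
        {"role": "user", "content": f"{question}\n\nThink through the problem step by step, "
                                    "then state the final answer on its own line."},
    ],
)
print(response.choices[0].message.content)
```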
The Retrieval-Augmented Generation (RAG) pipeline is an approach in natural language processing that has gained traction for handling complex information retrieval tasks. Here is how the RAG pipeline looks in action. Step 1, chunking and embedding generation: chunking and embedding are interrelated processes in the RAG pipeline.
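A minimal sketch of that chunking-and-embedding step, using the sentence-transformers library as one possible toolkit (the article does not prescribe a specific one); the chunk size, overlap, and model name are placeholders.

```python
from sentence_transformers import SentenceTransformer

def chunk(text, size=200, overlap=50):
    # Naive fixed-size character chunking with overlap so context isn't cut mid-thought.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

document = "Retrieval-Augmented Generation combines a retriever with a generator... " * 10
chunks = chunk(document)

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(chunks)        # one vector per chunk
print(len(chunks), embeddings.shape)     # N chunks x embedding dimension
```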
Apache Hadoop is synonymous with big data for its cost-effectiveness and scalability when processing petabytes of data. Big data systems are popular for processing huge amounts of unstructured data from multiple data sources. Data analysis using Hadoop is just half the battle won.
Cloud technology has risen in the latter half of the past decade. Amazon and Google are the big bulls in cloud technology, and the battle between AWS and GCP has been raging for a while. The Google Trends graph above shows how interest in the two has grown over the years, with AWS maintaining a significant margin over GCP.
Thinking about making a career transition from ETL developer to data engineer roles? ETL is a process that involves extracting, transforming, and loading data from multiple sources into a data warehouse, data lake, or another centralized data repository. Data engineers also use programming languages (e.g., Python) to automate or modify some processes.
Furthermore, data scientists must be familiar with the data sets they will be working on, and for improved data handling, they must have a thorough understanding of the entire ETL process. The transition to cloud-based software services and enhanced ETL pipelines can ease data processing for businesses.
Despite the buzz surrounding NoSQL, Hadoop, and other big data technologies, SQL remains the most dominant language for data operations among all tech companies. Furthermore, creating reports from data analysis often involves repeating a process; stored procedures help data engineers overcome this challenge.
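As one hedged example of that pattern, the snippet below defines a PostgreSQL stored procedure that rebuilds a daily report table and calls it through psycopg2; the table, procedure, and connection details are invented for illustration and must exist in a real database for this to run.

```python
import psycopg2

DDL = """
CREATE OR REPLACE PROCEDURE refresh_daily_report()
LANGUAGE plpgsql
AS $$
BEGIN
    -- Rebuild the report table from the raw sales data.
    TRUNCATE daily_report;
    INSERT INTO daily_report (sale_date, revenue)
    SELECT sale_date, SUM(amount) FROM sales GROUP BY sale_date;
END;
$$;
"""

with psycopg2.connect("dbname=analytics user=etl") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)                             # define the procedure once
        cur.execute("CALL refresh_daily_report()")   # rerun it whenever the report is needed
```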
While ETL can be complex for massive data sets, there are tools and frameworks to simplify the process. If you are starting your ETL learning journey, here are a few essential steps you must follow. Understand the ETL process: you must begin by understanding the core ETL principles. Similarly, ETL mastery takes time and practice.
To make this learning process smoother and more effective, it is crucial to understand the prerequisites, set up your Snowflake environment, and follow a structured step-by-step approach. Knowledge of cloud basics, such as creating accounts, navigating cloud dashboards, and managing cloud resources, will simplify the setup process.
Building a batch pipeline is essential for processing large volumes of data efficiently and reliably. Batch data pipelines are your ticket to the world of efficient data processing. A batch data pipeline is a structured and automated system designed to process large volumes of data at scheduled intervals or batches.
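A bare-bones sketch of one run of such a batch pipeline; the paths, file naming, and nightly schedule are assumptions, and a scheduler such as cron or an orchestrator would trigger it at each interval.

```python
import csv
import datetime as dt
import pathlib

RAW_DIR = pathlib.Path("/tmp/raw")
OUT_DIR = pathlib.Path("/tmp/curated")

def run_batch(run_date: dt.date) -> None:
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    total = 0.0
    # Extract: pick up every file landed for the batch date.
    for path in RAW_DIR.glob(f"sales_{run_date:%Y%m%d}_*.csv"):
        with path.open() as fh:
            for row in csv.DictReader(fh):
                total += float(row["amount"])   # Transform: aggregate the day's revenue
    # Load: write one summary record per batch run.
    out = OUT_DIR / f"revenue_{run_date:%Y%m%d}.csv"
    out.write_text(f"date,revenue\n{run_date},{total}\n")

if __name__ == "__main__":
    # The scheduler would invoke this once per interval, e.g. nightly for yesterday's data.
    run_batch(dt.date.today() - dt.timedelta(days=1))
```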
An applicant tracking system automates and controls the initial phases of the selection process by searching for keywords and rating resumes accordingly. For a data engineer, technical skills should include computer science fundamentals, database technologies, programming languages, data mining tools, etc. (Think of it as a resume score!)
The process of merging and integrating data from several sources into a logical, unified view of data is known as data integration. Data integration projects revolve around managing this process. Data integration typically involves three stages: extraction, transformation, and loading (ETL) into a target system (e.g., data warehouses).