Data engineering plays a pivotal role in the vast data ecosystem by collecting, transforming, and delivering data essential for analytics, reporting, and machine learning. Aspiring data engineers often seek real-world projects to gain hands-on experience and showcase their expertise. This article presents the top 20 data engineering project ideas with their source code.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover a fresh industry trend: Cloud Development Environments — an analysis full subscribers received 3 weeks ago.
Wondering how to share data between tasks? What are XComs in Apache Airflow? Well, you are in the right place. In this tutorial, you will learn about XComs in Airflow: what they are, how they work, how you can define them, how to get them, and more. If you checked my course “Apache Airflow: The Hands-On Guide”, Airflow XComs should not sound unfamiliar.
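To make the idea concrete, here is a minimal sketch of explicit XCom push and pull, assuming Airflow 2.4+; the task ids and the pushed value are illustrative, not taken from the tutorial itself:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def _push(ti):
    # Explicitly push a value to XCom under a named key.
    ti.xcom_push(key="row_count", value=42)

def _pull(ti):
    # Pull the value back by task id and key in a downstream task.
    count = ti.xcom_pull(task_ids="push_count", key="row_count")
    print(f"received {count} rows")

with DAG("xcom_demo", start_date=datetime(2023, 1, 1),
         schedule=None, catchup=False) as dag:
    push = PythonOperator(task_id="push_count", python_callable=_push)
    pull = PythonOperator(task_id="pull_count", python_callable=_pull)
    push >> pull
```

With the TaskFlow API, the same exchange happens implicitly: a task's return value is pushed to XCom, and passing it as an argument to another task pulls it.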
In our social media and marketing-driven era, it's quite hard to get things right. For me, one common misconception brought by the Modern Data Stack is the idea that everything should now be ELT. In fact, no: it shouldn't be, it only can be.
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; and distinguish between Airflow-related…
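One common starting point for such a standardized process (a sketch of a widespread pattern, not necessarily the guide's own approach) is a test that catches DAG import errors before they ever reach the scheduler, assuming Airflow 2.x and pytest:

```python
from airflow.models import DagBag

def test_no_import_errors():
    # Any DAG file that fails to parse shows up in import_errors
    # together with its full traceback.
    dag_bag = DagBag(include_examples=False)
    assert dag_bag.import_errors == {}, f"Broken DAGs: {dag_bag.import_errors}"
```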
Summary Data systems are inherently complex and often require integration of multiple technologies. Orchestrators are centralized utilities that control the execution and sequencing of interdependent operations. This offers a single location for managing visibility and error handling so that data platform engineers can manage complexity. In this episode Nick Schrock, creator of Dagster, shares his perspective on the state of data orchestration technology and its application, to help inform its…
In the vast world of data, it’s not just about gathering and analyzing information anymore; it’s also about ensuring that data pipelines, processes, and platforms run seamlessly and efficiently. Nothing screams “fly-by-night” like coming into a Data Team only to find no tests, no docs, no deployments, no Docker, no nothing. […] The post The Role of DevOps and CI/CD in Data Engineering appeared first on Confessions of a Data Guy.
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in yesterday’s subscriber-only The Pulse issue. To get full newsletters twice a week, subscribe here. Two weeks ago, a JavaScript runtime and toolkit called Bun was released and took the Node.js world by storm. Bun was mostly built by Jarred Sumner, a former Stripe engineer and recipient of the Thiel Fellowship (a grant of $100,000 for young people to drop out of school).
Chatbots are the most widely adopted use case for leveraging the powerful chat and reasoning capabilities of large language models (LLMs). The retrieval…
It's always a huge pleasure to see the PySpark API covering more and more Scala API features. Starting from Apache Spark 3.4.0 you can even write arbitrary stateful processing jobs! But since the API is a little bit different than the one available on the Scala side, I wanted to take a deeper look.
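For a taste of what that PySpark API looks like, here is a hedged sketch of arbitrary stateful processing with applyInPandasWithState, assuming Spark 3.4.0+; the rate source and the derived "user" column are illustrative placeholders:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.streaming.state import GroupState, GroupStateTimeout

spark = SparkSession.builder.getOrCreate()

# Synthetic streaming source: a fake "user" column from the rate source.
events = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .selectExpr("CAST(value % 3 AS STRING) AS user")
)

def count_events(key, pdf_iter, state: GroupState):
    # Restore the running count kept in the state store, if any.
    running = state.get[0] if state.exists else 0
    for pdf in pdf_iter:
        running += len(pdf)
    state.update((running,))
    # Emit one row per group with the updated count.
    yield pd.DataFrame({"user": [key[0]], "count": [running]})

counts = events.groupBy("user").applyInPandasWithState(
    count_events,
    outputStructType="user STRING, count LONG",
    stateStructType="count LONG",
    outputMode="update",
    timeoutConf=GroupStateTimeout.NoTimeout,
)

query = counts.writeStream.outputMode("update").format("console").start()
```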
Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.
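As a minimal illustration of the idea (the vocabulary and fields below are assumptions, not taken from the episode), a JSON-LD document couples the metadata with the record itself:

```python
import json

# The @context maps plain keys like "name" onto a shared vocabulary
# (schema.org here), so the semantics travel with the data.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://example.com/datasets/orders",  # hypothetical id
    "name": "Orders",
    "description": "Daily order snapshots",
    "creator": {"@type": "Organization", "name": "Example Corp"},
}
print(json.dumps(dataset, indent=2))
```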
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
I always leave it to my dear readers and followers to give me pokes in the right direction. Nothing like the teeming masses to set you straight. Recently I was working on my Substack Newsletter, on the topic of Polars + Delta Lake, reading remove files from s3 … I left a question open on […] The post DuckDB + Delta Lake (the new lake house?)…
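For context, reading a Delta table from S3 with Polars is nearly a one-liner; this is a hedged sketch with a placeholder bucket path and region, and pl.read_delta relies on the deltalake package under the hood:

```python
import polars as pl

# Hypothetical S3 path; adjust storage_options to your own setup.
df = pl.read_delta(
    "s3://my-bucket/tables/orders",
    storage_options={"AWS_REGION": "us-east-1"},
)
print(df.head())
```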
👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Pulse issue. To get full newsletters twice a week, subscribe here. Willem Spruijt is a software engineer I worked with on the same team at Uber in Amsterdam, building payments systems.
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Earlier this year, a small team of engineers at Meta started working on an idea for a new app. It would have all the features people expect from a text-based conversation app, but with one key, distinctive goal: being an app that would allow people to share their content across multiple platforms. We wanted to build a decentralized (or federated) app that would enable people to post content that is viewable by anyone on other social apps, and vice versa.
Summary Cloud data warehouses and the introduction of the ELT paradigm have led to the creation of multiple options for flexible data integration, with a roughly equal distribution of commercial and open source options. The challenge is that most of those options are complex to operate and exist in their own silo. The dlt project was created to eliminate overhead and bring data integration into your full control as a library component of your overall data system.
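"Library component" is meant literally; here is a minimal sketch of a dlt pipeline, where the pipeline name, destination, and sample rows are illustrative assumptions:

```python
import dlt

# A pipeline is just an object in your process; duckdb keeps the demo local.
pipeline = dlt.pipeline(
    pipeline_name="orders_demo",
    destination="duckdb",
    dataset_name="raw",
)

rows = [{"id": 1, "amount": 9.99}, {"id": 2, "amount": 19.99}]
info = pipeline.run(rows, table_name="orders")  # schema inferred from rows
print(info)
```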
As data continues to become more complex, it is critical to have effective ways to present this information. With the explosion of AI/ML, users want to be able to interact with their data and ML models. However, building such data apps has not been easy. Any data practitioner or product owner will attest to how it takes a lot of steps to build a data app.
Cloud notebooks are game-changers for data science, providing free access to computing, pre-built environments, collaboration features, and third-party integrations - everything you need to enhance your workflow.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
By Lucas Nouguier Hey everyone, Daniel here. Lucas’ story is shared by lots of beginner Scala developers, which is why I wanted to post it here on the blog. I’ve watched thousands of developers learn Scala from scratch, and, like Lucas, they love it! If you want to learn Scala well and fast, take a look at my Scala Essentials course at Rock the JVM.
Meta’s Native Assurance team regularly performs manual code reviews as part of our ongoing commitment to improve the security posture of Meta’s products. In 2021, we discovered a vulnerability in the Meta Quest 2’s Android-based OS that never made it to production but helped us find new ways to improve the security of Meta Quest products. We’re sharing our journey to get arbitrary native code execution in the privileged VR Runtime service on the Meta Quest 2 by exploiting a memory corruption vulnerability…
Summary The rapid growth of machine learning, especially large language models, has led to a commensurate growth in the need to store and compare vectors. In this episode Louis Brandy discusses the applications for vector search capabilities both in and outside of AI, as well as the challenges of maintaining real-time indexes of vector data.
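To ground why maintaining those indexes is hard, here is a back-of-the-envelope sketch of the brute-force baseline that real-time vector indexes try to beat; the corpus size and embedding dimension below are arbitrary assumptions:

```python
import numpy as np

def top_k(query: np.ndarray, index: np.ndarray, k: int = 5) -> np.ndarray:
    # Normalize so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = index / np.linalg.norm(index, axis=1, keepdims=True)
    scores = m @ q
    # Exact but O(n) per query, which is what ANN indexes avoid.
    return np.argsort(scores)[::-1][:k]

vectors = np.random.rand(10_000, 384)  # e.g. a sentence-embedding size
print(top_k(np.random.rand(384), vectors))
```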
Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.
Companies want to train and use large language models (LLMs) with their own proprietary data. Open source generative models such as Meta’s Llama 2 are pivotal in making that possible. The next hurdle is finding a platform to harness the power of LLMs. Snowflake lets you apply near-magical generative AI transformations to your data all in Python, with the protection of its out-of-the-box governance and security features.
At the end of this introduction to Airflow, you will be all set for getting started with Airflow. You will start with the basics, such as what Airflow is and the essential concepts. Then you will set up and run your local development environment using the Astro CLI to create your first data pipeline. I hope you’re getting excited. Fasten your seatbelt, take a deep breath, and let’s go! For a complete hands-on introduction to Apache Airflow, here is a 6-hour course at a discount.
Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?
Hey. This is a super late Data News; I wanted to send it earlier, but I was travelling and then enjoying time with friends and family. I'm still struggling a bit to write as fast as I would like, but 🤷‍♂️. So, sorry for the late edition, and enjoy. Gen AI 🤖 Announcing Microsoft Copilot — having everything under a common brand is great, and Copilot is a great name.
Snowpark is the set of libraries and runtimes that enables data engineers, data scientists, and developers to build data engineering pipelines, ML workflows, and data applications in Python, Java, and Scala. Functions or procedures written by users in these languages are executed inside Snowpark’s secure sandbox environment, which runs on the warehouse.
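As a hedged illustration of that model (the connection parameters and table name are placeholders, not anything from the announcement), a Python function can be registered as a Snowpark UDF so it runs inside that sandbox rather than on the client:

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import udf, col
from snowflake.snowpark.types import IntegerType

# Placeholder credentials; fill in your own account details.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
    "warehouse": "<warehouse>", "database": "<db>", "schema": "<schema>",
}).create()

@udf(return_type=IntegerType(), input_types=[IntegerType()])
def double(x: int) -> int:
    # Executes on the warehouse inside Snowpark's sandbox.
    return x * 2

df = session.table("orders").select(double(col("quantity")))
df.show()
```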
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; and scale your…
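Those building blocks fit in a few lines; this is a minimal sketch using the TaskFlow API, assuming Airflow 2.4+ (the task names and daily schedule are illustrative, not from the eBook):

```python
from datetime import datetime
from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2023, 1, 1), catchup=False)
def daily_pipeline():
    @task
    def extract() -> list:
        return [1, 2, 3]

    @task
    def transform(values: list) -> int:
        return sum(values)

    # Passing the output chains the tasks and moves the data via XCom.
    transform(extract())

daily_pipeline()
```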