Sat. Sep 16, 2023 - Fri. Sep 22, 2023

Top 20 Data Engineering Project Ideas [With Source Code]

Analytics Vidhya

Data engineering plays a pivotal role in the vast data ecosystem by collecting, transforming, and delivering data essential for analytics, reporting, and machine learning. Aspiring data engineers often seek real-world projects to gain hands-on experience and showcase their expertise. This article presents the top 20 data engineering project ideas with their source code.

Bun: lessons from disrupting a tech ecosystem

The Pragmatic Engineer

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in yesterday’s subscriber-only The Pulse issue. To get full newsletters twice a week, subscribe here. Two weeks ago, a JavaScript runtime and toolkit called Bun was released and took the Node.js world by storm. Bun was mostly built by Jared Sumner, a former Stripe engineer, and recipient of the Thiel Fellowship (a grant of $100,000 for young people to drop out of s

Airflow XCOM: The Ultimate Guide

Marc Lamberti

Wondering how to share data between tasks? What are XComs in Apache Airflow? Well, you are in the right place. In this tutorial, you will learn about XComs in Airflow: what they are, how they work, how you can define them, how to get them, and more. If you checked my course “Apache Airflow: The Hands-On Guide”, Airflow XCom should not sound unfamiliar.

Building Linked Data Products With JSON-LD

Data Engineering Podcast

Summary: A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. Linked data technologies provide a means of tightly coupling metadata with raw information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products.

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Python in Excel: This Will Change Data Science Forever

KDnuggets

You can now run Python code in Excel to analyze data, build machine learning models, and create visualizations.

Scala as a Junior Developer

Rock the JVM

By Lucas Nouguier. Hey everyone, Daniel here. Lucas’ story is shared by lots of beginner Scala developers, which is why I wanted to post it here on the blog. I’ve watched thousands of developers learn Scala from scratch, and, like Lucas, they love it! If you want to learn Scala well and fast, take a look at my Scala Essentials course at Rock the JVM.

More Trending

Predicting Snow Crab Habitat Using Machine Learning

ArcGIS

In collaboration with NOAA, we used the Presence-Only Prediction (Maxent) tool to predict snow crab habitat under changing climate conditions.

Ensemble Learning Techniques: A Walkthrough with Random Forests in Python

KDnuggets

A practical walkthrough for random forests in Python.

What is Apache Airflow?

Marc Lamberti

What is Apache Airflow? Perhaps your colleagues or YouTube videos have mentioned it. Maybe your job requires you to use it, but you’re unsure what it is. In this article, you will learn everything about what Airflow is, what it isn’t, and its core concepts and components. But, before answering this question, we need a proper understanding of what an “orchestrator” is.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

What's new on the cloud for data engineers - part 11 (06-09.2023)

Waitingforcode

It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 4 months.

Getting Started with Scikit-learn in 5 Steps

KDnuggets

This tutorial offers a comprehensive hands-on walkthrough of machine learning with Scikit-learn. Readers will learn key concepts and techniques including data preprocessing, model training and evaluation, hyperparameter tuning, and compiling ensemble models for enhanced performance.

Airflow DAG: Create your first DAG in 5 minutes

Marc Lamberti

Looking to create your first Airflow DAG? Wondering how to process data in Airflow? What are the steps to code your data pipelines? You’ve come to the right place! At the end of this short tutorial, you will have your first Airflow DAG! You might think starting with Apache Airflow is hard, but it is not. The truth is Airflow has so many features that it can be overwhelming.

How Edmunds builds a blueprint for generative AI

databricks

This blog post is in collaboration with Greg Rokita, AVP of Technology at Edmunds. Long envisioned as a key milestone in computing, we've.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

ArcGIS for Nature-Related Assessments

ArcGIS

This Climate Week renews focus on nature. Learn more about how ArcGIS supports nature-related assessments to run sustainable organizations.

10 ChatGPT Projects Cheat Sheet

KDnuggets

KDnuggets' latest cheat sheet covers 10 curated hands-on projects to boost data science workflows with ChatGPT across ML, NLP, and full stack dev, including links to full project details.

Machine Learning Made Easy: Q&A with Snowflake Head of Artificial Intelligence and Machine Learning Strategy Ahmad Khan

Snowflake

Why AI has everyone’s attention, what it means for different data roles, and how Alteryx and Snowflake are bringing AI to data use cases. There’s a llama on the loose! Well, more specifically, LLaMA (Large Language Model Meta AI), along with other large language models (LLMs) that have suddenly become more open and accessible for everyday applications.

A Costa Rica journey with a Twist of Pura Vida

databricks

Costa Rica is known for several things, both culturally and ecologically. Among those are biodiversity, coffee, Pura Vida, and most recently a rapidly.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Building for Inclusivity: The Technical Blueprint of Pinterest’s Multidimensional Diversification

Pinterest Engineering

Pedro Silva | Sr. ML Engineer & Inclusive AI Tech Lead; Bhawna Juneja | Sr. Machine Learning Engineer; Rohan Mahadev | Machine Learning Engineer II; Sujay Khandagale | Machine Learning Engineer II; Abhay Varmaraja | Machine Learning Engineer II

Pinterest’s mission as a company is to bring everyone the inspiration to create a life they love. “Everyone” has been the north star for our Inclusive AI and Inclusive Product teams.

Hands-On with Supervised Learning: Linear Regression

KDnuggets

If you're looking for a hands-on experience with a detailed yet beginner-friendly tutorial on implementing Linear Regression using Scikit-learn, you're in for an engaging journey.

How Leaders of the Modern Marketing Data Stack Differentiate Themselves in a Crowded Market

Snowflake

The marketing technology landscape has exploded in the last decade. With over 11,000 available solutions, an increase of 7,258% over the last 12 years, marketing organizations have never had more tool options to choose from. In this post, we’ll take a look at how leading vendors in the 2023 Modern Marketing Data Stack are differentiating their products in a crowded market. 360-degree customer view broken into 120 data silos: As of 2019, the average enterprise used 120 marketing applications.

Introducing the Support of Lateral Column Alias

databricks

We are thrilled to introduce the support of a new SQL feature in Apache Spark and Databricks: Lateral Column Alias (LCA). This feature.

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Locked by another application using ArcPy and a File geodatabase

ArcGIS

Data management tips and tricks for managing locks in a temporary file geodatabase with automated workflows.

Hands-On with Unsupervised Learning: K-Means Clustering

KDnuggets

This tutorial provides hands-on experience with the key concepts and implementation of K-Means clustering, a popular unsupervised learning algorithm, for customer segmentation and targeted advertising applications.

ADP Enables Dynamic Benchmarking of Human Capital Management Metrics with Snowflake

Snowflake

ADP provides products, services and experiences that simplify work for more than 1 million clients in 140 countries. Large and small organizations across virtually every industry rely on ADP’s cloud-based human capital management (HCM) solutions to streamline HR, payroll, time, tax and benefits administration. Self-service HCM analytics help ADP’s clients understand workforce trends and benchmark their metrics against aggregated, anonymized data from over 30 million employee records.

Orchestrating Data Analytics with Databricks Workflows

databricks

For data-driven enterprises, data analysts play a crucial role in extracting insights from data and presenting it in a meaningful way. However, many.

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Google Pub/Sub to BigQuery the Simple Way

Towards Data Science

A hands-on guide to implementing BigQuery Subscriptions in Pub/Sub for simple message and streaming ingestion.

Top 5 Free Alternatives to GPT-4

KDnuggets

Think GPT-4 is a big deal? These Generative AI newbies are already stealing the show!

How to Easily Connect Airbyte with Snowflake for Unleashing Data’s Power?

Workfall

Imagine your data as pieces of a complex puzzle scattered across different platforms and formats. Making sense of this scattered information often feels like solving a gigantic puzzle blindfolded. This is where the power of data integration comes into play. If you’ve ever wished for a simplified way to seamlessly connect these puzzle pieces, then you’re in for a treat.

Unexpected Tools in the Databricks Marketplace to Supercharge Manufacturing Supply Chains

databricks

“Supply chains compete, not companies” — Martin Christopher No two supply chains are identical - the unique combination of products, industries, and geographic locat.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m