I’d like to share a story about an educational side project which could prove fruitful for a software engineer seeking a new job. Juraj created a systems design explainer on how he built this project and the technologies used, including a systems design diagram for the Rides application. The app uses: Node.js
Cross-project dependencies (credits): Over the last few years, dbt has become a de facto standard enabling companies to collaborate easily on data transformations. Whatever the number, there will be a critical point at which a single project no longer scales. Cross-project references are a key enabler of data team decentralisation.
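As a sketch of what cross-project references look like in practice (the project and model names here are hypothetical, not taken from the article), a downstream dbt project declares its upstream dependency in `dependencies.yml`:

```yaml
# dependencies.yml in the downstream dbt project
# "finance_platform" is a hypothetical upstream project name
projects:
  - name: finance_platform
```

A model in the downstream project can then reference an upstream model across the project boundary with a two-argument ref, e.g. `{{ ref('finance_platform', 'monthly_revenue') }}`, letting each team own its project while sharing governed interfaces.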
1. Introduction 2. Parts of data engineering 3.1. Requirements (3.1.1. Understand input datasets available; 3.1.2. Define what the output dataset will look like; 3.1.3. Define SLAs so stakeholders know what to expect; 3.1.4. Define checks to ensure the output dataset is usable) 3.2. Identify what tool to use to process data 3.3. Data flow architecture […]
2. Project demo 3. Building efficient data pipelines with DuckDB: 4.1. Use DuckDB to process data, not for multiple users to access data; 4.2. Cost calculation: DuckDB + ephemeral VMs = dirt cheap data processing; 4.3. Processing data less than 100GB? Use DuckDB […]
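The "dirt cheap" claim is easy to sanity-check with back-of-the-envelope arithmetic; a minimal sketch (the hourly rate and runtime are assumed example values, not figures from the article):

```python
# Back-of-the-envelope cost of a DuckDB batch job on an ephemeral VM.
# Assumed example values: a mid-size cloud VM at $0.20/hour, a
# 30-minute job, and one run per day for a month.
HOURLY_RATE_USD = 0.20   # assumed on-demand VM price
JOB_HOURS = 0.5          # assumed runtime of the daily batch job
RUNS_PER_MONTH = 30

cost_per_run = HOURLY_RATE_USD * JOB_HOURS
monthly_cost = cost_per_run * RUNS_PER_MONTH
print(f"${cost_per_run:.2f} per run, ${monthly_cost:.2f} per month")
```

Because the VM only exists for the duration of the job, you pay for half an hour of compute per run rather than for an always-on warehouse.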
What should product managers keep in mind when adding an analytics project to their roadmap? Download this eBook to discover insights from 16 top product experts, and learn what it takes to build a successful application with analytics at its core.
Introduction: Data is the fuel of the IT industry and of data science projects in today’s online world. We know that streaming data is data that is emitted at high volume […] The post Kafka to MongoDB: Building a Streamlined Data Pipeline appeared first on Analytics Vidhya.
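The core of such a pipeline is a consume-transform-insert loop. A minimal stdlib-only sketch of that pattern, with hypothetical stand-ins for the Kafka consumer and the MongoDB collection (a real pipeline would use a Kafka client library and a MongoDB driver instead):

```python
import json

# Hypothetical stand-in for a Kafka consumer: yields raw message payloads.
def consume_messages():
    for payload in ['{"user": "a", "pages": 12}', '{"user": "b", "pages": 7}']:
        yield payload

# Hypothetical stand-in for a MongoDB collection: collects inserted documents.
inserted_docs = []

def insert_document(doc):
    inserted_docs.append(doc)

# The consume-transform-insert loop at the heart of the pipeline.
for raw in consume_messages():
    doc = json.loads(raw)      # transform: parse the message
    doc["source"] = "kafka"    # transform: enrich with metadata
    insert_document(doc)       # load: write to the sink
```

Swapping the stand-ins for a real consumer and a real collection keeps the loop itself unchanged, which is what makes the pattern streamlined.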
We’re thrilled to announce the release of a new Cloudera Accelerator for Machine Learning (ML) Projects (AMP): Summarization with Gemini from Vertex AI. AMPs are all about helping you quickly build performant AI applications. Stay tuned for future AMPs we’ll build using Cloudera AI and Vertex AI.
Summary A significant amount of time in data engineering is dedicated to building connections and semantic meaning around pieces of information. In this episode Brian Platz explains how JSON-LD can be used as a shared representation of linked data for building semantic data products. Hex brings everything together.
Speaker: Ryan MacCarrigan, Founding Principal, LeanStudio
Watch this webinar with Ryan MacCarrigan, Founding Principal of LeanStudio, to learn about key considerations for launching your next analytics project. But what happens when you have a growing user base and additional feature requests?
Now that AI has reached the level of sophistication seen in the various generative models it is being used to build new ETL workflows. In this episode Jay Mishra shares his experiences and insights building ETL pipelines with the help of generative AI. How can you get the best results for your use case?
Summary Building streaming applications has gotten substantially easier over the past several years. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. How can you get the best results for your use case?
Buck2, our new open source, large-scale build system , is now available on GitHub. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In our internal tests at Meta, we observed that Buck2 completed builds 2x as fast as Buck1.
Watch this webinar with Laura Klein, product manager and author of Build Better Products, to learn how to spot the unconscious assumptions which you’re basing decisions on and guidelines for validating (or invalidating) your ideas. You'll learn: Why every product leader goes into a new project with untested, hidden assumptions.
Summary The dbt project has become overwhelmingly popular across analytics and data engineering teams. Dustin Dorsey and Cameron Cyr co-authored a practical guide to building your dbt project. In this episode they share their hard-won wisdom about how to build and scale your dbt projects.
Buck2 is a from-scratch rewrite of Buck, a polyglot, monorepo build system that was developed and used at Meta (Facebook), and it shares a few similarities with Bazel. As you may know, the Scalable Builds Group at Tweag has a strong interest in such scalable build systems. A build is invoked with, for example, `buck2 build //starlark-rust/starlark`.
Jean-Georges Perrin was tasked with designing a new data platform implementation at PayPal and wound up building a data mesh. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. What are the core problems that you were addressing with this project? We feel your pain.
By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s. RSC has accelerated our open and responsible AI research by helping us build our first generation of advanced AI models.
1. Introduction 2. Steps to decide on a data project to build: 2.1. Objective; 2.2. Research (2.2.1. Job description; 2.2.2. Potential referral/hiring manager research; 2.2.3. Company research); 2.3. Data (2.3.1. Dataset search; 2.3.2. Generate fake data); 2.4. Outcome (2.4.1. Visualization); 2.5. Presentation 3. Conclusion 4. Read these
Why do some embedded analytics projects succeed while others fail? We surveyed 500+ application teams embedding analytics to find out which analytics features actually move the needle. Read the 6th annual State of Embedded Analytics Report to discover new best practices. Brought to you by Logi Analytics.
In this episode he explains his approach to building AI in a more human-like fashion and the emphasis on learning rather than statistical prediction. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Dagster offers a new approach to building and running data platforms and data pipelines.
We expect that over the coming years, structured data is going to become heavily integrated into AI workflows and that dbt will play a key role in building and provisioning this data. In particular, we expect both Business Intelligence and Data Engineering will be driven by AI operating on top of the context defined in your dbt Projects.
At first, the Flask project was just an April Fool’s joke. You should use a virtual environment to track your Flask project’s resources. This keeps your project’s requirements separate and prevents them from clashing with those of other projects. One drawback: a steeper learning curve for larger projects.
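Setting up that virtual environment takes three commands (shown here for Linux/macOS; on Windows the activate script lives under `Scripts\` instead of `bin/`):

```shell
# Create an isolated environment in the project directory
python3 -m venv .venv

# Activate it so pip installs into the environment, not system-wide
source .venv/bin/activate

# Install Flask into the isolated environment
pip install Flask
```

Everything installed while the environment is active stays inside `.venv`, so each Flask project can pin its own dependency versions.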
It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. We feel your pain. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs.
Every data-driven project calls for a review of your data architecture—and that includes embedded analytics. Download the whitepaper to see the 7 most common approaches to building a high-performance data architecture for embedded analytics.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Dagster offers a new approach to building and running data platforms and data pipelines. If you've learned something or tried out a project from the show then tell us about it! Your first 30 days are free!
The title of the book takes aim at the “myth” that software development can be measured in “man months,” which Brooks disproves in the pages that follow: “Cost [of the software project] does indeed vary as the product of the number of men and the number of months. Progress does not. The toolsmith.
Part 1: Setup dbt project and database. Step 1: Install project dependencies. Before you can get started, you must have either DuckDB or PostgreSQL installed, and you must have dbt version 1.3.0 or above installed. This tutorial aims to solve this by providing the definitive guide to dimensional modeling with dbt.
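As an illustration of the database side of that setup, a minimal dbt profile for the DuckDB option might look like this (the profile name and file path are assumed placeholders, not values from the tutorial):

```yaml
# ~/.dbt/profiles.yml — hypothetical profile for a dbt + DuckDB project
dbt_dimensional_modeling:   # assumed profile name; must match dbt_project.yml
  target: dev
  outputs:
    dev:
      type: duckdb          # requires the dbt-duckdb adapter
      path: dev.duckdb      # local DuckDB database file
```

With DuckDB the "database" is just a local file, which is what makes it convenient for following a tutorial without provisioning a server.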
The article examines the pros and cons of building an on-premise GPU machine versus using a GPU cloud service for projects involving deep learning and artificial intelligence, analyzing factors like cost, performance, operations, and scalability.
Every time an application team gets caught up in the “build vs buy” debate, it stalls projects and delays time to revenue. Partnering with an analytics development platform gives you the freedom to customize a solution without the risks and long-term costs of building your own. There is a third option.
This is the most significant milestone yet for this project, which began in earnest after Mark Zuckerberg outlined his vision for it in 2019. Throughout the project, we have consulted with a diverse range of external parties to ensure that we’re making the right set of tradeoffs.
Identifying and accounting for those errors is a critical piece of building trust in the organization that your data is accurate and up to date. To bring observability to dbt projects the team at Elementary embedded themselves into the workflow. Dagster offers a new approach to building and running data platforms and data pipelines.
How to Build a Data Dashboard Prototype with Generative AI: a book-reading data visualization with Vizro-AI. This article is a tutorial that shows how to build a data dashboard to visualize book reading data taken from goodreads.com. Now you can use Vizro-AI to build some charts by iterating on text to form effective prompts.
Bun was mostly built by Jared Sumner, a former Stripe engineer and recipient of the Thiel Fellowship (a grant of $100,000 for young people to drop out of school and build things, founded by venture capitalist Peter Thiel). I tip my hat to all volunteer open source contributors and maintainers — both for Node, and for other projects.
Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management RudderStack helps you build a customer data platform on your warehouse or data lake. How have the recent breakthroughs in large language models (LLMs) improved your ability to build features in Zenlytic? Who are the target users?
For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics.
A €150K ($165K) grant, three people, and 10 months to build it. Internal comms: Slack for chat; Linear for coordination and project management. Like most startups, Spare Cores also made their own “expensive mistake” while building the product: “We accidentally accumulated a $3,000 bill in 1.5
Build a strong data science portfolio by showcasing technical skills, working on real-world projects, staying active on LinkedIn, and leveraging platforms like GitHub and Kaggle to demonstrate your expertise.