Building cost effective data pipelines with Python & DuckDB

Start Data Engineering

Building efficient data pipelines with DuckDB:
- Use DuckDB to process data, not for multiple users to access data
- Cost calculation: DuckDB + ephemeral VMs = dirt-cheap data processing
- Processing data less than 100GB? KISS: DuckDB + Python = easy to debug and quick to develop
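
As a rough illustration of that pattern, here is a minimal sketch of a DuckDB-in-Python processing step; the file paths and column names are hypothetical:

```python
# Minimal sketch of the DuckDB + Python pattern: process files on a
# single machine with plain SQL. Paths and columns are hypothetical.
import duckdb

con = duckdb.connect()  # in-memory database, nothing to provision

# Aggregate raw Parquet files and write the result straight back out.
con.sql("""
    COPY (
        SELECT order_date, SUM(amount) AS daily_revenue
        FROM read_parquet('raw/orders/*.parquet')
        GROUP BY order_date
    ) TO 'processed/daily_revenue.parquet' (FORMAT PARQUET)
""")
```

Run this on an ephemeral VM that shuts down when the job finishes and the only ongoing cost is object storage.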

Snowflake’s New Python API Empowers Data Engineers to Build Modern Data Pipelines with Ease

Snowflake

Yet while SQL applications have long served as the gateway to access and manage data, Python has become the language of choice for most data teams, creating a disconnect. Recognizing this shift, Snowflake is taking a Python-first approach to bridge the gap and help users leverage the power of both worlds.
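
For a sense of what Python-first pipeline code on Snowflake looks like, here is a sketch using the Snowpark DataFrame API; whether the article covers this exact API is an assumption, and the connection values and ORDERS table are placeholders:

```python
# Illustrative sketch using Snowpark, Snowflake's Python DataFrame API.
# Connection parameters and the ORDERS table are placeholders.
from snowflake.snowpark import Session
from snowflake.snowpark.functions import col, sum as sum_

session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
    "database": "<database>",
    "schema": "<schema>",
}).create()

# Build the transformation in Python; Snowpark pushes it down as SQL.
daily = (
    session.table("ORDERS")
    .group_by(col("ORDER_DATE"))
    .agg(sum_(col("AMOUNT")).alias("DAILY_REVENUE"))
)
daily.write.save_as_table("DAILY_REVENUE", mode="overwrite")

session.close()
```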

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

Announcements: Hello and welcome to the Data Engineering Podcast, the show about modern data management. RudderStack helps you build a customer data platform on your warehouse or data lake. __init__ covers the Python language, its community, and the innovative ways it is being used.

PyArrow vs Polars (vs DuckDB) for Data Pipelines.

Confessions of a Data Guy

We all keep hearing about Arrow this and Arrow that … it seems every new tool built for Data Engineering today is at least partly based on Arrow’s in-memory format. So, […]
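
To make the comparison concrete, here is a sketch of the same read-and-filter step in all three tools; the file and column names are made up, and each library exposes Arrow-backed data in its own way:

```python
# Side-by-side sketch of the three tools reading the same (hypothetical)
# Parquet file; all of them build on Arrow's columnar in-memory format.
import duckdb
import polars as pl
import pyarrow.parquet as pq

path = "events.parquet"  # placeholder file

# PyArrow: low-level columnar Table
table = pq.read_table(path)

# Polars: lazy DataFrame with query optimization
ok_events = pl.scan_parquet(path).filter(pl.col("status") == "ok").collect()

# DuckDB: SQL over the file, result returned as an Arrow table
arrow_result = duckdb.sql(
    f"SELECT status, count(*) AS n FROM '{path}' GROUP BY status"
).arrow()
```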

Unpacking The Seven Principles Of Modern Data Pipelines

Data Engineering Podcast

Summary: Data pipelines are the core of every data product, ML model, and business intelligence dashboard. The folks at Rivery distilled the seven principles of modern data pipelines that will help you stay out of trouble and be productive with your data. Closing announcements: thank you for listening!

How to Code a Data Pipeline in Python

Hevo

A Data Pipeline is an indispensable part of a data engineering workflow. It enables the extraction, transformation, and storage of data across disparate data sources and ensures that the right data is available at the right time.
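
As a hedged sketch of that extract-transform-load flow in plain Python (the CSV source, cleaning rule, and SQLite destination are all illustrative):

```python
# Minimal extract-transform-load sketch. The CSV source, cleaning rule,
# and SQLite destination are illustrative, not a prescribed design.
import csv
import sqlite3

def extract(path):
    # Extract: stream rows from a CSV source
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    # Transform: drop rows missing an email, normalize the rest
    for row in rows:
        if row["email"]:
            row["email"] = row["email"].strip().lower()
            yield row

def load(rows, db_path="users.db"):
    # Load: write the cleaned rows to a SQLite table
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")
    con.executemany(
        "INSERT INTO users (name, email) VALUES (:name, :email)", rows
    )
    con.commit()
    con.close()

load(transform(extract("users.csv")))
```

Because each stage is a generator feeding the next, rows flow through one at a time and the pipeline never holds the whole file in memory.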

Writing memory efficient data pipelines in Python

Start Data Engineering

If you are wondering how to write memory-efficient data pipelines in Python, or are working with a dataset that is too large to fit into memory, then this post is for you. It covers using distributed frameworks, their pros and cons, a conclusion, further reading, and references.
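
One common version of the idea, sketched below under the assumption of a pandas-based pipeline: stream the file in fixed-size chunks so memory stays bounded. The file name, column, and chunk size are illustrative:

```python
# Chunked-processing sketch: read a large CSV through pandas in
# fixed-size chunks instead of loading it all at once.
# File name, column, and chunk size are illustrative.
import pandas as pd

running_total = 0.0
for chunk in pd.read_csv("big_file.csv", chunksize=100_000):
    # Each chunk is a small DataFrame; peak memory stays bounded
    # by the chunk size rather than the file size.
    running_total += chunk["amount"].sum()

print(f"total amount: {running_total}")
```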