How to test PySpark code with pytest

Start Data Engineering

Ensure the code’s logic is working as expected with tests; pytest: A powerful Python library for testing; Set context, run code, check results & clean up; Introduction; Test types for data pipelines; Tests are identified by their name; Use fixture to create fake data for testing.

Coding 208

Managing Your Reusable Python Code as a Data Scientist

KDnuggets

Here are a few approaches that I have settled on for managing my own reusable Python code as a data scientist, presented from most to least general code use, and aimed at beginners.

Python 160

Mastering Python: 7 Strategies for Writing Clear, Organized, and Efficient Code

KDnuggets

Optimize Your Python Workflow: Proven Techniques for Crafting Production-Ready Code

Python 142

Announcing FawltyDeps - a dependency checker for your Python code

Tweag

It is a truth universally acknowledged that the Python packaging ecosystem is in need of a good dependency checker. If you work with Python, and care about keeping your projects lean and repeatable, then this is for you. The dependency is now installed in your Python virtual environment or on your system.

Python 145

Apache Airflow® Crash Course: From 0 to Running your Pipeline in the Cloud

Airflow enables you to define workflows as Python code, allowing for dynamic and scalable pipelines suitable to any use case from ETL/ELT to running ML/AI operations in production. With over 30 million monthly downloads, Apache Airflow is the tool of choice for programmatically authoring, scheduling, and monitoring data pipelines.


Data Pipeline Design Patterns - #2. Coding patterns in Python

Start Data Engineering

Introduction; Sample project; Code design patterns; Singleton, & Object pool patterns; Python helpers; Functional design; Factory pattern; Strategy pattern; Dataclass; Context Managers; Testing with pytest.

Designing 147

15 Python Coding Interview Questions You Must Know For Data Science

KDnuggets

Solving Python coding interview questions is the best way to get ready for an interview. That’s why we’ll lead you through 15 examples and the five concepts these questions cover.

Coding 159