Data Engineering Digest: Top ETL Tools and Data Engineer Content for Week of Aug 31

Sat. Aug 31, 2024 - Fri. Sep 06, 2024

10 Built-In Python Modules Every Data Engineer Should Know

KDnuggets

SEPTEMBER 2, 2024

Interested in data engineering? Check out this round-up of built-in Python modules that'll come in handy for data engineering tasks.

Python

Python Data Engineer Data Engineering Engineering
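The blurb above does not name the modules the article covers. As a hedged illustration only, the sketch below assumes `csv`, `io`, and `json` are the kind of standard-library modules meant, and shows a common small task: turning a CSV extract into JSON lines. The sample payload and the `csv_to_json_lines` helper are hypothetical.

```python
import csv
import io
import json

# Hypothetical CSV payload standing in for a raw extract.
RAW_CSV = "id,name,amount\n1,alice,10.5\n2,bob,3\n"


def csv_to_json_lines(text):
    """Parse CSV text and return one JSON string per row with typed fields."""
    reader = csv.DictReader(io.StringIO(text))
    lines = []
    for row in reader:
        # csv yields every field as a string, so cast the numeric columns.
        row["id"] = int(row["id"])
        row["amount"] = float(row["amount"])
        lines.append(json.dumps(row, sort_keys=True))
    return lines


json_lines = csv_to_json_lines(RAW_CSV)
for line in json_lines:
    print(line)
```

Everything here ships with Python, so a sketch like this runs without installing any third-party package.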

How Producers Work: Kafka Producer and Consumer Internals, Part 1

Confluent

SEPTEMBER 5, 2024

Dive into Kafka internals with a four-part series examining client requests and brokers. Part 1 covers what a producer does to prepare raw event data for the broker.

Kafka

Kafka Data


Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration


Trending Sources

Databricks announces significant improvements to the built-in LLM judges in Agent Evaluation

databricks

SEPTEMBER 5, 2024

An improved answer-correctness judge in Agent Evaluation. Agent Evaluation enables Databricks customers to define, measure, and understand how to improve the quality of…

Data Science

Data Science Data


What are the Key Parts of Data Engineering?

Start Data Engineering

SEPTEMBER 4, 2024

Contents: 1. Introduction; 2. Key parts of data systems (2.1. Requirements; 2.2. Data flow design; 2.3. Orchestrator and scheduler; 2.4. Data processing design; 2.5. Code organization; 2.6. Data storage design; 2.7. Monitoring & Alerting; 2.9. Infrastructure); 3. Conclusion.

1. Introduction: If you are trying to break into (or land a new) data engineering job, you will inevitably encounter a slew of data engineering tools.

Data Engineer

Data Engineer Data Engineering Engineering Data Storage

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate…

Data Pipeline

I Took Udacity’s Free A/B Testing Course by Google: Here’s What I Learned

KDnuggets

SEPTEMBER 6, 2024

A beginner's guide to A/B testing by FAANG data scientists.

Data

Data Data Science

Real-time Analytics Vs Stream Processing – What Is The Difference?

Seattle Data Guy

SEPTEMBER 3, 2024

One of the holy grails that many data teams seem to chase is real-time data analytics. After all, if you can have real-time analytics, you can make better decisions faster. However, there often is a conflation between real-time data analytics and stream processing. These are two different concepts that are crucial to understanding how to…

Process

Process Data Analytics Data Big Data

How to share AI/BI Dashboards with everyone in your organization

databricks

SEPTEMBER 3, 2024

Learn all the ways you can publish and share AI/BI Dashboards with users inside and outside of your Databricks Workspace to democratize insights from data for everyone.

BI Data

More Trending


Read Meta’s 2024 Sustainability Report

Engineering at Meta

SEPTEMBER 4, 2024

We are working in partnership with others to scale inclusive solutions that support the transition to a zero-carbon economy and help create a healthier planet for all.

Engineering

Engineering Data

Using FastAPI for Building ML-Powered Web Apps

KDnuggets

SEPTEMBER 5, 2024

A beginner tutorial on building a simple web application for machine learning model inference using FastAPI and Jinja2 templates.

Building

Building Machine Learning Python

Use response caching as a shortcut for servers

ArcGIS

SEPTEMBER 6, 2024

Learn more about how to use response caching for hosted feature services in ArcGIS Enterprise.

Data Management

Data Management Management Data

Revolutionizing Insight into Heavy Equipment Maintenance with GenAI

databricks

SEPTEMBER 5, 2024

Maintaining heavy equipment assets, such as oil rigs, agricultural combines, or fleets of vehicles, poses an extremely complex challenge for global companies. These…

Manufacturing

Manufacturing Machine Learning Data Science Data

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Streaming Postgres data to Databricks Delta Lake in Unity Catalog

Confessions of a Data Guy

SEPTEMBER 4, 2024

Over the many years I’ve been pounding my keyboard … Perl, PHP, Python, C#, Rust … whatever … I, like most programmers, built up a certain disdain for what are called Low Code / No Code solutions. In my rush to worship at the feet of the code we create, I failed, in the beginning, […]

Python

Python Coding Data Big Data

Understanding the Basics of Reinforcement Learning

KDnuggets

SEPTEMBER 5, 2024

How does AI learn by doing? Read this to discover the basics of reinforcement learning.

Machine Learning

The “Who Does What” Guide To Enterprise Data Quality

Monte Carlo

SEPTEMBER 6, 2024

I’ve spoken with dozens of enterprise data professionals, and one of the most common data quality questions is, “Who does what?” This is quickly followed by, “Why and how?” There is a reason for this. Data quality is like a relay race. The success of each leg (detection, triage, resolution, and measurement) depends on the others. Every time the baton is passed, the chances of failure skyrocket.

Government

Government Machine Learning Data Data Engineer

Cost savings on serverless compute for Notebooks, Jobs, and Pipelines

databricks

SEPTEMBER 5, 2024

We recently announced the General Availability of our serverless compute offerings for Notebooks, Jobs, and Pipelines. Serverless compute provides rapid workload startup, automatic…

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Introduction to Polars in 2 Minutes

Confessions of a Data Guy

SEPTEMBER 4, 2024

Polars is the hot new Rust-based Python DataFrame tool that is taking over the world and destroying Pandas even as we speak. You want the quick and dirty introduction to Polars? Look no further.

Python

Python Data Data Engineer Data Engineering

How to Compute the Cross-Correlation Between Two NumPy Arrays

KDnuggets

SEPTEMBER 3, 2024

Let's see how to perform cross-correlation in NumPy, a method for measuring the similarity or relationship between two sequences of data as one is shifted in relation to the other.

Data
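The idea described above can be sketched with `np.correlate` in `"full"` mode, which evaluates the sliding dot product at every possible shift. This is a minimal illustration, not the article's own code: the `cross_correlation` helper and the toy arrays are assumptions made up for this example.

```python
import numpy as np


def cross_correlation(x, y):
    """Return (lags, values) for the full cross-correlation of x with y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.correlate(x, y, mode="full")
    # "full" mode yields len(x) + len(y) - 1 values; the first one
    # corresponds to lag -(len(y) - 1) and the last to lag len(x) - 1.
    lags = np.arange(-(len(y) - 1), len(x))
    return lags, values


# Toy data (hypothetical): `delayed` is `signal` shifted right by 2 samples.
signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
delayed = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 1.0])

lags, values = cross_correlation(delayed, signal)
best_lag = int(lags[np.argmax(values)])
print("lag with the strongest match:", best_lag)
```

The lag at which the values peak estimates how far one sequence is shifted relative to the other. For signals with very different scales, a normalized variant is often easier to interpret.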

Edit schema reports for conversion

ArcGIS

SEPTEMBER 5, 2024

An overview of editing schema reports for conversion to XML workspace documents.

Data Management

Data Management Management Data

Enhanced Workflows UI reduces debugging time and boosts productivity

databricks

SEPTEMBER 4, 2024

Data teams spend way too much time troubleshooting issues, applying patches, and restarting failed workloads. It's not uncommon for engineers to spend their…

Engineering

Engineering IT Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Python Files within Snowflake Python Procedures

Cloudyard

SEPTEMBER 2, 2024

Snowflake’s support for Python stored procedures allows data engineers and scientists to leverage Python’s vast ecosystem directly within Snowflake. This capability enables advanced analytics, custom data processing, and seamless integration of Python libraries. One particularly powerful feature is the ability to import and use Python files (.py) directly within a Snowflake stored procedure, which promotes code modularity, reusability, and better organi…

Python

Python Utilities Coding Data Engineer

Using FLUX.1 Locally

KDnuggets

SEPTEMBER 6, 2024

Learn how to install Stable Diffusion WebUI Forge easily and set up the FLUX.1 [dev] model for local use on a laptop.

Detecting AI-written code: lessons on the importance of data quality by Amy Laws

Scott Logic

SEPTEMBER 4, 2024

Our team had previously built a tool to investigate code quality from PR data. Building on this work, we set about finding a method to detect AI-written code, so we could investigate any potential differences in code quality between human and AI-written code. During our time on this project, we learnt some important lessons, including just how hard it can be to detect AI-written code, and the importance of good-quality data when conducting research.

Coding

Coding Datasets Programming Language Python

The short guide to understanding data intelligence

databricks

SEPTEMBER 1, 2024

Terms like “data governance,” “Generative AI” and “large language models” are becoming commonplace in the workplace. But for business leaders, it takes more…

Data Governance

Data Governance Government Data IT

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them into complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you…

Data Engineering

Batch And Streaming Demystified For Unification

Towards Data Science

SEPTEMBER 3, 2024

Understand how batch can be considered a subset of streaming and why data engineering should simplify its usage significantly.

Data Science

Data Science Data Engineer Data Engineering Engineering
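One way to picture the claim that batch is a subset of streaming is to treat a batch as a bounded stream processed by the same logic. The sketch below is an illustration under that assumption, not the article's own code; `running_total` and `batch_total` are hypothetical names.

```python
from itertools import count, islice


def running_total(events):
    """Streaming logic: emit an updated total after every incoming event."""
    total = 0
    for value in events:
        total += value
        yield total


def batch_total(records):
    """Batch as a bounded stream: run the same logic, keep only the last result."""
    result = 0
    for result in running_total(records):
        pass
    return result


# Bounded input (a "batch"): the stream ends, so a final answer exists.
batch_result = batch_total([1, 2, 3, 4])

# Unbounded input (a "stream"): count(1) never ends, so we only inspect a prefix.
stream_prefix = list(islice(running_total(count(1)), 4))

print(batch_result, stream_prefix)
```

The processing function is identical in both cases. Only the boundedness of the input differs, along with whether intermediate results are kept.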

5 Must-Know R Packages for Data Analysis

KDnuggets

SEPTEMBER 4, 2024

Here are five must-know R packages for data analysis.

Data Analysis

Data Analysis Data Programming

Comprehensive Guide to Modern Data Warehouse in 2024

Hevo

SEPTEMBER 4, 2024

A data warehouse is a centralized system that stores, integrates, and analyzes large volumes of structured data from various sources. It is predicted that more than 200 zettabytes of data will be stored in the global cloud by 2025.

Data Warehouse

Data Warehouse Structured Data Data Cloud

Driving into the future of electric transportation

databricks

SEPTEMBER 2, 2024

Rivian chose to modernize its data infrastructure on the Databricks Data Intelligence Platform, giving it the ability to unify all of its data into a common view for downstream analytics and machine learning.

Transportation

Transportation Machine Learning IT Data

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Building Scalable Data Platforms

Towards Data Science

SEPTEMBER 1, 2024

Data Mesh trends in data platform design.

Building

Building Data Science Data Designing

Ghosted After an Interview? 5 Resources to Help You Bounce Back

KDnuggets

SEPTEMBER 4, 2024

Check out this list of resources for different types of interviews.

Connect with Confluent: Celebrating One Year and 50+ Integrations

Confluent

SEPTEMBER 5, 2024

Confluent’s Connect with Confluent (CwC) partner program turns one year old, with new program entrants for Q3 2024.

Programming

Community Tips for the Databricks Data Intelligence Platform

databricks

SEPTEMBER 1, 2024

Within the Databricks Community, there is a technical blog where community members share best practices, tutorials and insights on data analytics, data engineering…

Data Analytics

Data Analytics Data Data Engineer Data Engineering

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
