Top Data Engineering Digest Data Workflow SQL Content for Sun. Dec 24, 2023

Sun. Dec 24, 2023

Troubleshooting Kafka In Production

Data Engineering Podcast

DECEMBER 24, 2023

Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. Elad Eldor has experienced these challenges first-hand, which led him to write the book "Kafka Troubleshooting in Production". In this episode he highlights the sources of complexity that contribute to Kafka's operational difficulties, and some of the main ways to identify and mitigate

Kafka, Data Lake, High Quality Data, SQL

SparkSQL is Destroying your Pipelines

Confessions of a Data Guy

DECEMBER 24, 2023

It’s true, even if you don’t want it to be. SparkSQL is destroying your data pipelines and possibly wreaking havoc on your entire data team, infrastructure, and life. In your heart of hearts, you’ve probably known it for years. With great power comes great responsibility. We all know that even we Data Engineers are human […]

Data Pipeline, Data Engineer, Data Engineering, Engineering

1.5 Years of Spark Knowledge in 8 Tips

Towards Data Science

DECEMBER 24, 2023

My learnings from Databricks customer engagements. (Figure 1: a technical diagram of how to write Apache Spark. Image by author.) After working with ~15 of the largest retail organizations over the past 18 months, here are the Spark tips I commonly repeat. Throughout this post, we assume a general working knowledge of Spark and its structure, but this post should be accessible to Spark users of all levels.

Scala, SQL, Java, Python

Webinars

Apache Airflow®: The Ultimate Guide to DAG Writing

Data Engineering Weekly #154

Data Engineering Weekly

DECEMBER 24, 2023

RudderStack is the Warehouse Native CDP, built to help data teams deliver value across the entire data activation lifecycle, from collection to unification and activation. Visit rudderstack.com to learn more.

Sanjeev Mohan: Unveiling the Crystal Ball: 2024 Data and AI Trends. Sanjeev & Rajesh, as usual, share their excellent observations about data & AI industry trends.

Data Engineer, Data Engineering, Engineering, Deep Learning

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

Architecture
