Top Data Engineering Digest Data Engineer Data Engineering Content for Week of Jul 15

Sat.Jul 15, 2023 - Fri.Jul 21, 2023

H1 2023 Analytics & Data Science Spend & Trends Report

KDnuggets

JULY 17, 2023

The All Things Insights and marketing analytics and data science community completed an extensive survey covering what executives are thinking, how they’re spending and the issues and opportunities they face. Grab your free copy now.

Data Science

Data Science Data

Building an an Early Stage Startup: Lessons from Akita Software

The Pragmatic Engineer

JULY 20, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only deepdive on Advice on how to sell a startup. To get full issues twice a week, subscribe here.

Building

Building Programming Language Programming Python

Join 37,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

How to initialize state in Apache Spark Structured Streaming stateful jobs?

Waitingforcode

JULY 21, 2023

Starting from Apache Spark 3.2.0 is now possible to load an initial state of the arbitrary stateful pipelines. Even though the feature is easy to implement, it hides some interesting implementation details!

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Data Engineering Best Practices - #1. Data flow & Code

Start Data Engineering

JULY 20, 2023

1. Introduction 2. Sample project 3. Best practices 3.1. Use standard patterns that progressively transform your data 3.2. Ensure data is valid before exposing it to its consumers (aka data quality checks) 3.3. Avoid data duplicates with idempotent pipelines 3.4. Write DRY code & keep I/O separate from data transformation 3.5. Know the when, how, & what (aka metadata) of pipeline runs for easier debugging 3.

Coding

Coding Data Engineering Data Engineer Engineering

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples to debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Data Pipeline

Datapreneurs - How Todays Business Leaders Are Using Data To Define The Future

Data Engineering Podcast

JULY 16, 2023

Summary Data has been one of the most substantial drivers of business and economic value for the past few decades. Bob Muglia has had a front-row seat to many of the major shifts driven by technology over his career. In his recent book "Datapreneurs" he reflects on the people and businesses that he has known and worked with and how they relied on data to deliver valuable services and drive meaningful change.

SQL

SQL Machine Learning Python Data Engineering

4 Alternatives to Fivetran: The Evolving Dynamics of the ETL & ELT Tool Market

Seattle Data Guy

JULY 16, 2023

The ETL & ELT tool market is experiencing continuous transformation, propelled by fluctuating pricing structures and the advent of inventive alternatives. This industry remains fiercely competitive due to these changing elements and a swiftly growing user base. In the following sections, we will explore four emerging alternatives to Fivetran. Of course, that is if you… Read more The post 4 Alternatives to Fivetran: The Evolving Dynamics of the ETL & ELT Tool Market appeared first

Data Warehouse

Data Warehouse Data Consulting Data Engineering

Data News — Week 23.28

Christophe Blefari

JULY 15, 2023

Have fun train models on this ( credits ) Hey, it's Saturday I hope you're enjoying July, taking deserve break, reading data engineering articles while at the beach or traveling to unknown places. Sometimes there are Fridays when I don't find any glue between articles for the newsletter and I have an idea of something to compensate but it takes me the whole Friday of exploration.

Datasets

Datasets Python Machine Learning Data

More Trending

Data News — Week 23.28

Christophe Blefari

JULY 15, 2023

Datasets

Datasets Python Machine Learning Data

How SAS can help catapult practitioners’ careers

KDnuggets

JULY 21, 2023

Let's explore the journeys of SAS users who harnessed the power of SAS to unlock new opportunities and achieve their career goals.

How to Build a 5-Layer Data Stack

Towards Data Science

JULY 21, 2023

Spinning up a data platform doesn’t have to be complicated. Here are the 5 must-have layers to drive data product adoption at scale. Image courtesy of author. We hope it doesn’t make your eyes water. Like bean dip and ogres , layers are the building blocks of the modern data stack. Its powerful selection of tooling components combine to create a single synchronized and extensible data platform with each layer serving a unique function of the data pipeline.

Building

Building Business Intelligence BI Cloud Storage

Building your Generative AI apps with Meta's Llama 2 and Databricks

databricks

JULY 18, 2023

Today, Meta released their latest state-of-the-art large language model (LLM) Llama 2 to open source for commercial use1. This is a significant development.

Building

Introducing the Connect with Confluent Partner Program: Supercharging Customer Growth and Extending the Data Streaming Ecosystem

Confluent

JULY 18, 2023

Gain the easiest solution for data streaming and increase data flow to your platform through native integrations with Confluent Cloud and 120+ Kafka connectors.

Programming

Programming Kafka Cloud Data

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Exploring the Power and Limitations of GPT-4

KDnuggets

JULY 20, 2023

Unveiling GPT-4: Deciphering its impact on data science and exploring its strengths and boundaries.

Data Science

Data Science IT Data

How ThoughtSpot Partnered with Google Cloud to put AI at the center of BI

ThoughtSpot

JULY 18, 2023

At ThoughtSpot, we believe making data accessible to every knowledge worker requires human-centered technology—an analytics experience that bridges the “language” barrier between technology and people. AI is the perfect compliment to search because it empowers organizations to analyze, understand, and act on data. In order to achieve this vision, we knew we’d need to work with some of the best, most innovative technology companies across the modern data stack —companies that put their users fir

Google Cloud

Google Cloud Cloud BI Data Security

Never Miss a Beat: Announcing New Monitoring and Alerting capabilities in Databricks Workflows

databricks

JULY 18, 2023

We are excited to announce enhanced monitoring and observability features in Databricks Workflows. This includes a new real-time insights dashboard to see all.

5 Inspiring Learning Resources That Help Me Stay on Top of Data Analytics

Towards Data Science

JULY 18, 2023

5 Inspiring Learning Resources to Propel Your Skills and Expertise Continue reading on Towards Data Science »

Data Analytics

Data Analytics Data Science Data Data Governance

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

ChatGPT Dethroned: How Claude Became the New AI Leader

KDnuggets

JULY 17, 2023

Putting the world to shame.

Getting started with SAR satellite imagery

ArcGIS

JULY 17, 2023

This blog shares the resource to the ArcGIS Pro Learn Series about SAR satellite imagery.

Education

Databricks + MosaicML

databricks

JULY 19, 2023

Today, we’re excited to share that we’ve completed our acquisition of MosaicML, a leading platform for creating and customizing generative AI models for you.

Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types

Towards Data Science

JULY 17, 2023

Deep Dive Guide for When and How to Use 8 Types of SCD Continue reading on Towards Data Science »

Data Science

Data Science Data Data Warehouse Technology

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

The Drag-and-Drop UI for Building LLM Flows: Flowise AI

KDnuggets

JULY 21, 2023

Don’t have any coding experience? Don’t worry. Check out this drag-and-drop tool that helps you to build your own customized LLM flows. And guess what, you don’t have to be a tech professional!

Building

Building Coding

Analyzing Time Series for Pinterest Observability

Pinterest Engineering

JULY 18, 2023

Brian Overstreet | Software Engineer, Observability; Humsheen Geo | Software Engineer, Observability Time series is a critical part of Observability at Pinterest, powering 60,000 alerts and 5,000 dashboards. A time series is an identifier with values where the values are associated with a timestamp. Given the widespread use and critical nature of time series, it’s important to give engineers the ability to adequately express what operations to perform on the time series in a readable, understand

Database

Database Software Engineering Software Engineer Raw Data

Go from Months to Hours with Databricks Marketplace for Retailers

databricks

JULY 21, 2023

Let's say a distributor reached out wanting to understand what factors are driving the sale of carbonated beverages from customers in their convenience.

Retail

Layers of Data Quality

Towards Data Science

JULY 18, 2023

Where and how to address problems with your data Continue reading on Towards Data Science »

Data Science

Data Science Data Data Engineering Data Engineer

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Data Engineering

ChatGPT-Powered Data Exploration: Unlock Hidden Insights in Your Dataset

KDnuggets

JULY 17, 2023

A guide to using ChatGPT for exploratory data analysis. Use ChatGPT to explore a dataset, generate visualizations, and gain insights.

Datasets

Datasets Data Analysis Data Data Science

How to Build a 5-Layer Data Stack

Monte Carlo

JULY 19, 2023

Building a data stack doesn’t have to be complicated. Here’s what data leaders say are the 5 must-have layers of your data platform to drive data adoption – and ROI – across your business. Like bean dip and ogres , layers are the building blocks of the modern data stack. Its powerful selection of tooling components combine to create a single synchronized and extensible data platform with each layer serving a unique function of the data pipeline.

Building

Building Business Intelligence Cloud Storage BI

The Executive’s Guide to Data, Analytics and AI Transformation, Part 7: Move to production and scale adoption

databricks

JULY 21, 2023

This is part seven of a multi-part series to share key insights and tactics with Senior Executives leading data and AI transformation initiatives.

Data Analytics

Data Analytics Data

Storing a network diagram or not… This is a real question to consider!

ArcGIS

JULY 18, 2023

The purpose is to learn what network diagram storage means and provide guidance to avoid unnecessarily increasing database sizes.

Database

Database Utilities Telecommunication Data Management

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Unveiling the Power of Meta’s Llama 2: A Leap Forward in Generative AI?

KDnuggets

JULY 19, 2023

This article explores the technical details and implications of Meta's newly released Llama 2, a large language model that promises to revolutionize the field of generative AI. We delve into its capabilities, performance, and potential applications, while also discussing its open-source nature and the company's commitment to safety and transparency.

Processing Data At Scale With MapReduce

Towards Data Science

JULY 19, 2023

A deep dive into MapReduce and parallelization Continue reading on Towards Data Science »

Process

Process Data Science Data Programming

Simplified Analytics Engineering with Databricks and dbt Labs

databricks

JULY 17, 2023

For over a year now, Databricks and dbt Labs have been working together to realize the vision of simplified real-time analytics engineering, combining.

Engineering

Bringing HDR video to Reels

Engineering at Meta

JULY 17, 2023

Meta has made it possible for people to upload high dynamic range (HDR) videos from their phone’s camera roll to Reels on Facebook and Instagram. To show standard dynamic range (SDR) UI elements and overlays legibly on top of HDR video, we render them at a brightness level comparable to the video itself. We solved various technical challenges to ensure a smooth transition to HDR video across the diverse range of old and new devices that people use to interact with our services every day.

Media

Media Metadata Software Engineering Software Engineer

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer

Sat.Jul 15, 2023 - Fri.Jul 21, 2023

H1 2023 Analytics & Data Science Spend & Trends Report

Building an an Early Stage Startup: Lessons from Akita Software

Webinars

Trending Sources

How to initialize state in Apache Spark Structured Streaming stateful jobs?

Webinars

Data Engineering Best Practices - #1. Data flow & Code

A Guide to Debugging Apache Airflow® DAGs

Datapreneurs - How Todays Business Leaders Are Using Data To Define The Future

4 Alternatives to Fivetran: The Evolving Dynamics of the ETL & ELT Tool Market

Data News — Week 23.28

Sign up to get articles personalized to your interests!

More Trending

Data News — Week 23.28

How SAS can help catapult practitioners’ careers

How to Build a 5-Layer Data Stack

Building your Generative AI apps with Meta's Llama 2 and Databricks

Introducing the Connect with Confluent Partner Program: Supercharging Customer Growth and Extending the Data Streaming Ecosystem

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Exploring the Power and Limitations of GPT-4

How ThoughtSpot Partnered with Google Cloud to put AI at the center of BI

Never Miss a Beat: Announcing New Monitoring and Alerting capabilities in Databricks Workflows

5 Inspiring Learning Resources That Help Me Stay on Top of Data Analytics

Agent Tooling: Connecting AI to Your Tools, Systems & Data

ChatGPT Dethroned: How Claude Became the New AI Leader

Getting started with SAR satellite imagery

Databricks + MosaicML

Unlocking the Secrets of Slowly Changing Dimension (SCD): A Comprehensive View of 8 Types

How to Modernize Manufacturing Without Losing Control

The Drag-and-Drop UI for Building LLM Flows: Flowise AI

Analyzing Time Series for Pinterest Observability

Go from Months to Hours with Databricks Marketplace for Retailers

Layers of Data Quality

The Ultimate Guide to Apache Airflow DAGS

ChatGPT-Powered Data Exploration: Unlock Hidden Insights in Your Dataset

How to Build a 5-Layer Data Stack

The Executive’s Guide to Data, Analytics and AI Transformation, Part 7: Move to production and scale adoption

Storing a network diagram or not… This is a real question to consider!

Apache Airflow® Best Practices: DAG Writing

Unveiling the Power of Meta’s Llama 2: A Leap Forward in Generative AI?

Processing Data At Scale With MapReduce

Simplified Analytics Engineering with Databricks and dbt Labs

Bringing HDR video to Reels

How to Achieve High-Accuracy Results When Using LLMs

Stay Connected