Data Engineering Digest: Top Data Engineer and Data Engineering Content for Week of Jul 29

Sat.Jul 29, 2023 - Fri.Aug 04, 2023

What is a Senior Software Engineer at Wise and Amazon?

The Pragmatic Engineer

AUGUST 1, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. To get full issues twice a week, subscribe here. The past month, we’ve done deepdives in the newsletter on what a senior software engineer is at Big Tech, and at scaleups.

Software Engineer Software Engineering Engineering Designing

Introduction to Delta Lake

Confessions of a Data Guy

AUGUST 4, 2023

The post Introduction to Delta Lake appeared first on Confessions of a Data Guy.

Data Data Engineering Data Engineer Engineering

Trending Sources

Data Warehouses Vs Operational Data Stores Vs Data Lakes – How To Store Your Data For Analytics

Seattle Data Guy

AUGUST 2, 2023

A few months ago, I uploaded a video where I discussed data warehouses, data lakes, and transactional databases. However, the world of data management is evolving rapidly, especially with the resurgence of AI and machine learning. There are numerous other methods that technical teams are utilizing to handle their data effectively. In this presentation, I…

Data Lake Data Warehouse Data Machine Learning

The first state in Apache Spark Structured Streaming arbitrary stateful processing

Waitingforcode

AUGUST 2, 2023

When you wrote your first arbitrary stateful processing pipelines, state expiration was probably the first tricky point you had to deal with. Why is that? After all, it's just about setting the timeout, isn't it? Most of the time, yes, but there is an exception.

Process IT

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate…

Data Pipeline

Google Shutting down Firebase Dynamic Links

The Pragmatic Engineer

AUGUST 3, 2023

👋 Hi, this is Gergely with a free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Pulse issue. If you’re not yet a full subscriber, you missed this week’s deepdive: The 2023 tech market, as seen by hiring managers. To get full newsletters twice a week, subscribe here.

Metadata Engineering Building Technology

Introduction to AWS Lambda (deployment)

Confessions of a Data Guy

AUGUST 4, 2023

The post Introduction to AWS Lambda (deployment) appeared first on Confessions of a Data Guy.

AWS Data Data Engineering Data Engineer

Strategies For A Successful Data Platform Migration

Data Engineering Podcast

JULY 30, 2023

Summary: All software systems are in a constant state of evolution. This makes it impossible to select a truly future-proof technology stack for your data platform, making an eventual migration inevitable. In this episode Gleb Mezhanskiy and Rob Goretsky share their experiences leading various data platform migrations, and the hard-won lessons that they learned so that you don't have to.

Machine Learning SQL Python Data

More Trending

Forget ChatGPT, This New AI Assistant Is Leagues Ahead and Will Change the Way You Work Forever

KDnuggets

AUGUST 4, 2023

I bet you are unfamiliar with this fast AI application, which provides flexibility, ease of use, and accurate results.

Smooth Sailing Ahead

databricks

AUGUST 4, 2023

The Databricks Container Infra team builds cloud-agnostic infrastructure and tooling for building, storing and distributing container images. Recently, the team worked on scaling.

Cloud Building Engineering

A step-by-step guide to build an Effective Data Quality Strategy from scratch

Towards Data Science

AUGUST 2, 2023

How to build an interpretable data quality framework based on user expectations. As data engineers, we are (or should be) responsible for the quality of the data we provide. This is nothing new, but every time I join a data project I ask myself the same questions: When should I start working on data quality?

Building Data Consolidation Data Datasets

Sunrise: Zalando's developer platform based on Backstage

Zalando Engineering

AUGUST 2, 2023

Since 2021, Zalando has invested in building a developer portal called Sunrise, which aims to become the starting point for Builders at Zalando. The portal is based on Spotify's Backstage platform with additional extensions built internally. Sunrise enables everyone at Zalando to view and discover information about teams, applications, APIs, events, CI/CD pipelines, infrastructure accounts and costs, and much more.

Software Engineering Software Engineer Engineering Machine Learning

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

7 Steps to Mastering Data Cleaning and Preprocessing Techniques

KDnuggets

AUGUST 3, 2023

Are you trying to solve your first data science project? This tutorial will guide you step by step through preparing your dataset before applying a machine learning model.

Data Science Datasets Machine Learning Data

Announcing Databricks Belgrade Development Center

databricks

AUGUST 1, 2023

We are thrilled to announce the opening of Databricks’ latest development center in Belgrade, Serbia. This addition joins our existing R&D centers in A.

Forging a Data Strategy for Success in Uncertain Times

Precisely

AUGUST 3, 2023

The results are in! The 2023 Data Integrity Trends and Insights Report, published in partnership between Precisely and Drexel University’s LeBow College of Business, delivers groundbreaking insights into the importance of trusted data. For the report, more than 450 data and analytics professionals worldwide were surveyed about the state of their data programs.

Data Integration Data Programming Programming Data

Create the engineering career you love at Pinterest

Pinterest Engineering

AUGUST 3, 2023

An interview with Behnam Rezaei, Pinterest VP, Engineering. At Pinterest, we’re on a mission to bring everyone the inspiration to create a life they love. For our employees, this extends further to creating the life and career they love. The Pinterest Engineering Blog team sat down with Behnam Rezaei to get an inside scoop into the Monetization Engineering team, what makes Pinterest different and why now is a great time to join our team.

Engineering Machine Learning Data Science Technology

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Using SHAP Values for Model Interpretability in Machine Learning

KDnuggets

AUGUST 2, 2023

Discover how SHAP can help you understand the impact of model features on predictions.

Machine Learning

Introducing Databricks Assistant, a context-aware AI assistant

databricks

JULY 31, 2023

Today, we are excited to announce the public preview of Databricks Assistant, a context-aware AI assistant, available natively in Databricks Notebooks, SQL editor.

SQL

Exploring the ArcGIS Utility Network Trace Framework

ArcGIS

JULY 31, 2023

A guided discussion on the capabilities of the tracing framework of the Utility Network and how it can be used to answer questions.

Utilities IT Data Management Management

Modern Overview of the MIT CDOIQ Symposium

The Modern Data Company

JULY 31, 2023

Modern Announces Partnership with Data Mesh Pioneers, ThoughtWorks. In July, we collaborated with ThoughtWorks at the annual CDOIQ Conference in Cambridge, MA to discuss real-world Data Products implementation and best practices for Data Mesh. The data community, especially CDOs, emphasized the importance of raising awareness and gaining clarity about data products.

Government Data Governance Architecture Data Pipeline

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Multivariate Time-Series Prediction with BQML

KDnuggets

JULY 31, 2023

Google's BQML can be used to make time series models, and it was recently updated to create multivariate time series models. Using simple code, this article shows how to use it to predict multivariate time series and how it can be more powerful than a univariate time series model.

Coding IT Machine Learning

Databricks and Technology Partners: Personalized Medicine with a Tailored Approach

databricks

AUGUST 1, 2023

In the ever-evolving realm of healthcare, two powerful trends have emerged: The rise of personalized medicine and the increasing emphasis on patient involvement.

Technology Healthcare

Leveraging The Powers of Functional Code - Part 2

Booking.com Engineering

AUGUST 3, 2023

The Fully Functional Haskell Solution. Part one can be found here: [link] The Solution: Regarding the Haskell code, don’t worry if you don’t understand everything. I am going to explain the main points of it by drawing a parallel to the Java implementation. If you are curious about FP, I cannot recommend this book enough, and the online version is free: [link] It is a pleasant read with lots of humor (just the illustrations by themselves make me…

Coding Java Scala Programming

How to import contingent values into a feature class

ArcGIS

JULY 31, 2023

This workflow shows how to use the Import and Export Contingent Values tools to quickly generate contingent values from existing data.

Data Data Management Management

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you…

Data Engineering

KDnuggets News, August 2: ChatGPT Code Interpreter: Fast Data Science • Can’t Keep Up? Catch up on This Week in AI

KDnuggets

AUGUST 2, 2023

ChatGPT Code Interpreter: Do Data Science in Minutes • This Week in AI • Introduction to Statistical Learning, Python Edition: Free Book • 8 Programming Languages For Data Science to Learn in 2023 • Mastering GPUs: A Beginner's Guide to GPU-Accelerated DataFrames in Python

Data Science Coding Programming Language Python

Protecting Your Compute Resources From Bitcoin Miners With a Data Lakehouse

databricks

AUGUST 3, 2023

As cryptocurrencies, particularly Bitcoin, have grown in popularity, so has the phenomenon of Bitcoin mining. While normal mining operations are critical for blockchain.

Data

Robinhood Reports Second Quarter 2023 Results

Robinhood

AUGUST 2, 2023

Robinhood Markets, Inc. (Nasdaq: HOOD) today reported financial results for the quarter ended June 30, 2023. Read our Q2 earnings press release here. Access more information at investors.robinhood.com.

Accessible Accessibility

How DoorDash Migrated from StatsD to Prometheus

DoorDash Engineering

AUGUST 1, 2023

Accurate and reliable observability is essential when supporting a large distributed service, but this is only possible if your tools are equally scalable. Unfortunately, this was a challenge at DoorDash because of peak traffic failures while using our legacy metrics infrastructure based on StatsD. Just when we most needed observability data, the system would leave us in the lurch.

AWS Transportation Programming Language Government

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Keras 3.0: Everything You Need To Know

KDnuggets

JULY 31, 2023

Unlock the power of AI collaboration with Keras 3.0! Seamlessly switch between TensorFlow, JAX, and PyTorch, revolutionizing your deep learning projects. Read now and stay ahead in the world of AI.

Deep Learning Project Machine Learning

Announcing new security controls and compliance certifications for Azure Databricks and AWS Databricks SQL Serverless

databricks

AUGUST 2, 2023

We're excited to share a new set of security controls and compliance certifications that can help with regulatory compliance on Azure Databricks and.

Certification AWS SQL

How to Get the User’s Location Using Mapbox?

Workfall

AUGUST 1, 2023

Obtaining a user’s location is a critical requirement for many modern web applications, such as location-based services, personalized content delivery, and targeted marketing. However, without proper guidance and understanding of HTML and JavaScript geolocation techniques, developers often face challenges in implementing this feature effectively.

Coding Accessible Accessibility Process

Announcing Our LinkedIn-Cornell 2023 Grant Recipients

LinkedIn Engineering

AUGUST 2, 2023

LinkedIn and Cornell Ann S. Bowers College of Computing and Information Science (Bowers CIS) embarked on a partnership, bringing together our collective research power to make technological advances that will further our goal to connect professionals with opportunities at scale. Through this partnership, we support Ph.D. students and faculty members in their research in areas of computer science, AI, and information science, including diversity and equity.

Computer Science Algorithm Machine Learning Data Science

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
