Data Engineering Digest: Top Data Engineering Content for Week of May 20

Sat. May 20, 2023 - Fri. May 26, 2023

7 Data Engineering Projects To Put On Your Resume

Seattle Data Guy

MAY 20, 2023

Starting new data engineering projects can be challenging. Data engineers can get stuck on finding the right data for their project or picking the right tools. Many of my YouTube followers agree: in a recent poll, they confirmed that starting a new data engineering project is difficult. Here were the key…

Data Engineering

Data Engineering Data Engineer Project Engineering

Layoffs push down scores on Glassdoor: this is how companies respond

The Pragmatic Engineer

MAY 25, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and high-growth startups through the lens of engineering managers and senior engineers. In this issue, we cover one out of six topics from today’s subscriber-only The Scoop issue. To get full articles twice a week, subscribe here.

Software Engineering

Software Engineering Software Engineer AWS Engineering

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Trending Sources

Conversation with Sumeet, Software Engineer at Natwest Group

Analytics Vidhya

MAY 22, 2023

Join us in this interview as Sumeet shares his background and his journey from data scientist to software engineer, and describes the captivating aspects of his current job. He provides insights into the future of data science and software engineering and offers valuable advice for career transitioners. Let’s dive into our conversation with […]

Software Engineer

Software Engineer Software Engineering Engineering Data Science

Keep Your Data Lake Fresh With Real Time Streams Using Estuary

Data Engineering Podcast

MAY 21, 2023

Batch vs. streaming is a long-running debate in the world of data integration and transformation. Proponents of the streaming paradigm argue that stream processing engines can easily handle batched workloads, but the reverse isn't true. The batch world has been the default for years because of the complexities of running a reliable streaming system at scale.

Data Lake

Data Lake Machine Learning Kafka Data Warehouse

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

GPT-4 is Vulnerable to Prompt Injection Attacks on Causing Misinformation

KDnuggets

MAY 26, 2023

ChatGPT may have loopholes that lead it to provide unreliable facts.

Neeva Acquired by Snowflake

Snowflake

MAY 24, 2023

What is Data Storage and How is it Used?

Analytics Vidhya

MAY 24, 2023

As modern companies rely on data, establishing dependable, effective solutions for maintaining that data is a top priority for every organization. The complexity of information storage technologies increases as data grows. From physical hard drives to cloud computing, unravel the captivating world of data storage and recognize its ever-evolving role in our […]

Data Storage

Data Storage IT Cloud Computing Cloud

More Trending

Data Modeling - The Unsung Hero of Data Engineering: Architecture Pattern, Tools and the Future (Part 3)

Simon Späti

MAY 26, 2023

Welcome to the third and final installment of our series “Data Modeling: The Unsung Hero of Data Engineering.” If you’ve journeyed with us from Part 1, where we dove into the importance and history of data modeling, or joined us in Part 2 to explore various approaches and techniques, I’m delighted you’ve stuck around. In this third part, we’ll delve into data architecture patterns and their influence on data modeling.

Architecture

Architecture Data Engineering Data Engineer Engineering

AI is Eating Data Science

KDnuggets

MAY 24, 2023

When it's all said and done, and AI has been universally recognized as our rightful overlords, the idea of data science as a standalone field will have been but a blip on our collective radar.

Data Science

Data Science Data IT

What's new in Apache Spark 3.4.0 - Structured Streaming and correctness issue

Waitingforcode

MAY 24, 2023

Apache Spark is infamous for its correctness issue with chained stateful operations. Fortunately, things improve with each release. The most recent one, 3.4.0, also brought some important changes in that area!

Functional Python, Part III: The Ghost in the Machine

Tweag

MAY 24, 2023

Tweagers have an engineering mantra — Functional. Typed. Immutable. — that begets composable software which can be reasoned about and avails itself to static analysis. These are all “good things” for building robust software, which inevitably lead us to using languages such as Haskell, OCaml and Rust. However, it would be remiss of us to snub languages that don’t enforce the same disciplines, but are nonetheless popular choices in industry.

Python

Python Programming Language Programming Coding

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

The Future of AI: Exploring the Next Generation of Generative Models

KDnuggets

MAY 22, 2023

What generative AI is currently capable of, and the challenges it needs to overcome to reach the next wave of generative AI models.

ArcGIS and Apache Log4j Vulnerabilities

ArcGIS

MAY 22, 2023

Esri's updated statement regarding Log4j vulnerabilities (Log4Shell) and ArcGIS products

Model Risk Management, a true accelerator to corporate AI

databricks

MAY 24, 2023

Special thanks to EY's Mario Schlener, Wissem Bouraoui and Tarek Elguebaly for their support throughout this journey and their contributions to this blog.

Management

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Data Freshness Explained: Making Data Consumers Wildly Happy

Monte Carlo

MAY 26, 2023

What is data freshness and why is it important? Data freshness, sometimes referred to as data timeliness, is the frequency with which data is updated for consumption. It is an important dimension of data quality and a pillar of data observability because recently refreshed data is more accurate, and thus more valuable. Since it is impractical and expensive to have all data refreshed on a near real-time basis, data engineers ingest and process most analytical data in batches with pipelines designed

Data Pipeline

Data Pipeline Data Data Warehouse Machine Learning

A Deep Dive into GPT Models: Evolution & Performance Comparison

KDnuggets

MAY 25, 2023

The blog focuses on GPT models, providing an in-depth understanding and analysis. It explains the three main components of GPT models: generative, pre-trained, and transformers.

IT Process

Discover Your Data’s Depth: Applications of ArcGIS Bathymetry Webinar

ArcGIS

MAY 23, 2023

Discover the power of ArcGIS Bathymetry in our upcoming webinar on June 20th. Learn how this advanced tool can empower your organization.

Driving a Large Language Model Revolution in Customer Service and Support

databricks

MAY 23, 2023

Want to build your own LLM-enabled bot? Download our end-to-end solution accelerator here. Business leaders are universally excited for the potential of large.

Building

Building Machine Learning

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Representation online matters: practical end-to-end diversification in search and recommender…

Pinterest Engineering

MAY 25, 2023

Bhawna Juneja | Senior Machine Learning Engineer; Pedro Silva | Senior Machine Learning Engineer; Shloka Desai | Machine Learning Engineer II; Ashudeep Singh | Machine Learning Engineer II; Nadia Fawaz | (former) Inclusive AI Tech Lead. Pinterest is a platform designed to bring everyone the inspiration to create a life they love.

Utilities

Utilities Food Machine Learning Algorithm

Free ChatGPT Course: Use The OpenAI API to Code 5 Projects

KDnuggets

MAY 23, 2023

With all the buzz surrounding ChatGPT, are you eager to make the most of it? Here is a free video course that offers a comprehensive education on the OpenAI API through detailed explanations and hands-on projects.

Project

Project Coding Education IT

A suite of sample geoprocessing tools for managing hyperlinks

ArcGIS

MAY 24, 2023

Learn more about a suite of sample data management tools to enable, add, remove or disable media hyperlinks to feature classes in geodatabases.

Management

Management Media Data Management Data

Asian Employee Network: Celebrating the Expansive Asian Culture

databricks

MAY 26, 2023

The Asian Employee Network (AEN) launched two years ago, during Lunar New Year 2021. AEN was created with the objective of building a.

Building

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; Scale you

Data Engineering

How to mask PII data with FPE using Azure Synapse

Towards Data Science

MAY 22, 2023

Learn to do format-preserving encryption (FPE) at scale and securely move data from production to test environments.

Data Science

Data Science Data Programming Data Engineering

Introducing MPT-7B: A New Open-Source LLM

KDnuggets

MAY 26, 2023

An LLM Trained on 1T Tokens of Text and Code by MosaicML Foundation Series.

Coding

Coding Process

Porting ArcGIS Desktop Schematic Diagrams to ArcGIS Pro Network Diagrams

ArcGIS

MAY 24, 2023

Learn how to port schematic diagrams created with ArcGIS Schematics to network diagrams from utility or trace networks using ArcGIS Pro

Utilities

Utilities Telecommunication Data Management Management

The Executive’s Guide to Data, Analytics and AI Transformation, Part 5: Make informed build vs. buy decisions

databricks

MAY 25, 2023

A key piece of your data and AI transformation strategy will involve the decision around which components of the data ecosystem are built.

Data Analytics

Data Analytics Building Data

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

Writing design docs for data pipelines

Towards Data Science

MAY 22, 2023

Exploring the what, why, and how of design docs for data components — and why they matter.

Designing

Designing Data Pipeline Data Science Data

12 VSCode Tips and Tricks for Python Development

KDnuggets

MAY 25, 2023

Simple tips on doing less and achieving more from VSCode.

Python

Top 5 Marketing Trends from a Chief Marketing Officer

Precisely

MAY 23, 2023

Author’s note: this article about marketing trends has been adapted from an article originally published in The CMO. What are your goals in 2023, and which marketing trends can help you achieve them? In my role as Chief Marketing Officer (CMO) here at Precisely, an important part of what I do is to keep a finger on the pulse of the latest marketing innovations and strategize with my team around how we may be able to capitalize on industry trends to produce even bigger and better results.

Data Integration

Data Integration Technology Process IT

Announcing the Public Preview of Azure Databricks support for Azure confidential computing

databricks

MAY 23, 2023

We are excited to announce Azure Databricks support for Azure confidential computing (ACC) in preview! With this announcement, customers can run their Azure.

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
