Top Data Engineering Content for the Week of Jun 03

Sat. Jun 03, 2023 - Fri. Jun 09, 2023

Generative AI and the Future of Data Engineering

Monte Carlo

JUNE 6, 2023

Generative AI is taking the world by storm – here’s what it means for data engineering and why data observability is critical for this groundbreaking technology to succeed. Maybe you’ve noticed the world has dumped the internet, mobile, social, cloud and even crypto in favor of an obsession with generative AI. But is there more to generative AI than a fancy demo on Twitter?

Data Engineering

Data Engineering Data Engineer Engineering Business Intelligence

Data Scientist’s Insights: Strategies for Innovation and Leadership

Analytics Vidhya

JUNE 5, 2023

Welcome back to the success story interview series with a successful data scientist and our DataHour Speaker, Vidhya Chandrasekaran! In today’s data-driven world, data scientists play a crucial role in helping businesses make informed decisions by analyzing and interpreting data. With their expertise in statistics, machine learning, AI, and programming, they are able to […]

Machine Learning

Machine Learning Data Programming Deep Learning


Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration


Trending Sources

Should you optimize for all-cash compensation, if possible?

The Pragmatic Engineer

JUNE 8, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and high-growth startups through the lens of engineering managers and senior engineers. In this article, we cover one out of four topics from today’s subscriber-only The Scoop issue. If you’re not a full subscriber yet, you missed this week’s deep-dive on Shopify’s leveling split.

Software Engineer

Software Engineer Software Engineering Media Engineering


AI: Large Language & Visual Models

KDnuggets

JUNE 8, 2023

This article discusses the significance of large language and visual models in AI: their capabilities, potential synergies, challenges such as data bias and ethical considerations, and their impact on the market. It highlights their potential for advancing the field of artificial intelligence.

Data

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide to debugging Airflow DAGs, with best practices and examples. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

GPT-4 + Streaming Data = Real-Time Generative AI

Confluent

JUNE 8, 2023

ChatGPT and data streaming can work together for any company. Learn a basic framework for using GPT-4 and streaming to build a real-world production application.

Data

Data Building

Gotchas of Streaming Pipelines: Profiling & Performance Improvements

Lyft Engineering

JUNE 6, 2023

Discover how Lyft identified and fixed performance issues in our streaming pipelines. Every streaming pipeline is unique. When reviewing a pipeline’s performance, we ask the following questions: “Is there a bottleneck?”, “Is the pipeline performing optimally?”, “Will it continue to scale with increased load?” Regularly asking these questions is vital to avoid scrambling to fix performance issues at the last minute.

Utilities

Utilities Coding Python Systems
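The first of Lyft's questions, “Is there a bottleneck?”, can be approached by measuring how much time each pipeline stage consumes. The sketch below is a generic illustration, not Lyft's tooling: it assumes an in-process Python pipeline, and the stage functions (`parse`, `enrich`, `serialize`) and the helpers `profile_stages` and `find_bottleneck` are hypothetical names invented for this example.

```python
import time
from collections import defaultdict
from typing import Callable, Iterable

# A stage is a (name, function) pair; each function transforms one record.
Stage = tuple[str, Callable[[dict], dict]]


def profile_stages(records: Iterable[dict], stages: list[Stage]) -> dict[str, float]:
    """Push each record through every stage in order, summing wall-clock seconds per stage."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        item = record
        for name, fn in stages:
            start = time.perf_counter()
            item = fn(item)
            totals[name] += time.perf_counter() - start
    return dict(totals)


def find_bottleneck(totals: dict[str, float]) -> str:
    """Return the name of the stage that consumed the most total time."""
    return max(totals, key=totals.get)


# Hypothetical stages for illustration only.
def parse(r: dict) -> dict:
    return {**r, "value": int(r["raw"])}


def enrich(r: dict) -> dict:
    time.sleep(0.001)  # simulate a slow external lookup
    return {**r, "enriched": True}


def serialize(r: dict) -> dict:
    return {**r, "out": str(r["value"])}


STAGES: list[Stage] = [("parse", parse), ("enrich", enrich), ("serialize", serialize)]

if __name__ == "__main__":
    totals = profile_stages(({"raw": str(i)} for i in range(200)), STAGES)
    for name, secs in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<10} {secs * 1000:8.2f} ms")
    print("bottleneck:", find_bottleneck(totals))
```

In-process timers only work when the whole pipeline runs in one process. In a distributed streaming job, the equivalent signal usually comes from the framework's own per-operator metrics (throughput, latency, backpressure), where the operator that falls behind under load is the likely bottleneck.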

What's new on the cloud for data engineers - part 10 (03-05.2023)

Waitingforcode

JUNE 9, 2023

It's time for another part of "What's new on the cloud for data engineers". Let's see what happened in the last 3 months.

Data Engineering

Data Engineering Data Engineer Cloud Engineering

More Trending


Ten Years of AI in Review

KDnuggets

JUNE 6, 2023

From image classification to chatbot therapy.

Data News — Week 23.23

Christophe Blefari

JUNE 9, 2023

Here's a new edition of the Data News newsletter. Since my 2-year anniversary post, I've been struggling to find the right writing rhythm. I've been sick and I've been stuck on a client project. Writing the newsletter was not an easy exercise, even though I keep telling myself "it's not a question of motivation, it's a question of discipline" like a LinkedIn guy.

Data

Data Government SQL Coding

4 Ways To Setup Your Data Engineering Game.

Confessions of a Data Guy

JUNE 8, 2023

One of my greatest pleasures in life is watching the r/dataengineering Reddit board; I find it very entertaining and enlightening on many levels. It gives a fairly unique view into the wide range of Data Engineering companies, jobs, projects people are working on, tech stacks, and problems that are being faced. One thing I’ve come […]

Data Engineering

Data Engineering Data Engineer Engineering Entertainment

Reduce The Overhead In Your Pipelines With Agile Data Engine's DataOps Service

Data Engineering Podcast

JUNE 4, 2023

A significant portion of the time spent by data engineering teams is on managing the workflows and operations of their pipelines. DataOps has arisen as a parallel set of practices to that of DevOps teams as a means of reducing wasted effort. Agile Data Engine is a platform designed to handle the infrastructure side of the DataOps equation, as well as providing the insights that you need to manage the human side of the workflow.

Data Engineering

Data Engineering Data Engineer Engineering Data Lake

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Getting Started with ReactPy

KDnuggets

JUNE 9, 2023

A Beginner's Guide to Building Web Applications without JavaScript.

Building

Building Python

Data News — Week 23.22

Christophe Blefari

JUNE 3, 2023

Hey, I've been sick longer than I expected, but I'm finally well. I hope this email finds you all well, as well. I've had to catch up on almost 3 weeks of content. When I step back, the number of articles shared each week is insane; there are countless articles about things that have already been written.

Data Pipeline

Data Pipeline Data SQL Algorithm

Extending Databricks Unity Catalog with an Open Apache Hive Metastore API

databricks

JUNE 9, 2023

Today, we are excited to announce the preview of a Hive Metastore (HMS) interface for Databricks Unity Catalog, which allows any software compatible.

Native Frame Rate Playback

Netflix Tech

JUNE 5, 2023

By Akshay Garg and Roger Quero. Maximizing immersion for our members is an important goal for the Netflix product and engineering teams to keep our members entertained and fully engaged in our content. Leveraging a good mix of mature and cutting-edge client device technologies to deliver a smooth playback experience with glitch-free in-app transitions is an important step towards achieving this goal.

Algorithm

Algorithm Entertainment Data Science Manufacturing

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

The Art of Prompt Engineering: Decoding ChatGPT

KDnuggets

JUNE 7, 2023

Mastering the principles and practices of AI interaction with OpenAI and DeepLearning.AI’s course.

Engineering

Announcing halide-haskell - a Haskell interface for the Halide image and array processing language

Tweag

JUNE 7, 2023

The availability of deep learning frameworks like PyTorch or JAX has revolutionized array processing, regardless of whether one is working on machine learning tasks or other numerical algorithms. The Haskell library ecosystem has been catching up as well, and there are now multiple good array libraries. However, writing high-performance array processing code in Haskell is still a non-trivial endeavor.

Process

Process Coding Python Deep Learning

Now Available: New Generative AI Learning Offerings

databricks

JUNE 6, 2023

Announcing a new portfolio of Generative AI learning offerings on Databricks Academy. Enroll in the Large Language Models: Application through Production course on Databricks.

Portfolio

Portfolio Machine Learning

Lyft Expands Team to Czechia

Lyft Engineering

JUNE 8, 2023

Introducing Lyft Engineering: Hello Czechia! Ahoj! Lyft is opening offices in Czechia and hiring for full-time positions on end-to-end product, science, and engineering teams. We’re looking for driven engineers to fortify our European operations and solve some of the hardest problems in building large distributed systems to support rideshare, mapping, and more.

Transportation

Transportation Algorithm Machine Learning Engineering

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Advanced Feature Selection Techniques for Machine Learning Models

KDnuggets

JUNE 6, 2023

Mastering Feature Selection: An Exploration of Advanced Techniques for Supervised and Unsupervised Machine Learning Models.

Machine Learning

Graphical cartograms in ArcGIS Pro

ArcGIS

JUNE 7, 2023

Tool to make Dorling and Demers cartograms in ArcGIS Pro

Designing

Announcing MLflow 2.4: LLMOps Tools for Robust Model Evaluation

databricks

JUNE 7, 2023

LLMs present a massive opportunity for organizations of all scales to quickly build powerful applications and deliver business value. Where data scientists used.

Building

Building Machine Learning Data

Who Is Responsible For Data Quality? 5 Different Answers From Real Data Teams

Monte Carlo

JUNE 6, 2023

Sure, data quality is everyone’s problem. But who is responsible for data quality? Given the variations in approach and mixed success, we have a lot of natural experiments from which to learn. Some organizations will attempt to diffuse the responsibility widely across data stewards, data owners, data engineering and governance committees, each owning a fraction of the data value chain.

Data Governance

Data Governance Government Data Data Engineering

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineering

Mastering the Art of Data Storytelling: A Guide for Data Scientists

KDnuggets

JUNE 8, 2023

How to dazzle others with your cool data science insights by mastering the art of data storytelling.

Data Science

Data Science Data

Five tips to create a better index

ArcGIS

JUNE 9, 2023

Read about five tips you can apply to avoid some of the most common pitfalls in creating a composite index.

Government

Government Data Science Data
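A composite index combines several indicators, often measured in different units, into a single score per unit (for example, per region). The sketch below is a generic illustration, not a reproduction of the article's five tips: it shows two safeguards commonly applied when building such an index, rescaling every indicator to a common 0 to 1 range and flipping indicators where lower values are better, before taking a weighted average. The indicator names, weights, values, and the helpers `min_max_scale` and `composite_index` are made up for this example.

```python
def min_max_scale(values: list[float], higher_is_better: bool = True) -> list[float]:
    """Rescale values to [0, 1]; flip them so that 1.0 always means 'better'."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no variation: give every unit a neutral score
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def composite_index(
    indicators: dict[str, list[float]],
    weights: dict[str, float],
    higher_is_better: dict[str, bool],
) -> list[float]:
    """Weighted average of min-max-scaled indicators, one score per unit."""
    lengths = {len(vals) for vals in indicators.values()}
    if len(lengths) != 1:
        raise ValueError("every indicator needs one value per unit")
    total_weight = sum(weights[name] for name in indicators)
    scaled = {name: min_max_scale(vals, higher_is_better[name]) for name, vals in indicators.items()}
    n_units = lengths.pop()
    return [
        sum(weights[name] / total_weight * scaled[name][i] for name in indicators)
        for i in range(n_units)
    ]


if __name__ == "__main__":
    # Made-up data for three regions.
    scores = composite_index(
        {"income": [30000, 50000, 70000], "unemployment_pct": [10, 5, 2]},
        weights={"income": 0.5, "unemployment_pct": 0.5},
        higher_is_better={"income": True, "unemployment_pct": False},
    )
    print([round(s, 4) for s in scores])  # → [0.0, 0.5625, 1.0]
```

Min-max scaling is only one option; z-scores and rank-based scaling are common alternatives. Min-max is sensitive to outliers, because a single extreme value stretches the range and compresses every other score.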

My (Very) Personal Data Warehouse

Towards Data Science

JUNE 6, 2023

Fitbit activity analysis with DuckDB. Wearable fitness trackers have become an integral part of our lives, collecting and tracking data about our daily activities, sleep patterns, location, heart rate, and much more. I’ve been using a Fitbit device for 6 years to monitor my health. However, I have always found the data analysis capabilities lacking, especially when I wanted to track my progress against long-term fitness goals.

Data Warehouse

Data Warehouse Data SQL Data Analysis

Unleashing the Power of Data Collaboration

databricks

JUNE 4, 2023

In today's data-driven landscape, organizations face the challenge of aggregating data to derive meaningful insights that enrich audience profiles. Traditional data integration methods.

Aggregated Data

Aggregated Data Data Data Integration Entertainment

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

ChatGPT for Data Science Interview Cheat Sheet

KDnuggets

JUNE 6, 2023

Check out our latest cheat sheet! Learn how to leverage ChatGPT for data science interview preparation.

Data Science

Data Science Data Machine Learning

Understanding global water quality trends

ArcGIS

JUNE 5, 2023

Eutrophication is driven by the enrichment of waters with nutrients, resulting in adverse changes in the balance of organisms and in water quality.

Which Team Should Own Data Quality?

Towards Data Science

JUNE 8, 2023

Specialists or generalists? Engineer or analyst? We examine which team structures are best suited to efficiently improving data quality. Sure, data quality is everyone’s problem. But who owns the solution? Given the variations in approach and mixed success, we have a lot of natural experiments from which to learn.

Data Governance

Data Governance Government Generalist Data Engineer

Large Language Models in Media & Entertainment

databricks

JUNE 6, 2023

The Media & Entertainment industry is in the midst of a revolution centered around data and putting consumers at the center of every.

Entertainment

Entertainment Media Data

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
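The session description mentions making LLM test runs reproducible (temperature 0 and fixed seeds) and scoring them with non-LLM evaluation. The sketch below illustrates that general idea under stated assumptions; it is not the system described in the talk. `toy_generate` is a hypothetical stand-in for a model call (no real model or API is involved), and `is_reproducible` and `exact_match_accuracy` are hypothetical helper names. Real providers may honour temperature and seed settings only on a best-effort basis, which is why the sketch checks reproducibility instead of assuming it.

```python
import random
from typing import Callable

# Signature of a text generator: (prompt, temperature, seed) -> completion.
Generator = Callable[[str, float, int], str]


def toy_generate(prompt: str, temperature: float, seed: int) -> str:
    """Hypothetical stand-in for an LLM call.

    temperature == 0: return a 'greedy' answer from a fixed keyword rule.
    temperature > 0: sample with an RNG seeded from the inputs, so the same
    (prompt, temperature, seed) always yields the same text.
    """
    greedy = "positive" if "good" in prompt.lower() else "negative"
    if temperature == 0:
        return greedy
    rng = random.Random(f"{prompt}|{temperature}|{seed}")
    return rng.choice([greedy, "neutral"])


def is_reproducible(generate: Generator, prompt: str, temperature: float, seed: int, runs: int = 5) -> bool:
    """True if repeated calls with identical settings return identical text."""
    outputs = {generate(prompt, temperature, seed) for _ in range(runs)}
    return len(outputs) == 1


def exact_match_accuracy(generate: Generator, cases: list[tuple[str, str]], seed: int = 42) -> float:
    """Deterministic, non-LLM metric: share of prompts whose temperature-0 output equals the label."""
    hits = sum(generate(prompt, 0.0, seed) == label for prompt, label in cases)
    return hits / len(cases)


if __name__ == "__main__":
    cases = [("This is good", "positive"), ("This is bad", "negative"), ("Meh", "neutral")]
    print("reproducible:", is_reproducible(toy_generate, "This is good", 0.0, seed=42))
    print("exact match:", exact_match_accuracy(toy_generate, cases))
```

Exact match is the simplest deterministic metric; for structured outputs, schema validation or field-level comparison would follow the same pattern of scoring without calling another model.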
