Top Data Engineering Digest Content for June, 2023

June, 2023

Generative AI and the Future of Data Engineering

Monte Carlo

JUNE 6, 2023

Generative AI is taking the world by storm – here’s what it means for data engineering and why data observability is critical for this groundbreaking technology to succeed. Maybe you’ve noticed the world has dumped the internet, mobile, social, cloud and even crypto in favor of an obsession with generative AI. But is there more to generative AI than a fancy demo on Twitter?

Data Engineer

Data Engineer Data Engineering Engineering Business Intelligence

Modern Data Engineering with MAGE: Empowering Efficient Data Processing

Analytics Vidhya

JUNE 20, 2023

Introduction In today’s data-driven world, organizations across industries are dealing with massive volumes of data, complex pipelines, and the need for efficient data processing. Traditional data engineering solutions, such as Apache Airflow, have played an important role in orchestrating and controlling data operations in order to tackle these difficulties.

Data Process

Data Process Data Engineer Data Engineering Process

Join 37,000+

Insiders

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Trending Sources

An educational side project

The Pragmatic Engineer

JUNE 1, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of four topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive on Agoda’s private cloud setup. To get the full issues, twice a week, subscribe here.

Education

Education Project PostgreSQL Software Engineering

Webinars

Agent Tooling: Connecting AI to Your Tools, Systems & Data

How to Modernize Manufacturing Without Losing Control

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

MORE WEBINARS

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

JUNE 25, 2023

Summary Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects.

Data Engineer

Data Engineer Data Engineering Python Engineering

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

A Comprehensive Guide to Convolutional Neural Networks

KDnuggets

JUNE 16, 2023

Artificial Intelligence has been witnessing monumental growth in bridging the gap between the capabilities of humans and machines. Researchers and enthusiasts alike, work on numerous aspects of the field to make amazing things happen. One of many such areas is the domain of Computer Vision.

Machine Learning

Introducing English as the New Programming Language for Apache Spark

databricks

JUNE 29, 2023

Introduction We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™.

Programming Language

Programming Language Programming Designing

GPT-4 + Streaming Data = Real-Time Generative AI

Confluent

JUNE 8, 2023

ChatGPT and data streaming can work together for any company. Learn a basic framework for using GPT-4 and streaming to build a real-world production application.

Data

Data Building

More Trending

GPT-4 + Streaming Data = Real-Time Generative AI

Confluent

JUNE 8, 2023

ChatGPT and data streaming can work together for any company. Learn a basic framework for using GPT-4 and streaming to build a real-world production application.

Data

Data Building

The Journey of a Senior Data Scientist and Machine Learning Engineer at Spice Money

Analytics Vidhya

JUNE 12, 2023

Introduction Meet Tajinder, a seasoned Senior Data Scientist and ML Engineer who has excelled in the rapidly evolving field of data science. Tajinder’s passion for unraveling hidden patterns in complex datasets has driven impactful outcomes, transforming raw data into actionable intelligence. In this article, we explore Tajinder’s inspiring success story.

Machine Learning

Machine Learning Engineering Raw Data Data Science

Domain Registrars which Developers Recommend

The Pragmatic Engineer

JUNE 29, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of four topics from today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here.

AWS

AWS Software Engineering Software Engineer Engineering

Migrating Netflix to GraphQL Safely

Netflix Tech

JUNE 14, 2023

By Jennifer Shin , Tejas Shikhare , Will Emmanuel In 2022, a major change was made to Netflix’s iOS and Android applications. We migrated Netflix’s mobile apps to GraphQL with zero downtime, which involved a total overhaul from the client to the API layer. Until recently, an internal API framework, Falcor , powered our mobile apps. They are now backed by Federated GraphQL , a distributed approach to APIs where domain teams can independently manage and own specific sections of the API.

Utilities

Utilities Java Systems AWS

10 ChatGPT Plugins for Data Science Cheat Sheet

KDnuggets

JUNE 15, 2023

For an overview of what we believe to be the 10 of the best ChatGPT plugins for data science, check out our latest cheat sheet.

Data Science

Data Science Data Machine Learning

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

New Approaches For Detecting AI-Generated Profile Photos

LinkedIn Engineering

JUNE 20, 2023

Co-authors: Shivansh Mundra , Gonzalo Aniano Porcile , Smit Marvaniya , Hany Farid A core part of what we do on the Trust Data Team at LinkedIn is create, deploy, and maintain models that detect and prevent many types of abuse. This spans the detection and prevention of fake accounts, account takeovers, and policy-violating content. We are constantly working to improve and increase the effectiveness of our anti-abuse defenses to protect the experiences of our members and customers.

Media

Media Programming Datasets Engineering

What is a self-serve data platform & how to build one

Start Data Engineering

JUNE 30, 2023

1. Introduction 2. What is self-serve? 2.1. Components of a self-serve platform 3. Building a self-serve data platform 3.1. Creating dataset(s) 3.1.1. Gather requirements 3.1.2. Get data foundations right 3.2. Accessing data 3.3. Identify and remove dependencies 4. Conclusion 5. Further reading 6. References 1. Introduction Most companies want to build a self-serve data platform.

Building

Building Datasets Data Accessible

What Data Engineers Really Do?

Analytics Vidhya

JUNE 25, 2023

In a data-driven world, behind-the-scenes heroes like data engineers play a crucial role in ensuring smooth data flow. Imagine being an online shopper who suddenly receives irrelevant recommendations. A data engineer investigates the issue, identifies a glitch in the e-commerce platform’s data funnel, and swiftly implements seamless data pipelines.

Data Engineer

Data Engineer Data Engineering Engineering Data Pipeline

Google Domains to shut down

The Pragmatic Engineer

JUNE 22, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of five topics from today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here.

Google Cloud

Google Cloud Media Cloud Engineering

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

Ensuring the Successful Launch of Ads on Netflix

Netflix Tech

JUNE 1, 2023

By Jose Fernandez , Ed Barker , Hank Jacobs Introduction In November 2022, we introduced a brand new tier — Basic with ads. This tier extended existing infrastructure by adding new backend components and a new remote call to our ads partner on the playback path. As we were gearing up for launch, we wanted to ensure it would go as smoothly as possible.

Algorithm

Algorithm Kafka Metadata Systems

AI: Large Language & Visual Models

KDnuggets

JUNE 8, 2023

This article discusses the significance of large language and visual models in AI, their capabilities, potential synergies, challenges such as data bias, ethical considerations, and their impact on the market, highlighting their potential for advancing the field of artificial intelligence.

Data

Yes, I'm learning Apache Flink - beginner's problems

Waitingforcode

JUNE 30, 2023

Surprised? You shouldn't. I've always been eager to learn, including 5 years ago when for the first time, I left my Apache Spark comfort zone to explore Apache Beam. Since then I had a chance to write some Dataflow streaming pipelines to fully appreciate this technology and work on AWS, GCP, and Azure. But there is some excitement for learning-from scratch I miss.

AWS

AWS Technology

Exploring Graphs in Rust. Yikes.

Confessions of a Data Guy

JUNE 28, 2023

I’ve been a dog licking my wounds for some time now. Over on my Substack newsletter, I’ve been doing a small series on DSA (Data Structures and Algorithms). I tackled some of the easier stuff first, like Linked Lists, Binary Search, and the like. What’s more, I actually did most of it in Rust, since […] The post Exploring Graphs in Rust.

Algorithm

Algorithm Data IT Big Data

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Data Engineering

Top 10 Powerful Data Modeling Tools to Know in 2023

Analytics Vidhya

JUNE 24, 2023

Introduction In the era of data-driven decision-making, having accurate data modeling tools is essential for businesses aiming to stay competitive. As a new developer, a robust data modeling foundation is crucial for effectively working with databases. Properly configured data structures ensure a smoother workflow and prevent data loss or misplacement.

Database

Database Utilities Data Data Science

Inside Agoda’s Private Cloud - Exclusive

The Pragmatic Engineer

JUNE 13, 2023

👋 Hi, this is Gergely with the monthly, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’re not a subscriber, you missed the issue on Shopify’s leveling split and a few others. Subscribe to get two full issues every week.

Cloud

Cloud Database Utilities BI

Data News — Week 23.25

Christophe Blefari

JUNE 24, 2023

( credits ) Hey, this is the Data News. It's super hard to change habits, but it's how it is, the newsletter is going out on Saturday. I hope this edition finds you well. Summer is coming ☀️ Thank you all because we crossed the 3000 subscribers mark last week. Let's go for the 4000 before the end of the year 🤗 This is a almost-raw edition for this week.

PostgreSQL

PostgreSQL Data Data Engineer Data Engineering

Ten Years of AI in Review

KDnuggets

JUNE 6, 2023

From image classification to chatbot therapy.

Optimizing The Modern Developer Experience with Coder

Many software teams have migrated their testing and production workloads to the cloud, yet development environments often remain tied to outdated local setups, limiting efficiency and growth. This is where Coder comes in. In our 101 Coder webinar, you’ll explore how cloud-based development environments can unlock new levels of productivity. Discover how to transition from local setups to a secure, cloud-powered ecosystem with ease.

Cloud

What's new in Apache Spark 3.4.0 - shuffle changes

Waitingforcode

JUNE 23, 2023

Shuffle is a permanent point in the What's new in Apache Spark series. Why? It's often one the most time consuming part of the jobs and knowing the improvement simply helps writing better pipelines.

Conceptual Introduction to Delta Lake.

Confessions of a Data Guy

JUNE 22, 2023

The post Conceptual Introduction to Delta Lake. appeared first on Confessions of a Data Guy.

Data

Data Big Data Data Engineering Data Engineer

Mr. Pavan’s Data Engineering Journey Drives Business Success

Analytics Vidhya

JUNE 24, 2023

Introduction We had an amazing opportunity to learn from Mr. Pavan. He is an experienced data engineer with a passion for problem-solving and a drive for continuous growth. Throughout the conversation, Mr. Pavan shares his journey, inspirations, challenges, and accomplishments. Thus, providing valuable insights into the field of data engineering. As we explore Mr.

Data Engineer

Data Engineer Data Engineering Engineering Data

An explosion in software engineers using AI coding tools?

The Pragmatic Engineer

JUNE 15, 2023

GitHub surveyed 500 developers in the US for a sense of how they use AI coding tools. I examine the results and add context on how the survey was conducted.

Software Engineering

Software Engineering Software Engineer Coding Engineering

15 Modern Use Cases for Enterprise Business Intelligence

Large enterprises face unique challenges in optimizing their Business Intelligence (BI) output due to the sheer scale and complexity of their operations. Unlike smaller organizations, where basic BI features and simple dashboards might suffice, enterprises must manage vast amounts of data from diverse sources. What are the top modern BI use cases for enterprise businesses to help you get a leg up on the competition?

Business Intelligence

How Column-Aware Development Tooling Yields Better Data Models

Data Engineering Podcast

JUNE 17, 2023

Summary Architectural decisions are all based on certain constraints and a desire to optimize for different outcomes. In data systems one of the core architectural exercises is data modeling, which can have significant impacts on what is and is not possible for downstream use cases. By incorporating column-level lineage in the data modeling process it encourages a more robust and well-informed design.

Data Lake

Data Lake Machine Learning Metadata Data Architecture

Programming Languages for Specific Data Roles

KDnuggets

JUNE 2, 2023

What programming language do you need for a specific data role?

Programming Language

Programming Language Programming Data Data Science

Data News — Week 23.24

Christophe Blefari

JUNE 16, 2023

The newsletter, a metaphor ( credits ) Hello, after the good weather comes the storm. I'm now under the Berlin rain with 20° When I write in these conditions I feel like a tortured author writing a depressing novel while actually today I'll speak about the AI Act, Python, SQL and data platforms. Casual day at the office finally. Some personal news, next Monday and Tuesday I'll be at Berlin Buzzwords, if you're ping me, it would be a pleasure to meet and hang together.

Programming Language

Programming Language SQL PostgreSQL Data

Old Dog Learn New Tricks? Starburst (Trino) Galaxy and other thoughts.

Confessions of a Data Guy

JUNE 20, 2023

Sometimes I think Data Engineering is the same as it was 10+ years ago when I started doing it, and sometimes I think everything has changed. It’s probably both. In some ways, the underlying concepts have not moved an inch, some certain truths and axioms still rule over us all like some distant landlord, requiring […] The post Old Dog Learn New Tricks?

Data Engineer

Data Engineer Data Engineering Engineering IT

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

June, 2023

Generative AI and the Future of Data Engineering

Modern Data Engineering with MAGE: Empowering Efficient Data Processing

Webinars

Trending Sources

An educational side project

Webinars

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

A Comprehensive Guide to Convolutional Neural Networks

Introducing English as the New Programming Language for Apache Spark

GPT-4 + Streaming Data = Real-Time Generative AI

Sign up to get articles personalized to your interests!

More Trending

GPT-4 + Streaming Data = Real-Time Generative AI

The Journey of a Senior Data Scientist and Machine Learning Engineer at Spice Money

Domain Registrars which Developers Recommend

Migrating Netflix to GraphQL Safely

10 ChatGPT Plugins for Data Science Cheat Sheet

Agent Tooling: Connecting AI to Your Tools, Systems & Data

New Approaches For Detecting AI-Generated Profile Photos

What is a self-serve data platform & how to build one

What Data Engineers Really Do?

Google Domains to shut down

How to Modernize Manufacturing Without Losing Control

Ensuring the Successful Launch of Ads on Netflix

AI: Large Language & Visual Models

Yes, I'm learning Apache Flink - beginner's problems

Exploring Graphs in Rust. Yikes.

The Ultimate Guide to Apache Airflow DAGS

Top 10 Powerful Data Modeling Tools to Know in 2023

Inside Agoda’s Private Cloud - Exclusive

Data News — Week 23.25

Ten Years of AI in Review

Optimizing The Modern Developer Experience with Coder

What's new in Apache Spark 3.4.0 - shuffle changes

Conceptual Introduction to Delta Lake.

Mr. Pavan’s Data Engineering Journey Drives Business Success

An explosion in software engineers using AI coding tools?

15 Modern Use Cases for Enterprise Business Intelligence

How Column-Aware Development Tooling Yields Better Data Models

Programming Languages for Specific Data Roles

Data News — Week 23.24

Old Dog Learn New Tricks? Starburst (Trino) Galaxy and other thoughts.

Apache Airflow® Best Practices: DAG Writing

Stay Connected