Top Data Engineering Digest Pipeline-centric Data Storage Content for Week of Jun 24

Sat. Jun 24, 2023 - Fri. Jun 30, 2023

What Data Engineers Really Do?

Analytics Vidhya

JUNE 25, 2023

In a data-driven world, behind-the-scenes heroes like data engineers play a crucial role in ensuring smooth data flow. Imagine being an online shopper who suddenly receives irrelevant recommendations. A data engineer investigates the issue, identifies a glitch in the e-commerce platform’s data funnel, and swiftly implements seamless data pipelines.

Data Engineering

Data Engineering Data Engineer Engineering Data Pipeline

Domain Registrars which Developers Recommend

The Pragmatic Engineer

JUNE 29, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. In every issue, I cover topics related to Big Tech and startups through the lens of engineering managers and senior engineers. In this article, we cover one out of four topics from today’s subscriber-only The Scoop issue. To get full issues twice a week, subscribe here.

AWS

AWS Software Engineer Software Engineering Engineering



Trending Sources

Seamless SQL And Python Transformations For Data Engineers And Analysts With SQLMesh

Data Engineering Podcast

JUNE 25, 2023

Summary: Data transformation is a key activity for all of the organizational roles that interact with data. Because of its importance and outsized impact on what is possible for downstream data consumers, it is critical that everyone is able to collaborate seamlessly. SQLMesh was designed as a unifying tool that is simple to work with but powerful enough for large-scale transformations and complex projects.

Data Engineering

Data Engineering Data Engineer Python Engineering


Will ChatGPT Replace Data Scientists?

KDnuggets

JUNE 30, 2023

Every job is at risk. Here’s how you can AI-proof your career.

Data

Data Data Science

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: create a standardized process for debugging to quickly diagnose errors in your DAGs; identify common issues with DAGs, tasks, and connections; distinguish between Airflow-relate

Data Pipeline

Top 10 Powerful Data Modeling Tools to Know in 2023

Analytics Vidhya

JUNE 24, 2023

Introduction: In the era of data-driven decision-making, having accurate data modeling tools is essential for businesses aiming to stay competitive. As a new developer, a robust data modeling foundation is crucial for effectively working with databases. Properly configured data structures ensure a smoother workflow and prevent data loss or misplacement.

Database

Database Utilities Data Data Science

Introducing English as the New Programming Language for Apache Spark

databricks

JUNE 29, 2023

Introduction: We are thrilled to unveil the English SDK for Apache Spark, a transformative tool designed to enrich your Spark experience. Apache Spark™.

Programming Language

Programming Language Programming Designing

Meta developer tools: Working at scale

Engineering at Meta

JUNE 27, 2023

Every day, thousands of developers at Meta are working in repositories with millions of files. Those developers need tools that help them at every stage of the workflow while working at extreme scale. In this article we’ll go through a few of the tools in the development process. And, as an added bonus, those we talk about below are open source so you can try them yourself.

Java

Java Programming Language Algorithm Programming

More Trending


From Theory to Practice: Building a k-Nearest Neighbors Classifier

KDnuggets

JUNE 27, 2023

The k-Nearest Neighbors Classifier is a machine learning algorithm that assigns a new data point to the most common class among its k closest neighbors. In this tutorial, you will learn the basic steps of building and applying this classifier in Python.

Building

Building Machine Learning Algorithm Python
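
The voting rule described in the k-Nearest Neighbors teaser above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the code from the KDnuggets tutorial: the function name `knn_predict`, the sample points and labels, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the most common label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point (assumed metric).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors, ordered by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbors' labels.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(points, labels, (1.1, 1.0), k=3)
```

With this toy data, a query near the first cluster is assigned to "a" because all three of its nearest neighbors carry that label. In practice a library implementation with feature scaling and a tuned k would likely be preferable.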

Mr. Pavan’s Data Engineering Journey Drives Business Success

Analytics Vidhya

JUNE 24, 2023

Introduction: We had an amazing opportunity to learn from Mr. Pavan. He is an experienced data engineer with a passion for problem-solving and a drive for continuous growth. Throughout the conversation, Mr. Pavan shares his journey, inspirations, challenges, and accomplishments, providing valuable insights into the field of data engineering. As we explore Mr.

Data Engineering

Data Engineering Data Engineer Engineering Data

What is a self-serve data platform & how to build one

Start Data Engineering

JUNE 30, 2023

Contents: 1. Introduction; 2. What is self-serve? (2.1. Components of a self-serve platform); 3. Building a self-serve data platform (3.1. Creating dataset(s): 3.1.1. Gather requirements, 3.1.2. Get data foundations right; 3.2. Accessing data; 3.3. Identify and remove dependencies); 4. Conclusion; 5. Further reading; 6. References.

Most companies want to build a self-serve data platform.

Building

Building Datasets Data Accessibility

Yes, I'm learning Apache Flink - beginner's problems

Waitingforcode

JUNE 30, 2023

Surprised? You shouldn't be. I've always been eager to learn, including 5 years ago when, for the first time, I left my Apache Spark comfort zone to explore Apache Beam. Since then I've had a chance to write some Dataflow streaming pipelines, fully appreciate this technology, and work on AWS, GCP, and Azure. But there is some excitement in learning from scratch that I miss.

AWS

AWS Technology

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Data

Exploring Graphs in Rust. Yikes.

Confessions of a Data Guy

JUNE 28, 2023

I’ve been a dog licking my wounds for some time now. Over on my Substack newsletter, I’ve been doing a small series on DSA (Data Structures and Algorithms). I tackled some of the easier stuff first, like Linked Lists, Binary Search, and the like. What’s more, I actually did most of it in Rust, since […]

Algorithm

Algorithm Data IT Big Data

Data News — Week 23.25

Christophe Blefari

JUNE 24, 2023

Hey, this is the Data News. It's super hard to change habits, but it is what it is: the newsletter is going out on Saturday. I hope this edition finds you well. Summer is coming ☀️ Thank you all, because we crossed the 3000-subscriber mark last week. Let's go for 4000 before the end of the year 🤗 This is an almost-raw edition for this week.

PostgreSQL

PostgreSQL Data Data Engineering Data Engineer

The Importance of Reproducibility in Machine Learning

KDnuggets

JUNE 27, 2023

And how approaches to better data management, version control, and experiment tracking can help build reproducible ML pipelines.

Machine Learning

Machine Learning Data Management Building Management

Lakehouse AI: a data-centric approach to building Generative AI applications

databricks

JUNE 27, 2023

Generative AI will have a transformative impact on every business. Databricks has been pioneering AI innovations for a decade, actively collaborating with thousands.

Building

Building Data

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Systems

Building Real-time Machine Learning Foundations at Lyft

Lyft Engineering

JUNE 28, 2023

Written by Konstantin Gizdarski and Martin Liu at Lyft. In early 2022, Lyft already had a comprehensive Machine Learning Platform called LyftLearn composed of model serving, training, CI/CD, feature serving, and model monitoring systems. On the real-time front, LyftLearn supported real-time inference and input feature validation. However, streaming data was not supported as a first-class citizen across many of the platform’s systems — such as training, complex monitoring, and others.

Machine Learning

Machine Learning Building Kafka Metadata

Declarative Data Pipelines with Hoptimator

LinkedIn Engineering

JUNE 26, 2023

For the last several years, internal infrastructure at LinkedIn has been built around a self-service model, enabling developers to onboard themselves with minimal support. We have various user experiences that let application teams provision their own resources and infrastructure, generally by filling out forms or using command-line tools. For example, developers can provision Kafka topics, Espresso tables, Venice stores and more via Nuage, our internal cloud-like infra management platform.

Data Pipeline

Data Pipeline Kafka SQL MySQL

Stable Diffusion: Basic Intuition Behind Generative AI

KDnuggets

JUNE 29, 2023

This article provides a general overview of Stable Diffusion and focuses on building a basic understanding of how generative artificial intelligence works.

Building

Introducing LakehouseIQ: The AI-Powered Engine that Uniquely Understands your Business

databricks

JUNE 27, 2023

Today, we are thrilled to announce LakehouseIQ, a knowledge engine that learns the unique nuances of your business and data to power natural.

Engineering

Engineering Data

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

Manufacturing

What Is an Event in the Apache Kafka Ecosystem?

Confluent

JUNE 28, 2023

Get an introduction into the world of events and event-driven architecture in Apache Kafka. Learn what events are and the role they play in event design, event streaming, and event-driven design.

Kafka

Kafka Architecture Designing

From community to creation—celebrating a year of Product Ideas

ThoughtSpot

JUNE 27, 2023

Active listening is an admired and sought after skill in both the professional and personal sphere. After all, who doesn’t love to be heard? But what happens when we apply that mindset to the way our organizations solicit feedback and interact with our customers? We don’t have to make any assumptions to answer this question, because we have the data.

Programming

Programming Data Science Data Analytics Process

ChatGPT Plugins: Everything You Need To Know

KDnuggets

JUNE 26, 2023

Learn more about the third-party plugins that OpenAI has rolled out to understand ChatGPT's real-world use.

Introducing Lakehouse Federation Capabilities in Unity Catalog

databricks

JUNE 27, 2023

Data teams face many challenges to quickly access the right data primarily due to data fragmentation, time and cost involved in consolidating data.

Accessibility

Accessibility Accessible Data

The Ultimate Guide to Apache Airflow DAGS

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to; write DAGs that adapt to your data at runtime and set up alerts and notifications; scale you

Data Engineering

Measuring Performance Improvements with the Snowflake Performance Index

Snowflake

JUNE 27, 2023

At Snowflake Summit, we announced the public launch of the Snowflake Performance Index (“SPI”), an aggregate index for measuring real-world improvements in Snowflake performance experienced by customers over time. At Snowflake, our product philosophy revolves around continuously enhancing the performance of our product, particularly the core database engine.

Database

Database Designing Accessible Accessibility

ThoughtSpot acquires Mode: Empowering data teams to bring Generative AI to BI

ThoughtSpot

JUNE 26, 2023

At ThoughtSpot, we know how important it is for businesses of every size and industry to empower every knowledge worker with personalized, actionable data-driven insights. These insights are your secret sauce to making better business decisions, growing faster, and delivering customer experiences that keep people coming back for more. But how do you scale self-service analytics to business users without completely overwhelming your data teams?

BI Business Intelligence Datasets Government

5 Free Books on Natural Language Processing to Read in 2023

KDnuggets

JUNE 29, 2023

Large language models are getting released left, right, and center, and if you want to understand them better you need to know about NLP. Here are 5 free books to help you.

Process

Introducing Materialized Views and Streaming Tables for Databricks SQL

databricks

JUNE 27, 2023

We are thrilled to announce that materialized views and streaming tables are now publicly available in Databricks SQL on AWS and Azure. Streaming.

SQL

SQL AWS

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Data

The Verdict Is In: Maxa Is the 2023 Snowflake Startup Winner

Snowflake

JUNE 30, 2023

Since launching this year’s contest in October, receiving hundreds of submissions, and completing three rounds of judging, the wait is over: Maxa is the 2023 Snowflake Startup Challenge grand prize winner! Maxa’s goal is to automate financial and operations ERP insights extremely fast and without requiring special skills. To make that happen, it leverages the breadth of the Snowflake platform to transform raw data from multiple financial and operational systems into a common data model that user

Unstructured Data

Unstructured Data Raw Data Python SQL

Celebrating Pride with ThoughtSpot's Rainbow Room ERG

ThoughtSpot

JUNE 30, 2023

Pride is more than just a month-long celebration; it is a powerful movement that reminds us of the importance of equality, acceptance, and love. It is that special time of year for the global queer community to come together to celebrate, commemorate, and continue to push for progress. It’s no different here at ThoughtSpot. We believe in creating an inclusive environment where everyone feels seen, heard, and valued.

Education

Education Recruitment Programming Building

A Comparison of Machine Learning Algorithms in Python and R

KDnuggets

JUNE 26, 2023

This list of the most commonly used machine learning algorithms in Python and R is intended to help novice engineers and enthusiasts get familiar with the most commonly used algorithms.

Machine Learning

Machine Learning Algorithm Python Engineering

High resolution data updates to Living Atlas World Elevation Layers and Tools (June 2023)

ArcGIS

JUNE 30, 2023

In June 2023, elevation layers were updated with high-resolution datasets for England, New Zealand, and the USA.

Datasets

Datasets Data

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m

Software Engineer
