Data Engineering Digest: Top Data Engineering Content for the Week of Apr 01

Sat, Apr 01, 2023 - Fri, Apr 07, 2023

Data Engineering for Streaming Data on GCP

Analytics Vidhya

APRIL 3, 2023

Companies can access a large pool of data in the modern business environment, and using this data in real time can produce insights that drive business success. Real-time dashboards built on GCP provide strong data visualization and actionable information for decision-makers. Nevertheless, setting up a streaming data pipeline to power such dashboards may […]

Behind the Scenes with Two New Salary Transparency Websites

The Pragmatic Engineer

APRIL 6, 2023

👋 Hi, this is Gergely with a bonus, free issue of the Pragmatic Engineer Newsletter. We cover one out of five topics in today’s subscriber-only The Scoop issue. If you’re not yet a full subscriber, you missed this week’s deep-dive into Figma’s engineering culture. To get full newsletters twice a week, subscribe here.

Trending Sources

The Future of Work: How AI is Changing the Job Landscape

KDnuggets

APRIL 6, 2023

With more and more companies integrating artificial intelligence into the workplace, what does this mean for employees' futures and careers?

Build faster with Buck2: Our open source build system

Engineering at Meta

APRIL 6, 2023

Buck2, our new open source, large-scale build system, is now publicly available via the Buck2 website and the Buck2 GitHub repository. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In our internal tests at Meta, we observed that Buck2 completed builds 2x as fast as Buck1.

Table file formats - Z-Order compaction: Apache Iceberg

Waitingforcode

APRIL 7, 2023

Last time you discovered the Z-Order compaction in Delta Lake. But guess what? Apache Iceberg also has this feature!
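For readers new to the technique: Z-ordering sorts rows by a single key built by interleaving the bits of several column values (a Morton code), so rows that are close in all of those columns tend to land in the same data files. Below is a minimal, hypothetical Python sketch of that idea for two small non-negative integer columns. It is not Iceberg's or Delta Lake's implementation; real table formats also normalize column values to fixed-width bytes first, which this sketch skips, and the column names `a` and `b` are made up.

```python
from typing import Dict, List


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) code of two non-negative integers.

    Bit i of x lands at position 2*i and bit i of y at position 2*i + 1,
    so values that are close in both dimensions get nearby codes.
    """
    limit = 1 << bits
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError(f"values must be in [0, {limit})")
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_order_sort(rows: List[Dict[str, int]], col_a: str, col_b: str) -> List[Dict[str, int]]:
    """Sort rows by the Z-order code of two integer columns (illustrative only)."""
    return sorted(rows, key=lambda r: interleave_bits(r[col_a], r[col_b]))


if __name__ == "__main__":
    # Hypothetical 4x4 grid of (a, b) values standing in for table rows.
    grid = [{"a": a, "b": b} for a in range(4) for b in range(4)]
    for row in z_order_sort(grid, "a", "b")[:4]:
        print(row)
```

The first four rows after sorting are the corner where `a` and `b` are both 0 or 1. If files were cut every four rows, a filter such as `a < 2 AND b < 2` would touch a single file, which is the data-skipping benefit compaction aims for.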

QuickSort in Rust!

Confessions of a Data Guy

APRIL 6, 2023

LangChain 101: Build Your Own GPT-Powered Applications

KDnuggets

APRIL 3, 2023

LangChain is a Python library that helps you build GPT-powered applications in minutes. Get started with LangChain by building a simple question-answering app.

More Trending

Data Modeling – The Unsung Hero of Data Engineering: An Introduction to Data Modeling (Part 1)

Simon Späti

APRIL 3, 2023

Amidst the excitement and hype surrounding artificial intelligence, the significance of data engineering and its critical foundation—data modeling—can often be overlooked. This article is the first in a three-part series that will shine a spotlight on the fascinating world of data modeling, delving into its crucial importance within the broader context of data engineering.

Mapping The Data Infrastructure Landscape As A Venture Capitalist

Data Engineering Podcast

APRIL 2, 2023

The data ecosystem has been building momentum for several years now. As a venture capital investor, Matt Turck has been trying to keep track of the main trends and compiles his findings each year into the MAD (ML, AI, and Data) landscape report. In this episode he shares his experiences building those reports and the perspective he has gained from the exercise.

Conda Init and ArcGIS Pro

ArcGIS

APRIL 7, 2023

We're happy to announce that the conda init command is now enabled for ArcGIS users of Python! Learn how to use it, how it works, and what its benefits are.

Exploring Data Cleaning Techniques With Python

KDnuggets

APRIL 4, 2023

Tutorial on data cleaning techniques using Python.

Inside Look: Measuring Developer Productivity and Happiness at LinkedIn

LinkedIn Engineering

APRIL 4, 2023

Authors: Viktoras Truchanovicius and Selina Zhang

At LinkedIn, developer productivity and happiness have always been a priority. It is critical for our engineering leaders to understand how efficiently and effectively their teams are operating to continuously deliver value-added features for our members and build an industry-leading engineering culture.

Loading IFC files into the ArcGIS Indoors Model

ArcGIS

APRIL 6, 2023

Organizations with IFC files can still reap the benefits of an ArcGIS Indoors deployment by following these recommendations.

RAPIDS cuDF to Speed up Your Next Data Science Workflow

KDnuggets

APRIL 3, 2023

This article explains how RAPIDS can help you speed up your next data science workflow. RAPIDS cuDF is a GPU DataFrame library that lets you develop and run your end-to-end data science pipeline entirely on the GPU.

Snowflake Startup Challenge 2023: Meet the 10 Semi-Finalists

Snowflake

APRIL 7, 2023

Spring has sprung—and with it comes a new crop of Snowflake Startup Challenge semi-finalists! The 2023 submission pool was the largest to date—twice as many submissions as last year—with entries that spanned not just the globe but the breadth of the Snowflake platform. Our judges put a lot of careful consideration into selecting the top 10, and we offer our sincere thanks to every company that sent in an entry this year—we know how much hard work goes into these submissions, and we appreciate it

Introducing Entity-Centric Data Modeling for Analytics

Preset

APRIL 5, 2023

Entity-centric modeling is a data modeling approach that enriches tabular datasets with useful "features" to make segmentation, cohort creation, and complex classification analyses easier.
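As a rough illustration of the idea, the sketch below rolls a hypothetical event log up into one row per user and attaches a few derived feature columns (event count, total spend, activity span, a high-value flag) that a later segmentation or cohort query could simply filter on. The event field names, the 100.0 spend threshold, and the feature names are all assumptions for illustration; this is not Preset's implementation.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List


def build_user_entities(events: List[dict], high_value_threshold: float = 100.0) -> Dict[str, dict]:
    """Collapse raw events into one feature-rich row per user.

    Each event is assumed (hypothetically) to look like
    {"user_id": str, "event_date": date, "amount": float}.
    """
    entities: Dict[str, dict] = defaultdict(
        lambda: {"event_count": 0, "total_spend": 0.0, "first_seen": None, "last_seen": None}
    )
    for ev in events:
        row = entities[ev["user_id"]]
        row["event_count"] += 1
        row["total_spend"] += ev.get("amount", 0.0)
        d = ev["event_date"]
        row["first_seen"] = d if row["first_seen"] is None else min(row["first_seen"], d)
        row["last_seen"] = d if row["last_seen"] is None else max(row["last_seen"], d)

    # Derived "features": later segmentation or cohort queries become simple filters.
    for row in entities.values():
        row["is_high_value"] = row["total_spend"] >= high_value_threshold
        row["active_span_days"] = (row["last_seen"] - row["first_seen"]).days
    return dict(entities)


if __name__ == "__main__":
    events = [
        {"user_id": "u1", "event_date": date(2023, 4, 1), "amount": 60.0},
        {"user_id": "u1", "event_date": date(2023, 4, 5), "amount": 50.0},
        {"user_id": "u2", "event_date": date(2023, 4, 3), "amount": 20.0},
    ]
    users = build_user_entities(events)
    # Segment: users whose total spend meets the (hypothetical) threshold.
    print([uid for uid, row in users.items() if row["is_high_value"]])
```

The design choice is to pay the aggregation cost once, when the entity table is built, so that each downstream analysis reads one row per entity instead of re-scanning raw events.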

Build, Analyze, and Filter Catalog Layers in ArcGIS Pro

ArcGIS

APRIL 4, 2023

ArcGIS Pro 3.1 introduces a new layer type, catalog layers, and this blog covers how they can be used in your analytic workflows.

8 Open-Source Alternatives to ChatGPT and Bard

KDnuggets

APRIL 6, 2023

Discover the widely used open-source frameworks and models for creating your own ChatGPT-like chatbots, integrating LLMs, or launching your AI product.

The BEST Resources to Level Up Your Data Streaming Knowledge!

Confluent

APRIL 5, 2023

All the best data streaming resources, tips, and guides to help you learn introductory concepts, streaming architecture basics, common tools and technologies, and more.

Databricks for GxP

databricks

APRIL 6, 2023

What is GxP? GxP stands for "Good x Practices," where x refers to a specific discipline, such as clinical, manufacturing, or laboratory. The […]

Data Observability for Analytics and ML teams

Towards Data Science

APRIL 6, 2023

Principles, practices, and examples for ensuring high-quality data flows

Nearly 100% of companies today rely on data to power business opportunities, and 76% use data as an integral part of forming a business strategy. In today’s age of digital business, more and more of the decisions companies make about delivering customer experience, building trust, and shaping their business strategy begin with accurate data.

My Data Science Six Months Success Story

KDnuggets

APRIL 6, 2023

In this article, I share a few things I have learned in the past six months, along with tips that helped me stay dedicated and true to my journey.

Uniting the Machine Learning and Data Streaming Ecosystems - Part 2

Confluent

APRIL 4, 2023

Machine learning and data streaming are a perfect match, but have diverging tech stacks. How can we overcome the pitfalls of SQL and the gulf between languages?

Exciting new updates coming to Workflows in April

databricks

APRIL 4, 2023

Databricks is excited to announce the release of several new Workflows features that will simplify the way you create and launch automated […]

A Gentle Introduction to Analytical Stream Processing

Towards Data Science

APRIL 3, 2023

Building a Mental Model for Engineers and Anyone in Between

Stream processing can be handled gently and with care, or wildly and almost out of control! You be the judge of what future you’d rather embrace. In many cases, processing data in-stream, or as it becomes available, can help reduce an enormous data problem (due to the volume and scale of the flow of data) into a more manageable one.
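One way to see why processing data as it arrives shrinks the problem: many aggregates can be updated incrementally, so memory stays constant however many records flow past. The sketch below uses Welford's online algorithm for a running mean and variance as an illustrative example of this; it is not taken from the article, and the generator standing in for a stream source is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class RunningStats:
    """Constant-memory running mean and variance (Welford's online algorithm)."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance; 0.0 before any value has been seen."""
        return self.m2 / self.count if self.count else 0.0


def process_stream(values: Iterable[float]) -> Iterator[Tuple[int, float, float]]:
    """Consume values one at a time and emit (count, mean, variance) after each."""
    stats = RunningStats()
    for v in values:
        stats.update(v)
        yield stats.count, stats.mean, stats.variance


if __name__ == "__main__":
    # A hypothetical generator standing in for an unbounded source such as a message queue.
    readings = (float(x) for x in [2, 4, 4, 4, 5, 5, 7, 9])
    for count, mean, var in process_stream(readings):
        print(f"n={count} mean={mean:.3f} var={var:.3f}")
```

Only three numbers are kept between records, so the same code works for eight readings or eight billion; an up-to-date answer is available after every event instead of only after a batch job finishes.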

Text Summarization Development: A Python Tutorial with GPT-3.5

KDnuggets

APRIL 7, 2023

Using the power of GPT-3.5 to develop a simple summary generator application.

Python Monorepo: an Example. Part 1: Structure and Tooling

Tweag

APRIL 3, 2023

For a software team to be successful, you need excellent communication. That is why we want to build systems that foster cross-team communication. Using a monorepo is an excellent way to do that. A monorepo provides: Visibility: by seeing the pull requests (PRs) of colleagues, you are easily informed of what other teams are doing. Uniformity: by working in one central repository, it is easier to share the configuration of linters, formatters, etc.

Preview the New Workspace Browser

databricks

APRIL 5, 2023

To simplify navigating in Databricks, we are releasing a new workspace browsing experience. The new Workspace Browser makes it easier for you to […]

Data Pipeline with Airflow and AWS Tools (S3, Lambda & Glue)

Towards Data Science

APRIL 6, 2023

Learning a little about these tools and how to integrate them

A few weeks ago, while doing my mental stretch to think about new post ideas, I thought: well, I need to learn (and talk) more about the cloud and these things. I’ve practiced a lot in on-premises environments, using open-source tools and running away from proprietary solutions… But the world is cloud, and I don’t think that is going to change any time soon… I then wrote a post about creati

5 Essential AI Tools for Data Science

KDnuggets

APRIL 4, 2023

Learn how Bard, Bing, ChatGPT, GitHub Copilot, and Hugging Face are improving data scientists' work life.
