Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (a user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and scalability of data lakes.
1. Introduction
2. Setup & logging architecture
3. Data pipeline logging best practices
3.1. Metadata: information about pipeline runs & the data flowing through your pipeline
3.2. Obtain visibility into the code's execution sequence using text logs
3.3. Understand resource usage by tracking metrics
3.4. Monitoring UI & traceability
In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This comprehensive guide offers best practices and examples for debugging Airflow DAGs. You'll learn how to: create a standardized debugging process to quickly diagnose errors in your DAGs, identify common issues with DAGs, tasks, and connections, and distinguish between Airflow-related and external issues.
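One concrete debugging technique worth knowing alongside any such process is Airflow's `dag.test()` (available since Airflow 2.5), which runs a DAG in a single process so ordinary breakpoints and stack traces behave like plain Python. A minimal sketch; the DAG and task names are illustrative, not from the guide:

```python
from datetime import datetime

from airflow.decorators import dag, task


# Illustrative DAG for local debugging; names are placeholders.
@dag(schedule=None, start_date=datetime(2024, 1, 1), catchup=False)
def debug_me():
    @task
    def boom():
        # A deliberate failure to step through in a debugger.
        raise ValueError("inspect me")

    boom()


if __name__ == "__main__":
    # Airflow 2.5+: runs the whole DAG in-process, no scheduler required,
    # so pdb breakpoints and IDE debuggers work as in any Python script.
    debug_me().test()
```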
I bet you know it already. You can limit the max throughput for Apache Spark Structured Streaming jobs for popular data sources such as Apache Kafka, Delta Lake, or raw files. But did you know that you can also control the lower limit, at least for Apache Kafka?
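For reference, both bounds are plain options on the Kafka source: `maxOffsetsPerTrigger` caps a micro-batch, while `minOffsetsPerTrigger` (Spark 3.2+) holds a micro-batch back until enough data has accumulated, subject to `maxTriggerDelay`. A minimal sketch; the broker address and topic name are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-rate-limits").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    # Upper bound: at most this many offsets are processed per micro-batch.
    .option("maxOffsetsPerTrigger", "100000")
    # Lower bound (Spark 3.2+): a micro-batch fires only once at least this
    # many new offsets are available, or maxTriggerDelay has elapsed.
    .option("minOffsetsPerTrigger", "10000")
    .option("maxTriggerDelay", "5m")
    .load()
)
```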
We previously announced Snowflake’s Unistore workload, which continues Snowflake’s legacy of breaking down data silos by uniting transactional and analytical data in a consistent and governed platform. Today, we are pleased to announce that Hybrid Tables — the core feature powering Unistore — is in public preview in select AWS regions. Hybrid Tables is a new table type that enables transactional use cases within Snowflake with fast, high-concurrency point operations.
We are excited to announce the upcoming general availability of Azure Private Link support for Databricks SQL (DBSQL) Serverless, planned for April 2024.
A lot of the buzz around AI focuses on its future potential. And we get it — we’re talking about a transformative technology that presents seemingly limitless possibilities. But an important aspect of this world-changing tech story that gets lost in the hype is understanding exactly what AI solutions are available for you and your team to employ right now, today.
Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. The 3.0 release delivers the community's top-requested features, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.
In today's environment, proactive cybersecurity is crucial for any public sector agency. For many organizations, however, gathering the log data that security professionals need for effective monitoring and response remains a challenge.
There is a great evil Spirit that is haunting the streets of code in the land of programmers. It’s a Spirit of obfuscation and twisting things into what they are not. The Spirit wanders around on the loose looking for someone, and it finds ready victims among the ranks of new programmers and the innocent […]
As you design your career in data, you’ve got to avoid getting stuck in your comfort zone or allowing your manager or current situation to determine your path.
We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.
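To make the REE layout concrete, here is a small PyArrow sketch (the compute function assumes pyarrow 12 or newer): runs of repeated values collapse into parallel run-ends and values children instead of being stored element by element.

```python
import pyarrow as pa
import pyarrow.compute as pc

# Run-End Encoding: each run of equal values is stored once, alongside the
# cumulative index at which that run ends.
arr = pa.array(["a", "a", "a", "b", "b", "c"])
ree = pc.run_end_encode(arr)

print(ree.type)      # run_end_encoded<run_ends: int32, values: string>
print(ree.run_ends)  # [3, 5, 6]  -- each run's exclusive end index
print(ree.values)    # ["a", "b", "c"]
```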
Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage
There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.
Dive into transformative trends in data science, encompassing AI-powered automation, NLP, ethical considerations, decentralized computing, and interdisciplinary collaboration.
As the world grapples with the escalating climate crisis, many industries are re-examining their operations to identify and implement sustainable practices. The telecommunications industry is no exception. Telecom companies face growing pressure from consumers, investors and regulators to reduce their carbon footprint and achieve net-zero emissions.
Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives
Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-driven decisions.
New SQL Practice Problems. I’m trying something new. I get a lot of questions from folks about getting into the Data Engineering space, how to get better, grow, learn, etc. So I came up with a solution: SQL Practice Problems. Some moons ago I wrote a Data Engineering Practice repo on GitHub for free, and some 1.2K stars later […]
This week on Unapologetically Technical, I had the wonderful pleasure of interviewing Gunnar Morling, the creator of the Billion Row Challenge and Senior Staff Software Engineer at Decodable. In this episode, we talk about why it is so important to stay in a position long enough to gain experience and see the success or failure of decisions. He also shares his experiences at Red Hat and working on Debezium.
Learn a modern approach to streaming real-time data in a Jupyter Notebook. This guide covers dynamic visualizations, a quant finance use case in Python, and Bollinger Bands analysis with live data.
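The Bollinger Bands computation itself is compact. A minimal pandas sketch (the guide's own streaming implementation may differ) using the standard 20-period window and bands at two rolling standard deviations:

```python
import pandas as pd


def bollinger_bands(prices: pd.Series, window: int = 20,
                    num_std: float = 2.0) -> pd.DataFrame:
    """Classic Bollinger Bands: a rolling mean with upper/lower bands
    at +/- num_std rolling standard deviations."""
    mid = prices.rolling(window).mean()
    std = prices.rolling(window).std()
    return pd.DataFrame({
        "middle": mid,
        "upper": mid + num_std * std,
        "lower": mid - num_std * std,
    })


# Usage with placeholder data; live prices would stream in instead.
prices = pd.Series(range(100), dtype="float64")
print(bollinger_bands(prices).tail())
```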
Nearly every facet of society has felt the impact of AI since it burst into the mainstream in late 2022 with the public launch of ChatGPT. In 2024, the retail and consumer goods industry is expected to experience massive upheaval due to the proliferation of generative AI (gen AI) tools as well as changes in customer engagement and the general manner in which products are now sold.
With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You'll learn how to: understand the building blocks of DAGs and combine them in complex pipelines, schedule your DAG to run exactly when you want it to, write DAGs that adapt to your data at runtime, set up alerts and notifications, and scale your DAGs as your workloads grow.
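As a taste of those building blocks, here is a minimal TaskFlow-style DAG with dynamic task mapping, one of the runtime-adaptive features in this space; the task names and file list are illustrative, not taken from the eBook:

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def example_etl():
    @task
    def list_files() -> list[str]:
        # Placeholder: in practice this might list objects in a bucket.
        return ["a.csv", "b.csv"]

    @task
    def process(path: str) -> None:
        print(f"processing {path}")

    # Dynamic task mapping: one mapped task instance per file, decided at runtime.
    process.expand(path=list_files())


example_etl()
```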
No. This question generated a lot of content last week, and a lot of words were written. I wanted to keep my answer short so as not to burden you with a few thousand more words to read. "Modern data stack" was coined by US companies and VCs (mainly Fivetran and dbt Labs) as a term to quickly describe a way of building a data stack in the cloud around ELT.
Why Stakeholder Management? One of the most critical aspects of project management is developing and managing relationships with everyone the project impacts. In this article, you will learn techniques for identifying stakeholders, analyzing their influence on the project, and developing strategies to communicate, set boundaries, and manage competing expectations.
Sam Wang | Sr. Technical Program Manager; Joe Gordon | Sr. Staff Software Engineer At Pinterest we are continuously looking for ways to improve our developer experience, and we have recently shipped AI-assisted development for everyone while balancing safety, security, and cost. In this blog post, we share our journey of unlocking AI-assisted development, from the initial idea to the General Availability (GA) stage.
In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!
How to Stream and Apply Real-Time Prediction Models on High-Throughput Time-Series Data. Most stream processing libraries are not Python-friendly, while the majority of machine learning and data mining libraries are Python-based. Although the Faust library aims to bring Kafka Streams ideas into the Python ecosystem, it may pose challenges in terms of ease of use.
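For orientation, a Faust pipeline is built from an app, typed topics, and async agents. A minimal sketch, assuming a local Kafka broker; the topic name, record fields, and the stand-in "model" threshold are placeholders:

```python
import faust  # pip install faust-streaming (the maintained fork)

app = faust.App("scoring-app", broker="kafka://localhost:9092")


class Reading(faust.Record):
    sensor_id: str
    value: float


readings = app.topic("readings", value_type=Reading)


@app.agent(readings)
async def score(stream):
    async for reading in stream:
        # A real pipeline would apply a prediction model per event;
        # a fixed threshold stands in for one here.
        if reading.value > 0.9:
            print(f"anomaly on {reading.sensor_id}: {reading.value}")
```

A worker is then started from the command line with `faust -A <module> worker`.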
What is Agile Testing? As the name implies, agile projects are executed quickly and with flexibility. Agile methods involve tasks executed in short iterations, or sprints. Agile Testing is likewise iterative and takes place after each sprint, rather than toward the end of the project. Testing iteratively helps validate client requirements and adapt to changing conditions more effectively.
by Herbert Kateu. The WebSocket protocol enables persistent two-way communication between a client and a server, where packets can be passed in both directions without the need for additional HTTP requests. The specification for this protocol is outlined in RFC 6455. WebSockets are used in applications such as instant messaging, gaming, simultaneous editing, and stock tickers, to mention but a few.
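To illustrate the persistent two-way channel RFC 6455 describes, here is a minimal sketch using the third-party Python `websockets` package; the echo URL is a placeholder test endpoint, not from the article:

```python
import asyncio

import websockets  # pip install websockets


async def main() -> None:
    # One HTTP handshake upgrades the connection; after that, messages
    # flow in both directions over the same socket.
    async with websockets.connect("wss://echo.websocket.org") as ws:
        await ws.send("ping")      # client -> server, no new HTTP request
        reply = await ws.recv()    # server -> client on the same socket
        print(reply)


asyncio.run(main())
```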
Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage
When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation methods to validate outputs at scale.