Sat. Feb 17, 2024 - Fri. Feb 23, 2024

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode, Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and sc

Data Lake 262

5 Airflow Alternatives for Data Orchestration

KDnuggets

Top list of open-source tools for building and managing workflows.

Data 151

Trending Sources

ArcGIS Pro 3.3 Moves to .NET 8

ArcGIS

ArcGIS Pro 3.3 is planned to be available in May 2024. Install .NET 8 before attempting to install ArcGIS Pro 3.3 for the best user experience!

143

Data Engineering Best Practices - #2. Metadata & Logging

Start Data Engineering

1. Introduction 2. Setup & Logging architecture 3. Data Pipeline Logging Best Practices 3.1. Metadata: Information about pipeline runs, & data flowing through your pipeline 3.2. Obtain visibility into the code’s execution sequence using text logs 3.3. Understand resource usage by tracking Metrics 3.4. Monitoring UI & Traceability 3.5.

Metadata 130

A Guide to Debugging Apache Airflow® DAGs

In Airflow, DAGs (your data pipelines) support nearly every use case. As these workflows grow in complexity and scale, efficiently identifying and resolving issues becomes a critical skill for every data engineer. This is a comprehensive guide with best practices and examples for debugging Airflow DAGs. You’ll learn how to: Create a standardized process for debugging to quickly diagnose errors in your DAGs Identify common issues with DAGs, tasks, and connections Distinguish between Airflow-relate

Min rate limits for Apache Kafka

Waitingforcode

I bet you know this already: you can limit the maximum throughput of Apache Spark Structured Streaming jobs for popular data sources such as Apache Kafka, Delta Lake, or raw files. Did you know that you can also control the lower limit, at least for Apache Kafka?

Kafka 130

7 Free Kaggle Micro-Courses for Data Science Beginners

KDnuggets

Interested in learning data science? Check out these free micro-courses from Kaggle to learn essential data science skills.

More Trending

Announcing the General Availability of Azure Private Link and Azure Storage firewall support for Databricks SQL Serverless

databricks

We are excited to announce the upcoming general availability of Azure Private Link support for Databricks SQL (DBSQL) Serverless, planned in April 2024.

SQL 126

Location Referencing Guide to Esri Partner Conference and Esri Developer Summit

ArcGIS

Join us for an exciting Partner Conference and Developer Summit! Discover the latest in ArcGIS Location Referencing and connect with experts.

3 Inspirational Stories of Leaders in AI

KDnuggets

Every leader has their origin story, and here are some that might inspire you.

146

Beyond the Buzz: Braze Equips Modern Marketers with Powerful AI Tools

Snowflake

A lot of the buzz around AI focuses on its future potential. And we get it — we’re talking about a transformative technology that presents seemingly limitless possibilities. But an important aspect of this world-changing tech story that gets lost in the hype is understanding exactly what AI solutions are available for you and your team to employ right now, today.

Mastering Apache Airflow® 3.0: What’s New (and What’s Next) for Data Orchestration

Speaker: Tamara Fingerlin, Developer Advocate

Apache Airflow® 3.0, the most anticipated Airflow release yet, officially launched this April. As the de facto standard for data orchestration, Airflow is trusted by over 77,000 organizations to power everything from advanced analytics to production AI and MLOps. With the 3.0 release, the top-requested features from the community were delivered, including a revamped UI for easier navigation, stronger security, and greater flexibility to run tasks anywhere at any time.

Strengthening Cyber Resilience through Efficient Data Management: A Response to M-21-31

databricks

In today's environment, proactive cybersecurity is crucial to any public sector agency. For many organizations, log data that security professionals need for effective.

The Abstraction Problem – A Great Evil

Confessions of a Data Guy

There is a great evil Spirit that is haunting the streets of code in the land of programmers. It’s a Spirit of obfuscation and twisting things into what they are not. The Spirit wanders around on the loose looking for someone, and it finds ready victims among the ranks of new programmers and the innocent […]

Coding 113

A Roadmap For Your Data Career

KDnuggets

As you design your career in data, you’ve got to avoid getting stuck in your comfort zone or allowing your manager or current situation to determine your path.

Data 145

Aligning Velox and Apache Arrow: Towards composable data management

Engineering at Meta

We’ve partnered with Voltron Data and the Arrow community to align and converge Apache Arrow with Velox, Meta’s open source execution engine. Apache Arrow 15 includes three new format layouts developed through this partnership: StringView, ListView, and Run-End-Encoding (REE). This new convergence helps Meta and the larger community build data management systems that are unified, more efficient, and composable.

Agent Tooling: Connecting AI to Your Tools, Systems & Data

Speaker: Alex Salazar, CEO & Co-Founder @ Arcade | Nate Barbettini, Founding Engineer @ Arcade | Tony Karrer, Founder & CTO @ Aggregage

There’s a lot of noise surrounding the ability of AI agents to connect to your tools, systems and data. But building an AI application into a reliable, secure workflow agent isn’t as simple as plugging in an API. As an engineering leader, it can be challenging to make sense of this evolving landscape, but agent tooling provides such high value that it’s critical we figure out how to move forward.

Announcing the General Availability of Unity Catalog Volumes

databricks

Today, we are excited to announce that Unity Catalog Volumes is now generally available on AWS, Azure, and GCP. Unity Catalog provides a.

AWS 122

5 minutes to make a map!

ArcGIS

Create a cool looking landscape map, in record time. Start the clock!

107

Navigating the Data Revolution: Exploring the Booming Trends in Data Science and Machine Learning

KDnuggets

Dive into transformative trends in data science, encompassing AI-powered automation, NLP, ethical considerations, decentralized computing, and interdisciplinary collaboration.

Delivering Telecom Sustainability Targets Using Autonomous Networks

Snowflake

As the world grapples with the escalating climate crisis, many industries are re-examining their operations to identify and implement sustainable practices. The telecommunications industry is no exception. Telecom companies face growing pressure from consumers, investors and regulators to reduce their carbon footprint and achieve net-zero emissions.

How to Modernize Manufacturing Without Losing Control

Speaker: Andrew Skoog, Founder of MachinistX & President of Hexis Representatives

Manufacturing is evolving, and the right technology can empower—not replace—your workforce. Smart automation and AI-driven software are revolutionizing decision-making, optimizing processes, and improving efficiency. But how do you implement these tools with confidence and ensure they complement human expertise rather than override it? Join industry expert Andrew Skoog as he explores how manufacturers can leverage automation to enhance operations, streamline workflows, and make smarter, data-dri

New SQL Practice Problems

Confessions of a Data Guy

I’m trying something new. I get a lot of questions from folks about getting into the Data Engineering space, how to get better, grow, learn, etc. So I came up with a solution: SQL Practice Problems. Some moons ago I wrote a Data Engineering Practice repo on GitHub for free, and some 1.2K stars later […]

SQL 100

Unapologetically Technical Episode 9 – Gunnar Morling

Jesse Anderson

This week on Unapologetically Technical, I had the pleasure of interviewing Gunnar Morling, the creator of the Billion Row Challenge and Senior Staff Software Engineer at Decodable. In this episode, we talk about why it is so important to stay in a position long enough to gain experience and see the success or failure of decisions. He also shares his experiences at Red Hat and working on Debezium.

Python in Finance: Real Time Data Streaming within Jupyter Notebook

KDnuggets

Learn a modern approach to stream real-time data in Jupyter Notebook. This guide covers dynamic visualizations, a Python for quant finance use case, and Bollinger Bands analysis with live data.

Finance 144

Top 3 Data + AI Predictions for Retail and Consumer Goods in 2024

Snowflake

Nearly every facet of society has felt the impact of AI since it burst into the mainstream in late 2022 with the public launch of ChatGPT. In 2024, the retail and consumer goods industry is expected to experience massive upheaval due to the proliferation of generative AI (gen AI) tools as well as changes in customer engagement and the general manner in which products are now sold.

Retail 101

The Ultimate Guide to Apache Airflow DAGs

With Airflow being the open-source standard for workflow orchestration, knowing how to write Airflow DAGs has become an essential skill for every data engineer. This eBook provides a comprehensive overview of DAG writing features with plenty of example code. You’ll learn how to: Understand the building blocks of DAGs, combine them in complex pipelines, and schedule your DAG to run exactly when you want it to Write DAGs that adapt to your data at runtime and set up alerts and notifications Scale you

Is the modern data stack disappearing?

Christophe Blefari

No. This question generated a lot of content last week, and a lot of words were written. I wanted to keep my answer short so as not to burden you with a few thousand more words to read. The term "modern data stack" was coined by US companies and VCs (mainly Fivetran and dbt Labs) as shorthand for a way of building a data stack in the cloud around ELT.

Data 100

8 Tips for Managing Stakeholder Expectations

Knowledge Hut

Why Stakeholder Management? One of the most critical aspects of project management is doing what’s necessary to develop and control relationships with all individuals that the project impacts. In this article, you will learn techniques for identifying stakeholders, analyzing their influence on the project, and developing strategies to communicate, set boundaries, and manage competing expectations.

Free Mastery Course: Become a Large Language Model Expert

KDnuggets

It is a self-paced course that covers fundamental and advanced concepts of LLMs and teaches how to deploy them in production.

IT 142

Unlocking AI Assisted Development Safely: From Idea to GA

Pinterest Engineering

Sam Wang | Sr. Technical Program Manager; Joe Gordon | Sr. Staff Software Engineer. At Pinterest, we are continuously looking for ways to improve our developer experience, and we have recently shipped AI-assisted development for everyone while balancing safety, security, and cost. In this blog post, we share our journey of unlocking AI-assisted development, from the initial idea to the General Availability (GA) stage.

Scala 98

Apache Airflow® Best Practices: DAG Writing

Speaker: Tamara Fingerlin, Developer Advocate

In this new webinar, Tamara Fingerlin, Developer Advocate, will walk you through many Airflow best practices and advanced features that can help you make your pipelines more manageable, adaptive, and robust. She'll focus on how to write best-in-class Airflow DAGs using the latest Airflow features like dynamic task mapping and data-driven scheduling!

Stream Processing with Python, Kafka & Faust

Towards Data Science

How to stream and apply real-time prediction models on high-throughput time-series data. Most stream processing libraries are not Python-friendly, while the majority of machine learning and data mining libraries are Python-based. Although the Faust library aims to bring Kafka Streams ideas into the Python ecosystem, it may pose challenges in terms of ease of use.

Kafka 98

Advantages of Agile Testing Methodology

Knowledge Hut

What is Agile Testing? As the name implies, agile projects are executed quickly and flexibly. Agile methods involve tasks executed in short iterations, or sprints. Agile testing is also iterative and takes place after each sprint rather than towards the end of the project. Testing iteratively helps validate client requirements and adapt better to changing conditions.

Project 98

6 YouTube Channels to Learn about AI

KDnuggets

Are you looking into learning about AI? YouTube is your first stop.

142

WebSockets in Http4s

Rock the JVM

By Herbert Kateu. 1. Introduction: The WebSocket protocol enables persistent two-way communication between a client and a server, where packets can be passed in both directions without the need for additional HTTP requests. The specification for this protocol is outlined in RFC 6455. WebSockets are used in applications such as instant messaging, gaming, simultaneous editing, and stock tickers, to mention but a few.

Scala 94

How to Achieve High-Accuracy Results When Using LLMs

Speaker: Ben Epstein, Stealth Founder & CTO | Tony Karrer, Founder & CTO, Aggregage

When tasked with building a fundamentally new product line with deeper insights than previously achievable for a high-value client, Ben Epstein and his team faced a significant challenge: how to harness LLMs to produce consistent, high-accuracy outputs at scale. In this new session, Ben will share how he and his team engineered a system (based on proven software engineering approaches) that employs reproducible test variations (via temperature 0 and fixed seeds), and enables non-LLM evaluation m