Sun. Feb 18, 2024

Using Trino And Iceberg As The Foundation Of Your Data Lakehouse

Data Engineering Podcast

Summary: A data lakehouse is intended to combine the benefits of data lakes (cost-effective, scalable storage and compute) and data warehouses (user-friendly SQL interface). Multiple open source projects and vendors have been working together to make this vision a reality. In this episode, Dain Sundstrom, CTO of Starburst, explains how the combination of the Trino query engine and the Iceberg table format offers the ease of use and execution speed of data warehouses with the infinite storage and sc

Data Lake 262

WebSockets in Http4s

Rock the JVM

By Herbert Kateu. 1. Introduction: The WebSocket protocol enables persistent two-way communication between a client and a server, where packets can be passed in both directions without the need for additional HTTP requests. The protocol is specified in RFC 6455. WebSockets are used in applications such as instant messaging, gaming, simultaneous editing, and stock tickers, to mention but a few.

Scala 94

Stream Processing with Python, Kafka & Faust

Towards Data Science

How to Stream and Apply Real-Time Prediction Models on High-Throughput Time-Series Data. Most stream processing libraries are not Python friendly, while the majority of machine learning and data mining libraries are Python based. Although the Faust library aims to bring Kafka Streams ideas into the Python ecosystem, it may pose challenges in terms of ease of use.

Kafka 74

Data Engineering Weekly #159

Data Engineering Weekly

RudderStack is the Warehouse Native CDP, built to help data teams deliver value across the entire data activation lifecycle, from collection to unification and activation. Visit rudderstack.com to learn more. Editor’s Note: DEWCon Next? Aswin and I started DEWCon as a fun experiment. Midway through organizing the conference, fear washed over us.

Apache Airflow® Best Practices for ETL and ELT Pipelines

Whether you’re creating complex dashboards or fine-tuning large language models, your data must be extracted, transformed, and loaded. ETL and ELT pipelines form the foundation of any data product, and Airflow is the open-source data orchestrator specifically designed for moving and transforming data in ETL and ELT pipelines. This eBook covers: An overview of ETL vs.

WebSockets in Scala: Part 1 - http4s

Rock the JVM

Learn how to implement WebSockets in Scala with http4s to enable seamless two-way communication between your frontend and backend.

Scala 52

2024: The Year to Talk About Outcome-based Learning

Knowledge Hut

In the evolving landscape of professional development, the year 2024 marks a pivotal moment as we delve into the significance and purpose of outcome-based upskilling. Traditional training models are undergoing a transformation, emphasizing the need for a more dynamic and responsive approach that focuses on measurable results and real-world applications.