
Data Engineering Weekly #221

Data Engineering Weekly

The blog is an excellent compilation of the types of query engines built on top of the lakehouse, their internal architecture, and benchmarks across various categories. I think the market is wide open for more innovation, as Onehouse announces a compute runtime named Quanton. Gunnar Morling: What If We Could Rebuild Kafka From Scratch?


Handling Network Throttling with AWS EC2 at Pinterest

Pinterest Engineering

In this blog post, we'll discuss our experiences in identifying the challenges associated with EC2 network throttling. In the remainder of this blog post, we'll share how we root-cause and mitigate the above issues. This prompted us to engage with AWS and dive deep into the network performance of our clusters. 4xl with up to 12.5

AWS 66


Google BigQuery: A Game-Changing Data Warehousing Solution

ProjectPro

This blog is your comprehensive guide to Google BigQuery: a detailed overview of its architecture and a beginner-friendly tutorial on how to use it for your data warehousing activities. Due to this, STRING and BYTES values cannot be compared or used interchangeably.

Bytes 40

Data Scientist Vs Data Analyst: Key Differences, Career Paths, and How to Choose the Right Role

WeCloudData

quintillion bytes of data are generated every day, and that's a great sign for anyone interested in a data-driven career. This blog focuses […] The post Data Scientist Vs Data Analyst: Key Differences, Career Paths, and How to Choose the Right Role appeared first on WeCloudData.

Bytes 52

Fault Tolerance in Distributed Systems: Tracing with Apache Kafka and Jaeger

Confluent

Instead, in this post I will point you to an earlier blog post where I already answered that question and then I will focus on what should be your next question: now that I’m relying on Jaeger to trace how data is flowing through my distributed system, what if Jaeger goes down? Distributed tracing with Apache Kafka and Jaeger.

Kafka 54

Improving Efficiency Of Goku Time Series Database at Pinterest (Part 1)

Pinterest Engineering

In the first blog, we will share a short summary on the GokuS and GokuL architecture, data format for Goku Long Term, and how we improved the bootstrap time for our storage and serving components. More information about the architecture can be found in the GokuL blog and the cost reduction blog.

Database 111

How Optimizing Memory Management with LMDB Boosted Performance on Our API Service

Pinterest Engineering

We used OO design to support various deserialization methods to mimic Python lists, sets, and dictionaries, using LMDB's byte-based key-value records. In the API processes, we maintain persistent read-only connections, allowing LMDB to paginate data present in virtual shared memory efficiently.
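
The excerpt does not show how the list-like wrappers are built, so the following is only a minimal sketch of the general idea: a read-only, list-like view over byte-based key-value records. Everything here is an assumption for illustration. `ByteStore` is a plain in-memory stand-in for an LMDB read transaction (it is not the LMDB API), the names `StoredList` and `write_list` are hypothetical, and JSON is an assumed serialization format, not necessarily what the post uses.

```python
import json
import struct
from typing import Any, Dict, Iterable, Optional


class ByteStore:
    """Hypothetical stand-in for a byte-based key-value store such as LMDB."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)


def write_list(store: ByteStore, name: str, items: Iterable[Any]) -> None:
    """Store each element under b"<name>:<index>" and the length under b"<name>:len"."""
    prefix = name.encode("utf-8")
    count = 0
    for index, item in enumerate(items):
        key = prefix + b":" + struct.pack(">Q", index)  # 8-byte big-endian index
        store.put(key, json.dumps(item).encode("utf-8"))
        count = index + 1
    store.put(prefix + b":len", struct.pack(">Q", count))


class StoredList:
    """Read-only view that deserializes elements lazily, one record per access."""

    def __init__(self, store: ByteStore, name: str) -> None:
        self._store = store
        self._prefix = name.encode("utf-8")

    def __len__(self) -> int:
        raw = self._store.get(self._prefix + b":len")
        return struct.unpack(">Q", raw)[0] if raw is not None else 0

    def __getitem__(self, index: int) -> Any:
        length = len(self)
        if index < 0:
            index += length  # support negative indices like a real list
        if not 0 <= index < length:
            raise IndexError(index)
        raw = self._store.get(self._prefix + b":" + struct.pack(">Q", index))
        return json.loads(raw.decode("utf-8"))


store = ByteStore()
write_list(store, "pins", [1, "two", {"three": 3}])
view = StoredList(store, "pins")
print(len(view), view[1], view[-1])
```

Because only the requested record is decoded on each access, a view like this avoids loading the whole collection into each process, which is in the spirit of the memory savings the post describes.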