Blog, Bytes and Data Storage - Data Engineering Digest

Improving Efficiency Of Goku Time Series Database at Pinterest (Part 1)

Pinterest Engineering

NOVEMBER 22, 2023

In this first post, we share a short summary of the GokuS and GokuL architecture, the data format for Goku Long Term, and how we improved the bootstrap time for our storage and serving components. Goku Long Term Storage Architecture Summary and Challenges. Figure 9: Flow of data from GokuS to GokuL.

Database Bytes Kafka Architecture

Introducing Netflix’s Key-Value Data Abstraction Layer

Netflix Tech

SEPTEMBER 18, 2024

The Key-Value Service The KV data abstraction service was introduced to solve the persistent challenges we faced with data access patterns in our distributed databases. The first level is a hashed string ID (the primary key), and the second level is a sorted map of a key-value pair of bytes.
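The two-level layout described in this excerpt can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions, not Netflix's implementation: the class name `TwoLevelMap`, its `put` and `scan` methods, and the use of SHA-256 to hash the string ID are all hypothetical choices made for the example.

```python
import bisect
import hashlib


class TwoLevelMap:
    """Hypothetical sketch: hashed string ID -> sorted map of bytes -> bytes."""

    def __init__(self):
        # First level: hashed ID -> (sorted list of item keys, key -> value dict).
        self._records = {}

    @staticmethod
    def _hash_id(record_id: str) -> str:
        # Assumption: SHA-256 stands in for whatever hash a real system uses.
        return hashlib.sha256(record_id.encode("utf-8")).hexdigest()

    def put(self, record_id: str, key: bytes, value: bytes) -> None:
        keys, items = self._records.setdefault(self._hash_id(record_id), ([], {}))
        if key not in items:
            # Keep the second-level keys in sorted order on insert.
            bisect.insort(keys, key)
        items[key] = value

    def scan(self, record_id: str, start: bytes = b"", limit=None):
        """Return (key, value) pairs in key order, beginning at `start`."""
        keys, items = self._records.get(self._hash_id(record_id), ([], {}))
        first = bisect.bisect_left(keys, start)
        selected = keys[first:] if limit is None else keys[first:first + limit]
        return [(k, items[k]) for k in selected]
```

Because the second level stays sorted, a range read such as `scan("user:1", start=b"b", limit=10)` only needs a binary search followed by a slice.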

Bytes Metadata Database Data

How to Navigate the Costs of Legacy SIEMS with Snowflake

Snowflake

APRIL 18, 2024

This blog post explores how Snowflake can help with this challenge. Legacy SIEM cost factors to keep in mind: Data ingestion: traditional SIEMs often impose limits on data ingestion and data retention. Now there are a few ways to ingest data into Snowflake. But what if security teams didn’t have to make tradeoffs?

Data Lake Data Ingestion Bytes Cloud Computing

Google BigQuery: A Game-Changing Data Warehousing Solution

ProjectPro

JANUARY 24, 2023

Tired of relentlessly searching for the most effective and powerful data warehousing solutions on the internet? Search no more! This blog is your comprehensive guide to Google BigQuery, its architecture, and a beginner-friendly tutorial on how to use Google BigQuery for your data warehousing activities. Did you know?

Bytes Google Cloud Data Warehouse Cloud Storage

Observability in Your Data Pipeline: A Practical Guide

Databand.ai

JUNE 8, 2023

Key components of an observability pipeline include: Data collection: Acquiring relevant information from various stages of your data pipelines using monitoring agents or instrumentation libraries. Data storage: Keeping collected metrics and logs in a scalable database or time-series platform.

Data Pipeline Bytes Data Collection Raw Data

Carbon Emissions of End-User Devices: Part One - SWD Method by David Rees

Scott Logic

APRIL 5, 2024

Introduction: This series of blog posts discusses methods of estimating the carbon emissions of end-user devices. After intending to write a single blog post, the research journey prompted me to reconsider how to present this to an audience. js is a JavaScript library that returns an estimated CO2e value for a web page.

Bytes Systems Designing Data Storage

Improving Efficiency Of Goku Time Series Database at Pinterest (Part — 3)

Pinterest Engineering

SEPTEMBER 9, 2024

This three-part blog post series covers the efficiency improvements (view part 1 and part 2), and this final part will cover the reduction of the overall cost of Goku and Pinterest. GokuS consumes from this second Kafka topic and backs up the data into S3. Goku created multiple folly::IOBufs of capacity 1 MiB to store finalized data.

Database Bytes Kafka Software Engineer

Pinterest Tiered Storage for Apache Kafka®️: A Broker-Decoupled Approach

Pinterest Engineering

SEPTEMBER 17, 2024

With 20+ production topics onboarded since May 2024, our broker-decoupled Tiered Storage implementation currently offloads ~200 TB of data every day from broker disk to cheaper object storage. In this blog, we share the approach we took and the learnings we gained. Why Tiered Storage?

Kafka Bytes Transportation Metadata

Data Vault Architecture, Data Quality Challenges, And How To Solve Them

Monte Carlo

FEBRUARY 9, 2023

In fact, with increasingly strict data regulations like GDPR and a renewed emphasis on optimizing technology costs, we’re now seeing a revitalization of “Data Vault 2.0” data modeling. While data vault has many benefits, it is a sophisticated and complex methodology that can present challenges to data quality.

Architecture Raw Data Metadata Data Warehouse

Snowflake Architecture and Its Fundamental Concepts

ProjectPro

JANUARY 31, 2022

Launched in 2014, Snowflake is one of the most popular cloud data solutions on the market. This blog walks you through what Snowflake does, the various features it offers, the Snowflake architecture, and so much more. Table of Contents: Snowflake Overview and Architecture; What is Snowflake Data Warehouse?

Architecture IT Data Warehouse Amazon Web Services

100+ Big Data Interview Questions and Answers 2023

ProjectPro

JANUARY 31, 2023

If you're looking to break into the exciting field of big data or advance your big data career, being well-prepared for big data interview questions is essential. Get ready to expand your knowledge and take your big data career to the next level! But the concern is - how do you become a big data professional?

Big Data Hadoop Relational Database AWS

Forge Your Career Path with Best Data Engineering Certifications

ProjectPro

FEBRUARY 21, 2023

With so many data engineering certifications available, choosing the right one can be a daunting task. There are over 133K data engineer job openings in the US, but how will you stand out in such a crowded job market? Why Are Data Engineering Skills In Demand?

Certification Data Engineering Data Engineer Engineering

Hadoop MapReduce vs. Apache Spark Who Wins the Battle?

ProjectPro

NOVEMBER 11, 2014

Confused over which framework to choose for big data processing, Hadoop MapReduce or Apache Spark? This blog helps you understand the critical differences between two popular big data frameworks. Hadoop and Spark are popular Apache projects in the big data ecosystem.

Hadoop Machine Learning Scala Big Data

50 PySpark Interview Questions and Answers For 2023

ProjectPro

NOVEMBER 22, 2021

Spark saves data in memory (RAM), making data retrieval faster when needed. Spark is a low-latency computation platform because it offers in-memory data storage and caching. MEMORY_ONLY_SER: the RDD is stored as serialized Java objects, one byte array per partition. But the problem is, where do you start?

Hadoop Python Datasets Metadata

How Optimizing Memory Management with LMDB Boosted Performance on Our API Service

Pinterest Engineering

JANUARY 13, 2025

The before shows each Python process creating and reading its own configuration-managed data, while the after shows a single process creating and reading Lightning Memory-Mapped Database (LMDB) configuration-managed data, and other processes reading from that process.

Management Bytes Python Software Engineer
