Learning System Design: Top 5 Essential Reads
KDnuggets
MAY 23, 2024
Explore system design with these expert-recommended books.
KDnuggets
NOVEMBER 8, 2023
Learn how to design & deploy responsible AI systems with this white paper from Teradata.
Data Engineering Podcast
DECEMBER 3, 2023
Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. With Datafold, you can seamlessly plan, translate, and validate data across systems, massively accelerating your migration project. When is DoubleCloud Data Transfer the wrong choice?
Data Engineering Podcast
APRIL 14, 2024
In this episode Oren Eini, CEO and creator of RavenDB, explores the nuances of relational vs. non-relational engines, and the strategies for designing a non-relational database. When designing and building a database, what are the initial set of questions that need to be answered? Can you describe what constitutes a NoSQL database?
Speaker: Jason Tanner
A sustainable business model contains a system of interrelated choices made not once but over time. Discover how to design and evolve profit streams over time, focusing on solution sustainability, economic sustainability, and relationship sustainability.
Data Engineering Podcast
MAY 26, 2024
Summary: Any software system that survives long enough will require some form of migration or evolution. When that system is responsible for the data layer, the process becomes more challenging. As you have gone through successive migration projects, how has that influenced the ways that you think about architecting data systems?
Engineering at Meta
APRIL 6, 2023
Buck2, our new open source, large-scale build system, is now available on GitHub. Buck2 is an extensible and performant build system written in Rust and designed to make your build experience faster and more efficient. In particular, we support Sapling-based file systems. Why rebuild Buck?
Uber Engineering
OCTOBER 29, 2024
Learn how Uber made a breakthrough in tracking design metrics across Figma, Android, and iOS with Design System Observability.
Tweag
JULY 5, 2023
Buck2 is a from-scratch rewrite of Buck, a polyglot, monorepo build system that was developed and used at Meta (Facebook), and shares a few similarities with Bazel. As you may know, the Scalable Builds Group at Tweag has a strong interest in such scalable build systems. Meta recently announced they have made Buck2 open-source.
Speaker: Dr. Greg Loughnane and Chris Alexiuk
However, during development – and even more so once deployed to production – best practices for operating and improving generative AI applications are less understood.
ArcGIS
MAY 13, 2024
Review best practices for designing and testing for accessibility in maps and apps throughout the ArcGIS system during the development process.
Knowledge Hut
MARCH 22, 2024
I have comprehensively analyzed the area of physical security, particularly the ongoing discussion surrounding fail-safe vs. fail-secure electric strike locking systems. Fail-secure systems focus on maintaining continuous security, keeping doors locked even during a power failure to protect assets.
Knowledge Hut
MAY 6, 2024
Systems thinking views a system as a set of interconnected and interdependent components, defined by its limits, that is more than the sum of its parts (subsystems). When one component of a system is altered, the effects frequently spread across the entire system. … are the main objectives of systems thinking.
The Pragmatic Engineer
JUNE 1, 2023
Juraj included system monitoring that tracks the capacity of the server he runs the app on (pictured: the monitoring page on the Rides app). And it doesn't end there: Juraj created a systems design explainer on how he built this project and the technologies used (pictured: the systems design diagram for the Rides application). The app uses: Node.js
Striim
SEPTEMBER 11, 2024
A data pipeline is a systematic sequence of components designed to automate the extraction, organization, transfer, transformation, and processing of data from one or more sources to a designated destination. Understanding the essential components of data pipelines is crucial for designing efficient and effective data architectures.
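To make the extract, transform, and load stages concrete, here is a minimal Python sketch of a pipeline built from chained generator functions. The function names (`extract`, `transform`, `load`, `run_pipeline`) and the comma-separated input format are hypothetical choices for illustration, not part of any specific product.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def extract(lines: Iterable[str]) -> Iterable[Record]:
    # Source stage: parse raw "name,amount" lines into records.
    for line in lines:
        name, amount = line.split(",")
        yield {"name": name.strip(), "amount": float(amount)}


def transform(records: Iterable[Record]) -> Iterable[Record]:
    # Transformation stage: drop non-positive amounts and normalize names.
    for record in records:
        if record["amount"] > 0:
            yield {"name": str(record["name"]).lower(), "amount": record["amount"]}


def load(records: Iterable[Record], destination: List[Record]) -> int:
    # Destination stage: append records and report how many were written.
    written = 0
    for record in records:
        destination.append(record)
        written += 1
    return written


def run_pipeline(source: Iterable[str], destination: List[Record]) -> int:
    # Generators keep each stage lazy: records stream through one at a time.
    return load(transform(extract(source)), destination)


warehouse: List[Record] = []
count = run_pipeline(["Alice, 10.5", "Bob, -3", "Carol, 7"], warehouse)
print(count, warehouse)
```

Because each stage depends only on an iterable of records, stages can be tested in isolation or swapped (for example, replacing the list destination with a database writer) without touching the others.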
The Pragmatic Engineer
NOVEMBER 21, 2023
If you had a continuous deployment system up and running around 2010, you were ahead of the pack, but today it's considered strange if your team does not have one for things like web applications. We dabbled in network engineering, database management, and system administration. … and hand-rolled C code.
Engineering at Meta
JUNE 19, 2024
We’re introducing parameter vulnerability factor (PVF) , a novel metric for understanding and measuring AI systems’ vulnerability against silent data corruptions (SDCs) in model parameters. But the growing complexity and diversity of AI hardware systems also brings an increased risk of hardware faults such as bit flips.
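Meta's post gives the precise definition of PVF; the sketch below only illustrates the general idea of estimating vulnerability by fault injection: flip one random bit in one random float32 parameter of a toy linear classifier and count how often the prediction changes. The model, weights, inputs, and the `estimate_vulnerability` function are hypothetical simplifications, not Meta's methodology.

```python
import random
import struct
from typing import List, Sequence


def flip_bit(value: float, bit: int) -> float:
    # Reinterpret the value as an IEEE-754 float32, flip one bit, and convert back.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    # Toy linear classifier: class 1 if the dot product is non-negative.
    # A NaN score (possible after a bit flip) compares False and yields class 0.
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score >= 0 else 0


def estimate_vulnerability(weights: List[float], inputs: List[List[float]],
                           trials: int = 2000, seed: int = 0) -> float:
    # Fraction of single-bit-flip trials whose prediction differs from the fault-free run.
    rng = random.Random(seed)
    baseline = [predict(weights, x) for x in inputs]
    mismatches = 0
    for _ in range(trials):
        corrupted = list(weights)
        idx = rng.randrange(len(corrupted))
        corrupted[idx] = flip_bit(corrupted[idx], rng.randrange(32))
        sample = rng.randrange(len(inputs))
        if predict(corrupted, inputs[sample]) != baseline[sample]:
            mismatches += 1
    return mismatches / trials


weights = [0.5, -1.25, 2.0]  # exactly representable in float32
inputs = [[1.0, 0.5, 0.1], [0.2, 1.0, 0.3], [1.0, 1.0, 1.0]]
print(estimate_vulnerability(weights, inputs))
```

In a toy like this, flips in low-order mantissa bits rarely change the prediction, while sign and exponent flips often do.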
Start Data Engineering
MAY 9, 2020
Change data capture is a software design pattern used to capture changes to data and take corresponding action based on each change. The corresponding action usually occurs in another system, in response to the change made in the source system. The change to data is usually an insert, update, or delete.
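A minimal Python sketch of the pattern, assuming a toy in-memory table keyed by primary key. For simplicity it captures changes by diffing two snapshots; log-based CDC tools instead read the database's transaction log. The `ChangeEvent` type and both functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class ChangeEvent:
    op: str                    # "insert", "update", or "delete"
    key: int                   # primary key of the affected row
    row: Optional[Row] = None  # new row image; None for deletes


def capture_changes(before: Dict[int, Row], after: Dict[int, Row]) -> List[ChangeEvent]:
    # Snapshot diff: emit one event per inserted, updated, or deleted row.
    events: List[ChangeEvent] = []
    for key, row in after.items():
        if key not in before:
            events.append(ChangeEvent("insert", key, row))
        elif before[key] != row:
            events.append(ChangeEvent("update", key, row))
    for key in before:
        if key not in after:
            events.append(ChangeEvent("delete", key))
    return events


def apply_changes(target: Dict[int, Row], events: List[ChangeEvent]) -> None:
    # Corresponding action in the downstream system: replay each event in order.
    for event in events:
        if event.op in ("insert", "update"):
            target[event.key] = dict(event.row)
        elif event.op == "delete":
            target.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")


source_before = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
source_after = {1: {"name": "Ada L."}, 3: {"name": "Linus"}}
replica = {k: dict(v) for k, v in source_before.items()}
events = capture_changes(source_before, source_after)
apply_changes(replica, events)
print([e.op for e in events], replica == source_after)
```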
Analytics Vidhya
FEBRUARY 7, 2023
Data engineering is the field of study that deals with the design, construction, deployment, and maintenance of data processing systems. This includes designing and implementing […] The post Most Essential 2023 Interview Questions on Data Engineering appeared first on Analytics Vidhya.
Analytics Vidhya
JANUARY 31, 2023
YARN (Yet Another Resource Negotiator) is a powerful resource management system for a horizontally scaled server environment. It is designed to be more flexible and generic than the original Hadoop MapReduce system, making it an attractive choice for companies looking to implement Hadoop.
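The core idea of resource negotiation can be sketched as a first-fit allocator: applications request containers with a given amount of memory and vcores, and a central manager places each request on a node with enough free capacity. This is a simplified, hypothetical model; YARN's actual ResourceManager delegates placement to pluggable schedulers (such as the Capacity and Fair schedulers) with queues, priorities, and locality preferences that are not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Hypothetical stand-in for a NodeManager's advertised free capacity.
    name: str
    free_mem_mb: int
    free_vcores: int
    containers: List[str] = field(default_factory=list)


def allocate(nodes: List[Node], app_id: str, mem_mb: int, vcores: int) -> Optional[str]:
    # First-fit: place the container on the first node with enough memory and vcores.
    for node in nodes:
        if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
            node.free_mem_mb -= mem_mb
            node.free_vcores -= vcores
            node.containers.append(app_id)
            return node.name
    return None  # no capacity right now; the request would have to wait


cluster = [Node("nm1", 4096, 2), Node("nm2", 8192, 4)]
placements = [
    allocate(cluster, "app-1", 3072, 1),
    allocate(cluster, "app-2", 2048, 1),
    allocate(cluster, "app-3", 16384, 1),
]
print(placements)
```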
Start Data Engineering
MAY 28, 2024
Sections include: Distributed systems are scalable, resilient to failures, & designed for high availability; Use DuckDB to process data, not for multiple users to access data; Cost calculation: DuckDB + Ephemeral VMs = dirt cheap data processing; Processing data less than 100GB? Use DuckDB.
Data Engineering Podcast
MAY 18, 2024
Summary: The purpose of business intelligence systems is to allow anyone in the business to access and decode data to help them make informed decisions. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats including Apache Iceberg, Hive, and Delta Lake.
The Pragmatic Engineer
AUGUST 1, 2023
We put a lot of emphasis on communication and prioritization and the ability to unblock yourself or your team – this comes on top of the programming and design skills. It’s down to them to create well-designed, extensible, performant and secure solutions. Lead a strategic team effort, starting at the design stage.
Seattle Data Guy
MAY 8, 2023
If you’re relying on your OLTP system to provide analytics, you might be in for a surprise. While it can work initially, these systems aren’t designed to handle complex analytical queries. … The post OLTP Vs OLAP – What Is The Difference appeared first on Seattle Data Guy.
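One reason for the mismatch is physical layout. The sketch below, with hypothetical data, contrasts a row-oriented store keyed by primary key (suited to the point lookups and updates OLTP systems handle) with a column-oriented copy (suited to the scans and aggregates typical of OLAP). Real databases add indexes, compression, and query planners that this toy ignores.

```python
from collections import defaultdict
from typing import Dict, List

orders = [(1, "alice", 20.0), (2, "bob", 35.5), (3, "alice", 12.5), (4, "carol", 8.0)]

# Row-oriented (OLTP-style): all fields of an order live together, keyed by id,
# so reading or updating a single order touches one record.
row_store: Dict[int, Dict[str, object]] = {
    order_id: {"customer": customer, "amount": amount}
    for order_id, customer, amount in orders
}

# Column-oriented (OLAP-style): each column is stored contiguously,
# so an aggregate reads only the columns it needs.
column_store: Dict[str, List[object]] = {
    "order_id": [o[0] for o in orders],
    "customer": [o[1] for o in orders],
    "amount": [o[2] for o in orders],
}


def update_amount(order_id: int, amount: float) -> None:
    # Point update: a single dictionary lookup in the row store.
    row_store[order_id]["amount"] = amount


def revenue_by_customer() -> Dict[str, float]:
    # Analytical query: scan two columns and ignore the rest.
    totals: Dict[str, float] = defaultdict(float)
    for customer, amount in zip(column_store["customer"], column_store["amount"]):
        totals[customer] += amount
    return dict(totals)


print(revenue_by_customer())
```

In practice the column-oriented copy is often a separate analytical system fed from the OLTP database, which is why the sketch does not propagate `update_amount` to it.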
The Pragmatic Engineer
MARCH 12, 2024
Tools and approaches at our disposal, which didn’t exist in 1975, or were not widespread in 1995, include: Git – the now-dominant version control system used by much of the industry, with exceptions for projects with very large assets, like video games. Code reviews: these became common in parallel with version control.
Jesse Anderson
OCTOBER 5, 2023
In another post, I introduced a concept from Justin Coffey about how much systems can be changed from their original design. Can Kafka’s codebase and design change this much without other issues popping up? Clever designs can only cover up so many problems. Queues are often used with pub/sub systems like Kafka.
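To make the distinction concrete, here is a toy Python sketch contrasting a work queue, where each message goes to exactly one consumer, with a log-based pub/sub topic, where every subscriber tracks its own read offset and sees every message. The class names are hypothetical; Kafka's real model adds partitions and consumer groups (which can give queue-like behavior within a group), none of which is modeled here.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class WorkQueue:
    """Each message is delivered to exactly one consumer (competing consumers)."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, message: str) -> None:
        self._messages.append(message)

    def consume(self) -> Optional[str]:
        # Removing the message means no other consumer will ever see it.
        return self._messages.popleft() if self._messages else None


class PubSubTopic:
    """Append-only log; each subscriber keeps its own offset and sees every message."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._offsets: Dict[str, int] = {}

    def publish(self, message: str) -> None:
        self._log.append(message)

    def subscribe(self, name: str) -> None:
        # New subscribers start at the beginning of the retained log.
        self._offsets.setdefault(name, 0)

    def poll(self, name: str) -> List[str]:
        # Return everything after this subscriber's offset, then advance it.
        start = self._offsets[name]
        self._offsets[name] = len(self._log)
        return self._log[start:]


queue = WorkQueue()
queue.publish("job-1")
queue.publish("job-2")

topic = PubSubTopic()
topic.subscribe("billing")
topic.subscribe("analytics")
topic.publish("order-created")
```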
Analytics Vidhya
FEBRUARY 6, 2023
HDFS (Hadoop Distributed File System) is not a traditional database but a distributed file system designed to store and process big data. It is a core component of the Apache Hadoop ecosystem and allows large datasets to be stored and processed across multiple commodity servers.
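A sketch of how HDFS-style storage splits a file into fixed-size blocks and replicates each block across DataNodes. The values used here (128 MB blocks, replication factor 3) match common Hadoop defaults for `dfs.blocksize` and `dfs.replication`, but the round-robin placement is a deliberate simplification: real HDFS uses a rack-aware placement policy. The function names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 128  # common default for dfs.blocksize
REPLICATION = 3      # common default for dfs.replication


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    # Full-size blocks plus one smaller final block; the last block is not padded.
    full, remainder = divmod(file_size_mb, block_size_mb)
    return [block_size_mb] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    # Simplified round-robin placement: each block's replicas land on distinct nodes.
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


blocks = split_into_blocks(300)
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
raw_storage_mb = sum(blocks) * REPLICATION
print(blocks, placement, raw_storage_mb)
```

The arithmetic shows the storage cost of durability: a 300 MB file occupies 900 MB of raw disk at replication factor 3.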
Monte Carlo
NOVEMBER 21, 2024
That’s where data pipeline design patterns come in. So, why does choosing the right data pipeline design matter? In this guide, we’ll explore the patterns that can help you design data pipelines that actually work. Table of Contents: Common Data Pipeline Design Patterns Explained, 1. Batch Processing Pattern, 2. …
Data Engineering Podcast
FEBRUARY 4, 2024
Summary: Stream processing systems have long been built with a code-first design, adding SQL as a layer on top of the existing framework. There are numerous stream processing engines, near-real-time database engines, streaming SQL systems, etc. How have the design and goals/scope changed since you first started working on it?
Data Engineering Podcast
FEBRUARY 18, 2024
What are the pain points that are still prevalent in lakehouse architectures as compared to warehouse or vertically integrated systems? What are the differences in terms of pipeline design/access and usage patterns when using a Trino/Iceberg lakehouse as compared to other popular warehouse/lakehouse structures?
The Pragmatic Engineer
SEPTEMBER 19, 2024
In the early 90s, DOS programs like the ones my company made had their own text UI screen rendering system. This rendering system was easy for me to understand, even on day one. Our rendering system was very memory inefficient, but that could be fixed. By doing so, I got to see every screen of the system.
DoorDash Engineering
FEBRUARY 27, 2024
We reviewed the architecture of our global search at DoorDash in early 2022 and concluded that our rapid growth meant that within three years we wouldn’t be able to scale the system efficiently, particularly as global search shifted from store-only to a hybrid item-and-store search experience. … latency reduction and a 75% hardware cost decrease.
Data Engineering Podcast
DECEMBER 24, 2023
Summary: Kafka has become a ubiquitous technology, offering a simple method for coordinating events and data across different systems. Operating it at scale, however, is notoriously challenging. When a cluster's usage expands beyond its originally designed capacity, what are the options/procedures for expanding that capacity?
Engineering at Meta
MARCH 12, 2024
We are sharing details on the hardware, network, storage, design, performance, and software that help us extract high throughput and reliability for various AI workloads. We use this cluster design for Llama 3 training. We have been openly designing our GPU hardware platforms beginning with our Big Sur platform in 2015.
Engineering at Meta
OCTOBER 15, 2024
At the Open Compute Project (OCP) Global Summit 2024, we’re showcasing our latest open AI hardware designs with the OCP community. These innovations include a new AI platform, cutting-edge open rack designs, and advanced network fabrics and components. By sharing our designs, we hope to inspire collaboration and foster innovation.
The Pragmatic Engineer
OCTOBER 17, 2024
This grant is designed to “support entrepreneurs, tech-geeks, developers, and socially engaged people, who are capable of challenging the way we search and discover information and resources on the internet.” The team is tiny: only three people.
Data Engineering Podcast
JUNE 30, 2024
Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for transparency and ownership in data systems.
The Pragmatic Engineer
AUGUST 22, 2023
On a typical games project, programmers work alongside designers, artists, animators, writers, sound designers and other disciplines. I often explain this working relationship as: artists make it pretty, while designers and programmers make it work.
LinkedIn Engineering
OCTOBER 5, 2023
LinkedIn is at the forefront of leveraging embedding-based retrieval (EBR) technology to revolutionize the way we approach search and recommendation systems. EBR is a method used at the early stages of a recommendation or search system. (Figure 1: Example of an Embeddings Graph.)
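At its simplest, the retrieval step scores candidate items by the similarity between a query embedding and precomputed item embeddings, then keeps the top k. The sketch below uses brute-force cosine similarity over hypothetical two-dimensional embeddings; production EBR systems typically use approximate nearest-neighbor indexes over much larger vectors, and nothing here reflects LinkedIn's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: Sequence[float], index: Dict[str, Sequence[float]],
             k: int) -> List[Tuple[str, float]]:
    # Brute-force scoring of every item, then keep the k most similar candidates.
    scored = [(item, cosine(query, embedding)) for item, embedding in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical item embeddings, e.g. produced offline by a trained encoder.
item_index = {"job_a": [1.0, 0.0], "job_b": [0.7, 0.7], "job_c": [0.0, 1.0]}
results = retrieve([1.0, 0.1], item_index, k=2)
print(results)
```

The retrieved candidates would then typically be passed to a later, more expensive ranking stage.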
Data Engineering Podcast
JUNE 9, 2024
To address this shortcoming Datorios created an observability platform for Flink that brings visibility to the internals of this popular stream processing system. How have the requirements of generative AI shifted the demand for streaming data systems? What role does Flink play in the architecture of generative AI systems?
KDnuggets
NOVEMBER 14, 2022
In this post, you will learn to clarify business problems & constraints, understand problem statements, select evaluation metrics, overcome technical challenges, and design high-level systems.
Start Data Engineering
SEPTEMBER 4, 2024
Contents: 1. Introduction; 2. Key parts of data systems: Requirements, Data flow design, Orchestrator and scheduler, Data processing design, Code organization, Data storage design, Monitoring & Alerting, Infrastructure; 3. Conclusion.
Engineering at Meta
OCTOBER 15, 2024
We look forward to continued collaboration with OCP to open designs for racks, servers, storage boxes, and motherboards to benefit companies of all sizes across the industry. By breaking down traditional data center technologies into their core components we can build new systems that are more flexible, scalable, and efficient.